Efficient Cache Performance Equivalent 2D-Texel to Memory Mapping Identification for Embedded GPUs

With the increasing trend of utilizing smart devices, exploiting embedded GPUs becomes plausible to provide for intense computations that approach workstation performance, opening the way for complex, server-grade applications on such devices. Although general-purpose development frameworks exist, t...

Bibliographic Details
Main Authors: Ahmed El-Mahdy, Marwa K. Elteir, Kholoud Shata
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: 2D texture caches; cache microbenchmarking; embedded GPUs; reverse engineering
Online Access: https://ieeexplore.ieee.org/document/11015443/
author Ahmed El-Mahdy
Marwa K. Elteir
Kholoud Shata
collection DOAJ
description With the increasing trend of utilizing smart devices, exploiting embedded GPUs becomes plausible to provide for intense computations that approach workstation performance, opening the way for complex, server-grade applications on such devices. Although general-purpose development frameworks exist, they generally abstract implementation details making it difficult for further performance exploitation. One such critical aspect is the texture memory hierarchy design parameters, which are primarily trade secrets. Unfortunately, standard cache hierarchy identification methods are not applicable, due to utilizing logical 2D-texel-to-memory mapping to exploit the 2D locality of access prior to physical cache access. This paper presents, for the first time, a parameterized model capable of describing the underlying multidimensional tiling layouts governing this mapping. Although it can be shown that reverse-engineering the model to obtain the exact parameters is an undecidable problem, we strive to obtain a corresponding set of parameters that results in the same cache behavior. In particular, such parameters define the texture order resulting in a linear order that traverses the caches in memory contiguous blocks. This study proposes a reverse-engineering observation method that exploits the contiguity of tiled cached regions and cache set conflicts to efficiently reveal such parameters. The complexity of the method is $O(n^{2})$, where n is the number of bits in one of the texture buffer dimensions, ensuring practical applicability. Furthermore, optimization of a benchmark workload (via input data and memory access pattern alignment with the inferred multidimensional tiling layout) yields up to a $2.22\times$ speedup.
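To make the idea of a 2D-texel-to-memory mapping concrete, the sketch below contrasts a plain row-major layout with a single-level tiled layout. It is only an illustration under assumed parameters: the function names, the 4x4 tile size, and the one-level tiling are hypothetical and are not taken from the paper, whose parameterized model covers more general multidimensional tiling layouts.

```python
def row_major_offset(x, y, width):
    """Linear texel index under a plain row-major layout."""
    return y * width + x


def tiled_offset(x, y, width, tile_w=4, tile_h=4):
    """Linear texel index under a hypothetical one-level tiled layout.

    Tiles are stored one after another in row-major tile order, and
    texels inside a tile are stored row-major as well. The tile size
    is an assumed parameter, not a value reported by the paper.
    """
    if width % tile_w != 0:
        raise ValueError("width must be a multiple of tile_w")
    tiles_per_row = width // tile_w
    tile_x, in_x = divmod(x, tile_w)
    tile_y, in_y = divmod(y, tile_h)
    tile_index = tile_y * tiles_per_row + tile_x
    return tile_index * (tile_w * tile_h) + in_y * tile_w + in_x


def block_span(offset_fn, width, size=4):
    """Distance between the lowest and highest offset of the top-left size x size block."""
    offsets = [offset_fn(x, y, width) for y in range(size) for x in range(size)]
    return max(offsets) - min(offsets) + 1


if __name__ == "__main__":
    width = 8
    # Under tiling, a 4x4 texel block occupies one contiguous run of memory;
    # under row-major, the same block is spread across several rows.
    print("tiled span:", block_span(tiled_offset, width))
    print("row-major span:", block_span(row_major_offset, width))
```

With a tiled layout, texels that are close in both dimensions fall into the same contiguous memory block, which is the kind of contiguity the abstract says the observation method exploits.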
format Article
id doaj-art-8db1880b6f874d088e6fcd70f600a528
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-8db1880b6f874d088e6fcd70f600a528, 2025-08-20T03:26:05Z, eng, IEEE, IEEE Access, 2169-3536, 2025-01-01, vol. 13, pp. 95094-95121, doi 10.1109/ACCESS.2025.3573668, document 11015443
Ahmed El-Mahdy (0), Marwa K. Elteir (1) https://orcid.org/0000-0002-4627-9057, Kholoud Shata (2)
(0) School of Information Technology and Computer Science (ITCS), Nile University, Giza, Egypt
(1) Informatics Research Institute (IRI), City of Scientific Research and Technological Applications (SRTA-City), New Borg El-Arab City, Egypt
(2) Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, New Borg El-Arab City, Egypt
title Efficient Cache Performance Equivalent 2D-Texel to Memory Mapping Identification for Embedded GPUs
topic 2D texture caches
cache microbenchmarking
embedded GPUs
reverse engineering
url https://ieeexplore.ieee.org/document/11015443/