Efficient Cache Performance Equivalent 2D-Texel to Memory Mapping Identification for Embedded GPUs
| Main Authors: | Ahmed El-Mahdy, Marwa K. Elteir, Kholoud Shata |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11015443/ |
| author | Ahmed El-Mahdy, Marwa K. Elteir, Kholoud Shata |
|---|---|
| collection | DOAJ |
| description | With the growing use of smart devices, it becomes feasible to exploit embedded GPUs for intensive computations approaching workstation performance, opening the way for complex, server-grade applications on such devices. Although general-purpose development frameworks exist, they generally abstract away implementation details, making further performance exploitation difficult. One such critical aspect is the set of texture memory hierarchy design parameters, which are mostly trade secrets. Unfortunately, standard cache hierarchy identification methods are not applicable, because texture accesses pass through a logical 2D-texel-to-memory mapping, which exploits 2D access locality, before the physical cache is accessed. This paper presents, for the first time, a parameterized model capable of describing the multidimensional tiling layouts that govern this mapping. Although reverse-engineering the model's exact parameters can be shown to be undecidable, we seek an equivalent set of parameters that yields the same cache behavior. In particular, such parameters define a texture ordering that yields a linear order traversing the caches in memory-contiguous blocks. This study proposes a reverse-engineering observation method that exploits the contiguity of tiled cached regions and cache set conflicts to reveal these parameters efficiently. The method's complexity is $O(n^{2})$, where $n$ is the number of bits in one of the texture buffer dimensions, which keeps it practical. Furthermore, optimizing a benchmark workload by aligning its input data and memory access pattern with the inferred multidimensional tiling layout yields a speedup of up to $2.22\times$. |
| format | Article |
| id | doaj-art-8db1880b6f874d088e6fcd70f600a528 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2025.3573668 |
| volume | 13 |
| pages | 95094-95121 |
| affiliation | Ahmed El-Mahdy: School of Information Technology and Computer Science (ITCS), Nile University, Giza, Egypt; Marwa K. Elteir (https://orcid.org/0000-0002-4627-9057): Informatics Research Institute (IRI), City of Scientific Research and Technological Applications (SRTA-City), New Borg El-Arab City, Egypt; Kholoud Shata: Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, New Borg El-Arab City, Egypt |
| title | Efficient Cache Performance Equivalent 2D-Texel to Memory Mapping Identification for Embedded GPUs |
| topic | 2D texture caches; cache microbenchmarking; embedded GPUs; reverse engineering |
| url | https://ieeexplore.ieee.org/document/11015443/ |
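
The abstract does not spell out how the tiling model is parameterized. As a rough illustration only, the Python sketch below assumes the simplest family of power-of-two layouts, in which each bit of a texel's linear offset is taken from either the x or the y coordinate. The bit-list encoding and every helper name (`texel_offset`, `row_major`, `tiled`, `line_footprint`) are hypothetical and are not the authors' model. The sketch shows why a tiled layout lets one memory-contiguous cache line cover a 2D block of texels rather than a strip of one or two rows.

```python
from typing import List, Set, Tuple

# Hypothetical encoding (an illustrative assumption, not the paper's model):
# layout[i] names the coordinate ("x" or "y") whose next unused bit becomes
# bit i of the linear texel offset. Texture sides are powers of two.
Layout = List[str]


def texel_offset(x: int, y: int, layout: Layout) -> int:
    """Map texel (x, y) to a linear texel offset under a bit-level layout."""
    w_bits, h_bits = layout.count("x"), layout.count("y")
    if not (0 <= x < 1 << w_bits and 0 <= y < 1 << h_bits):
        raise ValueError("texel outside the power-of-two texture")
    coords = {"x": x, "y": y}
    used = {"x": 0, "y": 0}  # bits already consumed from each coordinate
    offset = 0
    for pos, axis in enumerate(layout):
        bit = (coords[axis] >> used[axis]) & 1
        offset |= bit << pos
        used[axis] += 1
    return offset


def row_major(w_bits: int, h_bits: int) -> Layout:
    """Plain linear layout: offset = y * width + x."""
    return ["x"] * w_bits + ["y"] * h_bits


def tiled(w_bits: int, h_bits: int, tw_bits: int, th_bits: int) -> Layout:
    """One tiling level: row-major tiles of 2^tw_bits x 2^th_bits texels,
    with texels stored row-major inside each tile."""
    return (["x"] * tw_bits + ["y"] * th_bits
            + ["x"] * (w_bits - tw_bits) + ["y"] * (h_bits - th_bits))


def line_footprint(x: int, y: int, layout: Layout,
                   line_texels: int) -> Set[Tuple[int, int]]:
    """Texels sharing a cache line with (x, y), assuming each line holds
    `line_texels` consecutive offsets and the texture starts line-aligned."""
    target = texel_offset(x, y, layout) // line_texels
    width, height = 1 << layout.count("x"), 1 << layout.count("y")
    return {(i, j) for j in range(height) for i in range(width)
            if texel_offset(i, j, layout) // line_texels == target}


if __name__ == "__main__":
    lin, til = row_major(3, 3), tiled(3, 3, 2, 2)  # 8x8 texture, 4x4 tiles
    print(texel_offset(5, 2, lin), texel_offset(5, 2, til))  # 21 25
    # With 16-texel lines (e.g. 64-byte lines of 4-byte texels, assumed),
    # the tiled layout makes one line cover a 4x4 square of texels.
    print(sorted(line_footprint(5, 2, til, 16)) ==
          [(i, j) for i in range(4, 8) for j in range(4)])  # True
```

Under this assumed encoding, layouts differ only in the order of coordinate bits: a Morton (Z-order) curve is `["x", "y"] * n`, and deeper tilings add further runs of x and y bits. The sketch only models the mapping; it does not reproduce how the paper infers the bit order from cache observations.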