A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores
The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. Simultaneously, these systems become more susceptible to different aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and...
| Main Authors: | Fatemeh Hossein-Khani, Omid Akbari |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Reinforcement learning; task mapping; reliability; thermal-aware; aging; manycore systems |
| Online Access: | https://ieeexplore.ieee.org/document/11071293/ |
| _version_ | 1849420254096130048 |
|---|---|
| author | Fatemeh Hossein-Khani; Omid Akbari |
| collection | DOAJ |
| description | The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. At the same time, these systems become more susceptible to aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and thermal cycling (TC), as well as electromigration (EM). In this paper, we propose a reinforcement learning (RL)-based task mapping method that improves the reliability of manycore systems under these aging mechanisms. The method consists of three steps: bin packing, task-to-bin mapping, and task-to-core mapping. In the first step, the density-based spatial clustering of applications with noise (DBSCAN) method groups the cores into clusters (bins) based on their temperature. The Q-learning algorithm is then used for the latter two steps to map each arriving task to a core such that the thermal variation among all bins is minimized. Unlike state-of-the-art works, the proposed method runs at runtime and requires no parameters to be calculated offline. The technique is evaluated on 16-, 32-, and 64-core systems using applications from the SPLASH2 and PARSEC benchmark suites. The results demonstrate up to a 27% increase in mean time to failure (MTTF) compared to state-of-the-art task mapping techniques. |
| format | Article |
| id | doaj-art-a21fb5edb8c9441e879a7e1940fb3870 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-a21fb5edb8c9441e879a7e1940fb3870; 2025-08-20T03:31:48Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 123460-123472; 10.1109/ACCESS.2025.3585768; 11071293; A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores; Fatemeh Hossein-Khani (https://orcid.org/0000-0003-2484-2889), Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran; Omid Akbari (https://orcid.org/0000-0003-4022-663X), Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran; The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. At the same time, these systems become more susceptible to aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and thermal cycling (TC), as well as electromigration (EM). In this paper, we propose a reinforcement learning (RL)-based task mapping method that improves the reliability of manycore systems under these aging mechanisms. The method consists of three steps: bin packing, task-to-bin mapping, and task-to-core mapping. In the first step, the density-based spatial clustering of applications with noise (DBSCAN) method groups the cores into clusters (bins) based on their temperature. The Q-learning algorithm is then used for the latter two steps to map each arriving task to a core such that the thermal variation among all bins is minimized. Unlike state-of-the-art works, the proposed method runs at runtime and requires no parameters to be calculated offline. The technique is evaluated on 16-, 32-, and 64-core systems using applications from the SPLASH2 and PARSEC benchmark suites. The results demonstrate up to a 27% increase in mean time to failure (MTTF) compared to state-of-the-art task mapping techniques.; https://ieeexplore.ieee.org/document/11071293/; Reinforcement learning; task mapping; reliability; thermal-aware; aging; manycore systems |
| title | A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores |
| topic | Reinforcement learning; task mapping; reliability; thermal-aware; aging; manycore systems |
| url | https://ieeexplore.ieee.org/document/11071293/ |
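
The description above says Q-learning is used to map each arriving task to a core so that thermal variation is minimized. The following is a minimal sketch of that idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy thermal model in `step`, the state encoding (index of the hottest core), the reward (negative temperature spread), and all constants. The DBSCAN bin-packing step is omitted.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def thermal_spread(temps):
    """Difference between the hottest and coolest core (hypothetical metric)."""
    return max(temps) - min(temps)


def step(temps, core, heat=5.0, cooling=0.9, ambient=40.0):
    """Toy thermal model (assumption): every core relaxes toward ambient,
    then the core that receives the task heats up by a fixed amount."""
    new = [ambient + (t - ambient) * cooling for t in temps]
    new[core] += heat
    return new


def hottest(temps):
    """Hypothetical state encoding: index of the currently hottest core."""
    return temps.index(max(temps))


def choose_core(q, state, epsilon, rng):
    """Epsilon-greedy selection of the core that receives the next task."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


def train(n_cores=4, n_tasks=2000, epsilon=0.1, seed=0):
    """Map a stream of tasks one at a time, rewarding low thermal spread."""
    rng = random.Random(seed)
    q = [[0.0] * n_cores for _ in range(n_cores)]
    temps = [40.0] * n_cores
    for _ in range(n_tasks):
        state = hottest(temps)
        core = choose_core(q, state, epsilon, rng)
        temps = step(temps, core)
        reward = -thermal_spread(temps)
        q_update(q, state, core, reward, hottest(temps))
    return q


if __name__ == "__main__":
    for row in train():
        print([round(v, 2) for v in row])
```

Learning happens online as tasks arrive, which mirrors the record's statement that the method runs at runtime without offline parameters. A realistic version would need a state and reward defined over temperature-based bins rather than a single hottest-core index.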