A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores

The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. Simultaneously, these systems become more susceptible to different aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and...

Bibliographic Details
Main Authors: Fatemeh Hossein-Khani, Omid Akbari
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Reinforcement learning; task mapping; reliability; thermal-aware; aging; manycore systems
Online Access: https://ieeexplore.ieee.org/document/11071293/
author Fatemeh Hossein-Khani
Omid Akbari
author_facet Fatemeh Hossein-Khani
Omid Akbari
author_sort Fatemeh Hossein-Khani
collection DOAJ
description The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. At the same time, these systems become more susceptible to aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and thermal cycling (TC), as well as to electromigration (EM). In this paper, we propose a reinforcement learning (RL)-based task mapping method that improves the reliability of manycore systems under these aging mechanisms. The method consists of three steps: bin packing, task-to-bin mapping, and task-to-core mapping. In the first step, the density-based spatial clustering of applications with noise (DBSCAN) method groups the cores into clusters (bins) based on their temperature. The Q-learning algorithm is then used for the latter two steps to map each arriving task to a core such that the thermal variation among the bins is minimized. In contrast to state-of-the-art works, the proposed method operates at runtime and requires no parameters to be calculated offline. The technique is evaluated on 16-, 32-, and 64-core systems using applications from the SPLASH2 and PARSEC benchmark suites. The results demonstrate up to a 27% increase in mean time to failure (MTTF) compared to state-of-the-art task mapping techniques.
format Article
id doaj-art-a21fb5edb8c9441e879a7e1940fb3870
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-a21fb5edb8c9441e879a7e1940fb3870 2025-08-20T03:31:48Z eng
IEEE, IEEE Access (ISSN 2169-3536), 2025-01-01, vol. 13, pp. 123460-123472, DOI 10.1109/ACCESS.2025.3585768, document 11071293
Fatemeh Hossein-Khani (0) https://orcid.org/0000-0003-2484-2889
Omid Akbari (1) https://orcid.org/0000-0003-4022-663X
(0) Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
(1) Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
title A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores
topic Reinforcement learning
task mapping
reliability
thermal-aware
aging
manycore systems
url https://ieeexplore.ieee.org/document/11071293/
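
The description field outlines a pipeline in which cores are first grouped into temperature-based bins with DBSCAN and a Q-learning agent then places each arriving task so that thermal variation among bins stays small. The following is a minimal sketch of those two building blocks, not the authors' implementation. All function names (dbscan_1d, bin_spread, choose_bin, q_update), the integer state encoding, the reward (negative spread of bin mean temperatures), and the values of eps, min_pts, alpha, and gamma are assumptions made for illustration only.

```python
import random
from statistics import mean

NOISE = -1  # label for cores that belong to no bin


def dbscan_1d(temps, eps, min_pts):
    """Plain DBSCAN over scalar core temperatures; returns one label per core."""
    labels = [None] * len(temps)

    def neighbors(i):
        # Every core within eps degrees of core i (core i itself included).
        return [j for j, t in enumerate(temps) if abs(t - temps[i]) <= eps]

    cluster = 0
    for i in range(len(temps)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the bin, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep growing the bin
        cluster += 1
    return labels


def bin_spread(temps, labels):
    """Max minus min of the mean temperature per bin (noise cores ignored)."""
    bins = {}
    for t, lab in zip(temps, labels):
        if lab != NOISE:
            bins.setdefault(lab, []).append(t)
    means = [mean(v) for v in bins.values()]
    return max(means) - min(means) if means else 0.0


def choose_bin(q, state, n_bins, epsilon, rng):
    """Epsilon-greedy choice of a bin for the arriving task."""
    if rng.random() < epsilon:
        return rng.randrange(n_bins)
    return max(range(n_bins), key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, n_bins, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_bins))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical temperatures (degrees Celsius) for seven cores.
temps = [50.0, 51.0, 52.0, 70.0, 71.0, 72.0, 90.0]
labels = dbscan_1d(temps, eps=1.5, min_pts=2)
q_table = {}
reward = -bin_spread(temps, labels)  # assumed reward: penalise thermal spread
q_update(q_table, state=0, action=0, reward=reward, next_state=0, n_bins=2)
next_bin = choose_bin(q_table, 0, 2, epsilon=0.0, rng=random.Random(0))
```

In this sketch the same update rule would serve both the task-to-bin and the task-to-core step, with the action set switched from bins to the cores inside the chosen bin. How the published method encodes states and rewards is not given in this record.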