Novel off-policy reinforcement learning framework for relay-assisted D2D network powered by ambient backscattering and energy harvesting

Bibliographic Details
Main Authors: Saranya Karattupalayam Chidambaram, Sudhanshu Arya, Yogesh Kumar Choukiker, Abhijit Bhowmick
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Results in Engineering
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2590123025019462
author Saranya Karattupalayam Chidambaram
Sudhanshu Arya
Yogesh Kumar Choukiker
Abhijit Bhowmick
collection DOAJ
description In this work, a novel off-policy reinforcement learning framework is developed for a relay-assisted cooperative device-to-device (D2D) communication system powered by an ambient backscattering system (ABCS) and energy harvesting. The relay follows the harvest-receive-sense-then-transmit (HRSTT) strategy. To this end, a relay architecture is proposed, together with an end-to-end frame structure. In the proposed architecture, the relay senses the activity of a cellular user (CU) and, based on the sensing information, selects between two mutually exclusive transmission modes: ambient backscatter mode (ABSM) and active RF transceiver mode (ARTRM). An analytical framework is developed to capture the impact of the random dwelling time of the CU within a given time frame. To ensure reliable and effective communication, a novel off-policy machine learning (ML) framework, the ML-aided module for sensing and power allocation (MLAMSPA), is also developed. Inspired by Q-learning, MLAMSPA is designed to learn an optimal policy for real-time sensing and power allocation for the users’ signals. It not only captures the dynamic time allocation between backscattering and direct transmission but also embeds the effects of dynamic channel conditions and power control into the reward signal used for learning the optimal policy. The network performance is studied in terms of the CU dwelling time distribution, the power allocated to the user signal, CU activity, sum-rate, and outage. The impact of network parameters such as the mean CU dwelling time, SNR threshold, harvesting time, and backscatter parameters on sum-rate and outage is analysed.
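The description states that MLAMSPA is inspired by Q-learning and that the relay chooses between ABSM and ARTRM from sensed CU activity. The following is only a minimal tabular Q-learning sketch of that general idea, not the authors' MLAMSPA. The state space, the power levels, the reward function `toy_reward`, and the assumption that CU activity is redrawn independently each frame are all hypothetical and chosen for illustration; only the mode names come from the record.

```python
import random

MODES = ("ABSM", "ARTRM")          # mode names taken from the description
POWER_LEVELS = (0.25, 0.5, 1.0)    # hypothetical normalised transmit powers
ACTIONS = [(m, p) for m in MODES for p in POWER_LEVELS]
STATES = (0, 1)                    # sensed CU activity: 0 = idle, 1 = active


def toy_reward(cu_active, mode, power):
    """Hypothetical reward standing in for a rate-based reward signal."""
    if mode == "ABSM":
        # Assumption: backscatter only pays off when a CU signal is present.
        return 1.0 if cu_active else 0.0
    # Assumption: active transmission gains with power but is penalised
    # for interfering when the CU is active.
    return 2.0 * power - (3.0 * power if cu_active else 0.0)


def q_update(q, state, action, reward, next_state, alpha=0.05, gamma=0.9):
    """One off-policy Q-learning step: bootstrap from the greedy next action."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def train(steps=20000, epsilon=0.2, seed=0):
    """Learn a Q-table with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                        # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward = toy_reward(state, *action)
        next_state = rng.choice(STATES)  # toy model: CU activity redrawn each frame
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


def greedy_policy(q):
    """Read the learned (mode, power) choice for each sensed state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

The update is off-policy in the usual Q-learning sense: actions are drawn epsilon-greedily, while the target uses the maximum over next actions. Calling `greedy_policy(train())` returns one `(mode, power)` pair per sensed state under the toy reward above.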
format Article
id doaj-art-4b3965eee95b40f38de42bb51df22c6f
institution Kabale University
issn 2590-1230
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Results in Engineering
spelling doaj-art-4b3965eee95b40f38de42bb51df22c6f
2025-08-20T03:29:32Z
eng
Elsevier
Results in Engineering
2590-1230
2025-09-01
volume 27
article 105875
doi 10.1016/j.rineng.2025.105875
Novel off-policy reinforcement learning framework for relay-assisted D2D network powered by ambient backscattering and energy harvesting
Saranya Karattupalayam Chidambaram (0): School of Electronics Engineering, VIT Vellore, Tamil Nadu, India
Sudhanshu Arya (1): School of Electronics Engineering, VIT Vellore, Tamil Nadu, India
Yogesh Kumar Choukiker (2): School of Electronics Engineering, VIT Vellore, Tamil Nadu, India
Abhijit Bhowmick (3): Corresponding author; School of Electronics Engineering, VIT Vellore, Tamil Nadu, India
title Novel off-policy reinforcement learning framework for relay-assisted D2D network powered by ambient backscattering and energy harvesting
topic Backscattering
Device-to-device (D2D) communication
Energy harvesting
machine learning
sum-rate
url http://www.sciencedirect.com/science/article/pii/S2590123025019462