6G-oriented cross-modal signal reconstruction technology

Bibliographic Details
Main Authors: Ang LI, Jianxin CHEN, Xin WEI, Liang ZHOU
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications, 2022-06-01
Series: Tongxin xuebao
ISSN: 1000-436X
Collection: DOAJ
Subjects: 6G; cross-modal signal reconstruction; multi-modal dataset; 3D CNN; GAN
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022093/
Full description

Objectives: Multimodal services that combine audio, video and haptics, such as mixed reality, digital twin and the metaverse, are expected to become killer applications in the 6G era. However, the large volume of multimodal data generated by such services is likely to burden the signal processing, transmission and storage of existing communication systems. A cross-modal signal reconstruction scheme is therefore urgently needed to reduce the amount of transmitted data, so that 6G immersive multimodal services can meet users' immersive-experience requirements while guaranteeing low-latency, high-reliability and high-capacity communication.

Methods: Firstly, by controlling a robot to touch various materials, a dataset containing audio, visual and touch signals, VisTouch, was constructed to lay the foundation for subsequent research on cross-modal problems. Secondly, by exploiting the semantic correlation between multimodal signals, a universal and robust end-to-end cross-modal signal reconstruction architecture was designed, comprising three parts: a feature extraction module, a reconstruction module and an evaluation module. The feature extraction module maps the source-modality signal to a semantic feature vector in a common semantic space, and the reconstruction module inverse-transforms this vector into the target-modality signal. The evaluation module assesses reconstruction quality along semantic and spatio-temporal dimensions and, during training, feeds optimization information back to the feature extraction and reconstruction modules, forming a closed loop that achieves accurate signal reconstruction through continuous iteration. Further, a teleoperation platform was designed in which the trained haptic reconstruction model is deployed in the codec, so that the operational efficiency of the model can be verified in practice. Finally, the reliability of the cross-modal signal reconstruction architecture and the accuracy of the haptic reconstruction model were verified experimentally.

Results: The VisTouch dataset covers three modalities (audio, video and haptics) and contains 47 common slice-of-life samples. The video-assisted haptic reconstruction model achieved a mean absolute error of 0.0135 and an accuracy of 0.78 on VisTouch. To bring the proposed framework into a practical application scenario, a teleoperation platform for an industrial scenario was further built using a robot and an NVIDIA development board. Running on this platform, the model achieved an actual mean absolute error of 0.0126, a total end-to-end delay of 127 ms and a reconstruction-model delay of 98 ms. A questionnaire was also used to assess user satisfaction: the mean haptic-realism satisfaction score was 4.43 with a variance of 0.72, and the mean delay satisfaction score was 3.87 with a variance of 1.07.

Conclusions: The dataset results demonstrate the practicality of the VisTouch dataset and the accuracy of the video-assisted haptic reconstruction model, while the tests on the teleoperation platform indicate that users consider the haptic signals generated by the model to be close to the real signals but are only moderately satisfied with the running time of the algorithm, i.e., the complexity of the model needs further optimization.
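The record's subject keywords (3D CNN, GAN) hint at how the three modules fit together. The following PyTorch sketch is purely illustrative and not the authors' implementation: the layer sizes, the 256-dimensional semantic space, the 200-sample haptic output, and the module names are assumptions chosen to mirror the described pipeline (3D-CNN video encoder, shared semantic vector, haptic decoder, GAN-style evaluation module).

```python
# Hypothetical sketch of the described architecture; all dimensions are assumed.
import torch
import torch.nn as nn

class VideoFeatureExtractor(nn.Module):
    """Maps a short video clip (B, C, T, H, W) to a semantic feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),               # collapse T, H, W
        )
        self.fc = nn.Linear(64, feat_dim)          # projection into the common semantic space

    def forward(self, clip):
        h = self.conv(clip).flatten(1)
        return self.fc(h)

class HapticReconstructor(nn.Module):
    """Inverse-transforms a semantic vector into a haptic (e.g., force) sequence."""
    def __init__(self, feat_dim=256, signal_len=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, signal_len),
        )

    def forward(self, z):
        return self.net(z)

class Evaluator(nn.Module):
    """GAN-style evaluation module: scores whether a haptic sequence looks real."""
    def __init__(self, signal_len=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(signal_len, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)

# Shape check with dummy data (batch of 2 clips, 8 frames of 64x64 RGB).
if __name__ == "__main__":
    clips = torch.randn(2, 3, 8, 64, 64)
    z = VideoFeatureExtractor()(clips)
    haptic = HapticReconstructor()(z)
    score = Evaluator()(haptic)
    print(z.shape, haptic.shape, score.shape)   # (2, 256) (2, 200) (2, 1)
```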
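Likewise, the closed-loop training described in the Methods (the evaluation module feeding optimization information back to the other two modules) could be organized as in the sketch below. This is a hedged illustration assuming a GAN-style adversarial objective plus an L1 spatio-temporal term; the actual losses, weights and optimizers are not given in the abstract.

```python
# One illustrative closed-loop training iteration, reusing the modules from the sketch above.
# The binary-cross-entropy adversarial objective and the weight w_adv are assumptions.
import torch
import torch.nn.functional as F

def train_step(extractor, reconstructor, evaluator,
               opt_gen, opt_eval, clips, real_haptic, w_adv=0.1):
    # --- update the evaluation module (discriminator-like role) ---
    with torch.no_grad():
        fake_haptic = reconstructor(extractor(clips))
    real_logit = evaluator(real_haptic)
    fake_logit = evaluator(fake_haptic)
    loss_eval = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) +
                 F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    opt_eval.zero_grad()
    loss_eval.backward()
    opt_eval.step()

    # --- update extractor + reconstructor using the evaluator's feedback ---
    fake_haptic = reconstructor(extractor(clips))
    loss_rec = F.l1_loss(fake_haptic, real_haptic)   # spatio-temporal fidelity; same form as the reported MAE
    adv_logit = evaluator(fake_haptic)
    loss_adv = F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
    loss_gen = loss_rec + w_adv * loss_adv
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()

    return loss_rec.item(), loss_eval.item()
```

In this sketch, opt_gen would optimize the joint parameters of the feature extractor and the reconstructor, so the evaluator's feedback reaches both modules, matching the closed loop described in the Methods; loss_rec is the same mean-absolute-error quantity reported in the Results (0.0135 on VisTouch, 0.0126 on the teleoperation platform).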