6G-oriented cross-modal signal reconstruction technology

Bibliographic Details
Main Authors: Ang LI, Jianxin CHEN, Xin WEI, Liang ZHOU
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications, 2022-06-01
Series: Tongxin xuebao
ISSN: 1000-436X
Collection: DOAJ
Subjects: 6G; cross-modal signal reconstruction; multi-modal dataset; 3D CNN; GAN
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022093/
Full description

Objectives: Multimodal services that combine audio, video and haptics, such as mixed reality, digital twin and the metaverse, are expected to become killer applications in the 6G era. However, the large volume of multimodal data generated by such services is likely to burden the signal processing, transmission and storage of existing communication systems. A cross-modal signal reconstruction scheme is therefore urgently needed to reduce the amount of transmitted data, so that 6G immersive multimodal services can meet users' immersive-experience requirements while guaranteeing low-latency, high-reliability and high-capacity communication.

Methods: Firstly, by controlling a robot to touch various materials, a dataset containing audio, visual and touch signals, VisTouch, was constructed to lay the foundation for subsequent research on cross-modal problems. Secondly, by exploiting the semantic correlation between multimodal signals, a universal and robust end-to-end cross-modal signal reconstruction architecture was designed, comprising three parts: a feature extraction module, a reconstruction module and an evaluation module. The feature extraction module maps the source-modality signal to a semantic feature vector in a common semantic space, and the reconstruction module inverse-transforms this vector into the target-modality signal. The evaluation module assesses reconstruction quality along semantic and spatio-temporal dimensions and, during training, feeds optimization information back to the feature extraction and reconstruction modules, forming a closed loop that achieves accurate signal reconstruction through continuous iteration. Further, a teleoperation platform was designed in which the trained haptic reconstruction model is deployed in the codec, so that the operational efficiency of the model can be verified in practice. Finally, the reliability of the cross-modal signal reconstruction architecture and the accuracy of the haptic reconstruction model were verified experimentally.

Results: The VisTouch dataset covers three modalities (audio, video and haptics) and contains 47 common slice-of-life samples. The video-assisted haptic reconstruction model achieved a mean absolute error of 0.0135 and an accuracy of 0.78 on VisTouch. To bring the proposed framework into a practical application scenario, a teleoperation platform for an industrial scenario was further built using a robot and an NVIDIA development board. Running on this platform, the model achieved an actual mean absolute error of 0.0126, a total end-to-end delay of 127 ms and a reconstruction-model delay of 98 ms. A questionnaire was also used to assess user satisfaction: the mean haptic-realism satisfaction score was 4.43 with a variance of 0.72, and the mean delay satisfaction score was 3.87 with a variance of 1.07.

Conclusions: The dataset results demonstrate the practicality of the VisTouch dataset and the accuracy of the video-assisted haptic reconstruction model, while the tests on the teleoperation platform indicate that users consider the haptic signals generated by the model to be close to the real signals but are only moderately satisfied with the running time of the algorithm, i.e., the complexity of the model needs further optimization.
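The record's subject keywords (3D CNN, GAN) hint at how the three modules fit together. The following PyTorch sketch is purely illustrative and not the authors' implementation: the layer sizes, the 256-dimensional semantic space, the 200-sample haptic output, and the module names are assumptions chosen to mirror the described pipeline (3D-CNN video encoder, shared semantic vector, haptic decoder, GAN-style evaluation module).

```python
# Hypothetical sketch of the described architecture; all dimensions are assumed.
import torch
import torch.nn as nn

class VideoFeatureExtractor(nn.Module):
    """Maps a short video clip (B, C, T, H, W) to a semantic feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),               # collapse T, H, W
        )
        self.fc = nn.Linear(64, feat_dim)          # projection into the common semantic space

    def forward(self, clip):
        h = self.conv(clip).flatten(1)
        return self.fc(h)

class HapticReconstructor(nn.Module):
    """Inverse-transforms a semantic vector into a haptic (e.g., force) sequence."""
    def __init__(self, feat_dim=256, signal_len=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, signal_len),
        )

    def forward(self, z):
        return self.net(z)

class Evaluator(nn.Module):
    """GAN-style evaluation module: scores whether a haptic sequence looks real."""
    def __init__(self, signal_len=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(signal_len, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)

# Shape check with dummy data (batch of 2 clips, 8 frames of 64x64 RGB).
if __name__ == "__main__":
    clips = torch.randn(2, 3, 8, 64, 64)
    z = VideoFeatureExtractor()(clips)
    haptic = HapticReconstructor()(z)
    score = Evaluator()(haptic)
    print(z.shape, haptic.shape, score.shape)   # (2, 256) (2, 200) (2, 1)
```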
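Likewise, the closed-loop training described in the Methods (the evaluation module feeding optimization information back to the other two modules) could be organized as in the sketch below. This is a hedged illustration assuming a GAN-style adversarial objective plus an L1 spatio-temporal term; the actual losses, weights and optimizers are not given in the abstract.

```python
# One illustrative closed-loop training iteration, reusing the modules from the sketch above.
# The binary-cross-entropy adversarial objective and the weight w_adv are assumptions.
import torch
import torch.nn.functional as F

def train_step(extractor, reconstructor, evaluator,
               opt_gen, opt_eval, clips, real_haptic, w_adv=0.1):
    # --- update the evaluation module (discriminator-like role) ---
    with torch.no_grad():
        fake_haptic = reconstructor(extractor(clips))
    real_logit = evaluator(real_haptic)
    fake_logit = evaluator(fake_haptic)
    loss_eval = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) +
                 F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    opt_eval.zero_grad()
    loss_eval.backward()
    opt_eval.step()

    # --- update extractor + reconstructor using the evaluator's feedback ---
    fake_haptic = reconstructor(extractor(clips))
    loss_rec = F.l1_loss(fake_haptic, real_haptic)   # spatio-temporal fidelity; same form as the reported MAE
    adv_logit = evaluator(fake_haptic)
    loss_adv = F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
    loss_gen = loss_rec + w_adv * loss_adv
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()

    return loss_rec.item(), loss_eval.item()
```

In this sketch, opt_gen would optimize the joint parameters of the feature extractor and the reconstructor, so the evaluator's feedback reaches both modules, matching the closed loop described in the Methods; loss_rec is the same mean-absolute-error quantity reported in the Results (0.0135 on VisTouch, 0.0126 on the teleoperation platform).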