A monitoring network SIMNet for weld penetration status based on multimodal fusion

Bibliographic Details
Main Authors: Qi Jiang, Yiming Wang, Yan Kong, Yu Liu, Ce Ma
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-06324-y
Collection: DOAJ
Abstract: This paper addresses two difficulties in groove welding: the fusion width at the bottom of the weld cannot be measured directly, and the penetration state is hard to monitor in real time. It investigates online penetration state monitoring that uses multimodal signals, such as sound and images, captured during welding. The proposed multimodal network, SIMNet, first applies the short-time Fourier transform (STFT) to convert the raw sound signal into the time–frequency domain for preliminary feature extraction. Second, a visual feature extractor based on an attention mechanism extracts image features. A cosine similarity loss function then aligns the features of the two modalities in the semantic space before fusion. Finally, a cross-attention mechanism performs the interaction and fusion of the features. The experimental results show that SIMNet achieves the best performance among the mainstream algorithms compared, with a mean squared error (MSE) of 0.1141 mm. Inference with multimodal input reaches 60 frames per second (FPS), enabling quantitative, real-time penetration state monitoring based on multimodal fusion.
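The three signal-level steps named in the abstract (STFT of the sound signal, cosine similarity alignment, cross-attention fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the frame length and hop size, the feature dimensions, and the random stand-in features are all assumptions made for illustration, and the learned feature extractors and projections of the actual network are omitted.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.
    frame_len and hop are illustrative values, not taken from the paper."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

def cosine_alignment_loss(a, b, eps=1e-8):
    """1 minus the mean cosine similarity between paired rows of a and b.
    Assumes both modalities were already projected to the same dimension."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return 1.0 - np.mean(np.sum(a_n * b_n, axis=1))

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections:
    queries come from one modality, keys and values from the other."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ values, weights

# Toy inputs (synthetic, for illustration only).
rate = 16000
sound = np.sin(2 * np.pi * 1000 * np.arange(4096) / rate)  # 1 kHz tone
spec = stft_magnitude(sound)                               # shape (31, 129)

rng = np.random.default_rng(0)
sound_feat = rng.normal(size=(31, 64))  # stand-in for projected sound features
image_feat = rng.normal(size=(49, 64))  # stand-in for image patch features

# Alignment term on pooled (mean) features of each modality.
align = cosine_alignment_loss(sound_feat.mean(axis=0, keepdims=True),
                              image_feat.mean(axis=0, keepdims=True))

# Sound tokens attend over image tokens; fused has shape (31, 64).
fused, weights = cross_attention(sound_feat, image_feat, image_feat)
```

In a trained model the alignment term would be added to the regression loss so that the two feature spaces are pulled together before the cross-attention step; here it is only computed on random features to show the shapes involved.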
ISSN: 2045-2322
Author Affiliations:
Qi Jiang: Nanjing University of Posts and Telecommunications
Yiming Wang: Luoyang Institute of Science and Technology
Yan Kong: Nanjing University of Aeronautics and Astronautics
Yu Liu: Luoyang Institute of Science and Technology
Ce Ma: Luoyang Institute of Science and Technology
Subjects: Sound signal; Molten pool image; Penetration state; CNN; Attention mechanism