Dense dynamic convolutional network for Bel canto vocal technique assessment

Abstract: The Bel Canto performance is a complex and multidimensional art form encompassing pitch, timbre, technique, and affective expression. To accurately reflect a performer’s singing proficiency, it is essential to quantify and evaluate their vocal technical execution precisely. Convolutional Ne...

Bibliographic Details
Main Authors:	Zhenyi Hou, Xu Zhao, Shanggerile Jiang, Daijun Luo, Xinyu Sheng, Kaili Geng, Kejie Ye, Jiajing Xia, Yitao Zhang, Chenxi Ban, Jiaxing Chen, Yan Zou, Yuchao Feng, Xin Yuan, Guangyu Fan
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-05-01
Series:	Scientific Reports
Subjects:	Vocal education; Vocal technique assessment; Deep learning
Online Access:	https://doi.org/10.1038/s41598-025-98726-1

author	Zhenyi Hou, Xu Zhao, Shanggerile Jiang, Daijun Luo, Xinyu Sheng, Kaili Geng, Kejie Ye, Jiajing Xia, Yitao Zhang, Chenxi Ban, Jiaxing Chen, Yan Zou, Yuchao Feng, Xin Yuan, Guangyu Fan
author_sort	Zhenyi Hou
collection	DOAJ
description	Abstract: The Bel Canto performance is a complex and multidimensional art form encompassing pitch, timbre, technique, and affective expression. To accurately reflect a performer’s singing proficiency, it is essential to quantify and evaluate their vocal technical execution precisely. Convolutional Neural Networks (CNNs), renowned for their robust ability to capture spatial hierarchical information, have been widely adopted in various tasks, including audio pattern recognition. However, existing CNNs exhibit limitations in extracting intricate spectral features, particularly in Bel Canto performance. To address the challenges posed by complex spectral features and meet the demands for objective vocal technique assessment, we introduce Omni-Dimensional Dynamic Convolution (ODConv). Additionally, we employ densely connected layers to optimize the framework, enabling efficient utilization of multi-scale features across multiple dynamic convolution layers. To validate the effectiveness of our method, we conducted experiments on tasks including vocal technique assessment, music classification, acoustic scene classification, and sound event detection. The experimental results demonstrate that our Dense Dynamic Convolutional Network (DDNet) outperforms traditional CNN and Transformer models, achieving 90.11%, 73.95%, and 89.31% (Top-1 Accuracy), and 41.89% (mAP), respectively. Our research not only significantly improves the accuracy and efficiency of Bel Canto vocal technique assessment but also facilitates applications in vocal teaching and remote education.
format	Article
id	doaj-art-dca6648945864997a182bbd8c9d95191
institution	DOAJ
issn	2045-2322
language	English
publishDate	2025-05-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-dca6648945864997a182bbd8c9d95191; 2025-08-20T03:09:35Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 15; 1; 1; 13; 10.1038/s41598-025-98726-1; Dense dynamic convolutional network for Bel canto vocal technique assessment; Zhenyi Hou (University of Shanghai for Science and Technology); Xu Zhao (University of Shanghai for Science and Technology); Shanggerile Jiang (University of Shanghai for Science and Technology); Daijun Luo (University of Shanghai for Science and Technology); Xinyu Sheng (University of Shanghai for Science and Technology); Kaili Geng (University of Shanghai for Science and Technology); Kejie Ye (University of Shanghai for Science and Technology); Jiajing Xia (University of Shanghai for Science and Technology); Yitao Zhang (University of Shanghai for Science and Technology); Chenxi Ban (University of Shanghai for Science and Technology); Jiaxing Chen (University of Shanghai for Science and Technology); Yan Zou (Shanghai Conservatory of Music); Yuchao Feng (Westlake University); Xin Yuan (Westlake University); Guangyu Fan (University of Shanghai for Science and Technology); https://doi.org/10.1038/s41598-025-98726-1; Vocal education; Vocal technique assessment; Deep learning
title	Dense dynamic convolutional network for Bel canto vocal technique assessment
topic	Vocal education; Vocal technique assessment; Deep learning
url	https://doi.org/10.1038/s41598-025-98726-1
