Dense dynamic convolutional network for Bel canto vocal technique assessment
Abstract The Bel Canto performance is a complex and multidimensional art form encompassing pitch, timbre, technique, and affective expression. To accurately reflect a performer’s singing proficiency, it is essential to quantify and evaluate their vocal technical execution precisely. Convolutional Ne...
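| Main Authors: | Zhenyi Hou, Xu Zhao, Shanggerile Jiang, Daijun Luo, Xinyu Sheng, Kaili Geng, Kejie Ye, Jiajing Xia, Yitao Zhang, Chenxi Ban, Jiaxing Chen, Yan Zou, Yuchao Feng, Xin Yuan, Guangyu Fan |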
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Subjects: | Vocal education; Vocal technique assessment; Deep learning |
| Online Access: | https://doi.org/10.1038/s41598-025-98726-1 |
| _version_ | 1849728260926078976 |
|---|---|
| author | Zhenyi Hou; Xu Zhao; Shanggerile Jiang; Daijun Luo; Xinyu Sheng; Kaili Geng; Kejie Ye; Jiajing Xia; Yitao Zhang; Chenxi Ban; Jiaxing Chen; Yan Zou; Yuchao Feng; Xin Yuan; Guangyu Fan |
| author_facet | Zhenyi Hou; Xu Zhao; Shanggerile Jiang; Daijun Luo; Xinyu Sheng; Kaili Geng; Kejie Ye; Jiajing Xia; Yitao Zhang; Chenxi Ban; Jiaxing Chen; Yan Zou; Yuchao Feng; Xin Yuan; Guangyu Fan |
| author_sort | Zhenyi Hou |
| collection | DOAJ |
| description | Abstract The Bel Canto performance is a complex and multidimensional art form encompassing pitch, timbre, technique, and affective expression. To accurately reflect a performer’s singing proficiency, it is essential to quantify and evaluate their vocal technical execution precisely. Convolutional Neural Networks (CNNs), renowned for their robust ability to capture spatial hierarchical information, have been widely adopted in various tasks, including audio pattern recognition. However, existing CNNs exhibit limitations in extracting intricate spectral features, particularly in Bel Canto performance. To address the challenges posed by complex spectral features and meet the demands for objective vocal technique assessment, we introduce Omni-Dimensional Dynamic Convolution (ODConv). Additionally, we employ densely connected layers to optimize the framework, enabling efficient utilization of multi-scale features across multiple dynamic convolution layers. To validate the effectiveness of our method, we conducted experiments on tasks including vocal technique assessment, music classification, acoustic scene classification, and sound event detection. The experimental results demonstrate that our Dense Dynamic Convolutional Network (DDNet) outperforms traditional CNN and Transformer models, achieving 90.11%, 73.95%, and 89.31% (Top-1 Accuracy), and 41.89% (mAP), respectively. Our research not only significantly improves the accuracy and efficiency of Bel Canto vocal technique assessment but also facilitates applications in vocal teaching and remote education. |
| format | Article |
| id | doaj-art-dca6648945864997a182bbd8c9d95191 |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-dca6648945864997a182bbd8c9d95191; 2025-08-20T03:09:35Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 151113; 10.1038/s41598-025-98726-1; Dense dynamic convolutional network for Bel canto vocal technique assessment; Zhenyi Hou (0), Xu Zhao (1), Shanggerile Jiang (2), Daijun Luo (3), Xinyu Sheng (4), Kaili Geng (5), Kejie Ye (6), Jiajing Xia (7), Yitao Zhang (8), Chenxi Ban (9), Jiaxing Chen (10), Yan Zou (11), Yuchao Feng (12), Xin Yuan (13), Guangyu Fan (14); affiliations in index order: University of Shanghai for Science and Technology (0-10), Shanghai Conservatory of Music (11), Westlake University (12-13), University of Shanghai for Science and Technology (14); Abstract The Bel Canto performance is a complex and multidimensional art form encompassing pitch, timbre, technique, and affective expression. To accurately reflect a performer’s singing proficiency, it is essential to quantify and evaluate their vocal technical execution precisely. Convolutional Neural Networks (CNNs), renowned for their robust ability to capture spatial hierarchical information, have been widely adopted in various tasks, including audio pattern recognition. However, existing CNNs exhibit limitations in extracting intricate spectral features, particularly in Bel Canto performance. To address the challenges posed by complex spectral features and meet the demands for objective vocal technique assessment, we introduce Omni-Dimensional Dynamic Convolution (ODConv). Additionally, we employ densely connected layers to optimize the framework, enabling efficient utilization of multi-scale features across multiple dynamic convolution layers. To validate the effectiveness of our method, we conducted experiments on tasks including vocal technique assessment, music classification, acoustic scene classification, and sound event detection. The experimental results demonstrate that our Dense Dynamic Convolutional Network (DDNet) outperforms traditional CNN and Transformer models, achieving 90.11%, 73.95%, and 89.31% (Top-1 Accuracy), and 41.89% (mAP), respectively. Our research not only significantly improves the accuracy and efficiency of Bel Canto vocal technique assessment but also facilitates applications in vocal teaching and remote education.; https://doi.org/10.1038/s41598-025-98726-1; Vocal education; Vocal technique assessment; Deep learning |
| spellingShingle | Zhenyi Hou; Xu Zhao; Shanggerile Jiang; Daijun Luo; Xinyu Sheng; Kaili Geng; Kejie Ye; Jiajing Xia; Yitao Zhang; Chenxi Ban; Jiaxing Chen; Yan Zou; Yuchao Feng; Xin Yuan; Guangyu Fan; Dense dynamic convolutional network for Bel canto vocal technique assessment; Scientific Reports; Vocal education; Vocal technique assessment; Deep learning |
| title | Dense dynamic convolutional network for Bel canto vocal technique assessment |
| topic | Vocal education; Vocal technique assessment; Deep learning |
| url | https://doi.org/10.1038/s41598-025-98726-1 |