Dense dynamic convolutional network for Bel canto vocal technique assessment

Bibliographic Details
Main Authors: Zhenyi Hou, Xu Zhao, Shanggerile Jiang, Daijun Luo, Xinyu Sheng, Kaili Geng, Kejie Ye, Jiajing Xia, Yitao Zhang, Chenxi Ban, Jiaxing Chen, Yan Zou, Yuchao Feng, Xin Yuan, Guangyu Fan
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects: Vocal education; Vocal technique assessment; Deep learning
Online Access: https://doi.org/10.1038/s41598-025-98726-1
collection DOAJ
description Abstract: Bel canto performance is a complex, multidimensional art form encompassing pitch, timbre, technique, and affective expression. To reflect a performer’s singing proficiency accurately, their vocal technical execution must be quantified and evaluated precisely. Convolutional Neural Networks (CNNs), known for their ability to capture spatial hierarchical information, have been widely adopted in tasks including audio pattern recognition. However, existing CNNs are limited in extracting intricate spectral features, particularly in Bel canto performance. To address the challenges posed by complex spectral features and to meet the demand for objective vocal technique assessment, we introduce Omni-Dimensional Dynamic Convolution (ODConv). Additionally, we employ densely connected layers to optimize the framework, enabling efficient use of multi-scale features across multiple dynamic convolution layers. To validate the method, we conducted experiments on four tasks: vocal technique assessment, music classification, acoustic scene classification, and sound event detection. The results show that our Dense Dynamic Convolutional Network (DDNet) outperforms traditional CNN and Transformer models, achieving Top-1 accuracies of 90.11%, 73.95%, and 89.31% on the first three tasks and an mAP of 41.89% on sound event detection. Our research not only significantly improves the accuracy and efficiency of Bel canto vocal technique assessment but also facilitates applications in vocal teaching and remote education.
id doaj-art-dca6648945864997a182bbd8c9d95191
institution DOAJ
issn 2045-2322
spelling doaj-art-dca6648945864997a182bbd8c9d95191; 2025-08-20T03:09:35Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-05-01; volume 15, issue 1, pages 1-13; 10.1038/s41598-025-98726-1
Author affiliations (in author order):
Zhenyi Hou, University of Shanghai for Science and Technology
Xu Zhao, University of Shanghai for Science and Technology
Shanggerile Jiang, University of Shanghai for Science and Technology
Daijun Luo, University of Shanghai for Science and Technology
Xinyu Sheng, University of Shanghai for Science and Technology
Kaili Geng, University of Shanghai for Science and Technology
Kejie Ye, University of Shanghai for Science and Technology
Jiajing Xia, University of Shanghai for Science and Technology
Yitao Zhang, University of Shanghai for Science and Technology
Chenxi Ban, University of Shanghai for Science and Technology
Jiaxing Chen, University of Shanghai for Science and Technology
Yan Zou, Shanghai Conservatory of Music
Yuchao Feng, Westlake University
Xin Yuan, Westlake University
Guangyu Fan, University of Shanghai for Science and Technology
topic Vocal education
Vocal technique assessment
Deep learning