MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
| Format: | Article |
|---|---|
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/10550916/ |
| Field | Value |
|---|---|
| author | Khan Abrar Shams, Md. Rafid Reaz, Mohammad Ryan Ur Rafi, Sanjida Islam, Md. Shahriar Rahman, Rafeed Rahman, Md. Tanzim Reza, Mohammad Zavid Parvez, Subrata Chakraborty, Biswajeet Pradhan, Abdullah Alamri |
| collection | DOAJ |
| description | Sign language is the primary mode of communication for people with hearing impairments. In Bangladesh, Bangla Sign Language (BdSL) is widely used by the hearing-impaired population. However, because the general public has limited awareness of sign language, communicating with BdSL users can be challenging, which creates a growing demand for an automated system that can reliably interpret BdSL. Deep Learning (DL) architectures can be used to translate BdSL into readable digital text: live cameras continuously capture images, and a DL model then processes them. Factors such as lighting, background noise, skin tone, hand orientation, and other imaging conditions can introduce uncertainty into this process. To reduce these uncertainties, we propose a procedure that considers three modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models, which are combined using nine distinct ensemble meta-learning algorithms; five of these are modifications of averaging and voting techniques. Our individual CNN models achieved training accuracies of 99.77%, 98.11%, and 99.30%, respectively, higher than most other state-of-the-art image classification architectures except ResNet50, which achieved 99.87%. On the test set, the ensemble model attained the highest accuracy, 95.13%, outperforming every individual CNN model. This analysis shows that considering multiple modalities can significantly improve the system's overall performance in hand-pattern recognition. |
| institution | Kabale University |
| issn | 2169-3536 |
| doi | 10.1109/ACCESS.2024.3410837 |
| volume | 12 |
| pages | 83638-83657 |
| affiliations | Khan Abrar Shams, Md. Rafid Reaz, Mohammad Ryan Ur Rafi, Sanjida Islam, Md. Shahriar Rahman, Rafeed Rahman, Md. Tanzim Reza: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka, Bangladesh; Mohammad Zavid Parvez: School of Computing, Mathematics and Engineering, Charles Sturt University, Bathurst, NSW, Australia; Subrata Chakraborty: School of Science and Technology, University of New England, Armidale, NSW, Australia; Biswajeet Pradhan: Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia; Abdullah Alamri: Department of Geology and Geophysics, College of Science, King Saud University, Riyadh, Saudi Arabia |
| orcid | Mohammad Ryan Ur Rafi: https://orcid.org/0009-0009-5178-3985; Mohammad Zavid Parvez: https://orcid.org/0000-0002-1895-8474; Subrata Chakraborty: https://orcid.org/0000-0002-0102-5424; Biswajeet Pradhan: https://orcid.org/0000-0001-9863-2054 |
| topic | Bangla sign language (BdSL); convolutional neural network; ensemble method |
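The abstract states that the three CNN models are combined with ensemble meta-learning algorithms, five of which modify averaging and voting. As a rough illustration only, the Python sketch below shows the two plain baselines those modifications start from: soft voting (averaging each model's class probabilities) and hard voting (majority vote over each model's top class). It is not the authors' modified variants; the function names and the "spatial", "skeleton", and "edge" probability vectors are hypothetical.

```python
from collections import Counter
from typing import Sequence

# One model's class-probability vector for a single image
# (hypothetical outputs of spatial, skeleton and edge CNNs).
Probs = Sequence[float]


def soft_vote(model_probs: Sequence[Probs]) -> int:
    """Average the probability vectors of all models and return the arg-max class."""
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same set of classes")
    avg = [sum(p[c] for p in model_probs) / len(model_probs) for c in range(n_classes)]
    # max() returns the first maximal index, so ties go to the lowest class index
    return max(range(n_classes), key=lambda c: avg[c])


def hard_vote(model_probs: Sequence[Probs]) -> int:
    """Let each model vote for its own arg-max class; ties go to the lowest class index."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in model_probs]
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


if __name__ == "__main__":
    spatial, skeleton, edge = [0.9, 0.05, 0.05], [0.4, 0.6, 0.0], [0.3, 0.7, 0.0]
    print(soft_vote([spatial, skeleton, edge]))  # 0: one very confident model outweighs two moderate ones
    print(hard_vote([spatial, skeleton, edge]))  # 1: two of three models prefer class 1
```

The example is chosen so the two rules disagree: averaging rewards confidence, while majority voting only counts which class each model ranks first.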