MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition

Sign language is the predominant mode of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, because of the general public’s limited awareness of sign language, communicating with them using BdSL can be challenging.


Bibliographic Details
Main Authors: Khan Abrar Shams, Md. Rafid Reaz, Mohammad Ryan Ur Rafi, Sanjida Islam, Md. Shahriar Rahman, Rafeed Rahman, Md. Tanzim Reza, Mohammad Zavid Parvez, Subrata Chakraborty, Biswajeet Pradhan, Abdullah Alamri
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Bangla sign language (BdSL); convolutional neural network; ensemble method
Online Access: https://ieeexplore.ieee.org/document/10550916/
_version_ 1849430707912310784
author Khan Abrar Shams
Md. Rafid Reaz
Mohammad Ryan Ur Rafi
Sanjida Islam
Md. Shahriar Rahman
Rafeed Rahman
Md. Tanzim Reza
Mohammad Zavid Parvez
Subrata Chakraborty
Biswajeet Pradhan
Abdullah Alamri
author_facet Khan Abrar Shams
Md. Rafid Reaz
Mohammad Ryan Ur Rafi
Sanjida Islam
Md. Shahriar Rahman
Rafeed Rahman
Md. Tanzim Reza
Mohammad Zavid Parvez
Subrata Chakraborty
Biswajeet Pradhan
Abdullah Alamri
author_sort Khan Abrar Shams
collection DOAJ
description Sign language is the predominant mode of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, because of the general public’s limited awareness of sign language, communicating with them using BdSL can be challenging. Consequently, there is a growing demand for an automated system capable of efficiently understanding BdSL. For automation, various Deep Learning (DL) architectures can be employed to translate Bangla Sign Language into readable digital text. The automation system incorporates live cameras that continuously capture images, which a DL model then processes. However, factors such as lighting, background noise, skin tone, hand orientation, and other imaging conditions can introduce uncertainty. To address this, we propose a procedure that reduces these uncertainties by considering three modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models. The CNN models are combined using nine distinct ensemble meta-learning algorithms, five of which are modifications of averaging and voting techniques. In the result analysis, our individual CNN models achieved training accuracies of 99.77%, 98.11%, and 99.30%, respectively, higher than most state-of-the-art image classification architectures except ResNet50, which achieved 99.87%. Meanwhile, the ensemble model attained the highest accuracy of 95.13% on the testing set, outperforming all individual CNN models. This analysis demonstrates that considering multiple modalities can significantly improve the system’s overall performance in hand pattern recognition.
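The description above outlines the pipeline only at a high level: three pre-processing modalities (spatial, skeleton-aware, edge-aware), one CNN per modality, and ensemble meta-learners built around averaging and voting. This record does not include the authors' code, so the sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation: it assumes OpenCV Canny edges for the edge view, MediaPipe Hands for the skeleton view, small placeholder Keras CNNs for the three models, and soft-averaging plus majority-voting fusion. The image size, class count, and all function names are hypothetical.

# Minimal, assumption-laden sketch of the multimodal idea described above:
# three views of the same frame (spatial RGB, hand-skeleton rendering, edge map),
# one small CNN per view, fused by probability averaging and majority voting.
# Library and parameter choices are illustrative only, not from the paper.
import numpy as np
import cv2
import mediapipe as mp
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 128      # assumed input resolution
NUM_CLASSES = 38    # assumed number of BdSL sign classes

_HANDS = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def spatial_modality(bgr_img):
    # Plain resized RGB frame: the "spatial information" view.
    img = cv2.resize(bgr_img, (IMG_SIZE, IMG_SIZE))
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

def edge_modality(bgr_img):
    # Canny edge map as a stand-in for the "edge awareness" view.
    gray = cv2.cvtColor(cv2.resize(bgr_img, (IMG_SIZE, IMG_SIZE)), cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return np.repeat(edges[..., None], 3, axis=-1).astype(np.float32) / 255.0

def skeleton_modality(bgr_img):
    # Hand landmarks drawn on a blank canvas: the "skeleton awareness" view.
    canvas = np.zeros((IMG_SIZE, IMG_SIZE, 3), np.uint8)
    result = _HANDS.process(cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        for lm in result.multi_hand_landmarks[0].landmark:
            x, y = int(lm.x * IMG_SIZE), int(lm.y * IMG_SIZE)
            cv2.circle(canvas, (x, y), 2, (255, 255, 255), -1)
    return canvas.astype(np.float32) / 255.0

def small_cnn():
    # Tiny placeholder network; the paper's actual architectures are not given in this record.
    return keras.Sequential([
        layers.Input((IMG_SIZE, IMG_SIZE, 3)),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def ensemble_predict(models, modal_batches):
    # Soft fusion (average the class probabilities) and hard fusion (majority vote).
    probs = [m.predict(x, verbose=0) for m, x in zip(models, modal_batches)]
    soft_labels = np.mean(probs, axis=0).argmax(axis=1)
    votes = np.stack([p.argmax(axis=1) for p in probs], axis=1)
    hard_labels = np.array([np.bincount(v, minlength=NUM_CLASSES).argmax() for v in votes])
    return soft_labels, hard_labels

A stacking-style meta-learner, by contrast, would train a separate classifier on the concatenated per-model probabilities rather than averaging or voting over them.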
format Article
id doaj-art-f4e37d2fac354083b4ff0361e1e6b59a
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-f4e37d2fac354083b4ff0361e1e6b59a 2025-08-20T03:27:52Z
eng | IEEE | IEEE Access | ISSN 2169-3536 | 2024-01-01 | vol. 12, pp. 83638-83657 | doi:10.1109/ACCESS.2024.3410837 | IEEE Xplore document 10550916
MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
Khan Abrar Shams; Md. Rafid Reaz; Mohammad Ryan Ur Rafi (https://orcid.org/0009-0009-5178-3985); Sanjida Islam; Md. Shahriar Rahman; Rafeed Rahman; Md. Tanzim Reza; Mohammad Zavid Parvez (https://orcid.org/0000-0002-1895-8474); Subrata Chakraborty (https://orcid.org/0000-0002-0102-5424); Biswajeet Pradhan (https://orcid.org/0000-0001-9863-2054); Abdullah Alamri
Affiliations (in author order): Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka, Bangladesh (first seven authors); School of Computing, Mathematics and Engineering, Charles Sturt University, Bathurst, NSW, Australia; School of Science and Technology, University of New England, Armidale, NSW, Australia; Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia; Department of Geology and Geophysics, College of Science, King Saud University, Riyadh, Saudi Arabia
Online Access: https://ieeexplore.ieee.org/document/10550916/
Subjects: Bangla sign language (BdSL); convolutional neural network; ensemble method
spellingShingle Khan Abrar Shams
Md. Rafid Reaz
Mohammad Ryan Ur Rafi
Sanjida Islam
Md. Shahriar Rahman
Rafeed Rahman
Md. Tanzim Reza
Mohammad Zavid Parvez
Subrata Chakraborty
Biswajeet Pradhan
Abdullah Alamri
MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
IEEE Access
Bangla sign language (BdSL)
convolutional neural network
ensemble method
title MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
title_full MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
title_fullStr MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
title_full_unstemmed MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
title_short MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
title_sort multimodal ensemble approach leveraging spatial skeletal and edge features for enhanced bangla sign language recognition
topic Bangla sign language (BdSL)
convolutional neural network
ensemble method
url https://ieeexplore.ieee.org/document/10550916/
work_keys_str_mv AT khanabrarshams multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT mdrafidreaz multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT mohammadryanurrafi multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT sanjidaislam multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT mdshahriarrahman multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT rafeedrahman multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT mdtanzimreza multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT mohammadzavidparvez multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT subratachakraborty multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT biswajeetpradhan multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition
AT abdullahalamri multimodalensembleapproachleveragingspatialskeletalandedgefeaturesforenhancedbanglasignlanguagerecognition