OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition
This research work presents a unique dataset for offline handwritten Sindhi character recognition. It contains 7800 character images in total, divided into multiple categories and written by 150 writers of various ages, genders, and professional backgrounds. Each writer wrote the 52 Sindhi characters in the design...
| Main Authors: | Jakhro Abdul Naveed, Mudasar Ahmed Soomro, Leezna Saleem, Muhammad Khalid Shaikh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Sir Syed University of Engineering and Technology, Karachi. 2024-05-01 |
| Series: | Sir Syed University Research Journal of Engineering and Technology |
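| Subjects: | Benchmark Dataset, Handwritten Character Recognition, Pattern Recognition, Machine Learning, Sindhi Language |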
| Online Access: | http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/618 |
| _version_ | 1849433888613466112 |
|---|---|
| author | Jakhro Abdul Naveed Mudasar Ahmed Soomro Leezna Saleem Muhammad Khalid Shaikh |
| author_facet | Jakhro Abdul Naveed Mudasar Ahmed Soomro Leezna Saleem Muhammad Khalid Shaikh |
| author_sort | Jakhro Abdul Naveed |
| collection | DOAJ |
| description | This research work presents a unique dataset for offline handwritten Sindhi character recognition. It contains 7800 character images in total, divided into multiple categories and written by 150 writers of various ages, genders, and professional backgrounds. Each writer wrote the 52 Sindhi characters on a designed form (150 writers × 52 characters = 7800 images). All written samples were scanned with a high-quality scanner. The handwritten Sindhi characters were then cropped from the collected forms, and the cropped images were saved in ‘.png’ format. For the benefit of the Sindhi research community, this work proposes an image dataset for handwritten Sindhi character recognition. The dataset will be made publicly available. For the Sindhi language, this dataset can be used to build and test handwritten character recognition systems and to provide helpful insights through writer identification. The dataset has been divided into a training set and a test set, with 80% for training and 20% for testing. Different preprocessing techniques were used to remove noise from the scanned images to create a clean dataset. The authors describe the resulting dataset as the world's first openly accessible dataset for handwritten Sindhi research; it can be useful for writer identification systems and handwriting recognition systems. |
| format | Article |
| id | doaj-art-0d4d0a3c8e064d949a1ab9e3500a7ecc |
| institution | Kabale University |
| issn | 1997-0641 2415-2048 |
| language | English |
| publishDate | 2024-05-01 |
| publisher | Sir Syed University of Engineering and Technology, Karachi. |
| record_format | Article |
| series | Sir Syed University Research Journal of Engineering and Technology |
| spelling | doaj-art-0d4d0a3c8e064d949a1ab9e3500a7ecc; 2025-08-20T03:26:52Z; eng; Sir Syed University of Engineering and Technology, Karachi.; Sir Syed University Research Journal of Engineering and Technology; 1997-0641; 2415-2048; 2024-05-01; 14; 1; 10.33317/ssurj.618; OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition; Jakhro Abdul Naveed (0), Mudasar Ahmed Soomro (1), Leezna Saleem (2), Muhammad Khalid Shaikh (3); (0) Department of Information Technology, Shaheed Benazir Bhutto University, Naushahro Feroze Campus, Sindh, Pakistan; (1) Department of Information Technology, Shaheed Benazir Bhutto University, Naushahro Feroze Campus, Sindh, Pakistan; (2) College Education Department Karachi, Govt. of Sindh; (3) Department of Information Technology, University of Sindh, Khan Bhadur Syed Allahndo Shah, Naushahro Feroze Campus, Sindh, Pakistan; http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/618; Benchmark Dataset; Handwritten Character Recognition; Pattern Recognition; Machine Learning; Sindhi Language |
| spellingShingle | Jakhro Abdul Naveed Mudasar Ahmed Soomro Leezna Saleem Muhammad Khalid Shaikh OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition Sir Syed University Research Journal of Engineering and Technology Benchmark Dataset Handwritten Character Recognition Pattern Recognition Machine Learning Sindhi Language |
| title | OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition |
| title_full | OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition |
| title_fullStr | OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition |
| title_full_unstemmed | OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition |
| title_short | OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition |
| title_sort | ohscr benchmarks dataset for offline handwritten sindhi character recognition |
| topic | Benchmark Dataset Handwritten Character Recognition Pattern Recognition Machine Learning Sindhi Language |
| url | http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/618 |
| work_keys_str_mv | AT jakhroabdulnaveed ohscrbenchmarksdatasetforofflinehandwrittensindhicharacterrecognition AT mudasarahmedsoomro ohscrbenchmarksdatasetforofflinehandwrittensindhicharacterrecognition AT leeznasaleem ohscrbenchmarksdatasetforofflinehandwrittensindhicharacterrecognition AT muhammadkhalidshaikh ohscrbenchmarksdatasetforofflinehandwrittensindhicharacterrecognition |
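
The figures in the description (150 writers, 52 characters each, 7800 images, an 80%/20% train/test split) can be checked with a short sketch. The record does not say how the authors performed the split, so the per-class shuffle below is only an assumption, and the `writer_id`/`class_id` index and function names are hypothetical stand-ins for the actual image files.

```python
import random

NUM_WRITERS = 150     # from the record's description
NUM_CLASSES = 52      # Sindhi characters written by each writer
TRAIN_FRACTION = 0.8  # 80% training, 20% testing


def build_index(num_writers=NUM_WRITERS, num_classes=NUM_CLASSES):
    """Return one hypothetical (writer_id, class_id) pair per character image."""
    return [(w, c) for w in range(num_writers) for c in range(num_classes)]


def split_per_class(samples, train_fraction=TRAIN_FRACTION, seed=0):
    """Shuffle the samples of each class and cut them at train_fraction.

    Assumption: splitting per class keeps every character represented in
    both sets; the source does not state the procedure actually used.
    """
    rng = random.Random(seed)
    by_class = {}
    for writer_id, class_id in samples:
        by_class.setdefault(class_id, []).append((writer_id, class_id))

    train, test = [], []
    for class_id in sorted(by_class):
        items = by_class[class_id]
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


samples = build_index()
train_set, test_set = split_per_class(samples)
print(len(samples), len(train_set), len(test_set))  # 7800 6240 1560
```

Under these assumptions each character class contributes 120 training and 30 test images. If the dataset were to be used for writer identification, a split by writer rather than by class might be more appropriate.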