OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Sir Syed University of Engineering and Technology, Karachi. 2024-05-01 |
| Series: | Sir Syed University Research Journal of Engineering and Technology |
| Subjects: | |
| Online Access: | http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/618 |
| Summary: | This work presents a dataset for offline handwritten Sindhi character recognition. It contains 7,800 character images in total, written by 150 writers of various ages, genders, and professional backgrounds, and divided into multiple categories. Each writer wrote the 52 Sindhi characters on a designed form. All written samples were scanned with a high-quality scanner; the handwritten characters were then cropped from the collected forms and saved in '.png' format. Several preprocessing techniques were used to remove noise from the scanned images and produce a clean dataset. The dataset is divided into a training set (80%) and a test set (20%) and will be made publicly available for the benefit of the Sindhi research community. It can be used to build and test handwritten character recognition systems for the Sindhi language and to support writer identification. The authors describe it as the world's first openly accessible dataset for handwritten Sindhi research. |
|---|---|
| ISSN: | 1997-0641 2415-2048 |
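
The figures in the summary are consistent with each other: 150 writers times 52 characters gives 7,800 images, and an 80/20 split of that total gives 6,240 training and 1,560 test images. The sketch below illustrates one way such a split could be made. The record does not say how the authors partitioned the data, so the per-character (stratified) split, the function name `split_by_class`, and the file naming pattern are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_class(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs into train/test lists, class by class.

    Assumption: the split is stratified per character class. The catalog
    record only states the overall 80/20 ratio, not the procedure.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_fraction)
        train.extend((p, label) for p in paths[:cut])
        test.extend((p, label) for p in paths[cut:])
    return train, test


# Hypothetical file names: one '.png' per writer and character.
samples = [
    (f"writer{w:03d}_char{c:02d}.png", c)
    for w in range(150)   # 150 writers
    for c in range(52)    # 52 Sindhi characters
]

train, test = split_by_class(samples)
print(len(samples), len(train), len(test))  # prints: 7800 6240 1560
```

For writer identification experiments, splitting by writer instead of by character may be more appropriate, since a per-character split places samples from every writer in both sets.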