OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition

This research work presents a unique dataset for offline handwritten Sindhi character recognition. It contains 7800 character images in total, divided into multiple categories and written by 150 writers of various ages, genders, and professional backgrounds. Each writer writes the 52 Sindhi characters in the design...

Bibliographic Details
Main Authors: Jakhro Abdul Naveed, Mudasar Ahmed Soomro, Leezna Saleem, Muhammad Khalid Shaikh
Format: Article
Language: English
Published: Sir Syed University of Engineering and Technology, Karachi. 2024-05-01
Series: Sir Syed University Research Journal of Engineering and Technology
Subjects: Benchmark Dataset; Handwritten Character Recognition; Pattern Recognition; Machine Learning; Sindhi Language
Online Access: http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/618
author Jakhro Abdul Naveed
Mudasar Ahmed Soomro
Leezna Saleem
Muhammad Khalid Shaikh
collection DOAJ
description This research work presents a unique dataset for offline handwritten Sindhi character recognition. It contains 7800 character images in total, divided into multiple categories and written by 150 writers of various ages, genders, and professional backgrounds. Each writer wrote the 52 Sindhi characters on a designed form. All of the written samples were scanned with a high-quality scanner. The handwritten Sindhi characters were then cropped from the collected forms, and the cropped images were saved in ‘.png’ format. For the benefit of the Sindhi research community, this work proposes an image dataset for handwritten Sindhi character recognition. The dataset will be made publicly available. For the Sindhi language, this dataset can be used to build and test handwritten character recognition systems and to provide helpful insights through writer identification. The dataset has been divided into a training set and a test set, with 80% for training and 20% for testing. Different preprocessing techniques were used to remove noise from the scanned images to create a clean dataset. The dataset created as a result of this research is presented as the first openly accessible dataset for handwritten Sindhi character research, and it can be useful for writer identification systems and handwriting recognition systems.
format Article
id doaj-art-0d4d0a3c8e064d949a1ab9e3500a7ecc
institution Kabale University
issn 1997-0641
2415-2048
language English
publishDate 2024-05-01
publisher Sir Syed University of Engineering and Technology, Karachi.
record_format Article
series Sir Syed University Research Journal of Engineering and Technology
spelling doaj-art-0d4d0a3c8e064d949a1ab9e3500a7ecc (record timestamp 2025-08-20T03:26:52Z)
Volume 14, Issue 1
DOI: 10.33317/ssurj.618
Author affiliations, in the order given in the record:
Department of Information Technology, Shaheed Benazir Bhutto University, Naushahro Feroze Campus, Sindh, Pakistan
Department of Information Technology, Shaheed Benazir Bhutto University, Naushahro Feroze Campus, Sindh, Pakistan
College Education Department Karachi, Govt. of Sindh
Department of Information Technology, University of Sindh, Khan Bhadur Syed Allahndo Shah, Naushahro Feroze Campus, Sindh, Pakistan
title OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition
topic Benchmark Dataset
Handwritten Character Recognition
Pattern Recognition
Machine Learning
Sindhi Language
url http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/618
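
The description in this record gives three figures that can be checked against each other: 150 writers, 52 characters per writer, and 7800 images, with an 80%/20% train/test split. The following is a minimal Python sketch of that arithmetic, not the authors' procedure. The record does not say how the split was drawn, so the per-class (stratified) split, the seed, and the `(writer_id, class_id)` identifiers are all assumptions made for illustration.

```python
import random
from collections import defaultdict

NUM_WRITERS = 150      # from the record's description
NUM_CLASSES = 52       # Sindhi characters written by each writer
TRAIN_FRACTION = 0.8   # 80% training, 20% testing, per the description


def build_index(num_writers=NUM_WRITERS, num_classes=NUM_CLASSES):
    """Return hypothetical (writer_id, class_id) pairs, one per image."""
    return [(w, c) for w in range(num_writers) for c in range(num_classes)]


def stratified_split(samples, train_fraction=TRAIN_FRACTION, seed=0):
    """Split so every character class keeps the same train/test ratio.

    Assumption: the record does not state whether the split was made
    per class, per writer, or at random over all images.
    """
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for class_id in sorted(by_class):
        items = by_class[class_id]
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


samples = build_index()
train, test = stratified_split(samples)
print(len(samples), len(train), len(test))  # 7800 6240 1560
```

Under these assumptions each of the 52 classes contributes 120 training and 30 test images. A split by writer (120 writers for training, 30 for testing) would give the same totals and may be preferable when writer-independent recognition is the goal.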