Printed Persian Subword Recognition Using Wavelet Packet Descriptors

This paper presents a new approach to offline optical character recognition (OCR) for printed Persian subwords using the wavelet packet transform. The proposed algorithm extracts font-invariant and size-invariant features from 87804 subwords in 4 fonts and 3 sizes. The feature vectors a...


Bibliographic Details
Main Authors: Samira Nasrollahi, Afshin Ebrahimi
Format: Article
Language: English
Published: Wiley 2013-01-01
Series: Journal of Engineering
Online Access: http://dx.doi.org/10.1155/2013/465469
author Samira Nasrollahi
Afshin Ebrahimi
collection DOAJ
description This paper presents a new approach to offline optical character recognition (OCR) for printed Persian subwords using the wavelet packet transform. The proposed algorithm extracts font-invariant and size-invariant features from 87804 subwords in 4 fonts and 3 sizes. The feature vectors are compressed using principal component analysis (PCA). The compressed feature vectors form a pictorial dictionary in which each entry is the mean of a group consisting of the same subword in 4 fonts and 3 sizes. These features are then combined with dot features to recognize printed Persian subwords. To evaluate the feature extraction, the algorithm was tested on a set of 2000 subwords from printed Persian text documents. A recognition rate of 97.9% was obtained at the subword level.
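The pipeline outlined in the description (wavelet packet features, a dictionary of per-group means, lookup of the closest entry) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the Haar filter, a 1-D input profile in place of a subword image, normalized leaf energies as the descriptor, and Euclidean nearest-mean matching. The PCA compression and the dot features mentioned in the description are omitted, and all function names are hypothetical.

```python
import math

def haar_step(signal):
    """One Haar analysis step: return the (approximation, detail) halves."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_packet_energies(signal, levels):
    """Split every node at every level (a full wavelet packet tree) and
    return the energy of each leaf, normalized so the energies sum to 1.
    Assumes len(signal) is divisible by 2**levels."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies) or 1.0  # avoid dividing by zero for an all-zero signal
    return [e / total for e in energies]

def build_dictionary(samples):
    """samples maps a label to a list of feature vectors (for example the
    same subword in several fonts and sizes); each entry becomes their mean."""
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in samples.items()
    }

def recognise(features, dictionary):
    """Return the label whose mean vector is closest in Euclidean distance."""
    return min(dictionary, key=lambda label: math.dist(features, dictionary[label]))

# Toy usage: two hypothetical classes of 1-D profiles.
flat = [1.0] * 8
zigzag = [1.0, -1.0] * 4
dictionary = build_dictionary({
    "flat": [wavelet_packet_energies(flat, 2), wavelet_packet_energies([2.0] * 8, 2)],
    "zigzag": [wavelet_packet_energies(zigzag, 2)],
})
print(recognise(wavelet_packet_energies([0.9, 1.1] * 4, 2), dictionary))
```

Normalizing the leaf energies makes this toy descriptor insensitive to amplitude scaling only; it does not reproduce the font and size invariance reported in the description.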
format Article
id doaj-art-02eebe484f164c859fcbf87781ca2c82
institution Kabale University
issn 2314-4904
2314-4912
language English
publishDate 2013-01-01
publisher Wiley
record_format Article
series Journal of Engineering
spelling doaj-art-02eebe484f164c859fcbf87781ca2c82; 2025-02-03T01:08:00Z; eng; Wiley; Journal of Engineering; 2314-4904; 2314-4912; 2013-01-01; 2013; 10.1155/2013/465469; 465469; Printed Persian Subword Recognition Using Wavelet Packet Descriptors; Samira Nasrollahi (0), Faculty of Electrical Engineering, Sahand University of Technology, Tabriz, Iran; Afshin Ebrahimi (1), Faculty of Electrical Engineering, Sahand University of Technology, Tabriz, Iran; http://dx.doi.org/10.1155/2013/465469
title Printed Persian Subword Recognition Using Wavelet Packet Descriptors
url http://dx.doi.org/10.1155/2013/465469