Script Identification from Printed Indian Document Images and Performance Evaluation Using Different Classifiers

Bibliographic Details
Main Authors: Sk Md Obaidullah, Anamika Mondal, Nibaran Das, Kaushik Roy
Format: Article
Language: English
Published: Wiley 2014-01-01
Series: Applied Computational Intelligence and Soft Computing
Online Access: http://dx.doi.org/10.1155/2014/896128
Description
Summary: Identification of script from document images is an active area of research in document image processing for a multilingual/multiscript country like India. In this paper, the real-life problem of printed script identification from official Indian document images is considered, and the performances of different well-known classifiers are evaluated. Two evaluation parameters, namely AAR (average accuracy rate) and MBT (model building time), are computed for this performance analysis. Experiments were carried out on 459 printed document images with 5-fold cross-validation. The Simple Logistic model shows the highest AAR among all classifiers, 98.9%. The BayesNet and Random Forest models achieve AARs of 96.7% and 98.2%, respectively, with the lowest MBT of 0.09 s.
ISSN: 1687-9724, 1687-9732
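The evaluation protocol described in the summary (5-fold cross-validation, reporting AAR and MBT) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: AAR is taken here as the mean of per-fold accuracies, and MBT as the mean wall-clock time to fit a model on each training split. The nearest-centroid classifier, the helper names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`) and the toy two-class feature vectors are all hypothetical stand-ins for the paper's script features and for classifiers such as Simple Logistic, BayesNet and Random Forest.

```python
import random
import time
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def cross_validate(X, y, k=5):
    """Return (AAR, mean MBT in seconds) over k folds.

    Assumption: AAR = mean of per-fold accuracies; MBT = mean fit time per fold.
    """
    folds = k_fold_indices(len(X), k)
    accs, build_times = [], []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        # Time only the model-building step, as MBT measures.
        start = time.perf_counter()
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        build_times.append(time.perf_counter() - start)
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return mean(accs), mean(build_times)


# Toy, hypothetical feature vectors for two well-separated "scripts".
X = [[i * 0.1, 0.0] for i in range(20)] + [[10 + i * 0.1, 10.0] for i in range(20)]
y = ["script_A"] * 20 + ["script_B"] * 20
aar, mbt = cross_validate(X, y, k=5)
print(f"AAR = {aar:.1%}, MBT = {mbt:.6f} s")  # AAR is 100.0% on this separable toy data
```

Timing only the fit call, and not prediction, mirrors the summary's distinction between accuracy (AAR) and model building time (MBT); swapping in a real classifier would only change `fit_centroids` and `predict`.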