Text Extraction from Historical Document Images by the Combination of Several Thresholding Techniques

Bibliographic Details
Main Authors: Toufik Sari, Abderrahmane Kefali, Halima Bahi
Format: Article
Language: English
Published: Wiley 2014-01-01
Series: Advances in Multimedia
Online Access: http://dx.doi.org/10.1155/2014/934656
collection DOAJ
description This paper presents a new technique for the binarization of historical document images, whose deterioration and damage make automatic processing difficult at several levels. The proposed method is a hybrid thresholding scheme that combines the advantages of global and local methods and mixes several binarization techniques. It works in two stages. In the first stage, global thresholding is applied to the entire image and two different thresholds are determined, from which most image pixels are classified as foreground or background. In the second stage, the remaining pixels are assigned to the foreground or background class based on local analysis: several local thresholding methods are combined, and the final binary value of each remaining pixel is chosen as the most probable one. The proposed technique was tested on a large collection of standard and synthetic documents and compared with well-known methods using standard measures, and it was shown to perform better than those methods.
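The two-stage scheme described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how the two global thresholds are computed or which local methods are combined, so the global rule (global mean plus or minus a hypothetical `margin`), the choice of Niblack, Sauvola, and a plain local mean as the local methods, the window half-width `w`, and the reading of "most probable" as a majority vote are all assumptions.

```python
import numpy as np


def local_mean_std(img, w):
    """Mean and standard deviation over a (2w+1)x(2w+1) window, edge-padded."""
    pad = np.pad(img.astype(np.float64), w, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (2 * w + 1, 2 * w + 1))
    return win.mean(axis=(-2, -1)), win.std(axis=(-2, -1))


def hybrid_binarize(img, margin=30, w=7):
    """Return a boolean mask that is True where a pixel is classified as text.

    Assumes a grayscale image with dark text on a lighter background.
    """
    img = img.astype(np.float64)

    # Stage 1 (assumed rule): two global thresholds around the global mean.
    t_low, t_high = img.mean() - margin, img.mean() + margin
    text = img <= t_low                          # confidently foreground
    undecided = (img > t_low) & (img < t_high)   # left for stage 2
    # Pixels >= t_high stay background (False in `text`).

    # Stage 2 (assumed methods): three local thresholds vote on each pixel.
    m, s = local_mean_std(img, w)
    votes = np.stack([
        img < m - 0.2 * s,                        # Niblack, k = -0.2
        img < m * (1 + 0.5 * (s / 128.0 - 1)),    # Sauvola, k = 0.5, R = 128
        img < m,                                  # plain local mean
    ])
    majority = votes.sum(axis=0) >= 2             # majority vote (assumption)

    # Only the undecided pixels take the voted label.
    text[undecided] = majority[undecided]
    return text
```

Calling `hybrid_binarize(page)` on a 2-D `uint8` array returns a mask of the same shape. Using an odd number of voters avoids ties, which is one simple way to pick a single label per remaining pixel.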
id doaj-art-641cab07a8e241ef90e6786845a37266
institution Kabale University
issn 1687-5680, 1687-5699
affiliation LabGED Laboratory, Badji Mokhtar, BP 12, 23000 Annaba, Algeria (all three authors)