A new split based searching for exact pattern matching for natural texts.

Exact pattern matching algorithms are popular and used widely in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. The focus of these algorithms is to achieve time efficiency according to...

Bibliographic Details
Main Authors: Saqib Hakak, Amirrudin Kamsin, Palaiahnakote Shivakumara, Mohd Yamani Idna Idris, Gulshan Amin Gilkar
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0200912&type=printable
_version_ 1850077170910625792
author Saqib Hakak
Amirrudin Kamsin
Palaiahnakote Shivakumara
Mohd Yamani Idna Idris
Gulshan Amin Gilkar
author_sort Saqib Hakak
collection DOAJ
description Exact pattern matching algorithms are popular and widely used in several applications, such as molecular biology, text processing, image processing, web search engines, network intrusion detection systems and operating systems. These algorithms focus on time efficiency for their target applications but not on memory consumption. In this work, we propose a novel idea that improves both time efficiency and memory consumption by splitting the query string before searching the corpus. For a given text, the proposed algorithm splits the query pattern into two equal halves and uses the second (right) half as the query string for searching the corpus. Once a match for the right half is found, the proposed algorithm applies a brute-force procedure to match the remaining (left) half, using the location of the right half as a reference. Experimental results on different S1 Dataset text databases, namely Arabic, English, Chinese, Italian and French, show that the proposed algorithm outperforms the existing S1 Algorithm in terms of time efficiency and memory consumption as the length of the query pattern increases.
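The split-then-verify idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `split_search`, the use of `str.find` to locate the right half, and the handling of odd-length patterns (the left half takes the shorter part) are all assumptions made for illustration, since the record does not specify them.

```python
def split_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text.

    Hypothetical sketch: search for the right half of the pattern first,
    then check the left half by brute force next to each hit.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []

    # Assumption: for odd m the left half is the shorter one.
    mid = m // 2
    left, right = pattern[:mid], pattern[mid:]

    matches = []
    # A right-half hit before index `mid` leaves no room for the left half,
    # so the search starts at `mid`.
    pos = text.find(right, mid)
    while pos != -1:
        start = pos - mid
        # Brute-force, character-by-character check of the left half.
        if all(text[start + i] == left[i] for i in range(mid)):
            matches.append(start)
        # Advance by one position so overlapping occurrences are not missed.
        pos = text.find(right, pos + 1)
    return matches
```

Calling `split_search("abracadabra", "abra")` searches for `"ra"`, finds it at indices 2 and 9, and confirms `"ab"` immediately before each hit, giving the start positions 0 and 7.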
format Article
id doaj-art-521ba67019c242268652bd5dc7fcf23b
institution DOAJ
issn 1932-6203
language English
publishDate 2018-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
title A new split based searching for exact pattern matching for natural texts.
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0200912&type=printable