Approximate String Matching with Non-Overlapping Adjacent Unbalanced Translocations
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Mathematics |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2227-7390/13/13/2103 |
| Summary: | In this paper, we investigate the *approximate string matching problem* when the allowed edit operations are *non-overlapping unbalanced translocations of adjacent factors*. This kind of edit operation takes place when two adjacent substrings of the text swap, resulting in a modified string. The two substrings involved may have different lengths. Such large-scale modifications of strings have various applications, notably in fields such as computational biology and genomics, where structural rearrangements play a key role. However, despite their central role in many fields of text processing, little attention has been devoted to the problem of matching strings allowing for this kind of edit operation. We present three algorithms for solving the problem, all of them with $\mathcal{O}(nm^3)$ worst-case time complexity and $\mathcal{O}(m^2)$ space complexity, where $m$ and $n$ are the lengths of the pattern and of the text, respectively. Specifically, our first algorithm is based on the dynamic programming approach. Our second solution improves on the first by making use of the Directed Acyclic Word Graph of the pattern. Finally, our third algorithm is based on an alignment procedure. We also show that, under the assumptions of equiprobability and independence of characters, our second algorithm has an $\mathcal{O}(n \log_\sigma^2 m)$ average time complexity for an alphabet of size $\sigma \geq 4$. |
|---|---|
| ISSN: | 2227-7390 |
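
The edit operation described in the summary can be illustrated with a small brute-force sketch. This is not any of the three algorithms from the article: it is a naive check, written only to make the definition concrete, and the function names (`translocate`, `matches_with_translocations`, `find_occurrences`) are hypothetical. It assumes that each translocation swaps two adjacent non-empty factors, that the swapped segments do not overlap, and that a pattern occurs at a text position when the text window of the same length can be obtained from the pattern by such swaps.

```python
def translocate(s, i, j, k):
    """Swap the adjacent factors s[i:j] and s[j:k]; their lengths may differ."""
    if not 0 <= i < j < k <= len(s):
        raise ValueError("need 0 <= i < j < k <= len(s)")
    return s[:i] + s[j:k] + s[i:j] + s[k:]


def matches_with_translocations(x, y):
    """Return True if y can be obtained from x by non-overlapping
    translocations of adjacent factors (naive prefix-by-prefix check)."""
    if len(x) != len(y):
        return False
    m = len(x)
    # ok[i] is True when x[:i] can be turned into y[:i].
    ok = [False] * (m + 1)
    ok[0] = True
    for i in range(1, m + 1):
        # Case 1: the last character is left untouched.
        if ok[i - 1] and x[i - 1] == y[i - 1]:
            ok[i] = True
            continue
        # Case 2: a translocation covers x[k:i], split at position s.
        for k in range(i - 1):  # the segment needs at least two characters
            if not ok[k]:
                continue
            if any(y[k:i] == x[s:i] + x[k:s] for s in range(k + 1, i)):
                ok[i] = True
                break
    return ok[m]


def find_occurrences(pattern, text):
    """List every start position where the pattern matches a text window."""
    m = len(pattern)
    return [
        j
        for j in range(len(text) - m + 1)
        if matches_with_translocations(pattern, text[j:j + m])
    ]
```

For example, `translocate("abcde", 0, 2, 5)` swaps the factor `ab` with the longer factor `cde`, and `find_occurrences("abc", "xxcabxx")` reports position 2, where the window `cab` is `abc` with the factors `ab` and `c` swapped. The sketch favours readability over speed and makes no attempt to match the complexity bounds stated in the summary.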