Approximate String Matching with Non-Overlapping Adjacent Unbalanced Translocations

Bibliographic Details
Main Authors:	Domenico Cantone, Simone Faro, Arianna Pavone
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Mathematics
Subjects:	approximate string matching; unbalanced translocations; non-overlapping adjacent factors; edit operations; text algorithms; chromosomal rearrangements
Online Access:	https://www.mdpi.com/2227-7390/13/13/2103

Description
Summary:	In this paper, we investigate the approximate string matching problem when the allowed edit operations are non-overlapping unbalanced translocations of adjacent factors. This kind of edit operation takes place when two adjacent substrings of the text swap, resulting in a modified string. The two substrings involved may have different lengths. Such large-scale modifications of strings have various applications, notably in fields such as computational biology and genomics, where structural rearrangements play a key role. However, despite their central role in many fields of text processing, little attention has been devoted to the problem of matching strings allowing for this kind of edit operation. In this paper, we present three algorithms for solving the problem, all of them with $\mathcal{O}(nm^3)$ worst-case time complexity and $\mathcal{O}(m^2)$ space complexity, where $m$ and $n$ are the lengths of the pattern and of the text, respectively. Specifically, our first algorithm is based on the dynamic programming approach. Our second solution improves on the first by making use of the Directed Acyclic Word Graph of the pattern. Finally, our third algorithm is based on an alignment procedure.
We also show that, under the assumptions of equiprobability and independence of characters, our second algorithm has $\mathcal{O}(n \log_\sigma^2 m)$ average time complexity for an alphabet of size $\sigma \geq 4$.
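The edit operation described in the summary can be illustrated with a small Python sketch. It is not one of the paper's three algorithms: it is a naive brute-force check written only to make the definition concrete, and the names `translocate`, `matches_with_translocations` and `find_occurrences` are hypothetical. It assumes that a text window matches the pattern when it can be split, left to right, into pieces that are either a single unchanged character or two adjacent pattern factors in swapped order.

```python
from functools import lru_cache


def translocate(s, i, j, k):
    """Swap the adjacent factors s[i:j] and s[j:k]; their lengths may differ."""
    if not 0 <= i < j < k <= len(s):
        raise ValueError("need 0 <= i < j < k <= len(s)")
    return s[:i] + s[j:k] + s[i:j] + s[k:]


def matches_with_translocations(window, pattern):
    """Naive check: can `window` be obtained from `pattern` by
    non-overlapping translocations of adjacent factors?"""
    if len(window) != len(pattern):
        return False  # a translocation never changes the length
    m = len(pattern)

    @lru_cache(maxsize=None)
    def can_match_from(p):
        if p == m:
            return True
        # Option 1: position p is left untouched.
        if window[p] == pattern[p] and can_match_from(p + 1):
            return True
        # Option 2: pattern[p:j] and pattern[j:k] appear swapped in the window.
        for k in range(p + 2, m + 1):
            for j in range(p + 1, k):
                swapped = pattern[j:k] + pattern[p:j]
                if window[p:k] == swapped and can_match_from(k):
                    return True
        return False

    return can_match_from(0)


def find_occurrences(text, pattern):
    """Start positions of all text windows that match under the rule above."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1)
            if matches_with_translocations(text[s:s + m], pattern)]
```

For example, `translocate("abcde", 1, 2, 4)` swaps the factors `b` and `cd`, and `find_occurrences("xxacdbexx", "abcde")` then reports the window starting at position 2. The sketch makes no attempt to reach the complexity bounds stated in the summary.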
ISSN:	2227-7390

Similar Items