Computing generalized cophenetic distances under all Lp norms: A near-linear time algorithmic framework.
The cophenetic distance is a well-established metric in biology used to compare pairs of trees represented in a vector format. This distance was introduced by Cardona and his co-authors, building on the foundational work of Sokal and Rohlf, which dates back over 60 years. It is widely recognized for...
Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Public Library of Science (PLoS)
2025-06-01
|
| Series: | PLoS Computational Biology |
| Online Access: | https://doi.org/10.1371/journal.pcbi.1013069 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The cophenetic distance is a well-established metric in biology used to compare pairs of trees represented in a vector format. This distance was introduced by Cardona and his co-authors, building on the foundational work of Sokal and Rohlf, which dates back over 60 years. It is widely recognized for its versatility since it can analyze trees with edge weights using various vector norms. However, when comparing large-scale trees, the quadratic runtime of the current best-known (i.e., naïve) algorithm for computing the cophenetic distance can become prohibitive. Recently, a new algorithmic framework with near-linear time complexity has been developed to calculate the distances of a generalized class of cophenetic distances, which are derived from the work of Sokal and Rohlf. This improvement not only allows the cophenetic distance to be utilized in large-scale studies but also enhances the versatility of these studies by incorporating generalized variants of the cophenetic distance. However, the framework is limited to applying only the L1 and L2 vector norms, which significantly restricts the versatility of generalized cophenetic distances in large-scale applications. To address this limitation, we present a near-linear time algorithmic framework for computing the generalized cophenetic distances across all Lp vector norms. In our scalability study, we showcase the practical performance of our unrestricted algorithmic framework. Furthermore, we investigate the applicability of the generalized cophenetic distances by analyzing the distributions of key components of these distances under various vector norms. |
|---|---|
| ISSN: | 1553-734X 1553-7358 |