A new alignment-free method: K-mer Subsequence Natural Vector (K-mer SNV) for classification of fungi

Abstract As eukaryotic organisms, fungi play a pivotal role within ecosystems and exert profound influences on agriculture, the pharmaceutical industry, and human health. The classification of fungi in databases has emerged as a crucial and complex issue in the field of biology. In this study, by le...

Full description

Saved in:
Bibliographic Details
Main Authors: Lily He, Mochao Huang, Gulinisha Yiming, Yi Zhu, Ruowei Liu, Jinghan Chen, Stephen S. T. Yau
Format: Article
Language:English
Published: BMC 2025-07-01
Series:BMC Bioinformatics
Subjects:
Online Access:https://doi.org/10.1186/s12859-025-06152-x
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract As eukaryotic organisms, fungi play a pivotal role within ecosystems and exert profound influences on agriculture, the pharmaceutical industry, and human health. The classification of fungi in databases has emerged as a crucial and complex issue in the field of biology. In this study, by leveraging the local distribution of k-mer in nucleotide sequences, we introduce a novel alignment-free method, denoted as k-mer SNV, to address this challenge. On a large fungi dataset including 120,140 sequences, our innovative approach has achieved remarkable success in predicting the taxonomic labels of fungi across six hierarchical taxonomic levels: phylum (99.52%), class (98.17%), order (97.20%), family (96.11%), genus (94.14%), and species (93.32%). The approach is also evaluated on the common Taxxi benchmark dataset. Based on these results, it has been convincingly demonstrated that the k-mer SNV method exhibits outstanding performance in processing large-scale fungal sequence data.
ISSN:1471-2105