Virseqimprover: an integrated pipeline for viral contig error correction, extension, and annotation

Bibliographic Details
Main Authors: Haoqiu Song, Saima Sultana Tithi, Connor Brown, Frank O. Aylward, Roderick Jensen, Liqing Zhang
Format: Article
Language: English
Published: PeerJ Inc. 2025-01-01
Series: PeerJ
Subjects: Metagenomics, Viral genome assembly, Viral metagenomics
Online Access: https://peerj.com/articles/18515.pdf
author Haoqiu Song
Saima Sultana Tithi
Connor Brown
Frank O. Aylward
Roderick Jensen
Liqing Zhang
collection DOAJ
description Despite the recent surge of viral metagenomic studies, it remains a significant challenge to recover complete virus genomes from metagenomic data. The majority of viral contigs generated by de novo assembly programs are highly fragmented, which complicates downstream analysis and inference. To address this issue, we have developed Virseqimprover, a computational pipeline that can extend assembled contigs to complete or nearly complete genomes while maintaining extension quality. Virseqimprover first checks, based on read coverage, whether a contig contains any chimeric sequence; if it does, the contig is broken into segments. The longest segment with uniform depth of coverage is then extended, and these steps are repeated until the sequence cannot be extended further. Finally, Virseqimprover annotates the gene content of the resulting sequence. Results show that Virseqimprover performs well at correcting viral contigs and extending them to their full lengths, and hence can be a useful tool for improving the completeness and minimizing the assembly errors of viral contigs. Both a web server and a conda package for Virseqimprover are provided to the research community free of charge.
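The break-then-extend loop summarized in the description can be illustrated with a minimal Python sketch. It is not Virseqimprover's implementation: the windowed median-ratio test used to flag chimeric regions, the parameters `window`, `ratio` and `max_rounds`, and the callbacks `coverage_of` (assumed to return per-base read depth for a sequence, for example from mapping reads back to it) and `extend` (assumed to perform one round of extension) are all hypothetical stand-ins. The gene-annotation step is not shown.

```python
from statistics import median
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # half-open [start, end) interval on the contig


def split_by_coverage(coverage: List[float], window: int = 100,
                      ratio: float = 2.0) -> List[Segment]:
    """Hypothetical chimera heuristic: keep runs of windows whose mean depth
    stays within `ratio`-fold of the contig-wide median depth."""
    if not coverage:
        return []
    med = median(coverage)
    segments: List[Segment] = []
    start = None
    for w in range(0, len(coverage), window):
        block = coverage[w:w + window]
        mean = sum(block) / len(block)
        uniform = med > 0 and med / ratio <= mean <= med * ratio
        if uniform and start is None:
            start = w                      # a uniformly covered run begins
        elif not uniform and start is not None:
            segments.append((start, w))    # coverage jump: break the contig here
            start = None
    if start is not None:
        segments.append((start, len(coverage)))
    return segments


def improve_contig(contig: str,
                   coverage_of: Callable[[str], List[float]],
                   extend: Callable[[str], str],
                   max_rounds: int = 50) -> str:
    """Repeatedly keep the longest uniformly covered segment and extend it,
    stopping once an extension round adds no sequence."""
    for _ in range(max_rounds):
        segments = split_by_coverage(coverage_of(contig))
        if not segments:
            return contig                  # nothing uniformly covered; leave as-is
        start, end = max(segments, key=lambda seg: seg[1] - seg[0])
        core = contig[start:end]           # longest segment with uniform depth
        extended = extend(core)
        if len(extended) <= len(core):
            return core                    # the sequence cannot be extended further
        contig = extended
    return contig
```

Passing coverage and extension in as callbacks keeps the control flow testable with toy functions; a real pipeline would plug in read mapping and an assembler at those two points.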
format Article
id doaj-art-69d80c90df8e46debdad71a13ff02a06
institution Kabale University
issn 2167-8359
language English
publishDate 2025-01-01
publisher PeerJ Inc.
series PeerJ
doi 10.7717/peerj.18515
citation PeerJ 13:e18515 (2025)
affiliation Haoqiu Song: Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
Saima Sultana Tithi: Department of Cell & Molecular Biology, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
Connor Brown: Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
Frank O. Aylward: Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
Roderick Jensen: Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
Liqing Zhang: Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
title Virseqimprover: an integrated pipeline for viral contig error correction, extension, and annotation
topic Metagenomics
Viral genome assembly
Viral metagenomics
url https://peerj.com/articles/18515.pdf