AmpSeqR: an R package for amplicon deep sequencing data analysis [version 1; peer review: 1 approved, 2 approved with reservations]

Bibliographic Details
Main Authors: Jacob E. Munro, Melanie Bahlo, Jiru Han
Format: Article
Language: English
Published: F1000 Research Ltd 2023-03-01
Series: F1000Research
Subjects: amplicon sequencing, data visualization, summary report, R package
Online Access: https://f1000research.com/articles/12-327/v1
author Jacob E. Munro
Melanie Bahlo
Jiru Han
collection DOAJ
description Amplicon sequencing (AmpSeq) is a methodology that targets specific genomic regions of interest for polymerase chain reaction (PCR) amplification so that they can be sequenced to a high depth of coverage. Amplicons are typically chosen to be highly polymorphic, usually with several highly informative, high-frequency single nucleotide polymorphisms (SNPs) segregating in an amplicon of 100–200 base pairs (bp). This allows high-sensitivity detection and quantification of the frequency of each sequence within each sample, making it suitable for applications such as low-frequency somatic mosaicism detection or minor clone detection in mixed samples. AmpSeq is increasingly applied in both biological and medical studies, in areas such as cancer, infectious diseases and brain mosaicism. Current bioinformatics pipelines for AmpSeq data processing lack downstream analysis, have difficulty distinguishing true sequences from PCR or sequencing errors and artifacts, and often require bioinformatic expertise. We present a new R package, AmpSeqR, designed for the processing of deep short-read amplicon sequencing data, with a focus on infectious diseases. The pipeline integrates several existing R packages, combining them with newly developed functions to perform optimal filtering of reads to remove noise and improve the accuracy of the detected sequence data, permitting detection of very low-frequency clones in mixed samples. The package provides functions for data pre-processing, amplicon sequence variant (ASV) estimation, data post-processing and data visualization, and automatically generates a comprehensive Rmarkdown report that contains all essential results, facilitating easy inclusion into reports and publications. AmpSeqR is publicly available at https://github.com/bahlolab/AmpSeqR.
format Article
id doaj-art-01f3d9f48340464c8e0f6dfefc700ebe
institution Kabale University
issn 2046-1402
language English
publishDate 2023-03-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj-art-01f3d9f48340464c8e0f6dfefc700ebe
2024-11-23T01:00:00Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2023-03-01
12142271
AmpSeqR: an R package for amplicon deep sequencing data analysis [version 1; peer review: 1 approved, 2 approved with reservations]
Jacob E. Munro 0
Melanie Bahlo 1 https://orcid.org/0000-0001-5132-0774
Jiru Han 2
Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia
Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia
Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia
https://f1000research.com/articles/12-327/v1
amplicon sequencing
data visualization
summary report
R package
eng
title AmpSeqR: an R package for amplicon deep sequencing data analysis [version 1; peer review: 1 approved, 2 approved with reservations]
topic amplicon sequencing
data visualization
summary report
R package
eng
url https://f1000research.com/articles/12-327/v1