Contamination Survey of Insect Genomic and Transcriptomic Data

Bibliographic Details
Main Authors: Jiali Zhou, Xinrui Zhang, Yujie Wang, Haoxian Liang, Yuhao Yang, Xiaolei Huang, Jun Deng
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Animals
Subjects: contamination; genomic/transcriptomic database; Insecta; COI barcoding; source
Online Access: https://www.mdpi.com/2076-2615/14/23/3432
collection DOAJ
description The rapid advancement of high-throughput sequencing has greatly increased the volume of sequencing data and, with it, the amount of contamination: sequences from non-target species may be present in the sequencing data of a target species. For Insecta, the most diverse group within Arthropoda, a comprehensive evaluation of contamination prevalence in public databases and an analysis of its potential causes are still lacking. In this study, COI barcodes were used to investigate contamination from insects and mammals in GenBank’s genomic and transcriptomic data across four insect orders. Among the 2796 WGS and 1382 TSA assemblies analyzed, contamination was detected in 32 (1.14%) WGS and 152 (11.0%) TSA assemblies. Key findings include the following: (1) TSA data exhibited more severe contamination than WGS data; (2) contamination rates varied significantly among the four orders: Hemiptera 9.22%, Coleoptera 3.48%, Hymenoptera 7.66%, and Diptera 1.89%; (3) possible causes of contamination, such as food, parasitism, sample collection, and cross-contamination, were analyzed. Overall, this study proposes a workflow for detecting contamination in WGS and TSA data and offers suggestions for mitigating it.
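The percentages quoted in the description can be checked directly from the reported assembly counts. The following is a minimal Python sketch of that arithmetic only; the names `COUNTS` and `contamination_rate` are illustrative assumptions and are not taken from the article or its workflow.

```python
# Assembly counts as reported in the description above.
# COUNTS and contamination_rate are hypothetical names used for illustration.
COUNTS = {
    "WGS": {"contaminated": 32, "total": 2796},
    "TSA": {"contaminated": 152, "total": 1382},
}


def contamination_rate(contaminated: int, total: int) -> float:
    """Return the percentage of assemblies in which contamination was detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * contaminated / total


# Rate per data type, keyed by "WGS" and "TSA".
rates = {name: contamination_rate(**c) for name, c in COUNTS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")

# How many times more often TSA assemblies were flagged than WGS assemblies.
print(f"TSA/WGS ratio: {rates['TSA'] / rates['WGS']:.1f}x")
```

Under these counts the sketch gives about 1.14% for WGS and 11.0% for TSA, so TSA assemblies were flagged roughly 9.6 times as often as WGS assemblies.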
id doaj-art-948a6b3ea5f64e5e9fdf4d50afe0dce9
institution OA Journals
issn 2076-2615
doi 10.3390/ani14233432
citation Animals, vol. 14, no. 23, article 3432 (2024-11-01)
affiliation State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China (all seven authors)
topic contamination
genomic/transcriptomic database
Insecta
COI barcoding
source