Measuring quality of DNA sequence data via degradation.

We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that t...

Bibliographic Details
Main Authors: Alan F Karr, Jason Hauzel, Adam A Porter, Marcel Schaefer
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0271970&type=printable
collection DOAJ
description We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes, illustrated by outlier detection. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.
format Article
id doaj-art-be2a08f11c7c46ccab3e23161ca7d5c8
institution DOAJ
issn 1932-6203
language English
publishDate 2022-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
citation PLoS ONE 17(8): e0271970 (2022-01-01). doi:10.1371/journal.pone.0271970