Comprehensive duck DNA fingerprinting based on machine learning for breed identification

Ducks are among the most widely distributed waterfowl in the world, with more than 6 billion farmed annually, and have great economic and ecological value. As exploration of duck genetic resources gains global priority and hybridization between breeds becomes common,...

Bibliographic Details
Main Authors: DengKe Yan, Feng Zhu, HaoLin Wang, ZhongTao Yin, ZhuoCheng Hou
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: Poultry Science
Subjects: SNP selection; Breed assignment; Machine learning; DNA fingerprinting
Online Access: http://www.sciencedirect.com/science/article/pii/S0032579125006029
collection DOAJ
description Ducks are among the most widely distributed waterfowl in the world, with more than 6 billion farmed annually, and have great economic and ecological value. As exploration of duck genetic resources gains global priority and hybridization between breeds becomes common, traditional breed identification methods struggle to meet practical needs, which restricts the utilization, development, and protection of duck germplasm resources. This study aims to construct a global duck DNA fingerprint map from genomic data and machine learning algorithms, and to develop an accurate, efficient, and scalable duck DNA fingerprinting identification tool that addresses the urgent need for breed identification in high-quality agricultural production and ecological protection. We obtained whole-genome resequencing data for 196 ducks from 16 breeds and constructed a high-density duck population variation dataset containing 2,360,039 SNPs. Four marker selection methods (Delta, Average Euclidean Distance (AED), Polymorphism Information Content (PIC), and Fixation Index (FST)) and four machine learning classification algorithms (Random Forest (RF), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and Naive Bayes (NB)) were tested. The AED indicator performed best for selecting SNP markers in ducks, with the highest classification accuracy (98.38%) when 2000 SNP sites were selected. The SVM algorithm showed the best classification performance, with a classification accuracy of 98.71% and a running time within 70 seconds. We constructed duck DNA fingerprinting maps for the 16 breeds based on the AED indicator and the SVM algorithm, each containing 200 SNP markers. We also developed a user-friendly and efficient duck DNA fingerprinting identification tool that can identify large-scale genetic resources and can incorporate newly collected duck genetic resources for breed identification. Our results provide methods and a practical tool for identifying and utilizing duck germplasm resources worldwide, and a reference for developing DNA fingerprinting maps for other major agricultural animals.
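The marker selection step described in the abstract can be illustrated with a small sketch. The record does not give the formulas the authors used, so everything below is an assumption: PIC uses the standard biallelic form, 1 - (p^2 + q^2) - 2p^2q^2, and the Delta score is extended to more than two breeds by averaging the absolute allele frequency difference over all breed pairs. AED, FST, and the classifiers are not implemented here. The function names and the toy genotypes are hypothetical and are not taken from the paper or its tool.

```python
from itertools import combinations


def allele_freq(genotypes):
    """Alternate-allele frequency from 0/1/2 genotype codes (diploid)."""
    return sum(genotypes) / (2 * len(genotypes))


def pic_biallelic(p):
    """Polymorphism Information Content of a biallelic SNP (standard form)."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def mean_pairwise_delta(freqs_by_breed):
    """Assumed multi-breed Delta: mean |p_i - p_j| over all breed pairs."""
    pairs = list(combinations(freqs_by_breed, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def top_k_snps(scores, k):
    """Indices of the k highest-scoring SNPs, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Made-up toy data: genotypes[breed][snp] is a list of 0/1/2 codes,
# one per individual. Real input would come from a VCF or similar.
genotypes = {
    "breed_A": [[0, 0, 0, 0], [1, 1, 0, 2], [2, 2, 2, 2]],
    "breed_B": [[2, 2, 2, 2], [1, 0, 1, 2], [2, 2, 1, 2]],
    "breed_C": [[0, 1, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]],
}
n_snps = 3

delta_scores = []
pic_scores = []
for s in range(n_snps):
    # Between-breed score: compare per-breed allele frequencies.
    freqs = [allele_freq(g[s]) for g in genotypes.values()]
    delta_scores.append(mean_pairwise_delta(freqs))
    # Within-sample informativeness: PIC on the pooled allele frequency.
    pooled = [code for g in genotypes.values() for code in g[s]]
    pic_scores.append(pic_biallelic(allele_freq(pooled)))

selected = top_k_snps(delta_scores, k=2)
print("delta:", delta_scores)
print("PIC:", pic_scores)
print("selected SNP indices:", selected)
```

In this toy data the second SNP has the same allele frequency in every breed, so it scores zero on the between-breed measure even though its pooled PIC is the maximum possible for a biallelic site. That contrast is one reason a diversity measure and a differentiation measure can rank the same markers differently.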
id doaj-art-552eb51020ce4fac943f689bc0d86a75
institution DOAJ
issn 0032-5791
volume 104
issue 8
article_number 105359
doi 10.1016/j.psj.2025.105359
affiliation DengKe Yan, Feng Zhu, HaoLin Wang: National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, PR China
ZhongTao Yin, ZhuoCheng Hou (corresponding authors): Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing 100193, PR China; National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, PR China
topic SNP selection
Breed assignment
Machine learning
DNA fingerprinting