Clines, clusters, and the effect of study design on the inference of human population structure.

Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the...

Bibliographic Details
Main Authors: Noah A Rosenberg, Saurabh Mahajan, Sohini Ramachandran, Chengfeng Zhao, Jonathan K Pritchard, Marcus W Feldman
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2005-12-01
Series: PLoS Genetics
Online Access: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.0010070&type=printable
author Noah A Rosenberg
Saurabh Mahajan
Sohini Ramachandran
Chengfeng Zhao
Jonathan K Pritchard
Marcus W Feldman
collection DOAJ
description Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the degree of clustering is diminished by use of samples with greater uniformity in geographic distribution, and that the clusters we identified were a consequence of uneven sampling along genetic clines. Expanding our earlier dataset from 377 to 993 markers, we systematically examine the influence of several study design variables--sample size, number of loci, number of clusters, assumptions about correlations in allele frequencies across populations, and the geographic dispersion of the sample--on the "clusteredness" of individuals. With all other variables held constant, geographic dispersion is seen to have comparatively little effect on the degree of clustering. Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.
format Article
id doaj-art-e794a51658464be3a8a0a82efed21da6
institution OA Journals
issn 1553-7390
1553-7404
language English
publishDate 2005-12-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Genetics
spelling doaj-art-e794a51658464be3a8a0a82efed21da6; 2025-08-20T02:17:24Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2005-12-01; volume 1; issue 6; e70; 10.1371/journal.pgen.0010070
title Clines, clusters, and the effect of study design on the inference of human population structure.
url https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.0010070&type=printable