The 1000 Genomes Project (1000GP) set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. In the final phase of the project (Phase 3), the consortium published the reconstruction of the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping (The 1000 Genomes Project Consortium 2015). With 84.7 million single nucleotide polymorphisms (SNPs), the resource is estimated to include >99% of SNP variants with a frequency of >1% for a variety of ancestries.
We have created a pipeline for the population genomics analysis of the 1000GP data that removes inbred individuals (Gazal et al. 2015), removes non-accessible nucleotides (see Help -> Genome accessibility), and adds an outgroup species (chimpanzee, assembly panTro4). Currently, a large battery of variation, divergence, and linkage disequilibrium statistics, together with neutrality tests, is available for all 26 human 1000GP populations, computed both in sliding windows along the human genome (see Help -> Tracks description) and for each gene separately (see Help -> Integrative MKT).
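As a rough illustration of what a sliding-window statistic restricted to accessible sites might look like, the following Python sketch computes nucleotide diversity (π) per window. It is not the actual pipeline: the function names, the input layout (a dict of alternate-allele counts and a set of accessible positions), and the toy values are all assumptions made for the example.

```python
def pi_site(alt_count, n_chrom):
    """Unbiased per-site nucleotide diversity for a biallelic SNP."""
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def windowed_pi(snps, accessible, n_chrom, window_size, region_end):
    """Average pi per accessible site in non-overlapping windows.

    snps        -- hypothetical dict: position -> alternate-allele count
    accessible  -- hypothetical set of positions that pass the accessibility mask
    n_chrom     -- number of sampled chromosomes (2 x number of individuals)
    Returns a list of (start, end, n_accessible_sites, pi) tuples;
    pi is None when a window has no accessible site.
    """
    windows = []
    for start in range(0, region_end, window_size):
        end = min(start + window_size, region_end)
        sites = [pos for pos in range(start, end) if pos in accessible]
        if not sites:
            windows.append((start, end, 0, None))
            continue
        # SNPs at masked positions are ignored, and the denominator
        # counts accessible sites only.
        total = sum(pi_site(snps[pos], n_chrom) for pos in sites if pos in snps)
        windows.append((start, end, len(sites), total / len(sites)))
    return windows


# Toy data (made up): 10 chromosomes, 20 bp region, positions 7-9 masked.
toy_snps = {2: 5, 7: 1, 12: 4}
toy_accessible = set(range(20)) - {7, 8, 9}
example = windowed_pi(toy_snps, toy_accessible, n_chrom=10,
                      window_size=10, region_end=20)
for window in example:
    print(window)
```

Dividing by the number of accessible sites rather than by the window length keeps heavily masked windows from appearing artificially depleted in variation.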
The 26 analyzed 1000GP populations (The 1000 Genomes Project Consortium 2015) are listed below:
Population Description | Population Code | Metapopulation (Analysis Panel) | Number of samples
Utah residents (CEPH) with Northern and Western European ancestry | CEU | EUR | 99
British in England and Scotland | GBR | EUR | 91
Finnish in Finland | FIN | EUR | 99
Iberian Populations in Spain | IBS | EUR | 107
Toscani in Italia | TSI | EUR | 107
Esan in Nigeria | ESN | AFR | 99
Gambian in Western Division, Mandinka | GWD | AFR | 113
Luhya in Webuye, Kenya | LWK | AFR | 99
Mende in Sierra Leone | MSL | AFR | 85
Yoruba in Ibadan, Nigeria | YRI | AFR | 108
African Caribbean in Barbados | ACB | AFR | 96
People with African Ancestry in Southwest USA | ASW | AFR | 61
Chinese Dai in Xishuangbanna, China | CDX | EAS | 93
Han Chinese in Beijing, China | CHB | EAS | 103
Southern Han Chinese | CHS | EAS | 105
Japanese in Tokyo, Japan | JPT | EAS | 104
Kinh in Ho Chi Minh City, Vietnam | KHV | EAS | 99
Bengali in Bangladesh | BEB | SAS | 86
Gujarati Indians in Houston, TX, USA | GIH | SAS | 103
Indian Telugu in the UK | ITU | SAS | 102
Punjabi in Lahore, Pakistan | PJL | SAS | 96
Sri Lankan Tamil in the UK | STU | SAS | 102
Colombians in Medellin, Colombia | CLM | AMR | 94
People with Mexican Ancestry in Los Angeles, CA, USA | MXL | AMR | 64
Peruvians in Lima, Peru | PEL | AMR | 85
Puerto Ricans in Puerto Rico | PUR | AMR | 104
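For scripted use, the table can be held as a simple lookup from population code to metapopulation and sample count. The sketch below is one possible layout, not something provided by the resource. The names `POPULATIONS` and `samples_per_metapopulation` are hypothetical, and the counts are copied from the table as listed.

```python
from collections import defaultdict

# Population code -> (metapopulation, number of samples), as in the table.
POPULATIONS = {
    "CEU": ("EUR", 99), "GBR": ("EUR", 91), "FIN": ("EUR", 99),
    "IBS": ("EUR", 107), "TSI": ("EUR", 107),
    "ESN": ("AFR", 99), "GWD": ("AFR", 113), "LWK": ("AFR", 99),
    "MSL": ("AFR", 85), "YRI": ("AFR", 108), "ACB": ("AFR", 96),
    "ASW": ("AFR", 61),
    "CDX": ("EAS", 93), "CHB": ("EAS", 103), "CHS": ("EAS", 105),
    "JPT": ("EAS", 104), "KHV": ("EAS", 99),
    "BEB": ("SAS", 86), "GIH": ("SAS", 103), "ITU": ("SAS", 102),
    "PJL": ("SAS", 96), "STU": ("SAS", 102),
    "CLM": ("AMR", 94), "MXL": ("AMR", 64), "PEL": ("AMR", 85),
    "PUR": ("AMR", 104),
}


def samples_per_metapopulation(populations):
    """Sum the sample counts of all populations in each metapopulation."""
    totals = defaultdict(int)
    for metapopulation, n_samples in populations.values():
        totals[metapopulation] += n_samples
    return dict(totals)


totals = samples_per_metapopulation(POPULATIONS)
print(totals)                # per-metapopulation sample counts
print(sum(totals.values()))  # 2504, the Phase 3 total
```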
Note that population genetics statistics are calculated neither for metapopulations nor across the whole 1000GP dataset. These statistics all assume that the analyzed genomes are a random sample from a non-stratified population, and an aggregate of genomes from multiple diverse populations around the globe satisfies neither assumption.
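One classic consequence of pooling stratified samples is the Wahlund effect: two populations that are each in Hardy-Weinberg equilibrium show an apparent heterozygote deficit once combined. The short Python sketch below illustrates this with made-up allele frequencies (0.1 and 0.9) and equal sample sizes. The function names are hypothetical, and the example only demonstrates why pooling is problematic, not how any statistic is computed here.

```python
def expected_heterozygosity(p):
    """Hardy-Weinberg heterozygosity for a biallelic site with frequency p."""
    return 2.0 * p * (1.0 - p)


def pooled_heterozygosity(freqs, sizes):
    """Compare actual and expected heterozygosity after pooling populations.

    freqs -- allele frequency in each subpopulation (each assumed to be in HWE)
    sizes -- number of individuals sampled from each subpopulation
    Returns (pooled frequency, actual heterozygosity in the pool,
             heterozygosity expected if the pool were one random-mating
             population).
    """
    total = sum(sizes)
    shares = [n / total for n in sizes]
    p_pooled = sum(s * p for s, p in zip(shares, freqs))
    h_actual = sum(s * expected_heterozygosity(p) for s, p in zip(shares, freqs))
    h_expected = expected_heterozygosity(p_pooled)
    return p_pooled, h_actual, h_expected


# Illustrative values only: two equally sized, strongly differentiated populations.
p_pooled, h_actual, h_expected = pooled_heterozygosity([0.1, 0.9], [100, 100])
deficit = 1.0 - h_actual / h_expected  # apparent inbreeding-like coefficient
print(p_pooled, h_actual, h_expected, deficit)
```

With these values the pool has about 18% heterozygotes where 50% would be expected, so a statistic that assumes a single random-mating population would misread structure as a strong departure from equilibrium.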
References
- Gazal, S. et al. (2015). High level of inbreeding in final phase of 1000 Genomes Project. Sci. Rep., 5: 17453.
- The 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature, 526: 68-74.