The 1000 Genomes Project (1000GP) set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. In the final phase of the project (Phase 3), the consortium published the reconstruction of the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping (The 1000 Genomes Project Consortium 2015). With 84.7 million single nucleotide polymorphisms (SNPs), the resource is estimated to include >99% of SNP variants with a frequency of >1% for a variety of ancestries.

We have created a pipeline for the population genomics analysis of the 1000GP data that removes inbred individuals (Gazal et al. 2015), removes non-accessible nucleotides (see Help -> Genome accessibility), and adds an outgroup species (chimpanzee, assembly panTro4). A large battery of variation, divergence, and linkage disequilibrium statistics, together with tests of neutrality, is currently available for all 26 human 1000GP populations, computed both in sliding windows along the human genome (see Help -> Tracks description) and for each gene separately (see Help -> Integrative MKT).
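As a rough illustration of what a windowed variation statistic restricted to accessible nucleotides might look like, the sketch below computes nucleotide diversity (pi) per window from alternate-allele counts. It is not the pipeline's actual code: the function names, the use of non-overlapping windows, the restriction to biallelic SNPs, and the representation of accessible positions as a set are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Unbiased heterozygosity at one biallelic site (hypothetical helper)."""
    if n_chrom < 2:
        return 0.0
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def windowed_pi(
    alt_counts: Dict[int, int],   # position -> alternate allele count
    n_chrom: int,                 # number of sampled chromosomes
    accessible: Set[int],         # positions that pass the accessibility mask
    start: int,
    end: int,
    window: int,
) -> List[Tuple[int, int, Optional[float]]]:
    """Per-window pi, averaged over accessible sites only.

    Windows are half-open [w_start, w_end) and non-overlapping (an assumption).
    A window with no accessible site yields None instead of a value.
    """
    results: List[Tuple[int, int, Optional[float]]] = []
    for w_start in range(start, end, window):
        w_end = min(w_start + window, end)
        # Denominator: accessible sites, including monomorphic ones.
        n_acc = sum(1 for pos in range(w_start, w_end) if pos in accessible)
        if n_acc == 0:
            results.append((w_start, w_end, None))
            continue
        # Numerator: summed heterozygosity of accessible variant sites.
        total = sum(
            site_pi(count, n_chrom)
            for pos, count in alt_counts.items()
            if w_start <= pos < w_end and pos in accessible
        )
        results.append((w_start, w_end, total / n_acc))
    return results


if __name__ == "__main__":
    toy_counts = {2: 5, 7: 1, 12: 3}          # made-up data
    toy_mask = set(range(10)) - {7}           # position 7 is masked out
    for row in windowed_pi(toy_counts, 10, toy_mask, 0, 20, 10):
        print(row)
```

Dividing by the number of accessible sites rather than by the window length keeps masked regions from deflating the per-site estimate, which is the usual reason for applying an accessibility mask before computing diversity.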

The 26 analyzed 1000GP populations (The 1000 Genomes Project Consortium 2015) are listed below:

| Population Description | Population Code | Metapopulation (Analysis Panel) | Number of samples |
|---|---|---|---|
| Utah residents (CEPH) with Northern and Western European ancestry | CEU | EUR | 99 |
| British in England and Scotland | GBR | EUR | 91 |
| Finnish in Finland | FIN | EUR | 99 |
| Iberian Populations in Spain | IBS | EUR | 107 |
| Toscani in Italia | TSI | EUR | 107 |
| Esan in Nigeria | ESN | AFR | 99 |
| Gambian in Western Division, Mandinka | GWD | AFR | 113 |
| Luhya in Webuye, Kenya | LWK | AFR | 99 |
| Mende in Sierra Leone | MSL | AFR | 85 |
| Yoruba in Ibadan, Nigeria | YRI | AFR | 108 |
| African Caribbean in Barbados | ACB | AFR | 96 |
| People with African Ancestry in Southwest USA | ASW | AFR | 61 |
| Chinese Dai in Xishuangbanna, China | CDX | EAS | 93 |
| Han Chinese in Beijing, China | CHB | EAS | 103 |
| Southern Han Chinese | CHS | EAS | 105 |
| Japanese in Tokyo, Japan | JPT | EAS | 104 |
| Kinh in Ho Chi Minh City, Vietnam | KHV | EAS | 99 |
| Bengali in Bangladesh | BEB | SAS | 86 |
| Gujarati Indians in Houston, TX, USA | GIH | SAS | 103 |
| Indian Telugu in the UK | ITU | SAS | 102 |
| Punjabi in Lahore, Pakistan | PJL | SAS | 96 |
| Sri Lankan Tamil in the UK | STU | SAS | 102 |
| Colombians in Medellin, Colombia | CLM | AMR | 94 |
| People with Mexican Ancestry in Los Angeles, CA, USA | MXL | AMR | 64 |
| Peruvians in Lima, Peru | PEL | AMR | 85 |
| Puerto Ricans in Puerto Rico | PUR | AMR | 104 |

Note that population genetics statistics are not calculated for metapopulations or across the whole 1000GP dataset. These statistics all assume that the analyzed genomes are a random sample from a non-stratified population, and an aggregate of genomes from multiple diverse populations around the globe satisfies neither assumption.
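The effect of stratification can be illustrated with a small numerical sketch. When two subpopulations that are each in Hardy-Weinberg equilibrium but differ in allele frequency are pooled, the pooled sample shows fewer heterozygotes than its overall allele frequency predicts (the Wahlund effect). The function name, allele frequencies, and sample sizes below are hypothetical and are not taken from the 1000GP data.

```python
from typing import Sequence, Tuple


def pooled_heterozygosity(
    freqs: Sequence[float],   # allele frequency in each subpopulation
    sizes: Sequence[int],     # number of individuals in each subpopulation
) -> Tuple[float, float]:
    """Return (observed, expected) heterozygosity of the pooled sample.

    Observed: size-weighted mean of 2p(1-p), assuming each subpopulation
    is itself in Hardy-Weinberg equilibrium.
    Expected: 2*p_bar*(1-p_bar), i.e. what Hardy-Weinberg predicts if the
    pooled sample were a single non-stratified population.
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    h_obs = sum(2.0 * p * (1.0 - p) * n for p, n in zip(freqs, sizes)) / total
    h_exp = 2.0 * p_bar * (1.0 - p_bar)
    return h_obs, h_exp


if __name__ == "__main__":
    # Two made-up subpopulations with strongly diverged allele frequencies.
    h_obs, h_exp = pooled_heterozygosity([0.1, 0.9], [100, 100])
    deficit = 1.0 - h_obs / h_exp   # apparent inbreeding coefficient
    print(f"observed={h_obs:.2f} expected={h_exp:.2f} deficit={deficit:.2f}")
```

In this toy case the pooled sample appears strongly inbred even though neither subpopulation is, so a statistic that relies on the single-population expectation would be misleading if computed on the aggregate.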

References

- Gazal, S. et al. (2015). High level of inbreeding in final phase of 1000 Genomes Project. Sci. Rep., 5: 17453.
- The 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature, 526: 68-74.