Tracks Description

PopHuman contains more than 1,000 tracks, including both variation statistics for the 26 1000GP populations and reference tracks for the human hg19 reference sequence. Given the large number of available tracks, they can be filtered and selected with the “Select tracks” tool, located in the top left corner (below the navigation bar). This tool narrows your search until you find and select the track of interest; the process can be repeated as many times as needed until all the desired tracks are selected.

Filtering is normally done by refining your search with the menu on the left, where track features are classified into four major categories: Data selection (by metapopulation and/or population), Variation statistics, Visualization (currently only a 10 kb window size is available), and Reference tracks.

Below is a brief description of all the tracks stored in PopHuman, including both Variation statistics and Reference tracks:

Variation statistics

1. Frequency-based nucleotide variation

S. Number of segregating sites per site (Nei 1987).
S2. Number of segregating sites per site, excluding unknown nucleotides in the outgroup (Nei 1987).
Pi. Nucleotide diversity: average number of nucleotide differences per site between any two sequences (Jukes and Cantor 1969; Nei and Li 1979; Nei 1987).
theta. Nucleotide polymorphism: proportion of nucleotide sites that are expected to be polymorphic in any suitable sample (Watterson 1975; Tajima 1993, 1996).
nuc_diversity_within. Nucleotide diversity within the population (Hudson, Slatkin and Maddison 1992; Wakeley 1996).
hap_diversity_within. Haplotype diversity within the population (Hudson, Slatkin and Maddison 1992).
Pneu. Number of 4-fold (putatively neutral) segregating sites.
Psel. Number of 0-fold (putatively selected) segregating sites.
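The per-site estimators above can be illustrated with a minimal Python sketch. It assumes a small, complete alignment of A/C/G/T haplotypes with no missing data and no outgroup, and the function names are hypothetical; it does not reproduce PopHuman's own pipeline over 10 kb windows of 1000GP data, and Pi is shown without any Jukes-Cantor correction.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi per site: mean number of pairwise differences divided by length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / len(pairs) / length


def watterson_theta(seqs):
    """Watterson's theta per site: S / a1 / length, with a1 = sum 1/i, i = 1..n-1."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])


haplotypes = ["AAAA", "AAAT", "ACAT", "ACAT"]
print(segregating_sites(haplotypes) / len(haplotypes[0]))  # S per site
print(nucleotide_diversity(haplotypes))
print(watterson_theta(haplotypes))
```

In this toy alignment two of the four columns are polymorphic, and theta weights S by the sample size through a1, whereas Pi depends on how the variants are shared among sequences.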

2. Divergence-based metrics

Divsites. Number of divergent sites.
D. Proportion of sites with divergent nucleotides.
K. Nucleotide divergence per base pair, corrected by Jukes-Cantor (Jukes and Cantor 1969).
Dneu. Number of 4-fold (putatively neutral) divergent sites.
Dsel. Number of 0-fold (putatively selected) divergent sites.
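D and K can be sketched as follows, assuming a single aligned ingroup sequence (for example, a population consensus) compared with an outgroup sequence, and skipping sites where either carries an N. The function name is hypothetical. K applies the Jukes-Cantor correction K = -3/4 ln(1 - 4D/3), which is only defined for D < 0.75.

```python
import math


def divergence(ingroup, outgroup):
    """Return (Divsites, D, K) for two aligned sequences, skipping 'N' sites."""
    compared = [
        (a, b) for a, b in zip(ingroup, outgroup) if a != "N" and b != "N"
    ]
    divsites = sum(a != b for a, b in compared)
    d = divsites / len(compared)
    if d >= 0.75:
        raise ValueError("Jukes-Cantor correction undefined for D >= 0.75")
    # Jukes-Cantor: corrects D for multiple hits at the same site
    k = -0.75 * math.log(1 - 4 * d / 3)
    return divsites, d, k


print(divergence("ACGTACGTAC", "ACGTACGTAA"))
```

Because the correction accounts for unobserved multiple substitutions, K is always slightly larger than D, and the gap grows as D increases.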

3. Linkage disequilibrium

Wall_B. Wall’s B summary statistic of linkage disequilibrium (Wall 1999), proportion of pairs of adjacent segregating sites that are congruent, with values approaching 1 indicating extensive congruence among adjacent segregating sites.
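Wall_Q. Wall’s Q summary statistic of linkage disequilibrium (Wall 1999), which adds to the number of congruent pairs of adjacent segregating sites the number of distinct partitions defined by those congruent pairs, divided by the number of segregating sites.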
Rozas_ZA. Rozas’s ZA summary statistic (Rozas et al 2001), average r² computed only between adjacent polymorphic sites.
Rozas_ZZ. Rozas’s ZZ summary statistic (Rozas et al 2001), computed as Rozas’s ZA minus Kelly’s ZnS.
Kelly_ZnS. Kelly’s ZnS summary statistic (Kelly 1997), average r² over all pairs of segregating sites.
iHS. Integrated haplotype score (Voight et al 2006), which compares the extended haplotype homozygosity around the derived and ancestral alleles of a SNP, so that alleles at high frequency on unusually long haplotypes (regions of high LD) stand out. Only computed for autosomes.
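The r²-based statistics and Wall's B can be sketched as below. This is a minimal illustration, assuming biallelic segregating sites already coded as 0/1 in each haplotype string, with no missing data; the helper names are hypothetical. Two adjacent sites are treated as congruent when only two of the four possible two-site haplotypes are observed, that is, when both sites split the sample into the same two groups.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites coded as 0/1 lists."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def congruent(site_a, site_b):
    """True if only two two-site haplotypes occur (same bipartition)."""
    return len(set(zip(site_a, site_b))) == 2


def ld_summaries(haplotypes):
    """Kelly's ZnS, Rozas's ZA and ZZ, and Wall's B for 0/1 haplotype strings."""
    n_sites = len(haplotypes[0])
    # Transpose haplotypes into one 0/1 column per segregating site
    sites = [[int(h[i]) for h in haplotypes] for i in range(n_sites)]
    all_pairs = [r_squared(a, b) for a, b in combinations(sites, 2)]
    adjacent = [r_squared(sites[i], sites[i + 1]) for i in range(n_sites - 1)]
    zns = sum(all_pairs) / len(all_pairs)
    za = sum(adjacent) / len(adjacent)
    n_congruent = sum(
        congruent(sites[i], sites[i + 1]) for i in range(n_sites - 1)
    )
    return {
        "Kelly_ZnS": zns,
        "Rozas_ZA": za,
        "Rozas_ZZ": za - zns,
        "Wall_B": n_congruent / (n_sites - 1),
    }


print(ld_summaries(["000", "000", "110", "111"]))
```

ZZ is positive here because the adjacent pairs are, on average, in stronger LD than all pairs, the pattern ZZ is designed to pick up.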

4. Recombination

recomb_Bherer2017_females. Recombination estimates (cM/Mb) from the refined genetic map by Bherer et al. 2017, which collects recombination events from six recent studies of human pedigrees, pertaining to a total of 104,246 informative meioses. Females map.
recomb_Bherer2017_males. Recombination estimates (cM/Mb) from the refined genetic map by Bherer et al. 2017, which collects recombination events from six recent studies of human pedigrees, pertaining to a total of 104,246 informative meioses. Males map.
recomb_Bherer2017_sexavg. Recombination estimates (cM/Mb) from the refined genetic map by Bherer et al. 2017, which collects recombination events from six recent studies of human pedigrees, pertaining to a total of 104,246 informative meioses. Values from the females/males maps are averaged.
recomb_Genethon_females_1Mb. Genethon genetic map based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. Females map.
recomb_Genethon_males_1Mb. Genethon genetic map based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. Males map.
recomb_Genethon_sexavg_1Mb. Genethon genetic map based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. Values from the females/males maps are averaged.
recomb_Marshfield_females_1Mb. Marshfield genetic map based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. Females map.
recomb_Marshfield_males_1Mb. Marshfield genetic map based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. Males map.
recomb_Marshfield_sexavg_1Mb. Marshfield genetic map based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. Values from the females/males maps are averaged.
recomb_deCODE_females_1Mb. deCODE genetic map based on 5,136 microsatellite markers for 146 families with a total of 1,257 meiotic events. Females map.
recomb_deCODE_males_1Mb. deCODE genetic map based on 5,136 microsatellite markers for 146 families with a total of 1,257 meiotic events. Males map.
recomb_deCODE_sexavg_1Mb. deCODE genetic map based on 5,136 microsatellite markers for 146 families with a total of 1,257 meiotic events. Values from the females/males maps are averaged.
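A window recombination rate in cM/Mb can be derived from a genetic map by interpolating the genetic position at both window edges. The sketch below assumes a map given as sorted physical positions (bp) with cumulative genetic positions (cM), and linear interpolation between markers; this is an illustrative assumption, not necessarily how the PopHuman tracks were built, and the function names and toy maps are hypothetical. The sex-averaged value is the mean of the female and male rates, as described for the recomb_..._sexavg tracks.

```python
from bisect import bisect_right


def genetic_position(map_bp, map_cm, pos):
    """Linearly interpolate the genetic position (cM) at physical position pos."""
    if pos <= map_bp[0]:
        return map_cm[0]
    if pos >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos) - 1  # marker at or just left of pos
    fraction = (pos - map_bp[i]) / (map_bp[i + 1] - map_bp[i])
    return map_cm[i] + fraction * (map_cm[i + 1] - map_cm[i])


def window_rate(map_bp, map_cm, start, end):
    """Recombination rate (cM/Mb) between physical positions start and end."""
    cm = genetic_position(map_bp, map_cm, end) - genetic_position(
        map_bp, map_cm, start
    )
    return cm / ((end - start) / 1_000_000)


# Hypothetical toy maps with markers every 1 Mb
markers_bp = [0, 1_000_000, 2_000_000]
female_cm = [0.0, 1.5, 2.5]
male_cm = [0.0, 0.5, 1.5]

female = window_rate(markers_bp, female_cm, 500_000, 510_000)
male = window_rate(markers_bp, male_cm, 500_000, 510_000)
sexavg = (female + male) / 2
print(female, male, sexavg)
```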

5. Selection tests based on SFS and/or variability

Tajima_D. Tajima's D test statistic (Tajima 1989), based on the difference between the number of segregating sites and the average number of nucleotide differences.
FuLi_F. Fu & Li's F test statistic (Fu and Li 1993), which compares the number of derived nucleotide variants observed only once in a sample with the mean pairwise difference between sequences.
FuLi_D. Fu & Li's D test statistic (Fu and Li 1993), which compares the number of derived nucleotide variants observed only once in a sample with the total number of derived nucleotide variants.
FayWu_H. Fay & Wu’s H test statistic (Fay and Wu 2000), which compares the number of derived nucleotide variants at high frequency with the number of variants at intermediate frequency.
Zeng_E. Zeng’s E test statistic (Zeng et al 2006), difference between θL and θW, sensitive to changes in high-frequency variants.
Fst. Fst statistic (Hudson et al. 1992), which measures average levels of gene flow from allele frequencies under the infinite-sites model.
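Tajima's D can be computed from the sample size n, the number of segregating sites S and the average number of pairwise differences k (a count over the whole region, not divided by length), using the normalizing constants from Tajima (1989). A minimal sketch, with a hypothetical function name:

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise
    differences k (whole-region counts, not per site)."""
    if n < 4 or s == 0:
        # The variance term is zero for n <= 3, so D is undefined there
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# e.g. 4 haplotypes, 2 segregating sites, 7/6 average pairwise differences
print(tajimas_d(4, 2, 7 / 6))
```

D is zero when k equals Watterson's estimate S/a1, positive when intermediate-frequency variants are in excess, and negative when rare variants dominate.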

6. Selection tests based on the MKT

NI. Neutrality index (Rand and Kann 1996), which summarizes the four values in a McDonald and Kreitman test (McDonald and Kreitman 1991) table as a ratio of ratios, computed as NI = (Psel/Pneu) / (Dsel/Dneu).
alpha. Proportion of substitutions that are adaptive (Charlesworth 1994; Smith and Eyre-Walker 2002), based on the McDonald and Kreitman test (McDonald and Kreitman 1991), which compares the amount of variation within species with the divergence between species at two types of sites: synonymous and nonsynonymous. The test assumes that all synonymous mutations are neutral and that nonsynonymous mutations are either strongly deleterious, neutral, or strongly advantageous. For this track, four-fold degenerate sites were used as synonymous (neutral) sites and zero-fold degenerate sites as nonsynonymous (putatively selected) sites, and alpha was computed as alpha = 1 - ((Psel/Pneu) / (Dsel/Dneu)).
DoS. Direction of Selection (Stoletzki and Eyre-Walker 2011), difference between the proportion of nonsynonymous divergence and nonsynonymous polymorphism, computed as DoS = (Dsel/(Dsel+Dneu)) - (Psel/(Psel+Pneu)).
Fisher1. Fisher exact test p-value (Fisher 1922) for the McDonald and Kreitman test (McDonald and Kreitman 1991) 2x2 contingency table containing Dsel, Dneu, Psel and Pneu estimates, used to determine the significance of the MK test.
Pneu_less5. Number of 4-fold (putatively neutral) segregating sites with MAF<5% (Mackay et al 2012).
Pneu_more5. Number of 4-fold (putatively neutral) segregating sites with MAF>5% (Mackay et al 2012).
Psel_less5. Number of 0-fold (putatively selected) segregating sites with MAF<5% (Mackay et al 2012).
Psel_more5. Number of 0-fold (putatively selected) segregating sites with MAF>5% (Mackay et al 2012).
Psel_neutral_less5. Estimated number of 0-fold segregating sites with DAF < 5% that are neutral, computed as Psel_neutral_less5 = Psel x (Pneu_less5/Pneu) (Mackay et al 2012).
Psel_neutral. Estimated number of 0-fold segregating sites that are neutral, after removing the excess of sites at MAF<5% due to slightly deleterious mutations, computed as Psel_neutral = Psel_neutral_less5 + Psel_more5 (Mackay et al 2012).
Psel_weak. Estimated number of 0-fold segregating sites carrying weakly deleterious mutations that segregate at MAF<5%, computed as Psel_weak = Psel_less5 - Psel_neutral_less5 (Mackay et al 2012).
alpha_cor. Proportion of substitutions that are adaptive, recomputed after removing slightly deleterious mutations, as alpha_cor = 1 - (Psel_neutral/Pneu)*(Dneu/Dsel) (Charlesworth 1994; Mackay et al 2012).
Fisher2. Fisher exact test p-value (Fisher 1922) for the McDonald and Kreitman test (McDonald and Kreitman 1991) 2x2 contingency table containing Dsel_neutral, Dneu, Psel and Pneu estimates, used to determine the significance of the MK test.
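The MKT-based estimates above can be sketched directly from their stated formulas. This assumes the counts Pneu, Psel, Dneu, Dsel and the MAF-split counts are already available for a window. The function names are hypothetical, and the two-sided Fisher exact test is implemented directly (summing the hypergeometric probabilities of all tables no more probable than the observed one) rather than through a statistics library. Fisher2 is not covered here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included
    p = sum(
        prob(x)
        for x in range(low, high + 1)
        if prob(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p)


def mkt(pneu, psel, dneu, dsel):
    """Standard MKT summaries using 4-fold (neu) and 0-fold (sel) sites."""
    ni = (psel / pneu) / (dsel / dneu)
    return {
        "NI": ni,
        "alpha": 1 - ni,
        "DoS": dsel / (dsel + dneu) - psel / (psel + pneu),
        "Fisher1": fisher_exact_two_sided(dsel, dneu, psel, pneu),
    }


def integrative_mkt(pneu, pneu_less5, psel, psel_less5, psel_more5, dneu, dsel):
    """Corrected estimates following the Psel_neutral_less5..alpha_cor formulas."""
    psel_neutral_less5 = psel * (pneu_less5 / pneu)
    psel_neutral = psel_neutral_less5 + psel_more5
    psel_weak = psel_less5 - psel_neutral_less5
    alpha_cor = 1 - (psel_neutral / pneu) * (dneu / dsel)
    return {
        "Psel_neutral_less5": psel_neutral_less5,
        "Psel_neutral": psel_neutral,
        "Psel_weak": psel_weak,
        "alpha_cor": alpha_cor,
    }


print(mkt(pneu=100, psel=60, dneu=50, dsel=40))
print(integrative_mkt(100, 50, 60, 40, 20, 50, 40))
```

With these illustrative counts, removing the weakly deleterious excess at low frequency raises alpha from 0.25 to 0.375, which is the kind of downward bias in the standard estimate that the corrected track is meant to address.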

For more details on these estimates, see Help -> Integrative MKT.

Reference tracks

These tracks have been obtained from the UCSC Genome Browser.

1. Sequencing and annotation

Gene annotations. Gene annotations in the human hg19 reference genome. This track is a set of gene predictions based on data from RefSeq, GenBank, CCDS, Rfam, and the tRNA Genes track. It includes both protein-coding genes and non-coding RNA genes. Annotations in this track are linked to the NCBI and UCSC databases.
Reference Sequence. Reference sequence of the human hg19 genome.
Alignability of 36mers by GEM from ENCODE/CRG (Guigo). Measures how often the sequence found at a particular location (a 36-mer) aligns within the whole genome, tolerating up to 2 mismatches. Values range from 0 to 1.
Gaps. Gaps in the assembly, represented as black boxes.
Mappability DAC Blacklisted Regions from ENCODE/DAC (Kundaje). Identifies regions of the reference genome that are troublesome for high-throughput sequencing aligners. These problematic regions may be due to repetitive elements or other anomalies.
Uniqueness of 35bp Windows from ENCODE/OpenChrom (Duke). Measures sequence uniqueness throughout the reference genome. Values range from 0 to 1.

2. Regulation

Conserved Transcription Factor Binding Sites (TFBS). This track contains the location and score of TFBSs conserved in the human/mouse/rat alignment.
CpG Islands. This track shows CpG islands that are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. CpG islands in repeats are masked.
ORegAnno. This track displays literature-curated regulatory regions, transcription factor binding sites, and regulatory polymorphisms from ORegAnno (Open Regulatory Annotation).
Vista Enhancers. The VISTA Enhancer Browser identifies distant-acting transcriptional enhancers in the human genome by coupling the identification of evolutionarily conserved non-coding sequences with a moderate-throughput mouse transgenesis enhancer assay.

3. Comparative genomics

100 vertebrates Basewise Conservation by PhyloP. This track shows multiple alignments of 100 vertebrate species and measurements of evolutionary conservation using phyloP from the PHAST package, for all species.
100 vertebrates Conserved Elements (phastConsElements100way). This track shows the conserved elements obtained using PhastCons. The predicted elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM.
100 vertebrates conservation by PhastCons. This track shows multiple alignments of 100 vertebrate species and measurements of evolutionary conservation using PhastCons from the PHAST package, for all species.
Genomic Evolutionary Rate Profiling (GERP). GERP is a method for producing position-specific estimates of evolutionary constraint using maximum likelihood evolutionary rate estimation. It also discovers "constrained elements" where multiple positions combine to give a signal that is indicative of a putative functional element; this track shows the position-specific scores only, not the element predictions.

4. Variation

1000 Genomes Project Phase 3 Paired-end Accessible Regions - Pilot Criteria. This track shows which genome regions are more or less accessible to next generation sequencing methods that use short, paired-end reads. Pilot stringency regions cover 94.5% of non-N bases in the genome.
1000 Genomes Project Phase 3 Paired-end Accessible Regions - Strict Criteria. This track shows which genome regions are more or less accessible to next generation sequencing methods that use short, paired-end reads. Strict regions cover 75.5% of non-N bases in the genome (76.9% on autosomes). Each site meeting the Strict criteria also passes the Pilot criteria.
DGV Struct Var Database of Genomic Variants: Structural Var Regions (CNV, Inversion, In/del). This track displays copy number variants (CNVs), insertions/deletions (InDels), inversions and inversion breakpoints annotated by the Database of Genomic Variants (DGV), which contains genomic variations observed in healthy individuals.
Simple Nucleotide Polymorphisms (dbSNP 147). This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) from dbSNP build 147.

5. Repeats

Repeating Elements: RepeatMasker. This track shows a detailed annotation of the repeats that are present in the query sequence.
Segmental Dups. This track shows regions detected as putative genomic duplications (>1 kb, >90% similar) within the golden path.
Simple Tandem Repeats (STRs). This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF), which is specialized for this purpose.

References

Charlesworth, B. (1994) The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63: 213-227.
Fay, J. C. and Wu, C.-I. (2000) Hitchhiking under positive Darwinian selection. Genetics 155: 1405-1413.
Fisher, R. (1922) On the interpretation of chi-square from contingency tables, and the calculation of P. Journal of the Royal Statistical Society 85: 87-94.
Fu, Y. X. and Li, W. H. (1993) Statistical tests of neutrality of mutations. Genetics 133: 693-709.
Hill, W. G. and Robertson, A. (1968) Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226-231.
Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992) Estimation of levels of gene flow from DNA sequence data. Genetics 132: 583-589.
Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules, pp. 21-132 in Mammalian Protein Metabolism, edited by H. N. Munro. Academic Press, New York.
Kelly, J. K. (1997) A test of neutrality based on interlocus associations. Genetics 146: 1197-1206.
Kong, A., et al. (2010) Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467: 1099-1103.
Mackay, T. F., et al. (2012) The Drosophila melanogaster Genetic Reference Panel. Nature 482: 173-178.
McDonald, J. H. and Kreitman, M. (1991) Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.
Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New York.
Nei, M. and Li, W. H. (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76: 5269-5273.
Rand, D. M. and Kann, L. M. (1996) Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans. Mol. Biol. Evol. 13: 735-748.
Rozas, J., Gullaud, M., Blandin, G. and Aguadé, M. (2001) DNA variation at the rp49 gene region of Drosophila simulans: evolutionary inferences from an unusual haplotype structure. Genetics 158: 1147-1155.
Smith, N. G. and Eyre-Walker, A. (2002) Adaptive protein evolution in Drosophila. Nature 415: 1022-1024.
Stoletzki, N. and Eyre-Walker, A. (2011) Estimation of the Neutrality Index. Mol. Biol. Evol. 28: 63-70.
Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585-595.
Tajima, F. (1993) Measurement of DNA polymorphism. In: Takahata, N. and Clark, A. G. (Eds.) Mechanisms of Molecular Evolution: Introduction to Molecular Paleopopulation Biology. Sinauer Associates, Sunderland, MA.
Tajima, F. (1996) The amount of DNA polymorphism maintained in a finite population when the neutral mutation rate varies among sites. Genetics 143: 1457-1465.
Voight, B. F., Kudaravalli, S., Wen, X. and Pritchard, J. K. (2006) A map of recent positive selection in the human genome. PLoS Biol. 4: e72.
Wakeley, J. (1996) The variance of pairwise nucleotide differences in two populations with migration. Theor. Popul. Biol. 49: 39-57.
Wall, J. (1999) Recombination and the power of statistical tests of neutrality. Genet. Res. 74: 65-79.
Watterson, G. A. (1975) On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256-276.
Zeng, K., Fu, Y.-X., Shi, S. and Wu, C.-I. (2006) Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174: 1431-1439.