Jun Z Li

Stanford University, Stanford, CA, USA

Publications (17) · 227.16 Total impact

  • Article: Genomic patterns of homozygosity in worldwide human populations.
    ABSTRACT: Genome-wide patterns of homozygosity runs and their variation across individuals provide a valuable and often untapped resource for studying human genetic diversity and evolutionary history. Using genotype data at 577,489 autosomal SNPs, we employed a likelihood-based approach to identify runs of homozygosity (ROH) in 1,839 individuals representing 64 worldwide populations, classifying them by length into three classes (short, intermediate, and long) with a model-based clustering algorithm. For each class, the number and total length of ROH per individual show considerable variation across individuals and populations. The total lengths of short and intermediate ROH per individual increase with the distance of a population from East Africa, in agreement with similar patterns previously observed for locus-wise homozygosity and linkage disequilibrium. By contrast, total lengths of long ROH show large interindividual variations that probably reflect recent inbreeding patterns, with higher values occurring more often in populations with known high frequencies of consanguineous unions. Across the genome, distributions of ROH are not uniform, and they have distinctive continental patterns. ROH frequencies across the genome are correlated with local genomic variables such as recombination rate, as well as with signals of recent positive selection. In addition, long ROH are more frequent in genomic regions harboring genes associated with autosomal-dominant diseases than in regions not implicated in Mendelian diseases. These results provide insight into the way in which homozygosity patterns are produced, and they generate baseline homozygosity patterns that can be used to aid homozygosity mapping of genes associated with recessive diseases.
    The American Journal of Human Genetics 08/2012; 91(2):275-92. · 10.60 Impact Factor
  • Article: Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations.
    ABSTRACT: Pigmentation of the skin, hair, and eyes varies both within and between human populations. Identifying the genes and alleles underlying this variation has been the goal of many candidate gene and several genome-wide association studies (GWAS). Most GWAS for pigmentary traits to date have been based on subjective phenotypes using categorical scales. But skin, hair, and eye pigmentation vary continuously. Here, we seek to characterize quantitative variation in these traits objectively and accurately and to determine their genetic basis. Objective and quantitative measures of skin, hair, and eye color were made using reflectance or digital spectroscopy in Europeans from Ireland, Poland, Italy, and Portugal. A GWAS was conducted for the three quantitative pigmentation phenotypes in 176 women across 313,763 SNP loci, and replication of the most significant associations was attempted in a sample of 294 European men and women from the same countries. We find that the pigmentation phenotypes are highly stratified along axes of European genetic differentiation. The country of sampling explains approximately 35% of the variation in skin pigmentation, 31% of the variation in hair pigmentation, and 40% of the variation in eye pigmentation. All three quantitative phenotypes are correlated with each other. In our two-stage association study, we reproduce the association of rs1667394 at the OCA2/HERC2 locus with eye color but we do not identify new genetic determinants of skin and hair pigmentation supporting the lack of major genes affecting skin and hair color variation within Europe and suggesting that not only careful phenotyping but also larger cohorts are required to understand the genetic architecture of these complex quantitative traits. Interestingly, we also see that in each of these four populations, men are more lightly pigmented in the unexposed skin of the inner arm than women, a fact that is underappreciated and may vary across the world.
    PLoS ONE 01/2012; 7(10):e48294. · 4.09 Impact Factor
  • Article: Inference of unexpected genetic relatedness among individuals in HapMap Phase III.
    ABSTRACT: The International Haplotype Map Project (HapMap) has provided an essential database for studies of human population genetics and genome-wide association. Phases I and II of the HapMap project generated genotype data across ∼3 million SNP loci in 270 individuals representing four populations. Phase III provides dense genotype data on ∼1.5 million SNPs, generated by Illumina and Affymetrix platforms in a larger set of individuals. Release 3 of phase III of the HapMap contains 1397 individuals from 11 populations, including 250 of the original 270 phase I and phase II individuals and 1147 additional individuals. Although some known relationships among the phase III individuals have been described in the data release, the genotype data that are currently available provide an opportunity to empirically ascertain previously unknown relationships. We performed a systematic analysis of genetic relatedness and were able not only to confirm the reported relationships, but also to detect numerous additional, previously unidentified pairs of close relatives in the HapMap sample. The inferred relative pairs make it possible to propose standardized subsets of unrelated individuals for use in future studies in which relatedness needs to be clearly defined.
    The American Journal of Human Genetics 10/2010; 87(4):457-64. · 10.60 Impact Factor
  • Article: Detecting simultaneous changepoints in multiple sequences.
    ABSTRACT: We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
    Biometrika 09/2010; 97(3):631-645. · 1.91 Impact Factor
  • Article: Characterization of X-linked SNP genotypic variation in globally distributed human populations.
    ABSTRACT: The transmission pattern of the human X chromosome reduces its population size relative to the autosomes, subjects it to disproportionate influence by female demography, and leaves X-linked mutations exposed to selection in males. As a result, the analysis of X-linked genomic variation can provide insights into the influence of demography and selection on the human genome. Here we characterize the genomic variation represented by 16,297 X-linked SNPs genotyped in the CEPH human genome diversity project samples. We found that X chromosomes tend to be more differentiated between human populations than autosomes, with several notable exceptions. Comparisons between genetically distant populations also showed an excess of X-linked SNPs with large allele frequency differences. Combining information about these SNPs with results from tests designed to detect selective sweeps, we identified two regions that were clear outliers from the rest of the X chromosome for haplotype structure and allele frequency distribution. We were also able to more precisely define the geographical extent of some previously described X-linked selective sweeps. The relationship between male and female demographic histories is likely to be complex as evidence supporting different conclusions can be found in the same dataset. Although demography may have contributed to the excess of SNPs with large allele frequency differences observed on the X chromosome, we believe that selection is at least partially responsible. Finally, our results reveal the geographical complexities of selective sweeps on the X chromosome and argue for the use of diverse populations in studies of selection.
    Genome biology 01/2010; 11(1):R10. · 6.63 Impact Factor
  • Article: Joint estimation of DNA copy number from multiple platforms.
    ABSTRACT: DNA copy number variants (CNVs) are gains and losses of segments of chromosomes, and comprise an important class of genetic variation. Recently, various microarray hybridization-based techniques have been developed for high-throughput measurement of DNA copy number. In many studies, multiple technical platforms or different versions of the same platform were used to interrogate the same samples; and it became necessary to pool information across these multiple sources to derive a consensus molecular profile for each sample. An integrated analysis is expected to maximize resolution and accuracy, yet currently there is no well-formulated statistical method to address the between-platform differences in probe coverage, assay methods, sensitivity and analytical complexity. The conventional approach is to apply one of the CNV detection ('segmentation') algorithms to search for DNA segments of altered signal intensity. The results from multiple platforms are combined after segmentation. Here we propose a new method, Multi-Platform Circular Binary Segmentation (MPCBS), which pools statistical evidence across platforms during segmentation, and does not require pre-standardization of different data sources. It involves a weighted sum of t-statistics, which arises naturally from the generalized log-likelihood ratio of a multi-platform model. We show by comparing the integrated analysis of Affymetrix and Illumina SNP array data with Agilent and fosmid clone end-sequencing results on eight HapMap samples that MPCBS achieves improved spatial resolution, detection power and provides a natural consensus across platforms. We also apply the new method to analyze multi-platform data for tumor samples. The R package for MPCBS is registered on R-Forge (http://r-forge.r-project.org/) under project name MPCBS. Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2009; 26(2):153-60. · 5.47 Impact Factor
  • Article: Genome-wide association and meta-analysis of bipolar disorder in individuals of European ancestry.
    ABSTRACT: Bipolar disorder (BP) is a disabling and often life-threatening disorder that affects approximately 1% of the population worldwide. To identify genetic variants that increase the risk of BP, we genotyped on the Illumina HumanHap550 Beadchip 2,076 bipolar cases and 1,676 controls of European ancestry from the National Institute of Mental Health Human Genetics Initiative Repository, and the Prechter Repository and samples collected in London, Toronto, and Dundee. We imputed SNP genotypes and tested for SNP-BP association in each sample and then performed meta-analysis across samples. The strongest association P value for this 2-study meta-analysis was 2.4 × 10⁻⁶. We next imputed SNP genotypes and tested for SNP-BP association based on the publicly available Affymetrix 500K genotype data from the Wellcome Trust Case Control Consortium for 1,868 BP cases and a reference set of 12,831 individuals. A 3-study meta-analysis of 3,683 nonoverlapping cases and 14,507 extended controls on >2.3 M genotyped and imputed SNPs resulted in 3 chromosomal regions with association P ≈ 10⁻⁷: 1p31.1 (no known genes), 3p21 (>25 known genes), and 5q15 (MCTP1). The most strongly associated nonsynonymous SNP rs1042779 (OR = 1.19, P = 1.8 × 10⁻⁷) is in the ITIH1 gene on chromosome 3, with other strongly associated nonsynonymous SNPs in GNL3, NEK4, and ITIH3. Thus, these chromosomal regions harbor genes implicated in cell cycle, neurogenesis, neuroplasticity, and neurosignaling. In addition, we replicated the reported ANK3 association results for SNP rs10994336 in the nonoverlapping GSK sample (OR = 1.37, P = 0.042). Although these results are promising, analysis of additional samples will be required to confirm that variant(s) in these regions influence BP risk.
    Proceedings of the National Academy of Sciences 05/2009; 106(18):7501-6. · 9.68 Impact Factor
  • Article: Signals of recent positive selection in a worldwide sample of human populations.
    ABSTRACT: Genome-wide scans for recent positive selection in humans have yielded insight into the mechanisms underlying the extensive phenotypic diversity in our species, but have focused on a limited number of populations. Here, we present an analysis of recent selection in a global sample of 53 populations, using genotype data from the Human Genome Diversity-CEPH Panel. We refine the geographic distributions of known selective sweeps, and find extensive overlap between these distributions for populations in the same continental region but limited overlap between populations outside these groupings. We present several examples of previously unrecognized candidate targets of selection, including signals at a number of genes in the NRG-ERBB4 developmental pathway in non-African populations. Analysis of recently identified genes involved in complex diseases suggests that there has been selection on loci involved in susceptibility to type II diabetes. Finally, we search for local adaptation between geographically close populations, and highlight several examples.
    Genome Research 04/2009; 19(5):826-37. · 13.61 Impact Factor
  • Article: Comprehensive genomic characterization defines human glioblastoma genes and core pathways.
    ABSTRACT: Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular characteristics in human cancer and to provide the data rapidly to the research community. Here we report the interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas—the most common type of adult brain cancer—and nucleotide sequence aberrations in 91 of the 206 glioblastomas. This analysis provides new insights into the roles of ERBB2, NF1 and TP53, uncovers frequent mutations of the phosphatidylinositol-3-OH kinase regulatory subunit gene PIK3R1, and provides a network view of the pathways altered in the development of glioblastoma. Furthermore, integration of mutation, DNA methylation and clinical treatment data reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, an observation with potential clinical implications. Together, these findings establish the feasibility and power of TCGA, demonstrating that it can rapidly expand knowledge of the molecular basis of cancer.
    Nature 09/2008; 455(7216):1061-1068. · 36.28 Impact Factor
  • Article: Ribosomal mutations cause p53-mediated dark skin and pleiotropic effects.
    ABSTRACT: Mutations in genes encoding ribosomal proteins cause the Minute phenotype in Drosophila and mice, and Diamond-Blackfan syndrome in humans. Here we report two mouse dark skin (Dsk) loci caused by mutations in Rps19 (ribosomal protein S19) and Rps20 (ribosomal protein S20). We identify a common pathophysiologic program in which p53 stabilization stimulates Kit ligand expression, and, consequently, epidermal melanocytosis via a paracrine mechanism. Accumulation of p53 also causes reduced body size and erythrocyte count. These results provide a mechanistic explanation for the diverse collection of phenotypes that accompany reduced dosage of genes encoding ribosomal proteins, and have implications for understanding normal human variation and human disease.
    Nature Genetics 08/2008; 40(8):963-70. · 35.53 Impact Factor
  • Article: Worldwide human relationships inferred from genome-wide patterns of variation.
    ABSTRACT: Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.
    Science 03/2008; 319(5866):1100-4. · 31.20 Impact Factor
  • Article: Wild-type huntingtin participates in protein trafficking between the Golgi and the extracellular space.
    ABSTRACT: Huntington disease (HD) is an autosomal dominant neurodegenerative disease caused by an expanded CAG trinucleotide repeat in the first exon of the HD gene, which results in a toxic polyglutamine stretch within huntingtin, the protein it encodes. Understanding the normal function of this essential protein is vital to understanding the root of the disease, yet despite more than a decade of investigation, its role in the cell remains elusive. Identifying the subcellular localization of huntingtin and understanding its effects on global gene expression are critical to this endeavor. While most reports agree that huntingtin is predominantly a cytoplasmic protein, conflicting distribution patterns have been demonstrated at the subcellular level. Here, we examine wild-type huntingtin's localization in cultured cells by expressing the full-length human protein tagged with enhanced green fluorescent protein (EGFP) within its unspliced genomic context. In fibrosarcoma and neuroblastoma cells, huntingtin shows discrete punctate, perinuclear localization overlapping largely with the trans-Golgi and cytoplasmic clathrin-coated vesicles, implicating huntingtin in vesicle trafficking. To determine whether huntingtin is involved in trafficking a specific subset of proteins, we measured changes in global transcription levels in embryonic stem cells and neurons lacking huntingtin. Huntingtin null neurons exhibit a significant reduction in transcripts encoding proteins destined for the extracellular space, many of which are components of the extracellular matrix or involved in cellular adhesion, receptor binding and hormone activity. Together, these findings support a role for huntingtin in the intracellular trafficking of proteins required for the construction of the extracellular matrix.
    Human Molecular Genetics 03/2007; 16(4):391-409. · 7.64 Impact Factor
  • Article: Sample matching by inferred agonal stress in gene expression analyses of the brain.
    ABSTRACT: Gene expression patterns in the brain are strongly influenced by the severity and duration of physiological stress at the time of death. This agonal effect, if not well controlled, can lead to spurious findings and diminished statistical power in case-control comparisons. While some recent studies match samples by tissue pH and clinically recorded agonal conditions, we found that these indicators were sometimes at odds with observed stress-related gene expression patterns, and that matching by these criteria still sometimes results in identifying case-control differences that are primarily driven by residual agonal effects. This problem is analogous to the one encountered in genetic association studies, where self-reported race and ethnicity are often imprecise proxies for an individual's actual genetic ancestry. We developed an Agonal Stress Rating (ASR) system that evaluates each sample's degree of stress based on gene expression data, and used ASRs in post hoc sample matching or covariate analysis. While gene expression patterns are generally correlated across different brain regions, we found strong region-region differences in empirical ASRs in many subjects that likely reflect inter-individual variabilities in local structure or function, resulting in region-specific vulnerability to agonal stress. Variation of agonal stress across different brain regions differs between individuals, revealing a new level of complexity for gene expression studies of brain tissues. The Agonal Stress Ratings quantitatively assess each sample's extent of regulatory response to agonal stress, and allow a strong control of this important confounder.
    BMC Genomics 02/2007; 8:336. · 4.07 Impact Factor
  • Article: Application of microarray technology in primate behavioral neuroscience research.
    ABSTRACT: Gene expression profiling of brain tissue samples applied to DNA microarrays promises to provide novel insights into the neurobiological bases of primate behavior. The strength of the microarray technology lies in the ability to simultaneously measure the expression levels of all genes in defined brain regions that are known to mediate behavior. The application of microarrays presents, however, various limitations and challenges for primate neuroscience research. Low RNA abundance, modest changes in gene expression, heterogeneous distribution of mRNA among cell subpopulations, and individual differences in behavior all mandate great care in the collection, processing, and analysis of brain tissue. A unique problem for nonhuman primate research is the limited availability of species-specific arrays. Arrays designed for humans are often used, but expression level differences are inevitably confounded by gene sequence differences in all cross-species array applications. Tools to deal with this problem are currently being developed. Here we review these methodological issues, and provide examples from our experiences using human arrays to examine brain tissue samples from squirrel monkeys. Until species-specific microarrays become more widely available, great caution must be taken in the assessment and interpretation of microarray data from nonhuman primates. Nevertheless, the application of human microarrays in nonhuman primate neuroscience research recovers useful information from thousands of genes, and represents an important new strategy for understanding the molecular complexity of behavior and mental health.
    Methods 03/2006; 38(3):227-34. · 4.01 Impact Factor
  • Article: Systematic changes in gene expression in postmortem human brains associated with tissue pH and terminal medical conditions.
    ABSTRACT: Studies of gene expression abnormalities in psychiatric or neurological disorders often involve the use of postmortem brain tissue. Compared with single-cell organisms or clonal cell lines, the biological environment and medical history of human subjects cannot be controlled, and are often difficult to document fully. The chance of finding significant and replicable changes depends on the nature and magnitude of the observed variations among the studied subjects. During an analysis of gene expression changes in mood disorders, we observed a remarkable degree of natural variation among 120 samples, which represented three brain regions in 40 subjects. Most of such diversity can be accounted for by two distinct expression patterns, which in turn are strongly correlated with tissue pH. Individuals who suffered prolonged agonal states, such as with respiratory arrest, multi-organ failure or coma, tended to have lower pH in the brain; whereas those who experienced brief deaths, associated with accidents, cardiac events or asphyxia, generally had normal pH. The lower pH samples exhibited a systematic decrease in expression of genes involved in energy metabolism and proteolytic activities, and a consistent increase of genes encoding stress-response proteins and transcription factors. This functional specificity of changed genes suggests that the difference is not merely due to random RNA degradation in low pH samples; rather it reflects a broad and actively coordinated biological response in living cells. These findings shed light on critical molecular mechanisms that are engaged during different forms of terminal stress, and may suggest clinical targets of protection or restoration.
    Human Molecular Genetics 04/2004; 13(6):609-16. · 7.64 Impact Factor