Xu Yang

Beijing Genomics Institute, Bao'an, Guangdong, China

Publications (13) · 145.81 Total impact

  • Genome Biology and Evolution 11/2014 · 4.53 Impact Factor
  • ABSTRACT: Mongolians have played a significant role in modern human evolution, especially after the rise of Genghis Khan (1162?-1227). Although the social cultural impacts of Genghis Khan and the Mongolian population have been well documented, explorations of their genome structure and genetic imprints on other human populations have been lacking. We here present the genome of a Mongolian male individual. The genome was de novo assembled using a total of 130.8-fold genomic data produced from massively parallel whole genome sequencing. We identified high-confidence variation sets, including 3.7 million single nucleotide polymorphisms (SNPs) and 756,234 short insertions and deletions (Indels). Functional SNP analysis predicted the individual has a pathogenic risk for carnitine deficiency. We located the patrilineal inheritance of the Mongolian genome to the lineage D3a through Y haplogroup analysis and inferred that the individual has a common patrilineal ancestor with Tibeto-Burman populations and is likely to be the progeny of the earliest settlers in East Asia. We finally investigated the genetic imprints of Mongolians on other human populations using different approaches. We found varying degrees of gene flows between Mongolians and populations living in Europe, South/Central Asia and the Indian subcontinent. The analyses demonstrate that the genetic impacts of Mongolians likely resulted from the expansion of the Mongolian Empire in the 13th century. The genome will be of great help in further explorations of modern human evolution and genetic causes of diseases/traits specific to Mongolians.
    Genome Biology and Evolution 11/2014; 6(12). DOI:10.1093/gbe/evu242 · 4.53 Impact Factor
  • ABSTRACT: Parkinson's disease (PD) is a common neurodegenerative disorder of complex aetiology. Rare, highly penetrant PD-causing mutations and common risk factors of small effect size have been identified in several genes/loci. However, these mutations and risk factors only explain a fraction of the disease burden, suggesting that additional, substantial genetic determinants remain to be found. Genetically isolated populations offer advantages for dissecting the genetic architecture of complex disorders, such as PD. We performed exome sequencing in 100 unrelated PD patients from Sardinia, a genetic isolate. SNPs absent from dbSNP129 and 1000 Genomes, shared by at least five patients, and of functional effects were genotyped in an independent Sardinian case-control sample (n = 500). Variants associated with PD with nominal p value <0.05 and those with odds ratio (OR) ≥3 were validated by Sanger sequencing and typed in a replication sample of 2965 patients and 2678 controls from Italy, Spain, and Portugal. We identified novel moderately rare variants in several genes, including SCAPER, HYDIN, UBE2H, EZR, MMRN2 and OGFOD1 that were specifically present in PD patients or enriched among them, nominating these as novel candidate risk genes for PD, although no variants achieved genome-wide significance after Bonferroni correction. Our results suggest that the genetic bases of PD are highly heterogeneous, with implications for the design of future large-scale exome or whole-genome analyses of this disease.
    Neurogenetics 10/2014; 16(1). DOI:10.1007/s10048-014-0425-x · 2.66 Impact Factor
  • ABSTRACT: Single-cell sequencing is a powerful tool for delineating clonal relationships and identifying key driver genes for personalized cancer management. Here we performed single-cell sequencing analysis of a case of colon cancer. Population genetics analyses identified two independent clones in the tumor cell population. The major tumor clone harbored APC and TP53 mutations as early oncogenic events, whereas the minor clone contained preponderant CDC27 and PABPC1 mutations. The absence of APC and TP53 mutations in the minor clone supports the view that these two clones were derived from two cellular origins. Examination of somatic mutation allele frequency spectra of an additional 21 whole-tissue exome-sequenced cases revealed the heterogeneity of clonal origins in colon cancer. Next, we identified a mutated gene, SLC12A5, that showed a high frequency of mutation at the single-cell level but exhibited low prevalence at the population level. Functional characterization of mutant SLC12A5 revealed its potential oncogenic effect in colon cancer. Our study provides the first exome-wide evidence at the single-cell level supporting the view that colon cancer could be of a biclonal origin, and suggests that low-prevalence mutations in a cohort may also play important protumorigenic roles at the individual level.
    Cell Research 04/2014; 24(6). DOI:10.1038/cr.2014.43 · 11.98 Impact Factor
  • ABSTRACT: Adrenal Cushing’s syndrome is caused by excess production of glucocorticoid from adrenocortical tumors and hyperplasias, which leads to metabolic disorders. We performed whole-exome sequencing of 49 blood-tumor pairs and RNA sequencing of 44 tumors from cortisol-producing adrenocortical adenomas (ACAs), adrenocorticotropic hormone–independent macronodular adrenocortical hyperplasias (AIMAHs), and adrenocortical oncocytomas (ADOs). We identified a hotspot in the PRKACA gene with a L205R mutation in 69.2% (27 out of 39) of ACAs and validated in 65.5% of a total of 87 ACAs. Our data revealed that the activating L205R mutation, which is located in the P+1 loop of the protein kinase A (PKA) catalytic subunit, promoted PKA substrate phosphorylation and target gene expression. Moreover, we discovered the recurrently mutated gene DOT1L in AIMAHs and CLASP2 in ADOs. Collectively, these data highlight potentially functional mutated genes in adrenal Cushing’s syndrome.
    Science 04/2014; 344(6186). DOI:10.1126/science.1249480 · 31.48 Impact Factor
  • ABSTRACT: To explore the contribution of functional coding variants to psoriasis, we analyzed nonsynonymous single-nucleotide variants (SNVs) across the genome by exome sequencing in 781 psoriasis cases and 676 controls and through follow-up validation in 1,326 candidate genes by targeted sequencing in 9,946 psoriasis cases and 9,906 controls from the Chinese population. We discovered two independent missense SNVs in IL23R and GJB2 of low frequency and five common missense SNVs in LCE3D, ERAP1, CARD14 and ZNF816A associated with psoriasis at genome-wide significance. Rare missense SNVs in FUT2 and TARBP1 were also observed with suggestive evidence of association. Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing. This suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.
    Nature Genetics 11/2013; 46(1). DOI:10.1038/ng.2827 · 29.65 Impact Factor
  • ABSTRACT: Background: Marie Unna hereditary hypotrichosis (MUHH) is an autosomal dominant disorder characterised by coarse, wiry, twisted hair developed in early childhood and subsequent progressive hair loss. MUHH is a genetically heterogeneous disorder. No gene in the 1p21.1–1q21.3 region responsible for MUHH has been identified. Methods: Exome sequencing was performed on two affected subjects, who had normal vertex hair and modest alopecia, and one unaffected individual from a four-generation MUHH family for which our previous linkage study mapped the MUHH locus to chromosome 1p21.1–1q21.3. Results: We identified a missense mutation in EPS8L3 (NM_024526.3: exon2: c.22G->A:p.Ala8Thr) within 1p21.1–1q21.3. Sanger sequencing confirmed the cosegregation of this mutation with the disease phenotype in the family by demonstrating the presence of the heterozygous mutation in all eight affected individuals and its absence in all seven unaffected individuals. This mutation was found to be absent in 676 unrelated healthy controls and 781 patients with other diseases from another unpublished project of our group. Conclusions: Taken together, our results suggest that EPS8L3 is a causative gene for MUHH, which advances our understanding of the pathogenesis of MUHH. Our study also further demonstrates the effectiveness of combining exome sequencing with linkage information for identifying Mendelian disease genes.
    Journal of Medical Genetics 10/2012; 49(12). DOI:10.1136/jmedgenet-2012-101134 · 5.64 Impact Factor
  • ABSTRACT: Punctate palmoplantar keratoderma (PPPK) is a rare autosomal dominant skin disorder characterised by numerous hyperkeratotic papules irregularly distributed on the palms and soles. To date, no causal gene for this disease has been identified. We performed exome sequencing analysis of four affected individuals and two unaffected controls from one Chinese PPPK family in which the disease locus was mapped to 8q24.13-8q24.21 by our previous linkage analysis. We identified a novel heterozygous mutation in the COL14A1 gene (c.4505C→T (p.Pro1502Leu)), which is located within the linkage region that we previously identified for PPPK. The mutation was shared by the four affected individuals, but not by the two controls of the family. Sanger sequencing confirmed this mutation in another four cases from this family. This mutation was absent in the normal controls of this family as well as in an additional 676 unrelated normal controls and 781 patients with other diseases. The shared COL14A1 mutation, p.Pro1502Leu, is a missense substitution at an amino acid residue that is highly conserved across multiple species. The power of combining exome sequencing and linkage information in the study of the genetics of autosomal dominant disorders, even in simplex cases, has been demonstrated. Our results suggest that COL14A1 could be a causal gene for PPPK, which advances our understanding of the pathogenesis of PPPK.
    Journal of Medical Genetics 09/2012; 49(9):563-8. DOI:10.1136/jmedgenet-2012-100868 · 5.64 Impact Factor
  • ABSTRACT: In recent years, hundreds of gene loci associated with multiple cardiovascular pathologies and traits have been identified through high-throughput next-generation sequencing (NGS) technology. Due to the increasing efficiency and decreasing cost of NGS, rapid progress is anticipated in the field of cardiovascular disease (CVD) research. This review summarizes the main strategies of cardiovascular research with NGS at the level of genomics, transcriptomics, epigenetics, and proteomics.
    06/2012; 2(2):138-146. DOI:10.3978/j.issn.2223-3652.2012.06.01
  • ABSTRACT: Tumor heterogeneity presents a challenge for inferring clonal evolution and driver gene identification. Here, we describe a method for analyzing the cancer genome at a single-cell nucleotide level. To perform our analyses, we first devised and validated a high-throughput whole-genome single-cell sequencing method using two lymphoblastoid cell line single cells. We then carried out whole-exome single-cell sequencing of 90 cells from a JAK2-negative myeloproliferative neoplasm patient. The sequencing data from 58 cells passed our quality control criteria, and these data indicated that this neoplasm represented a monoclonal evolution. We further identified essential thrombocythemia (ET)-related candidate mutations such as SESN2 and NTRK1, which may be involved in neoplasm progression. This pilot study allowed the initial characterization of the disease-related genetic architecture at the single-cell nucleotide level. Further, we established a single-cell sequencing method that opens the way for detailed analyses of a variety of tumor types, including those with high genetic complexity between patients.
    Cell 03/2012; 148(5):873-85. DOI:10.1016/j.cell.2012.02.028 · 33.12 Impact Factor
  • ABSTRACT: In recent years, new-generation high-throughput technologies, including next-generation sequencing and mass spectrometry, have been widely applied to biological problems, especially in the field of human disease. This data-driven, large-scale, industrialized research model enables comprehensive, multi-level study of human diseases at the genomic, transcriptomic, and proteomic levels, among others. In this paper, the latest developments in high-throughput technologies applied to DNA, RNA, epigenomics, metagenomics, and proteomics, along with some applications in translational medicine, are reviewed. At the genomics level, exome sequencing has been a hot spot of recent research. However, the advantage of whole-genome resequencing in detecting large structural variants across the whole genome is becoming more prominent as sequencing costs drop, which also makes personalized genome-based medicine possible. At the transcriptomics level, small RNA sequencing, for example, can be used to detect known miRNAs and predict unknown ones. These small RNAs could not only serve as biomarkers for disease diagnosis and prognosis, but also show potential for disease treatment. At the proteomics level, targeted proteomics, for example, can be used to detect possible disease-related proteins or peptides, which can be useful indices for clinical staging and typing. Furthermore, the application and development of trans-omics studies in disease research are briefly introduced. By applying bioinformatics technologies to integrate multi-omics data, the mechanism, diagnosis, and therapy of disease are likely to be systematically explained and realized, providing powerful tools for disease diagnosis and therapy.
    Hereditas (Beijing) 08/2011; 33(8):829-46.
  • Journal of Investigative Dermatology 03/2011; 131(7):1570-2. DOI:10.1038/jid.2011.62 · 6.37 Impact Factor
  • ABSTRACT: Autosomal-dominant spinocerebellar ataxias constitute a large, heterogeneous group of progressive neurodegenerative diseases with multiple types. To date, classical genetic studies have revealed 31 distinct genetic forms of spinocerebellar ataxias and identified 19 causative genes. Traditional positional cloning strategies, however, have limitations for finding causative genes of rare Mendelian disorders. Here, we used a combined strategy of exome sequencing and linkage analysis to identify a novel spinocerebellar ataxia causative gene, TGM6. We sequenced the whole exome of four patients in a Chinese four-generation spinocerebellar ataxia family and identified a missense mutation, c.1550T-G transversion (L517W), in exon 10 of TGM6. This change is at a highly conserved position, is predicted to have a functional impact, and completely cosegregated with the phenotype. The exome results were validated using linkage analysis. The mutation we identified using exome sequencing was located in the same region (20p13-12.2) as that identified by linkage analysis, which cross-validated TGM6 as the causative spinocerebellar ataxia gene in this family. We also showed that the causative gene could be mapped by a combined method of linkage analysis and sequencing of one sample from the family. We further confirmed our finding by identifying another missense mutation, c.980A-G transition (D327G), in exon 7 of TGM6 in an additional spinocerebellar ataxia family, which also cosegregated with the phenotype. Both mutations were absent in 500 normal unaffected individuals of matched geographical ancestry. The finding of TGM6 as a novel causative gene of spinocerebellar ataxia illustrates whole-exome sequencing of affected individuals from one family as an effective and cost-efficient method for mapping genes of rare Mendelian disorders and the use of linkage analysis and exome sequencing for further improving efficiency.
    Brain 12/2010; 133(Pt 12):3510-8. DOI:10.1093/brain/awq323 · 10.23 Impact Factor