ABSTRACT: Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase prediction accuracy. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in distinguishing pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that around 5% of an individual's rare nsSNVs are pathogenic and that an individual carries at least ∼22 pathogenic derived alleles, which if made homozygous by consanguineous marriages may lead to recessive diseases.
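As a rough illustration of the idea in the abstract above, a logit model turns a weighted sum of per-method prediction scores into a single probability. The sketch below is a minimal example only: the function name, the weights, and the intercept are hypothetical placeholders, not the coefficients fitted in the paper, and the input scores are assumed to be already scaled to a common range.

```python
import math


def combined_pathogenic_probability(scores, weights, intercept):
    """Combine per-method scores into one probability with a logistic function.

    scores    -- one score per prediction method (assumed pre-scaled)
    weights   -- one coefficient per method (hypothetical, not from the paper)
    intercept -- model intercept (hypothetical)
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    linear = intercept + sum(w * s for w, s in zip(weights, scores))
    # Numerically stable logistic function for both signs of `linear`.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Three made-up scores combined with made-up coefficients.
example = combined_pathogenic_probability([0.9, 0.8, 0.7], [1.5, 1.2, 0.8], -2.0)
print(round(example, 3))
```

In practice the coefficients would be estimated by fitting a logistic regression to variants with known pathogenic or neutral status.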
ABSTRACT: The extended Simes' test (known as GATES) and scaled chi-square test were proposed to combine a set of dependent genome-wide association signals at multiple single-nucleotide polymorphisms (SNPs) for assessing the overall significance of association at the gene or pathway levels. The two tests use different strategies to combine association p values, and either can outperform the other as the number of SNPs and the linkage disequilibrium between them vary. In this paper, we introduce a hybrid set-based test (HYST) combining the two tests for genome-wide association studies (GWASs). We describe how HYST can be used to evaluate statistical significance for association at the protein-protein interaction (PPI) level in order to increase power for detecting disease-susceptibility genes of moderate effect size. Computer simulations demonstrated that HYST had a reasonable type 1 error rate and was generally more powerful than its parents and other alternative tests to detect a PPI pair where both genes are associated with the disease of interest. We applied the method to three complex disease GWAS data sets in the public domain; the method detected a number of highly connected significant PPI pairs involving multiple confirmed disease-susceptibility genes not found in the SNP- and gene-based association analyses. These results indicate that HYST can be effectively used to examine a collection of predefined SNP sets based on prior biological knowledge for revealing additional disease-predisposing genes of modest effects in GWASs.
The American Journal of Human Genetics 09/2012; 91(3):478-88. · 11.20 Impact Factor
ABSTRACT: The exome sequencing strategy is promising for finding novel mutations of human monogenic disorders. However, pinpointing the causal mutation in a small number of samples is still a big challenge. Here, we propose a three-level filtration and prioritization framework to identify the causal mutation(s) in exome sequencing studies. This efficient and comprehensive framework successfully narrowed down whole exome variants to very small numbers of candidate variants in the proof-of-concept examples. The proposed framework, implemented in a user-friendly software package, named KGGSeq (http://statgenpro.psychiatry.hku.hk/kggseq), will play a very useful role in exome sequencing-based discovery of human Mendelian disease genes.
Nucleic Acids Research 01/2012; 40(7):e53. · 8.28 Impact Factor
ABSTRACT: Our previous genome-wide association study (GWAS) in a Hong Kong Southern Chinese population with extreme bone mineral density (BMD) scores revealed suggestive association with MPP7, which ranked second after JAG1 as a candidate gene for BMD. To follow up this suggestive signal, we replicated the top single-nucleotide polymorphism rs4317882 of MPP7 in three additional independent Asian-descent samples (n = 2684). The association of rs4317882 reached genome-wide significance in the meta-analysis of all available subjects (P(meta) = 4.58 × 10^-8, n = 4204). Site heterogeneity was observed, with a larger effect on spine than hip BMD. Further functional studies in a zebrafish model revealed that vertebral bone mass was lower in an mpp7 knock-down model compared with the wild-type (P = 9.64 × 10^-4, n = 21). In addition, MPP7 was found to have constitutive expression in human bone-derived cells during osteogenesis. Immunostaining of murine MC3T3-E1 cells revealed that the Mpp7 protein is localized in the plasma membrane and intracytoplasmic compartment of osteoblasts. In an assessment of the function of identified variants, an electrophoretic mobility shift assay demonstrated the binding of transcription factor GATA2 to the risk allele 'A' but not the 'G' allele of rs4317882. An mRNA expression study in human peripheral blood mononuclear cells confirmed that the low BMD-related allele 'A' of rs4317882 was associated with lower MPP7 expression (P = 9.07 × 10^-3, n = 135). Our data suggest a genetic and functional association of MPP7 with BMD variation.
Human Molecular Genetics 12/2011; 21(7):1648-57. · 7.69 Impact Factor
ABSTRACT: In the majority of patients, epilepsy is a complex disorder with multiple susceptibility genes interacting with environmental factors. However, we understand little about its genetic risks. Here, we report the first genome-wide association study (GWAS) to identify common susceptibility variants of epilepsy in Chinese. This two-stage GWAS included a total of 1087 patients and 3444 matched controls. In the combined analysis of the two stages, the strongest signals were observed with two highly correlated variants, rs2292096 [G] [P = 1.0 × 10^-8, odds ratio (OR) = 0.63] and rs6660197 [T] (P = 9.9 × 10^-7, OR = 0.69), with the former reaching genome-wide significance, on 1q32.1 in the CAMSAP1L1 gene, which encodes a cytoskeletal protein. We also refined a previously reported association with rs9390754 (P = 1.7 × 10^-5) on 6q21 in the GRIK2 gene, which encodes a glutamate receptor, and identified several other loci in genes involved in neurotransmission or neuronal networking that warrant further investigation. Our results suggest that common genetic variants may increase the susceptibility to epilepsy in Chinese.
Human Molecular Genetics 11/2011; 21(5):1184-9. · 7.69 Impact Factor
ABSTRACT: Serum osteoprotegerin (OPG) level is a key biomarker for numerous traits of clinical importance like diabetes, coronary artery disease, blood pressure, lipid profile, and cancers, but its genetic basis remains poorly understood. We estimated the heritability (h^2) of serum OPG level in 1442 southern Chinese subjects from 306 families. The h^2 for unadjusted OPG was 0.62 for females and 0.17 for males; and for age-adjusted OPG, 0.75 for females and 0.37 for males. Adjustment for lifestyle factors including calcium and phytoestrogen intake, exercise, smoking, and alcohol consumption exerted only a modest effect on the h^2. In conclusion, we confirmed that circulating OPG is a heritable trait and there is a significant difference in heritability between sexes.
Annals of Human Genetics 09/2011; 75(5):584-8. · 2.22 Impact Factor
ABSTRACT: Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
ABSTRACT: Risk prediction based on genomic profiles has attracted considerable attention recently. However, family history is usually ignored in genetic risk prediction. In this study we propose a statistical framework for risk prediction given an individual's genotype profile and family history. Genotype information about the relatives can also be incorporated. We allow risk prediction given the current age and follow-up period and consider competing risks of mortality. The framework allows easy extension to any family size and structure. In addition, the predicted risk at any percentile and the risk distribution graphs can be computed analytically. We applied the method to risk prediction for breast and prostate cancers by using known susceptibility loci from genome-wide association studies. For breast cancer, in the population the 10-year risk at age 50 ranged from 1.1% at the 5th percentile to 4.7% at the 95th percentile. If we consider the average 10-year risk at age 50 (2.39%) as the threshold for screening, the screening age ranged from 62 at the 20th percentile to 38 at the 95th percentile (and some never reach the threshold). For women with one affected first-degree relative, the 10-year risks ranged from 2.6% (at the 5th percentile) to 8.1% (at the 95th percentile). For prostate cancer, the corresponding 10-year risks at age 60 varied from 1.8% to 14.9% in the population and from 4.2% to 23.2% in those with an affected first-degree relative. We suggest that for some diseases genetic testing that incorporates family history can stratify people into diverse risk categories and might be useful in targeted prevention and screening.
The American Journal of Human Genetics 05/2011; 88(5):548-65. · 11.20 Impact Factor
ABSTRACT: The gene has been proposed as an attractive unit of analysis for association studies, but a simple yet valid, powerful, and sufficiently fast method of evaluating the statistical significance of all genes in large, genome-wide datasets has been lacking. Here we propose the use of an extended Simes test that integrates functional information and association evidence to combine the p values of the single nucleotide polymorphisms within a gene to obtain an overall p value for the association of the entire gene. Our computer simulations demonstrate that this test is more powerful than the SNP-based test, offers effective control of the type 1 error rate regardless of gene size and linkage-disequilibrium pattern among markers, and does not need permutation or simulation to evaluate empirical significance. Its statistical power in simulated data is at least comparable, and often superior, to that of several alternative gene-based tests. When applied to real genome-wide association study (GWAS) datasets on Crohn disease, the test detected more significant genes than SNP-based tests and alternative gene-based tests. The proposed test, implemented in an open-source package, has the potential to identify additional novel disease-susceptibility genes for complex diseases from large GWAS datasets.
The American Journal of Human Genetics 03/2011; 88(3):283-93. · 11.20 Impact Factor
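For orientation, the classical Simes procedure that the abstract above builds on can be sketched in a few lines. This is the standard, unextended Simes test only: it does not include the adjustment for linkage disequilibrium (via an effective number of independent tests) or the functional weighting that the extended test adds, and the function name and example p values are illustrative assumptions.

```python
def simes_p_value(p_values):
    """Classical Simes combined p value for a set of SNP p values.

    Sorts the m p values in ascending order and returns the minimum over
    ranks j of m * p_(j) / j, capped at 1.
    """
    if not p_values:
        raise ValueError("at least one p value is required")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p values must lie in [0, 1]")
    ordered = sorted(p_values)
    m = len(ordered)
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


# Made-up p values for three SNPs within one hypothetical gene.
gene_p = simes_p_value([0.01, 0.04, 0.03])
print(gene_p)
```

The appeal of this family of tests, as the abstract notes, is that the gene-level p value comes from a closed-form calculation, with no permutation or simulation step.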
ABSTRACT: Quantitative-trait association studies have been widely used in the search for genetic loci for complex traits in recent years. Yet, fiscal constraints still prohibit many ongoing research projects from recruiting a large number of individuals for genotyping to reach a desired level of statistical power. Accordingly, in this article, we describe a novel sib pair sampling strategy for genotyping in QTL association studies. With the use of phenotypic scores (and IBD allele-sharing probabilities if available), the genetic effect of a biallelic additive trait locus can be properly modelled within the maximum-likelihood variance components framework proposed by Fulker et al. (Am J Hum Genet 64(1):259-267, 1999) and sib pairs can be rank-ordered by use of informativeness indices. The performance of our method was investigated using simulation. The power of our approach was shown to be higher when compared with other phenotypic selection schemes. An R-script implementing all the selection approaches (including the traditional phenotype-based ones) used in the simulation is available at http://statgen.hku.hk/jshkwan.
ABSTRACT: The present study investigated the 2-week prevalence of depressive symptoms in college freshmen from Beijing and Hong Kong. The relationship between depression and 3 personality factors in these college freshmen was analyzed.
Center for Epidemiologic Studies Depression Scale (CES-D), Eysenck Personality Questionnaire-Neuroticism, Rosenberg Self-esteem Scale, and Frost Multidimensional Perfectionism Scale were administered to 988 Beijing and 802 Hong Kong Chinese college freshmen.
Approximately 24.8% of freshmen in Beijing had scores on the CES-D exceeding 16, whereas 8.9% reported scores of 25 or higher. There was no sex difference in prevalence in Beijing. Approximately 43.9% of freshmen in Hong Kong had scores on the CES-D exceeding 16, whereas 17.6% reported scores of 25 or higher. The prevalence is significantly different between sexes in Hong Kong, with approximately 36.1% of men having scores of 16 or higher and 13.4% having scores of 25 or higher and approximately 50.7% of women having scores of 16 or higher and 21.3% having scores of 25 or higher. High neuroticism, concern over mistakes, doubts about actions, low self-esteem, and poor organization were associated with current depressive symptoms in both sites.
The higher prevalence of current depressive symptoms in college freshmen in Hong Kong suggests that their mental health is not as satisfactory as that of their counterparts in Beijing. The strong relationship between certain personality features and current depressive symptoms is similar in both regions. Personality differences in the 2 sites explain only part, but not all, of the difference in depressive symptoms between the 2 sites.