Qiuying Sha

Michigan Technological University, Houghton, Michigan, United States

Publications (41) · 104.4 Total impact

  • ABSTRACT: Population stratification has long been recognized as an issue in genetic association studies because, if not appropriately corrected, unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we propose a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up-weights rare variants and variants that are strongly associated with the phenotype. Additionally, it adjusts for the direction of association and allows for local ancestry differences among study subjects. Extensive simulations show that TOWA controls the type I error rate in the presence of population stratification and is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies and real data analysis indicate that TOWA is a useful tool for rare variant association analyses in admixed populations. © 2015 Wiley Periodicals, Inc.
    Genetic Epidemiology 03/2015; 39(4). DOI:10.1002/gepi.21894 · 2.95 Impact Factor
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: Next-generation sequencing technologies make direct testing of rare variant associations possible. However, the development of powerful statistical methods for rare variant association studies is still underway. Most existing methods are either burden tests or quadratic tests. Recent studies show that the performance of both burden and quadratic tests depends strongly on the underlying assumptions and that no single test demonstrates consistently acceptable power. Thus, combined tests that pool information from burden and quadratic tests have recently been proposed. However, results from recent studies (including this study) show that there exist tests that can outperform both burden and quadratic tests. In this article, we propose three classes of tests that include tests outperforming both burden and quadratic tests. We then propose the optimal combination of single-variant tests (OCST), which combines information from tests of the three classes. We use extensive simulation studies to compare the performance of OCST with that of burden, quadratic, and optimal single-variant tests. Our results show that OCST is either the most powerful test or has power similar to that of the most powerful test. We also compare OCST with the two existing combined tests and find that OCST has better power than both.
    Genetic Epidemiology 07/2014; DOI:10.1002/gepi.21834 · 2.95 Impact Factor
  • ABSTRACT: Next-generation sequencing technology makes it possible to test rare variants directly. In this article, we propose a sliding-window-based optimal-weighted approach to test for the effects of both rare and common variants across the whole genome. We measured the genetic association between a disease and a combination of the variants within a single-nucleotide polymorphism window using the newly developed tests TOW and VW-TOW, and applied a sliding-window technique to detect disease-susceptible windows. By applying the new approach to unrelated individuals from the Genetic Analysis Workshop 18 data (replicate 1, chromosome 3), we detected 3 highly susceptible windows across chromosome 3 for diastolic blood pressure and identified 10 of 48,176 windows as the most promising for both diastolic and systolic blood pressure. Seven of the 9 top variants influencing diastolic blood pressure and 8 of the 9 top variants influencing systolic blood pressure were found in or close to our top 10 windows.
    BMC Proceedings 06/2014; 8(Suppl 1 Genetic Analysis Workshop 18):S59. DOI:10.1186/1753-6561-8-S1-S59
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: With the development of sequencing techniques, there is increasing interest in detecting associations between rare variants and complex traits. Many statistical methods for detecting such associations have been developed for unrelated individuals, whereas methods for detecting rare variant associations under family-based designs have received less attention. Recent studies show that rare disease variants will be enriched in family data, so family-based designs may improve power to detect rare variant associations. In this article, we propose a novel test of association between an optimally weighted combination of variants and the trait of interest for affected sib-pairs. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Because it uses the optimal weights, the proposed method is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing methods are. Our simulation results show that, in all cases considered, the proposed method is substantially more powerful than existing methods based on unrelated individuals and existing methods based on affected sib-pairs. (European Journal of Human Genetics advance online publication, 26 March 2014; doi:10.1038/ejhg.2014.43.)
    European Journal of Human Genetics (EJHG) 03/2014; 23(2). DOI:10.1038/ejhg.2014.43 · 4.23 Impact Factor
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: With the development of sequencing technologies, direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur in the presence of population structure. For rare variants, this problem can be more serious, both because the spectrum of rare variation can differ greatly among populations and because methods to control for population stratification in population-based rare variant association tests do not yet exist. One solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification, whereas population-based association tests can be seriously confounded by it. Power comparisons show that the power of TOW-PAC increases with the number of affected children in each family and that TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
    Genetic Epidemiology 02/2014; 38(2). DOI:10.1002/gepi.21787 · 2.95 Impact Factor
  • ABSTRACT: Although next-generation sequencing technology allows the whole genomes of large groups of individuals to be sequenced, the development of powerful statistical methods for rare variant association studies is still underway. Even though many statistical methods have been developed for mapping rare variants, most are for unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. Most existing methods for unrelated individuals essentially test the effect of a weighted combination of variants under different weighting schemes, so their performance depends on the weights being used. Recently, researchers proposed a test for Testing the effect of an Optimally Weighted combination of variants (TOW) for unrelated individuals. In this article, we extend our previously developed TOW to family-based data and propose a novel test for Testing the effect of an Optimally Weighted combination of variants for Family-based designs (TOW-F). The optimal weights are analytically derived. The results of extensive simulation studies show that TOW-F is robust to population stratification in a wide range of population structures, is robust to the direction and magnitude of the effects of causal variants, and is relatively robust to the percentage of neutral variants.
    Annals of Human Genetics 08/2013; 77. DOI:10.1111/ahg.12038 · 1.93 Impact Factor
  • ABSTRACT: Next-generation sequencing technology will soon allow the whole genomes of large groups of individuals to be sequenced, and thus will make direct testing of rare variants possible. Currently, most existing methods for rare variant association studies essentially test the effect of a weighted combination of variants under different weighting schemes. The performance of these methods depends on the weights being used, and no optimal weights are available. By putting large weights on rare variants and small weights on common variants, these methods target rare variants only, although increasing evidence shows that complex diseases are caused by both common and rare variants. In this paper, we analytically derive optimal weights under a certain criterion. Based on the optimal weights, we propose a Variable Weight Test for testing the effect of an Optimally Weighted combination of variants (VW-TOW). VW-TOW aims to test the effects of both rare and common variants. VW-TOW is applicable to both quantitative and qualitative traits, allows covariates, can control for population stratification, and is robust to the directions of effects of causal variants. Extensive simulation studies and application to the Genetic Analysis Workshop 17 (GAW17) data show that VW-TOW is more powerful than existing methods, both for testing the effects of rare and common variants together and for testing the effects of rare variants only.
    Genetic Epidemiology 09/2012; 36(6):561-71. DOI:10.1002/gepi.21649 · 2.95 Impact Factor
  • ABSTRACT: Current statistical methods to test association between rare variants and phenotypes are essentially group-wise methods that collapse or aggregate all variants in a predefined group into a single variant. Compared with variant-by-variant methods, group-wise methods have advantages. However, two factors may affect their power. First, some causal variants may be protective: when both risk and protective variants are present, collapsing or aggregating all variants loses power because the effects of risk and protective variants counteract each other. Second, not all variants in the group are causal; rather, a large proportion is believed to be neutral, and when many variants are neutral, collapsing or aggregating all of them may not be optimal. We propose two alternative methods, the adaptive clustering (AC) method and the adaptive weighting (AW) method, to test rare variant association in the presence of neutral and/or protective variants. Both AC and AW are applicable to quantitative as well as qualitative traits. Extensive simulation studies show that AC and AW have similar power and that both have clear advantages, in power and in computational efficiency, over existing group-wise methods and existing data-driven methods that allow neutral and protective variants. We recommend the AW method because it is computationally more efficient than the AC method. (European Journal of Human Genetics advance online publication, 11 July 2012; doi:10.1038/ejhg.2012.143.)
    European Journal of Human Genetics (EJHG) 07/2012; 21(3). DOI:10.1038/ejhg.2012.143 · 4.23 Impact Factor
  • ABSTRACT: Although next-generation DNA sequencing technologies have made rare variant association studies feasible and affordable, the development of powerful statistical methods for rare variant association studies is still under way. Most existing methods for rare variant association studies compare the number of rare mutations in a group of rare variants (in a gene or a pathway) between cases and controls. However, these methods assume that all causal variants increase disease risk. Recently, several methods that are robust to the direction and magnitude of the effects of causal variants have been proposed. However, they are applicable to unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. In this article, we propose two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. Using extensive simulation studies, we evaluate and compare our proposed methods with two methods based on the weights proposed by Madsen and Browning. Our results show that both proposed methods are robust to population stratification, are robust to the direction and magnitude of the effects of causal variants, and are more powerful than the methods using the weights suggested by Madsen and Browning, especially when both risk and protective variants are present.
    Genetic Epidemiology 07/2012; 36(5):499-507. DOI:10.1002/gepi.21646 · 2.95 Impact Factor
  • ABSTRACT: We develop statistical methods for detecting rare variants that are associated with quantitative traits. We propose two strategies for this purpose, the iterative regression strategy and the extreme values strategy, as well as their combination. In the iterative regression strategy, we use iterative regression on residuals and a multimarker association test to identify a group of significant variants. In the extreme values strategy, we use individuals with extreme trait values to select candidate genes and then test only these candidate genes. The two strategies are integrated into a hybrid approach through a weighting technique. We apply the proposed methods to the Genetic Analysis Workshop 17 data set. The results show that the hybrid approach is the most powerful. Using the hybrid approach, the average power to detect causal genes for Q1 is about 40%, and the powers to detect FLT1 and KDR for Q1 are 100% and 68%, respectively. The powers to detect VNN3 and BCHE for Q2 are 34% and 30%, respectively.
    BMC Proceedings 11/2011; 5(Suppl 9):S112. DOI:10.1186/1753-6561-5-S9-S112
  • Adan Niu, Shuanglin Zhang, Qiuying Sha
    ABSTRACT: Complex diseases are presumed to result from multiple genes and environmental factors, which emphasizes the importance of gene-gene and gene-environment interactions. Traditional parametric approaches are limited in their ability to detect high-order interactions and handle sparse data, and standard stepwise procedures may miss interactions with undetectable main effects. To address these limitations, the multifactor dimensionality reduction (MDR) method was developed. MDR is well suited for examining high-order interactions and detecting interactions without main effects. Like most statistical methods in genetic association studies, however, MDR may produce false positives in the presence of population stratification. Although many statistical methods have been proposed to detect main effects while controlling for population stratification using genomic markers, few methods are available to detect interactions and control for population stratification at the same time. In this article, we develop a novel test, MDR in structured populations (MDR-SP), to detect interactions while controlling for population stratification. MDR-SP is applicable to both quantitative and qualitative traits and can incorporate covariates. We present simulation studies to demonstrate the validity of the test and to evaluate its power.
    Annals of Human Genetics 11/2011; 75(6):742-54. DOI:10.1111/j.1469-1809.2011.00681.x · 1.93 Impact Factor
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: Testing for Hardy-Weinberg equilibrium (HWE) is used routinely as an important initial step in checking genotype data quality. Departure from HWE can be caused by many factors, such as genotyping errors, population stratification, and, if only affected individuals are used, disease association. In a structured population, even if a marker is in HWE in each subpopulation, the pooled data may show departure from HWE when allele frequencies differ among subpopulations, and such a departure can be misinterpreted as a genotyping-quality problem, resulting in the marker being falsely excluded from further analysis. In this article, we propose a new test for HWE in structured populations (HWES) that assesses departure from HWE while taking population stratification into account. Our proposed test can distinguish departure from HWE caused by population stratification from departure caused by other factors. We use simulation studies as well as applications to real data sets to evaluate the performance of the proposed test. Results show that, for a wide range of population structures, our proposed test has correct type I error rates, whereas the traditional χ² test leads to false-positive results. In homogeneous populations, our proposed test has power comparable to that of the traditional χ² test.
    Genetic Epidemiology 11/2011; 35(7):671-8. DOI:10.1002/gepi.20617 · 2.95 Impact Factor
  • ABSTRACT: In family-based data, association information can be partitioned into between-family information and within-family information. Based on this observation, Van Steen et al. (Nature Genetics, 2005, 683-691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs that performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, results from case-control studies (Skol et al., Nature Genetics, 2006, 209-213) suggest that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification; this screening test is used in the first stage to select markers. Then, in the second stage, a joint test that combines the between-family and within-family information is used to test association at the selected markers. Extensive simulation studies demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.
    PLoS ONE 07/2011; 6(7):e21957. DOI:10.1371/journal.pone.0021957 · 3.53 Impact Factor
  • ABSTRACT: Large-scale genome-wide association studies (GWAS) have recently become feasible because of the development of bead and chip technology. However, the success of GWAS partly depends on statistical methods that can manage and analyze such large-scale data. Commonly used tests for GWAS include the Cochran-Armitage trend test, the allelic χ² test, the genotypic χ² test, the haplotypic χ² test, and the multi-marker genotypic χ² test, among others. From a methodological point of view, improving the power of these tests is a great challenge, since they are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as special cases, our improved score test is uniformly more powerful than these tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that, in most cases, the power gain of the improved score test over the score test is not negligible.
    Genetic Epidemiology 07/2011; 35(5):350-9. DOI:10.1002/gepi.20583 · 2.95 Impact Factor
  • Shurong Fang, Qiuying Sha
    ABSTRACT: A genome-wide association (GWA) study scans markers across the whole genome to find genetic variations that contribute to a particular disease. For complex diseases, joint effects of genes play an important role, so it is important to detect genes with modest marginal effects but strong joint effects with other genes. However, evaluating joint effects in GWA studies is computationally demanding. In this article, we propose a novel two-stage approach that is promising for identifying joint effects in GWA studies, especially under monotonic models. In the first stage, all markers are ranked by a single-marker test (SMT) that assesses the association between a single marker and the disease. We then use a marker clustering algorithm to group highly correlated markers within a certain physical distance and select the marker with the smallest p-value in each cluster as its representative. In the second stage, we test the two-locus joint effects of the selected representatives, applying a likelihood ratio test (LRT) to each two-locus joint effect under monotonic models. Compared with a regular two-stage method, the proposed method not only reduces the computational burden but also increases power by reducing the number of tests. We perform simulation studies to investigate power and type I error rate under different scenarios. The results show that our two-stage approach is more powerful than a single-marker method (SM) and a regular two-stage analysis based on the two-locus genotypic test (GT).
    Joint Statistical Meetings; 08/2010
  • Zhaogong Zhang, Adan Niu, Qiuying Sha
    ABSTRACT: In this paper, we propose a two-stage approach based on 17 biologically plausible models to search for two-locus combinations that have significant joint effects on disease status in genome-wide association (GWA) studies. In the two-stage analyses, we test two-locus joint effects only for SNPs that show modest marginal effects. We use simulation studies to compare the power of our two-stage analysis with that of a single-marker analysis and of a two-stage analysis using a full model. We find that, for most plausible interaction effects, our two-stage analysis can dramatically increase the power to identify two-locus joint effects compared with a single-marker analysis and a two-stage analysis based on the full model. We also compare two-stage methods with one-stage methods; our simulation results indicate that two-stage methods are more powerful. We applied our two-stage approach to a GWA study aimed at identifying genetic factors that might be relevant in the pathogenesis of sporadic Amyotrophic Lateral Sclerosis (ALS). Our two-stage approach found two SNPs with a significant joint effect on sporadic ALS, whereas the single-marker analysis and the two-stage analysis based on the full model found no significant results.
    Annals of Human Genetics 07/2010; 74(5):406-15. DOI:10.1111/j.1469-1809.2010.00594.x · 1.93 Impact Factor
  • Xuexia Wang, Huaizhen Qin, Qiuying Sha
    ABSTRACT: In genome-wide association studies, new schemes are needed to incorporate multiple-locus information. In this article, we propose a two-stage sliding-window approach to detect associations between a disease and multiple genetic polymorphisms. In the proposed approach, we measured the genetic association between a disease and a single-nucleotide polymorphism window by the newly developed likelihood ratio test-principal components statistic, and applied a sliding-window technique to detect disease susceptibility windows. We split the whole sample into two sub-samples, each containing a portion of the cases and controls. In the first stage, we selected the top R windows by the statistics based on the first sub-sample; in the second stage, we declared windows significant by applying false-discovery rate correction to the p-values of the statistics based on the second sub-sample. By applying the new approach to the Genetic Analysis Workshop 16 Problem 1 data set, we detected 212 of 531,601 windows as responsible for rheumatoid arthritis. Except for chromosomes 4 and 18, each of the other 20 autosomes was found to harbor risk windows. Our results supported some rheumatoid arthritis susceptibility genes identified in the literature. In addition, we identified several new single-nucleotide polymorphism windows for follow-up studies.
    BMC Proceedings 12/2009; 3(Suppl 7):S28. DOI:10.1186/1753-6561-3-s7-s28
  • Qiuying Sha, Rui Tang, Shuanglin Zhang
    ABSTRACT: With the recent rapid improvements in high-throughput genotyping techniques, researchers face the very challenging task of large-scale genetic association analysis, especially at the whole-genome level, without an optimal solution. In this study, we propose a new approach for genetic association analysis based on a variable-sized sliding-window framework. This approach employs principal component analysis to find the optimal window size. By using the bisection algorithm to search for the window size, the proposed method avoids exhaustive computation. It is more efficient and effective than currently available approaches. We conduct a genome-wide association study of the Genetic Analysis Workshop 16 (GAW16) Problem 1 data using the proposed method. Our method successfully identified several susceptibility genes that have been reported by other researchers, as well as additional candidate genes for follow-up studies.
    BMC Proceedings 12/2009; 3(Suppl 7):S14. DOI:10.1186/1753-6561-3-s7-s14
  • Adan Niu, Zhaogong Zhang, Qiuying Sha
    ABSTRACT: The goal of this paper is to search for two-locus combinations that are jointly associated with rheumatoid arthritis using the Genetic Analysis Workshop 16 Problem 1 data set. We use a two-stage strategy to reduce the computational burden of performing an exhaustive two-locus search across the genome. In the first stage, the full set of 531,689 single-nucleotide polymorphisms was screened using univariate testing. In the second stage, all pairs formed from the 500 single-nucleotide polymorphisms with the lowest p-values in the first stage were evaluated under each of 17 two-locus models. Our analyses identified a two-locus combination, rs6939589 and rs11634386, that was significantly associated with rheumatoid arthritis under a Rec × Rec model (p-value = 0.045 after adjusting for multiple tests and multiple models).
    BMC Proceedings 12/2009; 3(Suppl 7):S26. DOI:10.1186/1753-6561-3-s7-s26
  • ABSTRACT: Rheumatoid arthritis is inherited in a complex manner. Several single susceptibility genes, such as PTPN22, STAT4, and TRAF1-C5, have been identified so far. However, some genes are presumed to interact to produce a significant effect on the disease while each of them individually plays only a modest role. Existing gene-gene interaction tests use information on only a single trait at a time. In this article, we propose a new multivariate combinatorial searching method that detects gene-gene interactions in the rheumatoid arthritis data using multiple traits at the same time: disease status, anti-cyclic citrullinated peptide, and immunoglobulin M. The method incorporates the multiple traits with various feature-selection techniques to search for a set of disease-susceptibility genes that may interact. By analyzing three panels of markers, we identified a significant gene-gene interaction between PTPN22 and TRAF1-C5.
    BMC Proceedings 12/2009; 3(Suppl 7):S43. DOI:10.1186/1753-6561-3-s7-s43
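The Hardy-Weinberg (HWES) abstract in this list contrasts its method with the traditional χ² test for HWE. As an illustration of that traditional test only (not HWES), here is a minimal Python sketch; the function name `hwe_chi2` and the genotype counts are hypothetical. The usage lines pool two subpopulations that are each exactly in HWE but have different allele frequencies, which reproduces the spurious departure (a heterozygote deficit) that the abstract warns can be mistaken for a genotyping problem.

```python
import math


def hwe_chi2(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Traditional 1-df chi-square goodness-of-fit test for HWE.

    Takes observed genotype counts and returns (statistic, p_value).
    Expected counts are n*p^2, 2*n*p*q, n*q^2, with p estimated from the sample.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # sample frequency of allele A
    if not 0.0 < p < 1.0:
        raise ValueError("marker is monomorphic; the HWE test is undefined")
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Two hypothetical subpopulations, each exactly in HWE.
pop1 = (810, 180, 10)   # p(A) = 0.9
pop2 = (10, 180, 810)   # p(A) = 0.1
pooled = tuple(a + b for a, b in zip(pop1, pop2))  # (820, 360, 820)

print(hwe_chi2(*pop1))    # statistic essentially 0, p-value essentially 1
print(hwe_chi2(*pooled))  # statistic about 819.2: 360 heterozygotes vs 1000 expected
```

Neither subpopulation deviates from HWE, yet the pooled sample is rejected overwhelmingly, which is the behavior a stratification-aware test such as HWES is designed to avoid.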
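The improved-score-test abstract lists the Cochran-Armitage trend test among the commonly used GWAS tests it generalizes. For reference, the standard trend test on a 2 × 3 table of genotype counts can be sketched as follows, assuming additive scores (0, 1, 2). This is the textbook test, not the authors' improved score test; the function name and the counts in the usage line are illustrative.

```python
import math
from typing import Sequence


def cochran_armitage(cases: Sequence[int], controls: Sequence[int],
                     scores: Sequence[float] = (0.0, 1.0, 2.0)) -> tuple[float, float]:
    """Cochran-Armitage trend test on genotype counts.

    cases[i], controls[i]: number of cases / controls with genotype i.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    # T = N*sum(t*r) - R*sum(t*n);  Var(T) = (R*S/N) * (N*sum(t^2*n) - sum(t*n)^2)
    # so the statistic T^2 / Var(T) simplifies to the ratio below.
    numerator = N * (N * sum_tr - R * sum_tn) ** 2
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    stat = numerator / denominator
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value


# Hypothetical counts for genotypes with 0, 1, 2 copies of the tested allele.
print(cochran_armitage([20, 40, 40], [40, 40, 20]))  # statistic = 40/3, about 13.33
```

The statistic is symmetric in cases and controls and is 0 when every genotype class has the same case proportion.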
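The family-based adaptive weighting abstract benchmarks against the weights of Madsen and Browning. To show how such weights favor variants that are rare among unaffected individuals, here is a minimal sketch for unrelated case-control data, assuming the weighted-sum formulation as we understand it: for variant j, q_j = (m_j + 1) / (2·n_U + 2), where m_j is the minor-allele count among the n_U unaffected individuals; w_j = sqrt(n·q_j·(1 − q_j)) with n the total sample size; each individual's score is Σ_j g_ij / w_j; and significance comes from a one-sided permutation test on the rank sum of the affected individuals' scores, with weights recomputed after each label shuffle. All function names and data are hypothetical, and this is not the family-based method proposed in the abstract.

```python
import math
import random


def mb_weights(genotypes: list[list[int]], status: list[int]) -> list[float]:
    """Weights per variant; genotypes are minor-allele counts (individuals x variants),
    status is 1 for affected and 0 for unaffected."""
    n = len(genotypes)
    n_unaff = sum(1 for s in status if s == 0)
    weights = []
    for j in range(len(genotypes[0])):
        m_unaff = sum(g[j] for g, s in zip(genotypes, status) if s == 0)
        q = (m_unaff + 1) / (2 * n_unaff + 2)  # minor-allele frequency among unaffected
        weights.append(math.sqrt(n * q * (1 - q)))
    return weights


def mb_scores(genotypes: list[list[int]], status: list[int]) -> list[float]:
    """Weighted-sum genetic score for each individual."""
    w = mb_weights(genotypes, status)
    return [sum(x / wj for x, wj in zip(row, w)) for row in genotypes]


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_affected(genotypes: list[list[int]], status: list[int]) -> float:
    ranks = average_ranks(mb_scores(genotypes, status))
    return sum(r for r, s in zip(ranks, status) if s == 1)


def mb_permutation_test(genotypes, status, n_perm=1000, seed=0):
    """One-sided permutation p-value: are affected scores higher than expected?"""
    rng = random.Random(seed)
    observed = rank_sum_affected(genotypes, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # weights are recomputed inside for every permutation
        if rank_sum_affected(genotypes, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


genotypes = [[1, 0], [0, 1], [0, 0], [2, 0]]  # hypothetical: 4 individuals x 2 variants
status = [1, 0, 0, 1]
print(rank_sum_affected(genotypes, status))   # 7.0
print(mb_permutation_test(genotypes, status, n_perm=200, seed=1))
```

Recomputing the weights inside each permutation matters because the weights themselves depend on which individuals are labeled unaffected.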
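The arithmetic behind the two-stage strategy in the GAW16 two-locus abstract is easy to check. An exhaustive search over 531,689 SNPs involves C(531,689, 2) pairs per model, while keeping only the 500 SNPs with the smallest first-stage p-values leaves C(500, 2) = 124,750 pairs. The helpers below are a hypothetical sketch of the screening and pairing steps only. The single-marker test and the 17 two-locus models are not implemented, and the SNP IDs and p-values in the toy example are made up.

```python
import itertools
import math


def select_top_k(p_values: dict[str, float], k: int) -> list[str]:
    """Stage 1: keep the k markers with the smallest single-marker p-values."""
    return sorted(p_values, key=p_values.get)[:k]


def candidate_pairs(markers: list[str]) -> list[tuple[str, str]]:
    """Stage 2 input: every unordered pair of retained markers."""
    return list(itertools.combinations(markers, 2))


# Pair counts for an exhaustive search versus the two-stage search.
print(math.comb(531_689, 2))  # 141346330516 pairs (exhaustive)
print(math.comb(500, 2))      # 124750 pairs (top 500 only)

# Toy usage with hypothetical SNP IDs and p-values.
stage1 = {"snpA": 0.30, "snpB": 1e-6, "snpC": 2e-4, "snpD": 0.04}
top = select_top_k(stage1, 3)  # ['snpB', 'snpC', 'snpD']
print(candidate_pairs(top))    # [('snpB', 'snpC'), ('snpB', 'snpD'), ('snpC', 'snpD')]
```

The roughly million-fold reduction in pairs is what makes evaluating every retained pair under many two-locus models tractable.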
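In the GAW16 two-stage sliding-window abstract, windows are declared significant by false-discovery-rate correction in the second stage, but the abstract does not name the procedure. Assuming the common Benjamini-Hochberg step-up procedure, that step could look like the following sketch. The helper `benjamini_hochberg` is hypothetical and the window p-values are illustrative only.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[int]:
    """Benjamini-Hochberg step-up procedure.

    Finds the largest rank k with p_(k) <= k * alpha / m and rejects the
    k smallest p-values. Returns the sorted indices of rejected hypotheses.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank  # keep the largest passing rank (step-up)
    return sorted(order[:cutoff])


# Hypothetical second-stage p-values for five windows.
print(benjamini_hochberg([0.01, 0.035, 0.025, 0.005, 0.5]))  # [0, 1, 2, 3]
# Step-up behavior: 0.02 fails its own threshold (0.05 / 3) but is still
# rejected because the larger p-value 0.03 passes its threshold (2 * 0.05 / 3).
print(benjamini_hochberg([0.02, 0.03, 0.9]))  # [0, 1]
```

Because the windows were selected using the first sub-sample and tested on the second, the second-stage p-values are not biased by the selection step, which is what makes a standard FDR correction reasonable there.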