Qiuying Sha

Michigan Technological University, Houghton, MI, United States

Publications (39) · 105.17 Total impact

  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: Next-generation sequencing technologies make direct testing of rare variant associations possible. However, the development of powerful statistical methods for rare variant association studies is still underway. Most existing methods are burden or quadratic tests. Recent studies show that the performance of both burden and quadratic tests depends strongly on the underlying assumptions, and no single test demonstrates consistently acceptable power. Thus, combined tests that pool information from the burden and quadratic tests have recently been proposed. However, results from recent studies (including this study) show that some tests can outperform both burden and quadratic tests. In this article, we propose three classes of tests that include tests outperforming both burden and quadratic tests. We then propose the optimal combination of single-variant tests (OCST), which combines information from tests of the three classes. We use extensive simulation studies to compare the performance of OCST with that of burden, quadratic and optimal single-variant tests. Our results show that OCST is either the most powerful test or has power similar to that of the most powerful test. We also compare the performance of OCST with that of the two existing combined tests. Our results show that OCST has better power than the two combined tests.
    Genetic Epidemiology 07/2014; · 4.02 Impact Factor
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: With the development of sequencing techniques, there is increasing interest in detecting associations between rare variants and complex traits. Quite a few statistical methods to detect such associations have been developed for unrelated individuals. Statistical methods for detecting rare variant associations under family-based designs have not received as much attention. Recent studies show that rare disease variants will be enriched in family data, and thus family-based designs may improve power to detect rare variant associations. In this article, we propose a novel test of the association between an optimally weighted combination of variants and the trait of interest for affected sib-pairs. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Because it uses the optimal weights, the proposed method is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing methods are. Our simulation results show that, in all cases, the proposed method is substantially more powerful than existing methods based on unrelated individuals and existing methods based on affected sib-pairs. European Journal of Human Genetics advance online publication, 26 March 2014; doi:10.1038/ejhg.2014.43.
    European journal of human genetics: EJHG 03/2014; · 3.56 Impact Factor
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur when there is population structure. For rare variants, this problem can be more serious, both because the spectrum of rare variation can differ greatly across populations and because methods to control for population stratification in population-based rare variant association studies do not currently exist. One solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification, whereas population-based association tests can be seriously confounded by it. Power comparisons show that the power of TOW-PAC increases with the number of affected children in each family, and that TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
    Genetic Epidemiology 12/2013; · 4.02 Impact Factor
  • ABSTRACT: Although next-generation sequencing technology allows the whole genomes of large groups of individuals to be sequenced, the development of powerful statistical methods for rare variant association studies is still underway. Even though many statistical methods have been developed for mapping rare variants, most of them are for unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. Most existing methods for unrelated individuals essentially test the effect of a weighted combination of variants, differing in their weighting schemes. The performance of these methods depends on the weights being used. Recently, a test for Testing the effect of an Optimally Weighted combination of variants (TOW) was proposed for unrelated individuals. In this article, we extend our previously developed TOW for unrelated individuals to family-based data and propose a novel test for Testing the effect of an Optimally Weighted combination of variants for Family-based designs (TOW-F). The optimal weights are analytically derived. The results of extensive simulation studies show that TOW-F is robust to population stratification in a wide range of population structures, is robust to the direction and magnitude of the effects of causal variants, and is relatively robust to the percentage of neutral variants.
    Annals of Human Genetics 08/2013; · 2.22 Impact Factor
  • ABSTRACT: Current statistical methods to test association between rare variants and phenotypes are essentially group-wise methods that collapse or aggregate all variants in a predefined group into a single variant. Compared with variant-by-variant methods, group-wise methods have their advantages. However, two factors may affect the power of these methods. One is that some of the causal variants may be protective. When both risk and protective variants are present, collapsing or aggregating all variants loses power because the effects of risk and protective variants counteract each other. The other is that not all variants in the group are causal; rather, a large proportion is believed to be neutral. When a large proportion of variants are neutral, collapsing or aggregating all variants may not be an optimal solution. We propose two alternative methods, the adaptive clustering (AC) method and the adaptive weighting (AW) method, aiming to test rare variant association in the presence of neutral and/or protective variants. Both AC and AW are applicable to quantitative as well as qualitative traits. Results of extensive simulation studies show that AC and AW have similar power and that both have clear advantages, in power as well as computational efficiency, over existing group-wise methods and existing data-driven methods that allow neutral and protective variants. We recommend the AW method because it is computationally more efficient than the AC method. European Journal of Human Genetics advance online publication, 11 July 2012; doi:10.1038/ejhg.2012.143.
    European journal of human genetics: EJHG 07/2012; · 3.56 Impact Factor
  • ABSTRACT: Next-generation sequencing technology will soon allow the whole genomes of large groups of individuals to be sequenced and will thus make direct testing of rare variants possible. Currently, most existing methods for rare variant association studies essentially test the effect of a weighted combination of variants, differing in their weighting schemes. The performance of these methods depends on the weights being used, and no optimal weights are available. By putting large weights on rare variants and small weights on common variants, these methods target rare variants only, although increasing evidence shows that complex diseases are caused by both common and rare variants. In this paper, we analytically derive optimal weights under a certain criterion. Based on the optimal weights, we propose a Variable Weight Test for testing the effect of an Optimally Weighted combination of variants (VW-TOW). VW-TOW aims to test the effects of both rare and common variants. VW-TOW is applicable to both quantitative and qualitative traits, allows covariates, can control for population stratification, and is robust to the directions of effects of causal variants. Extensive simulation studies and application to the Genetic Analysis Workshop 17 (GAW17) data show that VW-TOW is more powerful than existing methods, whether testing the effects of both rare and common variants or of rare variants only.
    Genetic Epidemiology 06/2012; 36(6):561-71. · 4.02 Impact Factor
  • ABSTRACT: Although next-generation DNA sequencing technologies have made rare variant association studies feasible and affordable, the development of powerful statistical methods for rare variant association studies is still under way. Most existing methods for rare variant association studies compare the number of rare mutations in a group of rare variants (in a gene or a pathway) between cases and controls. However, these methods assume that all causal variants increase disease risk. Recently, several methods that are robust to the direction and magnitude of effects of causal variants have been proposed. However, they are applicable to unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. In this article, we propose two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. Using extensive simulation studies, we evaluate and compare our proposed methods with two methods based on the weights proposed by Madsen and Browning. Our results show that both proposed methods are robust to population stratification, robust to the direction and magnitude of the effects of causal variants, and more powerful than the methods using weights suggested by Madsen and Browning, especially when both risk and protective variants are present.
    Genetic Epidemiology 06/2012; 36(5):499-507. · 4.02 Impact Factor
  • ABSTRACT: We develop statistical methods for detecting rare variants that are associated with quantitative traits. We propose two strategies and their combination for this purpose: the iterative regression strategy and the extreme values strategy. In the iterative regression strategy, we use iterative regression on residuals and a multimarker association test to identify a group of significant variants. In the extreme values strategy, we use individuals with extreme trait values to select candidate genes and then test only these candidate genes. These two strategies are integrated into a hybrid approach through a weighting technique. We apply the proposed methods to the Genetic Analysis Workshop 17 data set. The results show that the hybrid approach is the most powerful. Using the hybrid approach, the average power to detect causal genes for Q1 is about 40%, and the powers to detect FLT1 and KDR for Q1 are 100% and 68%, respectively. The powers to detect VNN3 and BCHE for Q2 are 34% and 30%, respectively.
    BMC proceedings 11/2011; 5 Suppl 9:S112.
  • Adan Niu, Shuanglin Zhang, Qiuying Sha
    ABSTRACT: Complex diseases are presumed to result from multiple genes and environmental factors, which emphasizes the importance of gene-gene and gene-environment interactions. Traditional parametric approaches are limited in their ability to detect high-order interactions and handle sparse data, and standard stepwise procedures may miss interactions with undetectable main effects. To address these limitations, the multifactor dimensionality reduction (MDR) method was developed. MDR is well suited for examining high-order interactions and detecting interactions without main effects. Like most statistical methods in genetic association studies, MDR may also produce false positives in the presence of population stratification. Although many statistical methods have been proposed to detect main effects and control for population stratification using genomic markers, few methods are available to detect interactions and control for population stratification at the same time. In this article, we develop a novel test, MDR in structured populations (MDR-SP), to detect interactions and control for population stratification. MDR-SP is applicable to both quantitative and qualitative traits and can incorporate covariates. We present simulation studies to demonstrate the validity of the test and to evaluate its power.
    Annals of Human Genetics 11/2011; 75(6):742-54. · 2.22 Impact Factor
  • Qiuying Sha, Shuanglin Zhang
    ABSTRACT: Testing for Hardy-Weinberg equilibrium (HWE) is used routinely as an important initial step in checking genotype data quality. Departure from HWE can be caused by many factors, such as genotyping errors, population stratification, and, if only affected individuals are used, disease association. In a structured population, even if a marker is in HWE in each subpopulation, the data may show departure from HWE if allele frequencies differ across subpopulations, and such a departure can be misinterpreted as a genotyping quality problem, resulting in false exclusion of the marker from further analysis. In this article, we propose a new HWE test, a test for HWE in structured populations (HWES), that assesses departure from HWE while taking population stratification into account. Our proposed test can distinguish departure from HWE caused by population stratification from departure caused by other factors. We use simulation studies as well as applications to real data sets to evaluate the performance of the proposed test. Results show that, for a wide range of population structures, our proposed test has correct type I error rates, whereas the traditional χ² test leads to false-positive results. In homogeneous populations, our proposed test has power comparable to that of the traditional χ² test.
    Genetic Epidemiology 08/2011; 35(7):671-8. · 4.02 Impact Factor
  • ABSTRACT: Large-scale genome-wide association studies (GWAS) have recently become feasible because of the development of bead and chip technology. However, the success of GWAS depends partly on statistical methods that can manage and analyze such large-scale data. Currently, the commonly used tests for GWAS include the Cochran-Armitage trend test, the allelic χ² test, the genotypic χ² test, the haplotypic χ² test, and the multi-marker genotypic χ² test, among others. From a methodological point of view, improving the power of these tests is a great challenge, since they are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as special cases, our improved score test is uniformly more powerful than these commonly used tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that the power gains of the improved score test over the score test are not negligible in most cases.
    Genetic Epidemiology 07/2011; 35(5):350-9. · 4.02 Impact Factor
  • ABSTRACT: In family-based data, association information can be partitioned into between-family information and within-family information. Based on this observation, Steen et al. (Nature Genetics. 2005, 683-691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs, which performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, the results of case-control studies (Skol et al. Nature Genetics. 2006, 209-213) suggest that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification. This new screening test is used in the first stage to select markers. Then, a joint test that combines the between-family and within-family information is used in the second stage to test association at the selected markers. Through extensive simulation studies, we demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.
    PLoS ONE 01/2011; 6(7):e21957. · 3.53 Impact Factor
  • Shurong Fang, Qiuying Sha
    ABSTRACT: A genome-wide association (GWA) study scans markers across the whole genome to find genetic variations that contribute to a particular disease. For complex diseases, joint effects of genes play an important role. Thus, detecting genes with modest marginal effects but strong joint effects with other genes is important. However, evaluating joint effects in GWA studies is computationally heavy. In this article, we propose a novel two-stage approach that is promising for identifying joint effects in GWA studies, especially under monotonic models. In the first stage, all markers are ranked by a single-marker test (SMT) that assesses the association between a single marker and the disease. We then use a marker clustering algorithm to group highly correlated markers within a certain physical distance and select the marker with the smallest p-value in each cluster as its representative. In the second stage, we test the two-locus joint effects of the selected representatives, applying a likelihood ratio test (LRT) to each two-locus joint effect under monotonic models. Compared with a regular two-stage method, the proposed method not only reduces the computational burden but also increases power by reducing the number of tests. We perform simulation studies to investigate the power and type I error rate under different scenarios. The results show that our two-stage approach is more powerful than a single-marker method (SM) and a regular two-stage analysis based on the two-locus genotypic test (GT).
    Joint Statistical Meetings; 08/2010
  • Zhaogong Zhang, Adan Niu, Qiuying Sha
    ABSTRACT: In this paper, we propose a two-stage approach based on 17 biologically plausible models to search for two-locus combinations that have significant joint effects on disease status in genome-wide association (GWA) studies. In the two-stage analyses, we test two-locus joint effects only for SNPs that show modest marginal effects. We use simulation studies to compare the power of our two-stage analysis with that of a single-marker analysis and a two-stage analysis using a full model. We find that, for most plausible interaction effects, our two-stage analysis can dramatically increase the power to identify two-locus joint effects compared with a single-marker analysis and a two-stage analysis based on the full model. We also compare two-stage methods with one-stage methods. Our simulation results indicate that two-stage methods are more powerful than one-stage methods. We applied our two-stage approach to a GWA study to identify genetic factors that might be relevant to the pathogenesis of sporadic Amyotrophic Lateral Sclerosis (ALS). Our two-stage approach found that two SNPs have a significant joint effect on sporadic ALS, whereas the single-marker analysis and the two-stage analysis based on the full model found no significant results.
    Annals of Human Genetics 07/2010; 74(5):406-15. · 2.22 Impact Factor
  • ABSTRACT: Recently, Steen et al. proposed a novel two-stage approach for family-based genome-wide association studies. In the first stage, a test based on between-family information is used to rank SNPs according to their P-values or the conditional power of the test. In the second stage, the R most promising SNPs are tested using a family-based association test. We call this two-stage approach the top R method. Ionita-Laza et al. proposed an exponential weighting method within a two-stage framework. In the second stage of this approach, instead of testing only the top R SNPs, it tests all SNPs and weights the P-values of the association test according to the information from the first stage. However, both the top R and exponential weighting methods use the information from the first stage only to rank SNPs, so neither appears to use that information efficiently. Furthermore, it may be unreasonable for the exponential weighting method to use the same weight for all SNPs within a group when only one or a few SNPs are related to a disease. In this article, we propose a data-driven weighting scheme within a two-stage framework, which uses the information from the first stage to determine a SNP-specific weight for each SNP. We use simulation studies to evaluate the performance of our method. The simulation results show that our proposed method is consistently more powerful than the top R method and the exponential weighting method, regardless of the LD structure, population structure, and family structure.
    European journal of human genetics: EJHG 11/2009; 18(5):596-603. · 3.56 Impact Factor
  • ABSTRACT: With the recent rapid improvements in high-throughput genotyping techniques, researchers face the very challenging task of analysing large-scale genetic associations, especially at the whole-genome level, for which no optimal solution yet exists. In this study, we propose a new approach for genetic association analysis that is based on a variable-sized sliding-window framework and employs principal component analysis to find the optimum window size. Because it uses the bisection algorithm to search for the window size, our method is more computationally efficient than available approaches. We evaluate the performance of the proposed method by comparing it with two other methods: a single-marker method and a variable-length Markov chain method. We demonstrate that, in most cases, the proposed method outperforms the other two methods. Furthermore, since the proposed method is based on genotype data, it does not require any computationally intensive phasing program to account for uncertain haplotype phase.
    Annals of Human Genetics 10/2009; 73(Pt 6):631-7. · 2.22 Impact Factor
  • ABSTRACT: Amyotrophic lateral sclerosis (ALS) is a fatal, degenerative neuromuscular disease characterized by a progressive loss of voluntary motor activity. About 95% of ALS patients have the "sporadic form", meaning that their disease is not associated with a family history of the disease. To date, the genetic factors of the sporadic form of ALS are poorly understood. We proposed a two-stage approach based on seventeen biologically plausible models to search for two-locus combinations that have significant joint effects on the disease in a genome-wide association study (GWAS). We used a two-stage strategy to reduce the computational burden associated with performing an exhaustive two-locus search across the genome. In the first stage, all SNPs were screened using a single-marker test. In the second stage, all pairs made from the 1000 SNPs with the lowest p-values from the first stage were evaluated under each of the 17 two-locus models. We applied the two-stage approach to a GWAS data set of sporadic ALS from the SNP Database at the NINDS Human Genetics Resource Center DNA and Cell Line Repository http://ccr.coriell.org/ninds/. Our two-locus analysis showed that two two-locus combinations, rs4363506 (SNP1) with rs3733242 (SNP2) and rs4363506 with rs16984239 (SNP3), were significantly associated with sporadic ALS. After adjusting for multiple tests and multiple models, the combination of SNP1 and SNP2 had a p-value of 0.032 under the Dom ∩ Dom epistatic model, and SNP1 and SNP3 had a p-value of 0.042 under the Dom x Dom multiplicative model. The proposed two-stage analytical method can be used to search for joint effects of genes in GWAS. The two-stage strategy decreased the computational time and the multiple-testing burden associated with GWAS. We also observed that the loci identified by our two-stage strategy cannot be detected by single-locus tests.
    BMC Medical Genetics 10/2009; 10:86. · 2.54 Impact Factor
  • Qiuying Sha, Rui Tang, Shuanglin Zhang
    ABSTRACT: With the recent rapid improvements in high-throughput genotyping techniques, researchers face the very challenging task of large-scale genetic association analysis, especially at the whole-genome level, for which no optimal solution yet exists. In this study, we propose a new approach for genetic association analysis based on a variable-sized sliding-window framework. This approach employs principal component analysis to find the optimal window size. By using the bisection algorithm to search for the window size, the proposed method avoids exhaustive computation. It is more efficient and effective than currently available approaches. We conduct a genome-wide association study of the Genetic Analysis Workshop 16 (GAW16) Problem 1 data using the proposed method. Our method successfully identified several susceptibility genes that have been reported by other researchers, as well as additional candidate genes for follow-up studies.
    BMC proceedings 01/2009; 3 Suppl 7:S14.
  • ABSTRACT: Rheumatoid arthritis is inherited in a complex manner. So far, several individual susceptibility genes, such as PTPN22, STAT4, and TRAF1-C5, have been identified. However, it is presumed that some genes may interact to have a significant effect on the disease, while each of them plays only a modest role. We propose a new combinatorial association test to detect gene-gene interaction in the rheumatoid arthritis data using multiple traits: disease status, anti-cyclic citrullinated peptide, and immunoglobulin M. Existing gene-gene interaction tests use the information on only a single trait at a time. In this article, we propose a new multivariate combinatorial searching method that uses multiple traits at the same time. The multivariate combinatorial searching method incorporates the multiple traits into various feature-selection techniques to search for a set of disease-susceptibility genes that may interact. By analyzing three panels of markers, we identified a significant gene-gene interaction between PTPN22 and TRAF1-C5.
    BMC proceedings 01/2009; 3 Suppl 7:S43.
  • Adan Niu, Zhaogong Zhang, Qiuying Sha
    ABSTRACT: The goal of this paper is to search for two-locus combinations that are jointly associated with rheumatoid arthritis using the data set of Genetic Analysis Workshop 16 Problem 1. We use a two-stage strategy to reduce the computational burden associated with performing an exhaustive two-locus search across the genome. In the first stage, the full set of 531,689 single-nucleotide polymorphisms was screened using univariate testing. In the second stage, all pairs made from the 500 single-nucleotide polymorphisms with the lowest p-values from the first stage were evaluated under each of 17 two-locus models. Our analyses identified a two-locus combination, rs6939589 and rs11634386, that was significantly associated with rheumatoid arthritis under a Rec x Rec model (p-value = 0.045 after adjusting for multiple tests and multiple models).
    BMC proceedings 01/2009; 3 Suppl 7:S26.
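The 07/2014 Genetic Epidemiology abstract (OCST) contrasts burden and quadratic tests, and the 07/2012 European Journal of Human Genetics abstract notes that collapsing loses power when risk and protective variants counteract each other. The minimal Python sketch here is not the authors' OCST, AC, or AW method; it assumes a genotype matrix of minor-allele counts and a 0/1 or quantitative trait, and computes unscaled per-variant scores. The function names are hypothetical, and the statistics are not variance-normalized, so they illustrate the cancellation effect rather than produce calibrated tests.

```python
from typing import Sequence


def variant_scores(genotypes: Sequence[Sequence[int]], trait: Sequence[float]) -> list[float]:
    """Per-variant score U_j = sum_i (y_i - mean(y)) * x_ij.

    genotypes[i][j] is the minor-allele count of individual i at variant j.
    """
    n = len(trait)
    y_mean = sum(trait) / n
    n_variants = len(genotypes[0])
    return [
        sum((trait[i] - y_mean) * genotypes[i][j] for i in range(n))
        for j in range(n_variants)
    ]


def burden_statistic(scores: Sequence[float]) -> float:
    # Burden style: sum the scores first, then square, so opposite signs cancel.
    return sum(scores) ** 2


def quadratic_statistic(scores: Sequence[float]) -> float:
    # Quadratic style: square each score first, then sum, so signs cannot cancel.
    return sum(u * u for u in scores)


# Two cases (trait 1) carry a risk variant; two controls (trait 0) carry a protective one.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
trait = [1, 1, 0, 0]
u = variant_scores(genotypes, trait)
print(u, burden_statistic(u), quadratic_statistic(u))  # prints [1.0, -1.0] 0.0 2.0
```

Summing before squaring lets the risk and protective scores cancel to zero, while squaring first keeps both signals; combined tests try to retain the strengths of each.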
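The 06/2012 Genetic Epidemiology abstract on family-based adaptive weighting compares its methods with ones using the weights proposed by Madsen and Browning. For reference, this is a minimal sketch of that weighting as it is commonly described for unrelated cases and controls: the minor-allele frequency of variant j is estimated from controls with a pseudo-count, q_j = (m_j + 1)/(2n_u + 2), the weight is w_j = sqrt(n q_j (1 - q_j)), and each individual's genetic score sums allele counts divided by the weights. It assumes no missing genotypes and omits the rank-based, permutation-assessed comparison of case and control scores used in the full procedure; the function names are illustrative.

```python
import math


def madsen_browning_weights(genotypes, is_case):
    """Per-variant weights w_j = sqrt(n * q_j * (1 - q_j)).

    q_j = (m_j + 1) / (2 * n_u + 2), where m_j is the minor-allele count at
    variant j among the n_u controls. Assumes every variant is genotyped in
    all n individuals (no missing data).
    """
    n = len(genotypes)
    controls = [row for row, case in zip(genotypes, is_case) if not case]
    n_u = len(controls)
    weights = []
    for j in range(len(genotypes[0])):
        m_u = sum(row[j] for row in controls)
        q = (m_u + 1) / (2 * n_u + 2)
        weights.append(math.sqrt(n * q * (1 - q)))
    return weights


def weighted_sum_scores(genotypes, weights):
    # Each individual's score sums allele counts divided by the variant weight,
    # so rare variants (small q_j, hence small w_j) contribute more per allele.
    return [sum(x / w for x, w in zip(row, weights)) for row in genotypes]


genotypes = [[1, 0], [0, 1], [0, 0], [0, 1]]  # rows: individuals; columns: variants
is_case = [True, True, False, False]
w = madsen_browning_weights(genotypes, is_case)
print(w, weighted_sum_scores(genotypes, w))
```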
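The 11/2011 Annals of Human Genetics abstract builds on multifactor dimensionality reduction (MDR). The core MDR step for one pair of SNPs pools the two-locus genotype cells into high-risk and low-risk groups: a cell is labeled high-risk when its case:control ratio meets or exceeds a threshold, taken here to be the overall case:control ratio. This sketch shows only that labeling step and the balanced accuracy of the resulting classifier; it omits the cross-validation of standard MDR and does not implement MDR-SP's adjustment for population stratification. The function names are hypothetical.

```python
from collections import Counter


def mdr_high_risk_cells(snp_a, snp_b, is_case):
    """Return the two-locus genotype cells labeled high-risk by MDR-style pooling."""
    cases, controls = Counter(), Counter()
    for a, b, case in zip(snp_a, snp_b, is_case):
        (cases if case else controls)[(a, b)] += 1
    n_case = sum(cases.values())
    n_ctrl = sum(controls.values())
    threshold = n_case / n_ctrl  # overall case:control ratio
    # Compare cases >= threshold * controls to avoid dividing by zero
    # for cells that contain no controls.
    return {
        cell for cell in set(cases) | set(controls)
        if cases[cell] >= threshold * controls[cell]
    }


def balanced_accuracy(snp_a, snp_b, is_case, high_risk):
    # Predict "case" for individuals whose genotype cell is high-risk.
    rows = list(zip(snp_a, snp_b, is_case))
    tp = sum(1 for a, b, c in rows if c and (a, b) in high_risk)
    tn = sum(1 for a, b, c in rows if not c and (a, b) not in high_risk)
    n_case = sum(1 for c in is_case if c)
    n_ctrl = len(is_case) - n_case
    return 0.5 * (tp / n_case + tn / n_ctrl)


# XOR-like toy data: cases are (0,1) or (1,0); controls are (0,0) or (1,1).
snp_a = [0, 1, 0, 1, 0, 1, 0, 1]
snp_b = [1, 0, 1, 0, 0, 1, 0, 1]
is_case = [True] * 4 + [False] * 4
high = mdr_high_risk_cells(snp_a, snp_b, is_case)
print(sorted(high), balanced_accuracy(snp_a, snp_b, is_case, high))
```

In the toy data each genotype at either SNP occurs in two cases and two controls, so neither SNP has a marginal effect, yet the two-locus labeling classifies every individual correctly; this is the "interaction without main effects" situation the abstract mentions.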
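The 08/2011 Genetic Epidemiology abstract (HWES) compares against the traditional χ² test for HWE. This sketch implements that traditional Pearson χ² test (1 degree of freedom) for a biallelic marker, not HWES, and uses it to reproduce the pooling problem the abstract describes: two subpopulations that are each in exact HWE, with allele frequencies 0.9 and 0.1 (1,000 individuals each; the counts are chosen for illustration), show a strong heterozygote deficit once pooled.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic (1 df) for HWE at a biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1df_pvalue(stat: float) -> float:
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Two subpopulations, each in exact HWE (allele A frequency 0.9 and 0.1).
sub1 = (810, 180, 10)
sub2 = (10, 180, 810)
pooled = tuple(a + b for a, b in zip(sub1, sub2))  # (820, 360, 820)
for label, counts in [("sub1", sub1), ("sub2", sub2), ("pooled", pooled)]:
    stat = hwe_chi_square(*counts)
    print(label, round(stat, 3), chi2_1df_pvalue(stat))
```

Each subpopulation gives a statistic of essentially zero, while the pooled sample gives about 819.2: exactly the stratification-induced departure that a structure-aware test is meant to separate from genotyping error.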
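The 07/2011 Genetic Epidemiology abstract lists the Cochran-Armitage trend test among the tests that the score test based on the generalized linear model includes as special cases. This is a minimal sketch of the trend test for a 2 × k genotype table, assuming additive scores (0, 1, 2): the statistic T = Σ_i x_i (r_i S - s_i R), where r_i and s_i are the case and control counts in category i and R and S their totals, is squared and divided by its null variance. The resulting χ² statistic equals N r², where r is the Pearson correlation between genotype score and case status, which gives a convenient cross-check. The example counts are invented for illustration.

```python
import math


def cochran_armitage_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (chi-square, 1 df) for a 2 x k table.

    cases[i] and controls[i] are counts in genotype category i; the default
    scores (0, 1, 2) correspond to an additive coding of the minor allele.
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    col = [r + s for r, s in zip(cases, controls)]
    t = sum(x * (r * n_ctrl - s * n_case) for x, r, s in zip(scores, cases, controls))
    # Null variance: (R*S/N) * [sum x_i^2 C_i (N - C_i) - 2 sum_{i<j} x_i x_j C_i C_j],
    # where the bracket equals N * sum x_i^2 C_i - (sum x_i C_i)^2.
    sum_x = sum(x * c for x, c in zip(scores, col))
    sum_x2 = sum(x * x * c for x, c in zip(scores, col))
    var_t = n_case * n_ctrl / n * (n * sum_x2 - sum_x ** 2)
    if var_t == 0:
        raise ValueError("degenerate table: no cases, no controls, or one genotype score")
    return t * t / var_t


def chi2_1df_pvalue(stat):
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Illustrative counts for genotypes carrying 0, 1 and 2 minor alleles.
stat = cochran_armitage_chi2(cases=[20, 50, 30], controls=[40, 45, 15])
print(stat, chi2_1df_pvalue(stat))
```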
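Several entries (Annals of Human Genetics 07/2010, BMC Medical Genetics 10/2009, and BMC proceedings 01/2009; 3 Suppl 7:S26) share a screen-then-test design for two-locus searches. This generic sketch assumes caller-supplied functions for the single-marker p-value and for the per-model two-locus p-values (both hypothetical placeholders, not the authors' tests), and applies a Bonferroni threshold over all pair-model tests; the abstracts do not say exactly how their multiple-testing adjustment was done, so that choice is an assumption.

```python
from itertools import combinations
from math import comb


def two_stage_pair_scan(snps, single_marker_p, pair_model_ps, top_k, alpha=0.05):
    """Hypothetical two-stage scan for two-locus joint effects.

    Stage 1 ranks SNPs by single_marker_p(snp) and keeps the top_k.
    Stage 2 calls pair_model_ps(snp1, snp2), which must return one p-value per
    two-locus model, for every pair of kept SNPs, then applies a Bonferroni
    threshold over all pair-model tests.
    """
    selected = sorted(snps, key=single_marker_p)[:top_k]
    pair_ps = {pair: list(pair_model_ps(*pair)) for pair in combinations(selected, 2)}
    n_tests = sum(len(ps) for ps in pair_ps.values())
    if n_tests == 0:
        return [], 0
    threshold = alpha / n_tests
    hits = [
        (a, b, model_index, p)
        for (a, b), ps in pair_ps.items()
        for model_index, p in enumerate(ps)
        if p < threshold
    ]
    return hits, n_tests


# Stage-2 workload for 500 retained SNPs and 17 models versus an exhaustive pair search.
print(comb(500, 2) * 17)  # 2120750 pair-model tests
print(comb(531_689, 2))   # number of SNP pairs in an exhaustive search
```

With the GAW16 numbers from the last entry, stage 2 runs 2,120,750 pair-model tests, compared with roughly 1.4 × 10^11 pairs for an exhaustive search over all 531,689 SNPs under even a single model, which is the computational and multiple-testing saving those abstracts describe.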