# Nan M. Laird's research while affiliated with Harvard University and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.


## Publications (289)

BACKGROUND: The association between genetic variants on the X chromosome and risk of COPD has not been fully explored. We hypothesize that the X chromosome harbors variants important in determining risk of COPD-related phenotypes and may drive sex differences in COPD manifestations. METHODS: Using X chromosome data from three COPD-enriched cohorts o...

SARS‐CoV‐2 mortality has been extensively studied in relation to host susceptibility. How sequence variations in the SARS‐CoV‐2 genome affect pathogenicity is poorly understood. Starting in October 2020, using the methodology of genome‐wide association studies (GWAS), we looked at the association between whole‐genome sequencing (WGS) data of the vi...

Genetic variation in the viral genome sequence may contribute to the increased COVID-19 mortality. Although biological follow-up experiments are needed for functional validation, early containment of highly pathogenic viral strains during a pandemic may require early intervention when extreme biostatistical associations are identified.

Motivation
Analysis of rare variants in family-based studies remains a challenge. Transmission-based approaches provide robustness against population stratification, but the evaluation of the significance of test statistics based on asymptotic theory can be imprecise. In addition, power will depend heavily on the choice of the test statistic and on...

Background
SARS-CoV-2 mortality has been extensively studied in relation to host susceptibility. How sequence variations in the SARS-CoV-2 genome affect pathogenicity is poorly understood. Whole-genome sequencing (WGS) of the virus with death in SARS-CoV-2 patients is one potential method of early identification of highly pathogenic strains to targ...

Analysis of rare variants in family-based studies remains a challenge. To perform a region/set-based association analysis of rare variants in family-based studies, we propose a general methodological framework that integrates higher criticism, maximum, SKATs, and burden approaches into the family-based association testing (FBAT) framework. Using th...
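The higher criticism component named above can be illustrated on its own. The Python sketch below computes a Donoho–Jin style higher criticism statistic from a list of per-variant p-values. It does not reproduce the FBAT integration described in the abstract. The function name `higher_criticism` and the scanning fraction `alpha0 = 0.5` are illustrative assumptions.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Donoho-Jin style higher criticism statistic for a set of p-values.

    Scans the smallest alpha0 * n sorted p-values and returns the largest
    standardized gap between the expected fraction i/n and p_(i).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    p_sorted = sorted(pvalues)
    k_max = max(1, int(alpha0 * n))  # hypothetical choice: scan the lower half
    best = -math.inf
    for i in range(1, k_max + 1):
        p = p_sorted[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at the boundaries
        hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc)
    return best


# A region with a few very small p-values versus one with evenly spread p-values.
hc_signal = higher_criticism([0.001, 0.02, 0.3, 0.5, 0.7, 0.9])
hc_null = higher_criticism([0.2, 0.35, 0.5, 0.65, 0.8, 0.95])
```

Large values indicate more small p-values than expected under the null. This makes the statistic sensitive to sparse signals, where only a few variants in a region are causal.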

This chapter discusses two alternative approaches for handling missing data: multiple imputation and weighting methods. Both approaches are appealing in settings where a conventional likelihood based analysis is no longer straightforward. Multiple imputation is a flexible method for handling missing data that has recently been implemented in numero...
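Results from multiple imputation are commonly pooled with Rubin's rules. The pooled estimate is the average of the m completed-data estimates. Its total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The minimal Python sketch below handles a scalar parameter. The function name `rubin_combine` is hypothetical, and the chapter may present the rules in a more general, vector-valued form.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool a scalar estimate across m imputed datasets using Rubin's rules.

    estimates: completed-data point estimates Q_1..Q_m
    variances: their estimated sampling variances U_1..U_m
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching variances")
    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (denominator m - 1)
    total = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, total


pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```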

This chapter reviews the main features of linear fixed effects models for longitudinal data and discusses their potential advantages and disadvantages relative to linear mixed effects models. It highlights the key differences between the two modeling approaches using a numerical illustration. The chapter discusses a mixed effects model for longitud...

This chapter reviews three general models for missing data that differ in terms of assumptions concerning whether missingness is related to observed and unobserved responses. It discusses the implicit assumptions about missing data that underlie the methods for longitudinal analysis. The chapter illustrates the main distinctions between the three g...

This chapter considers the design of a longitudinal study and focuses on the determination of sample size and power for longitudinal studies. It reviews sample size formulas for a univariate continuous response in a cross‐sectional study design. The chapter emphasizes the main considerations in determining how large the sample size needs to be to a...
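For the cross-sectional starting point the chapter reviews, a standard normal-approximation formula covers two group means with equal allocation: n per group = 2(z_{1-α/2} + z_{1-β})² σ² / Δ². The sketch below assumes a known common standard deviation σ and a two-sided test. It does not cover the longitudinal extensions, which also depend on the number of occasions and the within-subject correlation.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    n = 2.0 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to a whole number of subjects


n = n_per_group(delta=0.5, sigma=1.0)  # effect of half a standard deviation
```

With these inputs the formula gives about 62.8, which rounds up to 63 subjects per group.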

This chapter introduces some vector and matrix notation and presents a general linear regression model for longitudinal data. It considers some elementary descriptive methods for exploring longitudinal data, especially trends in the mean response over time. With longitudinal data, the covariance among the repeated measures can be expected to have ce...

This chapter presents a method for analyzing longitudinal data that imposes minimal structure or restrictions on the mean response over time and on the covariance among the repeated measures. The method focuses on analyzing response profiles and can be applied to longitudinal data when the design is balanced, with the timing of the repeated measure...

This chapter describes how generalized linear models can be extended to longitudinal data. These models are known as generalized linear mixed effects models (GLMMs). The chapter examines first the linear mixed effects model as a generalized linear model, albeit one with both fixed and random effects, and then looks at how the ideas underlying the l...

This chapter views the regression paradigm as a very flexible and versatile approach for analyzing longitudinal and correlated data arising from many different types of studies. To highlight some of the distinctive features of longitudinal and clustered data, it introduces four examples drawn from studies in the biomedical sciences. They are treatm...

This chapter describes two broad approaches for describing patterns of change in the mean response over time: polynomial trends and linear splines. One widely adopted approach for analyzing longitudinal data is to describe the patterns of change in the mean response over time in terms of simple polynomial trends, for example, linear or quadratic tr...
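A linear spline with a single knot at time t* can be written as E(Y) = b0 + b1·t + b2·max(t - t*, 0). The slope is b1 before the knot and b1 + b2 after it. The NumPy sketch below builds that design matrix and recovers the coefficients from noise-free data. The knot location and the coefficients are made up for the example.

```python
import numpy as np


def linear_spline_design(t, knots):
    """Columns: intercept, t, and one truncated line (t - k)_+ per knot."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for k in knots:
        cols.append(np.maximum(t - k, 0.0))  # zero before the knot, linear after
    return np.column_stack(cols)


# Illustrative mean response: slope 2 up to t = 4, then slope 2 - 3 = -1.
t = np.arange(11.0)
y = 1.0 + 2.0 * t - 3.0 * np.maximum(t - 4.0, 0.0)
X = linear_spline_design(t, knots=[4.0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```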

This chapter discusses the various covariance pattern models for longitudinal data. It first considers some of the implications of the correlation among longitudinal data, before looking at approaches for modeling the covariance or correlation among repeated measures. The chief advantage of an "unstructured" covariance is that no assumptions are m...
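Two common covariance patterns can be written down directly: compound symmetry and first-order autoregressive, AR(1). Each uses two parameters. By contrast, an unstructured covariance for n occasions has n(n + 1)/2 parameters. The NumPy sketch below is illustrative, and its function names are hypothetical.

```python
import numpy as np


def compound_symmetry(n, sigma2, rho):
    """Equal variances sigma2 and equal correlation rho between any two occasions."""
    return sigma2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))


def ar1(n, sigma2, rho):
    """First-order autoregressive: correlation rho**|j - k| decays with the lag."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])


def unstructured_param_count(n):
    """An unstructured covariance has n variances and n(n - 1)/2 covariances."""
    return n * (n + 1) // 2


cs = compound_symmetry(3, sigma2=2.0, rho=0.5)
ar = ar1(3, sigma2=1.0, rho=0.5)
```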

This chapter provides a non‐technical summary of the most salient features of generalized linear models for a single, univariate response. It presents a detailed and somewhat more technical overview of generalized linear models. Generalized linear models provide a unified method for analyzing diverse types of univariate responses. Generalized linea...

This chapter considers an approach for extending generalized linear models to longitudinal data that leads to a class of regression models that are known as marginal models. The term marginal in this context indicates that the model for the mean response depends only on the covariates of interest, and not on any random effects or previous responses...

The induced random effects covariance structure can often be described with relatively few parameters, regardless of the number and timing of the measurement occasions. Because linear mixed effects models explicitly distinguish between fixed and random effects, they allow the analysis of between‐subject and within‐subject sources of variation in th...
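Consider a model with a random intercept and a random slope for time. Its induced covariance is Cov(Y_i) = Z_i G Z_iᵀ + σ²I, where the rows of Z_i are (1, t_ij). The sketch below uses illustrative values for G and σ². Whatever the number of occasions, the structure is governed by the same four parameters: three in G plus σ².

```python
import numpy as np


def induced_covariance(times, G, sigma2):
    """Marginal covariance Z G Z^T + sigma2 * I for a random intercept-and-slope model."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])  # random-effects design: intercept, time
    return Z @ np.asarray(G, dtype=float) @ Z.T + sigma2 * np.eye(len(t))


G = [[4.0, 1.0], [1.0, 0.5]]  # illustrative: var(intercept), covariance, var(slope)
Sigma = induced_covariance([0.0, 1.0, 2.0], G, sigma2=1.0)
```

Here the variance grows from 5 at t = 0 to 11 at t = 2. Non-constant variance over time is one feature that random slopes induce.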

This chapter presents a broad overview of the main objectives of longitudinal analysis and some of the defining features of longitudinal data. It emphasizes that the major focus of the analysis of longitudinal data is on the assessment of within‐individual changes in the response variable over time. The chapter reviews the most salient features of...

This chapter examines a framework for estimation of the unknown parameters, β and θ, and begins by considering the maximum likelihood (ML) method of estimation in a simpler case where all observations can be assumed to be independent. The least squares estimate derived is the value produced by any standard statistical software for linear regressi...
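When the covariance matrices Σ_i are known, the ML estimator of β in the normal linear model for longitudinal data is the generalized least squares estimator β̂ = (Σ_i X_iᵀ Σ_i⁻¹ X_i)⁻¹ Σ_i X_iᵀ Σ_i⁻¹ y_i. With Σ_i = σ²I it reduces to ordinary least squares. The sketch below treats Σ_i as known, which is a simplifying assumption; in practice the covariance parameters are estimated as well. The data are simulated and purely illustrative.

```python
import numpy as np


def gls_estimate(X_list, y_list, Sigma_list):
    """GLS estimate of beta for subjects i with design X_i, response y_i, covariance Sigma_i."""
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y, Sigma in zip(X_list, y_list, Sigma_list):
        Sinv_X = np.linalg.solve(Sigma, X)  # Sigma_i^{-1} X_i without forming the inverse
        lhs += X.T @ Sinv_X                 # accumulate X_i^T Sigma_i^{-1} X_i
        rhs += Sinv_X.T @ y                 # X_i^T Sigma_i^{-1} y_i (Sigma_i is symmetric)
    return np.linalg.solve(lhs, rhs)


# Illustrative data: 5 subjects, 4 occasions, true intercept 10 and slope -0.5.
rng = np.random.default_rng(0)
times = np.array([0.0, 1.0, 2.0, 3.0])
X_i = np.column_stack([np.ones(4), times])
X_list = [X_i] * 5
y_list = [X_i @ np.array([10.0, -0.5]) + rng.normal(size=4) for _ in range(5)]
Sigma_list = [np.eye(4)] * 5  # with Sigma_i = I, GLS coincides with OLS
beta_hat = gls_estimate(X_list, y_list, Sigma_list)
```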

Marginal models have a three‐part specification in terms of a regression model for the mean response, supplemented by assumptions concerning the variance of the response at each occasion and the pairwise within‐subject association among the responses. The generalized estimating equations (GEE) approach provides a convenient alternative to maximum l...

This chapter focuses on alternative methods of estimation and inference for generalized linear mixed effects models (GLMMs). It reviews two approximate methods of estimation known as penalized quasi‐likelihood (PQL) and marginal quasi‐likelihood (MQL). Both of these approximate methods have been implemented in various statistical software packages a...

This chapter describes regression models for multilevel data, of which longitudinal data are a special case, with only a single level of clustering and a natural ordering of the measurements within a cluster. It demonstrates that many of the methods for the analysis of longitudinal data considered are, more or less...

Residuals can be used to assess the adequacy of the fitted model and can also indicate the presence of outliers. Methods for residual analyses are well developed for standard regression settings with independent observations on a univariate response. Although residuals from a univariate linear regression are uncorrelated with the covariates, the res...
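Residuals from a longitudinal model are correlated within subjects. One standard remedy is to decorrelate them with the Cholesky factor of the model-based covariance. If Σ_i = L_i L_iᵀ, the transformed residuals L_i⁻¹ r_i have identity covariance when the model is correct, ignoring the estimation of β. The Python sketch below shows this one common transformation, which is not necessarily the one used in the chapter. Its inputs are illustrative.

```python
import numpy as np


def transformed_residuals(y, X, beta, Sigma):
    """Cholesky-transformed residuals L^{-1}(y - X beta), where Sigma = L L^T."""
    r = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))  # lower-triangular factor
    return np.linalg.solve(L, r)  # solves L r_star = r


Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
X = np.ones((2, 1))
r_star = transformed_residuals([1.0, 2.0], X, [0.0], Sigma)
```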

This chapter examines semiparametric regression techniques, referred as "smoothing" methods, which do not require strong assumptions concerning the functional form of the pattern of change in the mean response. It focuses on a class of methods referred to as "penalized splines." There is a mixed effects model representation of penalized spline mode...

This chapter focuses on the comparison and differentiation of marginal and mixed effects models for longitudinal data. There are a number of important distinctions between these two broad classes of models that go beyond simple differences in approaches to accounting for the within‐subject association. The chapter emphasizes that these two classes...

This chapter discusses the application of methods for longitudinal data to closely related study designs. In these settings individuals have multiple commensurate measurements made under different circumstances and possibly also at different times. The first design that is considered is the classical repeated measures design. In this setting each s...

The authors surveyed the types of statistical methods used in the Journal during 2015. Investigators used a greater variety of statistical methods, and the methods were more sophisticated than those used in earlier epochs.

Whole-exome sequencing using family data has identified rare coding variants in Mendelian diseases or complex diseases with Mendelian subtypes, using filters based on variant novelty, functionality, and segregation with the phenotype within families. However, formal statistical approaches are limited. We propose a gene-based segregation test (GESE)...

Rationale:
Emphysema has considerable variability in the severity and distribution of parenchymal destruction throughout the lungs. Upper lobe predominant emphysema has emerged as an important predictor of response to lung volume reduction surgery. Yet, aside from alpha-1 antitrypsin deficiency, the genetic determinants of emphysema distribution r...

Rationale:
Chronic obstructive pulmonary disease (COPD) susceptibility is in part related to genetic variants. Most genetic studies focus on genome-wide common variants without specific focus on coding variants, but common and rare coding variants may also affect COPD susceptibility.
Objectives:
To identify coding variants associated with COPD....

The focus of this article is on regression models for a binary response. In most fields of application, logistic regression has become the standard method for relating a binary response to a set of covariates. In this article the main features of logistic regression are described and some aspects of interpretation of logistic regression are illustr...
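Logistic regression models log[p/(1 - p)] = xᵀβ. The maximum likelihood estimate is usually computed by Newton–Raphson, which for this model is the same as iteratively reweighted least squares. The NumPy sketch below assumes the data show no complete separation. With a single binary covariate, the fitted slope equals the log of the sample odds ratio, which gives an easy check.

```python
import numpy as np


def logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Maximum likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        w = mu * (1.0 - mu)                      # binomial variance weights
        grad = X.T @ (y - mu)                    # score vector
        hess = X.T @ (X * w[:, None])            # observed = expected information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Illustrative data: 3/10 events when x = 0, 6/10 events when x = 1.
x = np.array([0.0] * 10 + [1.0] * 10)
y = np.array([1.0] * 3 + [0.0] * 7 + [1.0] * 6 + [0.0] * 4)
X = np.column_stack([np.ones_like(x), x])
beta = logistic_irls(X, y)
```

Here exp(beta[1]) is the odds ratio (6/4)/(3/7) = 3.5. This is the interpretation of a logistic slope for a binary covariate.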

The focus of this article is on a class of regression models that are widely used for the analysis of cluster-correlated data. Linear mixed models are a natural extension of classical linear regression models that allow for the incorporation of random effects to account for the correlation among repeated measures on the same individual or cluster....

Pulmonary function decline is a major contributor to morbidity and mortality among smokers. Post-bronchodilator FEV1 and FEV1/FVC ratio are considered the standard assessment of airflow obstruction. We performed a genome-wide association study (GWAS) in 9919 current and former smokers in the COPDGene study (6659 non-Hispanic Whites [NHW] and 326...

Many correlated disease variables are analyzed jointly in genetic studies in the hope of increasing power to detect causal genetic variants. One approach involves assessing the relationship between each phenotype and each SNP individually and using a Bonferroni correction for the effective number of tests conducted. Alternatively, one can apply a m...
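A Bonferroni correction for the effective number of tests needs an estimate of that number. One widely used estimate, due to Nyholt (2004), uses the eigenvalues λ of the correlation matrix of the M tested variables: M_eff = 1 + (M - 1)(1 - Var(λ)/M). The sketch below applies it to a made-up phenotype correlation matrix. The abstract does not say which estimator the article uses, so this choice is an assumption.

```python
import numpy as np


def nyholt_meff(corr):
    """Nyholt-style effective number of tests from a correlation matrix (M >= 2)."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    eigvals = np.linalg.eigvalsh(corr)  # real eigenvalues of a symmetric matrix
    return 1.0 + (m - 1) * (1.0 - np.var(eigvals, ddof=1) / m)


def bonferroni_threshold(alpha, corr):
    """Per-test significance level alpha / M_eff."""
    return alpha / nyholt_meff(corr)


# Hypothetical correlation among three phenotypes.
corr = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
threshold = bonferroni_threshold(0.05, corr)
```

Uncorrelated phenotypes give M_eff = M, and perfectly correlated ones give M_eff = 1. The resulting threshold therefore lies between α/M and α.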

Chronic obstructive pulmonary disease (COPD) is defined by the presence of airflow limitation on spirometry, yet COPD subjects can have marked differences in CT imaging. These differences may be driven by genetic factors. We hypothesized that a genome-wide association study of quantitative imaging would identify loci not previously identified in an...

The revolution in next-generation sequencing has made obtaining both common and rare high-quality sequence variants across the entire genome feasible. Because researchers are now faced with the analytical challenges of handling a massive amount of genetic variant information from sequencing studies, numerous methods have been developed to assess th...

Background
The genetic risk factors for susceptibility to chronic obstructive pulmonary disease (COPD) are still largely unknown. Additional genetic variants are likely to be identified by genome-wide association studies in larger cohorts or specific subgroups. We sought to identify risk loci for moderate to severe and severe COPD with data from se...

Rationale Previous studies of chronic obstructive pulmonary disease (COPD) have suggested that genetic factors play an important role in the development of disease. However, SNPs that are associated with COPD in GWAS studies have been shown to account for only a small percentage of the genetic variance in phenotypes of COPD, such as spirometry and...

Background
We previously reported that asthmatic children with GSTM1 null genotype may be more susceptible to the acute effect of ozone on the small airways and might benefit from antioxidant supplementation. This study aims to assess the acute effect of ozone on lung function (FEF25-75) in asthmatic children according to dietary intake of vitamin...

Creation of genotype score by counting the number of risk alleles. Table S2. Basal characteristics of the study population. Table S3. Air pollution levels during the study from the Mexico City monitoring network, 1998-2004. Table S4. Effect of ozone on FEF25–75 (per 1-hr 60 ppb on the day prior to spirometric test) according to genotype. Table S5....

Longitudinal data modeling
Data collected in a longitudinal study consist of repeated measures made on individuals over time. In this chapter we show how longitudinal data can be put into the multilevel framework, and we also highlight important distinctions between longitudinal and general multilevel data. Longitudinal data can be considered a spe...

Anorexia nervosa and bulimia nervosa (BN) are rare, but eating disorders not otherwise specified (EDNOS) are relatively common among female participants. Our objective was to evaluate whether BN and subtypes of EDNOS are predictive of developing adverse outcomes.
This study comprised a prospective analysis of 8594 female participants from the ongoi...

Over the past few years, association analysis has become the primary tool for finding genes that underlie complex traits. Both population-based and family-based designs are commonly used designs in genetic association studies. Recent technological advances in exome and whole genome sequencing afford the next generation of sequence-based association...

Recent advances in next-generation sequencing technologies have made it possible to generate large amounts of sequence data with rare variants in a cost-effective way. Statistical methods that test variants individually are underpowered to detect rare variants, so it is desirable to perform association analysis of rare variants by combining the inf...
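One simple way of combining information across rare variants is collapsing. Each individual is scored as a carrier if they carry a minor allele at any rare variant in the region, and carrier frequency is then compared between cases and controls. The Python sketch below applies a Pearson chi-square test to the resulting 2 × 2 table. It is a generic illustration of collapsing, not the method proposed in the abstract. The genotype coding (minor-allele counts 0/1/2 per variant) and the data are assumptions.

```python
import math


def collapsing_test(case_genotypes, control_genotypes):
    """Carrier-based collapsing test; each row holds minor-allele counts per rare variant."""
    a = sum(1 for g in case_genotypes if any(x > 0 for x in g))     # carrier cases
    b = len(case_genotypes) - a                                     # non-carrier cases
    c = sum(1 for g in control_genotypes if any(x > 0 for x in g))  # carrier controls
    d = len(control_genotypes) - c                                  # non-carrier controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


cases = [[1, 0, 0]] * 6 + [[0, 0, 0]] * 4
controls = [[0, 1, 0]] + [[0, 0, 0]] * 9
chi2, p = collapsing_test(cases, controls)
```

With counts this small, Fisher's exact test would usually be preferred to the chi-square approximation. The point here is only the collapsing step.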

Identifying population stratification and genotyping error is important for candidate gene association studies using the Transmission Disequilibrium Test (TDT). Although the TDT retains the prespecified Type I error in the presence of population stratification, the test may have decreased power in the presence of population stratification. Genotyp...
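For reference, the classical TDT uses only heterozygous parents. Suppose b of them transmit allele 1 to an affected child and c transmit allele 2. The McNemar-type statistic (b - c)²/(b + c) is then compared with a chi-square distribution with one degree of freedom. A minimal Python sketch follows. The effects of stratification and genotyping error on b and c, the subject of the abstract, are not modeled here.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test from heterozygous-parent transmission counts.

    b: heterozygous parents transmitting allele 1 to an affected child
    c: heterozygous parents transmitting allele 2
    Returns (chi-square statistic with 1 df, p-value).
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return chi2, p


chi2, p = tdt(30, 15)
```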

An understanding of the basic ideas of inheritance has been evident throughout the history of mankind, ever since the domestication of animals or the practice of farming began. The Babylonians and ancient Egyptians utilized cross-pollination of crops and selection of domesticated animals for breeding, but did not develop a formal theory for the pri...

In the absence of genetic data at the molecular level, the results of heritability, aggregation and/or segregation analysis provided the first hints about the presence of genetic effects and, consequently, the existence of a disease gene. Without information on the etiology of the disease or gene functionality, the next natural question is: ‘Where...

The study of allele frequencies and how they vary over time and over geographic regions has led to many discoveries concerning evolutionary history, migration, gene flow, and the correlation between allele frequencies and disease rates across populations. This chapter covers only a few concepts from population genetics, emphasizing those most relev...

In clinical trials multiple outcomes are often used to assess treatment interventions. This paper presents an evaluation of likelihood-based methods for jointly testing treatment effects in clinical trials with multiple continuous outcomes. Specifically, we compare the power of joint tests of treatment effects obtained from joint models for the mul...

Issues of multiple-testing and statistical significance in genomewide association studies (GWAS) have prompted statistical methods utilizing prior data to increase the power of association results. Using prior findings from genome-wide linkage studies on bipolar disorder (BPD), we employed a weighted false discovery approach (wFDR; [Roeder et al. 2...
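The weighting idea can be illustrated with a p-value-weighted Benjamini–Hochberg procedure. Each p-value is divided by a nonnegative weight, the weights average one, and the usual step-up rule is applied to the weighted p-values. Up-weighting markers under prior linkage peaks makes them easier to reject. The Python sketch below assumes the weights are already given; how they are derived from the linkage results is not shown. The function name and data are hypothetical.

```python
import math


def weighted_bh(pvalues, weights, q=0.05):
    """Indices rejected by a p-value-weighted Benjamini-Hochberg procedure at FDR level q."""
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("need one weight per p-value")
    w_mean = sum(weights) / m
    if w_mean <= 0:
        raise ValueError("weights must have a positive mean")
    w = [wi / w_mean for wi in weights]  # rescale so the weights average 1
    weighted = [p / wi if wi > 0 else math.inf for p, wi in zip(pvalues, w)]
    order = sorted(range(m), key=lambda i: weighted[i])
    k = 0  # largest rank whose weighted p-value is under the BH line
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


p = [0.04, 0.03, 0.9, 0.8]
unweighted = weighted_bh(p, [1, 1, 1, 1])
prior_weighted = weighted_bh(p, [4, 0, 0, 0])  # all prior weight on the first marker
```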

The popularity of the EM algorithm owes much to the 1977 paper by Dempster, Laird and Rubin. That paper gave the algorithm its name, identified the general form and some key properties of the algorithm and established its broad applicability in scientific research. This review gives a nontechnical introduction to the algorithm for a general scienti...
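A classic small example of EM, used in the 1977 paper, is the genetic linkage multinomial. It has 197 animals in four categories (125, 18, 20, 34), with cell probabilities 1/2 + θ/4, (1 - θ)/4, (1 - θ)/4 and θ/4. Splitting the first cell into unobserved parts with probabilities 1/2 and θ/4 gives a complete-data problem with a closed-form MLE. The Python sketch below iterates the E- and M-steps. The starting value and tolerance are arbitrary choices.

```python
def em_linkage(y=(125, 18, 20, 34), theta=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of theta for the four-category genetic linkage multinomial."""
    y1, y2, y3, y4 = y
    for _ in range(max_iter):
        # E-step: expected part of the first cell that comes from the theta/4 component
        x12 = y1 * (theta / 4.0) / (0.5 + theta / 4.0)
        # M-step: the complete-data MLE is a binomial proportion
        new_theta = (x12 + y4) / (x12 + y2 + y3 + y4)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


theta_hat = em_linkage()
```

The iterates converge to approximately 0.627. This is the value at which the observed-data score 125/(2 + θ) - 38/(1 - θ) + 34/θ is zero.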

Investigators sometimes use information obtained from multiple informants about a given variable. We focus on estimating the effect of a predictor on a continuous outcome, when that (true) predictor cannot be observed directly but is measured by 2 informants. We describe various approaches to using information from 2 informants to estimate a regres...

It is useful to have robust gene-environment interaction tests that can utilize a variety of family structures in an efficient way. This article focuses on tests for gene-environment interaction in the presence of main genetic and environmental effects. The objective is to develop powerful tests that can combine trio data with parental genotypes an...

Simulation model 1.

Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequenc...

Capitalizing on the presence of both risk and protective variants.

Relationship between odds ratios and frequencies for the simulated scenarios.

Power estimates when individual variants' PAR are sampled from an exponential distribution.

Type-1 diabetes dataset.

Statistical tests of association are commonly used to confirm or exclude a relationship between disease and selected genes. Tests which are based on data from unrelated individuals are very popular, but can be biased if the sample contains individuals with different genetic ancestries. Family-based Association tests avoid the problem of bias due to...
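The core family-based idea can be sketched for the simplest design: trios with both parents genotyped. Conditioning on parental genotypes, the offspring genotype X has a known null distribution under Mendelian transmission. Hence U = Σ T_j(X_j - E[X_j | parents]) has mean zero regardless of population structure. Z = U/√V, with V = Σ T_j² Var(X_j | parents), is approximately standard normal. The Python sketch below assumes additive coding, Mendelian-consistent genotypes and complete trios. The function name, the tuple order and the data are hypothetical.

```python
import math


def fbat_trios(trios, trait=None):
    """Single-marker FBAT-style statistic for trios under additive coding.

    trios: (father, mother, child) minor-allele counts (0/1/2), parents genotyped.
    trait: offspring coding T_j = Y_j - mu; defaults to 1 for every (affected) child.
    Returns (U, V, Z).
    """
    if trait is None:
        trait = [1.0] * len(trios)
    u = v = 0.0
    for (g_f, g_m, x), t in zip(trios, trait):
        expected = (g_f + g_m) / 2.0  # each parent transmits the allele with probability g/2
        # Only heterozygous parents (g = 1) contribute variance, 1/4 each.
        var = sum((g / 2.0) * (1.0 - g / 2.0) for g in (g_f, g_m))
        u += t * (x - expected)
        v += t * t * var
    if v == 0.0:
        raise ValueError("no informative (heterozygous-parent) trios")
    return u, v, u / math.sqrt(v)


trios = [(1, 0, 1), (1, 0, 0), (1, 1, 2), (1, 1, 1), (0, 2, 1)]
u, v, z = fbat_trios(trios)
```

The last trio, whose parents are both homozygous, contributes nothing. For affected-only trios, Z² equals the TDT statistic (b - c)²/(b + c).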

In this chapter we review specialized and advanced topics that are beyond the scope of what can be covered in detail in an introductory textbook. However, the topics are important research areas, and the interested reader is encouraged to follow up our brief introduction with the specialized literature.

This book covers the statistical models and methods that are used to understand human genetics, following the historical and recent developments of human genetics. From Mendel's first experiments to genome-wide association studies, the book describes how genetic information can be incorporated into statistical models to discover disease gen...

The key requirement for genetic association, linkage disequilibrium (LD), is a short-range property that extends only for a limited physical distance across the human genome. As we showed in Chapter 7, if there is low LD between the genotyped marker and the DSL, there will be low power to detect association between the disease and the DSL. In th...
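The two usual pairwise LD measures between biallelic loci are D' and r². With haplotype frequency p_AB and allele frequencies p_A and p_B, D = p_AB - p_A·p_B. D' scales D by its maximum attainable magnitude given the allele frequencies, and r² = D² / [p_A(1 - p_A)p_B(1 - p_B)]. The Python sketch below assumes the haplotype frequencies are known rather than estimated from unphased genotypes.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci from known frequencies."""
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max  # |D'|, as usually reported
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


d, d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.2, p_b=0.5)
```

In this example D' = 1 while r² = 0.25. Complete LD by D' can coexist with modest r² when allele frequencies differ, which is why r² is the measure usually tied to the power to detect association at the DSL through a marker.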

Compositional epistasis is said to be present when the effect of a genetic factor at one locus is masked by a variant at another locus. Although such compositional epistasis is not equivalent to the presence of an interaction in a statistical model, non-standard tests can sometimes be used to detect compositional epistasis. In this paper we conside...

The goal of linkage analysis in human disease gene mapping is to assess whether an observed genetic marker locus is physically linked to the disease locus. This is equivalent to testing the null-hypothesis that the recombination fraction between the marker locus and the disease locus, θ, equals ½. In this case, we say the marker locus and the disea...
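For phase-known meioses with R recombinants among N informative meioses, the LOD score compares the likelihood at recombination fraction θ with the likelihood under the null θ = ½: LOD(θ) = log10[θ^R (1 - θ)^(N - R) / (½)^N]. The maximum over 0 ≤ θ ≤ ½ is at θ̂ = min(R/N, ½). The sketch below assumes phase-known, fully informative meioses. Real pedigrees usually require likelihood computations over unobserved phases and genotypes.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Phase-known LOD score at recombination fraction theta (0 <= theta <= 0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants

    def term(count, prob):
        # count * log10(prob), using the convention 0 * log(0) = 0
        if count == 0:
            return 0.0
        if prob == 0.0:
            return -math.inf
        return count * math.log10(prob)

    return (term(recombinants, theta) + term(nonrecombinants, 1.0 - theta)
            - meioses * math.log10(0.5))


def max_lod(recombinants, meioses):
    """Return (theta_hat, maximized LOD)."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


theta_hat, lod = max_lod(recombinants=2, meioses=20)
```

Two recombinants in 20 meioses give θ̂ = 0.1 and a maximum LOD of about 3.2.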

Genetic association studies using population-based designs have distinct features that make them an attractive approach for gene mapping. Similar to epidemiological studies, they typically use unrelated individuals. As a consequence, the study recruitment is relatively easy and the statistical analysis is straightforward to implement using standar...
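One standard analysis for a case-control design with genotype counts is the Cochran–Armitage trend test. With additive scores w = (0, 1, 2), it is equivalent to N·r², where r is the correlation between genotype score and case status. The Python sketch below assumes genotype counts are ordered by the number of risk alleles. The scores and data are illustrative.

```python
import math


def armitage_trend_test(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (chi-square with 1 df, p-value)."""
    n_i = [r + s for r, s in zip(case_counts, control_counts)]
    R = sum(case_counts)
    S = sum(control_counts)
    N = R + S
    sum_wr = sum(w * r for w, r in zip(scores, case_counts))
    sum_wn = sum(w * n for w, n in zip(scores, n_i))
    sum_w2n = sum(w * w * n for w, n in zip(scores, n_i))
    denom = R * S * (N * sum_w2n - sum_wn ** 2)
    if denom == 0:
        raise ValueError("need both cases and controls and more than one genotype class")
    chi2 = N * (N * sum_wr - R * sum_wn) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)


# Hypothetical counts for genotypes with 0, 1, 2 risk alleles.
chi2, p = armitage_trend_test((10, 20, 30), (30, 20, 10))
```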

Aggregation and heritability analyses are designed to show that diseases, or phenotypes more generally, have a genetic basis by investigating patterns of phenotypic correlation between relatives; segregation analysis is used to find support for a specific genetic model underlying the inheritance patterns observed in families. They all involve model...

The use of family data has a long history in genetics, for association as well as linkage and segregation. Perhaps the simplest and most intuitively obvious example involving association analysis is a study comparing the genotypes in cases with the genotypes in their unaffected siblings. By using an unaffected sibling as a control, we eliminate iss...

A genetic association analysis is not fundamentally different from any other statistical association analysis. The objective is to establish an association between two variables: a disease trait and a genetic marker. The disease trait can be dichotomous, a measured variable, such as lung function or a quantitative measure of obesity, or time-to-ons...

Our goal was to identify candidate polymorphisms that could influence overall survival (OS) in advanced non-small cell lung cancer (NSCLC) patients treated with carboplatin (CBDCA) and paclitaxel (PTX).
Chemotherapy-naïve stage IIIB or IV NSCLC patients treated with CBDCA (area under the curve = 6 mg/mL/min) and PTX (200 mg/m², 3-hour period) were e...

In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. We describe how the auxiliary informati...

Genome-Wide Association Studies (GWAS) offer an exciting and promising new research avenue for finding genes for complex diseases. Traditional case-control and cohort studies offer many advantages for such designs. Family-based association designs have long been attractive for their robustness properties, but robustness can mean a loss of power. In...

Longitudinal studies are an important tool for analysing traits that change over time, depending on individual characteristics and environmental exposures. Complex quantitative traits, such as lung function, may change over time and appear to depend on genetic and environmental factors, as well as on potential gene-environment interactions. There i...

Inter-individual variations in drug response are all too common and, throughout medical history, have often posed problems, many of them serious. The variations could stem from multiple factors, which include those of both the host (age, genetic and environmental factors) and disease (pathophysiological phenotypes, somatic mutations in case of...

The recent emergence of massively parallel sequencing technologies has enabled an increasing number of human genome re-sequencing studies, notable among them being the 1000 Genomes Project. The main aim of these studies is to identify the yet unknown genetic variants in a genomic region, mostly low frequency variants (frequency less than 5%). We pr...

When testing for genetic effects, failure to account for a gene-environment interaction can mask the true association effects of a genetic marker with disease. Family-based association tests are popular because they are completely robust to population substructure and model misspecification. However, when testing for an interaction, failure to mode...