# Stephen W. Raudenbush's research while affiliated with University of Chicago and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (158)

In 2003, Chicago Public Schools introduced double-dose algebra, requiring two periods of math—one period of algebra and one of algebra support—for incoming ninth graders with eighth-grade math scores below the national median. Using a regression discontinuity design, earlier studies showed promising results from the program: For median-skill studen...

Early linguistic input is a powerful predictor of children’s language outcomes. We investigated two novel questions about this relationship: Does the impact of language input vary over time, and does the impact of time-varying language input on child outcomes differ for vocabulary and for syntax? Using methods from epidemiology to account for basel...

Social inequality in mathematical skill is apparent at kindergarten entry and persists during elementary school. To level the playing field, we trained teachers to assess children's numerical and spatial skills every 10 wk. Each assessment provided teachers with information about a child's growth trajectory on each skill, information designed to he...

For educational researchers, Al-Ubaydli et al. raise a crucial question: How can the science of scaling experimental innovations contribute to school improvement? By assessing how particular innovative programs work, why, for whom and under what conditions, experimenters test theories: about how children and youth learn, about how adults can collab...

Education research has experienced a methodological renaissance over the past two decades, with a new focus on large-scale randomized experiments. This wave of experiments has made education research an even more exciting area for statisticians, unearthing many lessons and challenges in experimental design, causal inference, and statistics more bro...

The present paper considers a fundamental question in evaluation research: “By how much do program effects vary across sites?” The paper first presents a theoretical model of cross-site impact variation and a related estimation model with a random treatment coefficient and fixed site-specific intercepts. This approach eliminates several biases that...

In 2003, Chicago launched “Double-Dose Algebra,” requiring students with pretest scores below the national median to take two periods of math: algebra and supplemental coursework. In many schools, assignment to Double Dose changed the peer composition of the algebra classroom. Using school-specific instrumental variables within a regression-disconti...

The present article provides a synthesis of the conceptual and statistical issues involved in using multisite randomized trials to learn about and from a distribution of heterogeneous program impacts across individuals and/or program sites. Learning about such a distribution involves estimating its mean value, detecting and quantifying its variatio...

Surveys of student perceptions produce multiple measures of classroom quality. This chapter explores which of these measures are most useful in predicting student learning. It introduces the Multilevel Variable Selection Model (MVSM) as standard methods of statistical prediction do not help much and can produce highly misleading results when the pr...

Does experience in school increase or reduce social inequality in skills? Sociologists have long debated this question. Drawing from the counterfactual account of causality, we propose that the impact of going to school on a given skill depends on the quality of the instructional regime a child will experience at school compared with the quality of...

The present paper, which is intended for a diverse audience of evaluation researchers, applied social scientists, and research funders, provides a broad overview of the conceptual and statistical issues involved in using multisite randomized trials to learn about and from variation in program effects across individuals, across policy-relevant and t...

We review findings from a four-year longitudinal study of language learning conducted on two samples: a sample of typically developing children whose parents vary substantially in socioeconomic status, and a sample of children with pre- or perinatal brain injury. This design enables us to study language development across a wide range of language l...

Differences in vocabulary that children bring with them to school can be traced back to the gestures they produced at the age of 1;2, which, in turn, can be traced back to the gestures their parents produced at the same age (Rowe & Goldin-Meadow, 2009a).
Rowe, M. L. & Goldin-Meadow, S. (2009a). Differences in early gesture explain SES disparities...

The increasing availability of data from multisite randomized trials provides a potential opportunity to use instrumental variables (IV) methods to study the effects of multiple hypothesized mediators of the effect of a treatment. We derive nine assumptions needed to identify the effects of multiple mediators when using site-by-treatment interactio...

Many youth development programs aim to improve youth outcomes by raising the quality of social interactions occurring in groups such as classrooms, athletic teams, therapy groups, after-school programs, or recreation centers. As a result, evaluators are increasingly interested in determining whether such programs significantly improve “group qualit...

Most causal analyses in the social sciences depend on the assumption that each participant possesses a single potential outcome under each possible treatment assignment. Rubin (J Am Stat Assoc 81:961–962, 1986) labeled this the “stable unit treatment value assumption” (SUTVA). Under SUTVA, the individual-specific impact of a treatment depends neith...

This article extends single-level missing data methods to efficient estimation of a Q-level nested hierarchical general linear model given ignorable missing data with a general missing pattern at any of the Q levels. The key idea is to reexpress a desired hierarchical model as the joint distribution of all variables including the outcome t...

Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting qual...

Multisite trials can clarify the average impact of a new program and the heterogeneity of impacts across sites. Unfortunately, in many applications, compliance with treatment assignment is imperfect. For these applications, we propose an instrumental variable (IV) model with person-specific and site-specific random coefficients. Site-specific IV co...

For decades, social scientists have been trying to answer causal questions about the effectiveness of certain programs or policies. The conventional methodology for answering such causal questions relies on the “no interference between different units” assumption; that is, a unit’s outcome depends solely on the treatment that the unit is assigned t...

Children vary widely in the rate at which they acquire words--some start slow and speed up, others start fast and continue at a steady pace. Do early developmental variations of this sort help predict vocabulary skill just prior to kindergarten entry? This longitudinal study starts by examining important predictors (socioeconomic status [SES], pare...

This article addresses three questions: Does reduced class size cause higher academic achievement in reading, mathematics, listening, and word recognition skills? If it does, how large are these effects? Does the magnitude of such effects vary significantly across schools? The authors analyze data from Tennessee’s Student/Teacher Achievement Ratio...

To provide a method for any hospital to evaluate patient mortality using a hierarchical risk-adjustment equation derived from a reference sample.
American College of Surgeons National Trauma Data Bank (NTDB).
Hierarchical logistic regression models predicting mortality were estimated from NTDB data. Risk-adjusted hospital effects obtained directly...

In organizational studies involving multiple levels, the association between a covariate and an outcome often differs at different levels of aggregation, giving rise to widespread interest in “contextual effects models.” Such models partition the regression into within- and between-cluster components. The conventional approach uses each cluster’s...

Fixed effects models are often useful in longitudinal studies when the goal is to assess the impact of teacher or school characteristics on student learning. In this article, I introduce an alternative procedure: adaptive centering with random effects. I show that this procedure can replicate the fixed effects analysis while offering several compar...

The ability of school (or teacher) value-added models to provide unbiased estimates of school (or teacher) effects rests on a set of assumptions. In this article, we identify six assumptions that are required so that the estimands of such models are well defined and the models are able to recover the desired parameters from observable data. These a...

This article examines the power analyses for the first wave of group-randomized trials funded by the Institute of Education Sciences. Specifically, it assesses the precision and technical accuracy of the studies. The authors identified the appropriate experimental design and estimated the minimum detectable standardized effect size (MDES) for each...

A number of recent studies have used surveys of neighborhood informants and direct observation of city streets to assess aspects of community life such as collective efficacy, the density of kin networks, and social disorder. Raudenbush and Sampson (1999a) have coined the term “ecometrics” to denote the study of the reliability and validity of such...

The gap between Blacks and Whites in educational outcomes has narrowed dramatically over the past 60 years, but progress stopped around 1990. The author reviews research suggesting that increasing the quantity and quality of schooling can play a powerful role in overcoming racial inequality. To achieve that goal, he reasons, our knowledge of best i...

This volume considers the problem of quantitatively summarizing results from a stream of studies, each testing a common hypothesis. In the simplest case, each study yields a single estimate of the impact of some intervention. Such an estimate will deviate from the true effect size as a function of random error because each study uses a finite sampl...

The authors propose a strategy for studying the effects of time-varying instructional treatments on repeatedly observed student achievement. This approach responds to three challenges: (a) The yearly reallocation of students to classrooms and teachers creates a complex structure of dependence among responses; (b) a child’s learning outcome under a...

This paper provides practical guidance for researchers who are designing studies that randomize groups to measure the impacts of interventions on children. To do so, the paper: (1) provides new empirical information about the values of parameters that influence the precision of impact estimates (intra-class correlations and R-squares); (2) examines...

Many youth development programs are designed to improve youth outcomes by improving the quality of social interactions occurring in classrooms, athletic teams, therapy groups, after-school programs, recreation centers, or other group settings. In evaluating such programs, it becomes essential to assess the impact of the program on the "group qualit...

A dramatic shift in research priorities has recently produced a large number of ambitious randomized trials in K-12 education. In most cases, the aim is to improve student academic learning by improving classroom instruction. Embedded in these studies are theories about how the quality of classroom must improve if these interventions are to succeed...

Understanding the impact of “instructional regimes” on student learning is central to advancing educational policy. Research on instructional regimes has parallels with clinical trials in medicine yet poses unique challenges because of the social nature of instruction: A child’s potential outcome under a given regime depends on peers and teachers,...

Disparities in verbal ability, a major predictor of later life outcomes, have generated widespread debate, but few studies have been able to isolate neighborhood-level causes in a developmentally and ecologically appropriate way. This study presents longitudinal evidence from a large-scale study of >2,000 children ages 6–12 living in Chicago, along...

The development of model-based methods for incomplete data has been a seminal contribution to statistical practice. Under the assumption of ignorable missingness, one estimates the joint distribution of the complete data for θ ∈ Θ from the incomplete or observed data y(obs). Many interesting models involve one-to-one transformations of θ...

Hierarchical data from many small clusters arise by necessity and by design. They arise by necessity when the aim is to study married couples [1], identical twins [25], siblings [12], paired comparison tasks [2], cooperative learning groups [36], multiple informants of child social behavior [20], and studies of animal reproduction [35]. They arise...

Interest has rapidly increased in studies that randomly assign classrooms or schools to interventions. When well implemented, such studies eliminate selection bias, providing strong evidence about the impact of the interventions. However, unless expected impacts are large, the number of units to be randomized needs to be quite large to achieve adeq...

The age at which a child receives a cochlear implant seems to be one of the more important predictors of his or her speech and language outcomes. However, understanding the association between age at implantation and child outcomes is complex because a child's age, length of device use, and age at implantation are highly related. In this study, we...

In many surveys, responses to earlier questions determine whether later questions are asked. The probability of an affirmative response to a given item is therefore nonzero only if the participant responded affirmatively to some set of logically prior items, known as “filter items.” In such surveys, the usual conditional independence assumption of...

This article considers the policy of retaining low-achieving children in kindergarten rather than promoting them to first grade. Under the stable unit treatment value assumption (SUTVA) as articulated by Rubin, each child at risk of retention has two potential outcomes: Y(1) if retained and Y(0) if promoted. But SUTVA is questionable, because a chi...

We examined whether retail tobacco outlet density was related to youth cigarette smoking after control for a diverse range of neighborhood characteristics.
Data were gathered from 2116 respondents (aged 11 to 23 years) residing in 178 census tracts in Chicago, Ill. Propensity score stratification methods for continuous exposures were used to adjust...

Applications of group trajectory modeling summarize individual histories in a language that is broadly accessible to clinicians. This strength depends on the belief that a population consists, at least roughly, of a small number of subgroups whose members display similar records of behavior. In this view, the purpose of longitudinal research is to...

Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory...

In most visual preference surveys, citizens are shown a sample of scenes and asked to rate them on a preference scale. Scenes are then classified by type, and for each scene type, statistics are computed. In the end, results may suggest that one scene type is preferred to another, but that is about all that can be said. In this article, we offer an...

Grade retention has been controversial for many years, and current calls to end social promotion have lent new urgency to this issue. On the one hand, a policy of retaining in grade those students making slow progress might facilitate instruction by making classrooms more homogeneous academically. On the other hand, grade retention might harm high-...

In recent years, several studies in the medical and health service research literature have advocated the use of hierarchical statistical models (multilevel models or random-effects models) to analyze data that are nested (eg, patients nested within hospitals). However, these models are computer-intensive and complicated to perform. There is virtua...

Education research is an interdisciplinary effort long characterized by methodological diversity. Why, then, do we hear an urgent call for mixed methods now? Apparently, a recent shift in the applied research agenda has fostered concern that methodological pluralism is at risk. In this article, the author argues that (a) a focus on evaluating the e...

We analyzed key individual, family, and neighborhood factors to assess competing hypotheses regarding racial/ethnic gaps in perpetrating violence. From 1995 to 2002, we collected 3 waves of data on 2974 participants aged 8 to 25 years living in 180 Chicago neighborhoods, augmented by a separate community survey of 8782 Chicago residents...

This article reveals the grounds on which individuals form perceptions of disorder. Integrating ideas about implicit bias and statistical discrimination with a theoretical framework on neighborhood racial stigma, our empirical test brings together personal interviews, census data, police records, and systematic social observations situated within...

Global propositions about the problems and prospects of program evaluation obscure the circumstances in which social science research can contribute to social problem solving.

The noncentrality parameter for the noncentral F is a precision-weighted sum of squares of treatment means, which is closely related to the test statistic and effect size. The two common effect size estimates are not based on the uniformly minimum variance unbiased (UMVU) estimate of the noncentrality parameter. The UMVU estimate of the noncentrali...

The question of how to estimate school and teacher contributions to student learning is fundamental to educational policy and practice, and the three thoughtful articles in this issue represent a major advance. The current level of public confusion about these issues is so severe and the consequences for schooling so great that it is a big relief t...

Multilevel statistical models have become increasingly popular among public health researchers over the past decade. Yet the enthusiasm with which these models are being adopted may obscure rather than solve some problems of statistical and substantive inference. We discuss the three most common applications of multilevel models in public health: (...

To determine the relationship between urban sprawl, health, and health-related behaviors.
Cross-sectional analysis using hierarchical modeling to relate characteristics of individuals and places to levels of physical activity, obesity, body mass index (BMI), hypertension, diabetes, and coronary heart disease.
U.S. counties (448) and metropolitan ar...

Purpose:
To determine the relationship between urban sprawl, health, and health-related behaviors.
Design:
Cross-sectional analysis using hierarchical modeling to relate characteristics of individuals and places to levels of physical activity, obesity, body mass index (BMI), hypertension, diabetes, and coronary heart disease.
Setting:
U.S. cou...

In studying correlates of social behavior, attitudes, and beliefs, a measurement model is required to combine information across a large number of item responses. Multiple constructs are often of interest, and covariates are often multilevel (e.g., measured at the person and neighborhood level). Some item–level missing data can be expected. This pa...

Many researchers who study the relations between school resources and student achievement have worked from a causal model, which typically is implicit. In this model, some resource or set of resources is the causal variable and student achievement is the outcome. In a few recent, more nuanced versions, resource effects depend on intervening influen...

As interest in the social sciences and public health increasingly turns to the integration of individual, family, and neighborhood processes, a potential mismatch arises in the quality of measures. Standing behind individual measurement are decades of research, producing measures that often have excellent statistical properties. In contrast, much l...

Differences in maternal characteristics only partially explain the lower birth weights of infants of African-American women. It is hypothesized that economic and social features of urban neighborhoods may further account for these differences. The authors conducted a household survey of 8,782 adults residing in 343 Chicago, Illinois, neighborhoods...

This paper considers the quantitative assessment of ecological settings such as neighborhoods and schools. Available administrative data typically provide useful but limited information on such settings. We demonstrate how more complete information can be reliably obtained from surveys and observational studies. Survey-based assessments are constru...

Much social and behavioral research involves hierarchical data structures. . . . Recent developments in the statistical theory of hierarchical linear models now afford an integrated set of methods for such applications.
This introductory text explicates the theory and use of hierarchical linear models (HLM) through rich, illustrative examples and...

Consider a study in which 2 groups are followed over time to assess group differences in the average rate of change, rate of acceleration, or higher degree polynomial effect. In designing such a study, one must decide on the duration of the study, frequency of observation, and number of participants. The authors consider how these choices affect st...

This article investigates the efficiency and robustness of alternative estimators of regression coefficients for three-level data. To study student achievement, researchers might formulate a standard regression model or a hierarchical model with a two- or three-level structure. Having chosen the model, the researchers might employ either a model-ba...

Highlighting resource inequality, social processes, and spatial interdependence, this study combines structural characteristics from the 1990 census with a survey of 8,872 Chicago residents in 1995 to predict homicide variations in 1996–1998 across 343 neighborhoods. Spatial proximity to homicide is strongly related to increased homicide rates, adj...

This review considers statistical analysis of data from studies that obtain repeated measures on each of many participants. Such studies aim to describe the average change in populations and to illuminate individual differences in trajectories of change. A person-specific model for the trajectory of each participant is viewed as the foundation of a...

Places the hierarchical linear models into a broader context. Arguing that restrictions imposed by software and estimation ought not to limit the researchers' imagination, the author discusses a variety of models for various types of change, in various metrics. Among these, similarities are considered between hierarchical linear and structural equat...

This article considers an analytic strategy for measuring and modeling child and adolescent problem behaviors. The strategy embeds an item response model within a hierarchical model to define an interval scale for the outcomes, to assess dimensionality, and to study how individual and contextual factors relate to multiple dimensions of problem beha...

This article compares the degree to which educational attainment and cognitive skill, individually and together, serve to explain labor force outcomes (occupational status and earnings). Although the same antecedent factors affect both of them and they both are associated with labor force outcomes, they are not redundant measures. They are affected...

The multisite trial, widely used in mental health research and education, enables experimenters to assess the average impact of a treatment across sites, the variance of treatment impact across sites, and the moderating effect of site characteristics on treatment efficacy. Key design decisions include the sample size per site and the number of site...

In accelerated longitudinal design, one samples multiple age cohorts and then collects longitudinal data on members of each cohort. The aim is to study age-outcome trajectories over a broad age span during a study of short duration. A threat to valid inference is the Age x Cohort interaction effect. S. W. Raudenbush and W. S. Chan (1993) developed...

Nested random effects models are often used to represent similar processes occurring in each of many clusters. Suppose that, given cluster-specific random effects b, the data y are distributed according to f(y|b, Θ), while b follows a density p(b|Θ). Likelihood inference requires maximization of ∫ f(y|b, Θ)p(b|Θ)db with respect to Θ. Evaluation of t...

Using data collected under the Trial State Assessment (TSA) of the National Assessment of Educational Progress (NAEP), this article describes and illustrates a two-stage statistical model for investigating state-to-state variation in mathematics achievement. At the first stage, within each state, a two-level hierarchical linear model is estimated v...

Researchers commonly ask whether relationships between exogenous predictors, X, and outcomes, Y, are mediated by a third set of variables, Z. Simultaneous equations decompose the relationship between X and Y into an indirect component, operating through Z, and a direct component, the relationship between X and Y given Z. Often, X, Y, and/or Z are m...

This article assesses the sources and consequences of public disorder. Based on the videotaping and systematic rating of more than 23,000 street segments in Chicago, highly reliable scales of social and physical disorder for 196 neighborhoods are constructed. Census data, police records, and an independent survey of more than 3,500 residents are th...

This article considers social and ethnic inequality in access to resources for mathematics learning in eighth grade: favorable school disciplinary climate, advanced course offerings, teacher subject-matter preparation, and emphasis on reasoning during classroom discourse. Data are from 41 states and territories participating in the 1992 Trial Stat...

Bayesian analysis of hierarchically structured data with random intercept and heterogeneous within-group (Level-1) variance is presented. Inferences about all parameters, including the Level-1 variance and intercept for each group, are based on their marginal posterior distributions approximated via the Gibbs sampler. Analysis of artificial data wit...

Few would deny that the civil rights and women's movements have substantially changed U.S. society. Yet ethnic and gender inequality in employment and earnings remain large. Even when comparisons are confined to persons of similar educational attainment, African Americans and Hispanic Americans earn less than European Americans, women earn less tha...

This study reports on the development of a structured interview, My Exposure to Violence (My ETV), that was designed to assess child and youth exposure to violence. Eighty participants between the ages of 9 and 24 were assessed. Data from My ETV were fit to a Rasch model for rating scales, a technique that generates interval level measures and allo...

Empirical researchers commonly invoke instrumental variable (IV) assumptions to identify treatment effects. This paper considers what can be learned under two specific violations of those assumptions: contaminated and corrupted data. Either of these violations prevents point identification, but sharp bounds of the treatment effect remain feasible....

## Citations

... In [36], mixed linear regression analysis is applied to a large database to establish the correlation between the student's relationship and their academic performance. Finally, ref. [37] evaluated the impact of sequences of parents' input on children's language outcomes. Although all these applications are longitudinal or level studies, they are focused on selecting the best model for prediction, but no one addresses the issue of optimizing data recollection along time. ...