Ian Lundberg’s research while affiliated with Cornell University and other places


Publications (16)


The origins of unpredictability in life outcome prediction tasks
  • Article

June 2024

·

19 Reads

·

6 Citations

Proceedings of the National Academy of Sciences

Ian Lundberg

·

Rachel Brown-Weinstock

·

Susan Clampet-Lundquist

·

[...]

·

Matthew J Salganik

Significance Scientists and decision-makers routinely make life outcome predictions: they use information from the past to predict what will happen to someone in the future. These predictions, whether made by human experts or algorithms, are often used to guide actions. Yet despite advances in artificial intelligence and predictive algorithms, life outcome predictions can be surprisingly inaccurate. We investigate the origins of this unpredictability through in-depth, qualitative interviews with 40 carefully selected families who are part of a multidecade research study. Their stories suggest origins of unpredictability that may apply broadly. Those who rely on predictions to inform high-stakes decisions about people should anticipate that life outcomes may be difficult to predict, even despite growing access to data and improved predictive algorithms.


Researcher reasoning meets computational capacity: Machine learning for social science

October 2022

·

40 Reads

·

28 Citations

Social Science Research

Computational power and big data have created new opportunities to explore and understand the social world. A special synergy is possible when social scientists combine human attention to certain aspects of the problem with the power of algorithms to automate other aspects of the problem. We review selected exemplary applications where machine learning amplifies researcher coding, summarizes complex data, relaxes statistical assumptions, and targets researcher attention to further social science research. We aim to reduce perceived barriers to machine learning by summarizing several fundamental building blocks and their grounding in classical statistics. We present a few guiding principles and promising approaches where we see particular potential for machine learning to transform social science inquiry. We conclude that machine learning tools are increasingly accessible, worthy of attention, and ready to yield new discoveries for social research.


Researcher reasoning meets computational capacity: Machine learning for social science

May 2022

·

19 Reads

Computational power and digital data have created new opportunities to explore and understand the social world. A special synergy is possible when social scientists combine human attention to certain aspects of the problem with the power of algorithms to automate other aspects of the problem. We review selected exemplary applications where machine learning amplifies researcher coding, summarizes complex data, relaxes statistical assumptions, and targets researcher attention. We then seek to reduce perceived barriers to machine learning by summarizing several fundamental building blocks and their grounding in classical statistics. We present a few guiding principles and promising approaches where we see particular potential for machine learning to transform social science inquiry. We conclude that machine learning tools are accessible, worthy of attention, and ready to yield new discoveries.


The Gap-Closing Estimand: A Causal Approach to Study Interventions That Close Disparities Across Social Categories

January 2022

·

50 Reads

·

31 Citations

Sociological Methods & Research

Disparities across race, gender, and class are important targets of descriptive research. But rather than only describe disparities, research would ideally inform interventions to close those gaps. The gap-closing estimand quantifies how much a gap (e.g., incomes by race) would close if we intervened to equalize a treatment (e.g., access to college). Drawing on causal decomposition analyses, this type of research question yields several benefits. First, gap-closing estimands place categories like race in a causal framework without making them play the role of the treatment (which is philosophically fraught for non-manipulable variables). Second, gap-closing estimands empower researchers to study disparities using new statistical and machine learning estimators designed for causal effects. Third, gap-closing estimands can directly inform policy: if we sampled from the population and actually changed treatment assignments, how much could we close gaps in outcomes? I provide open-source software (the R package gapclosing) to support these methods.


What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory

June 2021

·

161 Reads

·

243 Citations

American Sociological Review

We make only one point in this article. Every quantitative study must be able to answer the question: what is your estimand? The estimand is the target quantity—the purpose of the statistical analysis. Much attention is already placed on how to do estimation; a similar degree of care should be given to defining the thing we are estimating. We advocate that authors state the central quantity of each analysis—the theoretical estimand—in precise terms that exist outside of any statistical model. In our framework, researchers do three things: (1) set a theoretical estimand, clearly connecting this quantity to theory; (2) link to an empirical estimand, which is informative about the theoretical estimand under some identification assumptions; and (3) learn from data. Adding precise estimands to research practice expands the space of theoretical questions, clarifies how evidence can speak to those questions, and unlocks new tools for estimation. By grounding all three steps in a precise statement of the target quantity, our framework connects statistical evidence to theory.


Comment: Summarizing Income Mobility with Multiple Smooth Quantiles Instead of Parameterized Means

July 2020

·

10 Reads

·

2 Citations

Sociological Methodology

Studies of economic mobility summarize the distribution of offspring incomes for each level of parent income. Mitnik and Grusky (2020) highlight that the conventional intergenerational elasticity (IGE) targets the geometric mean and propose a parametric strategy for estimating the arithmetic mean. We decompose the IGE and their proposal into two choices: (1) the summary statistic for the conditional distribution and (2) the functional form. These choices lead us to a different strategy: visualizing several quantiles of the offspring income distribution as smooth functions of parent income. Our proposal solves the problems Mitnik and Grusky highlight with geometric means, avoids the sensitivity of arithmetic means to top incomes, and provides more information than is possible with any single number. Our proposal has broader implications: the default summary (the mean) used in many regressions is sensitive to the tail of the distribution in ways that may be substantively undesirable.


Does Opportunity Skip Generations? Reassessing Evidence From Sibling and Cousin Correlations

July 2020

·

14 Reads

·

17 Citations

Demography

Sibling (cousin) correlations are empirically straightforward: they capture the degree to which siblings’ (cousins’) socioeconomic outcomes are similar. At face value, these quantities seem to summarize something about how families constrain opportunity. Their meaning, however, is complicated. One empirical set of sibling and cousin correlations can be generated from a multitude of distinct theoretical processes. I illustrate this problem in the context of multigenerational mobility: the relationship between the incomes of an ancestor and a descendant separated by several generations in a family. When cousins’ outcomes are similar (an empirical fact), prior authors have favored the particular theoretical interpretation that extended kin affect life chances through pathways not involving the parents of the focal individual. I show that this evidence is consistent with alternative theories of latent transmission (measurement error) or dynamic transmission (a parent-to-child transmission process that changes over generations). Theoretical assumptions are required to lend meaning to a point estimate. Further, I show that point estimates alone may be misleading because they can be highly uncertain. To facilitate uncertainty estimation for the key test statistic, I develop a Bayesian procedure to estimate sibling and cousin correlations. I conclude by outlining how future research might use sibling and cousin correlations as effective descriptive quantities while remaining cognizant that these quantities could arise from a variety of distinct theoretical processes.


Government Assistance Protects Low‐Income Families from Eviction

June 2020

·

67 Reads

·

27 Citations

Journal of Policy Analysis and Management

A lack of affordable housing is a pressing issue for many low‐income American families and can lead to eviction from their homes. Housing assistance programs to address this problem include public housing and other assistance, including vouchers, through which a government agency offsets the cost of private market housing. This paper assesses whether the receipt of either category of assistance reduces the probability that a family will be evicted from their home in the subsequent six years. Because no randomized trial has assessed these effects, we use observational data and formalize the conditions under which a causal interpretation is warranted. Families living in public housing experience less eviction conditional on pre‐treatment variables. We argue that this evidence points toward a causal conclusion that assistance, particularly public housing, protects families from eviction.


Fig. 4. Heatmaps of the squared prediction error for each observation in the holdout data. Within each heatmap, each row represents a team that made a qualifying submission (sorted by predictive accuracy), and each column represents a family (sorted by predictive difficulty). Darker colors indicate higher squared error; scales differ across subfigures, as does the order of rows and columns. The hardest-to-predict observations tend to be those that are very different from the mean of the training data, such as children with unusually high or low GPAs (SI Appendix, section S3). This pattern is particularly clear for the three binary outcomes (eviction, job training, and layoff), where the errors are large for families where the event occurred and small for families where it did not.
Measuring the predictability of life outcomes with a scientific mass collaboration
  • Article
  • Full-text available

March 2020

·

1,070 Reads

·

260 Citations

Proceedings of the National Academy of Sciences

How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.
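
As a rough illustration of what "only slightly better than a simple benchmark" can mean in terms of squared prediction error on holdout data, the sketch below scores a set of predictions against a benchmark that always predicts the training mean. The function names, the choice of the training mean as the benchmark, and all numbers are assumptions made for illustration; they are not taken from the paper, whose own benchmark model and scoring may differ.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def holdout_r2(y_train, y_holdout, predictions):
    """Return 1 - (model squared error) / (benchmark squared error).

    The benchmark (an assumption of this sketch) predicts the training
    mean for every holdout observation. A score of 0 means the model is
    no better than that benchmark; 1 means perfect prediction.
    """
    baseline = mean(y_train)
    model_sse = sum((y - p) ** 2 for y, p in zip(y_holdout, predictions))
    baseline_sse = sum((y - baseline) ** 2 for y in y_holdout)
    return 1 - model_sse / baseline_sse


# Hypothetical outcome values, invented for this example.
y_train = [2.0, 3.0, 4.0]        # training mean is 3.0
y_holdout = [1.0, 3.0, 5.0]      # true holdout outcomes
predictions = [2.0, 3.0, 4.0]    # one team's hypothetical predictions

score = holdout_r2(y_train, y_holdout, predictions)
benchmark_score = holdout_r2(y_train, y_holdout, [mean(y_train)] * 3)
```

Under this scoring, observations far from the training mean contribute the largest squared errors to both the model and the benchmark, which is consistent with the pattern described in the Fig. 4 caption.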

Setting the Target: Precise Estimands and the Gap Between Theory and Empirics

January 2020

·

32 Reads

·

4 Citations

The link between theory and quantitative empirical evidence is a longstanding hurdle in sociological research. Ambiguity about the role that statistical evidence plays in an argument may produce misleading conclusions and poor methodological practice. This ambiguity could be reduced if researchers would state the theoretical estimand (the central quantity at the core of a given paper) in precise language. Our approach envisions three choices in the research process: (1) choice of a theoretical estimand, which will be informative for theory, (2) choice of an empirical estimand, which is informative about the theoretical estimand under some identification assumptions, and (3) choice of an estimation strategy to learn the empirical estimand from data. Key advantages of this approach include improved clarity on the object of interest, transparency about how empirical evidence contributes to knowledge of that quantity, and the ability to easily plug in new statistical tools for estimation.


Citations (13)


... Scientists have long prioritized knowledge about what predicts an outcome. Interest, however, is turning towards how predictable an outcome is (e.g., [16]) and specifically to decomposing 'predictability' in social systems and life prediction tasks [63]. Having metrics that can be readily used to understand the degree of randomness in a given predictive system is thus highly desirable. ...

Reference:

The InterModel Vigorish (IMV) as a flexible and portable approach for quantifying predictive accuracy with binary outcomes
The origins of unpredictability in life outcome prediction tasks
  • Citing Article
  • June 2024

Proceedings of the National Academy of Sciences

... Thus, there is a potential to urge rethinking of other areas of social science research to "a more sequential, interactive, and ultimately inductive approach to inference" [22,27]. The combination of human attention to certain aspects of the object of inquiry and the power of algorithms to automate other aspects of the same object of inquiry would allow a special synergy, capable of overcoming the limitations of each of the disciplines when used alone [31]. ...

Researcher reasoning meets computational capacity: Machine learning for social science
  • Citing Article
  • October 2022

Social Science Research

... First, traditional approaches do not clarify the assumptions that permit a causal interpretation of the effects of interest. In order to estimate how a disparity would change after some intervention, we need to think about counterfactuals (Lundberg 2022). In the present study, for instance, we ask how the gender gap in STEM would change if women had the same self-efficacy as men. ...

The Gap-Closing Estimand: A Causal Approach to Study Interventions That Close Disparities Across Social Categories
  • Citing Article
  • January 2022

Sociological Methods & Research

... Our 'estimand' or target parameter is the Average Treatment Effect on the Treated (ATT) for the lockdown and post-lockdown periods (Lundberg et al., 2021). Specifically, for the lockdown period we define ...

What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory
  • Citing Article
  • June 2021

American Sociological Review

... One of the most recent debates over how to estimate intergenerational mobility correctly unfolded in the pages of the journal Sociological Methodology (Lundberg, Stewart, 2020) in response to an article by Stanford University professor P. Mitnik and his colleagues (Mitnik, Bryant, Weber, 2019). The authors pointed out that, despite the enormous number of studies, there are still no correct estimates of intergenerational mobility. ...

Comment: Summarizing Income Mobility with Multiple Smooth Quantiles Instead of Parameterized Means
  • Citing Article
  • July 2020

Sociological Methodology

... One such pathway involves the concept of social transmission and influence, highlighting how relational resources beyond the parent-child relationship contribute to human behavior and status attainment (Jaeger, 2012; Mare, 2011). Although it is widely acknowledged that relevant resources reside not only in the core but also in the periphery of the kinship network, identification of their effects has proved challenging (Lundberg, 2020). Despite such challenges, our results suggest that continued research into extended kin as a source of socialization and as a resource for status attainment will remain fruitful. ...

Does Opportunity Skip Generations? Reassessing Evidence From Sibling and Cousin Correlations
  • Citing Article
  • July 2020

Demography

... Marcal (2022), for example, finds that housing affordability protects against child maltreatment. Another group of studies has evaluated the effects of housing assistance policies and programmes, such as public housing and rental assistance, on various outcomes for families and communities who experience housing insecurity (Denary et al. 2023; Fenelon et al. 2023; Kim et al. 2017; Lundberg et al. 2021). Denary et al. (2023), for instance, found that low-income tenants who receive rental assistance are less likely to experience food insecurity. ...

Government Assistance Protects Low‐Income Families from Eviction
  • Citing Article
  • June 2020

Journal of Policy Analysis and Management

... This has enabled accurate predictions of a number of different behaviors, for example, suicide attempts and risk (C. R. Cox et al., 2020; Gradus et al., 2020; Walsh et al., 2017), environmental behaviors (Lavelle-Hill et al., 2020), life outcomes (Lavelle-Hill et al., 2024; Salganik et al., 2020; Savcisens et al., 2024), psychological constructs (Donnellan, Aslan, et al., 2022; Gruda & Hasan, 2019; Jach et al., 2024; Stachl et al., 2020; Youyou et al., 2015), developmental trajectories (Karch et al., 2015; Van Lissa et al., 2023), hiring decisions (Liem et al., 2018), clinical diagnoses (Dinga et al., 2018; Lei et al., 2022) as well as treatment outcomes (Chekroud et al., 2016; Fabbri et al., 2018; Jankowsky et al., 2022). ...

Measuring the predictability of life outcomes with a scientific mass collaboration

Proceedings of the National Academy of Sciences

... The two choices we raise (distributional summary and functional form) are applicable in any regression context. Researchers often equate the research goal with the coefficient of a regression model, but we advocate a more conscious choice of estimand (Lundberg, Johnson, and Stewart 2020). Research constrained to the study of parameterized means risks obscuring important sources of evidence. ...

Setting the Target: Precise Estimands and the Gap Between Theory and Empirics
  • Citing Preprint
  • January 2020

... As was highlighted in the reflections offered during the Fragile Families Challenge, benchmarks stimulate the need for an infrastructural alignment (Lundberg et al., 2019). The requirements of comparability and replicability that are inherent to benchmarks greatly increase the demands for interoperability and automation within researchers' workflows. ...

Privacy, Ethics, and Data Access: A Case Study of the Fragile Families Challenge

Socius Sociological Research for a Dynamic World