Model of an analyst's reasoning process.

Source publication
Article
Full-text available
In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists' gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instanc...

Contexts in source publication

Context 1
... codings led to a proposed model of the data analyst's reasoning process and workflow (Figure 6). The model seeks to capture the iterative interplay between understandings of the dataset and hypotheses to be tested, the analyst's knowledge and beliefs, the actions and methods actually performed during the analysis, and insights gained. ...
Context 2
... a series of iterative loops, analysts engage in this ongoing retrospective development to build and interpret mental models and schemas that make sense of the data they are confronted with. The model in Figure 6 was empirically derived and, to the best of our knowledge, is the first to provide a detailed, data-grounded overview of the behavioral factors involved in the data analysis process. ...
Context 3
... the DataExplained portal, each participating researcher provided step-by-step explanations for her or his analytic decisions. Qualitative analyses of these reasonings about quantitative decisions led to the model of iterative research decision making shown in Figure 6. The DataExplained website (https://dataexplained.net/) is available for researchers who wish to carefully document their analytic decisions and justifications for them, either individually or as a crowd (see also the code in Supplement 9 and video demonstration at https://goo. ...
Context 4
... is confirmed by the actual analyses that differed in all respects; no two analyses were similar with respect to all analytic choices or the number of observations. Second, it may also be that the unpredictability of the outcome of the analysis reflects the nature of research in the social sciences; arbitrary choices may result in arbitrary outcomes of the analysis (see Figure 6). Of course, it is of paramount importance for social science research to distinguish the most important cause of diverging outcomes of multi-analyst projects. ...

Citations

... Interestingly, such rhetorical considerations align with a recurring theme in multiverse analysis: reasoning about multiple potential paths of analysis [57,70]. Depending on the analysis objectives [74] and the application domain [40], analysts must carefully evaluate analytic plans and execute them faithfully-a task that presents its own set of challenges [55]. ...
Preprint
Full-text available
Mining and conveying actionable insights from complex data is a key challenge of exploratory data analysis (EDA) and storytelling. To address this challenge, we present a design space for actionable EDA and storytelling. Synthesizing theory and expert interviews, we highlight how semantic precision, rhetorical persuasion, and pragmatic relevance underpin effective EDA and storytelling. We also show how this design space subsumes common challenges in actionable EDA and storytelling, such as identifying appropriate analytical strategies and leveraging relevant domain knowledge. Building on the potential of LLMs to generate coherent narratives with commonsense reasoning, we contribute Jupybara, an AI-enabled assistant for actionable EDA and storytelling implemented as a Jupyter Notebook extension. Jupybara employs two strategies -- design-space-aware prompting and multi-agent architectures -- to operationalize our design space. An expert evaluation confirms Jupybara's usability, steerability, explainability, and reparability, as well as the effectiveness of our strategies in operationalizing the design space framework with LLMs.
... Importantly, this piece of evidence represents one view of the hypothesis under one set of data. Using the same data set, the myriad choices made by the scientist (from cleaning the data, to choice of transformations of the data, to what covariates are included in a model and how they are included) can affect the direction, magnitude, and significance of an effect [5]. In addition, the data set is but a sample from a particular subpopulation. ...
... Conclusions from a single study are part of a broader effort that examines the hypothesis in different subpopulations and under different modeling choices. Assessing the plausibility of evidence from a particular study, and understanding why two studies might yield different conclusions, requires a high degree of transparency [5,6], including but not limited to access to the nature of the subpopulation under study, how the data were collected and cleaned, and what modeling choices were made. ...
Article
Full-text available
Supervised machine learning (ML) offers an exciting suite of algorithms that could benefit research in sport science. In principle, supervised ML approaches were designed for pure prediction, as opposed to explanation, leading to a rise in powerful, but opaque, algorithms. Recently, two subdomains of ML (explainable ML, which allows us to "peek into the black box," and interpretable ML, which encourages using algorithms that are inherently interpretable) have grown in popularity. The increased transparency of these powerful ML algorithms may provide considerable support for the hypothetico-deductive framework, in which hypotheses are generated from prior beliefs and theory, and are assessed against data collected specifically to test that hypothesis. However, this paper shows why ML algorithms are fundamentally different from statistical methods, even when using explainable or interpretable approaches. Translating potential insights from supervised ML algorithms, while in many cases seemingly straightforward, can have unanticipated challenges. While supervised ML cannot be used to replace statistical methods, we propose ways in which the sport sciences community can take advantage of supervised ML in the hypothetico-deductive framework. In this manuscript we argue that supervised machine learning can and should augment our exploratory investigations in sport science, but that leveraging potential insights from supervised ML algorithms should be undertaken with caution. We justify our position through a careful examination of supervised machine learning, and provide a useful analogy to help elucidate our findings. Three case studies are provided to demonstrate how supervised machine learning can be integrated into exploratory analysis. Supervised machine learning should be integrated into the scientific workflow with requisite caution. The approaches described in this paper provide ways to safely leverage the strengths of machine learning—like the flexibility ML algorithms can provide for fitting complex patterns—while avoiding potential pitfalls—at best, like wasted effort and money, and at worst, like misguided clinical recommendations—that may arise when trying to integrate findings from ML algorithms into domain knowledge.
Key Points:
- Some supervised machine learning algorithms and statistical models are used to solve the same problem, y = f(x) + ε, but differ fundamentally in motivation and approach.
- The hypothetico-deductive framework—in which hypotheses are generated from prior beliefs and theory, and are assessed against data collected specifically to test that hypothesis—is one of the core frameworks comprising the scientific method.
- In the hypothetico-deductive framework, supervised machine learning can be used in an exploratory capacity. However, it cannot replace the use of statistical methods, even as explainable and interpretable machine learning methods become increasingly popular.
- Improper use of supervised machine learning in the hypothetico-deductive framework is tantamount to p-value hacking in statistical methods.
... In total, these papers examine analytical variability for five different hypotheses. We identified five more published multianalyst studies, but these did not meet our inclusion criteria: the study by Bastiaansen et al. (99) details the variation in analytic decisions across analysis teams but does not report estimates pertaining to each of the proposed analysis pipelines; the study by Botvinik-Nezer et al. (100) was excluded as the primary outcome reported by analysis teams is a binary classification of whether the hypotheses are supported by the data, but no effect size measure is reported; the study by Schweinsberg et al. (101) was excluded since the individual results by analysts are not available in standardized effect-size units but only in terms of z-scores; the study by Menkveld et al. (102) was excluded as the data are yet embargoed; and the study by Breznau et al. (103) was excluded as the research teams reported various results for the same hypothesis, and it is not clear which effect size estimate to include for each team. Note that the reported variation in results across analysts is also very large for the five excluded studies. ...
Article
Full-text available
A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduce an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by nominal error rates of statistically significant individual studies. We estimate each type's heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.
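The qualitative point of this abstract, that nominal error rates overstate the evidence once heterogeneity is accounted for, can be illustrated with a small sketch. This is a hedged illustration only: treating population, design, and analytical heterogeneity as additive variance components follows standard random-effects logic, but the specific numbers below are assumptions for demonstration, not estimates from the article.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic via the standard normal CDF."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Illustrative numbers (assumptions, not estimates from the article):
estimate = 0.20          # a single study's effect size estimate
se = 0.08                # that study's nominal standard error
tau_population = 0.02    # between-population SD of true effects
tau_design = 0.10        # between-design SD of true effects
tau_analysis = 0.08      # between-analysis-path SD of true effects

# Nominal inference treats the single study as if it generalized directly.
z_nominal = estimate / se

# Generalizing to the *average* population, design, and analysis path:
# the heterogeneity components add to the sampling variance.
se_total = math.sqrt(se**2 + tau_population**2 + tau_design**2 + tau_analysis**2)
z_generalized = estimate / se_total

print(f"nominal     z = {z_nominal:.2f}, p = {two_sided_p(z_nominal):.4f}")
print(f"generalized z = {z_generalized:.2f}, p = {two_sided_p(z_generalized):.4f}")
```

With design and analytical heterogeneity of roughly this size, an estimate that looks clearly significant within a single study says much less about the average design and analysis path.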
... A multiversal method is a procedure involving (1) a phase of collection of a systematically differentiated multiplicity of alternative specifications of the same unitary regression model, and (2) a comparative evaluation of fit statistics regarding the estimation of the properties of the regression coefficient of the main regressor variable, commonly referred to as 'effect size' in studies oriented towards causal inference. Multiverse Analysis is not the only possible instance of a multiversal methodology, since one can collect specifications of the scientific modelling of a unitary scientific hypothesis from many teams (Schweinsberg et al. 2021; Breznau et al. 2022). However, Multiverse Analysis as presented in Steegen et al. (2016) remains the canonical application of multiversal methodology, whereas Vibration of Effects (Patel et al. 2015) or many-teams approaches (Breznau et al. 2022) are alternative methods with different epistemological premises. ...
Article
Full-text available
Multiverse analysis involves systematically sampling a vast set of model specifications, known as a multiverse, to estimate the uncertainty surrounding the validity of a scientific claim. By fitting these specifications to a sample of observations, statistics are obtained as analytical results. Examining the variability of these statistics across different groups of model specifications helps to assess the robustness of the claim and gives insights into its underlying assumptions. However, the theoretical premises of multiverse analysis are often implicit and not universally agreed upon. To address this, a new formal categorisation of the analytical choices involved in modelling the set of specifications is proposed. This method of indexing the specification highlights that the sampling structure of the multiversal sample does not conform to a model of independent and identically distributed draws of specifications and that it can be modelled as an information network instead. Hamming's distance is proposed as a measure of network distance, and, with an application to a panel dataset, it is shown how this approach enhances transparency in procedures and inferred claims and that it facilitates the check of implicit parametric assumptions. In the conclusions, the proposed theory of multiversal sampling is linked to the ongoing debate on how to weigh a multiverse, including the debate on the epistemic value of crowdsourced multiverses.
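As a sketch of the indexing idea described in this abstract, the snippet below represents each specification as a tuple of analytic choices and computes Hamming's distance, the number of decisions on which two specifications differ. The decision names and options are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical multiverse: each specification is a tuple of analytic choices.
DECISIONS = ("outlier_rule", "covariate_set", "estimator")
specs = [
    ("iqr",  "none",   "ols"),
    ("iqr",  "demogr", "ols"),
    ("none", "demogr", "ols"),
    ("none", "demogr", "mixed"),
]

def hamming(spec_a, spec_b) -> int:
    """Number of analytic decisions on which two specifications differ."""
    return sum(a != b for a, b in zip(spec_a, spec_b))

# Pairwise distances over the multiverse: specifications one decision apart
# are direct neighbours in the information network.
for (i, s_i), (j, s_j) in combinations(enumerate(specs), 2):
    print(f"d(spec{i}, spec{j}) = {hamming(s_i, s_j)}")
```

Grouping analytical results by their distance from a reference specification is one way to see which decisions drive most of the variability.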
... A series of empirical studies demonstrates that independent research groups, starting from the same dataset, apply different data analysis workflows and that data analysis is therefore not determined by the given data. Instead, the analyses and outcomes show high variability across research groups in the algorithms employed, the data transformation and correction strategies used, and the statistical techniques applied, resulting in different results and conclusions (Silberzahn et al. 2018; Botvinik-Nezer et al. 2020; Schweinsberg et al. 2021; Massey et al. 2022). ...
Chapter
Full-text available
Statistics is an important tool for sociological research. The institutionalization of sociological study programs since the second half of the twentieth century, and of sociology as a profession, would not have been possible without statistical training and quantitative analyses of social data. Monitoring and representing society, and studying its structures and dynamics, can only be done by applying computational skills, modern data analysis software, and statistical approaches. The establishment of survey programs and durable data infrastructures, the rise of datafication, and the Internet have facilitated sociological research and support the conditions for teaching statistics, especially through open access to data, advanced open-access statistical software such as R, and cheaper as well as more powerful computer technology. But other changes in sociology, in statistics, in the sciences, and in society itself call this advantageous picture of the relation between sociology and statistics into question, along with the conditions and prospects for statistics training in sociology. It is not a new experience for sociology to be challenged or questioned, but as a discipline sociology has always worked out its specific contribution to developing and evaluating social policies, to analyzing society in all its dimensions, and to exercising social critique based on scientific analysis.
... In the example they give, the choices they made about what to focus on yielded certain insights, but different choices might have led to other insights, possibly quite different and even apparently contradictory ones. Research suggests that this impact of researcher choices is an issue even where much more defined analytical methods are used to interpret a single data set (Schweinsberg et al., 2021). If research is to have a role in transforming society for the better, it needs to inspire trust in as many actual and potential stakeholders as possible. ...
Chapter
Full-text available
... We refer to this kind of heterogeneity in effect sizes as "design heterogeneity," and it is measured as the between-study variance in true effect sizes across the experimental designs in the random-effects meta-analysis. Previous work has documented low to moderate heterogeneity in effect sizes across populations (25)(26)(27) and substantial heterogeneity in effect sizes across analytical decisions (28)(29)(30)(31); however, for design heterogeneity, systematic evidence is scarce (22). We eliminate population heterogeneity by randomly allocating participants to different designs, and we preempt analytical heterogeneity by standardizing the analyses across designs in analytic approach B. Thus, by design, any variation not attributable to sampling variation (i.e., a study's SE) is due to design heterogeneity. ...
Article
Full-text available
Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of ambivalent empirical results on the same hypothesis is design heterogeneity: variation in true effect sizes across various reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. We find a small adverse effect of competition on moral behavior in a meta-analysis of the pooled data. The crowd-sourced design of our study allows for a clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity (estimated to be about 1.6 times as large as the average standard error of effect size estimates of the 45 research designs), indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections on various experimental designs testing the same hypothesis. Keywords: competition, moral behavior, metascience, generalizability, experimental design
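A minimal sketch of how between-design heterogeneity can be quantified in a random-effects meta-analysis, using the standard DerSimonian-Laird moment estimator and then expressing it relative to the average standard error, in the spirit of the ratio reported above. The per-design estimates below are invented for illustration and are not the study's data.

```python
import math

# Hypothetical per-design effect estimates and standard errors:
# one entry per experimental design entering the meta-analysis.
estimates = [-0.05, 0.02, -0.12, 0.01, -0.08, -0.03]
std_errors = [0.04, 0.05, 0.06, 0.04, 0.05, 0.07]

def dersimonian_laird_tau(y, se):
    """Method-of-moments estimate of the between-design SD of true effects."""
    w = [1.0 / s**2 for s in se]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncate at zero, as is conventional
    return math.sqrt(tau2)

tau = dersimonian_laird_tau(estimates, std_errors)
mean_se = sum(std_errors) / len(std_errors)
print(f"tau (design heterogeneity) = {tau:.3f}")
print(f"tau / average SE           = {tau / mean_se:.2f}")
```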
... However, even by committing to a single analytic path, the conclusion might still be less rigorous, as multiple justifiable paths might exist and different paths might produce diverging conclusions. In crowdsourcing data analysis, well-intentioned experts still produced largely different analysis outcomes when independently analyzing the same dataset [19,36,38]. ...
... Building these synthetic datasets allows us to control the data generating process and tease apart interesting properties of the multiverse to study each property in isolation. We also run the benchmark on a multiverse analysis in the wild [36] to demonstrate the utility of the sampling algorithms in real-world scenarios. We now describe the general scheme for constructing the synthetic dataset. ...
... We then construct non-sensitive decisions by setting every option to a baseline option. Multiverses may also contain certain rare conditions, where the number of universes adopting a particular option is smaller than the number of universes adopting other options (e.g., [36]). We capture these by simulating procedural dependencies, which exclude invalid combinations from the Cartesian product decision space. ...
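A minimal sketch of the decision-space construction described in the excerpt above: the multiverse is the Cartesian product of decision options, with procedural dependencies excluding invalid combinations. The decision names and the validity rule are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical analytic decisions and their options.
decisions = {
    "transform": ["none", "log"],
    "outliers":  ["keep", "trim_2sd", "winsorize"],
    "estimator": ["ols", "poisson"],
}

def is_valid(universe: dict) -> bool:
    """Procedural dependency (toy rule): a Poisson model is only paired with
    untransformed data, so 'log' + 'poisson' combinations are excluded."""
    return not (universe["transform"] == "log" and universe["estimator"] == "poisson")

names = list(decisions)
universes = []
for options in product(*(decisions[n] for n in names)):
    u = dict(zip(names, options))
    if is_valid(u):
        universes.append(u)

total = prod(len(v) for v in decisions.values())
print(f"{len(universes)} valid universes out of {total} combinations")
```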
Preprint
Full-text available
A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but can lead to a combinatorial explosion of analyses to compute. Long delays before assessing results prevent users from diagnosing errors and iterating early. We contribute (1) approximation algorithms for estimating multiverse sensitivity and (2) monitoring visualizations for assessing progress and controlling execution on the fly. We evaluate how quickly three sampling-based algorithms converge to accurately rank sensitive decisions in both synthetic and real multiverse analyses. Compared to uniform random sampling, round robin and sketching approaches are 2 times faster in the best case, while on average estimating sensitivity accurately using 20% of the full multiverse. To enable analysts to stop early to fix errors or decide when results are "good enough" to move forward, we visualize both effect size and decision sensitivity estimates with confidence intervals, and surface potential issues including runtime warnings and model quality metrics.
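As an illustration of the round-robin idea (one of the sampling strategies named in the abstract), the sketch below cycles through the options of a chosen decision so that every option is represented early in the sample, instead of waiting for uniform random draws to cover it. The data structures and function are assumptions for demonstration, not the paper's implementation.

```python
import random
from itertools import cycle

# Toy universes: dicts of analytic choices (hypothetical names, as above).
universes = [
    {"transform": t, "outliers": o}
    for t in ("none", "log")
    for o in ("keep", "trim_2sd", "winsorize")
]

def round_robin_sample(universes, decision, n_samples, seed=0):
    """Sample universes by cycling through the options of one decision, so each
    option appears early rather than waiting on uniform random draws."""
    rng = random.Random(seed)
    buckets = {}
    for u in universes:
        buckets.setdefault(u[decision], []).append(u)
    for bucket in buckets.values():
        rng.shuffle(bucket)
    sample, option_cycle = [], cycle(list(buckets))
    while len(sample) < n_samples and any(buckets.values()):
        bucket = buckets[next(option_cycle)]
        if bucket:
            sample.append(bucket.pop())
    return sample

print(round_robin_sample(universes, decision="transform", n_samples=4))
```

Fitting only the sampled universes, and updating sensitivity estimates as results arrive, is what allows analysts to stop early or intervene when errors surface.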
... The definition of possible data pre-processing choices is challenging since these choices are sometimes "hidden", i.e., they are typically not discussed in great detail in a publication and some choices are completely omitted. Two recent multi-analyst experiments (Huntington-Klein et al., 2021; Schweinsberg et al., 2021), in which multiple teams of researchers were asked to answer the same research question on the ...
Article
Full-text available
Researchers have great flexibility in the analysis of observational data. If combined with selective reporting and pressure to publish, this flexibility can have devastating consequences on the validity of research findings. We extend the recently proposed vibration of effects approach to provide a framework comparing three main sources of uncertainty which lead to instability in empirical findings, namely data pre-processing, model, and sampling uncertainty. We analyze the behavior of these sources for varying sample sizes for two associations in personality psychology. Through the joint investigation of model and data pre-processing vibration, we can compare the relative impact of these two types of uncertainty and identify the most influential analytical choices. While all types of vibration show a decrease for increasing sample sizes, data pre-processing and model vibration remain non-negligible, even for a sample of over 80,000 participants. The increasing availability of large data sets that are not initially recorded for research purposes can make data pre-processing and model choices very influential. We therefore recommend the framework as a tool for transparent reporting of the stability of research findings.
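A minimal sketch in the spirit of the vibration-of-effects idea: given effect estimates for one association across combinations of data pre-processing and model choices, summarize how much the estimate varies and which type of choice moves it most. The estimates and choice labels are invented, and the summaries are simple illustrations rather than the paper's exact measures.

```python
# Hypothetical effect estimates for one association, indexed by
# (pre-processing choice, model choice); values are made up.
results = {
    ("raw",        "no_covariates"):  0.21,
    ("raw",        "adj_age_gender"): 0.16,
    ("winsorized", "no_covariates"):  0.19,
    ("winsorized", "adj_age_gender"): 0.12,
    ("log",        "no_covariates"):  0.14,
    ("log",        "adj_age_gender"): 0.09,
}

estimates = list(results.values())
lo, hi = min(estimates), max(estimates)

# Two simple vibration summaries: the absolute spread of estimates and the
# ratio of the largest to the smallest estimate across specifications.
print(f"range of estimates: [{lo:.2f}, {hi:.2f}]")
print(f"relative effect ratio (max/min): {hi / lo:.2f}")

# Which type of analytical choice moves the estimate most? Compare means of
# estimates grouped by each choice (a crude sensitivity check).
for axis, name in [(0, "pre-processing"), (1, "model")]:
    groups = {}
    for key, est in results.items():
        groups.setdefault(key[axis], []).append(est)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    spread = max(means.values()) - min(means.values())
    print(f"{name}: spread of group means = {spread:.2f}")
```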
... Finally, building on recent calls to move beyond WEIRD sources of data (Henrich et al. 2010, Kitayama 2017), we obtained evidence for our theory from different cultural contexts, including a mid-income developing country. Recent research also shows that empirical results are sensitive to measurement and analytical choices (Schweinsberg et al. 2021). Our three studies use different empirical designs, data sources, and measurements for our main variables. ...
Article
Connecting otherwise disconnected individuals and groups—spanning structural holes—can earn social network brokers faster promotions, higher remuneration, and enhanced creativity. Organizations also benefit through improved communication and coordination from these connections between knowledge silos. Neglected in prior research, however, has been theory and evidence concerning the psychological costs to individuals of engaging in brokering activities. We build new theory concerning the extent to which keeping people separated (i.e., tertius separans brokering) relative to bringing people together (i.e., tertius iungens brokering) results in burnout and in abusive behavior toward coworkers. Engagement in tertius separans brokering, relative to tertius iungens brokering, we suggest, burdens people with onerous demands while limiting access to resources necessary to recover. Across three studies, we find that tertius separans leads to abusive behavior of others, mediated by an increased experience of burnout on the part of the broker. First, we conducted a five-month field study of burnout and abusive behavior, with brokering assessed via email exchanges among 1,536 university employees in South America. Second, we examined time-separated data on self-reported brokering behaviors, burnout, and coworker abuse among 242 employees of U.S. organizations. Third, we experimentally investigated the effects of the two types of brokering behaviors on burnout and abusive behavior for 273 employed adults. The results across three studies showed that tertius separans brokering puts the broker at an increased risk of burnout and subsequent abusive behavior toward others in the workplace. Funding: E. Quintane received funding from Ernst & Young GmbH Wirtschaftsprüfungsgesellschaft.