New Ways of Seeing Big Data

Academy of Management Journal
2019, Vol. 62, No. 4, 971–978.
Few topics have received as much recent attention
from researchers across disciplines, practitioners,
policymakers, and popular media as "big data." Yet,
from our experiences on the Academy of Manage-
ment Journal editorial team, we believe a great deal
of ambiguity and even confusion still prevails
around key questions such as: What does big data
encompass? Does big data mean the end of theory? In
what ways does big data research differ from con-
ventional scientific methods of inquiry in manage-
ment research? What does it take to publish a big data
study in management journals?
Therefore, our aim in this editorial is to offer in-
sights into big data aligned with our editorial team's
focus on "new ways of seeing." We readily ac-
knowledge that big data can stretch our theoretical
reach and expand the repertoire of methodological
approaches for studying management phenomena
in new ways. As a pervasive but emergent business
phenomenon, big data lends fruitful opportunities
for management scholars to not only challenge,
change, and extend existing theories, but also to
inform the practice of big data through system-
atic investigation. Big data also provides valuable
analytical and visualization tools to supplement,
turbocharge, and even transform some areas of
management research, such as the use of unstruc-
tured data, real-time data processing, and pattern
recognition. At the same time, big data also makes it
necessary to revisit some research assumptions,
practices, processes, and tools developed against the
backdrop of constrained data considerations.
We contend that the field will be in a stronger po-
sition to take advantage of big data opportunities,
and to avoid the pitfalls, when we not only transfer
knowledge from other disciplines, but also en-
gage in the coproduction of knowledge on big data.
To that end, we start by clarifying the logic of big
data from a research perspective, arguing that
management researchers may enrich the perspec-
tive in some important ways. We next outline re-
search opportunities that can leverage the strengths
of big data and management scholarship to mutual
advantage. We then conclude the editorial with a
host of suggestions for overcoming the basic barriers
to publishing these research opportunities in the
field's journals. Together, the three themes, (1)
enriching the perspective, (2) leveraging strengths
to mutual advantage, and (3) overcoming barriers to
publishing, enable us to offer new insights about
how and in what ways management scholarship
might shape the content and evolutionary trajectory
of knowledge on big data. They also enable an in-
tegrated discussion on the core issues of big data:
the paradigmatic and methodological, as well as the
conceptual and phenomenological.
Our message is one of optimism tempered with
realism. We believe that innovative research ap-
proaches concurrently leveraging the power of big
data and the plurality of theoretical and empirical
approaches can complement one another to advance both man-
agement research and big data practices. But, even as
the need and opportunities for such innovations are
manifold, they remain complex, challenging, and
perhaps risky pursuits for individual researchers.
We hope that this editorial, then, will serve as a
springboard for those wishing to move the conver-
sation from a one-way emphasis on the implications
of big data to a two-way dialogue for advancing both
big data and management scholarship.
A bigger-picture reflection on big data as a research
approach should, we suggest, be a part of any di-
alogue on big data in the field because of the im-
plications it holds for research questions, model
construction, designing research, data collection,
and analyzing and visualizing data. The perspective
has been characterized in several ways, as follows:
(a) from theory or small-sample data to be interpreted
by humans to processing huge amounts of data to
reach data-driven discoveries (Elragal & Klischewski,
2017); (b) from causality to patterns and correla-
tions in the data (Mayer-Schönberger & Cukier,
2013); (c) from testing a theory to insights born from
the data (Kitchin, 2014); and (d) the prominence
and status acquired by data as commodity and
recognized output (Leonelli, 2014). These all seem
reasonable descriptions, but, in our judgment, what
truly anchors the approach is the "law of large
numbers," the notion that, with enough data and
samples, errors (uncertainty) are bound to surren-
der to certainty (Succi & Coveney, 2018). As Cohen
(2013: 1921) stated, big data's claims to epistemological
privilege stem from its asserted fidelity
to reality at a very high level of detail.
There has been considerable concern with the
perspective as a sort of "empiricism on steroids" that
involves gathering and going through data to find
patterns and making predictions about dependen-
cies and causation (Frické, 2015; Sætra, 2018). We
also observe that big data applications so far appear
to have predominantly tackled the questions of what
is happening now and what is likely to happen next. For ex-
ample, a common focus is not on why a single vari-
able might explain an outcome variable, but how the
outcome varies with many potential predictors
with or without theory as to which predictors are
relevant (Einav & Levin, 2014).
We would argue against the assertion that the
perspective diminishes the importance of causal
adequacy and depth in research. Because data are a
means to an end, big data's informativeness to reach
justifiable conclusions matters more than its volume,
velocity, or variety (Bowman, 2018). A correlational
finding, for example, may not morph into a causal
one by simply increasing the volume, variety, and
velocity of the underlying data. The real issue, how-
ever, is not data per se, but the perspective that un-
dergirds the manner in which data are considered,
collected, curated, and investigated (Coveney,
Dougherty, & Highfield, 2016). Specifically, we con-
cur with others that the claim that researchers need
not start with theory but could rather acquire more
objective insights and explanation from big data
models and analyses is tenuous and unconvincing
(Chan & Moses, 2016; Sætra, 2018).
To the contrary, given the complexity and re-
source requirements of accessing and processing big
data sets, it seems to us asking the right questions is
crucial. With no theory guiding the questions, an
explanation of what is going on, and why, may not
be adequately addressed. Moreover, an enhanced
ability to detect correlations and clusters in the
data can hardly substitute for theory to provide a
stronger foundation with which to avoid errors and
derive appropriate inferences from these correla-
tions. Thus, without theory, pure big data ap-
proaches in the management field could routinely
fail to provide conceptual accounts for the mana-
gerial phenomena and processes to which they are
applied, as has been observed with some other
disciplines as well, such as biology and medicine
(Coveney et al., 2016).
Indeed, the ideal of pure empiricism or pure in-
duction seldom works per se, and no theory can be
so good as to supplant the need for data and testing
(Calude & Longo, 2017). Thus, maybe the in-
terpretation and use of big data as a perspective
should resemble what is generally seen as "abduction"
(the combination of deductive and inductive
logics to derive causal inferences). If so, what are the
implications for our predominant model of knowl-
edge production and use? Abductive research
involves a logic of discovery and doubt (Locke,
Golden-Biddle, & Feldman, 2008), and such dispo-
sitions and capabilities warrant further attention
with big data use. For example, how might big data
patterns serve as a source for the development of a
new theory, which is then further elaborated and
tested deductively? And, more broadly, how might a
big data perspective be made more theory driven for
investigating managerial phenomena?
The question of when and under what conditions
a big data approach could produce managerially
actionable insights better than "smaller" high-quality
data, and vice versa, is also intriguing to consider.
We suspect that, as situational complexity and
ambiguity increase in an organizational decision,
process, or system, the comparative advantage of
big data may decrease, especially when data quality
is mixed and unknown systematic biases exist.
Collecting data from millions of individuals may
provide little benefit in improving predictive accu-
racy, for example, if only a subset causes the most
variance in the data. More broadly, big data might not
perform well if data quality does not permit true
replicability of the models and a rich understanding
of the specific sources of instability in the models
(Oswald & Putka, 2016). As Succi and Coveney
(2018: 11) observed:
In the end, most of [big data] comes down to more or
less sophisticated forms of curve fitting based on error
minimization. Such minimization procedures fare
well if the error landscape is smooth, but they exhibit
fragility towards corrugated ones in other situations,
which are the rule in complex systems.
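To make this fragility concrete, consider a minimal sketch (our own illustration, not drawn from Succi and Coveney): an ordinary least-squares line recovers a smooth relationship almost exactly, but leaves most of the variation in a "corrugated" one unexplained.

```python
# Illustrative sketch: least-squares curve fitting on a smooth versus a
# corrugated relationship. All data here are synthetic toy values.
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

xs = [i / 100 for i in range(100)]
smooth = [2 * x + 1 for x in xs]                     # smooth landscape
rugged = [2 * x + 1 + math.sin(40 * x) for x in xs]  # corrugated landscape

s1 = mse(xs, smooth, *fit_line(xs, smooth))  # essentially zero
s2 = mse(xs, rugged, *fit_line(xs, rugged))  # most variation unexplained
```

The same minimization procedure succeeds or fails depending solely on the shape of the landscape, which is the quoted authors' point.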
Finally, the perspective inherently demands that
the process of data exploration be contextually in-
formed, but the wider context is often entirely side-
stepped. What might management research, in
particular qualitative researchers, say about the
"how" and "why" of the context in the perspective's
enrichment? Without such enrichments, big data
models in management research could experience
slow progress and a higher failure rate, as well as
hindering the researchers' ability to understand the
failures' root causes.
Beyond a richer perspective, we also encourage
attention to big data research aligned with the field's
scholarly strengths and priorities. This is where the
two-way dialogue could lead to specific advances
and enable management researchers to envisage
the diverse routes for a sustainable synthesis of
management and big data scholarship. To facilitate
an organized approach, we next discuss a frame-
work of big data as a concept, methodology, and
phenomenon.

Given big data's diverse uses across settings, dis-
ciplines, and applications, the concept is in danger
of becoming "everything and nothing." The popular
definition in terms of data properties such as volume
and variety has created ambiguity about what might
count as big data. For example, it is not entirely clear
what determines the threshold to qualify data as
"big" across different settings and applications.
Management researchers with a strong emphasis on
clearer definitions and constructs could help resolve
the current definitional ambiguity by moving
the conversation toward a more encompassing
understanding of the domain, boundaries, and pre-
cision of big data concepts and constructs. Our own
working definition is to view big data as a label
that refers to the generation, organization, storage,
retrieval, analysis, and visualization of data sets in-
volving large volumes and a variety of data, involv-
ing new kinds of methodological, epistemological,
and politico-ethical issues and questions.
Relatedly, even as researchers have devoted at-
tention to the dimensions of big data, a consensus
is yet to emerge. Three are prevalent: volume (the
magnitude of data), variety (structural heterogeneity
in a data set), and velocity (the rate at which data are
generated and speed at which they are analyzed and
used) (Tonidandel, King, & Cortina, 2018). But, re-
searchers have also advanced other dimensions
(curiously, many start with the letter v), such as
veracity, vision, visibility, and value, among others.
Each dimension poses distinct challenges and ways
to overcome them for researchers and managers in
accessing, storing, and utilizing big data. For exam-
ple, the velocity dimension is associated with is-
sues such as transfer speed, storage scalability,
and timing, while veracity comes with issues such
as uncertainty, authenticity, trustworthiness, and
accountability. An examination of substantive re-
search questions, the level of analysis, and the theo-
retical lenses used to construct hypotheses and
propositions calls for clarity regarding these di-
mensional manifestations. Such lower-level order-
ing, classification, or other aggregation of issues
and characteristics across big data dimensions could
also serve as a foundation for the development of
clearer big data constructs for testing. A conceptual
understanding of big data characteristics that could
help with the generation of big data sets with a story
about managers and organizations represents a
promising direction for qualitative researchers in the field.
Big data studies commonly begin with a researcher
having access to a data source or a data set on a
phenomenon, rather than with theory (Johnson,
Gray, & Sarker, 2019). Thereafter, the analysis pro-
cess involves specific issues in data access and
clean up, search, and processing that are differ-
ent from conventional approaches. Executing the
phases might call for distinct computational and
programming skills (e.g., R and Python). Data for
"smaller" research are normally produced in struc-
tured ways and captured at certain point(s). A key
challenge for the big data methodology is how to
integrate and store structured and unstructured data
in a way that would make the later analyses and vi-
sualization efficient and secure. Another challenge
is that big data sets are often not created to examine
specific questions and constructs. Thus, the re-
searcher must deal with various issues pertaining to
data construction and quality.
It is across these challenging methodological
phases of big data where we would encourage man-
agement researchers to attain a greater understand-
ing of the advantages and disadvantages of "starting
with theory" in big data studies. Precisely in what
ways (and when) might theory help to guide the
various decisions pertaining to the cleaning, con-
struction, aggregation, and storage of big data sets?
For example, a multi- or meso-level theory could
inform the decision of whether a big data set should
be constructed as "horizontally deep" (many variables
but fewer observations) rather than "vertically
deep" (fewer variables but many observations). We
would also encourage researchers to develop a
deeper understanding of each facet of the methodo-
logical process. It could, for example, be a productive
practice for the field if big data studies were to rou-
tinely contain a summary of methodological steps
undertaken, including the challenges encountered
and solutions implemented.
Even as it might be possible to examine some large
data set using traditional statistical and computa-
tional techniques, many do not scale to diverse
and unstructured data sets. Statistics focuses over-
whelmingly on inferences from data, while com-
putational architectures and algorithms that can
extract and discern valuable knowledge from com-
plex data sets are among the key considerations in
big data approaches. These architectures and algo-
rithms are used to analyze big data sets for specific
purposes, such as clustering, pattern identification,
and prediction. Some of the techniques include data
mining, machine learning, neural networks, and
deep learning (convolutional, deep belief, and re-
current nets). We also observe that a straightforward
application of some of these techniques, especially
unsupervised machine learning, in management
research could result in several challenges. For ex-
ample, one strength of deep learning techniques is
to search for and then extract patterns from un-
structured data sets. At the same time, questions might
be raised concerning, for instance, how to build an
explanatory model around a pattern, and how to
communicate the boundaries and constraints of the
final model.
What is also obvious to us, after reviewing the
relevant research, is that these techniques tend to be
highly specialized across different research tasks
and evolve dynamically, which makes it difficult for
individual researchers to make use of the potential of
these techniques. For example, big data visualization
techniques demand computational, statistical, and
informational knowledge. Another set of challenges
could come from the required computing power and
infrastructure, which might not be easily accessible
to individual researchers. Taken together, we suggest
that management researchers need to develop a more
systematic understanding of the advantages and
disadvantages of the available big data analytical
techniques in the context of management studies and
phenomena. For example, how might the field's
empirical approaches be combined with big data
techniques, such as pairing experimental data or findings
with subsequent applications of machine learning
techniques, to attain more generalizable insights? Or,
how might the field take advantage of the tech-
niques to calibrate covariates, or address multi-
collinearity for "smaller" data studies in which the
outcome variable is complex and distal, such as or-
ganizational performance? Relatedly, the field could
benefit from a richer understanding of the challenges
and opportunities of investigating the findings from
the machine learning and other predictive tech-
niques. For example, in what ways might unsuper-
vised machine learning techniques be combined
with qualitative research, such as using the clusters
and patterns to inform concept selection, aggregation,
and initial themes?
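As a minimal, hypothetical sketch of this idea, the following fragment runs a bare-bones k-means pass over toy two-dimensional feature vectors (standing in, say, for two coded properties of interview fragments); the recovered clusters are the kind of unsupervised output that could seed initial qualitative themes. All data and parameters here are our own invention for illustration.

```python
# Minimal k-means over toy 2-D points; clusters could seed initial
# qualitative themes. Pure standard library, no ML framework assumed.
import random

def kmeans(points, k, iters=20, seed=0):
    """A bare-bones k-means: returns final centers and point groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared distance).
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        # Recompute each center as the mean of its group.
        centers = [(sum(p[0] for p in g) / len(g),
                    sum(p[1] for p in g) / len(g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Two well-separated groupings the algorithm should recover unlabeled.
points = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0),
          (5.1, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, groups = kmeans(points, k=2)
```

A researcher could then read the members of each group as candidate material for an initial theme, to be elaborated qualitatively.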
It is also noteworthy that different technologies
and platforms might be appropriate for a variety of
purposes across the methodological tasks, such as
storage, access, and processing of data. While some
platforms exist and provide impressive capabilities
for processing big data sets, management researchers
will need to obtain a more complete understanding
of how to choose among the ever-expanding menu of
big data technologies. Thus, we encourage manage-
ment scholars to devote more attention to new re-
search designs and analytical complementarities at
the nexus of conventional and big data methodolo-
gies. We also encourage more attention toward un-
derstanding the advantages and disadvantages of big
data techniques and technologies in the context of
managerial phenomena, individually and compara-
tively vis-à-vis conventional techniques such as mul-
tivariate statistics.
Big data practices and applications have occurred
in diverse settings, industries, and economies. We
thus suggest that management scholars can and
should focus special attention on big data as a phe-
nomenon in organizations, institutions, and socie-
ties. There is a great need for understanding where
and in what organizational and industrial contexts
big data applications might be more consequential
for organizations and managers. More broadly, we
believe that the field should lead in the development
of new theories, approaches, and frameworks that
could help managers and their firms to better use
and extract value from big data. For example, how
might big data technologies and tools be used in
support of corporate strategy, such as by integrating
diverse data technologies across business units?
A related question concerns the firm's strategic
choice regarding where and how "to play" in the big
data space. From a decision-making perspective,
machine-learning approaches that automatically
identify actionable patterns could help to alleviate
some of the cognitive burden on managers. This po-
tential raises several intriguing questions. What is
the nature and consequence of the trade-off between
bounded executive cognition and cognitive re-
quirements of big data? How might a capability to
quickly analyze and visualize patterns hidden in
big data shape the quality and speed of decision-
making? What types of managers are more likely to
embrace (or avoid) big data in making decisions?
Addressing these questions could lead to new the-
ory or refinements to existing theories such as the
resource-based view, organizational learning, upper-
echelons, among others, as well as help managers
improve the use of big data in their own decision-making.
Some researchers have argued that the notion
of big data as objective and fact based is a myth
(Gitelman, 2013). Given the possibility for a sub-
jective interpretation, individual micro-foundations
might be crucial in understanding big data processes
and uses in organizations. Several broad questions
beg attention: How do individuals and groups
choose and interpret big data? What are some of the
psychological barriers to individuals' adoption of
big data? These questions could be investigated by
drawing on a range of distinct theoretical lenses.
Attention-based perspectives, judgment and heuris-
tic theories, and counterfactual thinking could be
especially pertinent in understanding how individ-
uals might utilize and interpret big data.
Big data might also create some research oppor-
tunities around its own ecosystems. For example,
whereas much has been said about how big data is
revolutionizing management processes and how
decision-making teams can benefit from using it,
little has been said about the challenges and pro-
cesses of the big data teams that generate and manage
big data in organizations (Saltz, 2015). Under-
standing the novel interpersonal challenges that
big data teams face is an important direction that
also could lend considerable prescriptive value,
such as in the context of new product development.
More broadly, creating big data infrastructure re-
quires senior executives to put in place appropriate
structures and capabilities that support integration
and unification of the many islands of data and an-
alytical capabilities that could exist throughout
the organization. At the same time, this integration
creates several relational and cultural challenges,
such as resistance to sharing and combining data
because of organizational silos and disputes over
the implications of the associated analytical in-
sights. Galbraith (2014: 3) observed that, as organi-
zations embrace big data, there is "a shift in power
from experienced and judgmental decision-makers
to digital decision-makers." How do organizations
structure this shift in power? Does the typical top
management team include a separate chief digital
officer or does the chief information officer wear two
hats: IT and big data? Another question concerns
how organizations might create norms and values
concerning information sharing and transparency.
We would be remiss not to touch upon the ethical
and privacy issues surrounding big data. It has by
now become clear that the generation and storage
of big data sets involve more challenges than usu-
ally anticipated, as shown, for instance, by re-
cent scandals such as Cambridge Analytica. To begin
with, the availability of data is not a guarantee
that their use would be ethical or even legal. In ad-
dition, there are issues and contradictory demands
of transparency and protection of individuals'
identities and personal knowledge (e.g., Acquisti,
Brandimarte, & Loewenstein, 2015). Although legal
regulations and organizations' codes can serve as
helpful markers, individual differences are critical in
understanding the propensity of individuals to go
above and beyond the minimum compliance, or, al-
ternatively, the tendency of individuals to engage in
ethical wrongdoing with regard to big data acquisi-
tion and utilization. The standards and practices
regarding individual data rights, ethics, and privacy
are in a state of development and debate globally.
These ethical complexities in big data provide op-
portunities to enrich theories in the areas of ethics
and values, such as ethical leadership, moral values,
and identity.
Moreover, although the ideals about big data speak
of openness and access to all, this is not entirely the
case. Big data is becoming an increasingly important
business in which various actors not only control the
databases but also regulate the marketing, sales, and
use of such data and analytical capabilities (Cohen,
2013). Is this going to lead to asymmetric access and a
new big data divide among researchers and practi-
tioners, and within and across societies and nations
more broadly? Several indications suggest that big
data can lead to a "Matthew effect," by which we
simply mean, to paraphrase Merton (1973), that the
data-and-analytical-capability-rich might get richer,
and the data-and-analytical-capability-poor might
get poorer. Relatedly, research transparency and
replication issues could become problematic if big
data sets and the analytics that underpin them were
to be kept secret for a variety of reasons, such as com-
petitive advantage (Cohen, 2013).
Our discussion on big data as a research perspec-
tive and its associated research priorities in the pre-
ceding sections makes it clear that big data gives rise
to some distinct issues at each stage of the research
process and design, from starting and/or building
theory, and accessing and integrating the data, to the
analysis and reporting and visualization. It also
seems to us that big data research is developing in a
way that might be beyond a single researcher's
capabilities and resources, due to data access and
management, the required computational power,
and the necessary knowledge of the analytical tools
and techniques. We believe that researchers will also
need to consider and overcome some rather basic
barriers to publishing big data studies in the field's journals.
First, big data cannot substitute for careful and
credible research designs and the appropriate con-
sideration of research questions. With no clear and
theoretically pertinent question guiding their crea-
tion and preparation, big data sets might come across
as a large convenience sample or a "fad." A key
question for researchers therefore is this: Why is big
data most appropriate in studying the research
question of interest? Researchers may thus have to
provide additional justification for both the manner in
which, and the types of, data and variables are collected,
constructed, and aggregated. We would particularly encourage
that researchers incorporate (explicitly or implicitly)
the logic of data access and collection, integration
and aggregation, analysis, and reporting and visual-
ization to craft and communicate the research design
of their big data studies.
Second, it may be difficult if not impossible for
reviewers and other authors to replicate and extend
studies if there is little transparency about how the
data are created, manipulated, and/or analyzed.
Private companies often own and store big data sets.
Without some built-in quality checks and controls,
reliabilities and validities of variables might be sys-
tematically compromised. Systematic errors cannot
be resolved by collecting more of the same data. Re-
viewers and readers are used to seeing empirical
studies that typically use small samples wherein the
variables are operationalized in a specific fashion.
While some variables may have face validity and
require less justification, researchers might have to
find solutions about the operationalization of latent
and profile constructs embedded in big data sets.
One solution is to use the small sample contexts
to establish the validity of those measures before
employing them in the big data contexts. Another
solution might be to combine big data analysis with
other methodseither quantitative or qualitative
to establish validity and/or illuminate the key pro-
cesses or mechanisms at play. Yet another option is
to work closely with practitioner experts to ensure
strong face validity of the assumptions and ap-
proaches used by the researchers.
Third, the selection of constructs or variables in
the current empirical approaches is typically done
with guidance from the underlying theory. With big
data, the process of converting data into constructs
of interests can lack clarity and transparency be-
cause some associated techniques, such as machine
learning, might be barely guided by an explicit the-
ory. Here, one could distinguish between supervised
and unsupervised learning techniques. In super-
vised techniques, the researcher could specify the
variables to be incorporated into the model, and, so,
the approach is like the conventional small-sample
research. But, with unsupervised techniques, the
algorithm itself selects which of the available
variables to include in the model. Reviewers and
readers are not used to seeing papers that select
variables in such a "random" (from the perspective of
the small-sample research paradigm) fashion. Re-
latedly, given that the existing paradigm uses control
variables in regressions to control for alternative in-
fluences correlated with the explanatory variables,
researchers must pay attention to how they can
convince reviewers that the patterns of associations
found from the data are reasonable and are not just
associations by chance.
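A hypothetical contrast may help (all variable names and data here are our own invention): in the researcher-specified spirit, the predictors come from theory; in the automated spirit, a screening rule ranks many candidates purely by their correlation with the outcome, with no theory about which predictors are relevant.

```python
# Contrast: theory-driven specification vs. automated, data-driven
# screening of candidate predictors. Synthetic toy data throughout.
import random

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n = 500
# 20 candidate predictors; only x0 actually drives the outcome.
X = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(20)}
y = [2 * a + rng.gauss(0, 0.5) for a in X["x0"]]

# Theory-driven specification: the researcher names the variables.
theory_driven = ["x0"]

# Data-driven screening: keep the three most correlated candidates,
# without any theory about which predictors are relevant.
data_driven = sorted(X, key=lambda v: abs(corr(X[v], y)),
                     reverse=True)[:3]
```

Here the screening happens to recover the true driver, but it also sweeps in spuriously correlated candidates, which is precisely the pattern-versus-chance concern raised above.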
Fourth, whatever analytical technique(s) is uti-
lized, but especially for machine and deep learning,
we encourage researchers to describe the content
and process of specific variables and associations
examined, rather than having them obscured within
a "computational black box." The unsupervised
and deep machine-learning techniques, by auto-
mating multiple hypothesis testing with opaque
modifiers and biases, could in fact convolute the
meaning of constructs and predictions, with the
added risk of spitting out spurious correlations at an
unprecedented scale. A related concern is that, be-
cause technologies and techniques of big data are
rapidly changing, researchers might rely on outdated
techniques and modeling. The selected tools and
technique will need to be justified vis-à-vis the
study's question and the testing needed for credible inference.
Fifth, most big data approaches employ predic-
tive techniques rather than statistical inference
approaches. So, scholars employing big data ap-
proaches must convince reviewers and readers that
the approaches are equally good, if not better, for
testing the theories, relative to the statistical in-
ference approaches. Moreover, when simultaneous
associations among multiple variables need to be
presented in a single model, researchers must pres-
ent them in a format understandable to reviewers
trained in different paradigms. Finally, because sta-
tistical significance is irrelevant with massive sam-
ple sizes, researchers should work to justify and
demonstrate the importance of the findings, for
example, with effect sizes and contextualized sig-
nificance. Visual representation of the results would
also likely be a necessary approach.
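A back-of-the-envelope sketch (our illustration) makes the point: at a sample size of one million, even a correlation of .01 yields a t statistic near 10, and thus an astronomically small p-value, while explaining only one hundredth of one percent of the variance.

```python
# Why statistical significance loses meaning at massive sample sizes:
# a negligible correlation still produces a huge t statistic.
import math

def t_stat(r, n):
    """t statistic for testing a Pearson correlation r at sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.01, 1_000_000   # a tiny correlation, a huge sample
t = t_stat(r, n)         # roughly 10: the p-value is astronomically small
r_squared = r * r        # yet only 0.01% of variance is explained
```

Reporting the effect size (here, R squared) rather than the p-value is what conveys the finding's practical importance.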
Despite these pitfalls, hurdles, and challenges,
we argue that management scholarship will be in
a stronger position to the extent that it not only trans-
fers relevant knowledge on big data into the field, but
also actively shapes the content and evolutionary
trajectory of that knowledge. We have discussed
herein several directions for synthesizing the
strengths of big data and management scholarship to
mutual advantage. The various paradigmatic, con-
ceptual, methodological, and phenomenological is-
sues surrounding big data also signify to us that
individual researchers will need to weigh the bene-
fits and risks and proceed cautiously when pursuing
big data research.
Zeki Simsek
Clemson University
Eero Vaara
Aalto University School of Business
Srikanth Paruchuri
Pennsylvania State University
Sucheta Nadkarni
University of Cambridge
Jason D. Shaw
Nanyang Technological University
REFERENCES

Acquisti, A., Brandimarte, L., & Loewenstein, G. 2015. Privacy and human behavior in the age of information. Science, 347: 509–514.
Bowman, A. W. 2018. Big questions, informative data, excellent science. Statistics & Probability Letters, 136.
Calude, C. S., & Longo, G. 2017. The deluge of spurious correlations in big data. Foundations of Science, 22.
Chan, J., & Moses, B. L. 2016. Is big data challenging criminology? Theoretical Criminology, 20: 21–39.
Cohen, J. 2013. What privacy is for. Harvard Law Review, 126: 1904–1933.
Coveney, P. V., Dougherty, E. R., & Highfield, R. R. 2016. Big data need big theory too. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374: 20160153.
Einav, L., & Levin, J. 2014. The data revolution and economic analysis. Innovation Policy and the Economy, 14: 1–24.
Elragal, A., & Klischewski, R. 2017. Theory-driven or process-driven prediction? Epistemological challenges of big data analytics. Journal of Big Data, 4: 19.
Frické, M. 2015. Big data and its epistemology. Journal of the Association for Information Science and Technology, 66: 651–661.
Galbraith, J. R. 2014. Organization design challenges resulting from big data. Journal of Organization Design.
Gitelman, L. (Ed.). 2013. "Raw data" is an oxymoron. Cambridge, MA: MIT Press.
Johnson, P., Gray, P., & Sarker, S. 2019. Revisiting IS research practice in the era of big data. Information and Organization, 29: 41–56.
Kitchin, R. 2014. Big data, new epistemologies and paradigm shifts. Big Data & Society, 1: 1–12.
Leonelli, S. 2014. What difference does quantity make? On the epistemology of Big Data in biology. Big Data & Society, 1.
Locke, K., Golden-Biddle, K., & Feldman, M. 2008. Making doubt generative: Rethinking the role of doubt in the research process. Organization Science, 19: 907–918.
Mayer-Schönberger, V., & Cukier, K. 2013. Big data: A revolution that will transform how we live, work, and think. Boston, MA: Houghton Mifflin Harcourt.
Merton, R. K. 1973. The sociology of science: Theoretical
and empirical investigations. Chicago, IL: University
of Chicago Press.
Oswald, F. L., & Putka, D. J. 2016. Statistical methods for big data: A scenic tour. In S. Tonidandel, E. B. King, & J. M. Cortina (Eds.), Big data at work: The data science revolution and organizational psychology: 43–63. New York, NY: Routledge.
Sætra, H. K. 2018. Science as a vocation in the era of big data: The philosophy of science behind big data and humanity's continued part in science. Integrative Psychological & Behavioral Science, 4.
Saltz, J. S. 2015. The need for new processes, methodologies and tools to support big data teams and improve big data project effectiveness. In IEEE Computer Society (Ed.), 2015 IEEE international conference on big data: 2066–2071. Los Alamitos, CA: IEEE Computer Society.
Succi, S., & Coveney, P. V. 2018. Big data: The end of the scientific method? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 377: 20180145.
Tonidandel, S., King, E. B., & Cortina, J. M. 2018. Big data methods: Leveraging modern data analytic techniques to build organizational science. Organizational Research Methods, 21: 525–547.
The advent of ‘Big Data’ and machine learning algorithms is predicted to transform how we work and think. Specifically, it is said that the capacity of Big Data analytics to move from sampling to census, its ability to deal with messy data and the demonstrated utility of moving from causality to correlation have fundamentally changed the practice of social sciences. Some have even predicted the end of theory—where the question why is replaced by what—and an enduring challenge to disciplinary expertise. This article critically reviews the available literature against such claims and draws on the example of predictive policing to discuss the likely impact of Big Data analytics on criminological research and policy.