ArticlePDF Available

Abstract and Figures

Is data science a new field of study or simply an extension or specialization of a discipline that already exists, such as statistics, computer science, or mathematics? This article explores the evolution of data science as a potentially new academic discipline, which has evolved as a function of new problem sets that established disciplines have been ill-prepared to address. The authors find that this newly-evolved discipline can be viewed through the lens of a new mode of knowledge production and is characterized by transdisciplinarity collaboration with the private sector and increased accountability. Lessons from this evolution can inform knowledge production in other traditional academic disciplines as well as inform established knowledge management practices grappling with the emerging challenges of Big Data.
Content may be subject to copyright.
DOI: 10.4018/IJKM.2019040106
Volume 15 • Issue 2 • April-June 2019
This article published as an Open Access article distributed under the terms of the Creative Commons Attribution License
( which permits unrestricted use, distribution, and production in any medium,
provided the author of the original work and original publication source are properly credited.
Jennifer Priestley, Kennesaw State University, USA
Robert J. McGrath, University of New Hampshire, USA
Is data science a new field of study or simply an extension or specialization of a discipline that
already exists, such as statistics, computer science, or mathematics? This article explores the
evolution of data science as a potentially new academic discipline, which has evolved as a function
of new problem sets that established disciplines have been ill-prepared to address. The authors find
that this newly-evolved discipline can be viewed through the lens of a new mode of knowledge
production and is characterized by transdisciplinarity collaboration with the private sector and
increased accountability. Lessons from this evolution can inform knowledge production in other
traditional academic disciplines as well as inform established knowledge management practices
grappling with the emerging challenges of Big Data.
Big Data, Computer Science, Data Science, Knowledge Management, Knowledge Production, Statistics,
The terms “big data”, “data science” and “analytics” have pervaded the global common speak over
the past decade. While populist in many cases, these terms are rooted in the real practice of being
able to measure and analyze phenomena in larger amounts, faster and with a longer and more robust
historical perspective, all facilitated by technological advances and the lower cost of data storage.
Data, once defined by a numerical representation of some measurement, has today evolved into an
atomic unit that can be captured – that is measured, seen or heard – and thus extracted, analyzed
and converted into information and ultimately into new knowledge. What began only a few years
ago as a growing swell of the data ocean has become a tsunami of impacts into everyday life, or the
“datafication” of the economy (Dumont, 2016).
This datafication has resulted in many organizations sprinting to better leverage the data they
collect and capture the data they do not. The argument that knowledge, as a summation of data through
the knowledge management pyramid (Ackoff, 1989), is the only sustainable source of competitive
advantage is arguably more relevant today than when it was first posited (Drucker, 1995). It has also
led many companies to declare that they are, in fact, data and information organizations more so than
Volume 15 • Issue 2 • April-June 2019
they are purveyors of the products they sell (e.g. Capital One (Dee, 2016), Alibaba (Liyakasa, 2015)
and Ford (Blanco, 2016)). Cities too are becoming “smarter” with data-driven innovations geared at
efficient energy consumption, optimized traffic and parking, and the promotion of green and healthy
practices. And individuals are becoming more data driven, with many exploring opportunities by an
ever increasing “quantified self”; a concept related to the self-tracking of any number of physical,
behavioral, social and many other phenomena by individuals (Swan, 2013). A revolution, or perhaps
evolution, to be sure.
An unexpected consequence of these rapid (r)evolutionary changes has been the emergence of
the ubiquitous and pervasive “talent gap” – the term used to describe the challenge of organizations
to find people with the necessary skills to extract and analyze massive amounts of data (structured
and unstructured) to generate meaningful information. Simply put, the demand for these skills has
materialized so rapidly, traditional sources of supply for new talent (i.e., colleges and universities)
have been ill-equipped to develop and train talent at the scale and pace demanded.
The issues related to the emergence of data science and the associated talent gap have implications
for larger conversations related to organizational knowledge management. Jennex (2017) recognized
the role of Big Data in the revised knowledge management pyramid. The traditional pyramid first
presented by Ackoff (1989) established the framework that organizational wisdom derives from
knowledge, information, and finally from data. In the revised pyramid, Jennex places a finer lens on
the lowest level of the pyramid by calling out incremental layers between information and reality.
These new layers include “Data”, defined as “discrete facts…that can be stored in a database”
(Jennex & Bartczak, 2013), “Big Data”, defined as data that is “too big, too fast or too hard for
existing tools to process” (Madden, 2012), and “IoT”, defined as a sensor network of networks with
devices continually generating vast amounts of data and facilitating the evolving definition of what
data even is. This evolution in thinking from a simplistic single layer at the base of the pyramid to a
more detailed treatment of data within the knowledge management pyramid increases the resolution
of the lens through which reality can be detected.
It is the concepts, tools, and algorithms around “data science” that will enable a sustainable
organizational approach to the translation of the layers of data into information, knowledge and
ultimately to organizational wisdom/intelligence. However, where those organizational knowledge
activities meet societal ones and who addresses those “fault” lines become an issue as data sources
become more democratized and real time (Spender, 2007; Money & Cohen, 2018).
These types of issues have led many in academia to consider the conversations around “data
science” more formally. Is this truly a new field of study, or is data science simply an extension or
specialization of a discipline that already exists, such as statistics or computer science, or mathematics?
The answers to these questions are not trivial and have implications for both academics as well as
practitioners engaged in addressing the challenges in knowledge production and management related to
the emergence of Jennex’s more detailed treatment of data within the knowledge management pyramid.
The term “data science” has been traced back to computer scientist Peter Naur in 1960 (Naur, 1992),
but “data science” also has evolutionary seeds in statistics. In 1962, the famed statistician John W.
Tukey wrote:
For a long time I thought I was a statistician, interested in inferences from the particular to the general.
But as I have watched mathematical statistics evolve, I have … come to feel that my central interest
is in data analysis… data analysis is intrinsically an empirical science. (Tukey, 1962)
The fields of data manipulation have grown largely through methods in mathematics, statistics
and computer science during this period, with research from Peter Naur, who published “Concise
Survey of Computer Methods” in 1974; Gregory Piatetsky-Shapiro who organized and chaired the
Volume 15 • Issue 2 • April-June 2019
first Knowledge Discovery in Databases (KDD) workshop in 1989 and Usama Fayyad, Gregory
Piatetsky-Shapiro, and Padhraic Smyth, who published “From Data Mining to Knowledge Discovery
in Databases” in 1996. A reference to the term “data science” as a discipline within statistics was
made in the proceedings of the Fifth Conference of the International Federation of Classification
Societies in 1996. In 1997, during his inaugural lecture as the H. C. Carver Chair in Statistics at
the University of Michigan, Jeff Wu actually called for statistics to be renamed “data science” and
statisticians to be renamed “data scientists”.
Since the beginning of the 21st century, data stockpiles have expanded exponentially, largely due to
advents in processing and storage that is both efficient and economical at scale, leading to the drive to
collect, analyze and display data and information in “real time”, offering an unprecedented opportunity
to conduct a new form of knowledge discovery. Examples include artificial intelligence, machine
learning, deep learning, scientific workflows and redefining what data actually is with the ability to study
the kinds of data that are represented in the lower levels of Jennex’s revised knowledge management
pyramid (e.g., voice, image and text). With this shift has also come rethinking from scholars within the
contributing disciplines, such as William S. Cleveland’s “Data Science: An Action Plan for Expanding
the Technical Areas of the Field of Statistics” (2001) and Thomas H. Davenport and Jeanne Harris’s
“Competing on Analytics” (2007). These authors, and others, view the emerging discipline of data
science as a transformed and new field of science that has extended beyond the walls of the academy
into industry all the way to the more granular level of curiosity fueled by societal connectivity.
It has traditionally been assumed that the strength of an academic discipline resides in its purity,
integration and in its distinctness (GlobalHigherEd, 2017). However, scientific discovery and
knowledge expansion is less likely to occur in the well-established and distinct “center” of an academic
discipline and more likely to occur on the less-established edges. Consider the development of relatively
nascent, but increasingly accepted entrants into the academy such as biochemistry, astrobiology and
environmental science. These new and emerging disciplines evolved at the intersection of fringes of
their “parent disciplines” that were created through new developments and changing societal needs
– much like species evolving to take advantage of a changing climate or ecosystem.
Most papers, essays and books on the birth of disciplines invariably begin with the current
operational definition of the phenomenon (said new discipline), quickly followed by a history
of the development and transformational evolution of the field to its current state. While
helpful from a historical perspective, this type of inquiry does not lend context to the process
of creation for new disciplines, the factors that have led to them or provide any indication
regarding the discipline’s trajectory.
These new disciplines are what Gibbons et al. foretold as manifestations of a new mode of
knowledge production and scientific inquiry (Baber & Gibbons, 1995). The organizing principal of
this evolved mode of knowledge production is that it affects not only what knowledge is produced
but also how it is produced, the context in which is it pursued, the way it is organized, the reward
systems it utilizes and the mechanisms that control quality.
Informed by Gibbons’ framework, the evolution of data science as an academic discipline can be
contextualized using the lens of academia with an eye towards a sustained and formal contribution
to the revised knowledge management pyramid (Jennex, 2017).
Any discussion of data science as an evolving academic discipline must be caveated with an
acknowledgement that there are many in the related sciences who do not recognize anything here
more than an extension of their community’s contributions. In this discussion, a nascent scientific
discipline is posited for consideration – but not one that is in anyway displacing traditional and well-
established disciplines such as statistics or computer science. To that point, Gibbons et al. emphasize
that an evolved mode of knowledge production will emerge alongside and extend, rather than replace,
the traditional, established mode.
Volume 15 • Issue 2 • April-June 2019
The differences amongst defining attributes between the evolved mode of knowledge production
– referred to by Gibbons as “Mode 2” – and the traditional mode of knowledge production – referred
to as “Mode 1” are summarized in Table 1.
The authors posit that the evolution of data science can be explained using Gibbons’ framework
of a “Mode 2” discipline of knowledge production; without the attributes present in Mode 2, data
science would not/could not become a unique academic discipline. Each of these modal attributes
and the impact on the evolution of data science are explored below.
Knowledge Creation
The relevant contrast between the two modes is between problem solving organized around the
codes of practice relevant to an established discipline and problem solving organized around a
particular application.
Data science is generally accepted as an application-driven discipline that inherited its academic
DNA from two more theoretically-driven disciplines – statistics and computer science, with both
emanating from applied mathematics. In evolutionary biology, the term “heterosis” refers to the
improved fitness of a hybrid offspring. For academic disciplines, this refers to the phenomenon that
knowledge production drawing from research with input from multiple disciplines is of superior
quality to research that limits its scope to a single academic silo (Cohen & Lloyd, 2014).
The datafication of the economy, as referenced above, created unforeseen data-centric challenges
and opportunities, pushing both statistics and computer science out of their centers and into their
respective fringes. Much of the traditional core of statistics, which has been established to accommodate
data that is small, structured and static becomes less relevant where problems are defined by data that
is large, unstructured and in-motion. Money and Cohen (2018) similarly recognize the shortcomings
of traditional statistics to the Big Data environment:
…the present analysis techniques are rooted in statistical analysis, and significance tests that are
irrelevant due to the population sampling and subsequent generalization to an entire population. In
the contrasting Big Data analysis, Big Data sets or streams are not samples, they are massive and
represent the majority of or a full population. The concept of statistical significance is not particularly
relevant to Big Data.
Similarly, while the core of computer science enables the efficient capture and storage of massive
amounts of structured and unstructured data, the discipline is less able to accommodate the need to
translate that data into information through modeling, classification and visualization.
The datafication of all sectors of the economy – healthcare, manufacturing, finance, and retail –
has been the source of these ongoing data-centric challenges and opportunities. These applications
were the catalysts to fuse statistics and computer science at their fringes, creating an academic
heterosis in response to the emergence of a new problem set, for which each siloed discipline was
ill-equipped to address.
Table 1. Gibbons four modal attributes of knowledge production
Attribute Mode 1(Traditional) Mode 2 (Evolved)
Knowledge Creation Primarily theoretically driven Primarily application driven
Span of Engagement Single discipline Transdisciplinary
Diversity of Engagement Primarily academic Academic/Private sector
Quality Control Centralized Open accountability and reflexive
Volume 15 • Issue 2 • April-June 2019
The greatest “proof” of a discipline earning academic citizenship as a mode of knowledge
production is the advent of peer-review publication outlets. There are two notable aspects to the
academic journals dedicated to data science both related to the core of data science being a discipline
in application.
The first aspect is that data science is emerging as a “horizontal” discipline, which crosses
multiple “vertical” domain applications.
Regardless of the journal, the emphasis is on the application of data science to the domain rather
than on the domain in question. Importantly, where there were theoretical articles, the emphasis
was on the fused fringes of new algorithms or incremental improvements to emerging data science
methods like machine learning or deep learning.
The second aspect of these peer-reviewed articles is that about a quarter of the authors were not
affiliated with a university. A particularly relevant example of this emerging trend is one of the citations
in this paper – Money and Cohen (2018) – is authored by an academic (Money) and a practitioner
(Cohen). This creates interesting challenges to the foundations of how evidence has traditionally
been explored, vetted and ultimately adopted as premise by both the academy and society – an issue
explored later in this paper.
Span of Engagement
The evolved mode of knowledge production is grounded in a transdisciplinary approach to research
as opposed to a uni-disciplinary approach. It is worth noting here that there are conceptual differences
across the terms “multi-discipline”, “inter-discipline” and “trans-discipline” which are more than just
semantic. Multidisciplinarity draws on knowledge from different disciplines but stays within their
boundaries. Interdisciplinarity analyzes, synthesizes and harmonizes links between disciplines into
a coordinated and coherent whole. Transdisciplinarity integrates sciences in context and application,
and transcends their traditional boundaries (Choi & Pak, 2006). New knowledge produced in a
transdisciplinary context likely does not retro-actively fit into any one of the disciplines that contributed
to the solution. The concept of transdisciplines was crystalized by Stichweh in 2001:
Transdisciplines by their nature have the characteristic of bringing together fields that otherwise
appear to have little in common, thereby helping to re-unify science as it was before science was cast
into separate silos of knowledge and research traditions in the 19th century.
Table 2. A sampling of academic journals with “data science” or “Big Data” in the title
Journal 2017 Articles Sample of Areas of Application
Data Science Journal ( 17 Nuclear Power, Insurance, Climate Change,
Federal Data Policy, Sustainability, Social Science
International Journal of Data Science and
Analytics ( 61
Minority Discrimination, ADHD, MRIs,
Recommender Systems, Cell Phones in Smart
Cities, Social Network Analysis
Big Data ( 18
Propaganda and Politics, Profit-Driven Analysis,
Social and Technical Tradeoffs in Big Data,
Minority Discrimination, Education
Journal of Data Science ( 20 Credit Scoring, Anemia in Married Females,
European Football, Chemotherapy, Clinical Trials
International Journal of Data Science
( 9
Supply Chain Management, Tuberculosis
Screening, Minority Discrimination, Financial
Distress Detection
Volume 15 • Issue 2 • April-June 2019
In an academic environment, transdiscipline knowledge production will typically occur outside of
the traditional discipline-oriented school or department. While a professor of biology and a professor of
chemistry may co-author a research paper (an example of multidisciplinarity), a stream of biochemistry
research which engages multiple faculty and graduate students is better accommodated through a
separate university unit like a center or an institute, which can also facilitate collaboration with entities
engaged in similar research outside the university. In addition, over time, these biochemists develop
their own distinct theoretical structures, methods and outlets of dissemination of their research,
which traditional biologists or traditional chemists would likely not value at the same level as their
communities’ traditional outlets.
Data science is following a similar path of disciplinary evolution. A sample of masters-level
programs in data science at 45 universities across the United States revealed that 21 were housed
outside of a siloed academic college and were housed in a “Mode 2” knowledge production unit.
Similarly, a sample of Ph.D.-level programs in data science, revealed that five out of 13 were
housed in a “Mode 2” knowledge production unit.
The non-siloed units which house these programs are characterized by explicit statements of
transdisciplinarity, allowing for short term appointments and fellowships from faculty and external
contributors to defined research initiatives. These units also become a source of dissemination of
research and knowledge production in some cases bypassing the peer review process altogether and
creating a searchable repository for research products (e.g., Digital Commons).
Diversity of Engagement
Mode 2 knowledge production is diverse in terms of the skills and experience that contributors
bring to it. This is a function of the applied orientation. Gibbons’ framework identifies that Mode
2 is marked by an increase in the types of units that collaborate to produce knowledge; no longer
only universities but non-university research centers, government agencies, industrial laboratories,
consultancies and private sector organizations. Weinberger (2012) too suggests that the emergence of
Big Data is changing the shape and evolution of knowledge management to less structured “networks”
both within and across organizations. In such environments, patterns of funding exhibit a similar
Figure 1. U.S. masters level programs in data science by university location1
Volume 15 • Issue 2 • April-June 2019
diversity, being assembled from a variety of organizations with a diverse range of expectations and
requirements, but all aligned with application.
As evidenced above, the number of centers, institutes and other transdisciplinary units with an
emphasis on research and the dissemination of knowledge in data science is growing. However, while
some of these units exist as transdisciplinary within the bounds of the university, a scan of most of
these units reveals a roster of diverse participants. For example, the National Consortium for data
science (NCDS) housed at the University of North Carolina, includes fellows from North Carolina
State University and Drexel University, but also thought leaders in data science from IBM, Deloitte
and the Environmental Protection Agency (EPA). The NCDS provides the following description of
their organization:
The National Consortium For Data Science (NCDS) is a collaboration of leaders in academia,
industry and government formed to address the data challenges of the 21st century. The NCDS was
founded as a mechanism to help the U.S. take advantage of the ever increasing flow of digital data in
ways that result in new jobs and industries, new advances in healthcare, transformative discoveries
in science, and competitive advantages for U.S. industry.
A second example is the MIT-IBM Watson AI Lab. The Lab is a joint venture between MIT
and IBM, housed on the MIT campus. The venture brings together MIT students and faculty with
IBM researchers:
The MIT-IBM Watson AI Lab is one of the largest long-term university-industry AI collaborations to
date. Our Lab is a place where scientists, professors and students collaborate to drive the frontiers
of AI...the lab’s scientists and engineers will focus on fundamental scientific breakthroughs, publish
their results, and help guide the development of AI. A distinct objective of the lab is also to encourage
MIT faculty and students to launch new companies that will focus on commercializing AI inventions
and technologies that are developed at the lab.
There is another reason these transdisciplinary or multi-stakeholder entities will grow in
importance. As Money and Cohen (2018) identify, the very work of managing networks of streaming
Figure 2. U.S. Ph.D. level programs in data science by university location1
Volume 15 • Issue 2 • April-June 2019
data ecosystems – the IoT layer in Jennex’s revised pyramid – that will fuel future AI such as smart
commerce, smart cities and more will be dependent on a continual set of data activities which are
intensive but for which the benefits accrue downstream beyond any one of the data purveyors. These
include data quality, statistical and algorithmic validity, data integrity and managing architecture and
its development.
The types of diverse needs of thought, skills and orientation are becoming representative of
the research, discovery and application products of data science. This is true, in part, because of the
pace of technology, the size of the data and the multidimensional aspects of data-centric challenges,
with no one contributor having the skills, knowledge or funding to address on a uni-lateral basis.
It is also indicative of an employment trend that has been growing for decades – more than 50% of
individuals who earn a Ph.D. in a scientific discipline work in the for-profit sector (US News and
World Report, 2016). This has contributed to a more permeable membrane between the academic
and private sectors, and has expanded the breadth of collaboration and funding in scientific research,
scholarship and knowledge production.
Quality Control
The definition of “quality” research in traditional (Mode 1) knowledge production has been well
established for decades. This has been particularly true in the sciences. Traditionally, the process of
establishing agreed upon “evidence” was a prolonged process largely controlled by universities and/
or science-based government agencies, or what is roughly termed “the Academy”. It is a term dating
back to the teachings of Plato, but which is more currently defined as “a body of established opinion
widely accepted as authoritative in a particular field”. Within the academy, research is conducted
according to defined methods (taught by the academy), building on prior research, and utilizing the
most accepted design and analytical methods. The proposals for such research were often funded
by agencies, but also foundations, and the results were often compiled and published via the peer
review process in scientific journals for the academic community to discuss. The peer review process
consists of journals of varying reach and impact with editorial boards of reviewers who give unbiased
feedback into a paper’s merit, method and findings. These boards are also populated from members
of the academy.
Membership in the academy, at minimum, typically requires that one have a terminal degree
from an accredited university. The process has been criticized over the years as having a number
of inherent limitations, such as professional competition, disagreement on foundational beliefs, and
more currently, as being subject to outside influences due to the proliferation of online and for-profit
“journals” that do not subscribe to original tenets of the process or that allow for gaming of the process
(Kassirer & Campion, 1994; Ferguson & Oransky, 2014).
Socially distributed science and knowledge presents a new challenge where evidence verification is
not necessarily expertly derived via the peer review process but through impact and utilization measures
such as code being used, libraries accessed, and verified impact of methods in real application. Papers
still exist, but in open source with open comment.
This, of course, gives rise to questions of the quality control of such findings and methods for
which the academy was aptly designed to address. For example, what is the potential for corruption
of the process; were the methods transparent and reproducible (and does it matter if the outcome is
verifiable); and, what are the agreed upon standards for the process?
Generations of early career academics have had to pursue a single path to recognition and/or
academic tenure through the process of peer review. In this context, the process of quality control
is maintained by those deemed competent by the academy to act as “peers”. The influence of
these intellectual gatekeepers to traditional knowledge production cannot be overstated, including
professional control through peer review to define what problems and techniques are even deemed
important. Non-traditional findings or approaches to new challenges may be dismissed because they
are evaluated with rubrics from the “core” rather than from the “fringe”.
Volume 15 • Issue 2 • April-June 2019
This traditional peer review process is one in which quality and control mutually re-enforce one
another and can suffer from intellectual incest. In evolutionary terms, inbreeding occurs when an
offspring has a parent pairing where considerable genetic material is shared, which leads to a decrease
in species’ fitness for survival. Single, highly siloed diciplines are subject to intellectual inbreeding,
resulting in echo chambers that lead to low impact research (Cohen & Lloyd, 2014).
An examination of 2016-2017 published journal articles and conference proceedings indicates
that multidisciplinary research generated more citations per article than any individual discipline and
trails only the medical discipline in h-index score (
In the evolved Mode of knowledge production, additional criteria are added as a function of the
application orientation, where parties with a range of intellectual and practical interests will contribute
to the evaluation of the definition of “quality”. Since quality is no longer limited strictly to the review
of academic “peers” and centralized control is now distributed in this mode there could be a concern
of lower quality production that “falls through the cracks”. However, because the control process is
more broadly based and will experience multidimensional scrutiny from a new and expanded definition
of “peers”, an assumption of decrease in quality does not follow.
Knowledge production from data science is particularly subject to multidimensional scrutiny
because of growing public awareness and concern about data-centric issues related to the environment,
public health, and privacy. Any data-centric project that engages health care data will be subject to the
Health Insurance Portability and Accountability Act (HIPAA), education data will be subject to the
Family Education Rights and Privacy Act (FERPA) as well as an Institutional Review Board (IRB)
and consumer financial data will be subject to a myriad of regulations including the Payment Card
Industry Data Security Standard (PCI DSS), the Fair Credit Reporting Act, the Graham Leech Bliley
Act, as well as the communications of the Consumer Financial Protection Bureau. An expanding
number of public interest groups will likely express interest (and in some cases concern) related to
the impacts of data-centric research. Those who work regularly with data have a built-in sensitivity
to the impact, the perception and reception of their knowledge production – because they have to.
This sensitivity has created a discovery environment that is more reflexive and more responsive to a
broader audience than any earlier discipline; the need to be socially accountable effectively precludes
the production of data science knowledge in Mode 1.
Figure 3. Impact of research by discipline
Volume 15 • Issue 2 • April-June 2019
Data science is evolving as a separate academic discipline in a Mode 2 framework for knowledge
production. Specifically, the application-driven problem sets across all sectors of the economy
have contributed to the fusing of computer science and statistics to form this new discipline, which
is transdisciplinary, characterized by collaboration with the private sector and exhibits increased
accountability and reflexivity.
However, one possible outcome of evolution is always extinction. Unlike creatures that can
completely die out, knowledge – particularly in a digital age – never really completely disappears.
Consider the knowledge related to replacing a typewriter ribbon or programming a VCR. However,
academic disciplines are on a path to extinction when three things happen: (1) they consistently
experience low student enrollment numbers because of a lack of demand for the skills and the
knowledge produced, (2) little or no impactful research is being conducted or published, and (3), the
tools and skills of that discipline are no longer thought useful to society.
While the immediate future of data science as a mode of knowledge production is bright, its
long-term future is not ensured. Universities and the larger academy are faced with a new discipline
that will require a new orientation towards knowledge production, which is currently modeled by
data science. Any new orientation towards knowledge production comes with inherent challenges
at multiple levels:
1. The data driven ecosystem will require collaborative and convergent models to support growing
technological abilities and needs as data becomes faster, interconnected, larger and more integrated
into virtually all aspects of human life. These will require thoughtful models of incentives to
ensure those data and systems continue to produce and inform valuable knowledge outputs;
2. Research silos, while constructed for important reasons, may no longer reflect the intellectual
landscape of society. (Gill, 2013; Gazzaniga, 1998) Silos further contribute to intellectual
inbreeding and to research echo chambers. The future of high-impact, relevant research is in
transdisciplinary communities. The process of promotion and tenure needs to embrace and
reflect a more transdisciplinary orientation as well as more democratic methods of knowledge
production and quality control of evidence creation;
3. The academy has traditionally developed and employed a number of research “specialists” where
new research “generalists” may be needed that span areas of impact and implication to work
across silos in order to examine highly integrated societal contexts. (Cohen & Lloyd, 2014) For
example, medicine and health care in the United States has begun to focus on the socioeconomic,
sociodemographic, and social determinants of health outcomes and spending beyond those drivers
of cost and quality derived from health services utilization (Joynt, 2017);
4. Universities are typically slow to respond to societal or industry focused problems. New fields
such as data science rely on both private and public contexts as well as on transdisciplinary
partnerships, which require more flexibility and responsiveness (Borrego & Newswander, 2011);
5. University structures need to be nimbler and made to include more joint faculty focused
on transdisciplinary work. Institutes, centers and possibly schools can accomplish this,
however it requires rethinking the function and structure of both research and teaching as
well as service and their relationship to the connections between home departments and
transdisciplinary focal areas;
6. Increasingly more students who pursue a Ph.D. in a given field may never go into academia,
however they may engage in meaningful impactful research. The membrane between
academia and the private sector is becoming increasingly more permeable. The exponential
pace of change and growth in technology and data complexity places added emphasis on
universities to be a more active part of the division of labor that is knowledge creation.
Not all industries can or should replicate the resources necessary to engage in knowledge
Volume 15 • Issue 2 • April-June 2019
production as these are often duplicative across industries and would increase the cost of
production. Universities can partner with organizations and institutions to fill this role.
Thus, Ph.D. programs need to not just accommodate this new orientation, but embrace
and leverage it to produce graduates who can contribute to meaningful research in larger,
more impactful ways than just writing for academic journals.
This paper began with the question of whether data science is a new field of scientific study or
rather a specialization within an existing discipline such as statistics or computer science. Informed
by Gibbons et al.’s framework for evolved knowledge production, the authors posit that data science is
emerging as a new transdisciplinary field and an evolved mode of knowledge production. The factors
that have contributed to its (to date) success can be used to inform and reconsider the other currently
siloed academic disciplines to meet the demands of a 21st century education, with particular emphasis
on collaboration with the private sector and increased accountability and reflexivity.
Volume 15 • Issue 2 • April-June 2019
Ackoff, R. L. (1989). From Data to Wisdom. Journal of Applied Systems Analysis, 16(1), 3–9.
Baber, Z., Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1995). The new
production of knowledge: The dynamics of science and research in contemporary societies. Contemporary
Sociology, 24(6), 751. doi:10.2307/2076669
Bender, E. (2016). Challenges: Crowdsourced solutions. Nature, 533(7602), S62–S64. doi:10.1038/533S62a
Blanco, S. (2016). CEO Mark Fields says Ford will become a data company. Retrieved from http://www.autoblog.
Borrego, M., & Newswander, L. K. (2011). Analysis of interdisciplinary faculty job postings by institutional
type, rank, and discipline. Journal of the Professoriate, 5(2).
Choi, B. C., & Pak, A. W. (2006). Multidisciplinarity, interdisciplinarity and transdisciplinarity in health research,
services, education and policy: Definitions, objectives, and evidence of effectiveness. Medecine Clinique et
Experimentale [Clinical and Investigative Medicine], 29(6), 351–364. PMID:17330451
Cleveland, W. S. (2001). Data Science: An action plan for expanding the technical areas of the field of statistics.
International Statistical Review, 69(1), 21–26. doi:10.1111/j.1751-5823.2001.tb00477.x
Cohen, E., & Lloyd, S. (2014). Disciplinary evolution and the rise of the transdiscipline. Informing Science: The
International Journal of an Emerging Transdiscipline, 17, 189–215. doi:10.28945/2045
Cooper, S., Khatib, F., Treuille, A., Barbero, J., Lee, J., Beenen, M., & Players, F. etal. (2010). Predicting protein
structures with a multiplayer online game. Nature, 466(7307), 756–760. doi:10.1038/nature09304 PMID:20686574 (2017). Data Science Journal. Retrieved from (2017). The National Consortium for Data Science. Retrieved from http://
Davenport, T. H., & Harris, J. G. (2007). Competing on analytics. Boston, MA: Harvard Business Review Press.
Dee, S. (2015). How does capital one differentiate itself in the card industry? Forbes. Retrieved from https://www.forbes.
Drucker, P. (1995). The information executives truly need. Harvard Business Review, 73(1), 54–62.
Dumont, J. (2016). The data revolution. Techcrunch. Retrieved from
Ferguson, C., Marcus, A., & Oransky, I. (2014). Publishing: The peer-review scam. Nature, 515(7528), 480–482.
doi:10.1038/515480a PMID:25428481
Gardner, E. (2015). Look beyond academia to find jobs with a science Ph.D. USNews. Retrieved from https://
Gazzaniga, M. S. (1998). How to Change the University. Science, 282(5387), 237–237. doi:10.1126/
Gill, T. G. (2013). Culture, complexity, and informing: How shared beliefs can enhance our search for fit-ness
Informing Science. The International Journal of an Emerging Transdiscipline, 16, 71–98. doi:10.28945/1778
Global Higher Ed. (2007) The challenges of creating hybrid disciplines and careers: a view from Sweden.
Retrieved from
Hand, E. (2010). Citizen science: People power. Nature, 466(7307), 685–687. doi:10.1038/466685a PMID:20686547 (2017). International Journal of data science (IJDS) - Interscience Publishers. Retrieved
Volume 15 • Issue 2 • April-June 2019
James Manyika, M. C. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey. Retrieved
from (2017). Journal of data science. Retrieved from
Jennex, M. (2017). Big Data, the Internet of Things, and the Revised Knowledge Pyramid. The Data Base for
Advances in Information Systems, 48(4), 69–79. doi:10.1145/3158421.3158427
Jennex, M., & Bartczak, S. E. (2013). A Revised Knowledge Pyramid. International Journal of Knowledge
Management, 9(3), 19–30. doi:10.4018/ijkm.2013070102
Joynt, K. E., De Lew, N., Sheingold, S. H., Conway, P. H., Goodrich, K., & Epstein, A. M. (2017). Should
medicare value-based purchasing take social risk into account? The New England Journal of Medicine, 376(6),
510–513. doi:10.1056/NEJMp1616278 PMID:28029802
Kassirer, J. P., & Campion, E. W. (1994). Peer Review: Cured and understudied, but indispensable. Journal of
the American Medical Association, 272(2), 96–97. doi:10.1001/jama.1994.03520020022005 PMID:8015140
Kuhn, T. S., & Hawkins, D. (1963). The structure of scientific revolutions. American Journal of Physics, 31(7),
554–555. doi:10.1119/1.1969660 (2017). Big Data Retrieved from (2017). International Journal of data science and Analytics - Springer Retrieved from https://
Liyakasa, K. (2015). Alibaba CEO: ‘We’re not just an e-commerce company We’re a data company.’ Retrieved from
Madden, S. (2012). From Databases to Big Data. IEEE Internet Computing, 16(3), 4–6. doi:10.1109/MIC.2012.50
Money, W. H., & Cohen, S. J. (2018). Our Knowledge Management Hubble May Need Glasses: Designing for
Unknown Real-Time Big Data System Faults. International Journal of Knowledge Management, 14(1), 30–50.
Naur, P. (1992). Computing: A Human Activity -- Selected Writings From 1951 To 1990. Addison-Wesley, New York: ACM Press. (2017). Netflix Prize: Home. Retrieved from
Press, G. (2013). A very short history of data science. Forbes. Retrieved from
Rijmenam, M. (2015). A short history of big data. Retrieved from (2017). Scimago Journal and Country Rank. Retrieved from
Segal, I. (1962). Scientific discovery and the rate of invention. In The rate and direction of inventive activity:
economic and social factors (pp. 441-458). Princeton, NJ: Princeton University Press.
Spender, J.-C., & Scherer, A. G. (2007). The Philosophical Foundations of Knowledge Management: Editors’
Introduction. Organization, 4(1), 5–28. doi:10.1177/1350508407071858
Stichweh, R. (2001). History of Scientific Disciplines. In J. Wright (Ed.), History of International Encyclopedia of the
Social and Behavioral Sciences (pp. 287–290). Elsevier Science Direct. doi:10.1016/B978-0-08-097086-8.03048-8
Swan, M. (2013). The quantified self: Fundamental disruption in big data science and biological discovery. Big
Data, 1(2), 85–99. doi:10.1089/big.2012.0002 PMID:27442063
Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33(1), 1–67. doi:10.1214/
1 Graduate programs in the United States with “data science” in the title
... Os avanços no processamento das informações aumentaram a busca pela análise e exposição dos dados em tempo real, proporcionando uma nova maneira de produzir o conhecimento e envolver as pessoas. Nesse cenário, surgem novas atividades, nas quais as pessoas ainda não possuem o devido preparo para sua execução e, como tal, tem implicações diretas na GC (Priestley & McGrath, 2019). A transferência do conhecimento entre campos especializados é importante, mas para que ela ocorra os especialistas devem estar aptos para se comunicarem com os que estão fora de sua área de domínio, de modo a proporcionar vantagens aos envolvidos. ...
... Em vista disso, uma transferência de conhecimento bidirecional, da ciência para a sociedade e da sociedade para a ciência, facilita a mudança no comportamento e do gerenciamento de projetos (Breukers et al., 2011). Com este contexto, a GC tem um desafio, encontrar pessoas habilitadas para processar as informações (Priestley & McGrath, 2019). ...
... Especificamente, a PTD visa resolver um problema socialmente relevante com a aprendizagem mútua entre as disciplinas, pesquisadores e sociedade, de modo a facilitar a tomada de decisões e aplicar seus resultados (Peterson et al., 2010;Sigel et al., 2014). Outro aspecto é que as universidades não devem demorar a responder aos problemas sociais ou setoriais, precisam ser ágeis e incluir docentes focados no trabalho transdisciplinar (Priestley & McGrath, 2019). Mitchell, Cordell e Fam (2015) categorizaram os resultados da PTD em três domínios, (a) situação que melhora o campo de investigação; (b) cria fluxos de conhecimento tanto científicos como não científicos, com a participação ativa dos interessados; e (c) aprendizagem mútua e transformacional dos pesquisadores, participantes e mantenedores que aumenta a probabilidade de mudança permanente. ...
Conference Paper
Full-text available
A gestão do conhecimento caracteriza-se pela capacidade das organizações em gerir seu capital intelectual. A abordagem transdisciplinar rompe as barreiras disciplinares para criar unidade de conhecimento. Ambas perpassam as diferentes áreas do conhecimento e unificam os saberes. O objetivo foi analisar como a gestão do conhecimento tem se apropriado da abordagem transdisciplinar, por meio da revisão integrativa da literatura. A utilização da abordagem transdisciplinar facilita a gestão do conhecimento, nos quesitos transferência e uso do conhecimento, nas tomadas de decisões e nas resoluções dos problemas. Como conclusão, a gestão do conhecimento tem muito a evoluir com a abordagem transdisciplinar, porque não é apenas união de conhecimentos, mas uma transformação cultural, uma nova forma de se posicionar diante da complexidade dos problemas.
... A drive arose to track and quantify seemingly everything and created a "datafication" of society [10], especially the increasing use of big data to explain and predict human behaviors [11]. These also helped created the transdisciplinary field of data science since professionals were needed who could become "scientists of data" rather than subject matter experts who were familiar with their specific type of data [12]. Over the past decade, analytics performed by data scientists have worked to increase knowledge in this space and influenced individual, family, and business decisions [13]. ...
... At their most basic, data are discrete facts that are unaggregated, raw, and have not been converted into information (Jennex & Bartczak, 2013). More broadly, data are an abstraction of a realworld phenomenon (Kelleher & Tierney, 2018) that are typically conceptualized as the numerical depiction of measurable phenomena (Priestley & McGrath, 2019). For this paper, data are "an atomic unit that can be captured-measured, seen or heard-and thus extracted, analyzed, and converted into information and ultimately into new knowledge" (Priestley & McGrath, 2019, p. 97). ...
Full-text available
The growth and expansion of "Big Data" is fundamentally changing public service delivery. Big Data is getting "bigger," and public organizations will have new opportunities to cultivate and challenges to address. To understand the effects of the growth of data on public organizations, we introduce the Public Data Primacy (PDP) theoretical framework, which builds on existing scholarship through four propositions about data, technology, and its use in the public sector. The framework posits that public sector work will become increasingly data-centric as data continues to get "bigger." Ultimately, the PDP leads to two predictions about the public sector. First, we predict that the primacy of data in the delivery of public services is inevitable. Second, this forthcoming reality will require public servants to adopt new models of public service oriented around data. The PDP theoretical framework provides a systematic lens in which public administration scholarship can evaluate the future of data growth and its impacts upon public service delivery. Keywords Big data · AI · Public service delivery The growth of "Big Data" is fundamentally altering how public organizations make decisions and provide public services. Public employees plan and deliver public
... So, data science has become a combination of several scientific disciplines that closely link with mathematics. Today, there is a plan to establish the position of data science as knowledge (Lausen, Böhmer, & Krolak-Schwerdt, 2015;Priestley & McGrath, 2019;Oyamada, 2019;Vedula & Hager, 2020). That is by establishing a Study Program of Data Science at Universitas Sumatera Utara (Nasution, Sitompul, & Nababan, 2020). ...
Full-text available
Physics, biology, chemistry, for example, do not already only study programs in many universities but are at the forefront of the development of science, and that is science itself. The development of the basic foundation becomes the mother of science, such as mathematics, as a suspicion of the birth of new sciences. The interests related to the life and welfare of humans always invites the possibility of the birth of science in concepts, but mathematics confirms it. In certain cases, as an application, or it gives birth to answers in technology form, which is distorted according to the social demands of society. Every science has its methodology, which is based on the logic that helps to reason. A natural model is an approach to generating methods, as well as the principles that develop in artificial intelligence based on mathematics. This principle not only changes the way of perceiving data, but it also establishes new definitions of data, and changes statistics, optimization, and other fields. Each science also has implications for other sciences but also has applications that are also related to other sciences. Every science supports technology to improve social welfare. Historically, century by century, many new scientific fields were born, and it has evidence from the publication of scientific works to books as scientific standards. Then, the teaching curricula supported its initial dissemination and its development through research with evidence of scientific publications either through papers in proceedings of scientific meetings or articles in the journals. A study program about science in the related faculty is mandatory, and it is as the spearhead in further scientific and technological development. So everything starts and ends at one point, namely the birth of new science, as the birth of data science. This paper describes the birth of science by involving historical traces. The reflection of scientific development is starting from the foundation to become a scientific field, namely data science. It also illustrates the roadmap of a scientific indirectly.
The new generation of digital intelligence technology enables knowledge creation, dissemination, and application to undergoing parallel changes. Scientific systems face an increasingly uncertain, diverse, and complex environment, making adopting multidisciplinary, interdisciplinary, and transdisciplinary approaches to research issues inevitable. Existing scientific systems follow linear value streams, leading to problems, such as inefficiency, unfairness, and knowledge monopoly. Decentralized science (DeSci) is a new scientific development paradigm based on Web3, Metaverses, and decentralized autonomous organizations and operations (DAOs) technologies, that can solve organizational and management problems in scientific systems through organizing, coordinating, and executing techniques. However, new economic theories and methods are still needed to effectively solve the problem of linear value flow in scientific systems. Metaeconomics based on the parallel intelligence theory, also known as decentralized economics (DeEco), provides a new approach and idea for redesigning the economic system of scientific markets. Thus, this article proposes a research framework and core mechanisms of DeSci MetaMarkets based on parallel economic theory to provide effective and practical methodologies for scientific system governance.
Full-text available
This is a report on how much lack or neglect of requirements formulation affects data quality in information systems. The issue of data quality in information systems is centred around their conceptual data models as they capture all the data perspectives that the systems manage. Investigations on case studies In a way of triangulation were attempted to find out that there are over 30 percent (near 40 percent) of unnecessary data replication or redundancy when elicitation of requirements is poor, which is serious enough to impel the success or failure of information systems.
Full-text available
Using big data in organizations has the potential to improve innovation, accuracy, and efficiency. Big data is also connected with risks for both the organization and society at large. It is therefore vital to improve our understanding of the potential consequences of implementing and using big data. The researchers studied the Swedish Transport Administration to understand their attitude towards implementing big data to predict, for example, the need for road maintenance. The analysis identified four moral dilemmas that the organization deals with in connection to big data. The researchers discuss these dilemmas from the perspective of practical wisdom. Practical wisdom is manifested in context-dependent actions connected to open-mindedness, reflection and judgment. It can be summed up as “the reasonable thing to do” in a unique situation where “not-knowing” is a helpful resource when making wise decisions. This paper seeks to shed light on the importance of practical wisdom when implementing big data.
Full-text available
Background: Although electronic health record systems are critical for healthcare management, there has been genuine concern about the quantity and quality of data generated by these systems inhibiting its full implementation. Objectives: The purpose of this article was to explore the experiences of healthcare workers (HCWs) and challenges facing the acceleration of the District Health Information System (DHIS) in the KwaZulu-Natal (KZN) province of South Africa. Methods: In this study, an interpretive research paradigm was followed to explore the current state of electronic health in South Africa from the experiences of HCWs in the KZN province. Semi-structured focus group interviews conducted with 20 participants drawn from the district office, clinical nurse practitioners and data capturers allowed thematic analysis of data using a systems approach to link the perspectives HCWs to the design of the DHIS. Results: The participants held the view that e-health is crucial for monitoring disease trends, policy development, planning and allocation of infrastructure, information technology (IT), financial and human resources. Nevertheless, the participants highlighted a concern surrounding e-health regulations, ethics and data confidentiality; data quality and lack of interoperability of Health Information Systems (HIS). This concern was attributed to data fragmentation, internal politics and lack of coordination of the data system. Conclusions: The study suggests that good quality data – from an integrated DHIS, is highly critical for the effective utilisation, implementation and acceleration of e-health systems in the province to support epidemiological surveillance and modelling of outbreaks, such as the COVID-19 pandemic.
Full-text available
This article analyzes the properties of unknown faults in knowledge management and Big Data systems processing Big Data in real-time. These faults introduce risks and threaten the knowledge pyramid and decisions based on knowledge gleaned from volumes of complex data. The authors hypothesize that not yet encountered faults may require fault handling, an analytic model, and an architectural framework to assess and manage the faults and mitigate the risks of correlating or integrating otherwise uncorrelated Big Data, and to ensure the source pedigree, quality, set integrity, freshness, and validity of the data. New architectures, methods, and tools for handling and analyzing Big Data systems functioning in real-time will contribute to organizational knowledge and performance. System designs must mitigate faults resulting from real-time streaming processes while ensuring that variables such as synchronization, redundancy, and latency are addressed. This article concludes that with improved designs, real-time Big Data systems may continuously deliver the value of streaming Big Data. Copyright © 2018, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Full-text available
The knowledge pyramid has been used for several years to illustrate the hierarchical relationships between data, information, knowledge, and wisdom. An earlier version of this paper presented a revised knowledge-KM pyramid that included processes such as filtering and sense making, reversed the pyramid by positing there was more knowledge than data, and showed knowledge management as an extraction of the pyramid. This paper expands the revised knowledge pyramid to include the Internet of Things and Big Data. The result is a revision of the data aspect of the knowledge pyramid. Previous thought was of data as reflections of reality as recorded by sensors. Big Data and the Internet of Things expand sensors and readings to create two layers of data. The top layer of data is the traditional transaction / operational data and the bottom layer of data is an expanded set of data reflecting massive data sets and sensors that are near mirrors of reality. The result is a knowledge pyramid that appears as an hourglass.
The scientific discipline, as the primary unit of internal differentiation of science, is an invention of nineteenth century society. There exists a long semantic prehistory of disciplina as a term for the ordering of knowledge for purposes of instruction in schools and universities. But only the nineteenth century established real disciplinary communication systems. They are based on specialization of scientists, role differentiation in the organizations of science, the emergence of standard forms of scientific publication, and the rise of the research imperative which demands an incessant search for novelties. All these structural changes coalesce in the disciplinary community as a new type of communication system in science. After that the discipline functions as unit of structure formation in the social system of science, in systems of higher education, as a subject domain for teaching and learning in schools, and finally, as a designation of occupational and professional roles. Although the processes of differentiation in science have been going on ever since, the scientific discipline as a basic unit of structure formation is stabilized by these plural roles in different functional contexts of modern society. Finally, the individual discipline is embedded in an internal environment of other disciplines. The continuous mutual observation and interaction of these disciplines is the most important factor in the dynamics of modern science.
A new report shows that Medicare beneficiaries with social risk factors had worse outcomes than other beneficiaries on many quality measures, and providers that disproportionately served such beneficiaries tended to have worse performance on quality measures.
Open competitions bring new minds, skills and collaborations to problems in biomedical research.