Unicorn data scientist:
the rarest of breeds
Saša Baškarada and Andy Koronios
University of South Australia, Mawson Lakes, Australia
Abstract
Purpose: Many organizations are seeking unicorn data scientists, that rarest of breeds that can do it all.
They are said to be experts in many traditionally distinct disciplines, including mathematics, statistics,
computer science, artificial intelligence, and more. The purpose of this paper is to describe the authors' pursuit of
these elusive mythical creatures.
Design/methodology/approach: Qualitative data were collected through semi-structured interviews with
managers/directors from nine Australian state and federal government agencies with relatively mature data
science functions.
Findings: Although the authors failed to find evidence of unicorn data scientists, they are pleased to report
on six key roles that are considered to be required for an effective data science team. Primary and secondary
skills for each of the roles are identified and the resulting framework is then used to illustratively evaluate
three data science Master-level degrees offered by Australian universities.
Research limitations/implications: Given that the findings presented in this paper have been based on a
study with large government agencies with relatively mature data science functions, they may not be directly
transferable to less mature, smaller, and less well-resourced agencies and firms.
Originality/value: The skills framework provides a theoretical contribution that may be applied in practice
to evaluate and improve the composition of data science teams and related training programs.
Keywords Data analytics, Skills, Definition, Framework, Data science, Business analytics
Paper type Research paper
1. Introduction
Data science is an emerging applied discipline focused on facilitating organizational decision
making through the development of statistical models that extract knowledge from raw data
(Patil and Davenport, 2012). Extracted knowledge may describe what happened, explain why
something happened, and predict what may or is likely to happen. In spite of the current
popularity of data science, many organizations lack clear understanding of the required roles
(e.g. data scientist, data analyst, data engineer, business expert, system expert, and software
engineer) and skills (e.g. domain, information technology, and quantitative) (Linden et al., 2015;
Harris et al., 2013). For instance, it is frequently stated that a data scientist is someone who is
better at programming than a statistician and better at statistics than a computer scientist.
Noting that data science requires domain knowledge and a broad set of quantitative skills,
Waller and Fawcett (2013) highlight that there is a dearth of literature on the topic and many
questions (p. 77). Accordingly, they call for more research on skills that are needed by data
scientists. Given the breadth of skills required, they conclude that it may not be realistic to
expect any one person to possess all the relevant expertise. Nevertheless, short of including
virgins[1] in their employee benefits packages, companies are making every effort to attract
data scientists who can do it all (Press, 2015). Given their almost mythical status, a growing
number of data professionals are starting to refer to such rare individuals, who are said to excel
in a wide range of traditionally distinct disciplines, as unicorns (Stodder, 2015; Bertolucci, 2013).
Yet, it is surprising to note that scholarly literature has so far failed to investigate the nature of
this potentially new species. That is the purpose of this paper.
Program, Vol. 51 No. 1, 2017, pp. 65-74
© Emerald Publishing Limited, 0033-0337
DOI 10.1108/PROG-07-2016-0053
Received 11 July 2016; Revised 8 December 2016; Accepted 15 December 2016
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0033-0337.htm
2. Literature review
There is a substantial overlap between data science, data analytics, and Big Data
organizational capabilities (Laney et al., 2015). Nevertheless, data science is generally viewed
as "a set of fundamental principles that support and guide the principled extraction of
information and knowledge from data" (Provost and Fawcett, 2013). Proposed core skills
include mathematics and statistics (e.g. data mining, hypothesis testing, and predictive
analytics), computer science (e.g. data structures and algorithms), and domain expertise
(Dhar, 2013; Finzer, 2013). Other required skills include data integration, transformation,
and loading, as well as data visualization (Yang and Liu, 2013).
By tracing the evolution of business intelligence and analytics from structured
content in relational database management systems and data warehouses (currently widely
adopted in industry), through unstructured Web 2.0-based content (currently
widely researched in industry and academia), to mobile and sensor-based
content (an emerging area of research), Chen et al. (2012) identify a number of relevant
skills, including analytical and IT skills (drawing from statistics and computer science),
business and domain knowledge (e.g. accounting, finance, management, marketing,
logistics, and operation management), and communication skills required to interact with
relevant decision makers. They conclude that due to its emphasis on key data management
and information technologies, business-oriented statistical analysis and management
science techniques, and broad business discipline exposure (p. 1182), the discipline of
information systems is uniquely placed to provide a valuable contribution (Lee and
Mirchandani, 2010; Stevens et al., 2011).
A study commissioned by the Joint Information Systems Committee, a UK
non-departmental public body focused on championing the importance and potential of
digital technologies in UK education and research, found that data scientists need a
wide range of skills, including domain expertise, computing, and people skills (Swan and
Brown, 2008). The study noted that although there is some variation in the skills that are
possessed by data scientists, they are all expected to have a substantial competency in the
domain in which they operate.
Based on a survey with 250 respondents, Harris et al. (2013) identified five skill groups
that are applicable to data scientists, including business, machine learning and Big Data,
math and operations research, programming, and statistics. Depending on their level of
competence in each of these skill sets, data scientists may then either tackle the entire
analytics process, or predominantly focus on technical problems of managing data, research
methods and statistics, or deriving business value through analytics.
Although there is a widespread belief that hard scientists (particularly physicists) tend
to produce the best data scientists (Loukides, 2011), other professionals that may assume data
science roles include data mining experts, operations researchers, statisticians, actuaries,
econometricians, equity analysts, process control engineers, and the like (Linden et al., 2015).
Even though library engagement in research data management is a relatively recent
phenomenon (Corrall et al., 2013), and the relevant roles and responsibilities are yet to be
settled (Corrall, 2012; Cox and Pinfield, 2014; Madrid, 2013; Xia and Wang, 2014;
Cox and Corrall, 2013; Cassella and Morando, 2012), librarians and information science
professionals may contribute vital data curation, preservation, and archiving skills to
ensure safe custody of research outputs (Swan and Brown, 2008; Pryor and Donnelly, 2009).
Based on a systematic review of 600 peer-reviewed library and information science papers
published between 2000 and 2014 in English, Vassilakaki and Moniarou-Papaconstantinou
(2015) identify six roles that information professionals have adopted, two of which are of
particular relevance to data science. Technology specialists may facilitate the development,
management, and promotion of institutional repositories for research output, while
knowledge managers may also contribute to the management of such repositories as well as
facilitate relevant communication and knowledge flows throughout the organization.
Corrall (2010) notes the emergence of composite, hybrid and blended library and information
science professionals as evidenced by overlapping roles and broad skillsets. She classifies
data science as a hybrid specialty comprising aspects of information technology and media
(conduit), library and information science (content), and academic and professional
discipline (context) expertise. Librarians and information science professionals may also
provide support for public engagement with science, as well as facilitate public access to
research data sets (Lyon, 2012).
Undergraduate data science degrees have focused on data visualization, data
manipulation/wrangling, computational statistics, machine learning, as well as related
topics like spatial analysis, text mining, network science, and Big Data (Baumer, 2015;
Hardin et al., 2015). Others have also emphasized oral and written communication as well as
social, ethical, and legal issues (Anderson et al., 2014). Given the multidisciplinary nature of the
topic, it has been observed that the teaching of individual data science courses as part of more
general (e.g. business) degrees presents a particular challenge as many students may lack
relevant background knowledge (Wang and Gu, 2016).
3. Method
Qualitative data were collected through semi-structured interviews (DiCicco-Bloom and
Crabtree, 2006) with nine managers/directors from nine Australian state and federal
government agencies with relatively mature data science functions. Their official titles
included such descriptors as research, innovation, analytics, and policy (e.g. manager policy
and research, and director enterprise analytics). While some were responsible for relatively
small teams comprising approximately five people, others were responsible for several
dozen specialist staff. As this study adopts a qualitative rather than a quantitative
approach, there was no requirement to select a statistically representative sample.
Instead, the interviewees were identified through personal contacts and selected based on
their professional roles and willingness/availability to participate in the study. Being
semi-structured, the interviews were guided by a number of high-level questions
(see Appendix) pertaining to the nature of the relevant work being conducted in each
agency, key roles and skills, team composition, and broad challenges and opportunities.
Ad hoc probing and follow-up questions were employed to seek clarification, elicit
additional information, and explore emerging themes (Baškarada, 2014). Data analysis,
which occurred concurrently with data collection, employed the constant comparative
method to identify and categorize key constructs (Glaser, 1965).
The resulting framework was then used to illustratively evaluate three data science
Master-level degrees offered by three Australian universities. As the objective was not to
produce any universal generalizations, but instead to simply illustrate the applicability of
the framework developed, the universities were selected in a haphazard manner. Several
universities/degrees were excluded from the analysis because they did not provide
sufficiently detailed course descriptions online.
4. Results and discussion
The authors failed to find evidence of a unicorn data scientist. In other words,
all interviewees agreed that it is unrealistic to expect one person to have the same level of
expertise in a number of distinct disciplines as more specialized experts have. Instead, they all
sought to build effective multidisciplinary teams. Six key roles that are considered to be
required for an effective data science team are outlined below.
4.1 Roles
The six key roles are domain expert, data engineer, statistician, computer scientist, communicator,
and team leader.
4.1.1 Domain expert. This study confirmed the absolute importance of domain expertise,
one of the most frequently quoted data science skills in the literature (Linden et al., 2015;
Waller and Fawcett, 2013; Dhar, 2013; Finzer, 2013; Chen et al., 2012; Swan and Brown, 2008;
Laney et al., 2015). One of the participants observed: "You have to have a very good
understanding of processes and government policy. It's absolutely critical." Without domain
expertise, data scientists (or data science teams) lack context needed to interpret raw data
into meaningful information (Baškarada and Koronios, 2013). Furthermore, the ability to ask
relevant questions, generate relevant hypotheses, and ultimately interpret results is
underpinned by deep domain expertise. Given the large variety and complexity of many
organizations, continuous access to other business experts outside of the data science team
is also important (Linden et al., 2015). For instance, a number of interviewees explained how
they validate analytical insights through workshops with subject matter experts.
Accordingly, domain experts work very closely with all other team members with the
possible exception of computer scientists.
4.1.2 Data engineer. Literature identifies both data preparation and data quality as
critical inputs to effective analytics (Herschel et al., 2015; Randall and Beyer, 2014;
Baškarada and Koronios, 2014). Most of the interviewees referred to the "garbage in,
garbage out" principle, emphasizing the importance of high-quality data. It has been
observed that depending on the volume, velocity, variety, and veracity of data, data
preparation, which includes extraction, cleaning, enrichment, and transformation,
can consume up to 80 percent of effort (Linden et al., 2015). This was confirmed by many
interviewees, with one of them noting: "Most of our effort goes on data wrangling and
cleaning. All these data in data lakes are of no use unless they are first properly prepared."
In contrast to business intelligence systems, which operate on semantically consistent data
warehouses (which transform all data into a common format), data science teams may
operate on semantically inconsistent data lakes (which keep all data in their original format)
(Heudecker and White, 2014). Accordingly, in contrast to business intelligence systems,
which require relatively infrequent data preparation (only when the data warehouse is built
or modified), data science efforts require ongoing data preparation. As a result, having a
data engineer as a permanent member of a team is much more important in the context of
data science than in the context of business intelligence.
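The four data preparation steps named above (extraction, cleaning, enrichment, and transformation) can be illustrated with a small sketch. The Python fragment below is a minimal example under stated assumptions, not a description of any interviewee's tooling: the two raw payloads stand in for a data lake that keeps data in their original formats, and every name in it (RAW_LAKE, REGION_BY_POSTCODE, extract, clean, enrich, transform, prepare) as well as the sample records are hypothetical.

```python
import csv
import io
import json

# Hypothetical "data lake": payloads kept in their original formats.
RAW_LAKE = [
    ("csv", "id,postcode,amount\n1, 5095 ,120.50\n2,,n/a\n"),
    ("json", '[{"id": 3, "postcode": "5000", "amount": "80"}]'),
]

# Hypothetical reference data used for enrichment.
REGION_BY_POSTCODE = {"5095": "Region A", "5000": "Region B"}


def extract(fmt, payload):
    """Extraction: parse a raw payload into a list of dict records."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "json":
        return json.loads(payload)
    raise ValueError(f"unsupported format: {fmt}")


def clean(record):
    """Cleaning: normalize types and drop records with unusable values."""
    postcode = str(record.get("postcode") or "").strip()
    try:
        amount = float(record.get("amount"))
    except (TypeError, ValueError):
        return None  # amount missing or not numeric
    if not postcode:
        return None  # postcode missing
    return {"id": int(record["id"]), "postcode": postcode, "amount": amount}


def enrich(record):
    """Enrichment: attach a region looked up from reference data."""
    region = REGION_BY_POSTCODE.get(record["postcode"], "unknown")
    return {**record, "region": region}


def transform(records):
    """Transformation: aggregate amounts into one total per region."""
    totals = {}
    for rec in records:
        totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["amount"]
    return totals


def prepare(lake):
    """Run extraction, cleaning, enrichment, and transformation in order."""
    prepared = []
    for fmt, payload in lake:
        for raw in extract(fmt, payload):
            rec = clean(raw)
            if rec is not None:
                prepared.append(enrich(rec))
    return transform(prepared)


if __name__ == "__main__":
    print(prepare(RAW_LAKE))
```

Because the payloads are only parsed when they are read, each new source format or quality problem requires a change to the extraction or cleaning step, which is one way of seeing why data preparation is an ongoing activity in data science rather than a one-off task.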
4.1.3 Statistician. Statisticians are at the core of data science teams. They form a bridge
between domain experts, data engineers, and computer scientists. For instance, they may
refine and formalize questions and ideas from domain experts, request relevant data from data
engineers, and guide computer scientists in relation to data analysis. One of the interviewees
observed: "Data science is statistics plus. Statistics is at the core of everything we do." Given
that data science efforts are increasingly undertaken in the context of Big Data, statisticians
require special expertise for dealing with large data sets. For instance, they need to be able to
identify and maximize opportunities for automation. According to one participant: "It's not
just your traditional stats. With Big Data the focus is shifting to data mining and machine
learning." In addition to traditional skills like experimental design and hypothesis testing,
these statisticians also require a solid understanding of skills that are at the intersection of
statistics and computer science. As such, their approach needs to be much more applied than
the approach traditionally followed by academic statisticians. For instance, academic
statisticians are traditionally conservative in terms of being very cautious about making
inferences to unobserved events and entities. This conservatism may still have its place in
some applications of data science (e.g. health and public policy), but may need to be relaxed in
other problem domains with less inherent risk. As one participant observed: "We don't need
academic rigor. It's much more important to produce something quickly. We don't need a
100% solution; 80% is usually good enough."
4.1.4 Computer scientist. Exponential growth in the volume, velocity and variety of
data has led to the development of new Big Data software tools and technologies
(e.g. Apache Hadoop, MapReduce, and Spark). Computer scientists require proficiency in
such tools and technologies, relevant programming languages like R and Python, as well as
cluster and cloud computing in order to implement and optimize (e.g. in the context of
real-time analytics) processing (e.g. sorting, aggregating, searching, matching,
and concatenating) and analysis of large data sets. One of the interviewees noted:
"These are very complex tools, and they are rapidly evolving, too. We need someone to keep
on top of all the latest developments, like the Apache stack. That's a full-time job." Given
that much (perhaps most) data are unstructured, computer scientists also require skills in
text analytics and natural language processing. In general, study participants placed strong
emphasis on agile processes and open source software.
4.1.5 Communicator. From a practical perspective, data science is largely pointless
unless it can effect organizational change. Accordingly, the ability to effectively communicate
with relevant decision makers becomes critical. This includes exploration of relevant
problems and opportunities as well as communication of eventual results. As relevant decision
makers frequently do not have advanced statistical skills, any findings need to be presented
in a form that is visually appealing, easy to understand, and ultimately convincing. At the
same time, complexity, simplifying assumptions, and contextual dependencies also need to be
appreciated and effectively communicated. It was frequently observed that decision-makers
do not want data, they want answers. As such, storytelling becomes a critical skill.
An interviewee observed: "We need to challenge their (decision-makers') assumptions, change
their mental models. These are time-poor people, so we need to be able to capture their
attention quickly." This requires a different approach to the one followed by academic
statisticians who have traditionally communicated with other statisticians. Communicators
form a bridge between data science teams and relevant decision makers. Internally, within the
data science team, communicators form a bridge between the team leader, the statistician, and
the domain expert.
4.1.6 Team leader. This role is most like the mythical unicorn in the sense that the team
leader requires some understanding of all the other roles in order to bring everyone together,
manage resources, tasks, and deliverables. One interviewee observed: "I have been around
for a while. I have been in similar roles for more than 30 years." In addition to requiring
extensive project management expertise, the team leader is responsible for ensuring that
any ethical, privacy, and security norms and expectations are adhered to. Working closely
with the communicator, the team leader is responsible for developing relevant business
cases and estimating expected return on investment.
4.2 Primary and secondary skills
Although each role is associated with, and based on, primary expertise, no role can operate
in isolation. In other words, in order to enable interaction within a data science team,
each role requires one or more secondary skills. Table I details primary and secondary skills
for each of the roles identified in this paper. As such, it identifies the degree of interaction
between the roles. It also highlights that interactions between the roles are asymmetric,
and identifies the degree of asymmetry.
For instance, although the data engineer requires some domain expertise in order to be able
to seek relevant information and guidance from the domain expert, domain experts may be
able to provide such information and guidance even if they have no data preparation skills.
As there is no need for any direct interaction between the domain expert and the
computer scientist, domain experts generally do not require any computer science skills and
vice versa. Domain experts do, however, require some statistical skills in order to facilitate
identification/generation of relevant questions and hypotheses, as well as interpretation of
results. They do not necessarily require any specialist communication expertise. Besides
requiring some domain expertise in order to be able to extract, clean, enrich, and transform
relevant data, data engineers do not necessarily require any other secondary skills. Statisticians
require some domain expertise in order to be able to refine and formalize questions and ideas
from domain experts, and some data preparation expertise in order to be able to provide
informed guidance to data engineers. They do not necessarily require specialized computer
science, communication, or management skills. Computer scientists do not necessarily require
any domain expertise, communication, or management skills. They do, however, require some
data preparation skills, and reasonably advanced statistical skills. Communicators require
significant domain expertise in order to effectively communicate with relevant decision makers.
They also require some statistical skills in order to be able to present analytical findings in a
form that is visually appealing, easy to understand, and ultimately convincing. As noted above,
team leaders are most like the mythical unicorns in the sense that they require some
understanding of all the other roles in order to bring everyone together, manage resources,
tasks, and deliverables.
Table I indicates that the team leader role requires the greatest breadth of skills, and that
domain expertise and statistics are at the core of data science. They are closely followed by
data preparation skills. In contrast to the core skills, computer science, communication,
and management skills are somewhat less central, although not necessarily less important.
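The role and skill relationships described in this section can also be expressed as a small data structure. The Python sketch below is a minimal illustration that encodes only what the prose of Section 4.2 states; it is not a reproduction of Table I, it does not capture the degree of each secondary skill, and all identifiers (ROLES, required_skills, skill_demand, asymmetric_pairs) are hypothetical names introduced for this example.

```python
from collections import Counter

# Primary skill and secondary skills per role, as described in the prose of
# Section 4.2. The degree of each secondary skill is deliberately not encoded.
ROLES = {
    "domain expert": {
        "primary": "domain expertise",
        "secondary": {"statistics"},
    },
    "data engineer": {
        "primary": "data preparation",
        "secondary": {"domain expertise"},
    },
    "statistician": {
        "primary": "statistics",
        "secondary": {"domain expertise", "data preparation"},
    },
    "computer scientist": {
        "primary": "computer science",
        "secondary": {"data preparation", "statistics"},
    },
    "communicator": {
        "primary": "communication",
        "secondary": {"domain expertise", "statistics"},
    },
    "team leader": {
        "primary": "management",
        "secondary": {
            "domain expertise", "data preparation", "statistics",
            "computer science", "communication",
        },
    },
}


def required_skills(role):
    """Return the primary and secondary skills of a role as one set."""
    spec = ROLES[role]
    return {spec["primary"]} | spec["secondary"]


def skill_demand():
    """Count how many roles require each skill (primary or secondary)."""
    counts = Counter()
    for role in ROLES:
        counts.update(required_skills(role))
    return counts


def asymmetric_pairs():
    """Return (a, b) pairs where role a needs role b's primary skill as a
    secondary skill, but role b does not need role a's primary skill."""
    pairs = set()
    for a, spec_a in ROLES.items():
        for b, spec_b in ROLES.items():
            if a == b:
                continue
            a_needs_b = spec_b["primary"] in spec_a["secondary"]
            b_needs_a = spec_a["primary"] in spec_b["secondary"]
            if a_needs_b and not b_needs_a:
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    for skill, count in skill_demand().most_common():
        print(f"{skill}: required by {count} role(s)")
    for a, b in sorted(asymmetric_pairs()):
        print(f"{a} draws on the skill of {b}, but not vice versa")
```

Under this encoding, domain expertise and statistics are each required by five of the six roles and data preparation by four, which is consistent with the observation above that these skills sit at the core of data science. The pair (data engineer, domain expert) appears among the asymmetric interactions, matching the example given earlier in this section.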
4.3 Culture
A scientific (as opposed to an engineering) approach to data science implies a certain culture.
For instance, as outcomes are by definition not known at the start (as opposed to
engineering where one starts with a predefined outcome), failure needs to be expected and
accepted. This requires supportive leadership and cultural environment, with sufficient time
and resources to test new approaches and ideas, as well as a mechanism for implementing
good ideas. The primary focus of data science teams should be on developing proof of
concept prototypes. Accordingly, such teams should not be expected to deliver mature,
production-level products. Instead, separate software engineering teams should be engaged
for that purpose. A scientific approach also implies that organizational data science efforts
represent an iterative journey rather than a destination.
5. Applying the framework
Next, we use the above framework to illustratively evaluate three data science Master-level
degrees offered by three Australian universities. Two of those (case A and case C)
have a duration of two years full-time, while the third one (case B) has a duration of
one year full-time. Table II details relevant academic and professional admission
requirements for each case. Table III details courses comprising each degree and skills
developed in each course. The mapping between courses and applicable skills was based on
course descriptions provided on the universities' websites.
Table I. Primary and secondary skills

Table II. Admission requirements
Case | Academic | Professional
A | A Bachelor's degree in mathematics, computer science, physics, engineering, accounting, finance, or economics | At least three years of professional experience
B | An Honors degree, a graduate certificate, or a graduate diploma in mathematics, computer science, statistics, physics, engineering, economics, or finance | None
C | A Bachelor's degree in mathematics or information technology, or a graduate certificate/diploma in data science | None

Table III. Courses
Case Course DE DP S CS C M
A Introduction to Data Science ||||||
Statistics for Data Science ||||
Data and Algorithms |||||
Project Management ||
Visualization and Communication |
Evidence-Based Decision Making ||||
Project 1 ||||
Project 2 ||||||
Specialized Elective ×4 ||||||
B Introduction to Data Science ||||
Data Mining |||
Elective ×2 ||||
Information Visualization ||
Computational Statistics ||
Capstone Project ||||||
C Big Data || |||
Programming for Data Science ||
Elective ×2 |||||
Predictive Analytics ||||
Machine Learning ||
Project 1 ||||||
Social Media Analytics ||
Customer Analytics |||
Project 2 ||||
Advanced Analytics 1 ||
Advanced Analytics 2 |||
Capstone Project ||||||
Notes: DE, domain expertise; DP, data preparation; S, statistics; CS, computer science; C, communication;
M, management

Cases A and C have a reasonable coverage across the skill areas identified in our framework.
Case B, on the other hand, addresses domain expertise and management skill requirements
only in the final capstone project. The lack of coverage in case B may partly be attributed to
its shorter duration of one year full-time, in contrast to two years full-time for cases A and C.
Nevertheless, most courses, with the exception of electives, are generally very broad, thus
offering limited opportunity for specialization. This is particularly acute in case B, where the
two elective courses do not even have to be related to data science. It may be argued that this
limitation is somewhat offset by the academic admission requirements, which serve to select
students with quantitative expertise. However, given the lack of any communication and
management prerequisites for admission, it may be argued that cases B and C do not
provide sufficient opportunities for deep specialization in these skill areas. In contrast,
case A offers two courses specifically focused on project management, and visualization and
communication. Furthermore, four specialized case A elective courses provide an
opportunity for deeper specialization in several relevant skill areas, including research
methods, advanced statistics, databases, software development, ethics, law, policy, and
so on. In addition, a variety of case studies are used to illustrate the relevance of domain
expertise. The admission requirement of at least three years of professional experience is also
useful for ensuring some familiarity with teamwork and management concepts. In case C, two
elective courses are used to provide an opportunity to students with an information
technology background to develop skills in statistics and probabilities, and to students with a
mathematics background to develop skills in relational databases and warehouses, as well as
business intelligence and analytics.
The above analysis indicates that these universities aim to produce quasi-unicorns. Given
the limited opportunity for specialization in the skill areas identified in our framework, prior
academic and professional experience becomes critical. Without deep expertise in any of the
roles identified in our framework, such graduates may not be able to effectively contribute to
multidisciplinary data science teams. Nevertheless, they may prove valuable to smaller
agencies and firms with limited resources who may have to rely on such quasi-unicorns.
6. Conclusion
While many universities are now offering degrees in data science, and many organizations are
seeking to hire individual data scientists, the findings presented in this paper suggest that it
may be more beneficial to view data science from a multidisciplinary team perspective.
The paper identified six key roles considered essential for an effective data science team, and
shared skills required for effective within-team interaction. The skills framework provides a
theoretical contribution that may be applied in practice to evaluate and improve the
composition of multidisciplinary data science teams and related training programs. However,
given that our findings have been based on a study with large government agencies with
relatively mature data science functions, they may not be directly transferable to less mature,
smaller, and less well-resourced agencies and firms, who may instead have to rely on individual
"unicorn" data scientists. Given that the illustrative case studies highlighted a potential gap in
opportunities for academic specialization in relation to the roles identified in our framework,
future studies may wish to explore how higher education institutions may effectively partner
with private and public organizations in order to address this potential problem.
Note
1. Those unfamiliar with the reference may wish to note that according to medieval lore unicorns are
only tamable by virgins who, as a result, may be used by hunters as unicorn bait.
References
Anderson, P., Bowring, J., McCauley, R., Pothering, G. and Starr, C. (2014), "An undergraduate degree in
data science: curriculum and a decade of implementation experience", Proceedings of the
45th ACM Technical Symposium on Computer Science Education, ACM, pp. 145-150.
Baškarada, S. (2014), "Qualitative case study guidelines", The Qualitative Report, Vol. 19 No. 40, pp. 1-25.
Baškarada, S. and Koronios, A. (2013), "Data, information, knowledge, wisdom (DIKW): a semiotic
theoretical and empirical exploration of the hierarchy and its quality dimension", Australasian
Journal of Information Systems, Vol. 18 No. 1, pp. 5-24.
Baškarada, S. and Koronios, A. (2014), "A critical success factor framework for information quality
management", Information Systems Management, Vol. 31 No. 4, pp. 276-295.
Baumer, B. (2015), "A data science course for undergraduates: thinking with data", The American
Statistician, Vol. 69 No. 4, pp. 334-342.
PROG
51,1
Downloaded by University of South Australia At 02:07 06 January 2018 (PT)
Bertolucci, J. (2013), "Are you recruiting a data scientist, or unicorn?", InformationWeek, available at:
www.informationweek.com/big-data/big-data-analytics/are-you-recruiting-a-data-scientist-or-unicorn/d/d-id/899843
(accessed November 12, 2015).
Cassella, M. and Morando, M. (2012), "Fostering new roles for librarians: skills set for repository
managers - results of a survey in Italy", Liber Quarterly, Vol. 21 Nos 3/4, pp. 407-428.
Chen, H., Chiang, R.H. and Storey, V.C. (2012), "Business intelligence and analytics: from Big Data to
big impact", MIS Quarterly, Vol. 36 No. 4, pp. 1165-1188.
Corrall, S. (2010), "Educating the academic librarian as a blended professional: a review and case
study", Library Management, Vol. 31 Nos 8/9, pp. 567-593.
Corrall, S. (2012), "Roles and responsibilities: libraries, librarians and data", in Pryor, G. (Ed.), Managing
Research Data, Facet, London, pp. 141-151.
Corrall, S., Kennan, M.A. and Afzal, W. (2013), "Bibliometrics and research data management services:
emerging trends in library support for research", Library Trends, Vol. 61 No. 3, pp. 636-674.
Cox, A.M. and Corrall, S. (2013), "Evolving academic library specialties", Journal of the American
Society for Information Science and Technology, Vol. 64 No. 8, pp. 1526-1542.
Cox, A.M. and Pinfield, S. (2014), "Research data management and libraries: current activities and
future priorities", Journal of Librarianship and Information Science, Vol. 46 No. 4, pp. 299-316.
Dhar, V. (2013), "Data science and prediction", Communications of the ACM, Vol. 56 No. 12, pp. 64-73.
DiCicco-Bloom, B. and Crabtree, B.F. (2006), "The qualitative research interview", Medical Education,
Vol. 40 No. 4, pp. 314-321.
Finzer, W. (2013), "The data science education dilemma", Technology Innovations in Statistics
Education, Vol. 7 No. 2, pp. 1-9.
Glaser, B.G. (1965), "The constant comparative method of qualitative analysis", Social Problems, Vol. 12
No. 4, pp. 436-445.
Hardin, J., Hoerl, R., Horton, N.J., Nolan, D., Baumer, B., Hall-Holt, O., Murrell, P., Peng, R., Roback, P.
and Temple Lang, D. (2015), "Data science in statistics curricula: preparing students to
'think with data'", The American Statistician, Vol. 69 No. 4, pp. 343-353.
Harris, H., Murphy, S. and Vaisman, M. (2013), Analyzing the Analyzers: An Introspective Survey of Data
Scientists and Their Work, O'Reilly Media.
Herschel, G., Linden, A. and Duncan, A.D. (2015), Seven Best Practices for Your Big Data Analytics
Projects, Gartner, Stamford, CT.
Heudecker, N. and White, A. (2014), The Data Lake Fallacy: All Water and Little Substance, Gartner,
Stamford, CT.
Laney, D., Kart, L., Jain, A. and Linden, A. (2015), How Data Scientist Skills and Qualifications Differ
from Those of BI Analysts and Statisticians, Gartner, Stamford, CT.
Lee, K. and Mirchandani, D. (2010), "Dynamics of the importance of IS/IT skills", Journal of Computer
Information Systems, Vol. 50 No. 4, pp. 67-78.
Linden, A., Kart, L., Randall, L., Beyer, M.A. and Duncan, A.D. (2015), Staffing Data Science Teams, Gartner,
Stamford, CT.
Loukides, M. (2011), What is Data Science?, O'Reilly Media, Inc.
Lyon, L. (2012), "The informatics transform: re-engineering libraries for the data decade", International
Journal of Digital Curation, Vol. 7 No. 1, pp. 126-138.
Madrid, M.M. (2013), "A study of digital curator competences: a survey of experts", The International
Information & Library Review, Vol. 45 Nos 3/4, pp. 149-156.
Patil, T. and Davenport, D. (2012), "Data scientist: the sexiest job of the 21st century", Harvard Business
Review, available at: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
(accessed December 11, 2015).
Press, G. (2015), "The hunt for unicorn data scientists lifts salaries for all data analytics professionals",
Forbes, available at: www.forbes.com/sites/gilpress/2015/10/09/the-hunt-for-unicorn-data-scientists-lifts-salaries-for-all-data-analytics-professionals/
(accessed November 12, 2015).
Provost, F. and Fawcett, T. (2013), "Data science and its relationship to Big Data and data-driven
decision making", Big Data, Vol. 1 No. 1, pp. 51-59.
Pryor, G. and Donnelly, M. (2009), "Skilling up to do data: whose role, whose responsibility, whose
career?", International Journal of Digital Curation, Vol. 4 No. 2, pp. 158-170.
Randall, L. and Beyer, M.A. (2014), Data Preparation is Not an Afterthought, Gartner, Stamford, CT.
Stevens, D., Totaro, M. and Zhu, Z. (2011), "Assessing IT critical skills and revising the MIS
curriculum", The Journal of Computer Information Systems, Vol. 51 No. 3, pp. 85-95.
Stodder, D. (2015), "Chasing the data science unicorn", TDWI, available at:
https://tdwi.org/articles/2015/01/06/chasing-the-data-science-unicorn.aspx (accessed November 12, 2015).
Swan, A. and Brown, S. (2008), The skills, role and career structure of data scientists and curators:
an assessment of current practice and future needs, Report to the JISC, Key Perspectives,
Playing Place.
Vassilakaki, E. and Moniarou-Papaconstantinou, V. (2015), "A systematic literature review informing library
and information professionals' emerging roles", New Library World, Vol. 116 Nos 1/2, pp. 37-66.
Waller, M.A. and Fawcett, S.E. (2013), "Data science, predictive analytics, and Big Data: a revolution
that will transform supply chain design and management", Journal of Business Logistics, Vol. 34
No. 2, pp. 77-84.
Wang, J. and Gu, L. (2016), "Challenges of teaching data science in a business school", Issues in
Information Systems, Vol. 17 No. 3, pp. 209-217.
Xia, J. and Wang, M. (2014), "Competencies and responsibilities of social science data librarians:
an analysis of job descriptions", College & Research Libraries, Vol. 75 No. 3, pp. 362-388.
Yang, L. and Liu, X. (2013), "Teaching business analytics", Frontiers in Education Conference IEEE,
IEEE, pp. 1516-1518.
Appendix. High-level interview questions
(1) Could you please tell us about your organization/agency?
(2) Could you please tell us about your group/team?
    How many members?
    What are their skills/roles?
    How do they work together?
(3) Could you please tell us about your role in your group/team?
(4) What are some of the key challenges facing your group/team?
(5) What do you see as potential future opportunities for your group/team?
(6) What do you look for when you hire data scientists?
(7) What are your thoughts on individual data scientists who excel in all the required skills?
    Have you come across any/many such individuals?
Corresponding author
Saša Baškarada can be contacted at: baskarada@gmail.com
This article has been cited by:
1. Saša Baškarada, Andy Koronios. Strategies for maximizing organizational absorptive capacity.
Industrial and Commercial Training, ahead of print.