Unicorn data scientist: the rarest of breeds
Saša Baškarada and Andy Koronios
University of South Australia, Mawson Lakes, Australia
Abstract
Purpose – Many organizations are seeking unicorn data scientists, that rarest of breeds that can do it all.
They are said to be experts in many traditionally distinct disciplines, including mathematics, statistics,
computer science, artificial intelligence, and more. The purpose of this paper is to describe the authors’ pursuit of
these elusive mythical creatures.
Design/methodology/approach – Qualitative data were collected through semi-structured interviews with
managers/directors from nine Australian state and federal government agencies with relatively mature data
science functions.
Findings – Although the authors failed to find evidence of unicorn data scientists, they are pleased to report
on six key roles that are considered to be required for an effective data science team. Primary and secondary
skills for each of the roles are identified and the resulting framework is then used to illustratively evaluate
three data science Master-level degrees offered by Australian universities.
Research limitations/implications – Given that the findings presented in this paper have been based on a
study with large government agencies with relatively mature data science functions, they may not be directly
transferable to less mature, smaller, and less well-resourced agencies and firms.
Originality/value – The skills framework provides a theoretical contribution that may be applied in practice
to evaluate and improve the composition of data science teams and related training programs.
Keywords Data analytics, Skills, Definition, Framework, Data science, Business analytics
Paper type Research paper
1. Introduction
Data science is an emerging applied discipline focused on facilitating organizational decision
making through the development of statistical models that extract knowledge from raw data
(Patil and Davenport, 2012). Extracted knowledge may describe what happened, explain why
something happened, and predict what may or is likely to happen. In spite of the current
popularity of data science, many organizations lack a clear understanding of the required roles
(e.g. data scientist, data analyst, data engineer, business expert, system expert, and software
engineer) and skills (e.g. domain, information technology, and quantitative) (Linden et al., 2015;
Harris et al., 2013). For instance, it is frequently stated that a data scientist is someone who is
better at programming than a statistician and better at statistics than a computer scientist.
Noting that data science requires domain knowledge and a broad set of quantitative skills,
Waller and Fawcett (2013) highlight that “there is a dearth of literature on the topic and many
questions” (p. 77). Accordingly, they call for more research on skills that are needed by data
scientists. Given the breadth of skills required, they conclude that it may not be realistic to
expect any one person to possess all the relevant expertise. Nevertheless, short of including
virgins[1] in their employee benefits packages, companies are making every effort to attract
data scientists who can do it all (Press, 2015). Given their almost mythical status, a growing
number of data professionals are starting to refer to such rare individuals, who are said to excel
in a wide range of traditionally distinct disciplines, as unicorns (Stodder, 2015; Bertolucci, 2013).
Yet, it is surprising to note that scholarly literature has so far failed to investigate the nature of
this potentially new species. That is the purpose of this paper.
Program
Vol. 51 No. 1, 2017
pp. 65-74
© Emerald Publishing Limited
0033-0337
DOI 10.1108/PROG-07-2016-0053
Received 11 July 2016
Revised 8 December 2016
Accepted 15 December 2016
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0033-0337.htm
2. Literature review
There is a substantial overlap between data science, data analytics, and Big Data
organizational capabilities (Laney et al., 2015). Nevertheless, data science is generally viewed
as “a set of fundamental principles that support and guide the principled extraction of
information and knowledge from data” (Provost and Fawcett, 2013). Proposed core skills
include mathematics and statistics (e.g. data mining, hypothesis testing, and predictive
analytics), computer science (e.g. data structures and algorithms), and domain expertise
(Dhar, 2013; Finzer, 2013). Other required skills include data integration, transformation,
and loading, as well as data visualization (Yang and Liu, 2013).
By tracing the evolution of business intelligence and analytics from structured
content in relational database management systems and data warehouses (currently widely
adopted in industry), through unstructured Web 2.0-based content (currently
widely researched in industry and academia), to mobile and sensor-based
content (an emerging area of research), Chen et al. (2012) identify a number of relevant
skills, including analytical and IT skills (drawing from statistics and computer science),
business and domain knowledge (e.g. accounting, finance, management, marketing,
logistics, and operation management), and communication skills required to interact with
relevant decision makers. They conclude that due to its emphasis on “key data management
and information technologies, business-oriented statistical analysis and management
science techniques, and broad business discipline exposure” (p. 1182), the discipline of
information systems is uniquely placed to provide a valuable contribution (Lee and
Mirchandani, 2010; Stevens et al., 2011).
A study commissioned by the Joint Information Systems Committee, a UK
non-departmental public body focused on championing the importance and potential of
digital technologies in UK education and research, found that data scientists need a
wide range of skills, including domain expertise, computing, and people skills (Swan and
Brown, 2008). The study noted that although there is some variation in the skills that are
possessed by data scientists, they are all expected to have substantial competency in the
domain in which they operate.
Based on a survey with 250 respondents, Harris et al. (2013) identified five skill groups
that are applicable to data scientists, including business, machine learning and Big Data,
math and operations research, programming, and statistics. Depending on their level of
competence in each of these skill sets, data scientists may then either tackle the entire
analytics process, or predominantly focus on technical problems of managing data, research
methods and statistics, or deriving business value through analytics.
Although there is a widespread belief that hard scientists (particularly physicists) tend
to produce the best data scientists (Loukides, 2011), other professionals that may assume data
science roles include data mining experts, operations researchers, statisticians, actuaries,
econometricians, equity analysts, process control engineers, and the like (Linden et al., 2015).
Even though library engagement in research data management is a relatively recent
phenomenon (Corrall et al., 2013), and the relevant roles and responsibilities are yet to be
settled (Corrall, 2012; Cox and Pinfield, 2014; Madrid, 2013; Xia and Wang, 2014;
Cox and Corrall, 2013; Cassella and Morando, 2012), librarians and information science
professionals may contribute vital data curation, preservation, and archiving skills to
ensure safe custody of research outputs (Swan and Brown, 2008; Pryor and Donnelly, 2009).
Based on a systematic review of 600 peer-reviewed library and information science papers
published between 2000 and 2014 in English, Vassilakaki and Moniarou-Papaconstantinou
(2015) identify six roles that information professionals have adopted, two of which are of
particular relevance to data science. Technology specialists may facilitate the development,
management, and promotion of institutional repositories for research output, while
knowledge managers may also contribute to the management of such repositories as well as
facilitate relevant communication and knowledge flows throughout the organization.
Corrall (2010) notes the emergence of composite, hybrid and blended library and information
science professionals as evidenced by overlapping roles and broad skillsets. She classifies
data science as a hybrid specialty comprising aspects of information technology and media
(conduit), library and information science (content), and academic and professional
discipline (context) expertise. Librarians and information science professionals may also
provide support for public engagement with science, as well as facilitate public access to
research data sets (Lyon, 2012).
Undergraduate data science degrees have focused on data visualization, data
manipulation/wrangling, computational statistics, machine learning, as well as related
topics like spatial analysis, text mining, network science, and Big Data (Baumer, 2015;
Hardin et al., 2015). Others have also emphasized oral and written communication as well as
social, ethical, and legal issues (Anderson et al., 2014). Given the multidisciplinary nature of the
topic, it has been observed that the teaching of individual data science courses as part of more
general (e.g. business) degrees presents a particular challenge as many students may lack
relevant background knowledge (Wang and Gu, 2016).
3. Method
Qualitative data were collected through semi-structured interviews (Dicicco‐Bloom and
Crabtree, 2006) with nine managers/directors from nine Australian state and federal
government agencies with relatively mature data science functions. Their official titles
included such descriptors as research, innovation, analytics, and policy (e.g. manager policy
and research, and director enterprise analytics). While some were responsible for relatively
small teams comprising approximately five people, others were responsible for several
dozen specialist staff. As this study adopts a qualitative rather than a quantitative
approach, there was no requirement to select a statistically representative sample.
Instead, the interviewees were identified through personal contacts and selected based on
their professional roles and willingness/availability to participate in the study. Being
semi-structured, the interviews were guided by a number of high-level questions
(see Appendix) pertaining to the nature of the relevant work being conducted in each
agency, key roles and skills, team composition, and broad challenges and opportunities.
Ad hoc probing and follow-up questions were employed to seek clarification, elicit
additional information, and explore emerging themes (Baškarada, 2014). Data analysis,
which occurred concurrently with data collection, employed the constant comparative
method to identify and categorize key constructs (Glaser, 1965).
The resulting framework was then used to illustratively evaluate three data science
Master-level degrees offered by three Australian universities. As the objective was not to
produce any universal generalizations, but instead to simply illustrate the applicability of
the framework developed, the universities were selected in a haphazard manner. Several
universities/degrees were excluded from the analysis because they did not provide
sufficiently detailed course descriptions online.
4. Results and discussion
The authors failed to find evidence of a unicorn data scientist. In other words,
all interviewees agreed that it is unrealistic to expect one person to have the same level of
expertise in a number of distinct disciplines as more specialized experts have. Instead they all
sought to build effective multidisciplinary teams. Six key roles that are considered to be
required for an effective data science team are outlined below.
4.1 Roles
These six roles are domain expert, data engineer, statistician, computer scientist, communicator,
and team leader.
67
Unicorn data
scientist
Downloaded by University of South Australia At 02:07 06 January 2018 (PT)
4.1.1 Domain expert. This study confirmed the absolute importance of domain expertise,
one of the most frequently quoted data science skills in the literature (Linden et al., 2015;
Waller and Fawcett, 2013; Dhar, 2013; Finzer, 2013; Chen et al., 2012; Swan and Brown, 2008;
Laney et al., 2015). One of the participants observed: “You have to have a very good
understanding of processes and government policy. It’s absolutely critical.” Without domain
expertise, data scientists (or data science teams) lack the context needed to turn raw data
into meaningful information (Baškarada and Koronios, 2013). Furthermore, the ability to ask
relevant questions, generate relevant hypotheses, and ultimately interpret results is
underpinned by deep domain expertise. Given the large variety and complexity of many
organizations, continuous access to other business experts outside of the data science team
is also important (Linden et al., 2015). For instance, a number of interviewees explained how
they “validate” analytical insights through workshops with subject matter experts.
Accordingly, domain experts work very closely with all other team members with the
possible exception of computer scientists.
4.1.2 Data engineer. Literature identifies both data preparation and data quality as
critical inputs to effective analytics (Herschel et al., 2015; Randall and Beyer, 2014;
Baškarada and Koronios, 2014). Most of the interviewees referred to the “garbage in,
garbage out” principle, emphasizing the importance of high-quality data. It has been
observed that depending on the volume, velocity, variety, and veracity of data, data
preparation, which includes extraction, cleaning, enrichment, and transformation,
can consume up to 80 percent of effort (Linden et al., 2015). This was confirmed by many
interviewees, with one of them noting: “Most of our effort goes on data wrangling and
cleaning. All these data in data lakes are of no use unless they are first properly prepared.”
In contrast to business intelligence systems, which operate on semantically consistent data
warehouses (which transform all data into a common format), data science teams may
operate on semantically inconsistent data lakes (which keep all data in their original format)
(Heudecker and White, 2014). Accordingly, in contrast to business intelligence systems,
which require relatively infrequent data preparation (only when the data warehouse is built
or modified), data science efforts require ongoing data preparation. As a result, having a
data engineer as a permanent member of a team is much more important in the context of
data science than in the context of business intelligence.
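The four data preparation steps named above (extraction, cleaning, enrichment, and transformation) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and intended for illustration only: the records, the field names, the two source formats, and the lookup used for enrichment are assumptions, not data or tooling from the study.

```python
from datetime import date

# Hypothetical raw records as they might sit in a data lake:
# two sources, each keeping its own original format.
RAW = [
    {"source": "agency_a", "id": "17", "lodged": "2016-07-11", "amount": "1,250.00"},
    {"source": "agency_b", "id": "17", "lodged": "11/07/2016", "amount": "300"},
    {"source": "agency_b", "id": "", "lodged": "08/12/2016", "amount": "n/a"},
]

# Hypothetical lookup used for enrichment.
LEVEL = {"agency_a": "state", "agency_b": "federal"}


def extract(records, source):
    """Extraction: pull the records that belong to one source."""
    return [r for r in records if r["source"] == source]


def clean(records):
    """Cleaning: drop records with a missing id or a non-numeric amount."""
    kept = []
    for r in records:
        try:
            amount = float(r["amount"].replace(",", ""))
        except ValueError:
            continue
        if r["id"]:
            kept.append({**r, "amount": amount})
    return kept


def enrich(records):
    """Enrichment: add a field derived from an external lookup."""
    return [{**r, "level": LEVEL.get(r["source"], "unknown")} for r in records]


def parse_date(text):
    """Accept either DD/MM/YYYY or ISO YYYY-MM-DD."""
    if "/" in text:
        day, month, year = text.split("/")
        return date(int(year), int(month), int(day))
    return date.fromisoformat(text)


def transform(records):
    """Transformation: convert every record to one common format."""
    return [{**r, "id": int(r["id"]), "lodged": parse_date(r["lodged"])} for r in records]


def prepare(records):
    """Run the four steps, source by source."""
    prepared = []
    for source in sorted({r["source"] for r in records}):
        prepared.extend(transform(enrich(clean(extract(records, source)))))
    return prepared
```

Because a data lake keeps each source in its original format, a pipeline of this kind has to be maintained whenever a source or its format changes, which is one way to read the interviewees' emphasis on ongoing data preparation.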
4.1.3 Statistician. Statisticians are at the core of data science teams. They form a bridge
between domain experts, data engineers, and computer scientists. For instance, they may
refine and formalize questions and ideas from domain experts, request relevant data from data
engineers, and guide computer scientists in relation to data analysis. One of the interviewees
observed: “Data science is statistics plus. Statistics is at the core of everything we do.” Given
that data science efforts are increasingly undertaken in the context of Big Data, statisticians
require special expertise for dealing with large data sets. For instance, they need to be able to
identify and maximize opportunities for automation. According to one participant: “It’s not
just your traditional stats. With Big Data the focus is shifting to data mining and machine
learning.” In addition to traditional skills like experimental design and hypothesis testing,
these statisticians also require a solid understanding of skills that are at the intersection of
statistics and computer science. As such, their approach needs to be much more applied than
the approach traditionally followed by academic statisticians. For instance, academic
statisticians are traditionally conservative in terms of being very cautious about making
inferences to unobserved events and entities. This conservatism may still have its place in
some applications of data science (e.g. health and public policy), but may need to be relaxed in
other problem domains with less inherent risk. As one participant observed: “We don’t need
academic rigor. It’s much more important to produce something quickly. We don’t need a
100% solution; 80% is usually good enough.”
4.1.4 Computer scientist. Exponential growth in the volume, velocity and variety of
data has led to the development of new Big Data software tools and technologies
(e.g. Apache Hadoop, MapReduce, and Spark). Computer scientists require proficiency in
such tools and technologies, relevant programming languages like R and Python, as well as
cluster and cloud computing in order to implement and optimize (e.g. in the context of
real-time analytics) processing (e.g. sorting, aggregating, searching, matching,
and concatenating) and analysis of large data sets. One of the interviewees noted:
“These are very complex tools, and they are rapidly evolving, too. We need someone to keep
on top of all the latest developments, like the Apache stack. That’s a full-time job.” Given
that much (perhaps most) data are unstructured, computer scientists also require skills in
text analytics and natural language processing. In general, study participants placed strong
emphasis on agile processes and open source software.
4.1.5 Communicator. From a practical perspective, data science is largely pointless
unless it can effect organizational change. Accordingly, the ability to effectively communicate
with relevant decision makers becomes critical. This includes exploration of relevant
problems and opportunities as well as communication of eventual results. As relevant decision
makers frequently do not have advanced statistical skills, any findings need to be presented
in a form that is visually appealing, easy to understand, and ultimately convincing. At the
same time, complexity, simplifying assumptions, and contextual dependencies also need to be
appreciated and effectively communicated. It was frequently observed that “decision-makers
do not want data, they want answers.” As such, storytelling becomes a critical skill.
An interviewee observed: “We need to challenge their (decision-makers’) assumptions, change
their mental models. These are time-poor people, so we need to be able to capture their
attention quickly.” This requires a different approach to the one followed by academic
statisticians who have traditionally communicated with other statisticians. Communicators
form a bridge between data science teams and relevant decision makers. Internally, within the
data science team, communicators form a bridge between the team leader, the statistician, and
the domain expert.
4.1.6 Team leader. This role is most like the mythical unicorn in the sense that the team
leader requires some understanding of all the other roles in order to bring everyone together,
manage resources, tasks, and deliverables. One interviewee observed: “I have been around
for a while. I have been in similar roles for more than 30 years.” In addition to requiring
extensive project management expertise, the team leader is responsible for ensuring that
any ethical, privacy, and security norms and expectations are adhered to. Working closely
with the communicator, the team leader is responsible for developing relevant business
cases and estimating expected return on investment.
4.2 Primary and secondary skills
Although each role is associated with, and based on, primary expertise, no role can operate
in isolation. In other words, in order to enable interaction within a data science team,
each role requires one or more secondary skills. Table I details primary and secondary skills
for each of the roles identified in this paper. As such, it identifies the degree of interaction
between the roles. It also highlights that interactions between the roles are asymmetric,
and identifies the degree of asymmetry.
For instance, although the data engineer requires some domain expertise in order to be able
to seek relevant information and guidance from the domain expert, domain experts may be
able to provide such information and guidance even if they have no data preparation skills.
As there is no need for any direct interaction between the domain expert and the
computer scientist, domain experts generally do not require any computer science skills and
vice versa. Domain experts do, however, require some statistical skills in order to facilitate
identification/generation of relevant questions and hypotheses, as well as interpretation of
results. They do not necessarily require any specialist communication expertise. Besides
requiring some domain expertise in order to be able to extract, clean, enrich, and transform
relevant data, data engineers do not necessarily require any other secondary skills. Statisticians
require some domain expertise in order to be able to refine and formalize questions and ideas
from domain experts, and some data preparation expertise in order to be able to provide
informed guidance to data engineers. They do not necessarily require specialized computer
science, communication, or management skills. Computer scientists do not necessarily require
any domain expertise, communication, or management skills. They do, however, require some
data preparation skills, and reasonably advanced statistical skills. Communicators require
significant domain expertise in order to effectively communicate with relevant decision makers.
They also require some statistical skills in order to be able to present analytical findings in a
form that is visually appealing, easy to understand, and ultimately convincing. As noted above,
team leaders are most like the mythical unicorns in the sense that they require some
understanding of all the other roles in order to bring everyone together, manage resources,
tasks, and deliverables.
Table I indicates that the team leader role requires the greatest breadth of skills, and that
domain expertise and statistics are at the core of data science. They are closely followed by
data preparation skills. In contrast to the core skills, computer science, communication,
and management skills are somewhat less central, although not necessarily less important.
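The relationships described in this section can be made concrete with a minimal Python sketch. The encoding below is transcribed from the prose of this section only, not from Table I itself, and it rests on several assumptions: the gradations used in the text ("some", "significant", "reasonably advanced") are flattened into a single "secondary" level, the team leader is assumed to hold every other skill area as a secondary skill, and all function names are hypothetical.

```python
# Skill areas, using the abbreviations from the notes to Table III.
SKILLS = ["DE", "DP", "S", "CS", "C", "M"]

# role -> (primary skill, secondary skills), as read from the prose of Section 4.2.
# Assumption: skill levels are flattened to primary versus secondary.
ROLES = {
    "domain expert": ("DE", {"S"}),
    "data engineer": ("DP", {"DE"}),
    "statistician": ("S", {"DE", "DP"}),
    "computer scientist": ("CS", {"DP", "S"}),
    "communicator": ("C", {"DE", "S"}),
    "team leader": ("M", {"DE", "DP", "S", "CS", "C"}),
}


def breadth(role):
    """Number of skill areas a role needs (primary plus secondary)."""
    _, secondary = ROLES[role]
    return 1 + len(secondary)


def demand(skill):
    """Number of roles that need a skill, as primary or secondary."""
    return sum(
        1
        for primary, secondary in ROLES.values()
        if skill == primary or skill in secondary
    )


def asymmetric_pairs():
    """Pairs (a, b) where role a needs b's primary skill but b does not need a's."""
    pairs = []
    for a, (primary_a, secondary_a) in ROLES.items():
        for b, (primary_b, secondary_b) in ROLES.items():
            if a != b and primary_b in secondary_a and primary_a not in secondary_b:
                pairs.append((a, b))
    return pairs
```

Under this encoding, demand("DE") and demand("S") both evaluate to 5 and demand("DP") to 4, while the team leader has the largest breadth (6). The pair ("data engineer", "domain expert") appears in asymmetric_pairs(), matching the example given above of a data engineer who needs some domain expertise while the domain expert needs no data preparation skills.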
4.3 Culture
A scientific (as opposed to engineering) approach to data science implies a certain culture.
For instance, as outcomes are by definition not known at the start (as opposed to
engineering where one starts with a predefined outcome), failure needs to be expected and
accepted. This requires a supportive leadership and cultural environment, with sufficient time
and resources to test new approaches and ideas, as well as a mechanism for implementing
good ideas. The primary focus of data science teams should be on developing proof of
concept prototypes. Accordingly, such teams should not be expected to deliver mature,
production-level products. Instead, separate software engineering teams should be engaged
for that purpose. A scientific approach also implies that organizational data science efforts
represent an iterative journey rather than a destination.
5. Applying the framework
Next, we use the above framework to illustratively evaluate three data science Master-level
degrees offered by three Australian universities. Two of those (case A and case C)
have a duration of two years full-time, while the third one (case B) has a duration of
one year full-time. Table II details relevant academic and professional admission
requirements for each case. Table III details courses comprising each degree and skills
developed in each course. The mapping between courses and applicable skills was based on
course descriptions provided on the universities’ websites.
Table I. Primary and secondary skills
Table II. Admission requirements
Case   Academic   Professional
A   A Bachelor’s degree in mathematics, computer science, physics, engineering, accounting, finance, or economics   At least three years of professional experience
B   An Honors degree, a graduate certificate, or a graduate diploma in mathematics, computer science, statistics, physics, engineering, economics, or finance   None
C   A Bachelor’s degree in mathematics or information technology, or a graduate certificate/diploma in data science   None

Table III. Courses
Case Course DE DP S CS C M
A Introduction to Data Science ||||||
Statistics for Data Science ||||
Data and Algorithms |||||
Project Management ||
Visualization and Communication |
Evidence-Based Decision Making ||||
Project 1 ||||
Project 2 ||||||
Specialized Elective ×4 ||||||
B Introduction to Data Science ||||
Data Mining |||
Elective ×2 ||||
Information Visualization ||
Computational Statistics ||
Capstone Project ||||||
C Big Data || |||
Programming for Data Science ||
Elective ×2 |||||
Predictive Analytics ||||
Machine Learning ||
Project 1 ||||||
Social Media Analytics ||
Customer Analytics |||
Project 2 ||||
Advanced Analytics 1 ||
Advanced Analytics 2 |||
Capstone Project ||||||
Notes: DE, domain expertise; DP, data preparation; S, statistics; CS, computer science; C, communication;
M, management

Cases A and C have a reasonable coverage across the skill areas identified in our framework.
Case B, on the other hand, addresses domain expertise and management skill requirements
only in the final capstone project. The lack of coverage in case B may partly be attributed to
its shorter duration of one year full-time, in contrast to two years full-time for cases A and C.
Nevertheless, most courses, with the exception of electives, are generally very broad, thus
offering limited opportunity for specialization. This is particularly acute in case B, where the
two elective courses do not even have to be related to data science. It may be argued that this
limitation is somewhat offset by the academic admission requirements, which serve to select
students with quantitative expertise. However, given the lack of any communication and
management prerequisites for admission, it may be argued that cases B and C do not
provide sufficient opportunities for deep specialization in these skill areas. In contrast,
case A offers two courses specifically focused on project management, and visualization and
communication. Furthermore, four specialized case A elective courses provide an
opportunity for deeper specialization in several relevant skill areas, including research
methods, advanced statistics, databases, software development, ethics, law, policy, and
so on. In addition, a variety of case studies are used to illustrate the relevance of domain
expertise. The admission requirement of at least three years of professional experience is also
useful for ensuring some familiarity with teamwork and management concepts. In case C, two
elective courses are used to provide an opportunity to students with an information
technology background to develop skills in statistics and probabilities, and to students with a
mathematics background to develop skills in relational databases and warehouses, as well as
business intelligence and analytics.
The above analysis indicates that these universities aim to produce quasi-unicorns. Given
the limited opportunity for specialization in the skill areas identified in our framework, prior
academic and professional experience becomes critical. Without deep expertise in any of the
roles identified in our framework, such graduates may not be able to effectively contribute to
multidisciplinary data science teams. Nevertheless, they may prove valuable to smaller
agencies and firms with limited resources who may have to rely on such quasi-unicorns.
6. Conclusion
While many universities are now offering degrees in data science, and many organizations are
seeking to hire individual data scientists, the findings presented in this paper suggest that it
may be more beneficial to view data science from a multidisciplinary team perspective.
The paper identified six key roles considered essential for an effective data science team, and
shared skills required for effective within-team interaction. The skills framework provides a
theoretical contribution that may be applied in practice to evaluate and improve the
composition of multidisciplinary data science teams and related training programs. However,
given that our findings have been based on a study with large government agencies with
relatively mature data science functions, they may not be directly transferable to less mature,
smaller, and less well-resourced agencies and firms, which may instead have to rely on individual
“unicorn” data scientists. Given that the illustrative case studies highlighted a potential gap in
opportunities for academic specialization in relation to the roles identified in our framework,
future studies may wish to explore how higher education institutions may effectively partner
with private and public organizations in order to address this potential problem.
Note
1. Those unfamiliar with the reference may wish to note that according to medieval lore unicorns are
only tamable by virgins who, as a result, may be used by hunters as unicorn bait.
References
Anderson, P., Bowring, J., Mccauley, R., Pothering, G. and Starr, C. (2014), “An undergraduate degree in
data science: curriculum and a decade of implementation experience”, Proceedings of the
45th ACM Technical Symposium on Computer Science Education, ACM, pp. 145-150.
Baškarada, S. (2014), “Qualitative case study guidelines”, The Qualitative Report, Vol. 19 No. 40, pp. 1-25.
Baškarada, S. and Koronios, A. (2013), “Data, information, knowledge, wisdom (DIKW): a semiotic
theoretical and empirical exploration of the hierarchy and its quality dimension”, Australasian
Journal of Information Systems, Vol. 18 No. 1, pp. 5-24.
Baškarada, S. and Koronios, A. (2014), “A critical success factor framework for information quality
management”, Information Systems Management, Vol. 31 No. 4, pp. 276-295.
Baumer, B. (2015), “A data science course for undergraduates: thinking with data”, The American
Statistician, Vol. 69 No. 4, pp. 334-342.
Bertolucci, J. (2013), “Are you recruiting a data scientist, or unicorn?”,InformationWeek, available at:
www.informationweek.com/big-data/big-data-analytics/are-you-recruiting-a-data-scientist-or-
unicorn/d/d-id/899843 (accessed November 12, 2015).
Cassella, M. and Morando, M. (2012), “Fostering new roles for librarians: skills set for repository
managers –results of a survey in Italy”,Liber Quarterly, Vol. 21 Nos 3/4, pp. 407-428.
Chen, H., Chiang, R.H. and Storey, V.C. (2012), “Business intelligence and analytics: from Big Data to
big impact”,MIS Quarterly, Vol. 36 No. 4, pp. 1165-1188.
Corrall, S. (2010), “Educating the academic librarian as a blended professional: a review and case
study”,Library Management, Vol. 31 Nos 8/9, pp. 567-593.
Corrall, S. (2012), “Roles and responsibilities: libraries, librarians and data”, in Pryor, G. (Ed.), Managing
Research Data, Facet, London, pp. 141-151.
Corrall, S., Kennan, M.A. and Afzal, W. (2013), “Bibliometrics and research data management services:
emerging trends in library support for research”, Library Trends, Vol. 61 No. 3, pp. 636-674.
Cox, A.M. and Corrall, S. (2013), “Evolving academic library specialties”, Journal of the American
Society for Information Science and Technology, Vol. 64 No. 8, pp. 1526-1542.
Cox, A.M. and Pinfield, S. (2014), “Research data management and libraries: current activities and
future priorities”, Journal of Librarianship and Information Science, Vol. 46 No. 4, pp. 299-316.
Dhar, V. (2013), “Data science and prediction”, Communications of the ACM, Vol. 56 No. 12, pp. 64-73.
DiCicco-Bloom, B. and Crabtree, B.F. (2006), “The qualitative research interview”, Medical Education,
Vol. 40 No. 4, pp. 314-321.
Finzer, W. (2013), “The data science education dilemma”, Technology Innovations in Statistics
Education, Vol. 7 No. 2, pp. 1-9.
Glaser, B.G. (1965), “The constant comparative method of qualitative analysis”, Social Problems, Vol. 12
No. 4, pp. 436-445.
Hardin, J., Hoerl, R., Horton, N.J., Nolan, D., Baumer, B., Hall-Holt, O., Murrell, P., Peng, R., Roback, P.
and Temple Lang, D. (2015), “Data science in statistics curricula: preparing students to
‘think with data’”, The American Statistician, Vol. 69 No. 4, pp. 343-353.
Harris, H., Murphy, S. and Vaisman, M. (2013), Analyzing the Analyzers: An Introspective Survey of Data
Scientists and Their Work, O’Reilly Media.
Herschel, G., Linden, A. and Duncan, A.D. (2015), Seven Best Practices for Your Big Data Analytics
Projects, Gartner, Stamford, CT.
Heudecker, N. and White, A. (2014), The Data Lake Fallacy: All Water and Little Substance, Gartner,
Stamford, CT.
Laney, D., Kart, L., Jain, A. and Linden, A. (2015), How Data Scientist Skills and Qualifications Differ
from Those of BI Analysts and Statisticians, Gartner, Stamford, CT.
Lee, K. and Mirchandani, D. (2010), “Dynamics of the importance of IS/IT skills”,Journal of Computer
Information Systems, Vol. 50 No. 4, pp. 67-78.
Linden, A., Kart, L., Randall, L., Beyer, M.A. and Duncan, A.D. (2015), Staffing Data Science Teams, Gartner,
Stamford, CT.
Loukides, M. (2011), What is Data Science? O’Reilly Media, Inc.
Lyon, L. (2012), “The informatics transform: re-engineering libraries for the data decade”, International
Journal of Digital Curation, Vol. 7 No. 1, pp. 126-138.
Madrid, M.M. (2013), “A study of digital curator competences: a survey of experts”, The International
Information & Library Review, Vol. 45 Nos 3/4, pp. 149-156.
Patil, D.J. and Davenport, T.H. (2012), “Data scientist: the sexiest job of the 21st century”, Harvard Business
Review, available at: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
(accessed December 11, 2015).
Press, G. (2015), “The hunt for unicorn data scientists lifts salaries for all data analytics professionals”,
Forbes, available at: www.forbes.com/sites/gilpress/2015/10/09/the-hunt-for-unicorn-data-scientists-lifts-salaries-for-all-data-analytics-professionals/ (accessed November 12, 2015).
Provost, F. and Fawcett, T. (2013), “Data science and its relationship to Big Data and data-driven
decision making”, Big Data, Vol. 1 No. 1, pp. 51-59.
Pryor, G. and Donnelly, M. (2009), “Skilling up to do data: whose role, whose responsibility, whose
career?”, International Journal of Digital Curation, Vol. 4 No. 2, pp. 158-170.
Randall, L. and Beyer, M.A. (2014), Data Preparation is Not an Afterthought, Gartner, Stamford, CT.
Stevens, D., Totaro, M. and Zhu, Z. (2011), “Assessing IT critical skills and revising the MIS
curriculum”, The Journal of Computer Information Systems, Vol. 51 No. 3, pp. 85-95.
Stodder, D. (2015), “Chasing the data science unicorn”, TDWI, available at:
https://tdwi.org/articles/2015/01/06/chasing-the-data-science-unicorn.aspx (accessed November 12, 2015).
Swan, A. and Brown, S. (2008), “The skills, role and career structure of data scientists and curators:
an assessment of current practice and future needs”, Report to the JISC, Key Perspectives,
Playing Place.
Vassilakaki, E. and Moniarou-Papaconstantinou, V. (2015), “A systematic literature review informing library
and information professionals’ emerging roles”, New Library World, Vol. 116 Nos 1/2, pp. 37-66.
Waller, M.A. and Fawcett, S.E. (2013), “Data science, predictive analytics, and Big Data: a revolution
that will transform supply chain design and management”, Journal of Business Logistics, Vol. 34
No. 2, pp. 77-84.
Wang, J. and Gu, L. (2016), “Challenges of teaching data science in a business school”, Issues in
Information Systems, Vol. 17 No. 3, pp. 209-217.
Xia, J. and Wang, M. (2014), “Competencies and responsibilities of social science data librarians:
an analysis of job descriptions”, College & Research Libraries, Vol. 75 No. 3, pp. 362-388.
Yang, L. and Liu, X. (2013), “Teaching business analytics”, Frontiers in Education Conference IEEE,
IEEE, pp. 1516-1518.
Appendix. High-level interview questions
(1) Could you please tell us about your organization/agency?
(2) Could you please tell us about your group/team?
• How many members?
• What are their skills/roles?
• How do they work together?
(3) Could you please tell us about your role in your group/team?
(4) What are some of the key challenges facing your group/team?
(5) What do you see as potential future opportunities for your group/team?
(6) What do you look for when you hire data scientists?
(7) What are your thoughts on individual data scientists who excel in all the required skills?
• Have you come across any/many such individuals?
Corresponding author
Saša Baškarada can be contacted at: baskarada@gmail.com