Kai R. Larsen, University of Colorado Boulder | CUB · Division of Management & Entrepreneurship
Ph.D. Information Science and Behavioral Analytics
About
117 Publications · 214,285 Reads
7,762 Citations
Introduction
Kai R. Larsen is a Professor of Information Systems at Leeds School of Business, University of Colorado Boulder. He is a courtesy faculty member in the Department of Information Science, a Fellow of the Institute of Behavioral Science, and a Research Advisor to the Gallup organization. Kai is best known for providing a practical solution to Edward Thorndike's (1904) Jingle Fallacy and for his contributions to the Semantic Theory of Survey Response (STSR).
Additional affiliations
October 2015 - present
June 2000 - present
Publications (117)
The scholarly information-seeking process for behavioral research consists of three phases: search, access, and processing of past research. Existing IT artifacts, such as Google Scholar, have in part addressed the search and access phases, but fall short of facilitating the processing phase, creating a knowledge inaccessibility problem. We propose...
The problem of detecting whether two behavioral constructs reference the same real-world phenomenon has existed for over 100 years; we term discordant naming of constructs the Construct Identity Fallacy (CIF). We designed and evaluated the Construct Identity Detector (CID), the first tool with large-scale construct identity detection properties and...
The goal of a review article is to present the current state of knowledge in a research area. Two important initial steps in writing a review article are boundary identification (identifying a body of potentially relevant past research) and corpus construction (selecting research manuscripts to include in the review). Using the Technology Acceptanc...
Provides an accessible introduction to machine learning for business. The examples are built around the DataRobot Automated Machine Learning platform, but the focus is on the principles of machine learning.
We present a framework for ontology-based knowledge synthesis from research articles to support researchers in conducting literature reviews and gaining comprehensive insights into the state of the Information Systems discipline. Building on calls for an academic knowledge infrastructure, we performed a design science research project to (1) develo...
The psychometric approach in IS offers a foundational framework for a broad spectrum of research endeavors, typically relying on construct validation to confirm that a series of indicators accurately measures the intended construct. However, a longstanding issue with construct validity, unaddressed since its introduction by Cronbach and Meehl in 19...
In the Information Systems (IS) discipline, central contributions of research projects are often represented in graphical research models, clearly illustrating constructs and their relationships. Although thousands of such representations exist, methods for extracting this source of knowledge are still in an early stage. We present a method for (1)...
With the proliferation of collaborative and mobile technologies, online citizen science has become a booming approach to research and public engagement. Citizen science refers to various forms of engaging non-credentialed volunteers (citizens) in different aspects of scientific research, such as data collection, analysis and, more rarely, developme...
The construct and instrument development process relies significantly on human judgment in the initial stages of the process, specifically in developing construct definition statements, and in developing measurement instruments with high content validity. Natural language processing (NLP) techniques can be used to support human judgment and improve...
Transformative artificially intelligent tools, such as ChatGPT, designed to generate sophisticated text indistinguishable from that produced by a human, are applicable across a wide range of contexts. The technology presents opportunities as well as challenges, often ethical and legal, and has the potential for both positive and negative impacts f...
We introduce the Information Systems Ontology (ISO), a new ontology for the Information Systems (IS) discipline designed to enable automated knowledge synthesis and meta-analysis of research findings in IS. We constructed ISO in a methodical manner, following known best practices for ontology construction. We also conducted a series of ontology ref...
The 21st century has introduced the 4th Industrial Revolution, which describes an industrial paradigm shift that alters social, economic, and political environments simultaneously. Innovative technologies such as blockchain, artificial intelligence, and advanced mobile networks power this digital revolution. These technologies provide a unique comp...
Academic disciplines are often organized according to the behaviors they examine. While most research on a behavior tends to exist within one discipline, some behaviors are examined by multiple disciplines. Better understanding of behaviors and their relationships should enable knowledge transfer across disciplines and theories, thereby dramaticall...
By the time your dataset is prepared, you should be quite familiar with the business problem, the subject matter, and the content of the dataset. This section is about modeling data: using data to train algorithms to create models that can be used to predict future events or understand past events. The section shows where data modeling fits in the overall...
We have evaluated all the measures, selected the best model for this case, and clarified much of the machine learning process, yet our understanding of the problem context is still relatively immature. That is, while we have carefully specified the problem, we still do not fully understand what drives that target. Convincing management to su...
The methodological tools available for psychological and organizational assessment are rapidly advancing through natural language processing (NLP). Computerized analyses of texts are increasingly available as extensions of traditional psychometric approaches. The present Research Topic recognizes the contributions, but also the challenges, in pub...
This section covers the first steps of the Machine Learning Life Cycle Model: how to specify a business problem, acquire subject matter expertise, define the prediction target, define the unit of analysis, identify success criteria, evaluate risks, and finally, decide whether to continue a project. The focus is on who will use the model, whether management i...
Machine learning is involved in search, translation, detecting depression, predicting the likelihood of college dropout, finding lost children, and selling all kinds of products. While barely beyond its inception, the current machine learning revolution will affect people and organizations no less than the Industrial Revolution’s effect on weavers and many other...
Access to additional and relevant data will lead to better predictions from algorithms until we reach the point where more observations (cases) are no longer helpful to detect the signal, the feature(s), or conditions that inform the target. In addition to obtaining more observations, we can also look for additional features of interest that we do...
This section covers the final stage of the machine learning life cycle. Consider these the most important steps of the entire process. This is the point at which we have the greatest potential to help our organization reap the benefits of machine learning. In traditional information systems development, 60–80% of the cost of a system comes during...
In Automated Machine Learning for Business, we teach the machine learning process using a new development in data science: automated machine learning. AutoML, when implemented properly, makes machine learning accessible to most people because it removes the need for years of experience in the most arcane aspects of data science, such as the math,...
This chapter reviews the person-situation dimension in behavior prediction through the semantic theory of survey responses (STSR). This theory proposes that the most likely source of variation in correlations between scores on Likert-scale items is overlap in meaning. We review and explain a growing number of empirical studies that support this: Up...
Research in design science has always acknowledged the need for evaluating its knowledge outcomes, with particular emphasis on assessing the efficacy and utility of the artifacts produced. However, the need to demonstrate the validity of the research process and outcomes has not received as much attention. This research examines scientific approach...
The rapid and wide dissemination of up-to-date, localized information is a central issue during disasters. Owing to its original 140-character limit, Twitter provides its users with quick-posting and easy-forwarding features that facilitate the timely dissemination of warnings and alerts. However, a concern arises with respect to the te...
This study uses latent semantic analysis (LSA) to explore how prevalent measures of motivation are interpreted across very diverse job types. Building on the Semantic Theory of Survey Response (STSR), we calculate “semantic compliance” as the degree to which an individual’s responses follow a semantically predictable pattern. This allows us to exam...
Trust and distrust are crucial aspects of human interaction that determine the nature of many organizational and business contexts. Because of socialization-borne familiarity that people feel about others, trust and distrust can influence people even when they do not know each other. Allowing that some aspects of the social knowledge that is acquir...
Likert scale surveys are frequently used in cross-cultural studies on leadership. Recent publications using digital text algorithms raise doubt about the source of variation in statistics from such studies to the extent that they are semantically driven. The Semantic Theory of Survey Response (STSR) predicts that in the case of semantically determi...
Research validities are an important element of rigor in the information systems (IS) discipline. Broadly, validity deals with the quality of scientific research and dependability of scientific findings. Research validities provide procedural templates to collect and analyze evidence and justify the arguments and conclusions of a research study. Th...
In this article, we provide a review of research-curation and knowledge-management efforts that may be leveraged to advance research and education in psychological science. After reviewing the approaches and content of other efforts, we focus on the metaBUS project’s platform, the most comprehensive effort to date. The metaBUS platform uses standar...
Validity is among the most foundational and widely used concepts in science. Much has been written on the subject, yet we continue to lack established definitions of research validities. This paper presents preliminary results for developing a general ontology of research validity. In this paper, we assembled the largest data set of validities and...
Over the last century, the social and behavioral sciences have accumulated a vast storehouse of knowledge with the potential to transform society and all its constituents. Unfortunately, this knowledge has accumulated in a form (e.g., journal articles) that makes it extremely difficult to search, categorize, analyze and integrate across studies due...
Validity and reliability are among the most widely used concepts in science. Broadly, both deal with the quality of scientific research and dependability of scientific findings. Many volumes have been written on the subject, with countless mentions and uses of the terms in scientific papers. Yet, confusion reigns supreme and we continue to lack est...
Research on sensemaking in organisations and on linguistic relativity suggests that speakers of the same language may use this language in different ways to construct social realities at work. We apply a semantic theory of survey response (STSR) to explore such differences in quantitative survey research. Using text analysis algorithms, we have stu...
Complete dataset as used in the article.
All variables necessary to replicate our results, with two exceptions: a) the MLQ items are copyright protected and are included only with item numbers and semantic values; b) the replications with 2–500 dimensions are too extensive to be included here.
(XLSX)
The goal of a review article is to present the current state of knowledge in a research area. Two important initial steps in writing a review article are boundary identification (identifying a body of potentially relevant past research) and corpus construction (selecting research manuscripts to include in the review). We present a theory-as-discour...
This chapter outlines the types of automated machine learning and the available tools and platforms, provides criteria for evaluating AutoML tools, and presents 30 machine learning principles and their conversion to AutoML. Chapter from: Kai R. Larsen and Daniel Becker, Automated Machine Learning for Business. Oxford University Press, 2019
There are contexts in which one or more features, while entirely legitimate for modeling, are illegitimate for model evaluation. More specifically, a functional model could be built and put into production using such features, but these features would have occurred at or even after the data in the validation set, introducing a target leakage proble...
Many experts consider target leakage one of the most insidious problems of automated machine learning. In this book, the term "target leakage" (a.k.a. data leakage) will be defined more broadly than usual, as this provides an opportunity to discuss related issues of importance. Our definition of target leakage considers a target leak to have oc...
This is a methodological presentation of the relationship between semantics and survey statistics in human resource development (HRD) research. This study starts with an introduction to the semantic theory of survey response (STSR) and proceeds by offering a guided approach to conducting such analyses. The reader is presented with two types of sema...
Background: Academic disciplines are often organized according to the behaviors they examine. For example, behavioral medicine addresses health-related behaviors, such as smoking, drug abuse, and exercise, whereas a more specialized discipline such as information systems focuses on technology use. While most research on a behavior tends to exist wi...
In this paper, we use Latent Semantic Analysis to explore the design battles in smartphones. Using newspaper coverage from 1992-2012, we build a semantic model of the media coverage to identify article clusters. Cluster membership gives us visibility into trends in coverage over the course of the study. We find that five distinct periods can be ide...
The semantic theory of survey responses (STSR) proposes that the prime source of statistical covariance in survey data is the degree of semantic similarity (overlap of meaning) among the items of the survey. Because semantic structures are possible to estimate using digital text algorithms, it is possible to predict the response structures of Liker...
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Applying a recently proposed semantic theory of survey response, we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to...
In this guide, we introduce researchers in the behavioral sciences in general and MIS in particular to text analysis as done with latent semantic analysis (LSA). The guide contains hands-on annotated code samples in R that walk the reader through a typical process of acquiring relevant texts, creating a semantic space out of them, and then projecti...
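The core of the LSA process described in the guide (build a term-document matrix, reduce it to a semantic space, compare texts in that space) can be sketched briefly. The guide itself uses R; the following is only a minimal Python/NumPy sketch under our own assumptions: whitespace tokenization, raw term counts instead of the weighting schemes usually applied in LSA, and a tiny made-up set of survey-style items. The function names are hypothetical and not taken from the guide. Cosine similarity between item vectors is the kind of semantic similarity that STSR relates to response covariance.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i, j] counts term i in document j.

    Assumption: naive lowercase whitespace tokenization and raw counts
    (no log-entropy or tf-idf weighting).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok], j] += 1
    return vocab, matrix


def lsa_document_vectors(matrix, k):
    """Represent each document in a k-dimensional latent space via truncated SVD.

    matrix = U S V^T; the document vectors are the columns of S_k V_k^T,
    returned here as rows (one row per document).
    """
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical items; in practice the semantic space is usually built
    # from a large background corpus and the items are projected into it.
    items = [
        "i trust my manager",
        "my manager is someone i trust",
        "i enjoy the tasks in my job",
    ]
    vocab, counts = build_term_doc_matrix(items)
    vectors = lsa_document_vectors(counts, k=2)
    print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Building the space from the items alone is done here only to keep the example self-contained; the choice of k (the number of retained dimensions) strongly affects the resulting similarities.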
Word co-occurrences in text carry lexical information that can be harvested by data-mining tools such as latent semantic analysis (LSA). This research perspectives article (RPA) demonstrates the potency of using such embedded information by demonstrating that the Technology Acceptance Model (TAM) can be reconstructed significantly by analyzing unre...
Literature reviews (LRs) are recognized for their increasing impact in the information systems literature. Methodologists have drawn attention to the question of how we can leverage the value of LRs to preserve and generate knowledge. The panelists who participated in the discussion of “Standalone Literature Reviews in IS Research: What Can Be Lear...
Assessing the similarity of proposed theoretical constructs to each other and to those previously known and studied is imperative in theoretical research. In this paper, we turn to theories of similarity judgement from cognitive psychology to understand the process of establishing similarity between one or more constructs. Then, guided by th...
A central goal of behavioral medicine is the creation of evidence-based interventions for promoting behavior change. Scientific knowledge about behavior change could be more effectively accumulated using “ontologies.” In information science, an ontology is a systematic method for articulating a “controlled vocabulary” of agreed-upon terms and their...
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Building on our recently proposed semantic theory of survey response (STSR), we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the s...
The semantic theory of survey response (STSR) proposes that the prime source of statistical covariance in survey data is the degree of semantic similarity (overlap of meaning) among the items of the survey. The present study applies STSR in an experimental way by mimicking real survey responses through the use of semantic information. A sam...
The accumulated literature base in the behavioral sciences represents a great source of knowledge on human behaviors, and yet the same literature has grown beyond human comprehension. We address this information overload problem by proposing a novel IT artifact, TheoryOn. Based on the design science paradigm, we identify five design requirements. We...
People may confuse leadership with heroism due to the semantic overlap between their descriptions. This may explain some facets of fascination with leadership and obstructions to differentiated viewpoints of leadership as a group phenomenon. Building on the semantic theory of survey response (STSR), we are able to show how prevalent measures of cha...
Is survey data a source of new information, or could surveys just be begging their questions? The authors of this opinion piece suspect that survey data in leadership research do not reflect attitudes to workplace phenomena. Instead, they may just be assessments of the similarity of the language in the applied items. In a recent article in the jour...
The accumulated literature base in the behavioral sciences represents the IS discipline’s greatest source of knowledge, and yet the same literature has grown beyond human comprehension. An experiment is conducted showing the inability of experts to retrieve relevant constructs using full-text search. To address this inability to access the body of...
Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language proc...
Through an evaluation of hospital websites, we show that 98% of the top 148 U.S. hospitals share visitor health-diagnostic meta-data with close to 200 private web tracking companies such as Google, Facebook, and ScorecardResearch, including data on current and future patients. The data sharing was pervasive, varied a great deal by hospital, and was...
Theory identity is a fundamental problem for researchers seeking to determine theory quality, create theory ontologies and taxonomies, or perform focused theory-specific reviews and meta-analyses. We demonstrate a novel machine-learning approach to theory identification based on citation data and article features. The multi-disciplinary ecosystem o...
Introduction to Theories in IS Research Minitrack.
We propose an automatic construct-level citation extraction system (ACCE) to refine citations from the paper level to the construct level. This paper follows the design science paradigm (Hevner et al. 2004; March and Smith 1995; Nunamaker et al. 1991). The remaining sections are organized as follows. We first analyze the tasks involved in extractin...
Advancement in science requires clarity of constructs. Like other fields in behavioral science, addiction research is being held back by researchers' use of different terms to mean similar things (synonymy) and the same term to mean different things (polysemy). Journals can help researchers to stay focused on novel and significant research questions...
Purpose: We applied internomological network (INN) analysis, a novel approach that classifies constructs based on their underlying meaning, to constructs from the National Cancer Institute (NCI)’s Grid-enabled measures (GEM) database. Seven expert raters sorted these constructs using Michie’s Theoretical Domains Framework (TDF). Our objectives were...
This research presents a meta-theoretic analysis of a nomological net for the purpose of identification of potential pathways for theory integration and multi-level theory development. Success in these two areas holds the potential to reduce theory clutter in IS and related social sciences. As a proof-of-concept, we identify theory domains that sha...