Galit Shmueli

National Tsing Hua University | NTHU · Institute of Service Science

PhD

About

221
Publications
121,527
Reads
14,560
Citations

Publications (221)
Preprint
Full-text available
Data-driven organizations around the world routinely use forecasting methods to improve their planning and decision-making capabilities. Although much research exists on the harms resulting from traditional machine learning applications, little has specifically focused on the ethical impact of time series forecasting. Yet forecasting raises unique...
Article
We join the important effort of embracing diverse views on causality in a prior Editor’s Comment (Mithas et al. 2022a). Specifically, we aim to expand the discussion around a major causal framework and toolkit that, we believe, is largely missing and needed in empirical studies in the field of information systems (IS): that of causal diagrams and...
Chapter
Count data is used for prediction in a variety of applications in transportation, marketing, retail, telecom, and other areas. Data in these fields display a wide range of dispersion: from overdispersion, where the variance is larger than the mean, through equidispersion, to underdispersion, where the variance is smaller than the mean. The Conway-...
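Several entries in this list build on the Conway-Maxwell-Poisson (CMP) distribution. For context, its standard two-parameter pmf (a reference form, not quoted from any of the listed works) is:

```latex
P(X = x) = \frac{\lambda^{x}}{(x!)^{\nu}\, Z(\lambda,\nu)}, \qquad
Z(\lambda,\nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}},
\qquad x = 0, 1, 2, \ldots
```

Here $\nu = 1$ recovers the Poisson distribution, $\nu > 1$ yields underdispersion, and $\nu < 1$ yields overdispersion.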
Article
The editorial process at our leading information systems journals has been pivotal in shaping and growing our field. But this process has grown long in the tooth and is increasingly frustrating and challenging its various stakeholders: editors, reviewers, and authors. The sudden and explosive spread of AI tools, including advances in language model...
Article
Transportability is a structural causal modeling approach aimed at “transporting” a causal effect from a randomized experimental study in one population to a different population where only observational data are available. It allows for extracting much more value from randomized control trials because under some conditions, it allows the estimatio...
Chapter
This chapter explores personalization and its connection to the philosophical concept of the person, arguing that a deeper understanding of the human person and a good society is essential for ethical personalization. Insights from artificial intelligence (AI), philosophy, law, and more are employed to examine personalization technology. The author...
Preprint
Full-text available
Algorithms used by organizations increasingly wield power in society as they decide the allocation of key resources and basic goods. In order to promote fairer, more just, and more transparent uses of such decision-making power, explainable artificial intelligence (XAI) aims to provide insights into the logic of algorithmic decision-making. Despite mu...
Book
Machine Learning for Business Analytics: Concepts, Techniques and Applications in RapidMiner provides a comprehensive introduction and an overview of this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, c...
Preprint
Full-text available
Construct-based models have become a mainstay of management and information systems research. However, these models are likely overfit to the data samples upon which they are estimated, making them risky to use in explanatory, prescriptive, or predictive ways outside a given sample. Empirical researchers currently lack tools to analyze why and how...
Article
Construct-based models have become a mainstay of management and information systems research. However, these models are likely overfit to the data samples upon which they are estimated, making them risky to use in explanatory, prescriptive, or predictive ways outside a given sample. Empirical researchers currently lack tools to analyze why and how...
Article
Advances in reinforcement learning and implicit data collection on large-scale commercial platforms mark the beginning of a new era of personalization aimed at the adaptive control of human user environments. We present five emergent features of this new paradigm of personalization that endanger persons and societies at scale and analyze their pote...
Article
Full-text available
In response to growing recognition of the social impacts of new artificial intelligence (AI)-based technologies, major AI and machine learning (ML) conferences and journals now encourage or require papers to include ethics impact statements and undergo ethics reviews. This move has sparked heated debate concerning the role of ethics in AI research,...
Article
Big data and algorithmic risk prediction tools promise to improve criminal justice systems by reducing human biases and inconsistencies in decision‐making. Yet different, equally justifiable choices when developing, testing and deploying these socio‐technical tools can lead to disparate predicted risk scores for the same individual. Synthesising di...
Article
Many internet platforms that collect behavioral big data use it to predict user behavior for internal purposes and for their business customers (e.g., advertisers, insurers, security forces, governments, political consulting firms) who utilize the predictions for personalization, targeting, and other decision-making. Improving predictive accuracy i...
Preprint
Full-text available
In response to the growing recognition of the social, legal, and ethical impacts of new AI-based technologies, major AI and ML conferences and journals now encourage or require submitted papers to include ethics impact statements and undergo ethics reviews. This move has sparked heated debate concerning the role of ethics in AI and data science res...
Article
The era of behavioural big data has created new avenues for data science research, with many new contributions stemming from academic researchers. Yet data controlled by platforms have become increasingly difficult for academics to access. Platforms now routinely use algorithmic behaviour modification techniques to manipulate users’ behaviour, leav...
Article
Forecasting hierarchical or grouped time series using a reconciliation approach involves two steps: computing base forecasts and reconciling the forecasts. Base forecasts can be computed by popular time series forecasting methods such as Exponential Smoothing (ETS) and Autoregressive Integrated Moving Average (ARIMA) models. The reconciliation step...
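The two-step approach described in this abstract can be sketched minimally as follows. This is an illustrative toy (bottom-up reconciliation, not necessarily the method the paper evaluates): base forecasts made independently for each series need not add up, and the reconciliation step restores coherence.

```python
# Toy hierarchy: total = A + B. Base forecasts are produced
# independently per series (e.g., by ETS or ARIMA), so they are
# typically incoherent: the children do not sum to the parent.
base = {"total": 105.0, "A": 40.0, "B": 70.0}  # 40 + 70 != 105

# Bottom-up reconciliation: keep the bottom-level base forecasts
# and re-derive every aggregate by summation.
reconciled = {"A": base["A"], "B": base["B"]}
reconciled["total"] = reconciled["A"] + reconciled["B"]

print(reconciled)  # coherent: total == A + B == 110.0
```

Other reconciliation schemes (top-down, trace minimization) differ in how they distribute the adjustment, but all end with forecasts that respect the aggregation structure.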
Preprint
Full-text available
Algorithms, from simple automation to machine learning, have been introduced into judicial contexts to ostensibly increase the consistency and efficiency of legal decision making. In this paper, we describe four types of inconsistencies introduced by risk prediction algorithms. These inconsistencies threaten to violate the principle of treating sim...
Preprint
Full-text available
Personalization should take the human person seriously. This requires a deeper understanding of how recommender systems can shape both our self-understanding and identity. We unpack key European humanistic and philosophical ideas underlying the General Data Protection Regulation (GDPR) and propose a new paradigm of humanistic personalization. Human...
Preprint
The fields of statistics and machine learning design algorithms, models, and approaches to improve prediction. Larger and richer behavioral data increase predictive power, as evident from recent advances in behavioral prediction technology. Large internet platforms that collect behavioral big data predict user behavior for internal purposes and for...
Preprint
Full-text available
The field of computational statistics refers to statistical methods or tools that are computationally intensive. Due to the recent advances in computing power some of these methods have become prominent and central to modern data analysis. In this article we focus on several of the main methods including density estimation, kernel smoothing, smooth...
Preprint
Full-text available
We propose a tree-based semi-varying coefficient model for the Conway-Maxwell-Poisson (CMP or COM-Poisson) distribution which is a two-parameter generalization of the Poisson distribution and is flexible enough to capture both under-dispersion and over-dispersion in count data. The advantage of tree-based methods is their scalability to high-dimen...
Article
We propose a tree-based semi-varying coefficient model for the Conway-Maxwell-Poisson (CMP or COM-Poisson) distribution which is a two-parameter generalization of the Poisson distribution and is flexible enough to capture both under-dispersion and over-dispersion in count data. The advantage of tree-based methods is their scalability to high-dimens...
Preprint
Though used extensively, the concept and process of machine learning (ML) personalization have generally received little attention from academics, practitioners, and the general public. We describe the ML approach as relying on the metaphor of the person as a feature vector and contrast this with humanistic views of the person. In light of the rece...
Conference Paper
Methodological research in Partial Least Squares Path Modeling (PLS-PM), a construct-based modeling technique, has seen a flurry of efforts to introduce predictive analytic methods. However, there is still confusion about how prediction can be applied to refine theory and integrate with this traditionally inferential technique. We feel that predict...
Article
We propose two methods for time-series clustering that capture temporal information (trend, seasonality, autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as automated yet transparent tools for clustering large collections of time series. We address the chal...
Article
Purpose Partial least squares (PLS) has been introduced as a “causal-predictive” approach to structural equation modeling (SEM), designed to overcome the apparent dichotomy between explanation and prediction. However, while researchers using PLS-SEM routinely stress the predictive nature of their analyses, model evaluation assessment relies exclusi...
Preprint
Classification tasks are common across many fields and applications where the decision maker's action is limited by resource constraints. In direct marketing only a subset of customers is contacted; scarce human resources limit the number of interviews to the most promising job candidates; limited donated organs are prioritized to those with best f...
Article
Rapid growth in the availability of behavioral big data (BBD) has outpaced the speed of updates to ethical research codes and regulation of data privacy and human subjects' data collection, storage, and use. The introduction of the European Union's (EU's) General Data Protection Regulation (GDPR) in May 2018 will have far-reaching effects on data s...
Article
Partial least squares path modeling (PLS-PM) has become popular in various disciplines to model structural relationships among latent variables measured by manifest variables. To fully benefit from the predictive capabilities of PLS-PM, researchers must understand the efficacy of predictive metrics used. In this research, we compare the performance...
Article
Full-text available
Exploring theoretically plausible alternative models for explaining the phenomenon under study is a crucial step in advancing scientific knowledge. This paper advocates model selection in Information Systems (IS) studies that use Partial Least Squares path modeling (PLS) and suggests the use of model selection criteria derived from Information Theo...
Article
Analytics is important for education planning. Deploying forecasting analytics requires management information systems (MISs) that collect the needed data and deliver the forecasts to stakeholders. A critical question is whether the data collected by a system is adequate for producing the analytics for decision making. We describe the case of a new...
Article
The Conway–Maxwell–Poisson (CMP) or COM–Poisson regression is a popular model for count data due to its ability to capture both underdispersion and overdispersion. However, CMP regression is limited when dealing with complex nonlinear relationships. With today's wide availability of count data, especially due to the growing collection of data on...
Article
Studying causal effects is central to research in operations management in manufacturing and services, from evaluating prevention procedures, to effects of policies and new operational technologies and practices. The growing availability of micro-level data creates challenges for researchers and decision makers in terms of choosing the right level...
Conference Paper
Generating predictions from PLS models is a recent and novel addition to the research and practice of structural equation modeling. Shmueli et al. (2016) gave us an explicit understanding of what prediction should entail in the context of PLS. That study also demonstrated how to generate predictions using the measurement items and structure of the...
Article
Behavioral big data (BBD) refers to very large and rich multidimensional data sets on human and social behaviors, actions, and interactions, which have become available to companies, governments, and researchers. A growing number of researchers in social science and management fields acquire and analyze BBD for the purpose of extracting knowledge a...
Article
Full-text available
Linear regression is among the most popular statistical models in social sciences research, and researchers in various disciplines use linear probability models (LPMs)—linear regression models applied to a binary outcome. Surprisingly, LPMs are rare in the IS literature, where researchers typically use logit and probit models for binary outcomes. R...
Article
The field of computational statistics refers to statistical methods or tools that are computationally intensive. Due to the recent advances in computing power, some of these methods have become prominent and central to modern data analysis. In this paper, we focus on several of the main methods including density estimation, kernel smoothing, smooth...
Article
Full-text available
The term quality of statistical data, developed and used in official statistics and international organizations such as the International Monetary Fund (IMF) and the Organisation for Economic Co-operation and Development (OECD), refers to the usefulness of summary statistics generated by producers of official statistics. Similarly, in the context o...
Article
Full-text available
Count data are a popular outcome in many empirical studies, especially as big data has become available on human and social behavior. The Conway-Maxwell Poisson (CMP) distribution is popularly used for modeling count data due to its ability to handle both overdispersed and underdispersed data. Yet, current methods for estimating CMP regression mode...
Book
Provides an important framework for data analysts in assessing the quality of data and its potential to provide meaningful insights through analysis. Analytics and statistical analysis have become pervasive topics, mainly due to the growing availability of data and analytic tools. Technology, however, fails to deliver insights with added value if t...
Chapter
Full-text available
Research is about the advancement of knowledge. A main tool of research and empirical studies is the publication of results. Reports on difficulties in deriving repeated results under different circumstances or even with the original data have been growing, posing fundamental questions such as how to ensure integrity of research work. In some cases...
Chapter
Full-text available
The term quality of statistical data, developed and used in official statistics and international organizations such as the IMF and the OECD, refers to the usefulness of summary statistics and indicators generated by producers of official statistics. Similarly, in the context of survey quality, official agencies such as Eurostat, NCSES, and Statist...
Chapter
This chapter introduces the information quality (InfoQ) framework taking a structural approach, heretofore missing in the literature and education curricula. We introduce the InfoQ concept, its components and dimensions, and a formal definition. We then illustrate the InfoQ components by comparing several types of studies in the field of online auc...
Chapter
In this chapter we present seven case studies of data-driven research in the context of healthcare. The first case study is related to two influential reports prepared by the Institute of Medicine (IOM) that have significantly increased the understanding of the impact of current US healthcare processes on patient safety. Through the InfoQ lens, we...
Chapter
Customer surveys are designed to collect data on customer experience and customer opinions. They are a prime example of a situation in which operationalization and communication of results are key elements of a successful analysis. If a customer satisfaction survey does not lead to specific actions or is not adequately communicated to organizationa...
Chapter
This chapter describes statistical approaches designed to increase InfoQ at the postdata collection stage. Data can be primary, secondary, or semisecondary. At this stage, the data is affected by both a priori (ex ante) causes and a posteriori (ex post) causes. This creates a difference between data planned to be collected and data actually collect...
Chapter
Reviewers play a critical role in the publication process, an important landmark in scientific research. Yet, in many journals, acceptance of scientific papers for publication relies on the reviewer's experience and good sense, with no clear guidelines. The lack of guidance increases uncertainty and variability in the usefulness of reviews. This ch...
Chapter
The InfoQ components and dimensions presented in the previous chapters were applied to a wide range of domains such as education, healthcare, surveys, and official statistics. In this chapter, the focus is on education programs in areas such as data science, business analytics, or statistical methods. In this context, the focus is on practice-orien...
Chapter
Risk management is a prime example of a situation in which all InfoQ dimensions play a critical role. Risk assessment requires data at the right resolution, with proper integration, temporal relevance, and an analysis ensuring chronology of data and goals. Risks need to be addressed by decisions and actions; therefore, proper operationalization is...
Chapter
Full-text available
This chapter presents a breakdown of the InfoQ concept into eight dimensions for assessing the information quality (InfoQ) in a study. We start by describing approaches for assessing the concept of data quality, popular in marketing and medical research and government organizations. We then use a similar framework to create the eight dimensions of...
Chapter
This chapter introduces the application of the InfoQ framework to education-related studies. It includes four case studies. The first is the Missouri Assessment Program report card. Two other case studies are related to the application of value-added models (VAMs) in education. One study looks at the impact of value-added teachers on students’ long...
Chapter
This chapter examines established data collection and study design strategies aimed at increasing InfoQ at the predata collection stage. We also examine constraints such as resource limitations, ethical considerations, and human conformance that lower InfoQ. The two most applicable domains are surveys and experimental design. In experimental design...
Chapter
This chapter examines quality in terms of these information quality (InfoQ) components: quality of the analysis goal, data quality, analysis quality and quality of utility. Although the quality of each of the individual components affects InfoQ, it is the combination of the four that determines the level of InfoQ. The chapter aims to help the reade...
Article
The Bernoulli and Poisson processes are two popular discrete count processes; however, both rely on strict assumptions. We instead propose a generalized homogeneous count process (which we name the Conway-Maxwell-Poisson or COM-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexibl...
Article
The term “Big Data” evokes emotions ranging from excitement to exasperation in the statistics community. Looking beyond these emotions reveals several important changes that affect us as statisticians and as humans. I focus on Behavioral Big Data (BBD), or very large and rich multidimensional datasets on human behaviors, actions and interactions, w...
Conference Paper
Despite the growing interest in predictive analytics using PLS models, there are no practical studies that demonstrate the application of predictive PLS modeling. This study reexamines an established empirical model and reanalyzes it through the lens of predictive analytics. In implementing predictive PLS procedures in recent literature, we uncover...
Article
Attempts to introduce predictive performance metrics into partial least squares (PLS) path modeling have been slow and fall short of demonstrating impact on either practice or scientific development in PLS. This study contributes to PLS development by offering a comprehensive framework that identifies different dimensions of prediction and their ef...
Article
Reviewers play a critical role in the publication process, the hallmark of scientific advancement. Yet, in many journals, determining the contribution of a paper is left to the reviewer's experience and good sense without providing structured guidelines. This lack of guidance to authors and reviewers increases uncertainty and variability in the use...
Article
Attempts to introduce predictive performance metrics into Partial Least Squares (PLS) path modeling have been slow and fall short of demonstrating impact on both practice and scientific development in PLS. This study contributes to PLS development by offering a comprehensive framework that identifies different dimensions of prediction and their eff...
Article
The growing popularity of online dating websites is altering one of the most fundamental human activities: finding a date or a marriage partner. Online dating platforms offer new capabilities, such as extensive search, big-data based mate recommendations and varying levels of anonymity, whose parallels do not exist in the physical world. Yet, littl...
Article
Multivariate control charts are used for monitoring multiple series simultaneously, for the purpose of detecting shifts in the mean vector in any direction. In the context of disease outbreak detection, interest is in detecting only an increase in the process means. Two practical approaches for deriving directional Hotelling charts are Follmann...
Article
The term quality of statistical data, developed and used in official statistics and international organizations such as the IMF and the OECD, refers to the usefulness of summary statistics generated by producers of official statistics. Similarly, in the context of survey quality, official agencies such as Eurostat, NCSES and Statistics Canada creat...
Article
Prediction and variable selection are major uses of data mining algorithms but they are rarely the focus in social science research, where the main objective is causal explanation. Ideal causal modeling is based on randomized experiments, but because experiments are often impossible, unethical or expensive to perform, social science research often...
Article
The Bernoulli and Poisson are two popular discrete count processes; however, both rely on strict assumptions that motivate their use. We instead propose a generalized count process (the Conway-Maxwell-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexible mechanism to describe cou...
Article
Many employers expect to face a significant shortfall of workers with data science skills in the coming decade. This panel focuses on the opportunities and challenges this poses for the Information Systems (IS) community. Specifically, the panel focuses on three key questions at the nexus of data science, skills, and IS: a) characterizing the chang...
Article
Sizes of datasets used in IS research are growing quickly due to data available from digital technologies such as mobile, RFID, sensors, online markets, and more. It is not uncommon to see studies using tens and hundreds of thousands or even millions of records. Linear regression is among the most popular statistical models in social sciences resear...
Article
This work is aimed at finding potential Simpson's paradoxes in Big Data. Simpson's paradox (SP) arises when choosing the level of data aggregation for causal inference. It describes the phenomenon where the direction of a cause on an effect is reversed when examining the aggregate vs. disaggregates of a sample or population. The practical decision...
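The aggregation reversal described in this abstract is easy to reproduce numerically. Below is a small sketch (illustrative only, not the paper's method or data) using the classic kidney-stone treatment figures, where treatment A has the higher success rate within every subgroup yet the lower rate in the aggregate.

```python
# Simpson's paradox: success counts (successes, total) per subgroup,
# classic kidney-stone example.
groups = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Within each subgroup, A beats B.
for g in groups.values():
    assert rate(*g["A"]) > rate(*g["B"])

# Aggregating over subgroups reverses the direction: B beats A.
agg = {t: tuple(map(sum, zip(*(g[t] for g in groups.values()))))
       for t in ("A", "B")}
print(rate(*agg["A"]), rate(*agg["B"]))  # ~0.78 vs ~0.83
assert rate(*agg["A"]) < rate(*agg["B"])
```

The reversal happens because the subgroup sizes differ systematically across treatments, which is exactly why the choice of aggregation level matters for causal conclusions.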
Article
Full-text available
The Internet has provided IS researchers with the opportunity to conduct studies with extremely large samples, frequently well over 10,000 observations. There are many advantages to large samples, but researchers using statistical inference must be aware of the p-value problem associated with them. In very large samples, p-values go quickly to zero...
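The large-sample p-value problem in this abstract can be demonstrated in a few lines. The sketch below (stdlib only; the one-sample z-test with known sigma and the specific numbers are illustrative choices, not taken from the paper) fixes a practically negligible effect and shows the p-value collapsing as n grows.

```python
import math

def z_test_p(effect, n, sigma=1.0):
    """Two-sided p-value for a one-sample z-test of H0: mean = 0."""
    z = effect * math.sqrt(n) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny effect (0.01 standard deviations) at growing sample sizes:
for n in (10_000, 1_000_000, 100_000_000):
    print(n, z_test_p(0.01, n))
# n = 10,000    -> p ~ 0.32   (not "significant")
# n = 1,000,000 -> p ~ 1.5e-23 (highly "significant", identical effect)
```

Statistical significance thus says nothing about practical importance once n is very large, which is the paper's point about interpreting inference on internet-scale samples.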
Article
Full-text available
Bimodal truncated count distributions are frequently observed in aggregate survey data and in user ratings when respondents are mixed in their opinion. They also arise in censored count data, where the highest category might create an additional mode. Modeling bimodal behavior in discrete data is useful for various purposes, from comparing shapes o...
Conference Paper
Full-text available
Numbers are not data and data analysis does not necessarily produce information and knowledge. Statistics, data mining, and artificial intelligence are disciplines focused on extracting knowledge from data. They provide tools for testing hypotheses, predicting new observations, quantifying population effects, and summarizing data efficiently. In th...
