# Gareth James's research while affiliated with University of Southern California and other places

**What is this page?**

This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (92)

Label noise in data has long been an important problem in supervised learning applications as it affects the effectiveness of many widely used classification methods. Recently, important real-world applications, such as medical diagnosis and cybersecurity, have generated renewed interest in the Neyman-Pearson (NP) classification paradigm, which con...

We study fairness in classification, where one wishes to make automated decisions for people from different protected groups. When individuals are classified, the decision errors can be unfairly concentrated in certain protected groups. We develop a fairness-adjusted selective inference (FASI) framework and data-driven algorithms that achieve stati...

So far in this book, we have mostly focused on linear models. Linear models are relatively simple to describe and implement, and have advantages over other approaches in terms of interpretation and inference.

In the regression setting, the standard linear model Y=β0+β1X1+⋯+βpXp+ϵ is commonly used to describe the relationship between a response Y and a set of variables X1, X2,…,Xp.
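
The standard linear model above can be fit by ordinary least squares. A minimal sketch on synthetic data (all numbers and names here are illustrative, not from the text):

```python
import numpy as np

# Fit Y = b0 + b1*X1 + ... + bp*Xp + eps by ordinary least squares.
# The data are synthetic and purely illustrative.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ beta_true + rng.normal(scale=0.1, size=n)

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)  # approximately [1.0, 2.0, -1.0, 0.5]
```

With a small noise scale and n much larger than p, the fitted coefficients land close to the generating values.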

In this chapter, we will consider the topics of survival analysis and censored data. These arise in the analysis of a unique kind of outcome variable: the time until an event occurs.

This chapter is about linear regression, a very simple approach for supervised learning. In particular, linear regression is a useful tool for predicting a quantitative response. It has been around for a long time and is the topic of innumerable textbooks. Though it may seem somewhat dull compared to some of the more modern statistical learning app...

In this chapter, we describe tree-based methods for regression and classification. These involve stratifying or segmenting the predictor space into a number of simple regions. In order to make a prediction for a given observation, we typically use the mean or the mode response value for the training observations in the region to which it belongs.
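
The prediction rule described above (mean response of the training observations in a region) can be sketched for a single split; the split point and data here are hypothetical, hand-chosen for illustration:

```python
import numpy as np

# Regression-tree prediction rule: the predictor space is split into
# regions, and a new point is predicted with the mean response of the
# training observations falling in its region.
x_train = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])
y_train = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 2.8])

def predict(x, split=0.5):
    # Pick the region containing x, then average its training responses.
    region = y_train[x_train <= split] if x <= split else y_train[x_train > split]
    return region.mean()

print(predict(0.3))  # mean of left-region responses (about 1.1)
print(predict(0.7))  # mean of right-region responses (about 3.0)
```

A fitted tree simply applies this rule recursively over the regions chosen by the splitting algorithm.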

Resampling methods are an indispensable tool in modern statistics. They involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model.

The linear regression model discussed in Chap. 3 assumes that the response variable Y is quantitative. But in many situations, the response variable is instead qualitative.

Thus far, this textbook has mostly focused on estimation and its close cousin, prediction. In this chapter, we instead focus on hypothesis testing, which is key to conducting inference. We remind the reader that inference was briefly discussed in Chapter 2.

In this chapter, we discuss the support vector machine (SVM), an approach for classification that was developed in the computer science community in the 1990s and that has grown in popularity since then.

In order to motivate our study of statistical learning, we begin with a simple example. Suppose that we are statistical consultants hired by a client to investigate the association between advertising and sales of a particular product.

Most of this book concerns supervised learning methods such as regression and classification. In the supervised learning setting, we typically have access to a set of p features X1,X2,…,Xp, measured on n observations, and a response Y also measured on those same n observations. The goal is then to predict Y using X1,X2,…,Xp.

This chapter covers the important topic of deep learning. At the time of writing (2020), deep learning is a very active area of research in the machine learning and artificial intelligence communities.

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modelin...

Standardization has been a widely adopted practice in multiple testing, for it takes into account the variability in sampling and makes the test statistics comparable across different study units. However, despite conventional wisdom to the contrary, we show that there can be a significant loss in information from basing hypothesis tests on standar...

When faced with new technologies, the incumbents’ dilemma is whether to embrace the new technology, stick with their old technology, or invest in both. The entrants’ dilemma is whether to target a niche and avoid incumbent reaction or target the mass market and incur the incumbent’s wrath. The solution is to know to what extent the new technology c...

We consider the common setting where one observes probability estimates for a large number of events, such as default risks for numerous bonds. Unfortunately, even with unbiased estimates, selecting events corresponding to the most extreme probabilities can result in systematically underestimating the true level of uncertainty. We develop an empiri...

We consider estimating a functional graphical model from multivariate functional observations. In functional data analysis, the classical assumption is that each function has been measured over a densely sampled grid. However, in practice the functions have often been observed, with measurement error, at a relatively small number of points. We prop...

The simultaneous estimation of many parameters $\eta_i$, based on a corresponding set of observations $x_i$, for $i=1,\ldots, n$, is a key research problem that has received renewed attention in the high-dimensional setting. The classic example involves estimating a vector of normal means $\mu_i$ subject to a fixed variance term $\sigma^2$. Howeve...
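
For context, the classic normal-means example alluded to here is commonly handled with the James-Stein estimator, which shrinks the raw observations toward zero by a data-driven factor. This is a textbook sketch, not the paper's proposed method:

```python
import numpy as np

# Positive-part James-Stein estimator for a vector of normal means:
# shrink x toward zero by the factor (1 - (n-2)*sigma^2 / ||x||^2),
# clipped at zero. A classical textbook formula, used here only to
# illustrate the normal-means setting the abstract mentions.
def james_stein(x, sigma2=1.0):
    n = len(x)
    shrink = max(0.0, 1.0 - (n - 2) * sigma2 / np.sum(x ** 2))
    return shrink * x

x = np.array([3.0, -2.0, 1.5, 0.5, -1.0])
print(james_stein(x))  # each coordinate of x scaled by the same shrinkage factor
```

The key property is that, for n >= 3, this estimator dominates the raw observations in total squared error.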

Standardization has been a widely adopted practice in multiple testing, for it takes into account the variability in sampling and makes the test statistics comparable across different study units. However, there can be a significant loss in information from basing hypothesis tests on standardized statistics rather than the full data. We develop a n...

Firms are increasingly transitioning advertising budgets to Internet display campaigns, but this transition poses new challenges. These campaigns use numerous potential metrics for success (e.g., reach or click rate), and because each website represents a separate advertising opportunity, this is also an inherently high-dimensional problem. Further...

In today's digital market, the number of websites available for advertising has ballooned into the millions. Consequently, firms often turn to ad agencies and demand-side platforms (DSPs) to decide how to allocate their Internet display advertising budgets. Nevertheless, most extant DSP algorithms are rule-based and strictly proprietary. This artic...

Graphical models have attracted increasing attention in recent years, especially in settings involving high dimensional data. In particular Gaussian graphical models are used to model the conditional dependence structure among multiple Gaussian random variables. As a result of its computational efficiency the graphical lasso (glasso) has become one...

We suggest a new method, called "Functional Additive Regression", or FAR, for efficiently performing high dimensional functional regression. FAR extends the usual linear regression model involving a functional predictor, X(t), and a scalar response, Y , in two key respects. First, FAR uses a penalized least squares optimization approach to efficien...

While functional regression models have received increasing attention recently, most existing approaches assume both a linear relationship and a scalar response variable. We suggest a new method, "Functional Response Additive Model Estimation" (FRAME), which extends the usual linear regression model to situations involving both functional predictor...

The regression problem involving functional predictors has many important applications and a number of functional regression methods have been developed. However, a common complication in functional data analysis is one of sparsely observed curves, that is, predictors that are observed, with error, on a small subset of the possible time points. Such...

Lactose intolerance (LI) is a common medical problem with limited treatment options. The primary symptoms are abdominal pain, diarrhea, bloating, flatulence, and cramping. Limiting dairy foods to reduce symptoms contributes to low calcium intake and the risk for chronic disease. Adaptation of the colon bacteria to effectively metabolize lactose is...

In this chapter, we describe tree-based methods for regression and classification. These involve stratifying or segmenting the predictor space into a number of simple regions. In order to make a prediction for a given observation, we typically use the mean or the mode of the training observations in the region to which it belongs. Since the set of...

In the regression setting, the standard linear model

So far in this book, we have mostly focused on linear models. Linear models are relatively simple to describe and implement, and have advantages over other approaches in terms of interpretation and inference. However, standard linear regression can have significant limitations in terms of predictive power. This is because the linearity assumption i...

Resampling methods are an indispensable tool in modern statistics. They involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. For example, in order to estimate the variability of a linear regression fit, we can repeatedly draw differe...
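
The bootstrap procedure described above can be sketched in a few lines; the data and sample sizes are illustrative:

```python
import numpy as np

# Bootstrap sketch: repeatedly resample (x, y) pairs with replacement
# and refit a simple linear regression, to estimate the variability
# of the fitted slope. Data are synthetic and illustrative.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(size=n)
y = 2.0 * x + rng.normal(scale=0.3, size=n)

def slope(x, y):
    # np.polyfit returns [slope, intercept] for degree 1.
    return np.polyfit(x, y, 1)[0]

boot_slopes = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)   # sample n indices with replacement
    boot_slopes.append(slope(x[idx], y[idx]))

print(np.std(boot_slopes))  # bootstrap estimate of the slope's standard error
```

The standard deviation across the refitted slopes is the bootstrap estimate of the standard error of the original fit.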

This chapter is about linear regression, a very simple approach for supervised learning. In particular, linear regression is a useful tool for predicting a quantitative response. Linear regression has been around for a long time and is the topic of innumerable textbooks. Though it may seem somewhat dull compared to some of the more modern statistic...

The linear regression model discussed in Chapter 3 assumes that the response variable Y is quantitative. But in many situations, the response variable is instead qualitative. For example, eye color is qualitative, taking on values blue, brown, or green. Often qualitative variables are referred to as categorical; we will use these terms interchangea...

In order to motivate our study of statistical learning, we begin with a simple example. Suppose that we are statistical consultants hired by a client to provide advice on how to improve sales of a particular product. The Advertising data set consists of the sales of that product in 200 different markets, along with advertising budgets for the produ...

Classification problems involving a categorical class label Y and a functional predictor X(t) are becoming increasingly common. Since X(t) is infinite dimensional, some form of dimension reduction is essential in these problems. Conventional dimension reduction techniques for functional data usually suffer from one or both of the following problem...

Competition is intense among rival technologies and success depends on predicting their future trajectory of performance. To resolve this challenge, managers often follow popular heuristics, generalizations, or “laws” like Moore’s Law. We propose a model, Step And Wait (SAW), for predicting the path of technological innovation and compare its p...

In this article, we propose a new method for principal component analysis (PCA), whose main objective is to capture natural “blocking” structures in the variables. Further, the method, beyond selecting different variables for different components, also encourages the loadings of highly correlated variables to have the same magnitude. These two feat...

Recently, considerable interest has focused on variable selection methods in regression situations where the number of predictors, $p$, is large relative to the number of observations, $n$. Two commonly applied variable selection approaches are the Lasso, which computes highly shrunk regression coefficients, and Forward Selection, which uses no shr...

Numerous penalization based methods have been proposed for fitting a traditional linear regression model in which the number of predictors, p, is large relative to the number of observations, n. Most of these approaches assume sparsity in the underlying coefficients and perform some form of variable selection. Recently, some of this work has been e...

High-dimensional classification has become an increasingly important problem. In this paper we propose a "Multivariate Adaptive Stochastic Search" (MASS) approach which first reduces the dimension of the data space and then applies a standard classification method to the reduced space. One key advantage of MASS is that it automatically adjusts to m...

This paper is concerned with classifying high dimensional data into one of two categories. In various settings, such as when dealing with fMRI and microarray data, the number of variables is very large, which makes well-known classification techniques impractical. The number of variables might be reduced via principal component analysis or some...

In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corr...

Classification problems involving a categorical class label Y and a functional predictor X(t) are becoming increasingly common. Since X(t) is essentially infinite dimensional, some form of dimensionality reduction is essential in these problems. Conventional data reduction techniques for functional data can be categorized into functional principal...

Both classical Forward Selection and the more modern Lasso provide computationally feasible methods for performing variable selection in high dimensional regression problems involving many predictors. We note that although the Lasso is the solution to an optimization problem while Forward Selection is purely algorithmic, the two methods turn ou...

In this article, we propose a new method for principal component analysis (PCA), whose main objective is to capture natural "blocking" structures in the variables. Further, the method, beyond selecting different variables for different components, also encourages the loadings of highly correlated variables to have the same magnitude. These two feat...

Regression models to relate a scalar $Y$ to a functional predictor $X(t)$ are becoming increasingly common. Work in this area has concentrated on estimating a coefficient function, $\beta(t)$, with $Y$ related to $X(t)$ through $\int\beta(t)X(t) dt$. Regions where $\beta(t)\ne0$ correspond to places where there is a relationship between $X(t)$ and...

Phenotypes are complex, and difficult to quantify in a high-throughput fashion. The lack of comprehensive phenotype data can prevent or distort genotype-phenotype mapping. Here, we describe "PhenoProfiler," a computational method that enables in silico phenotype profiling. Drawing on the principle that similar gene expression patterns are likely to...

The Dantzig selector performs variable selection and model fitting in linear regression. It uses an L<sub>1</sub> penalty to shrink the regression coefficients towards zero, in a similar fashion to the lasso. While both the lasso and Dantzig selector potentially do a good job of selecting the correct variables, they tend to overshrink the final coe...

We propose a new algorithm, DASSO, for fitting the entire coefficient path of the Dantzig selector with a similar computational cost to the least angle regression algorithm that is used to compute the lasso. DASSO efficiently constructs a piecewise linear path through a sequential simplex-like algorithm, which is remarkably similar to the least ang...

The Bass model has been a standard for analyzing and predicting the market penetration of new products. We demonstrate the insights to be gained and predictive performance of functional data analysis (FDA), a new class of nonparametric techniques that has shown impressive results within the statistics community, on the market penetration of 760 cate...

The Lasso is a popular and computationally efficient procedure for automatically performing both variable selection and coefficient shrinkage on linear regression models. One limitation of the Lasso is that the same tuning parameter is used for both variable selection and shrinkage. As a result, it typically ends up selecting a model with too m...

A significant problem with most functional data analyses is that of misaligned curves. Without adjustment, even an analysis as simple as estimation of the mean will fail. One common method to synchronize a set of curves involves equating "landmarks" such as peaks or troughs. The landmarks method can work well but will fail if marker events can no...

The Bass (1969) model has been a standard for analyzing and predicting the market penetration of new products. The authors demonstrate the insights to be gained and predictive performance of Functional Data Analysis (FDA) on the market penetration of 760 categories drawn from 21 products and 70 countries. The authors compare a Functional Regressio...

The Dantzig selector (Candes and Tao, 2007) is a new approach that has been proposed for performing variable selection and model fitting on linear regression models. It uses an L1 penalty to shrink the regression coefficients towards zero, in a similar fashion to the Lasso. While both the Lasso and Dantzig selector potentially do a good job of se...

This paper introduces a health state modeling approach using clustering and Markov analysis to compare short- and long-term outcomes among health care populations. We provide a comparison to more conventional mixed effects regression methods and show that discrete state modeling offers a richer portrait of patient outcomes than the standard univari...

In systems like Escherichia Coli, the abundance of sequence information, gene expression array studies and small scale experiments allows one to reconstruct the regulatory network and to quantify the effects of transcription factors on gene expression. However, this goal can only be achieved if all information sources are used in concert.
Our metho...

We explore different approaches for performing hypothesis tests on the shape of a mean function by developing general methodologies, both for the often-assumed i.i.d. error structure case and for the more general case where the error terms have an arbitrary covariance structure. The procedures work by testing for patterns in the residuals...

Medical researchers interested in temporal, multivariate measurements of complex diseases have recently begun developing health state models which divide the space of patient characteristics into medically distinct clusters. The current state of the art in health services research uses k-means clustering to form the health states and a first o...

In this article we are interested in modeling the relationship between a scalar, Y , and a functional predictor, X(t). We introduce a highly flexible approach called "Functional Adaptive Model Estimation" (FAME) which extends generalized linear models (GLM), generalized additive models (GAM) and projection pursuit regression (PPR) to handle functio...

The objective of this study was to demonstrate a multivariate health state approach to analyzing complex disease data that allows projection of long-term outcomes using clustering, Markov modeling, and preference weights.
We studied patients hospitalized 30 to 364 days with refractory schizophrenia at 15 Veterans Affairs medical centers.
We conduct...

This document is not intended to explain statistical concepts or give detailed descriptions of JMP output. Therefore, don't worry if you come across an unfamiliar term. If you don't recognize a term like "variance inflation factor," we probably just haven't gotten that far in class.

One of the most difficult problems in cluster analysis is identifying the number of groups in a data set. Most previously suggested approaches to this problem are either somewhat ad hoc or require parametric assumptions and complicated calculations. We develop a simple, yet powerful nonparametric method for choosing the number of clusters based on...

We develop a flexible model-based procedure for clustering functional data. The technique can be applied to all types of curve data but is particularly useful when individuals are observed at a sparse set of time points. In addition to producing final cluster assignments, the procedure generates predictions and confidence intervals for missing port...

When using squared error loss, bias and variance, and the decomposition of prediction error they provide, are well understood and widely used concepts. However, there is no universally accepted definition for other loss functions. Numerous attempts have been made to extend these concepts beyond squared error loss. Most approaches have focused solely on 0-1 loss...
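
Under squared error loss, the decomposition referred to above is the standard one, at a fixed point $x_0$ with $Y = f(x_0) + \epsilon$, $E[\epsilon] = 0$, $\mathrm{Var}(\epsilon) = \sigma^2$:

```latex
E\big[(Y - \hat f(x_0))^2\big]
  = \underbrace{\mathrm{Var}\big(\hat f(x_0)\big)}_{\text{variance}}
  + \underbrace{\big(E[\hat f(x_0)] - f(x_0)\big)^2}_{\text{squared bias}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The cited work concerns how to generalize the variance and bias terms when the loss is not squared error.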

One of the most difficult problems in cluster analysis is the identification of the number of groups in a data set. Most previously suggested approaches to this problem are either somewhat ad hoc or require parametric assumptions and complicated calculations. In this paper we develop a simple yet powerful non-parametric method for choosing the numb...

We present a technique for extending generalized linear models to the situation where some of the predictor variables are observations from a curve or function. The technique is particularly useful when only fragments of each curve have been observed. We demonstrate, on both simulated and real data sets, how this approach can be used to perform lin...

This method attempts to estimate the principal component curve directly rather than estimating an entire covariance matrix and computing the first eigenvector.

The elements of a multivariate dataset are often curves rather than single points. Functional principal components can be used to describe the modes of variation of such curves. If one has complete measurements for each individual curve or, as is more common, one has measurements on a fine grid taken at the same time points for all curves, then man...

We introduce a technique for extending the classical method of Linear Discriminant Analysis to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. FLDA possesses all of the usual LDA tools...

Data often arrives as curves --- functions sampled at regular times or frequencies. Functional principal components (Ramsay and Silverman, 1997) can be used to describe the modes of variation of these functions. In many situations we do not get complete measurements of the individual curves. For example, growth curves are sampled functions, consist...

A new family of plug-in classification techniques has recently been developed in the statistics and machine learning literature. A plug-in classification technique (PICT) is a method that takes a standard classifier (such as LDA or TREES) and plugs it into an algorithm to produce a new classifier. The standard classifier is known as the base classi...

The bias and variance of a real valued random variable, using squared error loss, are well understood. However because of recent developments in classification techniques it has become desirable to extend these concepts to general random variables and loss functions. The 0-1 (misclassification) loss function with categorical random variables has be...

A new class of plug-in classification techniques has recently been developed in the statistics literature. A plug-in classification technique (PaCT) is a method that takes a standard classifier (such as LDA or nearest neighbors) and plugs it into an algorithm to produce a new classifier. The standard classifier is known as the Plug in Classifier (...

The Bass (1969) model has been a standard for analyzing and predicting the market...

Forecasting product demand is an important yet challenging planning tool for many industries. It is particularly challenging in industries that feature highly innovative products such as songs or movies in the entertainment industry. The challenge arises from the fact that, since products are highly innovative and thus very distinct, not many com...