Adam Kapelner

City University of New York - Queens College | QC CUNY · Department of Mathematics

Ph.D. Statistics, A.M. Statistics

About

Publications: 56
Reads: 13,798
Citations: 1,965
Citations since 2016: 1,762 (30 research items)
Additional affiliations
August 2014 - present
City University of New York - Queens College
Position
  • Professor (Assistant)
September 2009 - May 2014
University of Pennsylvania
Position
  • PhD Student
June 2006 - February 2007
Stanford Medicine
Position
  • Researcher

Publications (56)
Preprint
We consider the problem of evaluating designs for a two-arm randomized experiment with an incidence (binary) outcome under a nonparametric general response model. Our two main results are that the a priori pair matching design of Greevy et al. (2004) is (1) the optimal design as measured by mean squared error among all block designs which includes co...
Article
Full-text available
The publishing industry shows marked evidence of both gender and racial discrimination. A rational explanation for this difference in treatment of both female and Black authors might relate to the taste-based preferences of book consumers, who might be less willing to pay for books by such authors. We ran a randomized experiment to test for the pre...
Article
We present a new experimental design procedure that divides a set of experimental units into two groups in order to minimize error in estimating a treatment effect. One concern is the elimination of large covariate imbalance between the two groups before the experiment begins. Another concern is robustness of the design to misspecification in respo...
Article
We present an optimized rerandomization design procedure for a non-sequential treatment-control experiment. Randomized experiments are the gold standard for finding causal effects in nature. But sometimes random assignments result in unequal partitions of the treatment and control group visibly seen as imbalance in observed covariates. There can ad...
Article
Full-text available
Machine-assisted treatment selection commonly follows one of two paradigms: a fully personalized paradigm which ignores any possible clustering of patients; or a sub-grouping paradigm which ignores personal differences within the identified groups. While both paradigms have shown promising results, each of them suffers from important limitations. I...
Article
We propose a dynamic allocation procedure that increases power and efficiency when measuring an average treatment effect in sequential randomized trials exploiting some subjects' previously assessed responses. Subjects arrive sequentially and are either randomized or paired to a previously randomized subject and administered the alternate treatment....
Article
Full-text available
We present methodological advances in understanding the effectiveness of personalized medicine models and supply easy-to-use open-source software. Personalized medicine involves the systematic use of individual patient characteristics to determine which treatment option is most likely to result in a better average outcome for the patient. Why is pe...
Preprint
Background: Sufficiently accurate predictions of hospital readmissions are necessary for the allocation of scarce clinical resources to reduce preventable readmissions. We describe the use of a data-driven approach that relies on machine learning algorithms to predict readmission at the time of discharge. Methods: We employ random forests to clinical...
Article
Full-text available
Purpose: Our work introduces a highly accurate, safe, and sufficiently explicable machine-learning (artificial intelligence) model of intraocular lens power (IOL) translating into better post-surgical outcomes for patients with cataracts. We also demonstrate its improved predictive accuracy over previous formulas. Methods: We collected retrospectiv...
Preprint
Full-text available
We present a new experimental design procedure that divides a set of experimental units into two groups in order to minimize error in estimating an additive treatment effect. One concern in minimizing error at the experimental design stage is large covariate imbalance between the two groups. Another concern is robustness of design to misspecificati...
Article
Depression affects one in nine people, but treatment response rates remain low. There is significant potential in the use of computational modeling techniques to predict individual patient responses and thus provide more personalized treatment. Deep learning is a promising computational technique that can be used for differential treatment selectio...
Preprint
We propose a dynamic allocation procedure that increases power and efficiency when measuring an average treatment effect in sequential randomized trials exploiting some subjects' previously assessed responses. Subjects arrive iteratively and are either randomized or paired to a previously randomized subject and administered the alternate treatment. T...
Preprint
We consider the problem of evaluating designs for a two-arm randomized experiment with the criterion being the power of the randomization test for the one-sided null hypothesis. Our evaluation assumes a response that is linear in one observed covariate, an unobserved component and an additive treatment effect where the only randomness comes from th...
Article
There is a long debate in experimental design between the classic randomization design of Fisher, Yates, Kempthorne, Cochran and those who advocate deterministic assignments based on notions of optimality. In non-sequential trials comparing treatment and control, covariate measurements for each subject are known in advance, and subjects can be divi...
Article
Purpose: Improving vocabulary knowledge is important for many adolescents, but there are few evidence-based vocabulary instruction programs available for high school students. The purpose of this article is to describe the iterative development of the DictionarySquared research platform, a web-based vocabulary program that provides individualized vo...
Preprint
Full-text available
Background: Depression affects one in nine people, but treatment response rates remain low. There is significant potential in the use of computational modelling techniques to predict individual patient responses and thus provide more personalized treatment. Deep learning is a promising computational technique that can be used for differential treatm...
Preprint
We present an optimized rerandomization design procedure for a non-sequential treatment-control experiment. Randomized experiments are the gold standard for finding causal effects in nature. But sometimes random assignments result in unequal partitions of the treatment and control group, visibly seen as imbalanced observed covariates, increasing es...
Article
Full-text available
We run a randomized experiment to examine gender discrimination in book purchasing with 2,544 subjects on Amazon’s Mechanical Turk. We manipulate author gender and book genre in a factorial design to study consumer preferences for male versus female versus androgynous authorship. Despite previous findings in the literature showing gender discrimina...
Preprint
There is a movement in design of experiments away from the classic randomization put forward by Fisher, Cochran and others to one based on optimization. In fixed-sample trials comparing two groups, measurements of subjects are known in advance and subjects can be divided optimally into two groups based on a criterion of homogeneity or "imbalance" b...
Article
Full-text available
In traditional publishing, female authors’ titles command nearly half (45%) the price of male authors’ and are underrepresented in more prestigious genres. Books are published by publishing houses, which determine whose books get published, subject classification, and retail price. In the last decade, the growth of digital technologies and sal...
Article
Vocabulary knowledge is essential to educational progress. High quality vocabulary instruction requires supportive contextual examples to teach word meaning and proper usage. Identifying such contexts by hand for a large number of words can be difficult. In this work, we take a statistical learning approach to engineer a system that predicts inform...
Article
Objective: In the absence of specific metabolic disorders, accurate predictors of response to ketogenic dietary therapies (KDTs) for treating epilepsy are largely unknown. We hypothesized that specific biochemical parameters would be associated with the effectiveness of KDT in humans with epilepsy. The parameters tested were β-hydroxybutyrate, ace...
Article
Full-text available
We present a new experimental design procedure that divides a set of experimental units into two groups so that the two groups are balanced on a prespecified set of covariates and being almost as random as complete randomization. Under complete randomization, the difference in covariate balance as measured by the standardized average between treatm...
Article
When measuring Henry's Law constants ($k_H$) using the phase ratio method via headspace gas chromatography (GC), the value of $k_H$ of the compound under investigation is calculated from the ratio of the slope to the intercept of a linear regression of the inverse GC response versus the ratio of gas to liquid volumes of a series of vials drawn...
Article
Full-text available
We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the...
Article
Full-text available
We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar nationa...
Article
Full-text available
Forecasts of prospective criminal behavior have long been an important feature of many criminal justice decisions. There is now substantial evidence that machine learning procedures will classify and forecast at least as well, and typically better, than logistic regression, which has to date dominated conventional practice. However, machine learnin...
Article
We consider the task of discovering gene regulatory networks, which are defined as sets of genes and the corresponding transcription factors which regulate their expression levels. This can be viewed as a variable selection problem, potentially with high dimensionality. Variable selection is especially challenging in high-dimensional settings, wher...
Article
Neoplasms are highly dependent on glucose as their substrate for energy production and are generally not able to catabolize other fuel sources such as ketones and fatty acids. Thus, removing access to glucose has the potential to starve cancer cells and induce apoptosis. Unfortunately, other body tissues are also dependent on glucose for energy und...
Article
Full-text available
In medical practice, when more than one treatment option is viable, there is little systematic use of individual patient characteristics to estimate which treatment option is most likely to result in a better outcome for the patient. We introduce a new framework for using statistical models for personalized medicine. Our framework exploits (1) data...
Article
Full-text available
We incorporate heteroskedasticity into Bayesian Additive Regression Trees (BART) by modeling the log of the error variance parameter as a linear function of prespecified covariates. Under this scheme, the Gibbs sampling procedure for the original sum-of-trees model is easily modified, and the parameters for the variance model are updated via a Met...
Article
We propose a dynamic allocation procedure that increases power and efficiency when measuring an average treatment effect in fixed sample randomized trials with sequential allocation. Subjects arrive iteratively and are either randomized or paired via a matching criterion to a previously randomized subject and administered the alternate treatment. W...
Article
This thesis develops methods for the analysis and design of crowdsourced experiments and crowdsourced labeling tasks. Much of this document focuses on applications including running natural field experiments, estimating the number of objects in images and collecting labels for word sense disambiguation. Observed shortcomings of the crowdsourced exp...
Article
Full-text available
We present a new package in R implementing Bayesian Additive Regression Trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the...
Article
Full-text available
The variable selection problem is especially challenging in high dimensional data, where it is difficult to detect subtle individual effects and interactions between factors. Bayesian additive regression trees (BART, Chipman et al., 2010) provides a novel nonparametric exploratory alternative to parametric regression approaches, such as the lasso o...
Article
Full-text available
Dendritic cells (DCs) are important mediators of anti-tumor immune responses. We hypothesized that an in-depth analysis of dendritic cells and their spatial relationships to each other as well as to other immune cells within tumor draining lymph nodes (TDLNs) could provide a better understanding of immune function and dysregulation in cancer. We an...
Article
Full-text available
This article presents Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. Classical partial dependence plots (PDPs) help visualize the average partial relationship between the predicted response and one or more features. In the presence of substantial interaction effects,...
Article
Full-text available
We present a method for incorporating missing data in non-parametric statistical learning without the need for imputation. We focus on a tree-based method, Bayesian Additive Regression Trees (BART), enhanced with "Missingness Incorporated in Attributes," an approach recently proposed incorporating missingness into decision trees (Twala, 2008). This...
Article
Full-text available
We propose a dynamic allocation procedure that increases power and efficiency when measuring an average treatment effect in sequential randomized trials. Subjects arrive iteratively and are either randomized or paired via a matching criterion to a previously randomized subject and administered the alternate treatment. We develop estimators for the...
Article
We conduct the first natural field experiment to explore the relationship between the "meaningfulness" of a task and worker effort. We employed about 2,500 workers from Amazon's Mechanical Turk (MTurk), an online labor market, to label medical images. Although given an identical task, we experimentally manipulated how the task was framed. Subjects...
Data
Spectral unmixing of a triple-stained lymph node section by Vectra™. (A) An original RGB image of a part of a tissue section, taken at 200× magnification. (B) Images resulting from unmixing of the spectral signals of each chromogen and counterstain. (C) A reconstructed image with pseudo-colors that allowed a greater distinction of the cell populat...
Data
Proportions of both T cells and B cells were similar in TDLNs and HLNs used. (A) Proportions of T and B cells in HLNs and tumor-invaded TDLNs. (B) Proportions of T and B cells in ALN+ and ALN− pairs (p = 0.7 and 0.1, respectively; paired t test). (0.10 MB TIF)
Data
An RGB image of an entire TDLN cross section taken by Vectra™. In this particular example, the whole-section image consists of 125, 200× sub-images. Chromogens used were Vulcan Fast Red (cytokeratin (tumor), red), DAB (CD20(+)- B cells, brown) and Ferangi Blue (CD3(+)-T cells, dark blue). Cellular nuclei were counterstained with hematoxylin (light...
Data
An illustration of L function plots that can be generated from T and B cell data within a tissue section. (A) Interpretations of each plot. (B) An extrapolation of an L function of B cells to another plot that illustrates how much more clustered B cells are compared to the T cells. (0.36 MB TIF)
Article
Full-text available
To date, pathological examination of specimens remains largely qualitative. Quantitative measures of tissue spatial features are generally not captured. To gain additional mechanistic and prognostic insights, a need for quantitative architectural analysis arises in studying immune cell-cancer interactions within the tumor microenvironment and tumor...
Article
Full-text available
Supervised learning can be used to segment/identify regions of interest in images using both color and morphological information. A novel object identification algorithm was developed in Java to locate immune and cancer cells in images of immunohistochemically-stained lymph node tissue from a recent study published by Kohrt et al. (2005). The algorithms...
Article
11059 Background: Clinical decisions in oncology are increasingly individualized and dependent upon accurate assessment of tumor biology, such as hormone receptor status. Imaging techniques are limited by subjective interpretation of staining patterns and limited tissue sampling leading to variability in patient care. We developed a novel imaging a...
Conference Paper
Full-text available
Supervised learning can be used to segment/identify regions of interest in images making use of color and morphological information. A novel object identification algorithm was developed in Java to locate immune and cancer cells in images of immunohistochemically-stained lymph node tissue from the recent Kohrt study[1] and also shows promise in ot...
Article
Full-text available
Researchers are increasingly using online labor markets such as Amazon's Mechanical Turk (MTurk) as a source of inexpensive data. One of the most popular tasks is answering surveys. However, without adequate controls, researchers should be concerned that respondents may fill out surveys haphazardly in the unsupervised environment of the Internet....
Article
Full-text available
