Houtao Deng

Instacart · Machine learning

PhD

About

Publications: 20
Reads: 17,866
Citations: 2,309
Since 2017: 8 research items, 1,985 citations
(Chart: citations per year, 2017 to 2023)

Publications (20)
Preprint
Full-text available
We consider a Dynamic Workforce Acquisition (DWA) problem for crowdsourced last-mile delivery platforms that need to match supply (the number of available workers) with demand (the number of requested deliveries). This need arises because the initial number of scheduled workers does not always match the number of requested deliveries....
Preprint
Full-text available
Demand variance can result in a mismatch between planned supply and actual demand. Demand shaping strategies such as pricing can be used to reduce the imbalance between supply and demand. In this work, we propose accounting for demand shaping in forecasting. We present a method to reallocate the historical elastic demand to reduce variance,...
Article
Full-text available
Traditional control charts assume a baseline parametric model, against which new observations are compared in order to identify significant departures from the baseline model. To monitor a process without a baseline model, real-time contrasts (RTC) control charts were recently proposed to monitor classification errors when separating new observati...
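The contrast idea can be illustrated with a minimal sketch: label a reference window 0 and the most recent window 1, train a classifier to separate them, and treat a low classification error as evidence that the process has shifted. This sketch assumes a univariate stream and uses a leave-one-out nearest-centroid classifier as a simple stand-in; the helper names, the windows, and the 0.1 signal limit are hypothetical, not the published RTC chart.

```python
from statistics import mean
from typing import Sequence


def loo_centroid_error(reference: Sequence[float], window: Sequence[float]) -> float:
    """Leave-one-out error of a nearest-centroid classifier that tries to
    separate reference points (label 0) from recent points (label 1).

    Each window needs at least two points so both centroids stay defined
    when one point is held out.
    """
    data = [(x, 0) for x in reference] + [(x, 1) for x in window]
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold out point i
        c0 = mean(v for v, lab in rest if lab == 0)
        c1 = mean(v for v, lab in rest if lab == 1)
        predicted = 0 if abs(x - c0) <= abs(x - c1) else 1
        errors += predicted != label
    return errors / len(data)


def rtc_signal(reference: Sequence[float], window: Sequence[float],
               limit: float = 0.1) -> bool:
    """Signal when the two windows are easy to tell apart (error below limit)."""
    return loo_centroid_error(reference, window) < limit


reference = [0.0, 0.1, -0.1, 0.05, -0.05]
print(rtc_signal(reference, [0.02, -0.03, 0.01]))  # False: no shift
print(rtc_signal(reference, [3.0, 3.1, 2.9]))      # True: mean shift
```

When the recent window looks like the reference data, the classifier cannot separate them and its error stays high; after a shift the error collapses toward zero, which is what the chart watches.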
Article
Full-text available
Phaeosphaeria leaf spot (PLS) is considered one of the major diseases that threaten the stability of maize production in tropical and subtropical African regions. The objective of the present study was to investigate the use of hyperspectral data in detecting the early stage of PLS in tropical maize. Field data were collected from healthy and the e...
Article
Quality control of multivariate processes has been extensively studied in the past decades; however, fundamental challenges remain because of process complexity and decision-making needs that require not only sensitive fault detection but also identification of the truly out-of-control variables. In existing approaches, fault detection and d...
Article
The segmentation of infant brain tissue images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) plays an important role in studying early brain development in health and disease. In the isointense stage (approximately 6-8 months of age), WM and GM exhibit similar levels of intensity in both T1 and T2 MR images, making the tis...
Article
Full-text available
Tree ensembles such as random forests and boosted trees are accurate but difficult to understand, debug and deploy. In this work, we provide the inTrees (interpretable trees) framework that extracts, measures, prunes and selects rules from a tree ensemble, and calculates frequent variable interactions. A rule-based learner, referred to as the simp...
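The measure and prune steps can be illustrated on a single rule. In this minimal sketch a rule is a list of conditions plus a predicted class; it is measured by frequency (share of instances it covers) and error (share of covered instances it misclassifies), and pruned by dropping conditions whose removal does not raise the error by more than a hypothetical `max_decay`. These metric definitions and the pruning loop are simplifying assumptions, not the inTrees implementation, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Condition = Callable[[Row], bool]


def rule_metrics(conds: Sequence[Condition], pred: str,
                 X: Sequence[Row], y: Sequence[str]) -> Tuple[float, float]:
    """Return (frequency, error) of the rule 'all conds hold => pred'."""
    covered = [yi for xi, yi in zip(X, y) if all(c(xi) for c in conds)]
    freq = len(covered) / len(X)
    err = sum(yi != pred for yi in covered) / len(covered) if covered else 1.0
    return freq, err


def prune_rule(conds: Sequence[Condition], pred: str, X: Sequence[Row],
               y: Sequence[str], max_decay: float = 0.0) -> List[Condition]:
    """Drop conditions, last to first, while the error grows by at most max_decay."""
    kept = list(conds)
    for i in range(len(kept) - 1, -1, -1):
        if len(kept) == 1:
            break
        shorter = kept[:i] + kept[i + 1:]
        _, err_full = rule_metrics(kept, pred, X, y)
        _, err_short = rule_metrics(shorter, pred, X, y)
        if err_short - err_full <= max_decay:
            kept = shorter
    return kept


def a_high(r: Row) -> bool:
    return r["a"] > 0.5


def b_high(r: Row) -> bool:
    return r["b"] > 0.5


X = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}, {"a": 0, "b": 0}]
y = ["p", "p", "n", "n"]
pruned = prune_rule([a_high, b_high], "p", X, y)
print([c.__name__ for c in pruned])    # ['a_high']
print(rule_metrics(pruned, "p", X, y))  # (0.5, 0.0)
```

Here the condition on `b` adds nothing to accuracy, so pruning removes it and the shorter rule covers twice as many instances with the same error.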
Article
Associative classifiers have been proposed to achieve an accurate model with each individual rule being interpretable. However, existing associative classifiers often consist of a large number of rules and, thus, can be difficult to interpret. We show that associative classifiers consisting of an ordered rule set can be represented as a tree model....
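An ordered rule set is evaluated as a decision list: rules are tried in order, the first rule whose conditions hold assigns the class, and a default class applies if none fires. This minimal sketch, with hypothetical rules over hypothetical features `x1` and `x2`, makes the tree view concrete: each rule is an internal node whose true branch is a leaf and whose false branch leads to the next rule.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
# A rule is (condition, predicted_class); the condition is a predicate on a row.
Rule = Tuple[Callable[[Row], bool], str]


def classify(rules: List[Rule], default: str, row: Row) -> str:
    """Evaluate an ordered rule set (decision list).

    Equivalent to walking a chain-shaped tree: each node tests one rule;
    the 'true' branch is a leaf with that rule's class, the 'false'
    branch leads to the next rule, and the last 'false' branch is the
    default leaf.
    """
    for condition, label in rules:
        if condition(row):
            return label
    return default


rules: List[Rule] = [
    (lambda r: r["x1"] > 5 and r["x2"] <= 1, "A"),
    (lambda r: r["x2"] > 3, "B"),
]
print(classify(rules, "C", {"x1": 6, "x2": 0}))  # A
print(classify(rules, "C", {"x1": 6, "x2": 4}))  # B
print(classify(rules, "C", {"x1": 0, "x2": 2}))  # C
```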
Article
A multivariate decision tree attempts to improve upon the single variable split in a traditional tree. With the growing number of data sets that have many features but few labeled instances in a variety of domains (e.g., bioinformatics and text mining), a traditional tree-based approach with a greedy variable selection at a node may omit impo...
Article
The regularized random forest (RRF) was recently proposed for feature selection by building only one ensemble. In RRF the features are evaluated on a part of the training data at each tree node. We derive an upper bound for the number of distinct Gini information gain values in a node, and show that many features can share the same information gain...
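Why many features can share the same Gini information gain becomes clear once you note that, at a fixed node, a binary split's gain depends only on how many instances of each class go to the left child. This sketch (hypothetical helper names; exact arithmetic via `fractions.Fraction`) enumerates every possible left-child class-count vector for a node. It illustrates the effect but is not the bound derived in the paper.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence, Set


def gini(counts: Sequence[int]) -> Fraction:
    """Gini impurity of a node given its per-class counts."""
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def gini_gain(parent: Sequence[int], left: Sequence[int]) -> Fraction:
    """Gini gain of a binary split sending `left` counts left, the rest right."""
    right = [pc - lc for pc, lc in zip(parent, left)]
    n, n_left, n_right = sum(parent), sum(left), sum(right)
    return (gini(parent)
            - Fraction(n_left, n) * gini(left)
            - Fraction(n_right, n) * gini(right))


def distinct_gains(parent: Sequence[int]) -> Set[Fraction]:
    """All Gini gain values reachable by any split with two non-empty children."""
    gains = set()
    for left in product(*(range(c + 1) for c in parent)):
        if 0 < sum(left) < sum(parent):
            gains.add(gini_gain(parent, left))
    return gains


print(len(distinct_gains((3, 3))))  # 5
```

For a node with 3 instances of each class only 5 distinct gain values are possible, so with many candidate features most of them must tie.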
Article
Full-text available
Brain function is the result of interneuron signal transmission controlled by the fundamental biochemistry of each neuron. The biochemical content of a neuron is in turn determined by spatiotemporal gene expression and regulation encoded into the genomic regulatory networks. It is thus of particular interest to elucidate the relationship between g...
Article
We propose a tree ensemble method, referred to as time series forest (TSF), for time series classification. TSF employs a combination of the entropy gain and a distance measure, referred to as the Entrance (entropy and distance) gain, for evaluating the splits. Experimental studies show that the Entrance gain criterion improves the accuracy of TSF....
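The Entrance gain can be sketched for one candidate feature, assuming the form "entropy gain plus α times the margin", where the margin is the distance between the split threshold and the closest feature value, and α is small enough that entropy gain dominates and the margin mainly breaks ties. The α value, candidate thresholds, and function names are assumptions for illustration. In TSF the inputs are features extracted from the time series; here they are plain numbers.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = sum(1 for y in labels if y == c) / n
        result -= p * math.log2(p)
    return result


def entrance_gain(values: Sequence[float], labels: Sequence[int],
                  threshold: float, alpha: float = 1e-3) -> float:
    """Entropy gain of splitting at `threshold` plus alpha * margin, where the
    margin is the distance from the threshold to the nearest feature value."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    delta_entropy = (entropy(labels)
                     - len(left) / n * entropy(left)
                     - len(right) / n * entropy(right))
    margin = min(abs(x - threshold) for x in values)
    return delta_entropy + alpha * margin


def best_threshold(values: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float], alpha: float = 1e-3) -> float:
    """Candidate threshold with the largest Entrance gain."""
    return max(candidates, key=lambda t: entrance_gain(values, labels, t, alpha))


values, labels = [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
print(best_threshold(values, labels, [3.0, 5.0, 7.0]))  # 5.0
```

All three thresholds separate the classes perfectly, so the margin term decides: 5.0 is farthest from any observed value.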
Article
Full-text available
Random Forest (RF) is a powerful supervised learner and has been widely used in many applications such as bioinformatics. In this work we propose the guided random forest (GRF) for feature selection. Similar to a feature selection method called guided regularized random forest (GRRF), GRF is built using the importance scores from an ordinary RF....
Article
Full-text available
We propose a tree regularization framework, which enables many tree models to perform feature selection efficiently. The key idea of the regularization framework is to penalize selecting a new feature for splitting when its gain (e.g. information gain) is similar to the features used in previous splits. The regularization framework is applied on ra...
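The penalty can be sketched as a rule for choosing the split feature at a node. This minimal sketch assumes a multiplicative penalty: a feature not yet in the set F of previously used features has its gain multiplied by a coefficient λ in (0, 1], while features already in F keep their full gain; the chosen feature is then added to F. Function names and numbers are hypothetical.

```python
from typing import Dict, Set


def regularized_gain(gain: float, feature: str, used: Set[str], lam: float) -> float:
    """Penalize features not yet used by earlier splits (lam in (0, 1])."""
    return gain if feature in used else lam * gain


def choose_feature(gains: Dict[str, float], used: Set[str], lam: float) -> str:
    """Pick the feature with the largest regularized gain and record it as used."""
    best = max(gains, key=lambda f: regularized_gain(gains[f], f, used, lam))
    used.add(best)
    return best


gains = {"x1": 0.30, "x2": 0.32}
print(choose_feature(gains, {"x1"}, lam=0.8))  # x1: 0.8 * 0.32 < 0.30
print(choose_feature(gains, {"x1"}, lam=1.0))  # x2: no penalty
```

With λ = 1 the rule reduces to ordinary greedy selection; a smaller λ makes the tree reuse features unless a new one is clearly better, which keeps the selected feature set small.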
Article
Full-text available
Monitoring real-time data streams is an important learning task in numerous disciplines. Traditional process monitoring techniques are challenged by increasingly complex, high-dimensional data, mixed categorical and numerical variables, non-Gaussian distributions, non-linear relationships, etc. A new monitoring method based on real-time contrasts (RT...
Chapter
Full-text available
Learning Markov Blankets is important for classification and regression, causal discovery, and Bayesian network learning. We present an argument that ensemble masking measures can provide an approximate Markov Blanket. Consequently, an ensemble feature selection method can be used to learn Markov Blankets for either discrete or continuous networks (...
Conference Paper
Full-text available
Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well known that these measures can be biased when the predictor attributes have different numbers of values. We propose two methods to solve the bias problem. One uses an out-of-bag sampling method called OOBForest and one...
