## About

171

Publications

33,307

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

7,351

Citations

Introduction

**Skills and Expertise**

Additional affiliations

January 2005 - December 2012

March 2003 - August 2004

August 1997 - March 2003

Education

August 1997 - March 2003

## Publications

Publications (171)

Learning from large-scale and high-dimensional data still remains a computationally challenging problem, though it has received increasing interest recently. To address this issue, randomized reduction methods have been developed by either reducing the dimensionality or reducing the number of training instances to obtain a small sketch of the origi...

Satellite-based positioning system such as GPS often suffers from large amount of noise that degrades the positioning accuracy dramatically especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals...

We consider the stochastic bandit problem with a large candidate arm set. In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt non-parametric reward model, are inefficient, due to the large number of arms. By exploiting arm correlations based on a parametric reward model with arm features, contextua...

In distributed training of deep neural networks, parallel minibatch SGD is widely used to speed up the training process by using multiple workers. It uses multiple workers to sample local stochastic gradients in parallel, aggregates all gradients in a single server to obtain the average, and updates each worker’s local model using a SGD update with...

Decomposing complex time series into trend, seasonality, and remainder components is an important task to facilitate time series anomaly detection and forecasting. Although numerous methods have been proposed, there are still many time series characteristics exhibiting in real-world data which are not addressed properly, including 1) ability to han...

In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario where each domain can have the corresponding model, multiple domains (i.e., applications/users) may share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-rec...

Factorization machine (FM) is a popular machine learning model to capture the second order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank k generalized FM of d dimensional input, the previous best known sampling complexity is O[k3d · polylog(kd)] under Gaussian distribution. T...

Recently, online matching problems have attracted much attention due to its emerging applications in internet advertising. Most existing online matching methods have adopted either adversarial or stochastic user arrival assumption, while on both of them significant limitation exists. The adversarial model does not exploit existing knowledge of the...

Satellite-based positioning system such as GPS often suffers from large amount of noise that degrades the positioning accuracy dramatically especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals...

Factorization machine (FM) is a popular machine learning model to capture the second order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank $k$ generalized FM of $d$ dimensional input, the previous best known sampling complexity is $\mathcal{O}[k^{3}d\cdot\mathrm{polylog}(kd)]$...

Decomposing complex time series into trend, seasonality, and remainder components is an important task to facilitate time series anomaly detection and forecasting. Although numerous methods have been proposed, there are still many time series characteristics exhibiting in real-world data which are not addressed properly, including 1) ability to han...

For large scale non-convex stochastic optimization, parallel mini-batch SGD using multiple workers ideally can achieve a linear speed-up with respect to the number of workers compared with SGD over a single worker. However, such linear scalability in practice is significantly limited by the growing demand for communication as more workers are invol...

Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and substitution property among items. The player's obje...

Distance metric learning (DML) has been studied extensively in the past decades for its superior performance with distance-based algorithms. Most of the existing methods propose to learn a distance metric with pairwise or triplet constraints. However, the number of constraints is quadratic or even cubic in the number of the original examples, which...

Learning with a {\it convex loss} function has been a dominating paradigm for many years. It remains an interesting question how non-convex loss functions help improve the generalization of learning with broad applicability. In this paper, we study a family of objective functions formed by truncating traditional loss functions, which is applicable...

Recently, machine learning becomes important for the cloud computing service. Users of cloud computing can benefit from the sophisticated machine learning models provided by the service. Considering that users can come from different domains with the same problem, an ideal model has to be applicable over multiple domains. In this work, we propose t...

Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a $K$-cardinality subset from $N$ candidate items, and receives a reward which is governed by a {\it multinomial logit} (MNL) choice model considering both item utility and substitution property among items. The pla...

Although Recurrent Neural Network (RNN) has been a powerful tool for modeling sequential data, its performance is inadequate when processing sequences with multiple patterns. In this paper, we address this challenge by introducing an external memory and constructing a novel persistent memory augmented RNN (term as PRNN) model. The PRNN model captur...

The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16 as the result from the collaboration and active exchange of information among researchers from sixteen Institutes and Universities across 4...

Although deep learning models are highly effective for various tasks, such as detection and classification, the high computational cost prohibits the deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on model compression and acceleration of deep models. We model a low bit quantized neural ne...

Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has enjoyed many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: How to model quality of clusterings and how to find multiple stab...

We study the problem of similarity learning and its application to image
retrieval with large-scale data. The similarity between pairs of images can be
measured by the distances between their high dimensional representations, and
the problem of learning the appropriate similarity is often addressed by
distance metric learning. However, distance met...

In this work, we study distance metric learning (DML) for high dimensional
data. A typical approach for DML with high dimensional data is to perform the
dimensionality reduction first before learning the distance metric. The main
shortcoming of this approach is that it may result in a suboptimal solution due
to the subspace removed by the dimension...

Systems and methods are disclosed for object detection by receiving an image and extracting features therefrom; applying a learning process to determine sub-regions and select predetermined pooling regions; and performing selective max-pooling to choose one or more feature regions without noises.

In this paper, we consider the problem of column subset selection. We present
a novel analysis of the spectral norm reconstruction for a simple randomized
algorithm and establish a new bound that depends explicitly on the sampling
probabilities. The sampling dependent error bound (i) allows us to better
understand the tradeoff in the reconstruction...

Systems and methods are disclosed for object detection by receiving an image; segmenting the image; extracting features from the image; and performing a dimension-wise spatial layout selection to pick up dimensions inside a discriminative spatial region for classification.

Systems and methods for object detection by receiving an image; segmenting the image and identifying candidate bounding boxes which may contain an object; for each candidate bounding box, dividing the box into overlapped small patches, and extracting dense features from the patches; during a training phase, applying a learning process to learn one...

In this paper, we study randomized reduction methods, which reduce
high-dimensional features into low-dimensional space by randomized methods
(e.g., random projection, random hashing), for large-scale high-dimensional
classification. Previous theoretical results on randomized reduction methods
hinge on strong assumptions about the data, e.g., low r...

In this paper, we present a novel yet simple homotopy proximal mapping
algorithm for compressive sensing. The algorithm adopts a simple proximal
mapping for $\ell_1$ norm regularization at each iteration and gradually
reduces the regularization parameter before the $\ell_1$ norm. We prove a
globally linear convergence for the proposed homotopy prox...

Systems and methods are disclosed to search for a query image, by detecting local invariant features and local descriptors; retrieving best matching images by quantizing the local descriptors with a vocabulary tree; and reordering retrieved images with results from the vocabulary tree quantization.

A method for fine-grained image classification on an image includes automatically segmenting one or more objects of interest prior to classification; and combining segmented and original image features before performing final classification.

Random projection has been widely used in data classification. It maps high-dimensional data into a low-dimensional subspace in order to reduce the computational cost in solving the related optimization problem. While previous studies are focused on analyzing the classification performance in the low-dimensional space, in this paper, we consider th...

Systems and methods for metric learning include iteratively determining feature groups of images based on its derivative norm. Corresponding metrics of the feature groups are learned by gradient descent based on an expected loss. The corresponding metrics are combined to provide an intermediate metric matrix as a sparse representation of the images...

A nearest-neighbor-based distance metric learning process includes applying an exponential-based loss function to provide a smooth objective; and determining an objective and a gradient of both hinge-based and exponential-based loss function in a quadratic time of the number of instances using a computer.

There are provided a system and method for predicting query execution time in a database system. A cost model determination device determines a cost model of a database query optimizer for the database system. The cost model models costs of queries applied to the database system. A profiling device determines profiling queries for profiling input/o...

Distance metric learning (DML) aims to learn a distance metric better than Euclidean distance. It has been successfully applied to various tasks, e.g., classification, clustering and information retrieval. Many DML algorithms suffer from the over-fitting problem because of a large number of parameters to be determined in DML. In this paper, we expl...

In this work, we study data preconditioning, a well-known and long-existing
technique, for boosting the convergence of first-order methods for regularized
loss minimization. It is well understood that the condition number of the
problem, i.e., the ratio of the Lipschitz constant to the strong convexity
modulus, has a harsh effect on the convergence...

An admission control system for a cloud database includes a machine learning prediction module to estimate a predicted probability for a newly arrived query with a deadline, if admitted into the cloud database, to finish its execution before said deadline, wherein the prediction considers query characteristics and current system conditions. The sys...

Knowledge discovery from scientific articles has received increasing attention recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: docu...

CUR matrix decomposition is a randomized algorithm that can efficiently
compute the low rank approximation for a given rectangle matrix. One limitation
with the existing CUR algorithms is that they require an access to the full
matrix A for computing U. In this work, we aim to alleviate this limitation. In
particular, we assume that besides having...

In this paper, we focus on distance metric learning (DML) for high
dimensional data and its application to fine-grained visual categorization. The
challenges of high dimensional DML arise in three aspects. First, the high
dimensionality leads to a large-scale optimization problem to be solved that is
computationally expensive. Second, the high dime...

Virtualization-based multi-tenant database consolidation is an important technique for database-as-a-service (DBaaS) providers to minimize their total cost which is composed of SLA penalty cost, infrastructure cost and action cost. Due to the bursty and diverse tenant workloads, over-provisioning for the peak or under-provisioning for the off-peak...

In \citep{Yangnips13}, the author presented distributed stochastic dual
coordinate ascent (DisDCA) algorithms for solving large-scale regularized loss
minimization. Extraordinary performances have been observed and reported for
the well-motivated updates, as referred to the practical updates, compared to
the naive updates. However, no serious analy...

Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands for descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifie...

Systems and methods are disclosed for determining personal characteristics from images by generating a baseline gender model and an age estimation model using one or more convolutional neural networks (CNNs); capturing correspondences of faces by face tracking, and applying incremental learning to the CNNs and enforcing correspondence constraint su...

Non-negative tensor factorization (NTF) has been successfully used to extract significant characteristics from polyadic data, such as data in social networks. Because these polyadic data have multiple dimensions (e.g., the author, content, and timestamp of a blog post), NTF fits in naturally and extracts data characteristics jointly from different...

Systems and methods are disclosed for generating super resolution images by building a set of multi-resolution bases from one or more training images; estimating a sparse resolution-invariant representation of an image, and reconstructing one or more missing patches at any resolution level.

We propose a detection and segmentation algorithm for the purposes of fine-grained recognition. The algorithm first detects low-level regions that could potentially belong to the object and then performs a full-object segmentation through propagation. Apart from segmenting the object, we can also 'zoom in' on the object, i.e. center it, normalize i...

With a weighting scheme proportional to t, a traditional stochastic gradient
descent (SGD) algorithm achieves a high probability convergence rate of
O({\kappa}/T) for strongly convex functions, instead of O({\kappa} ln(T)/T). We
also prove that an accelerated SGD algorithm also achieves a rate of
O({\kappa}/T).

AUC is an important performance measure and many algorithms have been devoted
to AUC optimization, mostly by minimizing a surrogate convex loss on a training
data set. In this work, we focus on one-pass AUC optimization that requires
only going through the training data once without storing the entire training
dataset, where conventional online lea...

In this manuscript, we analyze the sparse signal recovery (compressive
sensing) problem from the perspective of convex optimization by stochastic
proximal gradient descent. This view allows us to significantly simplify the
recovery analysis of compressive sensing. More importantly, it leads to an
efficient optimization algorithm for solving the reg...

Distance metric learning (DML) is an important task that has found
applications in many domains. The high computational cost of DML arises
from the large number of variables to be determined and the constraint
that a distance metric has to be a positive semi-definite (PSD) matrix.
Although stochastic gradient descent (SGD) has been successfully app...

We introduce a new learning-based solution for portable database workload performance prediction. The current state of the art addresses performance prediction for individual, static hardware configurations and thus cannot generalize to new platforms without additional training. In this work, we focus on analytical databases that might be deployed...

Predicting query execution time is useful in many database management issues including admission control, query scheduling, progress monitoring, and system sizing. Recently the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is...

Systems and methods are disclosed for summarizing multiple documents by generating a model of the documents as a mixture of document clusters, each document in turn having a mixture of sentences, wherein the model simultaneously representing summarization information and document cluster structure; and determining a loss function for evaluating the...

In this paper, we develop SumView, a Web-based review summarization system, to automatically extract the most representative expressions and customer opinions in the reviews on various product features. Different from existing review analysis which makes more efforts on sentiment classification and opinion mining, our system mainly focuses on summa...

We propose a segmentation algorithm for the purposes of large-scale flower species recognition. Our approach is based on identifying potential object regions at the time of detection. We then apply a Laplacian-based segmentation, which is guided by these initially detected regions. More specifically, we show that 1) recognizing parts of the potenti...

Systems and methods are disclosed to analyze a social network by generating a data tensor from social networking data; applying a non-negative tensor factorization (NTF) with user prior knowledge and preferences to generate a core tensor and facet matrices; and rendering information to social networking users based on the core tensor and facet matr...

Learning Mahanalobis distance metrics in a high- dimensional feature space is
very difficult especially when structural sparsity and low rank are enforced to
improve com- putational efficiency in testing phase. This paper addresses both
aspects by an ensemble metric learning approach that consists of sparse block
diagonal metric ensembling and join...

We study the tail bound of the emperical covariance of multivariate normal
distribution. Following the work of (Gittens & Tropp, 2011), we provide a tail
bound with a small constant.

In this paper we analyze influence in the blogosphere. Recently, influence
analysis has become an increasingly important research topic, as online
communities, such as social networks and e-commerce sites, playing a more and
more significant role in our daily life. However, so far few studies have
succeeded in extracting influence from online commu...

This paper addresses the problem of community detection in networked data
that combines link and content analysis. Most existing work combines link and
content information by a generative model. There are two major shortcomings
with the existing approaches. First, they assume that the probability of
creating a link between two nodes is determined o...

The paper presents the resolution-invariant image representation (ЯIIRЯIIR) framework. It applies sparse-coding with multi-resolution codebook to learn resolution-invariant sparse representations of local patches. An input image can be reconstructed to higher resolution at not only discrete integer scales, as that in many existing super-resolution...

We study the non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet eff...

We study the online convex optimization problem, in which an online algorithm has to make repeated decisions with convex loss functions and hopes to achieve a small regret. We consider a natural restriction of this problem in which the loss functions have a small deviation, measured by the sum of the distances between every two consecutive loss fun...

Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensi...

In citep{Hazan-2008-extract}, the authors showed that the regret of online
linear optimization can be bounded by the total variation of the cost vectors.
In this paper, we extend this result to general online convex optimization. We
first analyze the limitations of the algorithm in \citep{Hazan-2008-extract}
when applied it to online convex optimiz...

In this paper we address the problem of image retrieval from millions of database images. We improve the vocabulary tree based approach by introducing contextual weighting of local features in both descriptor and spatial domains. Specifically, we propose to incorporate efficient statistics of neighbor descriptors both on the vocabulary tree and in...