Shenghuo Zhu

Aibee

PhD

About

171
Publications
33,307
Reads
7,351
Citations
Additional affiliations
January 2005 - December 2012
March 2003 - August 2004
Amazon
Position
  • SDE
August 1997 - March 2003
University of Rochester
Position
  • Research Assistant
Education
August 1997 - March 2003
University of Rochester
Field of study
  • Computer Science

Publications (171)
Article
Full-text available
Learning from large-scale and high-dimensional data still remains a computationally challenging problem, though it has received increasing interest recently. To address this issue, randomized reduction methods have been developed by either reducing the dimensionality or reducing the number of training instances to obtain a small sketch of the origi...
Conference Paper
Satellite-based positioning systems such as GPS often suffer from a large amount of noise that dramatically degrades positioning accuracy, especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals...
Article
Full-text available
We consider the stochastic bandit problem with a large candidate arm set. In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient due to the large number of arms. By exploiting arm correlations based on a parametric reward model with arm features, contextua...
Article
In distributed training of deep neural networks, parallel minibatch SGD is widely used to speed up the training process by using multiple workers. It uses multiple workers to sample local stochastic gradients in parallel, aggregates all gradients in a single server to obtain the average, and updates each worker’s local model using an SGD update with...
Article
Full-text available
Decomposing complex time series into trend, seasonality, and remainder components is an important task to facilitate time series anomaly detection and forecasting. Although numerous methods have been proposed, there are still many time series characteristics exhibited in real-world data which are not addressed properly, including 1) ability to han...
Article
In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario where each domain can have the corresponding model, multiple domains (i.e., applications/users) may share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-rec...
Article
Factorization machine (FM) is a popular machine learning model to capture the second-order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank k generalized FM of d dimensional input, the previous best known sampling complexity is O[k^3 d · polylog(kd)] under Gaussian distribution. T...
Article
Full-text available
Recently, online matching problems have attracted much attention due to their emerging applications in internet advertising. Most existing online matching methods have adopted either an adversarial or a stochastic user arrival assumption, both of which have significant limitations. The adversarial model does not exploit existing knowledge of the...
Preprint
Satellite-based positioning systems such as GPS often suffer from a large amount of noise that dramatically degrades positioning accuracy, especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals...
Preprint
Factorization machine (FM) is a popular machine learning model to capture the second order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank $k$ generalized FM of $d$ dimensional input, the previous best known sampling complexity is $\mathcal{O}[k^{3}d\cdot\mathrm{polylog}(kd)]$...
Preprint
Full-text available
Decomposing complex time series into trend, seasonality, and remainder components is an important task to facilitate time series anomaly detection and forecasting. Although numerous methods have been proposed, there are still many time series characteristics exhibited in real-world data which are not addressed properly, including 1) ability to han...
Preprint
For large scale non-convex stochastic optimization, parallel mini-batch SGD using multiple workers ideally can achieve a linear speed-up with respect to the number of workers compared with SGD over a single worker. However, such linear scalability in practice is significantly limited by the growing demand for communication as more workers are invol...
Conference Paper
Full-text available
Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and substitution property among items. The player's obje...
Preprint
Distance metric learning (DML) has been studied extensively in the past decades for its superior performance with distance-based algorithms. Most of the existing methods propose to learn a distance metric with pairwise or triplet constraints. However, the number of constraints is quadratic or even cubic in the number of the original examples, which...
Preprint
Learning with a convex loss function has been a dominating paradigm for many years. It remains an interesting question how non-convex loss functions help improve the generalization of learning with broad applicability. In this paper, we study a family of objective functions formed by truncating traditional loss functions, which is applicable...
Preprint
Recently, machine learning has become important for cloud computing services. Users of cloud computing can benefit from the sophisticated machine learning models provided by the service. Considering that users can come from different domains with the same problem, an ideal model has to be applicable over multiple domains. In this work, we propose t...
Preprint
Full-text available
Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a $K$-cardinality subset from $N$ candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and substitution property among items. The pla...
Article
Full-text available
Although the Recurrent Neural Network (RNN) has been a powerful tool for modeling sequential data, its performance is inadequate when processing sequences with multiple patterns. In this paper, we address this challenge by introducing an external memory and constructing a novel persistent memory augmented RNN (termed PRNN) model. The PRNN model captur...
Conference Paper
Full-text available
The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16 resulting from the collaboration and active exchange of information among researchers from sixteen Institutes and Universities across 4...
Article
Although deep learning models are highly effective for various tasks, such as detection and classification, the high computational cost prohibits the deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on model compression and acceleration of deep models. We model a low bit quantized neural ne...
Article
Full-text available
Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has enjoyed many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: How to model quality of clusterings and how to find multiple stab...
Article
We study the problem of similarity learning and its application to image retrieval with large-scale data. The similarity between pairs of images can be measured by the distances between their high dimensional representations, and the problem of learning the appropriate similarity is often addressed by distance metric learning. However, distance met...
Article
In this work, we study distance metric learning (DML) for high dimensional data. A typical approach for DML with high dimensional data is to perform the dimensionality reduction first before learning the distance metric. The main shortcoming of this approach is that it may result in a suboptimal solution due to the subspace removed by the dimension...
Patent
Systems and methods are disclosed for object detection by receiving an image and extracting features therefrom; applying a learning process to determine sub-regions and select predetermined pooling regions; and performing selective max-pooling to choose one or more feature regions without noises.
Article
In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction...
Patent
Systems and methods are disclosed for object detection by receiving an image; segmenting the image; extracting features from the image; and performing a dimension-wise spatial layout selection to pick up dimensions inside a discriminative spatial region for classification.
Patent
Systems and methods for object detection by receiving an image; segmenting the image and identifying candidate bounding boxes which may contain an object; for each candidate bounding box, dividing the box into overlapped small patches, and extracting dense features from the patches; during a training phase, applying a learning process to learn one...
Article
In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. Previous theoretical results on randomized reduction methods hinge on strong assumptions about the data, e.g., low r...
Article
Full-text available
In this paper, we present a novel yet simple homotopy proximal mapping algorithm for compressive sensing. The algorithm adopts a simple proximal mapping for $\ell_1$ norm regularization at each iteration and gradually reduces the regularization parameter before the $\ell_1$ norm. We prove a globally linear convergence for the proposed homotopy prox...
Patent
Full-text available
Systems and methods are disclosed to search for a query image, by detecting local invariant features and local descriptors; retrieving best matching images by quantizing the local descriptors with a vocabulary tree; and reordering retrieved images with results from the vocabulary tree quantization.
Patent
A method for fine-grained image classification on an image includes automatically segmenting one or more objects of interest prior to classification; and combining segmented and original image features before performing final classification.
Article
Random projection has been widely used in data classification. It maps high-dimensional data into a low-dimensional subspace in order to reduce the computational cost in solving the related optimization problem. While previous studies are focused on analyzing the classification performance in the low-dimensional space, in this paper, we consider th...
Patent
Full-text available
Systems and methods for metric learning include iteratively determining feature groups of images based on its derivative norm. Corresponding metrics of the feature groups are learned by gradient descent based on an expected loss. The corresponding metrics are combined to provide an intermediate metric matrix as a sparse representation of the images...
Patent
Full-text available
A nearest-neighbor-based distance metric learning process includes applying an exponential-based loss function to provide a smooth objective; and determining an objective and a gradient of both hinge-based and exponential-based loss function in a quadratic time of the number of instances using a computer.
Patent
Full-text available
There are provided a system and method for predicting query execution time in a database system. A cost model determination device determines a cost model of a database query optimizer for the database system. The cost model models costs of queries applied to the database system. A profiling device determines profiling queries for profiling input/o...
Article
Distance metric learning (DML) aims to learn a distance metric better than Euclidean distance. It has been successfully applied to various tasks, e.g., classification, clustering and information retrieval. Many DML algorithms suffer from the over-fitting problem because of a large number of parameters to be determined in DML. In this paper, we expl...
Article
Full-text available
In this work, we study data preconditioning, a well-known and long-existing technique, for boosting the convergence of first-order methods for regularized loss minimization. It is well understood that the condition number of the problem, i.e., the ratio of the Lipschitz constant to the strong convexity modulus, has a harsh effect on the convergence...
Patent
Full-text available
An admission control system for a cloud database includes a machine learning prediction module to estimate a predicted probability for a newly arrived query with a deadline, if admitted into the cloud database, to finish its execution before said deadline, wherein the prediction considers query characteristics and current system conditions. The sys...
Article
Knowledge discovery from scientific articles has received increasing attention recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: docu...
Article
Full-text available
CUR matrix decomposition is a randomized algorithm that can efficiently compute the low rank approximation for a given rectangular matrix. One limitation of the existing CUR algorithms is that they require access to the full matrix A for computing U. In this work, we aim to alleviate this limitation. In particular, we assume that besides having...
Article
In this paper, we focus on distance metric learning (DML) for high dimensional data and its application to fine-grained visual categorization. The challenges of high dimensional DML arise in three aspects. First, the high dimensionality leads to a large-scale optimization problem to be solved that is computationally expensive. Second, the high dime...
Article
Virtualization-based multi-tenant database consolidation is an important technique for database-as-a-service (DBaaS) providers to minimize their total cost which is composed of SLA penalty cost, infrastructure cost and action cost. Due to the bursty and diverse tenant workloads, over-provisioning for the peak or under-provisioning for the off-peak...
Article
In (Yang, 2013), the author presented distributed stochastic dual coordinate ascent (DisDCA) algorithms for solving large-scale regularized loss minimization. Extraordinary performances have been observed and reported for the well-motivated updates, referred to as the practical updates, compared to the naive updates. However, no serious analy...
Conference Paper
Generic object detection must handle different degrees of variation in distinct object classes with tractable computations, which demands descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifie...
Patent
Full-text available
Systems and methods are disclosed for determining personal characteristics from images by generating a baseline gender model and an age estimation model using one or more convolutional neural networks (CNNs); capturing correspondences of faces by face tracking, and applying incremental learning to the CNNs and enforcing correspondence constraint su...
Article
Non-negative tensor factorization (NTF) has been successfully used to extract significant characteristics from polyadic data, such as data in social networks. Because these polyadic data have multiple dimensions (e.g., the author, content, and timestamp of a blog post), NTF fits in naturally and extracts data characteristics jointly from different...
Patent
Full-text available
Systems and methods are disclosed for generating super resolution images by building a set of multi-resolution bases from one or more training images; estimating a sparse resolution-invariant representation of an image, and reconstructing one or more missing patches at any resolution level.
Conference Paper
We propose a detection and segmentation algorithm for the purposes of fine-grained recognition. The algorithm first detects low-level regions that could potentially belong to the object and then performs a full-object segmentation through propagation. Apart from segmenting the object, we can also 'zoom in' on the object, i.e. center it, normalize i...
Article
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
Article
AUC is an important performance measure and many algorithms have been devoted to AUC optimization, mostly by minimizing a surrogate convex loss on a training data set. In this work, we focus on one-pass AUC optimization that requires only going through the training data once without storing the entire training dataset, where conventional online lea...
Article
Full-text available
In this manuscript, we analyze the sparse signal recovery (compressive sensing) problem from the perspective of convex optimization by stochastic proximal gradient descent. This view allows us to significantly simplify the recovery analysis of compressive sensing. More importantly, it leads to an efficient optimization algorithm for solving the reg...
Article
Distance metric learning (DML) is an important task that has found applications in many domains. The high computational cost of DML arises from the large number of variables to be determined and the constraint that a distance metric has to be a positive semi-definite (PSD) matrix. Although stochastic gradient descent (SGD) has been successfully app...
Conference Paper
Full-text available
We introduce a new learning-based solution for portable database workload performance prediction. The current state of the art addresses performance prediction for individual, static hardware configurations and thus cannot generalize to new platforms without additional training. In this work, we focus on analytical databases that might be deployed...
Conference Paper
Full-text available
Predicting query execution time is useful in many database management issues including admission control, query scheduling, progress monitoring, and system sizing. Recently the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is...
Patent
Full-text available
Systems and methods are disclosed for summarizing multiple documents by generating a model of the documents as a mixture of document clusters, each document in turn having a mixture of sentences, wherein the model simultaneously represents summarization information and document cluster structure; and determining a loss function for evaluating the...
Article
In this paper, we develop SumView, a Web-based review summarization system, to automatically extract the most representative expressions and customer opinions in the reviews on various product features. Unlike existing review analysis, which focuses more on sentiment classification and opinion mining, our system mainly focuses on summa...
Conference Paper
We propose a segmentation algorithm for the purposes of large-scale flower species recognition. Our approach is based on identifying potential object regions at the time of detection. We then apply a Laplacian-based segmentation, which is guided by these initially detected regions. More specifically, we show that 1) recognizing parts of the potenti...
Patent
Full-text available
Systems and methods are disclosed to analyze a social network by generating a data tensor from social networking data; applying a non-negative tensor factorization (NTF) with user prior knowledge and preferences to generate a core tensor and facet matrices; and rendering information to social networking users based on the core tensor and facet matr...
Article
Learning Mahalanobis distance metrics in a high-dimensional feature space is very difficult, especially when structural sparsity and low rank are enforced to improve computational efficiency in the testing phase. This paper addresses both aspects by an ensemble metric learning approach that consists of sparse block diagonal metric ensembling and join...
Article
We study the tail bound of the empirical covariance of the multivariate normal distribution. Following the work of (Gittens & Tropp, 2011), we provide a tail bound with a small constant.
Article
In this paper we analyze influence in the blogosphere. Recently, influence analysis has become an increasingly important research topic, as online communities, such as social networks and e-commerce sites, play an increasingly significant role in our daily life. However, so far few studies have succeeded in extracting influence from online commu...
Article
Full-text available
This paper addresses the problem of community detection in networked data that combines link and content analysis. Most existing work combines link and content information by a generative model. There are two major shortcomings with the existing approaches. First, they assume that the probability of creating a link between two nodes is determined o...
Article
The paper presents the resolution-invariant image representation (RIIR) framework. It applies sparse-coding with multi-resolution codebook to learn resolution-invariant sparse representations of local patches. An input image can be reconstructed to higher resolution at not only discrete integer scales, as that in many existing super-resolution...
Article
Full-text available
We study the non-smooth optimization problems in machine learning, where both the loss function and the regularizer are non-smooth functions. Previous studies on efficient empirical loss minimization assume either a smooth loss function or a strongly convex regularizer, making them unsuitable for non-smooth optimization. We develop a simple yet eff...
Article
We study the online convex optimization problem, in which an online algorithm has to make repeated decisions with convex loss functions and hopes to achieve a small regret. We consider a natural restriction of this problem in which the loss functions have a small deviation, measured by the sum of the distances between every two consecutive loss fun...
Article
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensi...
Article
Full-text available
In (Hazan and Kale, 2008), the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in (Hazan and Kale, 2008) when applying it to online convex optimiz...
Conference Paper
Full-text available
In this paper we address the problem of image retrieval from millions of database images. We improve the vocabulary tree based approach by introducing contextual weighting of local features in both descriptor and spatial domains. Specifically, we propose to incorporate efficient statistics of neighbor descriptors both on the vocabulary tree and in...