ABSTRACT: We develop necessary and sufficient conditions and a novel, provably consistent and efficient algorithm for discovering topics (latent factors) from observations (documents) that are realized from a probabilistic mixture of shared latent factors. Our focus is on the class of topic models in which each shared latent factor contains a novel word that is unique to that factor, a property that has come to be known as separability. Our algorithm is based on the key insight that the novel words correspond to the extreme points of the convex hull formed by the row-vectors of a suitably normalized word co-occurrence matrix. We leverage this geometric insight to establish polynomial computation and sample complexity bounds based on a few isotropic random projections of the rows of the normalized word co-occurrence matrix. Our proposed random-projections-based algorithm is naturally amenable to an efficient distributed implementation and is attractive for modern web-scale distributed data mining applications.
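As a rough illustration of this extreme-point insight, here is a minimal NumPy sketch under our own assumptions (the matrix C, the row normalization, and the number of projections are all placeholders, not the paper's exact algorithm): project the normalized rows onto a few isotropic random directions and collect the rows attaining the extreme projection values as novel-word candidates.

```python
import numpy as np

def novel_word_candidates(C, num_projections=50, seed=0):
    """Candidate novel words = rows of the row-normalized co-occurrence
    matrix C that are extreme points of the convex hull of all rows.
    Extremes are detected as argmax/argmin of isotropic random
    projections (an illustrative simplification)."""
    rng = np.random.default_rng(seed)
    # Row-normalize so each word's co-occurrence profile sums to one.
    X = C / np.maximum(C.sum(axis=1, keepdims=True), 1e-12)
    W = X.shape[1]
    candidates = set()
    for _ in range(num_projections):
        d = rng.standard_normal(W)         # isotropic random direction
        p = X @ d                          # project every row onto d
        candidates.add(int(np.argmax(p)))  # extremes of the projection
        candidates.add(int(np.argmin(p)))  # are hull extreme points
    return sorted(candidates)
```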
ABSTRACT: In recent years baggage screening at airports has included the use of dual-energy X-ray computed tomography (DECT), an advanced technology for non-destructive evaluation. The main challenge remains to reliably find and identify threat objects in the bag from DECT data. This task is particularly hard due to the wide variety of objects, the high clutter, and the presence of metal which causes streaks and shading in the scanner images. Image noise and artifacts are generally much more severe than in medical CT and can lead to splitting of objects and inaccurate object labeling. The conventional approach performs object segmentation and material identification in two decoupled processes. Dual-energy information is typically not used for the segmentation, and object localization is not explicitly used to stabilize the material parameter estimates. We propose a novel learning-based framework for joint segmentation and identification of objects directly from volumetric DECT images, which is robust to streaks, noise and variability due to clutter. We focus on segmenting and identifying a small set of objects of interest with characteristics that are learned from training images, and consider everything else as background. We include data weighting to mitigate metal artifacts and incorporate an object boundary-field to reduce object splitting. The overall formulation is posed as a multi-label discrete optimization problem and solved using an efficient graph-cut algorithm. We test the method on real data and show its potential for producing accurate labels of the objects of interest without splits in the presence of metal and clutter.
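To make the graph-cut machinery concrete, here is a deliberately simplified binary (object vs. background) sketch using the PyMaxflow library with metal-aware data weighting; the function and all its inputs (data_cost_obj, data_cost_bg, metal_weight) are our illustrative assumptions, not the paper's multi-label formulation.

```python
import numpy as np
import maxflow  # pip install PyMaxflow

def segment_binary(data_cost_obj, data_cost_bg, metal_weight, smoothness=1.0):
    """Binary graph-cut toy version of the multi-label formulation.
    data_cost_* are per-voxel negative log-likelihoods; metal_weight in
    [0, 1] down-weights voxels whose measurements are corrupted by metal.
    All names and scales are illustrative."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(data_cost_obj.shape)
    # Pairwise smoothness (Potts-like) between neighboring voxels.
    g.add_grid_edges(nodes, smoothness)
    # Metal-aware data weighting: unreliable voxels contribute less.
    g.add_grid_tedges(nodes, metal_weight * data_cost_obj,
                             metal_weight * data_cost_bg)
    g.maxflow()
    # Boolean segmentation mask (label convention follows the t-link setup).
    return g.get_grid_segments(nodes)
```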
ABSTRACT: We propose a novel parameterized family of Mixed Membership Mallows Models (M4) to account for variability in pairwise comparisons generated by a heterogeneous population of noisy and inconsistent users. M4 models individual preferences as a user-specific probabilistic mixture of shared latent Mallows components. Our key algorithmic insight for estimation is to establish a statistical connection between M4 and topic models by viewing pairwise comparisons as words and users as documents. This insight leads us to explore Mallows components with a separable structure and to leverage recent advances in separable topic discovery. While separability appears to be overly restrictive, we show that it is nevertheless an inevitable outcome of a relatively small number of latent Mallows components in a world with a large number of items. We then develop an algorithm, based on robust extreme-point identification of convex polygons, that learns the reference rankings and is provably consistent with polynomial sample complexity guarantees. We demonstrate that our new model is empirically competitive with current state-of-the-art approaches in predicting real-world preferences.
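A small sketch of the comparisons-as-words view (the tuple format and vocabulary indexing below are hypothetical encodings, not the paper's): each ordered pair "i beats j" becomes a word, and each user's multiset of comparisons becomes a document, yielding the user-by-word count matrix that topic-model estimators consume.

```python
from collections import Counter
import numpy as np

def comparisons_to_counts(user_comparisons, num_items):
    """user_comparisons: list (one entry per user) of lists of ordered
    pairs (i, j) meaning 'item i preferred over item j'. Returns a
    users x words count matrix, where word (i, j) has index
    i * num_items + j. Purely illustrative encoding."""
    V = num_items * num_items
    X = np.zeros((len(user_comparisons), V), dtype=int)
    for u, comps in enumerate(user_comparisons):
        for (i, j), c in Counter(comps).items():
            X[u, i * num_items + j] = c
    return X

# Example: two users, three items.
X = comparisons_to_counts([[(0, 1), (0, 2)], [(2, 0), (2, 0), (1, 0)]], 3)
```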
ABSTRACT: We propose a new model for rank aggregation from pairwise comparisons that captures both ranking heterogeneity across users and ranking inconsistency within each user. We establish a formal statistical equivalence between the new model and topic models. We leverage recent advances in the topic modeling literature to develop an algorithm that can learn shared latent rankings with provable statistical and computational efficiency guarantees. The method is also shown to empirically outperform competing approaches on semi-synthetic and real-world datasets.
ABSTRACT: Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing. A plethora of algorithms have been developed to date, yet no widely accepted, realistic, large-scale video dataset exists for benchmarking different methods. Presented here is a unique change detection video dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR). A distinguishing characteristic of this benchmark video dataset is that each frame is meticulously annotated by hand for ground-truth foreground, background, and shadow area boundaries - an effort that goes much beyond a simple binary label denoting the presence of change. This enables objective and precise quantitative comparison and ranking of video-based change detection algorithms. This paper discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over two dozen change detection algorithms. It draws important conclusions on solved and remaining issues in change detection, and describes future challenges for the scientific community. The dataset, evaluation tools, and algorithm rankings are available to the public on a website and will be updated with feedback from academia and industry in the future.
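As a rough illustration of the kind of pixel-wise metrics such a benchmark enables (the function below and its mask format are our assumptions, not the benchmark's official evaluation code), precision, recall, and F-measure can be computed directly from binary ground-truth and result masks:

```python
import numpy as np

def change_detection_scores(gt, result):
    """gt, result: boolean arrays where True marks 'change' (foreground).
    Returns (precision, recall, F-measure). Illustrative only; the
    benchmark's tools also handle shadow and unknown regions."""
    tp = np.logical_and(result, gt).sum()
    fp = np.logical_and(result, ~gt).sum()
    fn = np.logical_and(~result, gt).sum()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_measure
```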
ABSTRACT: For a number of lossy source coding problems it is shown that even if the usual single-letter sum-rate-distortion expressions may become invalid for non-finite distortion functions, they can be approached, to any desired accuracy, via the usual valid expressions for appropriately truncated finite versions of the distortion functions.
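In symbols (the notation is ours, not the paper's): given a distortion function $d$ that may take the value $+\infty$, one works with its truncation at a finite level $T$,

```latex
% Truncation of a possibly non-finite distortion function (illustrative notation).
d_T(x, \hat{x}) \;=\; \min\{\, d(x, \hat{x}),\, T \,\}, \qquad T < \infty,
```

and the rate region under $d$ is approached, to any desired accuracy, by the single-letter expressions evaluated with $d_T$ as $T \to \infty$.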
ABSTRACT: Change detection is one of the most important low-level tasks in video analytics. In 2012, we introduced the changedetection.net (CDnet) benchmark, a video dataset devoted to the evaluation of change and motion detection approaches. Here, we present the latest release of the CDnet dataset, which includes 22 additional videos (70,000 pixel-wise annotated frames) spanning 5 new categories that incorporate challenges encountered in many surveillance settings. We describe these categories in detail and provide an overview of the results of more than a dozen methods submitted to the IEEE Change Detection Workshop 2014. We highlight strengths and weaknesses of these methods and identify remaining issues in change detection.
ABSTRACT: We propose a new structure-preserving dual-energy (SPDE) CT inversion technique for luggage screening, which can mitigate metal artifacts and provide precise object localization. Such artifact reduction can increase material identification accuracy in security applications. Our main objective is formation of enhanced photoelectric and Compton pixel property images from dual-energy X-ray tomographic data. We achieve this aim by incorporating three important elements in a single unified framework. First, we generate our images as the solution of a joint optimization problem, which explicitly models the projection process. Second, we include metal aware data weighting to reduce streaks and metal artifacts. Third, we estimate a regularized joint boundary field and apply it to both the photoelectric and Compton images in order to improve object localization as well as smoothing inside the objects. We evaluate the performance of the method using real dual-energy data. We demonstrate a significant reduction in noise and metal artifacts.
ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
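A schematic of the kind of unified objective described in the SPDE abstract above (this is our illustrative rendering under stated assumptions, not the paper's exact cost): photoelectric and Compton images $p$ and $c$ are estimated jointly with a shared boundary field $b$, with metal-aware weights $W$ on the data-fidelity terms.

```latex
% Illustrative joint cost; A_p, A_c are forward projection operators,
% y_p, y_c the dual-energy sinogram data, W the metal-aware data weights,
% R(.; b) a boundary-field-modulated smoothness penalty, and S(b) a
% regularizer on the boundary field itself.
\min_{p,\, c,\, b}\;
\| y_p - A_p\, p \|_W^2 + \| y_c - A_c\, c \|_W^2
+ \lambda_p R(p;\, b) + \lambda_c R(c;\, b) + \mu\, S(b)
```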
ABSTRACT: In the secure two-party sampling problem, two parties wish to generate outputs with a desired joint distribution via an interactive protocol, while ensuring that neither party learns more than what can be inferred from its own output alone. For semi-honest parties and information-theoretic privacy guarantees, it is well known that if only noiseless communication is available, then only the "trivial" joint distributions, for which common information equals mutual information, can be securely sampled. We consider the problem where the parties may also interact via a given set of general communication primitives (multi-input/output channels). Our feasibility characterization of this problem can be stated as a zero-one law: primitives are either complete (enabling the secure sampling of any distribution) or useless (only enabling the secure sampling of trivial distributions). Our characterization of the complete primitives also extends to the more general class of secure two-party computation problems.
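The triviality condition mentioned above can be written out explicitly (our notation; in this literature "common information" typically refers to the Gács-Körner notion): a joint distribution is trivial when the common part that both parties can compute exactly already accounts for all of the mutual information.

```latex
% Gács-Körner common information: the largest entropy of a random
% variable that both parties can compute exactly from their own output.
C_{GK}(X; Y) \;=\; \max_{f,\,g:\; f(X) = g(Y) \text{ a.s.}} H\big(f(X)\big),
\qquad \text{trivial} \;\iff\; C_{GK}(X; Y) = I(X; Y).
```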
ABSTRACT: X-ray Computed Tomography (CT) is an effective nondestructive technology used for security applications. In CT, three-dimensional images of the interior of an object are generated based on its X-ray attenuation. Multi-energy CT can be used to enhance material discrimination. Currently, reliable identification and segmentation of objects from CT data is challenging due to the large range of materials which may appear in baggage and the presence of metal and high clutter. Conventionally reconstructed CT images suffer from metal induced streaks and artifacts which can lead to breaking of objects and inaccurate object labeling. We propose a novel learning-based framework for joint metal artifact reduction and direct object labeling from CT derived data. A material label image is directly estimated from measured effective attenuation images. We include data weighting to mitigate metal artifacts and incorporate an object boundary-field to reduce object splitting. The overall problem is posed as a graph optimization problem and solved using an efficient graph-cut algorithm. We test the method on real data and show that it can produce accurate material labels in the presence of metal and clutter.
SPIE Computational Imaging XII, San Francisco, California, USA; 02/2014
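A generic form of the multi-label labeling energy alluded to above (illustrative; the exact terms in the paper may differ): per-voxel data terms are scaled by metal-aware weights, and the pairwise Potts penalty is modulated by a boundary field so that label changes are cheap where a boundary is likely.

```latex
% l_i in {1, ..., K} is the material label of voxel i; D_i the data cost,
% w_i the metal-aware weight, N the neighborhood system, and b_{ij} in [0,1]
% a boundary-field term that lowers the smoothness penalty across likely
% object boundaries.
E(l) \;=\; \sum_i w_i\, D_i(l_i)
\;+\; \lambda \sum_{(i,j) \in \mathcal{N}} (1 - b_{ij})\, \mathbf{1}[\, l_i \neq l_j \,]
```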
ABSTRACT: We propose a novel approach for designing kernels for support vector machines (SVMs) when the class label is linked to the observation through a latent state and the likelihood function of the observation given the state (the sensing model) is available. We show that the Bayes-optimum decision boundary is a hyperplane under a mapping defined by the likelihood function. Combining this with the maximum margin principle yields kernels for SVMs that leverage knowledge of the sensing model in an optimal way. We derive the optimum kernel for the bag-of-words (BoWs) sensing model and demonstrate its superior performance over other kernels in document and image classification tasks. These results indicate that such optimum sensing-aware kernel SVMs can match the performance of rather sophisticated state-of-the-art approaches.
Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing 12/2013; DOI:10.1109/ICASSP.2014.6854140
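A minimal sketch of the sensing-aware kernel idea under our own assumptions about the interface (sklearn's precomputed-kernel SVC is a real API, but the likelihood feature map below is a generic stand-in; the paper derives the exact optimal map for the bag-of-words model): map each observation to its vector of log-likelihoods under the sensing model, where the Bayes-optimal boundary becomes linear, and train a max-margin classifier there.

```python
import numpy as np
from sklearn.svm import SVC

def likelihood_features(X, states, loglik):
    """Map observation x to [log p(x | s) for s in states], using the
    sensing model's likelihood 'loglik' (a hypothetical callable)."""
    return np.array([[loglik(x, s) for s in states] for x in X])

def fit_sensing_aware_svm(X_train, y_train, states, loglik):
    # Max-margin classification in the likelihood-mapped space: the
    # mapping linearizes the Bayes-optimal boundary, so a linear kernel
    # there is the natural choice.
    Phi = likelihood_features(X_train, states, loglik)
    K = Phi @ Phi.T                       # precomputed linear kernel
    clf = SVC(kernel="precomputed").fit(K, y_train)
    return clf, Phi

# Prediction: clf.predict(likelihood_features(X_test, states, loglik) @ Phi.T)
```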
ABSTRACT: In game theory, a trusted mediator acting on behalf of the players can enable the attainment of correlated equilibria, which may provide better payoffs than those available from the Nash equilibria alone. We explore the approach of replacing the trusted mediator with an unconditionally secure sampling protocol that jointly generates the players' actions. We characterize the joint distributions that can be securely sampled by malicious players via protocols using error-free communication. This class of distributions depends on whether players may speak simultaneously ("cheap talk") or must speak in turn ("polite talk"). In applying sampling protocols toward attaining correlated equilibria with rational players, we observe that security against malicious parties may be much stronger than necessary. We propose the concept of secure sampling by rational players, and show that many more distributions are feasible given certain utility functions. However, the payoffs attainable via secure sampling by malicious players are a dominant subset of the rationally attainable payoffs.
ABSTRACT: We consider a novel problem of endmember detection in hyperspectral imagery in which the signals of frequency bands are probed sequentially. We propose an adaptive strategy for controlling the sensing order so as to maximize the normalized solid angle, a robustness measure of the problem geometry. The strategy is based on efficiently identifying, through sequential random projections, pure pixels that are unique to each endmember, and on exploiting information from a spectral library known in advance. We present simulations on synthetic datasets to demonstrate the merits of our scheme in reducing the observation cost.
2013 Asilomar Conference on Signals, Systems and Computers; 11/2013
ABSTRACT: The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical, while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation, making it attractive for 'big-data' scenarios involving a network of large distributed databases.
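For concreteness (our paraphrase of definitions used in this line of work, not quoted from the paper): with a topic matrix $\beta \in \mathbb{R}^{W \times K}$ whose column $k$ is the word distribution of topic $k$, separability and the simplicial condition can be stated roughly as follows.

```latex
% Separability: every topic k has a novel word w that occurs only in topic k.
\text{separable:}\quad \forall k\ \exists w:\ \beta_{w,k} > 0 \ \text{and}\ \beta_{w,l} = 0 \ \ \forall l \neq k.
% Simplicial condition (on a suitable topic co-occurrence/correlation matrix R):
% no row of R lies in the convex hull of the remaining rows.
\text{simplicial:}\quad \forall k:\ R_{k,:} \notin \mathrm{conv}\{\, R_{l,:} : l \neq k \,\}.
```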
ABSTRACT: The Kinect has primarily been used as a gesture-driven device for motion-based controls. To date, Kinect-based research has predominantly focused on improving tracking and gesture recognition across a wide base of users. In this paper, we propose to use the Kinect for biometrics; rather than accommodating a wide range of users, we exploit each user's uniqueness in terms of gestures. Unlike pure biometrics, such as iris scanning, face detection, and fingerprint recognition, which depend on irrevocable biometric data, the Kinect can provide additional revocable gesture information. We propose a dynamic time-warping (DTW) based framework applied to the Kinect's skeletal information for user access control. Our approach is validated in two scenarios: user identification and user authentication, on a dataset of 20 individuals performing 8 unique gestures. We obtain overall Equal Error Rates (EER) of 4.14% and 1.89% for user identification and user authentication, respectively, for a single gesture, and consistently outperform related work on this dataset. Given the natural noise present in the real-time depth sensor, these results are promising.
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on; 10/2013
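A bare-bones DTW distance in NumPy (a generic textbook implementation, not the paper's exact pipeline; feature extraction from Kinect skeleton frames is assumed to have already produced the two sequences): the score below would serve as the match cost between a probe gesture and an enrolled template.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature
    vectors a (n x d) and b (m x d), with Euclidean frame-to-frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```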
ABSTRACT: Despite a significant growth in the last few years, the availability of 3D content is still dwarfed by that of its 2D counterpart. In order to close this gap, many 2D-to-3D image and video conversion methods have been proposed. Methods involving human operators have been most successful but also time-consuming and costly. Automatic methods, which typically make use of a deterministic 3D scene model, have not yet achieved the same level of quality, for they rely on assumptions that are often violated in practice. In this paper, we propose a new class of methods that are based on the radically different approach of learning the 2D-to-3D conversion from examples. We develop two types of methods. The first is based on learning a point mapping from local image/video attributes, such as color, spatial position, and, in the case of video, motion at each pixel, to scene-depth at that pixel using a regression type idea. The second method is based on globally estimating the entire depth map of a query image directly from a repository of 3D images (image+depth pairs or stereopairs) using a nearest-neighbor regression type idea. We demonstrate both the efficacy and the computational efficiency of our methods on numerous 2D images and discuss their drawbacks and benefits. Although far from perfect, our results demonstrate that repositories of 3D content can be used for effective 2D-to-3D image conversion. An extension to video is immediate by enforcing temporal continuity of computed depth maps.
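A toy version of the second (repository-based) method, with all interfaces assumed (the feature extraction and repository format are placeholders): estimate a query's depth map as a distance-weighted combination of the depths of its nearest neighbors in a repository of image+depth pairs.

```python
import numpy as np

def knn_depth_estimate(query_feat, repo_feats, repo_depths, k=5):
    """Global nearest-neighbor depth regression (illustrative).
    query_feat: (d,) global descriptor of the query image.
    repo_feats: (N, d) descriptors of repository images.
    repo_depths: (N, H, W) their depth maps, assumed pre-aligned.
    Returns an (H, W) depth estimate as a distance-weighted average
    of the k nearest repository depth maps."""
    dists = np.linalg.norm(repo_feats - query_feat, axis=1)
    nn = np.argsort(dists)[:k]
    w = 1.0 / (dists[nn] + 1e-8)          # closer neighbors weigh more
    w /= w.sum()
    return np.tensordot(w, repo_depths[nn], axes=1)
```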
ABSTRACT: Biometrics are an important and widely used class of methods for identity verification and access control. Biometrics are attractive because they are inherent properties of an individual. They need not be remembered like passwords, and are not easily lost or forged like identifying documents. At the same time, biometrics are fundamentally noisy and irreplaceable. There are always slight variations among the measurements of a given biometric, and, unlike passwords or identification numbers, biometrics are derived from physical characteristics that cannot easily be changed. The proliferation of biometric usage raises critical privacy and security concerns that, due to the noisy nature of biometrics, cannot be addressed using standard cryptographic methods. In this article we present an overview of "secure biometrics", also referred to as "biometric template protection", an emerging class of methods that address these concerns.
ABSTRACT: We propose a general framework for fast and accurate recognition of actions in video using empirical covariance matrices of features. A dense set of spatio-temporal feature vectors is computed from video to provide a localized description of the action, and subsequently aggregated in an empirical covariance matrix to compactly represent the action. Two supervised learning methods for action recognition are developed using feature covariance matrices. Common to both methods is the transformation of the classification problem in the closed convex cone of covariance matrices into an equivalent problem in the vector space of symmetric matrices via the matrix logarithm. The first method applies nearest-neighbor classification using a suitable Riemannian metric for covariance matrices. The second method approximates the logarithm of a query covariance matrix by a sparse linear combination of the logarithms of training covariance matrices. The action label is then determined from the sparse coefficients. Both methods achieve state-of-the-art classification performance on several datasets, and are robust to action variability, viewpoint changes, and low object resolution. The proposed framework is conceptually simple and has low storage and computational requirements, making it attractive for real-time implementation.
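A compact sketch of the log-domain pipeline (illustrative; the paper's first method uses a Riemannian metric for covariance matrices, which we replace here with the simpler log-Euclidean distance, a plainly named simplification): aggregate features into a covariance matrix, map it to the vector space of symmetric matrices via the matrix logarithm, and classify by nearest neighbor there.

```python
import numpy as np

def log_map(C, eps=1e-8):
    """Matrix logarithm of an SPD covariance matrix via eigendecomposition,
    mapping the cone of covariance matrices to the vector space of
    symmetric matrices."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(np.maximum(w, eps))) @ V.T

def covariance_descriptor(F):
    """F: (n, d) array of d-dimensional spatio-temporal features from one
    video clip. Returns the log-domain covariance descriptor, flattened."""
    C = np.cov(F, rowvar=False)
    return log_map(C).ravel()

def nearest_neighbor_label(query_desc, train_descs, train_labels):
    # 1-NN in the log-Euclidean (flattened symmetric-matrix) space.
    d = np.linalg.norm(train_descs - query_desc, axis=1)
    return train_labels[int(np.argmin(d))]
```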
ABSTRACT: We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We also discuss the statistical guarantees of the data-dependent projections method, based on two mild assumptions on the prior density of the topic-document matrix. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are comparable to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.