ABSTRACT: We develop necessary and sufficient conditions and a novel provably
consistent and efficient algorithm for discovering topics (latent factors) from
observations (documents) that are realized from a probabilistic mixture of
shared latent factors that have certain properties. Our focus is on the class
of topic models in which each shared latent factor contains a novel word that
is unique to that factor, a property that has come to be known as separability.
Our algorithm is based on the key insight that the novel words correspond to
the extreme points of the convex hull formed by the row-vectors of a suitably
normalized word co-occurrence matrix. We leverage this geometric insight to
establish polynomial computation and sample complexity bounds based on a few
isotropic random projections of the rows of the normalized word co-occurrence
matrix. Our proposed random-projections-based algorithm is naturally amenable
to an efficient distributed implementation and is attractive for modern
web-scale distributed data mining applications.
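The geometric insight above can be illustrated with a small sketch: for a point set, the row that maximizes (or minimizes) the projection onto a random direction is always an extreme point of the convex hull, so a handful of isotropic random projections can flag the extreme rows. The code below is a toy illustration of this idea only; the function and parameter names are ours, and the paper's full algorithm adds the normalization, robustness, and sample-complexity analysis.

```python
import numpy as np

def extreme_rows(X, num_projections=50, rng=None):
    """Flag candidate extreme points of the convex hull of the rows of X
    by taking the argmax/argmin of projections onto random directions.
    (Illustrative sketch; strictly interior rows can never be flagged.)"""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    candidates = set()
    for _ in range(num_projections):
        u = rng.standard_normal(d)          # isotropic random direction
        p = X @ u
        candidates.add(int(np.argmax(p)))   # extreme in direction u
        candidates.add(int(np.argmin(p)))   # extreme in direction -u
    return sorted(candidates)

# Toy example: four corners of a square plus two interior points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [0.5, 0.5], [0.4, 0.6]])
print(extreme_rows(X, rng=0))
```

With enough projections the flagged set coincides with the true extreme points (here the four corners, rows 0-3); the interior rows 4 and 5 can never be flagged, since a linear functional is never maximized at a strict interior point.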
ABSTRACT: In recent years, baggage screening at airports has included the use of dual-energy X-ray computed tomography (DECT), an advanced technology for non-destructive evaluation. The main challenge remains to reliably find and identify threat objects in the bag from DECT data. This task is particularly hard due to the wide variety of objects, the high clutter, and the presence of metal, which causes streaks and shading in the scanner images. Image noise and artifacts are generally much more severe than in medical CT and can lead to splitting of objects and inaccurate object labeling. The conventional approach performs object segmentation and material identification in two decoupled processes. Dual-energy information is typically not used for the segmentation, and object localization is not explicitly used to stabilize the material parameter estimates. We propose a novel learning-based framework for joint segmentation and identification of objects directly from volumetric DECT images, which is robust to streaks, noise and variability due to clutter. We focus on segmenting and identifying a small set of objects of interest with characteristics that are learned from training images, and consider everything else as background. We include data weighting to mitigate metal artifacts and incorporate an object boundary-field to reduce object splitting. The overall formulation is posed as a multi-label discrete optimization problem and solved using an efficient graph-cut algorithm. We test the method on real data and show its potential for producing accurate labels of the objects of interest without splits in the presence of metal and clutter.
No preview · Article · Jul 2015 · IEEE Transactions on Image Processing
ABSTRACT: We propose a novel parameterized family of Mixed Membership Mallows Models
(M4) to account for variability in pairwise comparisons generated by a
heterogeneous population of noisy and inconsistent users. M4 models individual
preferences as a user-specific probabilistic mixture of shared latent Mallows
components. Our key algorithmic insight for estimation is to establish a
statistical connection between M4 and topic models by viewing pairwise
comparisons as words, and users as documents. This key insight leads us to
explore Mallows components with a separable structure and leverage recent
advances in separable topic discovery. While separability appears to be overly
restrictive, we nevertheless show that it is an inevitable outcome of a
relatively small number of latent Mallows components in a world with a large number
of items. We then develop an algorithm, based on robust extreme-point
identification of convex polygons, that learns the reference rankings and is
provably consistent with polynomial sample complexity guarantees. We
demonstrate that our new model is empirically competitive with the current
state-of-the-art approaches in predicting real-world preferences.
ABSTRACT: User authentication based on biometrics such as fingerprint, iris, face, speech or gait has been around for many years. Recently, intentional user gestures have been shown to be a promising modality for user authentication. However, it is unclear how much of the performance can be attributed to pure biometric information that a user has no control over, such as individual limb lengths, and how much to the gesture dynamics, which a user can fully control. A related question is: How easy is it to copy these dynamics? In this paper, we propose a framework to decompose a gesture into three components: initial posture, limb proportions, and gesture dynamics. We then study the impact of each component and various component combinations on the performance of gesture-based user authentication using a dataset of 36 users performing 3 gestures of varying complexity. We also study spoof attacks using the same dataset and show, somewhat surprisingly, that amateurs are unable to copy gestures with sufficient accuracy so as to significantly degrade the overall authentication performance, even when they are trained on the users that they are closest to. While training certainly improves an attacker's ability to copy gesture dynamics, it seems that the unique limb proportions (which cannot be altered) and the initial posture (which amateur attackers fail to pay attention to) more than make up for the loss due to compromised dynamics (which can always be renewed).
ABSTRACT: We propose a new model for rank aggregation from pairwise comparisons that
captures both ranking heterogeneity across users and ranking inconsistency for
each user. We establish a formal statistical equivalence between the new model
and topic models. We leverage recent advances in the topic modeling literature
to develop an algorithm that can learn shared latent rankings with provable
statistical and computational efficiency guarantees. The method is also shown
to empirically outperform competing approaches on some semi-synthetic and real-world datasets.
ABSTRACT: In the secure two-party computation problem, two parties wish to compute a (possibly randomized) function of their inputs via an interactive protocol, while ensuring that neither party learns more than what can be inferred from only their own input and output. For semi-honest parties and information-theoretic security guarantees, it is well-known that, if only noise-less communication is available, only a limited set of functions can be securely computed; however, if interaction is also allowed over general communication primitives (multi-input/output channels), there are 'complete' primitives that enable any function to be securely computed. The general set of complete primitives was characterized recently by Maji, Prabhakaran, and Rosulek leveraging an earlier specialized characterization by Kilian. Our contribution in this paper is a simple, self-contained, alternative derivation using elementary information-theoretic tools.
ABSTRACT: Although traditionally used as a gesture recognition device, the Kinect has been recently leveraged for user entry control. In this context, a user admission decision is typically based on biometrics such as face, speech, gait and gestures. Despite being a relatively new biometric, gestures have been shown to be a promising authentication modality. These results have been achieved using a single Kinect camera. This paper aims to investigate the potential performance and robustness gains in gesture-based user authentication using multiple Kinects. We study the impact of multiple viewpoints on a dataset of 40 users that contains notable degradations from user memory and personal effects (multiple types of bags and outerwear). We found that two additional viewpoints can provide as much as 26-43% average relative improvement in the Equal Error Rate (EER) for user authentication, and as much as 16-68% average relative improvement in the Correct Classification Error (CCE) compared to using a single centered Kinect camera.
ABSTRACT: Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing. A plethora of algorithms have been developed to date, yet no widely accepted, realistic, large-scale video dataset exists for benchmarking different methods. Presented here is a unique change detection video dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR). A distinguishing characteristic of this benchmark video dataset is that each frame is meticulously annotated by hand for ground-truth foreground, background, and shadow area boundaries - an effort that goes much beyond a simple binary label denoting the presence of change. This enables objective and precise quantitative comparison and ranking of video-based change detection algorithms. This paper discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over two dozen change detection algorithms. It draws important conclusions on solved and remaining issues in change detection, and describes future challenges for the scientific community. The dataset, evaluation tools, and algorithm rankings are available to the public on a website and will be updated with feedback from academia and industry in the future.
No preview · Article · Aug 2014 · IEEE Transactions on Image Processing
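Pixel-wise comparison against such hand-annotated ground truth typically reduces to counting true and false positives and negatives per frame. As a minimal illustration (not the benchmark's official evaluation code; the function and variable names are ours), precision, recall, and F-measure for binary change masks can be computed as:

```python
import numpy as np

def change_detection_scores(gt, pred):
    """Pixel-wise precision, recall, and F-measure for binary change masks.
    gt, pred: boolean arrays of the same shape (True = foreground/change)."""
    gt = np.asarray(gt, bool)
    pred = np.asarray(pred, bool)
    tp = np.logical_and(gt, pred).sum()    # change correctly detected
    fp = np.logical_and(~gt, pred).sum()   # background flagged as change
    fn = np.logical_and(gt, ~pred).sum()   # change missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gt   = np.array([[1, 1, 0, 0]], bool)
pred = np.array([[1, 0, 1, 0]], bool)
print(change_detection_scores(gt, pred))  # (0.5, 0.5, 0.5)
```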
ABSTRACT: For a number of lossy source coding problems it is shown that even if the
usual single-letter sum-rate-distortion expressions may become invalid for
non-infinite distortion functions, they can be approached, to any desired
accuracy, via the usual valid expressions for appropriately truncated finite
versions of the distortion functions.
ABSTRACT: Since its release, the Kinect has been successfully used in gesture recognition. Recent work has extended Kinect's use towards biometric user authentication based on face, speech, gait, and gestures. Our work expands on the last of these modalities - gestures, which have yielded promising authentication results in prior work. This paper aims to gain insight into how authentication methods that are based on silhouette features compare against those that are based on skeletal features in terms of trade-offs between authentication performance and robustness against some real-world degradations. On a dataset of 40 users that contains two types of degradations, namely user-memory and personal-effects (heavy coats, bags, etc.), we found that for user-defined gestures, skeletal features outperform silhouettes on average by 4.89% in terms of the Equal Error Rate (EER).
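The Equal Error Rate used throughout these authentication studies is the operating point at which the false accept rate (FAR) equals the false reject rate (FRR). The sketch below estimates it from similarity scores by sweeping decision thresholds; it is an illustrative approximation, not the authors' evaluation code.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the Equal Error Rate (EER): the operating point where the
    false reject rate on genuine scores equals the false accept rate on
    impostor scores. Convention: higher score = more likely genuine."""
    genuine = np.asarray(genuine, float)
    impostor = np.asarray(impostor, float)
    eer, gap = 1.0, np.inf
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)    # genuine scores rejected at threshold t
        far = np.mean(impostor >= t)  # impostor scores accepted at threshold t
        if abs(far - frr) < gap:      # keep the threshold where FAR and FRR cross
            gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

# Perfectly separable scores give EER = 0.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0
```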
ABSTRACT: Change detection is one of the most important low-level tasks in video analytics. In 2012, we introduced the changedetection.net (CDnet) benchmark, a video dataset devoted to the evaluation of change and motion detection approaches. Here, we present the latest release of the CDnet dataset, which includes 22 additional videos (70,000 pixel-wise annotated frames) spanning 5 new categories that incorporate challenges encountered in many surveillance settings. We describe these categories in detail and provide an overview of the results of more than a dozen methods submitted to the IEEE Change Detection Workshop 2014. We highlight strengths and weaknesses of these methods and identify remaining issues in change detection.
ABSTRACT: We propose a new structure-preserving dual-energy (SPDE) CT inversion technique for luggage screening, which can mitigate metal artifacts and provide precise object localization. Such artifact reduction can increase material identification accuracy in security applications. Our main objective is formation of enhanced photoelectric and Compton pixel property images from dual-energy X-ray tomographic data. We achieve this aim by incorporating three important elements in a single unified framework. First, we generate our images as the solution of a joint optimization problem, which explicitly models the projection process. Second, we include metal-aware data weighting to reduce streaks and metal artifacts. Third, we estimate a regularized joint boundary field and apply it to both the photoelectric and Compton images in order to improve object localization as well as smoothing inside the objects. We evaluate the performance of the method using real dual-energy data. We demonstrate a significant reduction in noise and metal artifacts.
ABSTRACT: In the secure two-party sampling problem, two parties wish to generate
outputs with a desired joint distribution via an interactive protocol, while
ensuring that neither party learns more than what can be inferred from only
their own output. For semi-honest parties and information-theoretic privacy
guarantees, it is well-known that if only noiseless communication is available,
then only the "trivial" joint distributions, for which common information
equals mutual information, can be securely sampled. We consider the problem
where the parties may also interact via a given set of general communication
primitives (multi-input/output channels). Our feasibility characterization of
this problem can be stated as a zero-one law: primitives are either complete
(enabling the secure sampling of any distribution) or useless (only enabling
the secure sampling of trivial distributions). Our characterization of the
complete primitives also extends to the more general class of secure two-party computation.
ABSTRACT: X-ray Computed Tomography (CT) is an effective nondestructive technology used for security applications. In CT, three-dimensional images of the interior of an object are generated based on its X-ray attenuation. Multi-energy CT can be used to enhance material discrimination. Currently, reliable identification and segmentation of objects from CT data is challenging due to the large range of materials which may appear in baggage and the presence of metal and high clutter. Conventionally reconstructed CT images suffer from metal-induced streaks and artifacts which can lead to breaking of objects and inaccurate object labeling. We propose a novel learning-based framework for joint metal artifact reduction and direct object labeling from CT derived data. A material label image is directly estimated from measured effective attenuation images. We include data weighting to mitigate metal artifacts and incorporate an object boundary-field to reduce object splitting. The overall problem is posed as a graph optimization problem and solved using an efficient graph-cut algorithm. We test the method on real data and show that it can produce accurate material labels in the presence of metal and clutter.
ABSTRACT: We propose a novel approach for designing kernels for support vector machines
(SVMs) when the class label is linked to the observation through a latent state
and the likelihood function of the observation given the state (the sensing
model) is available. We show that the Bayes-optimum decision boundary is a
hyperplane under a mapping defined by the likelihood function. Combining this
with the maximum margin principle yields kernels for SVMs that leverage
knowledge of the sensing model in an optimal way. We derive the optimum kernel
for the bag-of-words (BoWs) sensing model and demonstrate its superior
performance over other kernels in document and image classification tasks.
These results indicate that such optimum sensing-aware kernel SVMs can match
the performance of rather sophisticated state-of-the-art approaches.
Full-text · Article · Dec 2013 · Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
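The core idea above, that the Bayes-optimal boundary becomes a hyperplane under a mapping defined by the likelihood function, can be sketched generically: map each observation to its vector of likelihoods under the latent states and take inner products as the kernel. The snippet below is a toy illustration under a made-up discrete sensing model; it is not the optimum bag-of-words kernel derived in the paper, and all names are ours.

```python
import numpy as np

def likelihood_feature(x, log_pxs):
    """Map an observation index x to its likelihood vector [p(x|s)]_s over
    latent states, the feature space in which the optimal boundary is linear.
    log_pxs[s, x] = log p(x | state s)."""
    return np.exp(log_pxs[:, x])

def sensing_kernel(x1, x2, log_pxs):
    """Illustrative sensing-aware kernel: inner product of likelihood features."""
    return float(likelihood_feature(x1, log_pxs) @ likelihood_feature(x2, log_pxs))

# Toy sensing model: 2 latent states, 3 possible observations.
log_pxs = np.log(np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.2, 0.7]]))
print(sensing_kernel(0, 0, log_pxs))  # 0.7*0.7 + 0.1*0.1 = 0.5
```

Such a kernel can then be plugged into any standard SVM solver that accepts a precomputed Gram matrix.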
ABSTRACT: In game theory, a trusted mediator acting on behalf of the players can enable
the attainment of correlated equilibria, which may provide better payoffs than
those available from the Nash equilibria alone. We explore the approach of
replacing the trusted mediator with an unconditionally secure sampling protocol
that jointly generates the players' actions. We characterize the joint
distributions that can be securely sampled by malicious players via protocols
using error-free communication. This class of distributions depends on whether
players may speak simultaneously ("cheap talk") or must speak in turn ("polite
talk"). In applying sampling protocols toward attaining correlated equilibria
with rational players, we observe that security against malicious parties may
be much stronger than necessary. We propose the concept of secure sampling by
rational players, and show that many more distributions are feasible given
certain utility functions. However, the payoffs attainable via secure sampling
by malicious players are a dominant subset of the rationally attainable payoffs.
ABSTRACT: We consider a novel problem of endmember detection in hyperspectral imagery where signals in frequency bands are probed sequentially. We propose an adaptive strategy for controlling the sensing order to maximize the normalized solid angle as a robustness measure of the problem geometry. This is based on efficiently identifying pure pixels that are unique to each endmember and exploiting information from a spectral library known in advance, through sequential random projections. We present simulations on synthetic datasets to demonstrate the merits of our scheme in reducing the observation cost.
ABSTRACT: The simplicial condition and other stronger conditions that imply it have
recently played a central role in developing polynomial time algorithms with
provable asymptotic consistency and sample complexity guarantees for topic
estimation in separable topic models. Of these algorithms, those that rely
solely on the simplicial condition are impractical while the practical ones
need stronger conditions. In this paper, we demonstrate, for the first time,
that the simplicial condition is a fundamental, algorithm-independent,
information-theoretic necessary condition for consistent separable topic
estimation. Furthermore, under the simplicial condition alone, we present a
practical quadratic-complexity algorithm based on random projections which
consistently detects all novel words of all topics using only up to
second-order empirical word moments. This algorithm is amenable to distributed
implementation making it attractive for 'big-data' scenarios involving a
network of large distributed databases.
ABSTRACT: The Kinect has primarily been used as a gesture-driven device for motion-based controls. To date, Kinect-based research has predominantly focused on improving tracking and gesture recognition across a wide base of users. In this paper, we propose to use the Kinect for biometrics; rather than accommodating a wide range of users, we exploit each user's uniqueness in terms of gestures. Unlike pure biometrics, such as iris scanners, face detectors, and fingerprint recognition, which depend on irrevocable biometric data, the Kinect can provide additional revocable gesture information. We propose a dynamic time-warping (DTW) based framework applied to the Kinect's skeletal information for user access control. Our approach is validated in two scenarios: user identification, and user authentication on a dataset of 20 individuals performing 8 unique gestures. We obtain an overall 4.14% and 1.89% Equal Error Rate (EER) in user identification and user authentication, respectively, for a gesture, and consistently outperform related work on this dataset. Given the natural noise present in the real-time depth sensor, this yields promising results.
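A minimal scalar version of the dynamic time warping used in this line of work is sketched below. The paper applies DTW to multi-dimensional skeletal joint trajectories; this sketch uses absolute difference between 1-D sequences as the local cost, so it is an illustration of the alignment recurrence, not the authors' implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences.
    D[i, j] = local cost of pairing a[i-1] with b[j-1], plus the cheapest
    of the three predecessor alignments (insert, delete, match)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0: warping absorbs the repeat
```

Because the warping path may stretch either sequence, a repeated sample costs nothing here, which is exactly why DTW tolerates variations in gesture speed.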