Prakash Ishwar

Boston University, Boston, Massachusetts, United States


Publications (96) · 99.87 Total Impact Points

  • ABSTRACT: Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing. A plethora of algorithms have been developed to date, yet no widely accepted, realistic, large-scale video dataset exists for benchmarking different methods. Presented here is a unique change detection video dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR). A distinguishing characteristic of this benchmark video dataset is that each frame is meticulously annotated by hand for ground-truth foreground, background, and shadow area boundaries - an effort that goes much beyond a simple binary label denoting the presence of change. This enables objective and precise quantitative comparison and ranking of video-based change detection algorithms. This paper discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over two dozen change detection algorithms. It draws important conclusions on solved and remaining issues in change detection, and describes future challenges for the scientific community. The dataset, evaluation tools, and algorithm rankings are available to the public on a website and will be updated with feedback from academia and industry in the future.
    IEEE Transactions on Image Processing, 08/2014
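    As an illustration of the per-frame evaluation that such pixel-accurate ground truth enables, here is a minimal Python sketch (function and variable names are mine) of the standard recall/precision/F-measure computation used in change-detection benchmarking:
      import numpy as np

      def change_detection_scores(result, groundtruth):
          """Compare a binary change mask against pixel-accurate ground truth.
          result, groundtruth: 2D boolean arrays (True = changed pixel)."""
          tp = np.sum(result & groundtruth)    # correctly detected change
          fp = np.sum(result & ~groundtruth)   # false alarms
          fn = np.sum(~result & groundtruth)   # missed change
          recall = tp / (tp + fn) if tp + fn else 0.0
          precision = tp / (tp + fp) if tp + fp else 0.0
          f_measure = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
          return recall, precision, f_measure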
  • Ye Wang, Prakash Ishwar, Shantanu Rane
    ABSTRACT: In the secure two-party sampling problem, two parties wish to generate outputs with a desired joint distribution via an interactive protocol, while ensuring that neither party learns more than what can be inferred from only their own output. For semi-honest parties and information-theoretic privacy guarantees, it is well-known that if only noiseless communication is available, then only the "trivial" joint distributions, for which common information equals mutual information, can be securely sampled. We consider the problem where the parties may also interact via a given set of general communication primitives (multi-input/output channels). Our feasibility characterization of this problem can be stated as a zero-one law: primitives are either complete (enabling the secure sampling of any distribution) or useless (only enabling the secure sampling of trivial distributions). Our characterization of the complete primitives also extends to the more general class of secure two-party computation problems.
    02/2014;
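    The "trivial" distributions mentioned above admit a compact statement. As a LaTeX sketch (reading the abstract's "common information" as the Gács-Körner notion, which is my assumption):
      % securely samplable with noiseless communication alone iff
      C_{GK}(X;Y) \;=\; I(X;Y),
      % where the Gacs--Korner common information is
      C_{GK}(X;Y) \;=\; \max_{f,\,g:\; f(X)=g(Y)\ \text{a.s.}} H\bigl(f(X)\bigr).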
  • ABSTRACT: X-ray Computed Tomography (CT) is an effective nondestructive technology used for security applications. In CT, three-dimensional images of the interior of an object are generated based on its X-ray attenuation. Multi-energy CT can be used to enhance material discrimination. Currently, reliable identification and segmentation of objects from CT data is challenging due to the large range of materials which may appear in baggage and the presence of metal and high clutter. Conventionally reconstructed CT images suffer from metal-induced streaks and artifacts which can lead to breaking of objects and inaccurate object labeling. We propose a novel learning-based framework for joint metal artifact reduction and direct object labeling from CT-derived data. A material label image is directly estimated from measured effective attenuation images. We include data weighting to mitigate metal artifacts and incorporate an object boundary field to reduce object splitting. The overall problem is posed as a graph optimization problem and solved using an efficient graph-cut algorithm. We test the method on real data and show that it can produce accurate material labels in the presence of metal and clutter.
    SPIE Computational Imaging XII, San Francisco, California, USA; 02/2014
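    The graph-optimization formulation above (a data term plus a boundary-aware smoothness term) can be made concrete with a standard pairwise MRF energy. The toy Python minimizer below uses iterated conditional modes instead of the paper's graph-cut solver, and all names and parameters are illustrative:
      import numpy as np

      def label_image(cost, smoothness=1.0, iters=5):
          """Toy MRF labeling. cost[y, x, k]: data cost (float) of assigning
          material label k to pixel (y, x); lower is better. Minimizes the
          data costs plus a Potts penalty on 4-neighbor label disagreements
          via iterated conditional modes (ICM)."""
          h, w, k = cost.shape
          labels = cost.argmin(axis=2)          # data-term-only initialization
          for _ in range(iters):
              for y in range(h):
                  for x in range(w):
                      total = cost[y, x].copy()
                      for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                          ny, nx = y + dy, x + dx
                          if 0 <= ny < h and 0 <= nx < w:
                              # Potts smoothness: penalize disagreement
                              total += smoothness * (np.arange(k) != labels[ny, nx])
                      labels[y, x] = total.argmin()
          return labels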
  • ABSTRACT: We propose a novel approach for designing kernels for support vector machines (SVMs) when the class label is linked to the observation through a latent state and the likelihood function of the observation given the state (the sensing model) is available. We show that the Bayes-optimum decision boundary is a hyperplane under a mapping defined by the likelihood function. Combining this with the maximum margin principle yields kernels for SVMs that leverage knowledge of the sensing model in an optimal way. We derive the optimum kernel for the bag-of-words (BoWs) sensing model and demonstrate its superior performance over other kernels in document and image classification tasks. These results indicate that such optimum sensing-aware kernel SVMs can match the performance of rather sophisticated state-of-the-art approaches.
    12/2013;
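    To make the construction concrete, here is a hedged Python sketch of a sensing-aware kernel for a multinomial bag-of-words sensing model; the feature map (pushing word counts through per-state log-likelihoods) is my guess at the form implied by the abstract, not necessarily the paper's exact optimum kernel:
      import numpy as np
      from sklearn.svm import SVC

      def make_sensing_kernel(log_word_probs):
          """log_word_probs: (num_states, vocab_size) array of log p(word | state).
          Returns a Gram-matrix callable usable as an SVC kernel."""
          def kernel(A, B):
              # phi(x)[s] = x . log p(. | s), i.e. log p(x | state s)
              # up to count-dependent constants
              return (A @ log_word_probs.T) @ (B @ log_word_probs.T).T
          return kernel

      # usage sketch: clf = SVC(kernel=make_sensing_kernel(logp))
      #               clf.fit(X_wordcounts, y_labels)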
  • Ye Wang, Shantanu Rane, Prakash Ishwar
    ABSTRACT: In game theory, a trusted mediator acting on behalf of the players can enable the attainment of correlated equilibria, which may provide better payoffs than those available from the Nash equilibria alone. We explore the approach of replacing the trusted mediator with an unconditionally secure sampling protocol that jointly generates the players' actions. We characterize the joint distributions that can be securely sampled by malicious players via protocols using error-free communication. This class of distributions depends on whether players may speak simultaneously ("cheap talk") or must speak in turn ("polite talk"). In applying sampling protocols toward attaining correlated equilibria with rational players, we observe that security against malicious parties may be much stronger than necessary. We propose the concept of secure sampling by rational players, and show that many more distributions are feasible given certain utility functions. However, the payoffs attainable via secure sampling by malicious players are a dominant subset of the rationally attainable payoffs.
    11/2013;
  • ABSTRACT: The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation, making it attractive for 'big-data' scenarios involving a network of large distributed databases.
    10/2013;
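    For reference, the separability condition underlying the simplicial condition can be written down directly; a LaTeX sketch in my notation, with topic-word matrix \beta:
      \beta \in \mathbb{R}^{W \times K} \text{ is separable if every topic } k
      \text{ has a novel word } w_k \text{ with}
      \quad \beta_{w_k,k} > 0 \quad\text{and}\quad \beta_{w_k,j} = 0 \ \ \forall\, j \neq k.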
  • ABSTRACT: Despite a significant growth in the last few years, the availability of 3D content is still dwarfed by that of its 2D counterpart. In order to close this gap, many 2D-to-3D image and video conversion methods have been proposed. Methods involving human operators have been most successful, but are also time-consuming and costly. Automatic methods, which typically make use of a deterministic 3D scene model, have not yet achieved the same level of quality because they rely on assumptions that are often violated in practice. In this paper, we propose a new class of methods that are based on the radically different approach of learning the 2D-to-3D conversion from examples. We develop two types of methods. The first is based on learning a point mapping from local image/video attributes, such as color, spatial position, and, in the case of video, motion at each pixel, to scene-depth at that pixel using a regression type idea. The second method is based on globally estimating the entire depth map of a query image directly from a repository of 3D images (image+depth pairs or stereopairs) using a nearest-neighbor regression type idea. We demonstrate both the efficacy and the computational efficiency of our methods on numerous 2D images and discuss their drawbacks and benefits. Although far from perfect, our results demonstrate that repositories of 3D content can be used for effective 2D-to-3D image conversion. An extension to video is immediate by enforcing temporal continuity of computed depth maps.
    IEEE Transactions on Image Processing, 06/2013 · 3.20 Impact Factor
  • ABSTRACT: Biometrics are an important and widely used class of methods for identity verification and access control. Biometrics are attractive because they are inherent properties of an individual. They need not be remembered like passwords, and are not easily lost or forged like identifying documents. At the same time, biometrics are fundamentally noisy and irreplaceable. There are always slight variations among the measurements of a given biometric, and, unlike passwords or identification numbers, biometrics are derived from physical characteristics that cannot easily be changed. The proliferation of biometric usage raises critical privacy and security concerns that, due to the noisy nature of biometrics, cannot be addressed using standard cryptographic methods. In this article we present an overview of "secure biometrics", also referred to as "biometric template protection", an emerging class of methods that address these concerns.
    IEEE Signal Processing Magazine, 05/2013 · 3.37 Impact Factor
  • K Guo, P Ishwar, J Konrad
    ABSTRACT: We propose a general framework for fast and accurate recognition of actions in video using empirical covariance matrices of features. A dense set of spatio-temporal feature vectors is computed from video to provide a localized description of the action, and subsequently aggregated in an empirical covariance matrix to compactly represent the action. Two supervised learning methods for action recognition are developed using feature covariance matrices. Common to both methods is the transformation of the classification problem in the closed convex cone of covariance matrices into an equivalent problem in the vector space of symmetric matrices via the matrix logarithm. The first method applies nearest-neighbor classification using a suitable Riemannian metric for covariance matrices. The second method approximates the logarithm of a query covariance matrix by a sparse linear combination of the logarithms of training covariance matrices. The action label is then determined from the sparse coefficients. Both methods achieve state-of-the-art classification performance on several datasets, and are robust to action variability, viewpoint changes, and low object resolution. The proposed framework is conceptually simple and has low storage and computational requirements, making it attractive for real-time implementation.
    IEEE Transactions on Image Processing, 03/2013 · 3.20 Impact Factor
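    A minimal Python sketch of the first (nearest-neighbor) method described above, using the log-Euclidean metric as the Riemannian metric on covariance matrices (the specific metric and all names here are my choices):
      import numpy as np
      from scipy.linalg import logm

      def covariance_descriptor(features):
          """features: (num_vectors, dim) spatio-temporal feature vectors
          from one video clip; returns its empirical covariance matrix."""
          return np.cov(features, rowvar=False)

      def log_euclidean_distance(c1, c2):
          """The matrix logarithm maps covariance matrices into the vector
          space of symmetric matrices, where the Frobenius norm applies."""
          return np.linalg.norm(logm(c1) - logm(c2), "fro")

      def classify_action(query_features, train_covs, train_labels):
          """Nearest-neighbor action classification over training clips."""
          q = covariance_descriptor(query_features)
          d = [log_euclidean_distance(q, c) for c in train_covs]
          return train_labels[int(np.argmin(d))]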
  • ABSTRACT: We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition: the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We also discuss the statistical guarantees of the data-dependent projections method based on two mild assumptions on the prior density of the topic-document matrix. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.
    03/2013;
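    The key insight above (extreme points of projected cross-document frequency patterns correspond to novel words) translates into a very short procedure; a Python sketch with illustrative names:
      import numpy as np

      def candidate_novel_words(word_doc_counts, num_projections=100, seed=0):
          """word_doc_counts: (vocab_size, num_docs) word-document counts.
          Projects each word's normalized cross-document frequency pattern
          onto random directions; the extremes along each direction are
          candidate novel words."""
          rng = np.random.default_rng(seed)
          rows = word_doc_counts / word_doc_counts.sum(axis=1, keepdims=True)
          candidates = set()
          for _ in range(num_projections):
              direction = rng.standard_normal(rows.shape[1])
              proj = rows @ direction
              candidates.update({int(proj.argmax()), int(proj.argmin())})
          return sorted(candidates)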
  • ABSTRACT: We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster than the number of labeled training samples. We show that the Bayes error, namely the minimum attainable error probability with complete distributional knowledge and equally likely classes, can be arbitrarily close to zero and yet the limiting minimax error probability of every supervised learning algorithm is no better than a random coin toss. In contrast to related studies where the classification difficulty (Bayes error) is made to vanish, we hold it constant when taking high-dimensional limits. In contrast to VC-dimension-based minimax lower bounds that consider the worst-case error probability over all distributions that have a fixed Bayes error, our worst case is over the family of Gaussian distributions with constant Bayes error. We also show that a nontrivial asymptotic minimax error probability can only be attained for parametric subsets of zero measure (in a suitable measure space). These results expose the fundamental importance of prior knowledge and suggest that unless we impose strong structural constraints, such as sparsity, on the parametric space, supervised learning may be ineffective in high-dimensional, small-sample settings.
    01/2013;
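    The phenomenon is easy to reproduce in simulation. The toy Python experiment below (all parameter choices are illustrative) holds the class separation, and hence the Bayes error, fixed while the dimension grows much faster than the number of labeled samples; a plug-in centroid classifier drifts toward coin-toss accuracy:
      import numpy as np

      rng = np.random.default_rng(0)
      n_train, n_test, delta = 20, 2000, 3.0   # delta fixes the Bayes error
      for dim in (10, 100, 1000, 10000):
          mu = np.zeros(dim); mu[0] = delta    # fixed class separation
          x0 = rng.standard_normal((n_train, dim))        # class-0 samples
          x1 = rng.standard_normal((n_train, dim)) + mu   # class-1 samples
          c0, c1 = x0.mean(axis=0), x1.mean(axis=0)       # estimated centroids
          t = rng.standard_normal((n_test, dim)) + mu     # test set, class 1
          acc = (np.linalg.norm(t - c1, axis=1)
                 < np.linalg.norm(t - c0, axis=1)).mean()
          print(f"dim={dim:6d}  accuracy={acc:.2f}")      # decays toward 0.5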
  • ABSTRACT: A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 01/2013 · 4.63 Impact Factor
  • J. Wu, J. Konrad, P. Ishwar
    ABSTRACT: The Kinect has primarily been used as a gesture-driven device for motion-based controls. To date, Kinect-based research has predominantly focused on improving tracking and gesture recognition across a wide base of users. In this paper, we propose to use the Kinect for biometrics; rather than accommodating a wide range of users, we exploit each user's uniqueness in terms of gestures. Unlike pure biometrics, such as iris scanners, face detectors, and fingerprint recognition, which depend on irrevocable biometric data, the Kinect can provide additional revocable gesture information. We propose a dynamic time-warping (DTW) based framework applied to the Kinect's skeletal information for user access control. Our approach is validated in two scenarios, user identification and user authentication, on a dataset of 20 individuals performing 8 unique gestures. We obtain overall equal error rates (EER) of 4.14% for user identification and 1.89% for user authentication with a single gesture, and consistently outperform related work on this dataset. Given the natural noise present in the real-time depth sensor, these are promising results.
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 01/2013 · 4.63 Impact Factor
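    A minimal Python sketch of the DTW distance at the core of the framework (names are mine; the enrollment and thresholding logic around it is omitted):
      import numpy as np

      def dtw_distance(seq_a, seq_b):
          """Dynamic time warping between two gesture recordings, each a
          (num_frames, num_features) array of skeletal joint coordinates;
          smaller values mean more similar gestures."""
          n, m = len(seq_a), len(seq_b)
          acc = np.full((n + 1, m + 1), np.inf)
          acc[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
                  acc[i, j] = cost + min(acc[i - 1, j],       # skip in a
                                         acc[i, j - 1],       # skip in b
                                         acc[i - 1, j - 1])   # aligned step
          return acc[n, m]

      # identification: nearest enrolled recording by DTW distance;
      # authentication: accept iff the distance falls below a threshold.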
  • ABSTRACT: Searching for images on-line using keywords returns results that are often difficult to interpret. This becomes even more complicated if one attempts to compare image search output for several keywords with a common theme. We focus on the latter problem and propose a method to efficiently compare sets of images in order to find representative images, one from each set, that are coherent in a certain sense. However, the search for an optimal set of representative images is very complex even for as few as 10 sets of 20 images each, since all possible combinations of one image per set (20^10 of them) need to be considered. Therefore, we formulate our problem as the Generalized Traveling Salesman Problem (GTSP) and propose an efficient approximation algorithm to solve it. Our approximate GTSP algorithm is faster than other well-known approximations and is also more likely to reach the exact solution for large-scale inputs. We present a number of experimental results using the proposed algorithm and conclude that it can be a useful, almost real-time tool for on-line search.
    Proceedings of the 20th ACM international conference on Multimedia; 10/2012
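    To make the GTSP formulation concrete, here is a toy nearest-neighbor construction in Python (emphatically not the paper's approximation algorithm): visit each image set exactly once, each time hopping to the closest image in any unvisited set, and return the visited images as the representatives:
      import numpy as np

      def greedy_gtsp_representatives(image_sets):
          """image_sets: list of (num_images, dim) feature arrays, one per
          keyword. Returns a dict mapping set index -> chosen image index."""
          chosen = {0: 0}                    # arbitrary start: set 0, image 0
          current = image_sets[0][0]
          remaining = set(range(1, len(image_sets)))
          while remaining:
              s, i = min(((s, i) for s in remaining
                          for i in range(len(image_sets[s]))),
                         key=lambda si: np.linalg.norm(
                             image_sets[si[0]][si[1]] - current))
              chosen[s] = i
              current = image_sets[s][i]
              remaining.discard(s)
          return chosen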
  • Ye Wang, Prakash Ishwar, Shantanu Rane
    ABSTRACT: The problem in which one of three pairwise interacting parties is required to securely compute a function of the inputs held by the other two, when one party may arbitrarily deviate from the computation protocol (active behavioral model), is studied. An information-theoretic characterization of unconditionally secure computation protocols under the active behavioral model is provided. A protocol for Hamming distance computation is provided and shown to be unconditionally secure under both active and passive behavioral models using the information-theoretic characterization. The difference between the notions of security under the active and passive behavioral models is illustrated through the BGW protocol for computing quadratic and Hamming distances; this protocol is secure under the passive model, but is shown to be not secure under the active model.
    06/2012;
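    For flavor, here is a toy Python fragment in the style of information-theoretic protocols: Alice and Bob one-time-pad their inputs with shared randomness so that the computing party recovers only the difference pattern. Note that revealing x XOR y leaks the locations of disagreements, so this falls short of the guarantees analyzed in the paper; it only illustrates the masking idea:
      import numpy as np

      rng = np.random.default_rng(7)
      n = 16
      x = rng.integers(0, 2, n)      # Alice's private bit-string
      y = rng.integers(0, 2, n)      # Bob's private bit-string
      r = rng.integers(0, 2, n)      # randomness shared by Alice and Bob only

      msg_a = x ^ r                  # Alice -> Charlie
      msg_b = y ^ r                  # Bob   -> Charlie (pad cancels below)
      diff = msg_a ^ msg_b           # equals x ^ y; inputs remain hidden
      print("Hamming distance:", int(diff.sum()))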
  • ABSTRACT: The availability of 3D hardware has so far outpaced the production of 3D content. Although to date many methods have been proposed to convert 2D images to 3D stereopairs, the most successful ones involve human operators and, therefore, are time-consuming and costly, while the fully-automatic ones have not yet achieved the same level of quality. This subpar performance is due to the fact that automatic methods usually rely on assumptions about the captured 3D scene that are often violated in practice. In this paper, we explore a radically different approach inspired by our work on saliency detection in images. Instead of relying on a deterministic scene model for the input 2D image, we propose to "learn" the model from a large dictionary of stereopairs, such as YouTube 3D. Our new approach is built upon a key observation and an assumption. The key observation is that among millions of stereopairs available on-line, there likely exist many stereopairs whose 3D content matches that of the 2D input (query). We assume that two stereopairs whose left images are photometrically similar are likely to have similar disparity fields. Our approach first finds a number of on-line stereopairs whose left image is a close photometric match to the 2D query and then extracts depth information from these stereopairs. Since disparities for the selected stereopairs differ due to differences in underlying image content, level of noise, distortions, etc., we combine them by using the median. We apply the resulting median disparity field to the 2D query to obtain the corresponding right image, while handling occlusions and newly-exposed areas in the usual way. We have applied our method in two scenarios. First, we used YouTube 3D videos in search of the most similar frames. Then, we repeated the experiments on a small, but carefully-selected, dictionary of stereopairs closely matching the query. This, to a degree, emulates the results one would expect from the use of an extremely large 3D repository. While far from perfect, the presented results demonstrate that on-line repositories of 3D content can be used for effective 2D-to-3D image conversion. With the continuously increasing amount of 3D data on-line and with the rapidly growing computing power in the cloud, the proposed framework seems a promising alternative to operator-assisted 2D-to-3D conversion.
    Proc. SPIE, 02/2012
  • ABSTRACT: Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing. A plethora of algorithms have been developed to date, yet no widely accepted, realistic, large-scale video dataset exists for benchmarking different methods. Presented here is a unique change detection benchmark dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR). A distinguishing characteristic of this dataset is that each frame is meticulously annotated for ground-truth foreground, background, and shadow area boundaries – an effort that goes much beyond a simple binary label denoting the presence of change. This enables objective and precise quantitative comparison and ranking of change detection algorithms. This paper presents and discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over a dozen previous and new change detection algorithms. The dataset, evaluation tools, and algorithm rankings are available to the public on a website and will be updated with feedback from academia and industry in the future.
    CVPR - Change Detection Workshop; 01/2012
  • ABSTRACT: A successful approach to high-dimensional classification problems has been to couple nearest-neighbor classification with distance-preserving data dimensionality reduction via independent random projections. In many problems, however, the observed data is related to a natural set of latent variables through a non-trivial sensing process. A common approach in such cases is to either ignore the sensing process or to invert it and then proceed with dimensionality reduction and classifier design. Inversion can be costly and may be unnecessary if the end goal is classification. Rather than either ignoring sensing structure or using it for explicit inversion, we propose using "structured random projections" whose correlation explicitly accounts for the sensing structure. We show examples comparing the performance of independent random projections of observed data, independent random projections of reconstructed data, and our new structured random projections of observed data which demonstrate the benefits of our approach.
    Statistical Signal Processing Workshop (SSP), 2012 IEEE; 01/2012
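    A hedged Python sketch of one way to realize such correlated projections for a linear sensing model z = H s + noise (the construction Sigma = H H^T is my illustration, not necessarily the paper's):
      import numpy as np

      def structured_projection_matrix(num_proj, H, seed=0, eps=1e-9):
          """Draw projection rows whose correlation matches the sensing
          structure: each row ~ N(0, H H^T) instead of i.i.d. entries.
          H: (obs_dim, latent_dim) sensing matrix."""
          rng = np.random.default_rng(seed)
          sigma = H @ H.T
          chol = np.linalg.cholesky(sigma + eps * np.eye(len(sigma)))
          return rng.standard_normal((num_proj, len(sigma))) @ chol.T

      # usage sketch: P = structured_projection_matrix(32, H)
      #               z_reduced = observations @ P.T   # then nearest neighbor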
  • J. Konrad, Meng Wang, P. Ishwar
    ABSTRACT: Among 2D-to-3D image conversion methods, those involving human operators have been most successful but also time-consuming and costly. Automatic methods, that typically make use of a deterministic 3D scene model, have not yet achieved the same level of quality as they often rely on assumptions that are easily violated in practice. In this paper, we adopt the radically different approach of “learning” the 3D scene structure. We develop a simplified and computationally-efficient version of our recent 2D-to-3D image conversion algorithm. Given a repository of 3D images, either as stereopairs or image+depth pairs, we find k pairs whose photometric content most closely matches that of a 2D query to be converted. Then, we fuse the k corresponding depth fields and align the fused depth with the 2D query. Unlike in our original work, we validate the simplified algorithm quantitatively on a Kinect-captured image+depth dataset against the Make3D algorithm. While far from perfect, the presented results demonstrate that online repositories of 3D content can be used for effective 2D-to-3D image conversion.
    IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 01/2012
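    The simplified algorithm lends itself to a very short sketch; in the Python fragment below (names are mine), the final alignment of the fused depth to the query is omitted:
      import numpy as np

      def depth_from_repository(query_feat, repo_feats, repo_depths, k=5):
          """query_feat: (dim,) photometric descriptor of the 2D query.
          repo_feats: (num_images, dim) descriptors of repository images.
          repo_depths: (num_images, H, W) depth fields registered to them.
          Returns the pixel-wise median of the k closest depth fields."""
          dists = np.linalg.norm(repo_feats - query_feat, axis=1)
          nearest = np.argsort(dists)[:k]      # k best photometric matches
          return np.median(repo_depths[nearest], axis=0)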
  • Nan Ma, Prakash Ishwar
    ABSTRACT: Mehdi Torbatian kindly pointed out a difficulty in the proof of the unnumbered lemma in the above-titled paper, Sec. III-C, p. 3768. The issue is addressed by the authors of the original paper.
    IEEE Transactions on Information Theory, 01/2012; 58(6):4074 · 2.62 Impact Factor

Publication Stats

730 Citations
99.87 Total Impact Points

Institutions

  • 2005–2012
    • Boston University
      • Department of Electrical and Computer Engineering
      Boston, Massachusetts, United States
  • 2007–2010
    • Indian Institute of Technology Bombay
      • Department of Electrical Engineering
      Mumbai, Maharashtra, India
  • 2009
    • University of California, San Diego
      • Department of Electrical and Computer Engineering
      San Diego, California, United States
  • 2002–2008
    • University of California, Berkeley
      • Department of Electrical Engineering and Computer Sciences
      Berkeley, California, United States
  • 1997–2005
    • University of Illinois, Urbana-Champaign
      • Department of Electrical and Computer Engineering
      • Beckman Institute for Advanced Science and Technology
      Urbana, Illinois, United States