Harpreet S. Sawhney
Microsoft

165 Publications

21,063 Reads

13,206 Citations

Publications

Ultrawide Baseline Facade Matching for Geo-localization

Chapter

Jul 2016

Matching street-level images to a database of airborne images is hard because of extreme viewpoint and illumination differences. Color/gradient distributions or local descriptors fail to match, forcing us to rely on the structure of self-similarity of patterns on facades. We propose to capture this structure with a novel “scale-selective self-simila...

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Article

Mar 2016

We propose a new zero-shot Event-Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional s...

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Conference Paper

Full-text available

Feb 2016

We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional s...

Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries

Article

Full-text available

Oct 2015

We present an algorithm to estimate depth in dynamic video scenes. We propose to learn and infer depth in videos from appearance, motion, occlusion boundaries, and geometric context of the scene. Using our method, depth can be estimated from unconstrained videos with no requirement of camera pose estimation, and with significant background/foregrou...

"Snap-n-Eat": Food Recognition and Nutrition Estimation on a Smartphone

Article

Apr 2015

We present snap-n-eat, a mobile food recognition system. The system can recognize food and estimate the calorific and nutrition content of foods automatically without any user intervention. To identify food items, the user simply snaps a photo of the food plate. The system detects the salient region, crops its image, and subtracts the background ac...

Method for building and extracting entity networks from video

Patent

Mar 2015

A computer implemented method for deriving an attribute entity network (AEN) from video data is disclosed, comprising the steps of: extracting at least two entities from the video data; tracking the trajectories of the at least two entities to form at least two tracks; deriving at least one association between at least two entities by detecting at...

Method and apparatus for inferring the geographic location of captured scene depictions

Patent

Mar 2015

A method and apparatus for determining a geographic location of a scene in a captured depiction comprising extracting a first set of features from the captured depiction by algorithmically analyzing the captured depiction, matching the extracted features of the captured depiction against a second set of extracted features associated with reference...

De-correlating CNN Features for Generative Classification

Article

Feb 2015

The problem of training a classifier from a handful of positive examples, without having to supply class-specific negatives, is of great practical importance. The proposed approach to solving this problem builds on the idea of training LDA classifiers using only class-specific foreground images and a large collection of unlabelled images, as describ...

3-D model based method for detecting and classifying vehicles in aerial imagery

Patent

Dec 2014

A computer implemented method for determining a vehicle type of a vehicle detected in an image is disclosed. An image having a detected vehicle is received. A number of vehicle models having salient feature points is projected on the detected vehicle. A first set of features derived from each of the salient feature locations of the vehicle models i...

Method and apparatus for real-time pedestrian detection for urban driving

Patent

Oct 2014

A computer implemented method for detecting the presence of one or more pedestrians in the vicinity of the vehicle is disclosed. Imagery of a scene is received from at least one image capturing device. A depth map is derived from the imagery. A plurality of pedestrian candidate regions of interest (ROIs) is detected from the depth map by matching e...

Pedestrian Detection in Low-Resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS)

Conference Paper

Jun 2014

Detecting pedestrians at a distance from large-format wide-area imagery is a challenging problem because of low ground sampling distance (GSD) and low frame rate of the imagery. In such a scenario, the approaches based on appearance cues alone mostly fail because pedestrians are only a few pixels in size. Frame-differencing and optical flow based a...

Method and apparatus for detecting and tracking vehicles

Patent

Apr 2014

The present invention relates to a method and apparatus for detecting and tracking vehicles. One embodiment of a system for detecting and tracking an object (e.g., vehicle) in a field of view includes a moving object indication stage for detecting a candidate object in a series of input video frames depicting the field of view and a track associati...

Automatic 3D change detection for glaucoma diagnosis

Conference Paper

Mar 2014

Important diagnostic criteria for glaucoma are changes in the 3D structure of the optic disc due to optic nerve damage. We propose an automatic approach for detecting these changes in 3D models reconstructed from fundus images of the same patient taken at different times. For each time session, only two uncalibrated fundus images are required. The a...

Multimodal fusion using dynamic hybrid models

Conference Paper

Mar 2014

We propose a novel hybrid model that exploits the strength of discriminative classifiers along with the representational power of generative models. Our focus is on detecting multimodal events in time varying sequences. Discriminative classifiers have been shown to achieve higher performances than the corresponding generative likelihood-based class...

Content-based matching of videos using local spatio-temporal fingerprints

Patent

Full-text available

Feb 2014

A computer implemented method for matching video data to a database containing a plurality of video fingerprints of the type described above, comprising the steps of calculating at least one fingerprint representing at least one query frame from the video data; indexing into the database using the at least one calculated fingerprint to find a set o...

Real-time action detection and classification

Patent

Jan 2014

The present invention relates to a method and system for creating a strong classifier based on motion patterns wherein the strong classifier may be used to determine an action being performed by a body in motion. When creating the strong classifier, action classification is performed by measuring similarities between features within motion patterns...

Semantic pooling for complex event detection

Conference Paper

Oct 2013

Complex event detection is very challenging in open-source content such as YouTube videos, which usually comprise very diverse visual content involving various object, scene and action concepts. Not all of them, however, are relevant to the event. In other words, a video may contain a lot of "junk" information which is harmful for recognition. Hence, we...

Interactive Retinal Vessel Extraction by Integrating Vessel Tracing and Graph Search

Conference Paper

Sep 2013

Despite recent advances, automatic blood vessel extraction from low quality retina images remains difficult. We propose an interactive approach that enables a user to efficiently obtain near perfect vessel segmentation with a few mouse clicks. Given two seed points, the approach seeks an optimal path between them by minimizing a cost function. In c...

3D optic disc reconstruction via a global fundus stereo algorithm

Article

Full-text available

Jul 2013

This paper presents a novel method to recover 3D structure of the optic disc in the retina from two uncalibrated fundus images. Retinal images are commonly uncalibrated when acquired clinically, creating rectification challenges as well as significant radiometric and blur differences within the stereo pair. By exploiting structural peculiarities of...

Affect analysis in natural human interaction using Joint Hidden Conditional Random Fields

Conference Paper

Jul 2013

We present a novel approach for multi-modal affect analysis in human interactions that is capable of integrating data from multiple modalities while also taking into account temporal dynamics. Our fusion approach, Joint Hidden Conditional Random Fields (JHCRFs), combines the advantages of purely feature-level (early fusion) approaches with l...

Recognizing Activities via Bag of Words for Attribute Dynamics

Conference Paper

Full-text available

Jun 2013

In this work, we propose a novel video representation for activity recognition that models video dynamics with attributes of activities. A video sequence is decomposed into short-term segments, which are characterized by the dynamics of their attributes. These segments are modeled by a dictionary of attribute dynamics templates, which are implement...

System and method for detection of multi-view/multi-pose objects

Patent

Mar 2013

The present invention provides a computer implemented process for detecting multi-view multi-pose objects. The process comprises training of a classifier for each intra-class exemplar, training of a strong classifier and combining the individual exemplar-based classifiers with a single objective function. This function is optimized using the two ne...

Weapon identification using acoustic signatures across varying capture conditions

Patent

Feb 2013

A computer implemented method for automatically detecting and classifying acoustic signatures across a set of recording conditions is disclosed. A first acoustic signature is received. The first acoustic signature is projected into a space of a minimal set of exemplars of acoustic signature types derived from a larger set of exemplars using a wrapp...

System and method of processing stereo images

Patent

Feb 2013

The present invention is a system and a method for processing stereo images utilizing a real time, robust, and accurate stereo matching system and method based on a coarse-to-fine architecture. At each image pyramid level, non-centered windows for matching and adaptive upsampling of coarse-level disparities are performed to generate estimated dispa...

Image to LIDAR matching for geotagging in urban environments

Conference Paper

Jan 2013

We present a novel method for matching ground-based query images to a georeferenced LIDAR 3D dataset acquired from an airborne platform in urban environments. We are addressing two main technical challenges: (i) different modalities between the query and the reference data (electro-optical vs. LIDAR) that impose unique challenges to the matching pr...

Video event recognition using concept attributes

Conference Paper

Jan 2013

We propose to use action, scene and object concepts as semantic attributes for classification of video events in in-the-wild content, such as YouTube videos. We model events using a variety of complementary semantic attribute features developed in a semantic concept space. Our contribution is to systematically demonstrate the advantages of this conce...

Method for computing food volume in a method for analyzing food

Patent

Full-text available

Jan 2013

A computer-implemented method for estimating a volume of at least one food item on a food plate is disclosed. A first and second plurality of images are received from different positions above a food plate, wherein angular spacing between the positions of the first plurality of images is greater than angular spacing between the positions of the sec...

Method and apparatus for recognizing 3-D objects

Patent

Full-text available

Jan 2013

A method and apparatus for recognizing an object, comprising providing a set of scene features from a scene, pruning a set of model features, generating a set of hypotheses associated with the pruned set of model features for the set of scene features, pruning the set of hypotheses, and verifying the set of pruned hypotheses is provided.

Multimedia event recounting with concept based representation

Conference Paper

Oct 2012

Multimedia event detection has drawn a lot of attention in recent years. Given a recognized event, in this paper we conduct a pilot study of the multimedia event recounting problem, which answers the question of why a video is recognized as a given event, i.e., what evidence the decision is based on. In order to provide a semantic recounting of the m...

Ultra-wide Baseline Facade Matching for Geo-localization

Conference Paper

Oct 2012

Building segmentation for densely built urban regions using aerial LIDAR data

Patent

Jul 2012

A method for extracting a 3D terrain model for identifying at least buildings and terrain from LIDAR data is disclosed, comprising the steps of generating a point cloud representing terrain and buildings mapped by LIDAR; classifying points in the point cloud, the point cloud having ground and non-ground points, the non-ground points representing bu...

Evaluation of low-level features and their combinations for complex event detection in open source videos

Conference Paper

Jun 2012

Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the...

Non-fluoride treatment modalities of tooth hypersensitivity

Article

Jan 2012

It is true that the teeth of man have been hurting for many thousands of years. The causes of the pain and effective methods to relieve or prevent the pain tend to follow traditional routes like random treatment, starting with materials of natural origin and then work by the aggressively curious, the intelligent and the innovative to better underst...

Photodynamic therapy in dentistry - A review

Article

Jan 2012

Photodynamic therapy (PDT), also known as photoradiation therapy, phototherapy, or photochemotherapy, involves the use of a photoactive dye (photosensitizer) that is activated by exposure to light of a specific wavelength in the presence of oxygen. PDT is a very promising and noninvasive treatment modality. At present, PDT is an alternative method of...

Geo-localization of street views with aerial image databases

Conference Paper

Nov 2011

We study the feasibility of solving the challenging problem of geolocalizing ground level images in urban areas with respect to a database of images captured from the air such as satellite and oblique aerial images. We observe that comprehensive aerial image databases are widely available while complete coverage of urban areas from the ground is at...

Clustering Multiple Image Sequences with a Sequence-to-Sequence Similarity Measure

Article

Nov 2011

We propose a novel similarity measure of two image sequences based on shapeme histograms. The idea of shapeme histogram has been used for single image/texture recognition, but is used here to solve the sequence-to-sequence matching problem. We develop techniques to represent each sequence as a set of shapeme histograms, which captures different var...

3D Alignment and Change Detection from Uncalibrated Eye Images

Conference Paper

Jul 2011

Analyzing change in the 3D structure of the optic disc over time has long been recognized as central to the diagnosis of glaucoma but has been inadequately addressed by computer vision methods. Currently, clinicians examine stereo pairs from different time instants for interval changes indicative of glaucoma. Due to the clinical procedures in captu...

Vehicle tracking across nonoverlapping cameras using joint kinematic and appearance features

Conference Paper

Jun 2011

We describe a vehicle tracking algorithm using input from a network of nonoverlapping cameras. Our algorithm is based on a novel statistical formulation that uses joint kinematic and image appearance information to link local tracks of the same vehicles into global tracks with longer persistence. The algorithm can handle significant spatial separat...

A LIDAR streaming architecture for mobile robotics with application to 3D structure characterization

Conference Paper

May 2011

We present a novel LIDAR streaming architecture for real-time, on-board processing using unmanned robots. We propose a two-level 3D data structure that allows pipelined and streaming processing of the 3D data as it arrives from a moving robot: (i) at the coarse level, the incoming 3D scans are stored in memory in a dense 3D voxel grid with a relati...

Utility of Digital Stereo Images for Optic Disc Evaluation

Article

Full-text available

Nov 2010

To assess the suitability of digital stereo images for optic disc evaluations in glaucoma. Stereo color optic disc images in both digital and 35-mm slide film formats were acquired contemporaneously from 29 subjects with various cup-to-disc ratios (range, 0.26-0.76; median, 0.475). Using a grading scale designed to assess image quality, the ease of...

Automatic Blood Vessel Localization in Small Field of View Eye Images

Article

Aug 2010

Localizing blood vessels in eye images is a crucial step in the automated and objective diagnosis of eye diseases. Most previous research has focused on extracting the centerlines of vessels in large field of view images. However, for diagnosing diseases of the optic disk region, like glaucoma, small field of view images have to be analyzed. One ne...

3D Model Based Vehicle Classification in Aerial Imagery

Conference Paper

Jun 2010

We present an approach that uses detailed 3D models to detect and classify objects into fine levels of vehicle categories. Unlike other approaches that use silhouette information to fit a 3D model, our approach uses complete appearance from the image. Each 3D model has a set of salient location markers that are determined a priori. These salient loc...

Vehicle Detection and Tracking in Wide Field-of-View Aerial Video

Conference Paper

Jun 2010

This paper presents a joint probabilistic relation graph approach to simultaneously detect and track a large number of vehicles in low frame rate aerial videos. Due to the low frame rate, low spatial resolution and sheer number of moving objects, detection and tracking in wide area video pose unique challenges. In this paper, we explore vehicle behavi...

A Real-time Pedestrian Detection System based on Structure and Appearance Classification

Conference Paper

May 2010

We present a real-time pedestrian detection system based on structure and appearance classification. We discuss several novel ideas that contribute to low false-alarm rates and high detection rates, while at the same time achieving computational efficiency: (i) At the front end of our system we employ stereo to detect pedestrians in 3D range maps...

Wide Area Active Collaborative Tracking of Waterborne Vessels

Article

Apr 2010

We describe a real-time wide area surveillance system (WA-ACTV) for the automatic tracking of vessels using a network of PTZ cameras. The system is capable of optimally managing hundreds of PTZ cameras to simultaneously track a large number of vessels. The tracked vessels are fingerprinted using a scale-invariant part-based representation and subs...

Combining Structure and Appearance Cues for Real-time Pedestrian Detection

Article

Apr 2010

We present a real-time pedestrian detection system which uses cues derived from structure and appearance classification. We discuss several novel ideas to achieve computational efficiency while improving on both detection and false-alarm rates: (i) At the front end of our system we employ stereo to detect pedestrians in 3D range maps, and to classif...

Weapon Identification Across Varying Acoustic Conditions Using an Exemplar Embedding Approach

Article

Apr 2010

Gunshot recordings have the potential for both tactical detection and forensic evaluation, particularly to ascertain information about the type of firearm and ammunition used. Perhaps the most significant challenge to such an analysis is the effect of recording conditions on the audio signature of recorded data. In this paper we present a first stud...

Advanced Vehicle Tracking in Persistent Aerial Surveillance Video

Article

Apr 2010

This paper presents a relational graph based approach to track thousands of vehicles from persistent wide area airborne surveillance (WAAS) videos. Because of the low ground sampling distance and low frame rate, vehicles are usually small and may travel a long distance between consecutive frames, so WAAS videos pose great challenges to correct assoc...

LIDAR-based Door and Stair Detection from a Mobile Robot

Article

Apr 2010

We present an on-the-move LIDAR-based object detection system for autonomous and semi-autonomous unmanned vehicle systems. In this paper we make several contributions: (i) we describe an algorithm for real-time detection of objects such as doors and stairs in indoor environments; (ii) we describe efficient data structures and algorithms for process...

Recognition and volume estimation of food intake using a mobile device

Conference Paper

Dec 2009

We present a system that improves the accuracy of food intake assessment using computer vision techniques. Traditional dietetic methods suffer from the drawback of either inaccurate assessment or complex lab measurement. Our solution is to use a mobile phone to capture images of foods, recognize food types, estimate their respective volumes and finally...

Pedestrian detection with depth-guided structure labeling

Conference Paper

Nov 2009

We propose a principled statistical approach for using 3D information and scene context to reduce the number of false positives in stereo based pedestrian detection. Current pedestrian detection algorithms have focused on improving the discriminability of 2D features that capture the pedestrian appearance, and on using various classifier architectu...

Action exemplar based real-time action detection

Conference Paper

Sep 2009

We propose a real-time action detection system based on a novel action representation and an effective learning method with a small training set. We represent actions with a new feature that measures the "global" distance from a set of action exemplars, where action exemplars are constructed from a vocabulary that encodes "local" instantaneous...

Weapon Identification Using Hierarchical Classification of Acoustic Signatures

Article

May 2009

We apply a unique hierarchical audio classification technique to weapon identification using gunshot analysis. The audio classification assigns each audio segment to one of ten weapon classes (e.g., 9mm, 22, shotgun, etc.) using low-complexity Gaussian Mixture Models (GMMs). The first level of hierarchy consists of classification into broad weapons...

ACT-Vision: active collaborative tracking for multiple PTZ cameras

Article

Full-text available

Apr 2009

We describe a novel scalable approach for the management of a large number of Pan-Tilt-Zoom (PTZ) cameras deployed outdoors for persistent tracking of humans and vehicles, without resorting to the large fields of view of associated static cameras. Our system, Active Collaborative Tracking - Vision (ACT-Vision), is essentially a real-time operating...

Vision-based Perception for Autonomous Urban Navigation

Conference Paper

Nov 2008

We describe a low-cost vision-based sensing and positioning system that enables intelligent vehicles of the future to autonomously drive in an urban environment with traffic. The system was built by integrating Sarnoff's algorithms for driver awareness and vehicle safety with commercial off-the-shelf hardware on a robot vehicle. We implemented a mo...

Toward a sentient environment: Real-time wide area multiple human tracking with identities

Article

Oct 2008

In this paper, we present a fully integrated real-time computer vision system that can detect and track multiple humans in a wide area using a network of stereo cameras. Continuous human identities are achieved by fusing video tracking with different kinds of biometric devices. The system also provides immersive visualization which enables the use...

Special issue on video surveillance research in industry and academia

Article

Oct 2008

Predicting motion of humans, animals and other objects which move according to internal plans is a challenging problem. Most existing approaches operate in two stages: (a) learning typical motion patterns by observing an environment and (b) predicting ...

Geo-spatial aerial video processing for scene understanding and object tracking

Conference Paper

Jul 2008

This paper presents an approach to extracting and using semantic layers from low altitude aerial videos for scene understanding and object tracking. The input video is captured by low flying aerial platforms and typically contains strong parallax from non-ground-plane structures. A key aspect of our approach is the use of geo-registration of vid...

HO2: A new feature for multi-agent event detection and recognition

Article

Jun 2008

In this paper, we present a new feature to model a class of events that consist of complex interactions among multiple entities captured by tracks and inter-object relationships over space and time. Existing approaches represent these events using features that measure only pairwise relationships between entities at a time, such as relative distanc...

Building segmentation for densely built urban regions using aerial LIDAR data

Conference Paper

Jun 2008

We present a novel building segmentation system for densely built areas, containing thousands of buildings per square kilometer. We employ solely sparse LIDAR (Light/Laser Detection and Ranging) 3D data, captured from an aerial platform, with resolution less than one point per square meter. The goal of our work is to create segmented and delineated bui...

Discovering class specific composite features through discriminative sampling with Swendsen-Wang Cut

Conference Paper

Jun 2008

This paper proposes a novel approach to discover a set of class specific "composite features" as the feature pool for the detection and classification of complex objects using AdaBoost. Each composite feature is constructed from the combination of multiple individual features. Unlike previous works that design features manually or with cert...

Matching vehicles under large pose transformations using approximate 3D models and piecewise MRF model

Conference Paper

Full-text available

Jun 2008

We propose a robust object recognition method based on approximate 3D models that can effectively match objects under large viewpoint changes and partial occlusion. The specific problem we solve is: given two views of an object, determine if the views are for the same or different object. Our domain of interest is vehicles, but the approach c...

Real-time global localization with a pre-built visual landmark database

Conference Paper

Full-text available

Jun 2008

In this paper, we study how to build a vision-based system for global localization with accuracies within 10 cm for robots and humans operating both indoors and outdoors over wide areas covering many square kilometers. In particular, we study the parameters of building a landmark database rapidly and utilizing that database online for real-tim...

Unsupervised Learning of Discriminative Edge Measures for Vehicle Matching between Nonoverlapping Cameras

Article

May 2008

This paper proposes a novel unsupervised algorithm for learning discriminative features in the context of matching road vehicles between two non-overlapping cameras. The matching problem is formulated as a same-different classification problem, which aims to compute the probability of vehicle images from two distinct cameras being from the same vehicle...

Action video retrieval based on atomic action vocabulary

Conference Paper

Oct 2008

We propose an efficient action retrieval system that is based on a novel action representation and an effective video matching method. We represent actions with a hierarchical encoding scheme that at the low level measures local body-part motions, which then evolves into encoding of instantaneous global body motions and finally high-level description...

Content-Based Matching of Videos Using Local Spatio-temporal Fingerprints

Conference Paper

Nov 2007

Fingerprinting is the process of mapping content, or fragments of it, into unique, discriminative hashes called fingerprints. In this paper, we propose an automated video identification algorithm that employs fingerprinting for storing videos inside its database. When queried using a degraded short video segment, the objective of the system is to re...

Multiple Cue Integrated Action Detection

Conference Paper

Oct 2007

We present an action recognition scheme that integrates multiple cue modalities, including shape, motion and depth, to recognize human gestures in video sequences. In the proposed approach we extend the classification framework that is commonly used in 2D object recognition to 3D spatio-temporal space for recognizing actions. Specifically, a boos...

Invited Talk; "Visual Intelligence from Video and 3D Sensor Analytics"

Conference Paper

Jul 2007

Harpreet S. Sawhney

Video cameras are no longer being used only in their traditional role of providing "viewable pixels", but are rapidly becoming sources of intelligent information about the world. More recently 3D cameras are being developed to directly provide 3D measurements of objects and scenes. Appearance and geometry of objects and scenes, and the temporal dyn...

PEET: Prototype Embedding and Embedding Transition for Matching Vehicles over Disparate Viewpoints

Conference Paper

Jun 2007

This paper presents a novel framework, prototype embedding and embedding transition (PEET), for matching objects, especially vehicles, that undergo drastic pose, appearance, and even modality changes. The problem of matching objects seen under drastic variations is reduced to matching embeddings of object appearances instead of matching the object...

Robust Object Matching for Persistent Tracking with Heterogeneous Features

Article

Jun 2007

This paper addresses the problem of matching vehicles across multiple sightings under variations in illumination and camera poses. Since multiple observations of a vehicle are separated by large temporal and/or spatial gaps, prohibiting the use of standard frame-to-frame data association, we employ features extracted over a sequence during one...

Ten-fold Improvement in Visual Odometry Using Landmark Matching

Conference Paper

Full-text available

Jan 2007

Our goal is to create a visual odometry system for robots and wearable systems such that localization accuracies of centimeters can be obtained for hundreds of meters of distance traveled. Existing systems have achieved approximately a 1% to 5% localization error rate whereas our proposed system achieves close to 0.1% error rate, a ten-fold reducti...

Learning Actions Using Robust String Kernels

Conference Paper

Jan 2007

This paper presents an action analysis method based on robust string matching using dynamic programming. Similar to matching text sequences, atomic actions based on semantic and structural features are first detected and coded as spatio-temporal characters or symbols. These symbols are subsequently concatenated to form a unique set of strings for e...

Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation

Article

Full-text available

Aug 2006

We propose a new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques. Our approach achieves a sublinear complexity on the number of models, maintaining at the same time a high degree of performance for real 3D sensed data that is acquired in largely uncontrolled settings. The key c...

Rapid and scalable 3D object recognition using LIDAR data

Article

Jun 2006

This paper describes a model-based 3D object recognition system, which makes use of 3D data acquired by LIDAR sensors. The system is based on a coarse-to-fine scheme for object indexing and verification to achieve high efficiency and accuracy. The system employs rotationally invariant semi-local spin image features for object representation and for...

Exploiting Model Similarity for Indexing and Matching to a Large Model Database

Conference Paper

May 2006

This paper proposes a novel method to exploit model similarity in model-based 3D object recognition. The scenario consists of a large 3D model database of vehicles, and rapid indexing and matching need to be done without sequential model alignment. In this scenario, the competition amongst shape features from similar models may pose serious challen...

Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection

Conference Paper

Full-text available

May 2006

Using the variational approaches to estimate optical flow between two frames, the flow discontinuities between different motion fields are usually not distinguished even when an anisotropic diffusion operator is applied. In this paper, we propose a multi-cue driven adaptive bilateral filter to regularize the flow computation, which is able to ach...

Identification of Highly Similar 3D Objects Using Model Saliency

Conference Paper

Full-text available

May 2006

We present a novel approach for identifying 3D objects from a database of models, highly similar in shape, using range data acquired in unconstrained settings from a limited number of viewing directions. We also address the challenging case of identifying targets not present in the database. The method is based on learning offline saliency...

Defense and Security Symposium

Article

May 2006

Shapeme histogram projection and matching for partial object recognition

Article

May 2006

Histograms of shape signature or prototypical shapes, called shapemes, have been used effectively in previous work for 2D/3D shape matching and recognition. We extend the idea of shapeme histogram to recognize partially observed query objects from a database of complete model objects. We propose representing each model object as a collection of sha...

A Heterogeneous Feature-based Image Alignment Method

Conference Paper

Full-text available

Jan 2006

In this paper, we propose a robust heterogeneous feature based image alignment method that utilizes points, lines and regions in a unified framework. The image motion is decomposed into progressively complex components, i.e., translation, similarity, affine, and projective motion models, and alignment is obtained with deliberatively selected suitab...

Learning Exemplar-Based Categorization for the Detection of Multi-View Multi-Pose Objects

Conference Paper

Jan 2006

This paper proposes a novel approach for multi-view multi-pose object detection using discriminative shape-based exemplars. The key idea underlying this method is motivated by numerous previous observations that manually clustering multi-view multi-pose training data into different categories and then combining the separately trained two-class class...

Robust object matching for persistent tracking with heterogeneous features

Conference Paper

Nov 2005

Tracking objects over a long period of time in realistic environments remains a challenging problem for ground and aerial video surveillance. Matching objects and verifying their identities across multiple spatial and temporal gaps proves to be an effective way to extend tracking range. When an object track is lost due to occlusion or other reasons...

Vehicle Identification between Non-Overlapping Cameras without Direct Feature Matching.

Conference Paper

Jan 2005

We propose a novel method for identifying road vehicles between two nonoverlapping cameras. The problem is formulated as a same-different classification problem: probability of two vehicle images from two distinct cameras being from the same vehicle or from different vehicles. The key idea is to compute the probability without matching the two vehi...

Unsupervised Learning of Discriminative Edge Measures for Vehicle Matching between Non-Overlapping Cameras

Conference Paper

Jan 2005

This paper proposes a method for matching road vehicles between two non-overlapping cameras. The matching problem is formulated as a same-different classification problem: probability of two observations from two distinct cameras being from the same vehicle or from different vehicles. We employ a measurement vector consisting of three independent edg...

Real-Time Wide Area Multi-Camera Stereo Tracking.

Conference Paper

Jan 2005

We present a fully integrated real-time system to track humans with a network of stereo sensors over a wide area. The processing includes single camera tracking and multi-camera fusion. Each single camera detects and tracks humans in its own view and a multi-camera fusion module combines all the local tracks of the same human into a global track. W...

Vehicle Fingerprinting for Reacquisition and Tracking in Videos

Article

Jan 2005

Visual recognition of objects through multiple observations is an important component of object tracking. We address the problem of vehicle matching when multiple observations of a vehicle are separated in time such that frames of observations are not contiguous, thus prohibiting the use of standard frame-to-frame data association. We employ featur...

Linear Model Hashing and Batch RANSAC

Article

Jul 2004

This paper proposes a joint feature-based model indexing and geometric constraint based alignment pipeline for efficient and accurate recognition of 3D objects from a large model database. Traditional approaches either first prune the model database using indexing without geometric alignment or directly perform recognition based alignment. The inde...

Partial Object Matching with Shapeme Histograms

Conference Paper

May 2004

Histograms of shape signature or prototypical shapes, called shapemes, have been used effectively in previous work for 2D/3D shape matching and recognition. We extend the idea of shapeme histogram to recognize partially observed query objects from a database of complete model objects. We propose to represent each model object as a collection of shapem...

Depth map compression for real-time view-based rendering

Article

May 2004

Realistic and interactive telepresence has been a hot research topic in recent years. Enabling telepresence using depth-based new view rendering requires the compression and transmission of video as well as dynamic depth maps from multiple cameras. The telepresence application places additional requirements on the compressed representation of depth...

Linear Model Hashing and Batch RANSAC for Rapid and Accurate Object Recognition.

Conference Paper

Full-text available

Jan 2004

Measuring the Similarity of Two Image Sequences

Article

Jan 2004

We propose a novel similarity measure of two image sequences based on shapeme histograms. The idea of shapeme histogram has been used for single image/texture recognition, but is used here to solve the sequence-to-sequence matching problem. We develop techniques to represent each sequence as a set of shapeme histograms, which captures different va...

Robust Video Georegistration in the Presence of Significant Appearance Changes

Article

Jan 2003

Video information can provide an inexpensive source of information about the world. For many applications such as surveillance, situation awareness and navigation, the utility of this video information is increased if we are able to assign precise geocoordinates to the pixels in the video acquired from an airborne platform. Many video-capture platf...

Immersive remote monitoring of urban sites

Article

Aug 2002

In a typical security and monitoring system a large number of networked cameras are installed at fixed positions around a site under surveillance. There is generally no global view or map that shows the guard how the views of different cameras relate to one another. Individual cameras may be equipped with pan, tilt and zoom capabilities, and the gu...

Is Super-Resolution with Optical Flow Feasible?

Conference Paper

May 2002

Reconstruction-based super-resolution from motion video has been an active area of study in computer vision and video analysis. Image alignment is a key component of super-resolution algorithms. Almost all previous super-resolution algorithms have assumed that standard methods of image alignment can provide accurate enough alignment for creating su...

Automated Mosaics via Topology Inference.

Article

Apr 2002

This article presents a complete approach for automated construction of mosaics from images and video, constituting a practical end-to-end system. Local alignment of spatially overlapping frames followed by global consistency provides spatial continuity, while compositing via multiresolution blending provides photometric continuity, so that the mos...

Object tracking with Bayesian estimation of dynamic layer representations

Article

Full-text available

Feb 2002

Decomposing video frames into coherent 2D motion layers is a powerful method for representing videos. Such a representation provides an intermediate description that enables applications such as object tracking, video summarization and visualization, video insertion, and sprite-based video compression. Previous work on motion layer analysis has lar...

A depth map representation for real-time transmission and view-based rendering of a dynamic 3D scene

Conference Paper

Feb 2002

View-based 3D video streaming requires the compression and transmission of depth maps along with the video sequence. The requirements for representing these depth maps include moderately high compression, preservation of depth discontinuities, low complexity decoding, and a form that is suitable for real-time rendering using graphics cards...

Object Tracking with Bayesian Estimation of Dynamic Layer Representations.

Article

Jan 2002

Decomposing video frames into coherent two-dimensional motion layers is a powerful method for representing videos. Such a representation provides an intermediate description that enables applications such as object tracking, video summarization and visualization, video insertion, and sprite-based video compression. Previous work on motion layer ana...

Video Flashlights: Real Time Rendering of Multiple Videos for Immersive Model Visualization.

Conference Paper

Full-text available

Jan 2002

Videos and 3D models have traditionally existed in separate worlds and as distinct representations. Although texture maps for 3D models have been traditionally derived from multiple still images, real-time mapping of live videos as textures on 3D models has not been attempted. This paper presents a system for rendering multiple live videos in real-...

Super-Fusion: A Super-Resolution Method Based on Fusion.

Conference Paper

Full-text available

Jan 2002

Reconstruction-based super-resolution algorithms require very accurate alignment and good choice of filters to be effective. Often these requirements are hard to satisfy, for example, when we adopt optical flow as the motion model. In addition, the condition of having enough sub-samples may vary from pixel to pixel. We propose an alternative super-...