-
ABSTRACT: We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learnt in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models: the Latent Dirichlet Allocation (LDA) mixture model, the Hierarchical Dirichlet Process (HDP) mixture model, and the Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking and human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest, such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.
IEEE Transactions on Pattern Analysis and Machine Intelligence 04/2009; 31(3):539-55. · 4.91 Impact Factor
-
ABSTRACT: This paper proposes a computational system of object categorization based on decomposition and adaptive fusion of visual information. A coupled Conditional Random Field is developed to model the interaction between low level cues of contour and texture, and to decompose contour and texture in natural images. The advantages of using coupled rather than single-layer Random Fields are demonstrated with model learning and evaluation. Multiple decomposed visual cues are adaptively combined for object categorization to fully leverage different discriminative cues for different classes. Experimental results show that the proposed computational model of "recognition-through-decomposition-and-fusion" achieves better performance than most of the state-of-the-art methods, especially when only a limited number of training samples are available.
Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on; 07/2008
-
ABSTRACT: In this work, we describe a white matter trajectory clustering algorithm that allows for incorporating and appropriately weighting anatomical information. The influence of the anatomical prior reflects confidence in its accuracy and relevance. It can either be defined by the user or it can be inferred automatically. After a detailed description of our novel clustering framework, we demonstrate its properties through a set of preliminary experiments.
Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on; 07/2008
-
ABSTRACT: We propose a Bayesian approach to incorporate anatomical information in the clustering of fiber trajectories. An expectation-maximization (EM) algorithm is used to cluster the trajectories, in which an atlas serves as the prior on the labels. The atlas guides the clustering algorithm and makes the resulting bundles anatomically meaningful. In addition, it provides the seed points for the tractography and initial settings of the EM algorithm. The proposed approach provides a robust and automated tool for tract-oriented analysis both in a single subject and over a population.
Biomedical Imaging: From Nano to Macro, 2008. ISBI 2008. 5th IEEE International Symposium on; 06/2008
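The idea of an atlas acting as a prior on the labels inside EM can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: each trajectory is reduced to a single scalar feature, every cluster is a Gaussian with a fixed shared variance, and the function name `em_with_atlas_prior` is hypothetical.

```python
import math


def em_with_atlas_prior(features, atlas_prior, init_means, variance=1.0, iterations=20):
    """Cluster scalar features with EM, using a per-item label prior.

    features    : one number per trajectory (a stand-in for a real trajectory descriptor)
    atlas_prior : atlas_prior[i][j] is the prior probability that item i has label j
    init_means  : starting cluster means (in the abstract, the atlas also supplies
                  the initial settings; here they are simply passed in)
    """
    means = list(init_means)
    k = len(means)
    resp = []
    for _ in range(iterations):
        # E-step: posterior label probabilities = prior * Gaussian likelihood, normalized.
        resp = []
        for x, prior in zip(features, atlas_prior):
            scores = [prior[j] * math.exp(-0.5 * (x - means[j]) ** 2 / variance)
                      for j in range(k)]
            total = sum(scores)
            if total == 0.0:
                # Likelihood underflowed everywhere: fall back to the prior alone.
                resp.append(list(prior))
            else:
                resp.append([s / total for s in scores])
        # M-step: each mean becomes the responsibility-weighted average of the features.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            if mass > 0.0:
                means[j] = sum(r[j] * x for r, x in zip(resp, features)) / mass
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels


# Two obvious groups plus one ambiguous item (3.0) that the prior has to decide.
features = [0.9, 1.1, 1.0, 4.9, 5.1, 3.0]
flat = [0.5, 0.5]
prior_a = [flat, flat, flat, flat, flat, [0.9, 0.1]]
prior_b = [flat, flat, flat, flat, flat, [0.1, 0.9]]
_, labels_a = em_with_atlas_prior(features, prior_a, init_means=[0.0, 6.0])
_, labels_b = em_with_atlas_prior(features, prior_b, init_means=[0.0, 6.0])
```

In this toy setup the ambiguous item follows whichever label the prior favors, which is the sense in which a prior can steer the clustering toward anatomically meaningful bundles.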
-
ABSTRACT: In the traditional mixture of Gaussians background model, the generating process of each pixel is modeled as a mixture of Gaussians over color. Unfortunately, this model performs poorly when the background consists of dynamic textures such as trees waving in the wind and rippling water. To address this deficiency, researchers have recently looked to more complex and/or less compact representations of the background process. We propose a generalization of the MoG model that handles dynamic textures. In the context of background modeling, we achieve better, more accurate segmentations than the competing methods, using a model whose complexity grows with the underlying complexity of the scene (as any good model should), rather than the amount of time required to observe all aspects of the texture.
Applications of Computer Vision, 2008. WACV 2008. IEEE Workshop on; 02/2008
-
ABSTRACT: We introduce an algorithm for segmenting brain magnetic resonance (MR) images into anatomical compartments such as the major tissue classes and neuro-anatomical structures of the gray matter. The algorithm is guided by prior information represented within a tree structure. The tree mirrors the hierarchy of anatomical structures and the subtrees correspond to limited segmentation problems. The solution to each problem is estimated via a conventional classifier. Our algorithm can be adapted to a wide range of segmentation problems by modifying the tree structure or replacing the classifier. We evaluate the performance of our new segmentation approach by revisiting a previously published statistical group comparison between first-episode schizophrenia patients, first-episode affective psychosis patients, and comparison subjects. The original study is based on 50 MR volumes in which an expert identified the brain tissue classes as well as the superior temporal gyrus, amygdala, and hippocampus. We generate analogous segmentations using our new method and repeat the statistical group comparison. The results of our analysis are similar to the original findings, except for one structure (the left superior temporal gyrus) in which a trend-level statistical significance (p = 0.07) was observed instead of statistical significance.
IEEE Transactions on Medical Imaging 10/2007; · 3.64 Impact Factor
-
ABSTRACT: Intensity-based classification of MR images has proven problematic, even when advanced techniques are used. Intra-scan and inter-scan intensity inhomogeneities are a common source of difficulty. While reported methods have had some success in correcting intra-scan inhomogeneities, such methods require supervision for the individual scan. This paper describes a new method called adaptive segmentation that uses knowledge of tissue intensity properties and intensity inhomogeneities to correct and segment MR images. Use of the EM algorithm leads to a fully automatic method that allows for more accurate segmentation of tissue types as well as better visualization of MRI data, and that has proven effective in a study that includes more than 1000 brain scans.
10/2006: pages 57-69;
-
ABSTRACT: Despite its potential for visualizing white matter fiber tracts in vivo, diffusion tensor tractography has found only limited applications in clinical research in which specific anatomic connections between distant regions need to be evaluated. We introduce a robust method for fiber clustering that guides the separation of anatomically distinct fiber tracts and enables further estimation of anatomic connectivity between distant brain regions.
Line scanning diffusion tensor images (LSDTI) were acquired on a 1.5T magnet. Regions of interest for several anatomically distinct fiber tracts were manually drawn; then, white matter tractography was performed by using the Runge-Kutta method to interpolate paths (fiber traces) following the major directions of diffusion, in which traces were seeded only within the defined regions of interest. Next, a fully automatic procedure was applied to fiber traces, grouping them according to a pairwise similarity function that takes into account the shapes of the fibers and their spatial locations.
We demonstrated the ability of the clustering algorithm to separate several fiber tracts which are otherwise difficult to define (left and right fornix, uncinate fasciculus and inferior occipitofrontal fasciculus, and corpus callosum fibers).
This method successfully delineates fiber tracts that can be further analyzed for clinical research purposes. Hypotheses regarding specific fiber connections and their abnormalities in various neuropsychiatric disorders can now be tested.
American Journal of Neuroradiology 06/2006; 27(5):1032-6. · 2.93 Impact Factor
-
ABSTRACT: Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of models of the anatomy that fully capture the possible deformations in each structure. In this paper, we present a method for segmentation of a particularly complex structure, the brain tissue, from MRI. Our method is a combination of three existing techniques from the Computer Vision literature: adaptive segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation from gradient echo images, and the resultant method is more robust than its components. We present the results of a parallel implementation of this method on IBM's supercomputer Power Visualization System for a database of 10 brains each with 256 × 256 × 124 voxels.
04/2006: pages 427-433;
-
G. J. Ettinger, M. E. Leventon, W. E. L. Grimson, R. Kikinis, V. Gugino, W. Cote, L. Sprung, L. Aglio, M. Shenton, G. Potts, E. Alexander
ABSTRACT: We describe functional brain mapping experiments using a transcranial magnetic stimulation (TMS) device. This device, when placed on a subject's scalp, stimulates the underlying neurons by generating focused magnetic field pulses. A brain mapping is then generated by measuring responses of different motor and sensory functions to this stimulation. The key process in generating this mapping is the association of the 3D positions and orientations of the TMS probe on the scalp to a 3D brain reconstruction such as is feasible with a magnetic resonance image (MRI). We have developed a system which not only generates functional brain maps using such a device, but also provides real-time feedback to guide the technician in placing the probe at appropriate points on the head for achieving the desired map resolution. Functional areas we have mapped are the motor and visual cortex. Validation experiments to date have consisted of repeatability and symmetry tests for mapping the same subjects several times. Applications of the technique include neuroanatomy research, surgical planning and guidance, treatment and disease monitoring, and therapeutic procedures.
04/2006: pages 477-486;
-
ABSTRACT: Frameless guidance systems are needed to help surgeons plan exact locations for incisions, define margins of tumors and precisely locate critical structures. We describe an automatic method for registering clinical data, such as segmented MRI or CT, with any view of the patient, demonstrated on neurosurgery examples. The method enables mixing live video of the patient with the segmented 3D MRI or CT model, supporting enhanced reality techniques for planning and guiding procedures, and for interactively, non-intrusively viewing internal structures. We detail a computational evaluation of the method's performance, and clinical experiments using the system in actual neurosurgical cases.
04/2006: pages 1-12;
-
ABSTRACT: In this paper, we propose an approach to vehicle classification under a mid-field surveillance framework. We develop a repeatable and discriminative feature based on edge points and modified SIFT descriptors, and introduce a rich representation for object classes. Experimental results show the proposed approach is promising for vehicle classification in surveillance videos despite great challenges such as limited image size and quality and large intra-class variations. Comparisons demonstrate the proposed approach outperforms other methods.
Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on; 11/2005
-
ABSTRACT: We present an approach for inferring the topology of a camera network by measuring statistical dependence between observations in different cameras. Two cameras are considered connected if objects seen departing in one camera are seen arriving in the other. This is captured by the degree of statistical dependence between the cameras. The nature of dependence is characterized by the distribution of observation transformations between cameras, such as departure to arrival transition times, and color appearance. We show how to measure statistical dependence when the correspondence between observations in different cameras is unknown. This is accomplished by non-parametric estimates of statistical dependence and Bayesian integration of the unknown correspondence. Our approach generalizes previous work which assumed restricted parametric transition distributions and only implicitly dealt with unknown correspondence. Results are shown on simulated and real data. We also describe a technique for learning the absolute locations of the cameras with Global Positioning System (GPS) side information.
Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on; 11/2005
-
ABSTRACT: We present an algorithm to estimate the body pose of a walking person given synchronized video input from multiple uncalibrated cameras. We construct an appearance model of human walking motion by generating examples from the space of body poses and camera locations, and clustering them using expectation-maximization. Given a segmented input video sequence, we find the closest matching appearance cluster for each silhouette and use the sequence of matched clusters to extrapolate the position of the camera with respect to the person's direction of motion. For each frame, the matching cluster also provides an estimate of the walking phase. We combine these estimates from all views and find the most likely sequence of walking poses using a cyclical, feed-forward hidden Markov model. Our algorithm requires no manual initialization and no prior knowledge about the locations of the cameras.
Computer Vision and Pattern Recognition Workshop, 2004 Conference on; 07/2004
-
ABSTRACT: High quality segmentation of brain MR images is a challenging task. To deal with this problem, many automatic segmentation methods rely on atlas information of anatomical structures. We will further investigate this line of research by introducing hierarchical representations of anatomical structures in an expectation-maximization-like framework. This new approach enables us to divide a complex segmentation scenario into less difficult sub-problems, reducing the scenario's statistical complexity. We will demonstrate the method's strength by segmenting a set of brain MR images into 31 different anatomical structures as well as comparing it to other methods.
Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on; 05/2004
-
ABSTRACT: A novel method of incorporating shape information into the image segmentation process is presented. We introduce a representation for deformable shapes and define a probability distribution over the variances of a set of training shapes. The segmentation process embeds an initial curve as the zero level set of a higher dimensional surface, and evolves the surface such that the zero level set converges on the boundary of the object to be segmented. At each step of the surface evolution, we estimate the maximum a posteriori (MAP) position and shape of the object in the image, based on the prior shape information and the image information. We then evolve the surface globally, towards the MAP estimate, and locally, based on image gradients and curvature. Results are demonstrated on synthetic data and medical imagery, in 2D and 3D.
Biomedical Imaging, 2002. 5th IEEE EMBS International Summer School on; 07/2002
-
ABSTRACT: A method is presented for segmentation of anatomical structures that incorporates prior information about the intensity and curvature profile of the structure from a training set of images and boundaries. Specifically, we model the intensity distribution as a function of signed distance from the object boundary, instead of modeling only the intensity of the object as a whole. A curvature profile acts as a boundary regularization term specific to the shape being extracted, as opposed to simply penalizing high curvature. Using the prior model, the segmentation process estimates a maximum a posteriori higher dimensional surface whose zero level set converges on the boundary of the object to be segmented. Segmentation results are demonstrated on synthetic data and magnetic resonance imagery.
Biomedical Imaging, 2002. 5th IEEE EMBS International Summer School on; 07/2002
-
ABSTRACT: This paper describes a set of representations of gait appearance features for the purpose of person identification. Our gait representation has two stages: the first stage computes a set of image features that are based on moments extracted from orthogonal view video silhouettes of human walking motion; the second stage applies three methods of aggregating these image features over time to create the gait sequence features. Despite their simplicity, the resulting gait sequence feature vectors contain enough information to perform well on human identification. We demonstrate the accuracy of recognition using gait video sequences collected over different days and times, under varying lighting environments and explore the differences in the three time-aggregation methods for the purpose of recognition.
12/2001: pages 143-154;
-
ABSTRACT: Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.
IEEE Transactions on Pattern Analysis and Machine Intelligence 09/2000; · 4.91 Impact Factor
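The per-pixel mixture-of-Gaussians background model described in this abstract can be sketched for a single grayscale pixel as follows. This is a simplified illustration, not the authors' implementation: the class name `PixelMoG`, the learning rate `alpha`, the 2.5-sigma match test, the 0.7 background weight fraction, the initial variance, and the variance floor are all assumed values chosen for the example, and a single learning rate is used for both the weight and the Gaussian parameters.

```python
import math


class PixelMoG:
    """Online mixture of Gaussians for one grayscale pixel (illustrative sketch)."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5,
                 bg_fraction=0.7, init_var=100.0, min_var=4.0):
        self.k = k                        # maximum number of Gaussians (assumed)
        self.alpha = alpha                # learning rate (assumed)
        self.match_sigmas = match_sigmas  # match threshold in standard deviations (assumed)
        self.bg_fraction = bg_fraction    # weight mass treated as background (assumed)
        self.init_var = init_var          # variance given to a new Gaussian (assumed)
        self.min_var = min_var            # floor that stops the variance collapsing (assumed)
        self.components = []              # each entry is [weight, mean, variance]

    @staticmethod
    def _rank(c):
        # Components with high weight and low spread are the most background-like.
        return c[0] / math.sqrt(c[2])

    def observe(self, x):
        """Update the model with intensity x; return True if x looks like foreground."""
        matched = None
        for c in sorted(self.components, key=self._rank, reverse=True):
            if abs(x - c[1]) <= self.match_sigmas * math.sqrt(c[2]):
                matched = c
                break
        # Online approximation: decay every weight, then reinforce the matched one.
        for c in self.components:
            c[0] *= 1.0 - self.alpha
        if matched is not None:
            matched[0] += self.alpha
            diff = x - matched[1]
            matched[1] += self.alpha * diff
            matched[2] = max((1.0 - self.alpha) * matched[2] + self.alpha * diff * diff,
                             self.min_var)
        else:
            # No match: start a new low-weight Gaussian, replacing the weakest if full.
            matched = [self.alpha, float(x), self.init_var]
            if len(self.components) < self.k:
                self.components.append(matched)
            else:
                worst = min(range(len(self.components)),
                            key=lambda i: self._rank(self.components[i]))
                self.components[worst] = matched
        total = sum(c[0] for c in self.components)
        for c in self.components:
            c[0] /= total
        # The top-ranked components that together exceed bg_fraction form the background.
        background, cumulative = [], 0.0
        for c in sorted(self.components, key=self._rank, reverse=True):
            background.append(c)
            cumulative += c[0]
            if cumulative > self.bg_fraction:
                break
        return not any(c is matched for c in background)


model = PixelMoG()
for i in range(200):
    model.observe(100 + (i % 3) - 1)   # a stable background flickering between 99 and 101
is_fg_bright = model.observe(200)      # a sudden bright value
is_fg_normal = model.observe(100)      # back to the usual value
```

In this sketch a value that keeps recurring gains weight and is eventually absorbed into the background set, which is one way such a model can adapt to long-term scene changes.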