# Daniel P. Huttenlocher's research while affiliated with Cornell University and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (113)

This chapter addresses the problem of determining where a photo was taken by estimating a full 6-DOF-plus-intrinsics camera pose with respect to a large geo-registered 3D point cloud, bringing together research on image localization, landmark recognition, and 3D pose estimation. Our method scales to datasets with hundreds of thousands of images and t...

The dramatic growth of social media websites over the last few years has created huge collections of online images and raised new challenges in organizing them effectively. One particularly intuitive way of browsing and searching images is by the geo-spatial location of where on Earth they were taken, but most online images do not have GPS metadata...

Many of the world's most popular websites catalyze their growth through invitations from existing members. New members can then in turn issue invitations, and so on, creating cascades of member signups that can spread on a global scale. Although these diffusive invitation processes are critical to the popularity and growth of many websites, they ha...

The Web has enabled one of the most visible recent developments in education: the deployment of massive open online courses. With their global reach and often staggering enrollments, MOOCs have the potential to become a major new mechanism for learning. Despite this early promise, however, MOOCs are still relatively unexplored and poorly understoo...

Recent work in structure from motion (SfM) has built 3D models from large collections of images downloaded from the Internet. Many approaches to this problem use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the image collection grows, and can suffer from drift or loc...

An increasingly common feature of online communities and social media sites is a mechanism for rewarding user achievements based on a system of badges. Badges are given to users for particular contributions to a site, such as performing a certain number of actions of a given type. They have been employed in many domains, including news sites like t...

Question answering (Q&A) websites are now large repositories of valuable knowledge. While most Q&A sites were initially aimed at providing useful answers to the question asker, there has been a marked shift towards question answering as a community-driven knowledge creation process whose end product can be of enduring value to a broad audience. As...

There are many settings in which users of a social media application provide evaluations of one another. In a variety of domains, mechanisms for evaluation allow one user to say whether he or she trusts another user, or likes the content they produced, or wants to confer special levels of authority or responsibility on them. Earlier work has studie...

The goal of this work is to detect and track the articulated pose of a human in signing videos of more than one hour in length. In particular we wish to accurately localise hands and arms, despite fast motion and a cluttered and changing background. We cast the problem as inference in a generative model of the image, and propose a complete model wh...

A novel tracking algorithm is presented as a computationally feasible, real-time solution to the joint estimation problem of data assignment and dynamic obstacle tracking from a potentially moving robotic platform. The algorithm implements a Rao-Blackwellized particle filter (RBPF) to factorize the joint estimation problem into 1) a data assignment...

We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and...

We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that is not available when considering images in isolation. For...

The spread of influence among individuals in a social network can be naturally modeled in a probabilistic framework, but it is challenging to reason about differences between various models as well as to relate these models to actual social network data. Here we consider two of the most fundamental definitions of influence, one based on a s...

In this paper, we show how to generate a sharp panorama from a set of motion-blurred video frames. Our technique is based on joint global motion estimation and multi-frame deblurring. It also automatically computes the duty cycle of the video, namely the percentage of time between frames that is actually exposure time. The duty cycle is necessary f...

The spread of influence among individuals in a social network can be naturally modeled in a probabilistic framework, but it is challenging to reason about differences between various models as well as to relate these models to actual social network data. Here we consider two of the most fundamental definitions of influence, one based on a small set...

Social media sites are often guided by a core group of committed users engaged in various forms of governance. A crucial aspect of this type of governance is deliberation, in which such a group reaches decisions on issues of importance to the site. Despite its crucial, though subtle, role in how a number of prominent social media sites functi...

Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. In contrast to the bulk of research on social networks that has focused almost exclusively on positive interpretations of links between people, we study how the interplay between positive and negative relationships...

We study online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism). Such a mix of positive and negative links arises in a variety of online settings; we study datasets from Epinions, Slashdot and Wikipedia. We find that the signs...

With the rise of photo-sharing websites such as Facebook and Flickr has come dramatic growth in the number of photographs online. Recent research in object recognition has used such sites as a source of image data, but the test images have been selected and labeled by hand, yielding relatively small validation sets. In this paper we study image cla...

We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analysis based on text tags and image data with structural analysis based on geospatial data. We use the spatial distribution of where people take photos to define a relational...

Many recent techniques for low-level vision problems such as image denoising are formulated in terms of Markov random field (MRF) or conditional random field (CRF) models. Nonetheless, the role of the underlying graph structure is still not well understood. On the one hand there are pairwise structures where each node is connected to its local nei...

We present a technique for learning the parameters of a continuous-state Markov random field (MRF) model of optical flow, by minimizing the training loss for a set of ground-truth images using simultaneous perturbation stochastic approximation (SPSA). The use of SPSA to directly minimize the training loss offers several advantages over most previou...

The goal of this work is to detect hand and arm positions over continuous sign language video sequences of more than one hour in length. We cast the problem as inference in a generative model of the image. Under this model, limb detection is expensive due to the very large number of possible configurations each part can assume. We make the followin...

A fundamental open question in the analysis of social networks is to understand the interplay between similarity and social ties. People are similar to their neighbors in a social network for two distinct reasons: first, they grow to resemble their current friends due to social influence; and second, they tend to form new links to others who are al...

We present a random field based model for stereo vision with explicit occlusion labeling in a probabilistic framework. The model employs non-parametric cost functions that can be learnt automatically using the structured support vector machine. The learning algorithm enables the training of models that are steered towards optimizing for a parti...

This paper presents a method of learning and recognizing generic object categories using part-based spatial models. The models are multiscale, with a scene component that specifies relationships between the object and surrounding scene context, and an object component that specifies relationships between parts of the object. The underlying g...

We present a new class of statistical models for part-based object recognition. These models are explicitly parametrized according to the degree of spatial structure that they can represent. This provides a way of relating different spatial priors that have been used in the past such as joint Gaussian models and tree-structured models. By providing...

Belief propagation (BP) has become widely used for low-level vision problems and various inference techniques have been proposed for loopy graphs. These methods typically rely on ad hoc spatial priors such as the Potts model. In this paper we investigate the use of learned models of image structure, and demonstrate the improvements obtained over...

In this paper we investigate a new method of learning part-based models for visual object recognition, from training data that only provides information about class membership (and not object location or configuration). This method learns both a model of local part appearance and a model of the spatial relations between those parts. In contrast,...

Model-based recognition methods often use ad hoc techniques to decide if a match of data to a model is correct. Generally an empirically determined threshold is placed on the fraction of model features that must be matched. We instead rigorously derive conditions under which to accept a match. We obtain an expression relating the probability of a m...

Affine transformations of the plane have been used by model-based recognition systems to approximate the effects of perspective projection. Because the underlying mathematics are based on exact data, in practice various heuristics are used to adapt the methods to real data where there is positional uncertainty. This paper provides a precise analysis...

The processes by which communities come together, attract new members, and develop over time are a central research issue in the social sciences: political movements, professional organizations, and religious denominations all provide fundamental examples of such communities. In the digital domain, on-line groups are becoming increasingly promine...

Tree structured models have been widely used for determining the pose of a human body, from either 2D or 3D data. While such models can effectively represent the kinematic constraints of the skeletal structure, they do not capture additional constraints such as coordination of the limbs. Tree structured models thus miss an important source of infor...

In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is mod...

Many object recognition systems use a small number of pairings of data and model features to compute the 3D transformation from a model coordinate frame into the sensor coordinate system. With perfect image data, these systems work well. With uncertain image data, however, their performance is less clear. We examine the effects of 2D sensor uncerta...

This paper addresses the problem of segmenting an image into regions. We define a predicate for measuring the evidence for a boundary between two regions using a graph-based representation of the image. We then develop an efficient segmentation algorithm based on this predicate, and show that although this algorithm makes greedy decisions it produc...

This paper addresses the problem of segmenting an image into regions. We define a predicate for measuring the evidence for a boundary between two regions using a graph-based representation of the image. We then develop an efficient segmentation algorithm based on this predicate, and show that although this algorithm makes greedy decisions it produc...

We describe linear-time algorithms for solving a class of problems that involve transforming a cost function on a grid using spatial information. These problems can be viewed as a generalization of classical distance transforms of binary images, where the binary image is replaced by an arbitrary function on a grid. Alternatively they can be viewed...
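As a rough illustration of the idea in the entry above, the sketch below transforms a sampled cost function on a one-dimensional grid in linear time. Because the abstract is truncated, the specific form used here, `d[p] = min over q of (|p - q| + f[q])` with the L1 distance, is an assumption, and the names `dt_l1` and `dt_brute` are hypothetical. It is a sketch of one well-known two-pass scheme, not necessarily the algorithm the paper describes.

```python
INF = float("inf")


def dt_l1(f):
    """Two-pass O(n) transform: d[p] = min over q of (|p - q| + f[q])."""
    d = list(f)
    # Forward pass: propagate the cheapest cost from the left neighbour.
    for i in range(1, len(d)):
        d[i] = min(d[i], d[i - 1] + 1)
    # Backward pass: propagate the cheapest cost from the right neighbour.
    for i in range(len(d) - 2, -1, -1):
        d[i] = min(d[i], d[i + 1] + 1)
    return d


def dt_brute(f):
    """Quadratic reference implementation of the same definition."""
    n = len(f)
    return [min(abs(p - q) + f[q] for q in range(n)) for p in range(n)]


# Classical case: a binary image encoded as 0 on feature cells, infinity elsewhere.
binary = [INF, INF, 0, INF, INF, INF, 0]
# Generalized case: an arbitrary cost at every grid cell.
costs = [4, 1, 5, 0, 7, 7, 2]

print(dt_l1(binary))
print(dt_l1(costs))
```

With the binary encoding, the result is the ordinary distance to the nearest feature cell; with arbitrary costs, each cell instead trades off distance against the cost of the cell it maps to.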

Markov random field models provide a robust and unified framework for early vision problems such as stereo, optical flow and image restoration. Inference algorithms based on graph cuts and belief propagation yield accurate results, but despite recent advances are often still too slow for practical use. In this paper we present new algorithmic techn...

In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to line...

Usage data at a high-traffic web site can expose information about external events and surges in popularity that may not be accessible solely from analyses of content and link structure. We consider sites that are organized around a set of items available for purchase or download; consider, for example, an e-commerce site or collection of online re...

Usage data at a high-traffic Web site can expose information about external events and surges in popularity that may not be accessible solely from analyses of content and link structure.

Tracking articulated objects in image sequences remains a challenging problem, particularly in terms of the ability to localize the individual parts of an object given self-occlusions and changes in viewpoint. In this paper we propose a two-dimensional spatio-temporal modeling approach that handles both self-occlusions and changes in viewpoint. We...

Usage data at a high-traffic Web site can expose information about external events and surges in popularity that may not be accessible solely from analyses of content and link structure. We consider sites that are organized around a set of items available for purchase or download; consider, for example, an e-commerce site or collection of on-line re...

Object recognition from sensory data involves, in part, determining the pose of a model with respect to a scene. A common method for finding an object's pose is the generalized Hough transform, which accumulates evidence for possible coordinate transformations in a parameter space whose axes are the quantized transformation parameters. Large cluste...

A pictorial structure is a collection of parts arranged in a deformable configuration. Each part is represented using a simple appearance model and the deformable configuration is represented by spring-like connections between pairs of parts. While pictorial structures were introduced a number of years ago, they have not been broadly applied to mat...

This paper addresses the problem of segmenting an image into regions. We develop a framework for image segmentation based on the intuition that there should be evidence for a boundary between each pair of neighboring regions. This framework provides precise definitions of what it means for a segmentation to be too coarse or too fine, in terms of bo...

This paper examines the problem of image retrieval from large, heterogeneous image databases. We present a technique that fulfills several needs identified by surveying recent research in the field. This technique fairly integrates a diverse and expandable set of image properties (for example, color, texture, and location) in a retrieval framework,...

We present a framework for tracking rigid objects based on an adaptive Bayesian recognition technique that incorporates dependencies between object features. At each frame we find a maximum a posteriori (MAP) estimate of the object parameters that include positioning and configuration of non-occluded features. This estimate may be rejected based on...

We have developed a file format which is well-suited for network applications involving images of documents. It is extremely compact and flexible, using the emerging JBIG2 standard and the Mixed Raster Content imaging model. DigiPaper is designed for ease of document storage and interchange: it shares image elements across multiple pages, can be cr...

We present a new graph-theoretic approach to the problem of image segmentation. Our method uses local criteria and yet produces results that reflect global properties of the image. We develop a framework that provides specific definitions of what it means for an image to be under- or over-segmented. We then present an efficient algorithm for computin...

View-based recognition methods, such as those using eigenspace techniques, have been successful for a number of recognition tasks. Such approaches, however, are somewhat limited in their ability to recognize objects that are partly hidden from view or occur against cluttered backgrounds. In order to address these limitations, we have developed a vi...

We show that the dynamic Voronoi diagram of k sets of points in the plane, where each set consists of n points moving rigidly, has complexity $O(n^2 k^2 \lambda_s(k))$ for some fixed $s$, where $\lambda_s(n)$ is the maximum length of an $(n, s)$ Davenport-Schinzel sequence. This improves the result of Aonuma et al., who show an upper bound of $O(n^3 k^4 \log k)$ for the co...

We say that two sets of n points in the plane, P and Q, are views of the same object when there exists a set of n points $S \subset \mathbb{R}^3$ and two distinct planes A and B, such that P is the projection of S onto A and Q is the projection of S onto B. In the case of orthographic projection, we provide an $O(n^3)$ algorithm for deciding whether P and Q are vie...

We consider the problem of computing invariant functions of the image of a set of points or line segments in $\mathbb{R}^3$ under projection. Such functions are in principle useful for machine vision systems, because they allow different images of a given geometric object to be described by an invariant 'key'. We show that if a geometric object consists of an...

We describe a segmentation method and associated file format for storing images of color documents. We separate each page of the document into three layers, containing the background (usually one or more photographic images), the text, and the color of the text. Each of these layers has different properties, making it desirable to use different com...

We introduce an approach to feature-based object recognition, using maximum a posteriori (MAP) estimation under a Markov random field (MRF) model. This approach provides an efficient solution for a wide class of priors that explicitly model dependencies between individual features of an object. These priors capture phenomena such as the fact that...

In this paper we describe a method for using two-dimensional shape information to determine the location of a mobile robot with respect to some visual landmark in the world. The task is for the robot to navigate to a specified target or landmark in its visual field, possibly in the presence of obstacles. The landmark is initially specified either by ma...

This paper gives a method for determining the probability of finding a false positive instance of an object in an image when matching is performed using chains of pixels with associated local information (such as orientation). We model the matching process between a chain of object pixels and the image as a Markov process, where each state represe...

This paper describes techniques to perform efficient and accurate target recognition in difficult domains. In order to accurately model small, irregularly shaped targets, the target objects and images are represented by their edge maps, with a local orientation associated with each edge pixel. Three-dimensional objects are modeled by a set of two-d...

Given two planar sets A and B, we examine the problem of determining the smallest ε such that there is a Euclidean motion (rotation and translation) of A that brings each member of A within distance ε of some member of B. We establish upper bounds on the combinatorial complexity of this subproblem in model-based computer vision, when the sets A and...

In this paper we describe a new recognition method that uses a subspace representation to approximate the comparison of binary images (e.g. intensity edges) using the Hausdorff fraction. The technique is robust to outliers and occlusion, and thus can be used for recognizing objects that are partly hidden from view and occur in cluttered backgroun...

This paper describes an object recognition system for use in complex imagery that can perform recognition adaptively by setting the matching threshold such that the probability of a false positive is low. In order to accurately model small, irregularly shaped objects, we represent the objects using dense sets of edge pixels with associated local or...

In this paper we describe a new recognition method that uses a subspace representation to approximate the comparison of binary images (e.g. intensity edges) using the Hausdorff fraction. The technique is robust to outliers and occlusion, and thus can be used for recognizing objects that are partly hidden from view and occur in cluttered backgrounds...

In this paper we address the problem of recognizing an object from a novel viewpoint, given a single “model” view of that object. As is common in model-based recognition, objects and images are represented as sets of feature points. We present an efficient algorithm for determining whether two sets of image points (in the plane) could be projection...

This paper describes techniques to perform efficient and accurate recognition in difficult domains by matching edge pixels with associated local orientations. We use a modified Hausdorff measure to determine which positions of each object model are reported as matches. A search strategy is described that allows these positions to be found efficient...

This paper describes techniques to perform efficient and accurate recognition in difficult domains by matching dense, oriented edge pixels. We model three-dimensional objects as the set of two-dimensional views of the object. Translation, rotation, and scaling of the views are allowed to approximate full three-dimensional motion. A modified Hausdor...

We describe a method for computing visual correspondence which employs a formal model of the probability of a false match. This model estimates the chance that the best match for each point could have occurred at random. The model is effective at identifying points in one image for which there is no corresponding point in the other image, as occurs...

We describe a method for computing visual correspondence which employs a formal model of the probability of a false match. This model estimates the chance that the best match for each point could have occurred at random. The model is effective at identifying points in one image for which there is no corresponding point in the other image, as occurs...

Affine transformations of the plane have been used in a number of model-based recognition systems. Because the underlying mathematics are based on exact data, in practice various heuristics are used to adapt the methods to real data where there is positional uncertainty. This paper provides a precise analysis of affine point matching under uncertai...

We present a method for navigating a robot from an initial position to a specified landmark in its visual field, using a sequence of monocular images. The location of the landmark with respect to the robot is determined using the change in size and position of the landmark in the image as the robot moves. The landmark location is estimated after th...

This paper describes a method for tracking a moving object in an image, when the camera motion is unknown and other moving objects may be in the image. The method is based on matching two-dimensional geometric structures between successive frames of an image sequence. A bitmap representing the object being tracked at one time frame is matched to fe...

The Hausdorff distance measures the extent to which each point of a model set lies near some point of an image set and vice versa. Thus, this distance can be used to determine the degree of resemblance between two objects that are superimposed on one another. Efficient algorithms for computing the Hausdorff distance between all possible relative po...
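The definition in the entry above can be sketched directly for two finite point sets. This is a brute-force illustration of the distance itself, assuming Euclidean distance between 2D points; it is not the efficient algorithm the abstract refers to, and the function names and sample points are made up for the example.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical data: every model point has an exact match in the image,
# but the image contains one extra point far from the model.
model = [(0.0, 0.0), (1.0, 0.0)]
image = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]

print(directed_hausdorff(model, image))  # model points all lie on image points
print(hausdorff(model, image))           # dominated by the stray image point
```

The two directed distances differ here, which shows why the "and vice versa" in the definition matters: a model can be fully explained by an image while the image still contains points far from the model.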

The Hausdorff distance measures the extent to which each point of a model set lies near some point of an image set and vice versa. An efficient method of computing this distance is developed, based on a multi-resolution tessellation of the space of possible transformations of the model set. One of the key ideas is that entire cells in this tessella...

The authors describe a model-based method for tracking nonrigid objects moving in a complex scene. The method operates by extracting two-dimensional models of an object from a sequence of images. The basic idea underlying the technique is to decompose the image of a solid object moving in space into two components: a two-dimensional motion and a tw...

Given a set $S$ of sources (points or segments) in $\mathbb{R}^d$, we consider the surface in $\mathbb{R}^{d+1}$ that is the graph of the function $d(x) = \min_{p \in S} \rho(x, p)$ for some metric $\rho$. This surface is closely related to the Voronoi diagram, Vor(S), of $S$ under the metric. The upper envelope of a set of these Voronoi surfaces, each defined for a different set of sources, can...

Consider k sets each consisting of n points in the plane, with each set allowed to move rigidly according to some continuous function of time. H. Aonuma, H. Imai, K. Imai and T. Tokuyama [Maximin location of convex objects in a polygon and related dynamic Voronoi diagrams, in: Proc. Sixth. ACM Symp. on Computational Geometry, 225-234 (1990)] have s...

Object recognition systems that use a small number of pairings of data and model features to compute the 3D transformation from model to sensor coordinates are considered. The effects of 2D sensor uncertainty on such computations are examined. The uncertainty in transformation parameters is bounded, and the effect of this uncertainty on false posit...

Efficient algorithms are provided for computing the Hausdorff distance between a binary image and all possible relative positions (translations) of a model, or a portion of that model. The computation is in many ways similar to binary correlation. However, it is more tolerant of perturbations in the locations of points because it measures proximity...

In an image, there are groups of intensity edges that are likely to have resulted from the same convex object in a scene. A new method for identifying such groups is described here. Groups of edges that form a convex polygonal chain, such as a convex polygon or a spiral, are extracted from a set of image edge fragments. A key property of the method...

Model-based recognition methods generally search for geometrically consistent pairs of model and image features. The quality of an hypothesis is then measured using some function of the number of model features that are paired with image features. The most common approach is to simply count the number of pairs of consistent model and image features...

Includes bibliographical references (p. 350-504) and indexes. W. Eric L. Grimson; with contributions from Tomás Lozano-Pérez, Daniel P. Huttenlocher.

Ordinary cameras gather light across the area of their lens aperture, and the light striking a given subregion of the aperture is structured somewhat differently than the light striking an adjacent subregion. By analyzing this optical structure, one ...

An abstract is not available.

Model-based recognition methods generally use ad hoc techniques to decide whether or not a model of an object matches a given scene. The most common such technique is to set an empirically determined threshold on the fraction of model features that must be matched to data features. Conditions under which to accept a match as correct are rigorously...

A method for identifying groups of intensity edges in an image that are likely to result from the same convex object in a scene is described. A key property of the method is that its output is no more complex than the original image. The method uses a triangulation of linear edge segments to define a local neighborhood that is scale invariant. From...

A model-based recognition method that runs in time proportional to the actual number of instances of a model that are found in an image is presented. The key idea is to filter out many of the possible matches without having to explicitly consider each one. This contrasts with the hypothesize-and-test paradigm, commonly used in model-based recogniti...