Caroline Pantofaru
Google Inc.
About
39 Publications · 19,124 Reads
3,483 Citations
Additional affiliations
- Researcher
Publications (39)
The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cue...
We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization. We define egocentric FOV localization as capturing the visual information from a person's field-of-view in a given environment and transferring this information onto a reference corpus...
Understanding group activities from images is an important yet challenging task. This is because there is an exponentially large number of semantic and geometrical relationships among individuals that one must model in order to effectively recognize and localize the group activities. Rather than focusing on directly recognizing group activities as...
A method for designing a system on a programmable logic device (PLD) is disclosed. Routing resources are selected for a user specified signal on the PLD in response to user specified routing constraints. Routing resources are selected for a non-user specified signal on the PLD without utilizing the user specified routing constraints.
Truly understanding a scene involves integrating information at multiple levels as well as studying the interactions between scene elements. Individual object detectors, layout estimators and scene classifiers are powerful but ultimately confounded by complicated real-world scenes with high variability, different viewpoints and occlusions. We propo...
Human body detection and pose estimation is useful for a wide variety of applications and environments. Therefore a human body detection and pose estimation system must be adaptable and customizable. This paper presents such a system that extracts skeletons from RGB-D sensor data. The system adapts on-line to difficult unstructured scenes taken fro...
A method for designing a system on a target device utilizing programmable logic devices (PLDs) includes generating options for utilizing resources on the PLDs in response to user specified constraints. The options for utilizing the resources on the PLDs are refined independent of the user specified constraints.
Recovering the spatial layout of cluttered indoor scenes is a challenging problem. Current methods generate layout hypotheses from vanishing point estimates produced using 2D image features. This method fails in highly cluttered scenes in which most of the image features come from clutter instead of the room's geometric structure. In this paper, we...
In this paper, we present a general framework for tracking multiple, possibly interacting, people from a mobile vision platform. To determine all of the trajectories robustly and in a 3D coordinate system, we estimate both the camera's ego-motion and the people's paths within a single coherent framework. The tracking problem is framed as finding th...
We describe our experience exhibiting a human-size robot in a museum, encouraging visitors to interact with the robot and even program it to perform a sequence of timed poses. At the museum, users' programs were run on a real robot for all to see. The installation attracted and engaged visitors from age two to adult. The most intuitive of our inter...
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At th...
Many modern sensors used for mapping produce 3D point clouds, which are typically registered together using the iterative closest point (ICP) algorithm. Because ICP has many variants whose performances depend on the environment and the sensor, hundreds ...
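As a rough illustration of the ICP family this abstract surveys, a minimal point-to-point variant can be sketched as follows (a generic sketch with illustrative names, not the registration pipeline evaluated in the paper):

```python
import numpy as np

def icp(src, dst, iters=30):
    """Minimal point-to-point ICP aligning src (N,2) onto dst (M,2).

    Returns the accumulated rotation, translation, and the aligned points.
    Illustrative only: brute-force nearest neighbours, no outlier rejection.
    """
    src = src.astype(float).copy()
    R_total, t_total = np.eye(2), np.zeros(2)
    for _ in range(iters):
        # 1. Nearest-neighbour correspondences (brute force).
        d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[d.argmin(axis=1)]
        # 2. Best-fit rigid transform for these matches (Kabsch / SVD).
        mu_s, mu_d = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        # 3. Apply and accumulate the transform.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src
```

Real variants differ exactly where the abstract says they do: in the matching step, the error metric, and the outlier handling, which is why performance depends on the sensor and environment.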
Assistive mobile manipulators (AMMs) have the potential to one day serve as surrogates and helpers for people with disabilities, giving them the freedom to perform tasks such as scratching an itch, picking up a cup, or socializing with their families.
The Robots for Humanity project aims to enable people with severe motor impairments to interact with their own bodies and their environment through the use of an assistive mobile manipulator, thereby improving their quality of life. Assistive mobile manipulators (AMMs) are mobile robots that physically manipulate the world in order to provide assis...
Home and automation are not natural partners--one homey and the other cold. Most current automation in the home is packaged in the form of appliances. To better understand the current reality and possible future of living with other types of domestic technology, we went out into the field to conduct need finding interviews among people who have alr...
Technologists have long wanted to put robots in the home, making robots truly personal and present in every aspect of our lives. It has not been clear, however, exactly what these robots should do in the home. The difficulty of tasking robots with home chores comes not only from the significant technical challenges, but also from the strong emotion...
The goal of personal robotics is to create machines that help us with the tasks of daily living, co-habiting with us in our homes and offices. These robots must interact with people on a daily basis, navigating with and around people, and approaching people to serve them. To enable this coexistence, personal robots must be able to detect and track...
The consideration of data set design, collection, and distribution methodology is becoming increasingly important as robots move out of fully controlled settings, such as assembly lines, into unstructured environments. Extensive knowledge bases and data sets will potentially offer a means of coping with the variability inherent in the real world. I...
As autonomous personal robots come of age, we expect certain applications to be executed with a high degree of repeatability and robustness. In order to explore these applications and their challenges, we need tools and strategies that allow us to develop them rapidly. Serving drinks (i.e., locating, fetching, and delivering), is one such applicati...
The information available to a robot through a variety of sensors and contextual awareness is rich and unique. In this paper, we have argued that depth and context can improve frontal face detection, in turn improving the ability of robots to interact with humans, and supported this claim with encouraging preliminary experimental results. As future...
Personal robots operate in human environments such as homes and offices, co-habiting with people. To effectively train robot algorithms for such scenarios, a large amount of training data containing both people and the environment is required. Collecting such data involves taking a robot into new environments, observing and interacting with people....
The communication bottleneck between robots and people presents an enormous challenge to the human-robot interaction community. In addition to robot object learning, task learning, and natural language understanding, this paper proposes designing interfaces that make up for low communication bandwidth by thoughtfully accounting for the constrained capabilities of...
As robots enter the everyday physical world of people, it is important that they abide by society's unspoken social rules such as respecting people's personal spaces. In this paper, we explore issues related to human personal space around robots, beginning with a review of the existing literature in human-robot interaction regarding the dimensions...
The joint tasks of object recognition and object segmentation from a single image are complex in their requirement of not only correct classification, but also deciding exactly which pixels belong to the object. Exploring all possible pixel subsets is prohibitively expensive, leading to recent approaches which use unsupervised image segmentation to...
Unsupervised image segmentation is an important component in many image understanding algorithms and practical vision systems. However, evaluation of segmentation algorithms thus far has been largely subjective, leaving a system designer to judge the effectiveness of a technique based only on intuition and results in the form of a few example segme...
A popular approach to problems in image classification is to represent the image as a bag of visual words and then employ a classifier to categorize the image. Unfortunately, a significant shortcoming of this approach is that the clustering and classification are disconnected. Since the clustering into visual words is unsupervised, the represen...
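The bag-of-visual-words representation this abstract critiques can be sketched in its quantisation step (assuming a codebook already learned by unsupervised clustering; all names are illustrative):

```python
import numpy as np

def visual_word_histogram(descriptors, codebook):
    """Quantise local descriptors (N,D) against a codebook (K,D) and
    return a normalised bag-of-visual-words histogram of length K."""
    # Hard-assign each descriptor to its nearest visual word.
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The disconnect the abstract points out is visible here: the codebook is fixed before any class labels are seen, so the histogram may discard exactly the distinctions the downstream classifier needs.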
The continual improvement of object recognition systems has resulted in an increased demand for their application to problems which require an exact pixel-level object segmentation. In this paper, we illustrate an example of an object class recognition and segmentation system which is trained using weakly supervised training data, with the goal of...
We introduce a method for object class detection and localization which combines regions generated by image segmentation with local patches. Region-based descriptors can model and match regular textures reliably, but fail on parts of the object which are textureless. They also cannot repeatably identify interest points on their boundaries. By incor...
Despite significant advances in image segmentation techniques, evaluation of these techniques thus far has been largely subjective. Typically, the effectiveness of a new algorithm is demonstrated only by the presentation of a few segmented images and is otherwise left to subjective evaluation by the reader. Little effort has been spent on the desig...
Unsupervised image segmentation algorithms have matured to the point where they generate reasonable segmentations, and thus can begin to be incorporated into larger systems. A system designer now has an array of available algorithm choices, however, few objective numerical evaluations exist of these segmentation algorithms. As a first step...
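One example of the kind of objective numerical measure these evaluation papers call for is a region-overlap score between two segmentations; a minimal sketch (illustrative only, not one of the metrics proposed in the papers):

```python
import numpy as np

def mean_region_iou(seg_a, seg_b):
    """Score agreement between two integer label maps of equal shape:
    match each region in seg_a to its best-overlapping region in seg_b
    by intersection-over-union, then average over seg_a's regions."""
    scores = []
    for la in np.unique(seg_a):
        a = seg_a == la
        best = 0.0
        for lb in np.unique(seg_b):
            b = seg_b == lb
            inter = np.logical_and(a, b).sum()
            union = np.logical_or(a, b).sum()
            best = max(best, inter / union)
        scores.append(best)
    return float(np.mean(scores))
```

A score of 1.0 means every region is reproduced exactly; comparing an algorithm's output against human ground-truth segmentations with such a measure replaces the subjective side-by-side inspection the abstracts criticise.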
This paper addresses the problem of extracting information from range and color data acquired by a mobile robot in urban environments. Our approach extracts geometric structures from clouds of 3-D points and regions from the corresponding color images, labels them based on prior models of the objects expected in the environment - buildings in the c...
Work on real-time hand-gesture recognition for SAVI (stereo active vision interface) is presented. Based on the detection of frontal faces, image regions near the face are searched for the existence of skin-tone blobs. Each blob is evaluated to determine if it is a hand held in a standard pose. A verification algorithm based on the responses of...
A real-time vision system called SAVI is presented which detects faces in cluttered environments and performs particular active control tasks based on changes in the visual field. It is designed as a Perception-Action-Cycle (PAC), processing sensory data of different kinds and qualities in real-time. Hence, the system is able to react instantaneo...