Caroline Pantofaru

Google Inc.

About

39
Publications
19,124
Reads
3,483
Citations
Additional affiliations
Position
  • Researcher

Publications (39)
Conference Paper
The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cue...
Preprint
The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cue...
Preprint
We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization. We define egocentric FOV localization as capturing the visual information from a person's field-of-view in a given environment and transferring this information onto a reference corpus...
Conference Paper
Full-text available
Understanding group activities from images is an important yet challenging task. This is because there is an exponentially large number of semantic and geometrical relationships among individuals that one must model in order to effectively recognize and localize the group activities. Rather than focusing on directly recognizing group activities as...
Patent
A method for designing a system on a programmable logic device (PLD) is disclosed. Routing resources are selected for a user specified signal on the PLD in response to user specified routing constraints. Routing resources are selected for a non-user specified signal on the PLD without utilizing the user specified routing constraints.
Article
Full-text available
Truly understanding a scene involves integrating information at multiple levels as well as studying the interactions between scene elements. Individual object detectors, layout estimators and scene classifiers are powerful but ultimately confounded by complicated real-world scenes with high variability, different viewpoints and occlusions. We propo...
Article
Human body detection and pose estimation is useful for a wide variety of applications and environments. Therefore a human body detection and pose estimation system must be adaptable and customizable. This paper presents such a system that extracts skeletons from RGB-D sensor data. The system adapts on-line to difficult unstructured scenes taken fro...
Patent
A method for designing a system on a target device utilizing programmable logic devices (PLDs) includes generating options for utilizing resources on the PLDs in response to user specified constraints. The options for utilizing the resources on the PLDs are refined independent of the user specified constraints.
Conference Paper
Recovering the spatial layout of cluttered indoor scenes is a challenging problem. Current methods generate layout hypotheses from vanishing point estimates produced using 2D image features. This method fails in highly cluttered scenes in which most of the image features come from clutter instead of the room's geometric structure. In this paper, we...
Article
Full-text available
In this paper, we present a general framework for tracking multiple, possibly interacting, people from a mobile vision platform. To determine all of the trajectories robustly and in a 3D coordinate system, we estimate both the camera's ego-motion and the people's paths within a single coherent framework. The tracking problem is framed as finding th...
Conference Paper
We describe our experience exhibiting a human-size robot in a museum, encouraging visitors to interact with the robot and even program it to perform a sequence of timed poses. At the museum, users' programs were run on a real robot for all to see. The installation attracted and engaged visitors from age two to adult. The most intuitive of our inter...
Conference Paper
Full-text available
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At th...
Article
Full-text available
Many modern sensors used for mapping produce 3D point clouds, which are typically registered together using the iterative closest point (ICP) algorithm. Because ICP has many variants whose performances depend on the environment and the sensor, hundreds ...
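The iterative closest point (ICP) algorithm named in the abstract above can be sketched minimally. This is a generic point-to-point variant on toy data, not the paper's protocol: each iteration matches every source point to its nearest destination point, then solves the best rigid transform in closed form (Kabsch/SVD). It assumes the clouds start roughly aligned, as real ICP does.

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP iteration: match, then solve a rigid transform."""
    # Match each source point to its nearest destination point.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    # Closed-form rigid transform (Kabsch/SVD) aligning src to its matches.
    mu_s, mu_m = src.mean(0), matched.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_m))
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:  # guard against reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    t = mu_m - R @ mu_s
    return R, t

def icp(src, dst, iters=20):
    """Iterate match-and-align, accumulating the total rigid transform."""
    dim = src.shape[1]
    R_total, t_total = np.eye(dim), np.zeros(dim)
    cur = src.copy()
    for _ in range(iters):
        R, t = icp_step(cur, dst)
        cur = cur @ R.T + t
        # Compose: p -> R @ (R_total @ p + t_total) + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

The abstract's point is that this basic loop has hundreds of variants (matching strategy, outlier rejection, error metric), whose performance depends on the sensor and environment.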
Article
Full-text available
Assistive mobile manipulators (AMMs) have the potential to one day serve as surrogates and helpers for people with disabilities, giving them the freedom to perform tasks such as scratching an itch, picking up a cup, or socializing with their families.
Conference Paper
The Robots for Humanity project aims to enable people with severe motor impairments to interact with their own bodies and their environment through the use of an assistive mobile manipulator, thereby improving their quality of life. Assistive mobile manipulators (AMMs) are mobile robots that physically manipulate the world in order to provide assis...
Conference Paper
Home and automation are not natural partners--one homey and the other cold. Most current automation in the home is packaged in the form of appliances. To better understand the current reality and possible future of living with other types of domestic technology, we went out into the field to conduct need finding interviews among people who have alr...
Conference Paper
Full-text available
Technologists have long wanted to put robots in the home, making robots truly personal and present in every aspect of our lives. It has not been clear, however, exactly what these robots should do in the home. The difficulty of tasking robots with home chores comes not only from the significant technical challenges, but also from the strong emotion...
Conference Paper
Full-text available
The goal of personal robotics is to create machines that help us with the tasks of daily living, co-habiting with us in our homes and offices. These robots must interact with people on a daily basis, navigating with and around people, and approaching people to serve them. To enable this coexistence, personal robots must be able to detect and track...
Article
Full-text available
The consideration of data set design, collection, and distribution methodology is becoming increasingly important as robots move out of fully controlled settings, such as assembly lines, into unstructured environments. Extensive knowledge bases and data sets will potentially offer a means of coping with the variability inherent in the real world. I...
Conference Paper
Full-text available
As autonomous personal robots come of age, we expect certain applications to be executed with a high degree of repeatability and robustness. In order to explore these applications and their challenges, we need tools and strategies that allow us to develop them rapidly. Serving drinks (i.e., locating, fetching, and delivering), is one such applicati...
Conference Paper
Full-text available
The information available to a robot through a variety of sensors and contextual awareness is rich and unique. In this paper, we have argued that depth and context can improve frontal face detection, in turn improving the ability of robots to interact with humans, and supported this claim with encouraging preliminary experimental results. As future...
Conference Paper
Full-text available
Personal robots operate in human environments such as homes and offices, co-habiting with people. To effectively train robot algorithms for such scenarios, a large amount of training data containing both people and the environment is required. Collecting such data involves taking a robot into new environments, observing and interacting with people....
Conference Paper
Full-text available
The communication bottleneck between robots and people presents an enormous challenge to the human-robot interaction community. Thus, in addition to robot object learning, task learning, and natural language understanding, this paper proposes designing interfaces that make up for low communication bandwidth by thoughtfully accounting for the constrained capabilities of...
Conference Paper
Full-text available
As robots enter the everyday physical world of people, it is important that they abide by society's unspoken social rules such as respecting people's personal spaces. In this paper, we explore issues related to human personal space around robots, beginning with a review of the existing literature in human-robot interaction regarding the dimensions...
Conference Paper
Full-text available
The joint tasks of object recognition and object segmentation from a single image are complex in their requirement of not only correct classification, but also deciding exactly which pixels belong to the object. Exploring all possible pixel subsets is prohibitively expensive, leading to recent approaches which use unsupervised image segmentation to...
Article
Full-text available
Unsupervised image segmentation is an important component in many image understanding algorithms and practical vision systems. However, evaluation of segmentation algorithms thus far has been largely subjective, leaving a system designer to judge the effectiveness of a technique based only on intuition and results in the form of a few example segme...
Conference Paper
Full-text available
A popular approach to problems in image classification is to represent the image as a bag of visual words and then employ a classifier to categorize the image. Unfortunately, a significant shortcoming of this approach is that the clustering and classification are disconnected. Since the clustering into visual words is unsupervised, the represen...
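The bag-of-visual-words pipeline that the abstract above critiques can be sketched in a few lines: cluster local descriptors into a vocabulary with k-means, then represent each image as a histogram of word occurrences for a downstream classifier. This is a minimal illustration of the generic pipeline, not the paper's method (whose point is precisely that this unsupervised quantization is disconnected from the classifier).

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20):
    """Cluster pooled local descriptors into k visual words with plain k-means."""
    # Deterministic, evenly spaced initialization (adequate for this sketch).
    centers = descriptors[:: max(1, len(descriptors) // k)][:k].astype(float)
    for _ in range(iters):
        d2 = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def bow_histogram(image_descriptors, vocabulary):
    """Quantize one image's descriptors to nearest words; return a normalized histogram."""
    d2 = ((image_descriptors[:, None] - vocabulary[None]) ** 2).sum(-1)
    words = d2.argmin(1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

In practice the descriptors would be SIFT-like local features and k would be in the hundreds or thousands; the histograms then feed an SVM or similar classifier.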
Conference Paper
Full-text available
The continual improvement of object recognition systems has resulted in an increased demand for their application to problems which require an exact pixel-level object segmentation. In this paper, we illustrate an example of an object class recognition and segmentation system which is trained using weakly supervised training data, with the goal of...
Conference Paper
Full-text available
We introduce a method for object class detection and localization which combines regions generated by image segmentation with local patches. Region-based descriptors can model and match regular textures reliably, but fail on parts of the object which are textureless. They also cannot repeatably identify interest points on their boundaries. By incor...
Conference Paper
Full-text available
Despite significant advances in image segmentation techniques, evaluation of these techniques thus far has been largely subjective. Typically, the effectiveness of a new algorithm is demonstrated only by the presentation of a few segmented images and is otherwise left to subjective evaluation by the reader. Little effort has been spent on the desig...
Technical Report
Full-text available
Unsupervised image segmentation algorithms have matured to the point where they generate reasonable segmentations, and thus can begin to be incorporated into larger systems. A system designer now has an array of available algorithm choices, however, few objective numerical evaluations exist of these segmentation algorithms. As a first step...
Conference Paper
Full-text available
This paper addresses the problem of extracting information from range and color data acquired by a mobile robot in urban environments. Our approach extracts geometric structures from clouds of 3-D points and regions from the corresponding color images, labels them based on prior models of the objects expected in the environment - buildings in the c...
Conference Paper
Work on real-time hand-gesture recognition for SAVI (stereo active vision interface) is presented. Based on the detection of frontal faces, image regions near the face are searched for the existence of skin-tone blobs. Each blob is evaluated to determine if it is a hand held in a standard pose. A verification algorithm based on the responses of...
Article
Full-text available
A real-time vision system called SAVI is presented which detects faces in cluttered environments and performs particular active control tasks based on changes in the visual field. It is designed as a Perception-Action-Cycle (PAC), processing sensory data of different kinds and qualities in real-time. Hence, the system is able to react instantaneo...
