Active Scene Recognition for Programming by Demonstration using Next-Best-View Estimates from Hierarchical Implicit Shape Models
Abstract
We present an approach that combines passive scene understanding with object search to recognize scenes in indoor environments that cannot be perceived from a single viewpoint. Passive scene recognition is performed with Implicit Shape Models (ISMs), which capture spatial relations between objects. ISMs, a variant of the Generalized Hough Transform, are extended to describe scenes as sets of objects connected by such relations. Relations are expressed as six-Degree-of-Freedom (DoF) relative object poses and are extracted from sensor recordings of humans demonstrating actions that usually take place in the corresponding scene. A single ISM represents only the relations of n objects towards a common reference, so violations of other relations go undetected. To overcome this limitation, we extend our scene models to binary trees of ISMs built by hierarchical agglomerative clustering. Active scene recognition aims to simultaneously detect present scenes and localize the objects these scenes consist of. For a pivoting stereo camera rig, we achieve this by running recognition with hierarchical ISMs inside an object-search loop that uses Next-Best-View (NBV) estimation. Views for the rig are chosen greedily according to the confidence of detecting objects in them. In each search step, confidence about potential positions of objects not yet found is computed from the best available scene hypothesis. This is done by partly reversing the basic principle of ISMs: spatial relations are used to predict potential object positions, starting from objects already detected.
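The two core mechanisms named in the abstract, Hough-style voting of detected objects for a common scene reference and the reverse prediction of missing objects from that reference, can be illustrated with a minimal sketch. This is a hedged toy model, not the paper's implementation: it simplifies the 6-DoF relative poses to 2D poses (x, y, theta), and all names (`vote_reference`, `predict_missing`, the object labels) are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative 2D simplification of ISM voting; the paper uses full
# 6-DoF relative object poses.

def compose(ref, rel):
    """Apply relative pose `rel` in the frame of pose `ref`."""
    x, y, t = ref
    dx, dy, dt = rel
    return (x + dx * math.cos(t) - dy * math.sin(t),
            y + dx * math.sin(t) + dy * math.cos(t),
            t + dt)

def invert(rel):
    """Inverse of a 2D relative pose, so compose(rel, invert(rel)) = identity."""
    dx, dy, dt = rel
    c, s = math.cos(dt), math.sin(dt)
    return (-(c * dx + s * dy), s * dx - c * dy, -dt)

def vote_reference(detections, relations, bin_size=0.2):
    """Generalized-Hough step: each detected object casts a vote for the
    common reference pose; votes are binned and the best bin's poses
    are averaged into a scene hypothesis."""
    votes = [compose(pose, invert(relations[name]))
             for name, pose in detections.items() if name in relations]
    key = lambda v: tuple(round(c / bin_size) for c in v)
    best_bin, _ = Counter(key(v) for v in votes).most_common(1)[0]
    members = [v for v in votes if key(v) == best_bin]
    return tuple(sum(v[i] for v in members) / len(members) for i in range(3))

def predict_missing(ref, relations, detections):
    """Reversed ISM step: project relations from the reference hypothesis
    to predict poses of objects not yet found (used to rate candidate views)."""
    return {name: compose(ref, rel) for name, rel in relations.items()
            if name not in detections}

# Toy scene: three objects with learned relations to a common reference.
relations = {"cup": (0.5, 0.0, 0.0),
             "plate": (0.0, 0.5, 0.0),
             "fork": (0.3, 0.2, 0.1)}
true_ref = (1.0, 2.0, math.pi / 2)
detections = {n: compose(true_ref, relations[n]) for n in ("cup", "plate")}

ref_hypothesis = vote_reference(detections, relations)
predicted = predict_missing(ref_hypothesis, relations, detections)
```

In an active-search loop of the kind the abstract describes, `predicted["fork"]` would then be used to score candidate camera views by how likely they are to contain the still-missing object.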