Abhijit Kundu's research while affiliated with Georgia Institute of Technology and other places

Publications (5)

We present a path-planning algorithm that leverages a multi-scale representation of the environment. The algorithm works in n dimensions. Information about the environment is stored in a tree representing a recursive dyadic partitioning of the search space; the quantity used by the algorithm is the probability that a node of the tree corresponds to an obstacle in the search space. The complexity of the proposed algorithm is analyzed and its completeness is shown.
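A minimal sketch of the kind of data structure the abstract describes: a recursive dyadic partitioning (a 2^n-tree; a quadtree for n = 2) whose nodes store the probability that the corresponding cell is an obstacle. The names (DyadicNode, occupancy_prob, coarsest_free_cells) and the threshold-based query are illustrative assumptions, not the paper's actual algorithm.

```python
# Sketch of a 2^n-tree with per-node obstacle probabilities (illustrative only).
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DyadicNode:
    center: tuple            # cell center in R^n
    half_size: float         # half of the cell edge length
    occupancy_prob: float    # P(cell contains an obstacle)
    children: Optional[List["DyadicNode"]] = None  # 2^n children, or None for a leaf

    def subdivide(self, child_probs: List[float]) -> None:
        """Split the cell into 2^n equal sub-cells with the given obstacle probabilities."""
        n = len(self.center)
        self.children = []
        for idx in range(2 ** n):
            # Each bit of idx selects the -/+ half along one dimension.
            shift = [((idx >> d) & 1 - 0.5) * self.half_size for d in range(n)]
            child_center = tuple(c + s for c, s in zip(self.center, shift))
            self.children.append(
                DyadicNode(child_center, self.half_size / 2, child_probs[idx])
            )

    def coarsest_free_cells(self, eps: float) -> List["DyadicNode"]:
        """Return the coarsest cells whose obstacle probability is below eps.

        Cells above the threshold are refined if children exist; leaves that
        still exceed eps are treated as obstacles and dropped.
        """
        if self.occupancy_prob < eps:
            return [self]            # traversable at this scale; no need to descend
        if self.children is None:
            return []                # likely obstacle leaf
        out: List[DyadicNode] = []
        for child in self.children:
            out.extend(child.coarsest_free_cells(eps))
        return out
```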
We present an approach for joint inference of 3D scene structure and semantic labeling from monocular video. Starting with a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is far more useful than the series of 2D semantic label images or the sparse point cloud produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines, respectively. We derive a Conditional Random Field (CRF) model defined in 3D space that jointly infers the semantic category and occupancy of each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which would not be possible if the two problems were solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multi-view constraints are weak. Our model comprises higher-order factors, which help when depth is unobservable. We also use class-specific semantic cues to reduce the degree of such higher-order factors, or to approximately model them with unaries where possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation for difficult, large-scale, forward-moving monocular image sequences.
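To make the joint label space concrete, here is a hedged toy sketch of the sort of energy a joint semantic + occupancy voxel CRF minimizes: each voxel takes a (occupancy, semantic class) label, unary costs combine occupancy evidence with projected semantic scores, and a Potts pairwise term encourages neighbouring voxels to agree. The class list, function names, and the plain Potts term are assumptions for illustration; the paper's model additionally uses higher-order factors that are not sketched here.

```python
# Toy energy for a joint semantic + occupancy voxel CRF (illustrative, not the paper's model).
import numpy as np

SEMANTIC_CLASSES = ["road", "building", "vegetation", "car"]   # example label set
LABELS = [("free", None)] + [("occupied", c) for c in SEMANTIC_CLASSES]


def unary_cost(voxel_evidence, label):
    """Negative log-likelihood of a label given per-voxel evidence.

    voxel_evidence = (p_occupied, class_scores), where class_scores is a dict of
    per-class probabilities projected from 2D semantic segmentation.
    """
    p_occ, class_scores = voxel_evidence
    occ, cls = label
    if occ == "free":
        return -np.log(max(1.0 - p_occ, 1e-9))
    return -np.log(max(p_occ * class_scores.get(cls, 1e-9), 1e-9))


def pairwise_cost(label_a, label_b, weight=1.0):
    """Potts smoothness term: penalize neighbouring voxels that disagree."""
    return 0.0 if label_a == label_b else weight


def energy(labeling, evidence, neighbours, weight=1.0):
    """Total CRF energy for a labeling (dict voxel_id -> label) over a voxel grid."""
    e = sum(unary_cost(evidence[v], labeling[v]) for v in labeling)
    e += sum(pairwise_cost(labeling[a], labeling[b], weight) for a, b in neighbours)
    return e
```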
The inverse sensor model has been popular in occupancy grid mapping. However, it is widely known that applying the inverse sensor model to mapping requires certain assumptions that are not necessarily true. Even the works that use forward sensor models have relied on methods like expectation maximization or Gibbs sampling, which have since been superseded by more effective methods of maximum a posteriori (MAP) inference over graphical models. In this paper, we propose the use of modern MAP inference methods along with the forward sensor model. Our implementation and experimental results demonstrate that these modern inference methods deliver more accurate maps more efficiently than previously used methods.
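For context, this is the standard independent-cell log-odds update that the inverse sensor model implies, i.e. the baseline whose assumptions the paper questions; the forward-model approach instead scores entire maps against the raw measurements and runs MAP inference, which is not sketched here. The function names and example numbers are illustrative.

```python
# Classic per-cell log-odds update under an inverse sensor model (baseline sketch).
import numpy as np


def logodds(p):
    return np.log(p / (1.0 - p))


def update_cell(l_prior, p_inverse, l_0=0.0):
    """Bayes-filter log-odds update for one grid cell.

    l_prior   : current log-odds of occupancy for the cell
    p_inverse : P(cell occupied | measurement, pose) from the inverse sensor model
    l_0       : log-odds of the prior occupancy (0 corresponds to a 0.5 prior)
    """
    return l_prior + logodds(p_inverse) - l_0


# Example: a cell hit by a laser return (inverse model says 0.9 occupied).
l = 0.0                       # 0.5 prior
l = update_cell(l, 0.9)       # after one hit
p = 1.0 - 1.0 / (1.0 + np.exp(l))
print(round(p, 3))            # ≈ 0.9
```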

Citations (117)

... The first group comprises road scene models. In most cases, occupancy grid maps are used to represent the surroundings of the vehicle [7,17,18,24]. Typically, a grid in bird's-eye perspective is defined and used to detect occupied grid cells. ...
... Another approach is to leverage deconvolutional layers, similar to a variational autoencoder, to reconstruct full-resolution segmentation [25,47]. Such advancement produces a variety of applications such as graphics [1,38,50,57], autonomous driving [31,32], and first-person vision [18]. When multiple images are used, co-segmentation is possible, i.e., segmenting common objects. ...
... Using large annotated datasets and deep neural networks, the Computer Vision community has steadily pushed the envelope of what was thought possible, not just for semantic understanding but also in terms of the 3D properties of scenes and objects. In particular, deep learning methods on monocular imagery have proven competitive with multi-sensor approaches for important ill-posed inverse problems like 3D object detection [3,31,20,34,24], 6D pose tracking [30,40], depth prediction [9,11,13,42,33], or shape recovery [18,23]. These improvements have been accomplished mainly by incorporating strong implicit or explicit priors that regularize the underconstrained output space towards geometrically coherent solutions. ...
... Other algorithms have been developed using multi-resolution maps, but they are often applied to a given non-uniform grid, without using the information at different resolution scales for the same region of the search space [4], [9]. More recently, the MSPP algorithm [5] extended the work of [2] to n dimensions in a reformulation using 2^n-trees instead of wavelets. The notion of ε-obstacles, guaranteeing completeness for any value of the threshold ε, was also introduced in the same paper. ...

Top co-authors (12)

James Rehg
  • Georgia Institute of Technology
Yin Li
  • Carnegie Mellon University
Frank Dellaert
  • Georgia Institute of Technology
Jason Corso
  • University of Michigan
Panagiotis Tsiotras
  • Georgia Institute of Technology
Vikas Dhiman
  • University at Buffalo, The State University of New York
Fuxin Li
  • Oregon State University
Agata Rozga
  • Georgia Institute of Technology
Rebecca Merrill Jones
  • Weill Cornell Medical College

Publication Stats

Citations: 117