Conference Paper

System approach for multi-purpose representations of traffic scene elements

Honda Res. Inst. Eur. GmbH, Offenbach, Germany
DOI: 10.1109/ITSC.2010.5625234 Conference: Intelligent Transportation Systems (ITSC), 2010 13th International IEEE Conference on
Source: IEEE Xplore


A major step towards intelligent vehicles lies in the acquisition of an environmental representation of sufficient generality to serve as the basis for a multitude of different assistance-relevant tasks. This acquisition process must reliably cope with the variety of environmental changes inherent to traffic environments. As a step towards this goal, we present our most recent integrated system performing object detection in challenging environments (e.g., inner-city or heavy rain). The system integrates unspecific and vehicle-specific methods for the detection of traffic scene elements, thus creating multiple object hypotheses. Each detection method is modulated by optimized models of typical scene context features which are used to enhance and suppress hypotheses. A multi-object tracking and fusion process is applied to make the produced hypotheses spatially and temporally coherent. In extensive evaluations we show that the presented system successfully analyzes scene elements under diverse conditions, including challenging weather and changing scenarios. We demonstrate that the used generic hypothesis representations allow successful application to a variety of tasks including object detection, movement estimation, and risk assessment by time-to-contact evaluation.
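The abstract mentions risk assessment by time-to-contact evaluation. A common monocular approximation, shown here as a hedged sketch (the paper's actual pipeline is not reproduced and all names are illustrative), estimates time-to-contact from the growth of an object's apparent size between frames:

```python
# Sketch: time-to-contact (TTC) from the change of an object's apparent
# image size, a standard monocular approximation. For constant closing
# speed, TTC ~= s / (ds/dt), where s is the object's size in the image.

def time_to_contact(size_prev: float, size_curr: float, dt: float) -> float:
    """Estimate TTC in seconds for an approaching object."""
    growth = (size_curr - size_prev) / dt  # ds/dt in pixels per second
    if growth <= 0.0:
        return float("inf")  # object is not approaching
    return size_curr / growth

# Example: a tracked bounding box grows from 40 px to 44 px in 0.1 s
ttc = time_to_contact(40.0, 44.0, 0.1)  # -> 1.1 seconds
```

The same quantity can be read directly off tracked bounding-box scale changes, which is why a tracking-and-fusion stage that makes hypotheses temporally coherent is a natural prerequisite for it.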


Available from: Julian Eggert
  • Source
    • "Pedestrian detection has been researched for decades; however, it seems that the state-of-the-art approaches [6], [7], [8], [9], [10], [11] are reaching a boundary that is not easy to break [12]. Most approaches include tracking, usually in various flavors of Kalman or particle filtering [13], [14], [15], [16], [17], [18], [19]. Both tracking methods perform a kind of "late fusion", combining detection results with a motion model, the latter having no influence at all on the detection process."
    ABSTRACT: In this article, we propose a visual pedestrian detection system which couples pedestrian appearance and pedestrian motion in a Bayesian fashion, with the goal of making detection more invariant to appearance changes. More precisely, the system couples dense appearance-based pedestrian likelihoods derived from a sliding-window SVM detector to spatial prior distributions obtained from the prediction step of a particle filter based pedestrian tracker. This mechanism, which we term dynamic attention priors (DAP), is inspired by recent results on predictive visual attention in humans and can be implemented at negligible computational cost. We prove experimentally, using a set of public, annotated pedestrian sequences, that detection performance is improved significantly, especially in cases where pedestrians differ from the learned models, e.g., when they are too small, have an unusual pose or occur before strongly structured backgrounds. In particular, dynamic attention priors allow more restrictive detection thresholds to be used without losing detections while minimizing false detections.
    2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC); 10/2014
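The Bayesian coupling described in this abstract, multiplying dense appearance likelihoods by the tracker's predicted spatial prior, can be sketched in a few lines. This is a minimal illustration under assumed inputs, not the paper's implementation; the function name and the toy arrays are invented here:

```python
import numpy as np

# Sketch of dynamic attention priors: fuse a dense appearance likelihood
# map (e.g., from a sliding-window SVM) with a spatial prior predicted by
# a particle filter's motion model.

def fuse_with_prior(appearance: np.ndarray, prior: np.ndarray) -> np.ndarray:
    # Bayes: posterior proportional to likelihood * prior, then normalize
    posterior = appearance * prior
    total = posterior.sum()
    return posterior / total if total > 0 else posterior

appearance = np.array([[0.2, 0.8],
                       [0.1, 0.4]])   # detector responses per location
prior      = np.array([[0.5, 0.5],
                       [0.0, 0.0]])   # tracker predicts the top row only
post = fuse_with_prior(appearance, prior)
# responses incompatible with the motion model (bottom row) are suppressed
```

Because the prior suppresses locations the motion model rules out, the detection threshold can be lowered elsewhere without a flood of false positives, which is the effect the abstract reports.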
  • Source
    • "Additionally, many approaches apply context priors to the detected objects in hindsight [10], [8], excluding all detections not compatible with the context priors. Bayesian fusion of detection likelihoods and context priors may also be performed, with a subsequent decision about detections [9], [21], [15]. Common to all of these approaches is that generic detectors are trained and evaluated independently and only combined with context after training, which usually improves results considerably."
    ABSTRACT: We present an approach for making use of scene or situation context in object detection, aiming for state-of-the-art performance while dramatically reducing computational cost. While existing approaches are inspired by Bayes' rule, training context-independent detectors and combining them with context priors in hindsight, we propose to integrate these context priors into detector design itself, through algorithmic choices and/or pre-selection of training examples. Although such restricted detectors will, as a consequence, be valid only in regions compatible with context priors, the corresponding simplification of the object-vs-background decision problem will lead to reduced computation time and/or increased detection performance. We verify this experimentally by analyzing vehicle detection performance in a realistically simulated inner-city environment where context priors are defined by a road surface mask obtained from the simulation tool. Comparing a restricted detector, based on horizontal edges detection refined by neural network confirmation, to a generic HOG+SVM-based approach which takes into account the road context prior, we show that the restricted detector shows superior vehicle detection performance at a vastly reduced computational cost. We show qualitative results that permit the conclusion that the restricted detector will perform well on real-world scenes if appropriate road context priors are available.
    2014 IEEE Intelligent Vehicles Symposium (IV); 06/2014
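The core idea of this abstract, restricting the detector itself to regions compatible with a context prior, can be illustrated with a road-mask gate on a sliding-window search. This is a hedged sketch; the function, parameters, and mask geometry are assumptions, not the authors' code:

```python
import numpy as np

# Sketch: use a road-surface mask as a context prior to restrict the
# sliding-window search. Windows whose footprint is off the road are
# skipped entirely, so the (costly) classifier never runs there.

def candidate_windows(mask, win_h, win_w, stride, min_road_frac=0.8):
    """Return (y, x) window origins whose bottom edge lies on road."""
    h, w = mask.shape
    out = []
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            # vehicles rest on the ground, so test the window's bottom row
            bottom = mask[y + win_h - 1, x:x + win_w]
            if bottom.mean() >= min_road_frac:
                out.append((y, x))
    return out

mask = np.zeros((8, 8), dtype=float)
mask[4:, :] = 1.0                      # lower half of the image is road
wins = candidate_windows(mask, 4, 4, 2)
# only windows standing on the road surface survive the context gate
```

The computational saving comes directly from the reduced search space; the abstract's stronger claim is that a detector *designed* for this restricted domain also decides object-vs-background more easily.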
  • Source
    • "The investigations described in this article are based on SamSys, a large-scale vehicle detection system in road traffic environments [5], [26] which integrates multimodal information (laser, video) as well as a wide variety of vision-based subsystems such as visual object detection, stereo processing, visual tracking and free-area detection. At the top of the SamSys processing hierarchy, there is a module implementing system-level learning as outlined in Sec. "
    ABSTRACT: In this article, we explore the potential contribution of multimodal context information to object detection in an “intelligent car”. The used car platform incorporates subsystems for the detection of objects from local visual patterns, as well as for the estimation of global scene properties (sometimes denoted “scene context” or just “context”) such as the shape of the road area or the 3D position of the ground plane. Annotated data recorded on this platform is publicly available as the “HRI RoadTraffic” vehicle video dataset, which forms the basis for this investigation. In order to quantify the contribution of context information, we investigate whether it can be used to infer object identity with little or no reference to local patterns of visual appearance. Using a challenging vehicle detection task based on the “HRI RoadTraffic” dataset, we train selected algorithms (“context models”) to estimate object identity from context information alone. In the course of our performance evaluations, we also analyze the effect of typical real-world conditions (noise, high input dimensionality, environmental variation) on context model performance. As a principal result, we show that the learning of context models is feasible with all tested algorithms, and that object identity can be estimated from context information with similar accuracy as by relying on local pattern recognition methods. We also find that the use of basis function representations [1] (also known as “population codes”) allows the simplest (and therefore most efficient) learning methods to perform best in the benchmark, suggesting that the use of context is feasible even in systems operating under strong performance constraints.
    Neurocomputing 10/2012; 94:77–86. DOI:10.1016/j.neucom.2012.03.008
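The basis-function ("population code") input encoding this abstract credits for the good performance of simple learners can be sketched as follows. The number of basis functions and their width are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Sketch of a population code: a scalar context feature is expanded over
# a bank of Gaussian basis functions. Linear learners on this expanded
# representation can fit nonlinear context models cheaply.

def population_code(value: float, centers: np.ndarray, width: float) -> np.ndarray:
    """Encode a scalar as a normalized pattern of basis-function activations."""
    acts = np.exp(-0.5 * ((value - centers) / width) ** 2)
    return acts / acts.sum()

centers = np.linspace(0.0, 1.0, 5)            # 5 Gaussians tiling [0, 1]
code = population_code(0.5, centers, width=0.25)
# the unit centered at 0.5 responds most strongly; neighbors respond less
```

A simple linear readout over such codes can represent functions that would require a nonlinear model on the raw scalar, which is consistent with the abstract's finding that the simplest learning methods performed best with this encoding.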