Figure 1 - uploaded by Supun Samarasekera
A compact set of 3D models is used to assist in matching objects across disparate viewpoints.


Source publication
Conference Paper
Full-text available
We propose a robust object recognition method based on approximate 3D models that can effectively match objects under large viewpoint changes and partial occlusion. The specific problem we solve is: given two views of an object, determine if the views are of the same or different object. Our domain of interest is vehicles, but the approach c...

Context in source publication

Context 1
... object recognition has become an increasingly important task in security, surveillance and robotics applications. For example, in persistent surveillance over an extended area, object association has to be carried out across videos acquired from multiple types of platforms. Due to the unconstrained conditions in viewing angle/position, illumination, occlusion, and background clutter, robust recognition is extremely challenging. A large body of work on object recognition has focused on appearance-based methods, exploiting either global or local features. Global methods [15] build an object representation by integrating information over an entire image; they take the entire object's attributes into consideration, but they are sensitive to viewpoint change, background clutter and occlusion. Local methods [6] represent images as a collection of features extracted from local information. Recent research based on local invariant features [14, 11, 4] has demonstrated good performance on object recognition under limited viewpoint changes and occlusion. Despite this progress, these approaches still have limited success under many challenging viewing conditions. For example, in the presence of large scale/viewpoint changes and/or occlusion, only a sparse set of distinguished features can be reliably extracted, and only a small portion of the object is covered with matched features. To increase the discriminative power of any recognition scheme, dense coverage is desirable, since it incorporates identifying evidence from all parts of an object. For this reason, several recent approaches attempt to increase the coverage of local features by expanding the initial set of corresponding features and integrating information from multiple frames [5, 16, 10]. In addition, geometry constraints such as affine and homography transformations are employed to provide a more comprehensive representation of 3D objects. 
We reason that when the object domain is known, the explicit use of 3D models can largely alleviate the feature-matching problem and achieve robust object recognition under large viewpoint changes, occlusion, and background clutter. For example, in the vehicle recognition domain, many 3D vehicle models exist, and detailed 3D models provide rich constraints for matching objects reliably. However, requiring that an exact model be available for each instance is unrealistic; furthermore, there can be large variations among object instances within a broad category. How to utilize a compact set of representative 3D models that provides sufficient constraints for robust object recognition is the main thrust of this paper. We propose a robust object recognition method based on approximate 3D models that can effectively match objects under large viewpoint changes, partial occlusion and background clutter. Our domain of interest is vehicles, but the approach can be generalized to other rigid, man-made objects. As shown in Fig. 1, to match an object seen from two disparate viewpoints (reference and target views), a set of 3D models representative of their categories is first chosen. The 3D model (from the set) that is closest to the imaged object is selected, and its 3D poses with respect to both the reference and target images are estimated. The approximate 3D model geometry, together with these poses, is used to transfer the object's appearance features from the reference view to the target view through photorealistic rendering. Our use of the 3D model enables us to compute a global appearance model for each semantic part of a vehicle, such as its windows and doors. The semantic part ownership is used to extrapolate appearance information that is not visible in the reference image. 
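The transfer step described above rests on projecting the shared 3D model geometry into each view with its estimated pose. A minimal sketch of that geometry in Python/NumPy, assuming known intrinsics `K` and per-view poses `(R, t)`; the function names and the nearest-pixel sampling are illustrative simplifications, not the paper's rendering pipeline:

```python
import numpy as np

def project(points_3d, pose, K):
    """Project 3D model points into an image given a camera pose (R, t) and intrinsics K."""
    R, t = pose
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                     # camera -> homogeneous image coordinates
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def transfer_appearance(ref_img, points_3d, pose_ref, pose_tgt, K):
    """Transfer appearance from the reference view to the target view via the
    shared 3D model: each model point links a reference pixel to a target pixel."""
    uv_ref = np.round(project(points_3d, pose_ref, K)).astype(int)
    uv_tgt = np.round(project(points_3d, pose_tgt, K)).astype(int)
    h, w = ref_img.shape[:2]
    out = np.zeros_like(ref_img)
    for (ur, vr), (ut, vt) in zip(uv_ref, uv_tgt):
        if 0 <= vr < h and 0 <= ur < w and 0 <= vt < h and 0 <= ut < w:
            out[vt, ut] = ref_img[vr, ur]   # copy colour along the 3D correspondence
    return out
```

A real renderer must additionally handle visibility and self-occlusion (which points face each camera), which this point-wise sketch deliberately omits.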
A piecewise Markov Random Field (MRF) model is employed to combine observations obtained from each individual pixel and from the corresponding semantic part. A Belief Propagation (BP) method that reduces the required memory is used to solve the MRF model effectively. No training is required in our method, and a realistic object image in a disparate viewpoint can be obtained from as few as one reference image. Experimental results on manufacturers' vehicle data and real data from multiple platforms demonstrate the efficacy of our method. We review related work in Section 2, introduce the approach in Section 3, present experimental results in Section 4, and conclude in Section 5. Tremendous progress has been made in recent years in recognizing objects under large variations in viewing conditions by utilizing both object appearance and geometry information [14, 11, 4]. Most methods represent object classes as collections of salient features with some invariant representation of their appearance. Geometry constraints are enforced in a loose or rigid manner to resolve appearance ambiguity and improve recognition performance. In general, these methods produce only a sparse set of features that covers a small portion of the entire object, and may therefore miss important and discriminative regions for reliable object recognition. Most recently, a flurry of research has attempted to enlarge the coverage of local feature sets while enforcing geometry constraints in a flexible fashion. Ferrari et al. [5] deal with background clutter and large viewpoint change by expanding the matching feature set after initial matched features are produced. The set of matched regions is partitioned into groups and integrated by measuring the consistency of group configurations arising from different model views. Savarese and Fei-Fei [16] recognize the class label and pose of each object instance by learning a model for each class. 
The model consists of a collection of canonical “diagnostic” parts that are viewed in the most frontal position and linked with geometry consistency constraints. The linkage structure of canonical parts is built from multiple viewpoints. Kushal et al. [10] represent object parts as partial surface models (PSMs), which are dense, locally rigid assemblies of texture patches. In the model-based vehicle recognition domain, [13] and [6] build 3D generic vehicle models with templates by projecting 2D features to 3D and clustering 3D features over a sequence of frames. [9] employs a 3D generic vehicle model parameterized by 12 length parameters to instantiate different vehicles; line segments from the image are matched to 2D model edge segments obtained by projecting a 3D polyhedral model of the vehicle into the image plane, and an illumination model is used to handle lighting changes and shadows. This method works well when sufficient image resolution is available. Another model-based approach is proposed in [8], where a simple sedan model and a probabilistic line-feature grouping scheme are used for fast vehicle detection; the approach is more suitable for nadir (top-down) view detection. [18] also uses 3D CAD vehicle models and other sensor modalities for target identification, although the number of vehicles under consideration is limited in their application. In [7], a quasi-rigid 3D model is used to establish dense matching from line correspondences; the scheme can reliably match objects across viewpoint changes of up to about 30 to 40 degrees. Similar 3D model analysis-by-synthesis loop approaches have also been proposed for face recognition systems [1, 2]. Markov Random Field (MRF) models provide a robust and unified framework for early vision problems such as stereo and image restoration. Inference algorithms based on belief propagation have been found to yield accurate results [19, 20], but despite recent advances these methods are often too slow for practical use. 
Several techniques [3] have been proposed to substantially improve the running time of loopy belief propagation. Our approach is close in spirit to [12], where a high-resolution face is synthesized from a low-resolution input using a two-step approach that integrates both a global ...
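The core idea of the piecewise MRF in this excerpt, combining a per-pixel data cost with a consistency (smoothness) cost, can be made concrete with exact min-sum message passing on a 1D chain. This is a deliberate simplification: the paper uses a memory-efficient loopy BP on a 2D piecewise MRF, and the Potts-style pairwise costs below are assumed for illustration:

```python
import numpy as np

def min_sum_chain(unary, pairwise):
    """Exact min-sum (MAP) inference on a chain MRF.
    unary:    (N, L) per-node data costs (e.g. per-pixel observation costs).
    pairwise: (L, L) cost for adjacent nodes taking a pair of labels.
    Returns the minimum-cost label assignment."""
    N, L = unary.shape
    msg = np.zeros((N, L))               # forward messages
    back = np.zeros((N, L), dtype=int)   # argmin backpointers
    for i in range(1, N):
        total = msg[i - 1] + unary[i - 1]        # best cost up to node i-1 per label
        cand = total[:, None] + pairwise         # add the transition cost
        back[i] = np.argmin(cand, axis=0)
        msg[i] = np.min(cand, axis=0)
    labels = np.empty(N, dtype=int)
    labels[-1] = np.argmin(msg[-1] + unary[-1])
    for i in range(N - 1, 0, -1):                # trace back the optimal assignment
        labels[i - 1] = back[i, labels[i]]
    return labels
```

With a strong smoothness weight the chain prefers one consistent label (analogous to pixels agreeing with their semantic part); with a weak weight, each node follows its own observation.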

Similar publications

Article
Full-text available
Semantic labeling of road scenes using colorized mobile LiDAR point clouds is of great significance in a variety of applications, particularly intelligent transportation systems. However, many challenges, such as incompleteness of objects caused by occlusion, overlapping between neighboring objects, interclass local similarities, and computational...

Citations

... Model-based vehicle recognition uses the adaptive model [37], the approximate model [38], and the 3D model [39]. In [39], Prokaj and Medioni adopt the model-based approach and project the pose of a 3D CAD vehicle model to a 2D vehicle image to calculate the similarity score. ...
Article
Full-text available
Citation: Hayee, S.; Hussain, F.; Yousaf, M.H. A Novel FDLSR Based Technique for View-Independent Vehicle Make and Model Recognition. Sensors 2023, 23, 7920. Abstract: Vehicle make and model recognition (VMMR) is an important aspect of intelligent transportation systems (ITS). In VMMR systems, surveillance cameras capture vehicle images for real-time vehicle detection and recognition. These captured images pose challenges, including shadows, reflections, changes in weather and illumination, occlusions, and perspective distortion. Another significant challenge in VMMR is multiclass classification. This scenario has two main categories: (a) multiplicity and (b) ambiguity. Multiplicity concerns the issue of different forms among car models manufactured by the same company, while the ambiguity problem arises when multiple models from the same manufacturer have visually similar appearances or when vehicle models of different makes have visually comparable rear/front views. This paper introduces a novel and robust VMMR model that can address the above-mentioned issues with accuracy comparable to state-of-the-art methods. Our proposed hybrid CNN model selects the best descriptive fine-grained features with the help of Fisher Discriminative Least Squares Regression (FDLSR). These features are extracted from a deep CNN model fine-tuned on the fine-grained vehicle datasets Stanford-196 and BoxCars21k. Using ResNet-152 features, our proposed model outperformed the SVM and FC layers in accuracy by 0.5% and 4% on Stanford-196 and 0.4% and 1% on BoxCars21k, respectively. Moreover, this model is well-suited for small-scale fine-grained vehicle datasets.
... To mitigate this problem, researchers have attempted to accomplish this challenging task in a variety of ways (Guo et al., 2008;Hou et al., 2009;Shan et al., 2008). Traditionally, vehicle matching is done with various types of sensors, such as magnetic sensors and inductive loop detectors. ...
... In recent years, studies on computer vision and intelligent transportation systems have mainly focused on vehicle detection, vehicle tracking and vehicle classification. Therefore, vehicle matching technology has not received extensive attention, and only a small number of studies (Guo et al., 2008;Hou et al., 2009;Shan et al., 2008) have been conducted. Overall, the methods used in these studies are mainly based on the traditional image-processing methods. ...
... Overall, the methods used in these studies are mainly based on traditional image-processing techniques. Guo et al. (2008) proposed a model-based approach for vehicle matching. They used approximate 3D models to address pose transformation and a piecewise Markov Random Field model to estimate the texture of occluded parts. ...
Article
Full-text available
There are a large number of cameras in modern transportation systems that capture numerous vehicle images continuously. Therefore, automatic analysis of these vehicle images is helpful for traffic flow management, criminal investigations and vehicle inspections. Vehicle matching, which aims to determine whether two input images depict an identical vehicle, is one of the core tasks in vehicle analysis. Recent relevant studies have focused on local feature extraction instead of global extraction, since local details can provide crucial cues to distinguish between cars. However, these methods do not select local features; that is, they do not assign weights to local features. Therefore, in this research, we systematically study the vehicle matching task, and present a novel annotation‐free local‐based deep learning method called Adaptive super‐pixel discriminative feature‐selective learning (ASDFL) to address this issue. In ASDFL, vehicle images are segmented into clusters of super‐pixels of similar size by considering the location and colour similarities of pixels without using any component‐level annotation. These super‐pixels are deemed to be the virtual components of vehicles. Moreover, a convolutional neural network is used to extract the deep features of these virtual components. Thereafter, an instance‐specific mask generation module driven by the extracted global features is enhanced to produce a mask to select the most distinctive virtual components of each vehicle image pair in the feature space. Finally, the vehicle matching task is accomplished by classifying the selected virtual component features of each imaged vehicle pair. Extensive experiments on two popular vehicle identification benchmarks demonstrate that our method is 1.57% and 0.8% more accurate than the previous baselines in a vehicle matching task on the VeRi and VehicleID datasets, respectively, which demonstrates the effectiveness of our method.
... With the development of deep CNNs, researchers have been able to achieve impressive results for semantic segmentation and object detection with supervised or weakly supervised methods. These works represent an object as a parameterized 3D bounding box [22], [23], [24], a coarse wire-frame skeleton [25], [26], a voxel-based representation [27], [28], or select from a small set of exemplar models [29], [30], [31], [32]. In [33], Wang et al. proposed to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. ...
Preprint
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image for autonomous driving. Our approach combines the strengths of deep learning and the elegance of traditional techniques from part-based deformable model representation to produce high-quality 3D models in the presence of severe occlusions. We present a new part-based deformable vehicle model that is used for instance segmentation and automatically generate a dataset that contains dense correspondences between 2D images and 3D models. We also present a novel end-to-end deep neural network to predict dense 2D/3D mapping and highlight its benefits. Based on the dense mapping, we are able to compute precise 6-DoF poses and 3D reconstruction results at almost interactive rates on a commodity GPU. We have integrated these algorithms with an autonomous driving system. In practice, our method outperforms the state-of-the-art methods for all major vehicle parsing tasks: 2D instance segmentation by 4.4 points (mAP), 6-DoF pose estimation by 9.11 points, and 3D detection by 1.37. Moreover, we have released all of the source code, dataset, and the trained model on Github.
... Variation in pose or change in camera position can reduce the performance of methods in which the pose of vehicles is estimated using shape features, as in [9], but the authors do not discuss occlusion, lighting changes or viewpoint changes. Model-based recognition of vehicles includes the adaptive model [11], the approximate model [12] and the 3D model [11,13]. In [11], to classify vehicles, a deformable 3D vehicle shape model is used to detect different vehicle parts and then to recover shape information. ...
Article
Full-text available
Vehicle make and model recognition (VMMR) is a key task for automated vehicular surveillance (AVS) and various intelligent transport system (ITS) applications. In this paper, we propose and study the suitability of the bag of expressions (BoE) approach for VMMR-based applications. The method includes neighborhood information in addition to visual words. BoE improves the existing power of a bag of words (BOW) approach, including occlusion handling, scale invariance and view independence. The proposed approach extracts features using a combination of different keypoint detectors and a Histogram of Oriented Gradients (HOG) descriptor. An optimized dictionary of expressions is formed using visual words acquired through k-means clustering. The histogram of expressions is created by computing the occurrences of each expression in the image. For classification, multiclass linear support vector machines (SVM) are trained over the BoE-based features representation. The approach has been evaluated by applying cross-validation tests on the publicly available National Taiwan Ocean University-Make and Model Recognition (NTOU-MMR) dataset, and experimental results show that it outperforms recent approaches for VMMR. With multiclass linear SVM classification, promising average accuracy and processing speed are obtained using a combination of keypoint detectors with HOG-based BoE description, making it applicable to real-time VMMR systems.
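Since BoE extends the standard bag-of-visual-words pipeline, a minimal BoW sketch helps fix ideas: cluster local descriptors into a dictionary, then quantise each image's descriptors into a normalised histogram. This omits BoE's neighbourhood "expressions" and the HOG/SVM stages, and all parameters are illustrative rather than taken from the paper:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means to build the visual-word dictionary from local descriptors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)   # squared distances
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)       # recompute each word
    return centers

def bow_histogram(descriptors, centers):
    """Quantise an image's descriptors against the dictionary and count
    word occurrences (BoE would additionally pool neighbouring words)."""
    d = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
    words = d.argmin(1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)                    # L1-normalised histogram
```

The resulting histograms would then feed a multiclass linear SVM, as the abstract describes.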
... Given a query vehicle image, a vehicle re-ID method attempts to find all images containing that vehicle across multiple non-overlapping cameras. It can be seen from the initial sensor-based methods [1][2][3] for re-ID, hand-crafted-feature-based methods, [4][5][6][7][8][9] and deep-feature-based methods [10][11][12][13][14][15] that the ability to express the acquired features from images is rapidly improving. However, owing to the range of camera capture angles, the orientations of the vehicle images may differ, and these vehicle images often have differences in visual effects. ...
Article
Vehicle re-identification, which aims to retrieve information regarding a vehicle from different cameras with non-overlapping views, has recently attracted extensive attention in the field of computer vision owing to the development of smart cities. This task can be regarded as a type of retrieval problem, where re-ranking is important for performance enhancement. In the vehicle re-identification ranking list, images whose orientations are dissimilar to that of the query image must preferably be optimized on priority. However, traditional methods are incompatible with such samples, resulting in unsatisfactory vehicle re-identification performances. Therefore, in this study, we propose a vehicle re-identification re-ranking method with orientation-guide query expansion to optimize the initial ranking list obtained by a re-identification model. In the proposed method, we first find the nearest neighbor image whose orientation is dissimilar to the queried image and then fuse the features of the query and neighbor images to obtain new features for information retrieval. Experiments are performed on two public data sets, VeRi-776 and VehicleID, and the effectiveness of the proposed method is confirmed.
... The matching problem is formulated as a same-different classification problem, which aims to compute the probability of vehicle images from two distinct cameras being from the same vehicle or different vehicle(s). Guo et al. (2008) and Hou et al. (2009) proposed 3D models for V-reID. They deal with large variations of pose and illumination in a better way. ...
... Sample images of different vehicles are presented from the PKU-VD dataset. These are recently published methods in both categories. The 8 hand-crafted feature based methods are: the 3d and color information (3DCI) Woesler (2003), the edge-map distances (EMD) Shan et al. (2005), the 3d and piecewise model (3DPM) Guo et al. (2008), the 3d pose and illumination model (3DPIM) Hou et al. (2009), the attribute based model (ABM) Feris et al. (2012), the multi-pose model (MPM) Zheng et al. (2015), the bounding box model (BBM) Zapletal and Herout (2016), and the license number plate (LNP) Watchar (2017). The 12 deep feature methods are: the progressive vehicle re-identification (PROVID) Liu et al. (2016c), the deep relative distance learning (DRDL) Liu et al. (2016a), the deep color and texture (DCT) Liu et al. (2016b), the orientation invariant model (OIM) Wang et al. (2017), the visual spatio-temporal model (VSTM) Shen et al. (2017), the cross-level vehicle recognition (CLVR) Kanacı et al. (2017), the triplet-wise training (TWT) Zhang et al. (2017), the feature fusing model (FFM) Tang et al. (2017), the deep joint discriminative learning (DJDL) Li et al. (2017b), the Null space based Fusion of Color and Attribute feature (NuFACT) Liu et al. (2018), the multi-view feature (MVF) Zhou et al. (2018), and the group sensitive triplet embedding (GSTE) Bai et al. (2018). ...
Preprint
Vehicle re-identification (V-reID) has become significantly popular in the community due to its applications and research significance. In particular, the V-reID is an important problem that still faces numerous open challenges. This paper reviews different V-reID methods including sensor based methods, hybrid methods, and vision based methods which are further categorized into hand-crafted feature based methods and deep feature based methods. The vision based methods make the V-reID problem particularly interesting, and our review systematically addresses and evaluates these methods for the first time. We conduct experiments on four comprehensive benchmark datasets and compare the performances of recent hand-crafted feature based methods and deep feature based methods. We present the detail analysis of these methods in terms of mean average precision (mAP) and cumulative matching curve (CMC). These analyses provide objective insight into the strengths and weaknesses of these methods. We also provide the details of different V-reID datasets and critically discuss the challenges and future trends of V-reID methods.
... They used color histogram and histogram of oriented gradients followed by linear regressors. These methods (Hou et al., 2009;Guo et al., 2008;Zapletal and Herout, 2016) are computationally expensive due to constraints of 3D models. Moreover, the performance of appearance based approaches is limited due to different colors and shapes of vehicles. ...
... These are recently published methods in both categories. The 8 hand-crafted feature based methods are: the 3d and color information (3DCI) Woesler (2003), the edge-map distances (EMD) (Shan et al., 2005), the 3d and piecewise model (3DPM) (Guo et al., 2008), the 3d pose and illumination model (3DPIM) (Hou et al., 2009), the attribute based model (ABM) (Feris et al., 2012), the multi-pose model (MPM) (Zheng et al., 2015), the bounding box model (BBM) (Zapletal and Herout, 2016), and the license number plate (LNP) (Watchar, 2017). The 12 deep feature methods are: the progressive vehicle re-identification (PROVID) (Liu et al., 2016c), the deep relative distance learning (DRDL) (Liu et al., 2016a), the deep color and texture (DCT) (Liu et al., 2016b), the orientation invariant model (OIM) , the visual spatio-temporal model (VSTM) (Shen et al., 2017), the cross-level vehicle recognition (CLVR) (Kanacı et al., 2017), the triplet-wise training (TWT) (Zhang et al., 2017), the feature fusing model (FFM) , the deep joint discriminative learning (DJDL) (Li et al., 2017b), the Null space based Fusion of Color and Attribute feature (NuFACT) , the multi-view feature (MVF) (Zhou et al., 2018), and the group sensitive triplet embedding (GSTE) (Bai et al., 2018). ...
Article
Vehicle re-identification (V-reID) has become significantly popular in the community due to its applications and research significance. In particular, the V-reID is an important problem that still faces numerous open challenges. This paper reviews different V-reID methods including sensor based methods, hybrid methods, and vision based methods which are further categorized into hand-crafted feature based methods and deep feature based methods. The vision based methods make the V-reID problem particularly interesting, and our review systematically addresses and evaluates these methods for the first time. We conduct experiments on four comprehensive benchmark datasets and compare the performances of recent hand-crafted feature based methods and deep feature based methods. We present the detail analysis of these methods in terms of mean average precision (mAP) and cumulative matching curve (CMC). These analyses provide objective insight into the strengths and weaknesses of these methods. We also provide the details of different V-reID datasets and critically discuss the challenges and future trends of V-reID methods.
... Most 3D models have a common use of geometric components, and they differ mainly in the granularity of the model. For example, when the popular polyhedral model is considered, a vehicle may be characterised by a simple 3D cuboid model with only 6 faces [19,[26][27][28] or 80,000 faces in a very elaborate CAD model as in [29]. Simple models are not able to capture the shape details of the vehicles, while complex CAD models may result in unnecessary computational complexity [25]. ...
Article
Full-text available
In this paper, we propose and develop a multiple-camera 3D vehicle tracking system for traffic data collection at intersections. Assuming a simple 3D cuboid model for the vehicle, the developed system allows 3D vehicle dimension estimation using fusion of information from multiple cameras. Using a common rectangular road pattern, each camera is first individually calibrated and then jointly post-optimised. Then, the developed 3D vehicle tracking system takes synchronised images from multiple cameras as inputs and processes 2D image frames using object segmentation techniques to derive vehicle silhouettes. After 2D vehicle segmentation, objects in the 2D image frames are projected to the 3D real world to allow estimation of vehicle length and width. The height of the object is sought in the image view that would create the top quadrilateral of the vehicle that has the edge furthest away from the vehicle base quadrilateral. With Kalman filter based vehicle tracking, interested traffic data, such as vehicle count, are derived from the vehicle trajectory. Real-world experimental results for an intersection with two cameras have shown that the developed 3D vehicle tracking system can reliably estimate 3D vehicle dimensions and improve accuracy of traffic data collection compared to a single-camera system. © 2019 Institution of Engineering and Technology. All Rights Reserved.
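The 2D-to-3D projection step in such systems can be illustrated by back-projecting image points onto the ground plane: for world points with z = 0, the full camera projection reduces to a plane homography. A minimal sketch under assumed calibrated intrinsics `K` and camera pose `(R, t)`; the function name is illustrative, not from the paper:

```python
import numpy as np

def image_to_ground(uv, K, R, t):
    """Back-project an image point onto the ground plane z = 0 in world
    coordinates, given intrinsics K and camera pose (R, t). For z = 0 the
    projection collapses to the homography H = K [r1 r2 t]."""
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    xyw = np.linalg.solve(H, np.array([uv[0], uv[1], 1.0]))   # invert the homography
    return xyw[:2] / xyw[2]                                   # (X, Y) on the ground
```

Back-projecting, say, the front and rear base points of a segmented vehicle silhouette and taking `np.linalg.norm(p_front - p_rear)` gives a length estimate, in line with the dimension-estimation step the abstract describes.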
... The ViBe algorithm incorporates a memoryless update policy and is resilient to noisy data. b) Model-based segmentation: this consists of identifying possible foreground vehicles in an image by fitting predefined 2D or projected 3D vehicle shapes to image regions, without any knowledge of a background model [25] [90]. However, these direct matching approaches are unrealistic because it is impossible to have a model for every possible vehicle that may be present in the scene. ...
... Once a vehicle hypothesis has been verified, it can be used dynamically as a new template if its correlation score or reliability exceeds a certain threshold [123]. Three-dimensional templates can be projected into 2D templates and matched with image regions [90], along with probabilistic frameworks [92]. In [25], the convex hulls of 3D vehicle models were generated in the image. ...
Article
Full-text available
Visual surveillance of dynamic objects, particularly vehicles on the road, has been, over the past decade, an active research topic in computer vision and intelligent transportation systems communities. In the context of traffic monitoring, important advances have been achieved in environment modeling, vehicle detection, tracking, and behavior analysis. This paper is a survey that addresses particularly the issues related to vehicle monitoring with cameras at road intersections. In fact, the latter has variable architectures and represents a critical area in traffic. Accidents at intersections are extremely dangerous, and most of them are caused by drivers' errors. Several projects have been carried out to enhance the safety of drivers in the special context of intersections. In this paper, we provide an overview of vehicle perception systems at road intersections and representative related data sets. The reader is then given an introductory overview of general vision-based vehicle monitoring approaches. Subsequently and above all, we present a review of studies related to vehicle detection and tracking in intersection-like scenarios. Regarding intersection monitoring, we distinguish and compare roadside (pole-mounted, stationary) and in-vehicle (mobile platforms) systems. Then, we focus on camera-based roadside monitoring systems, with special attention to omnidirectional setups. Finally, we present possible research directions that are likely to improve the performance of vehicle detection and tracking at intersections.