Figure 1 - uploaded by Supun Samarasekera
A compact set of 3D models is used to assist in matching objects across disparate viewpoints.


Source publication
Conference Paper
Full-text available
We propose a robust object recognition method based on approximate 3D models that can effectively match objects under large viewpoint changes and partial occlusion. The specific problem we solve is: given two views of an object, determine if the views are of the same or different object. Our domain of interest is vehicles, but the approach c...

Context in source publication

Context 1
... object recognition has become an increasingly important task in security, surveillance and robotics applications. For example, in persistent surveillance over an extended area, object association has to be carried out across videos acquired from multiple types of platforms. Due to the unconstrained conditions in viewing angle/position, illumination, occlusion, and background clutter, robust recognition is extremely challenging. A large body of work on object recognition has focused on appearance-based methods, exploiting either global or local features. Global methods [15] build an object representation by integrating information over an entire image; they take the entire object's attributes into consideration, but they are sensitive to viewpoint change, background clutter and occlusion. Local methods [6] represent images as a collection of features extracted from local information. Recent research based on local invariant features [14, 11, 4] has demonstrated good performance on object recognition under limited viewpoint changes and occlusion. Despite this progress, these approaches still have limited success under many challenging viewing conditions. For example, in the presence of large scale/viewpoint changes and/or occlusion, only a sparse set of distinguished features can be reliably extracted, and only a small portion of the object is covered with matched features. To increase the discriminative power of any recognition scheme, dense coverage is desirable, since it incorporates identifying evidence from all parts of an object. For this reason, several recent approaches attempt to increase the coverage of local features by expanding the initial set of corresponding features and integrating information from multiple frames [5, 16, 10]. In addition, geometry constraints such as affine and homography transformations are employed to provide a more comprehensive representation of 3D objects. 
We reason that when the object domain is known, the explicit use of 3D models can largely alleviate the feature-matching problem and achieve robust object recognition under large viewpoint changes, occlusion, and background clutter. For example, in the vehicle recognition domain, many 3D vehicle models exist, and detailed 3D models provide rich constraints for matching objects reliably. However, requiring that an exact model be available for each instance is unrealistic; furthermore, there can be large variations among object instances within a broad category. How to utilize a compact set of representative 3D models that provides sufficient constraints for robust object recognition is the main thrust of this paper. We propose a robust object recognition method based on approximate 3D models that can effectively match objects under large viewpoint changes, partial occlusion and background clutter. Our domain of interest is vehicles, but the approach can be generalized to other rigid, man-made objects. As shown in Fig. 1, to match an object seen from two disparate viewpoints (reference and target views), a set of 3D models representative of their categories is first chosen. The 3D model (from the set) that is closest to the imaged object is selected, and its 3D poses with respect to both the reference and target images are estimated. The approximate 3D model geometry, together with these poses, is used to transfer the object's appearance features from the reference view to the target view through photorealistic rendering. Our use of the 3D model enables us to compute a global appearance model for each semantic part of a vehicle, such as its windows and doors. The semantic part ownership is used to extrapolate appearance information that is not visible in the reference image. 
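The transfer step described above rests on projecting the shared 3D model geometry into each view with its estimated pose. A minimal sketch of that geometry in Python/NumPy, assuming known intrinsics `K` and per-view poses `(R, t)`; the function names and the nearest-pixel sampling are illustrative simplifications, not the paper's rendering pipeline:

```python
import numpy as np

def project(points_3d, pose, K):
    """Project 3D model points into an image given a camera pose (R, t) and intrinsics K."""
    R, t = pose
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                     # camera -> homogeneous image coordinates
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def transfer_appearance(ref_img, points_3d, pose_ref, pose_tgt, K):
    """Transfer appearance from the reference view to the target view via the
    shared 3D model: each model point links a reference pixel to a target pixel."""
    uv_ref = np.round(project(points_3d, pose_ref, K)).astype(int)
    uv_tgt = np.round(project(points_3d, pose_tgt, K)).astype(int)
    h, w = ref_img.shape[:2]
    out = np.zeros_like(ref_img)
    for (ur, vr), (ut, vt) in zip(uv_ref, uv_tgt):
        if 0 <= vr < h and 0 <= ur < w and 0 <= vt < h and 0 <= ut < w:
            out[vt, ut] = ref_img[vr, ur]   # copy colour along the 3D correspondence
    return out
```

A real renderer must additionally handle visibility and self-occlusion (which points face each camera), which this point-wise sketch deliberately omits.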
A piecewise Markov Random Field (MRF) model is employed to combine observations obtained from each individual pixel and from the corresponding semantic part. A Belief Propagation (BP) method that reduces the required memory is used to solve the MRF model effectively. No training is required in our method, and a realistic object image in a disparate viewpoint can be obtained from as few as one reference image. Experimental results on manufacturers' vehicle data and real data from multiple platforms demonstrate the efficacy of our method. We review related work in Section 2, introduce the approach in Section 3, present experimental results in Section 4, and conclude in Section 5. Tremendous progress has been made in recent years in recognizing objects under large variations in viewing conditions by utilizing both object appearance and geometry information [14, 11, 4]. Most methods represent object classes as collections of salient features with some invariant representation of their appearance. Geometry constraints are enforced in a loose or rigid manner to resolve appearance ambiguity and improve recognition performance. In general, these methods produce only a sparse set of features that covers a small portion of the entire object, and may therefore miss important and discriminative regions for reliable object recognition. Most recently, a flurry of research has attempted to enlarge the coverage of local feature sets while enforcing geometry constraints in a flexible fashion. Ferrari et al. [5] deal with background clutter and large viewpoint change by expanding the matching feature set after initial matched features are produced. The set of matched regions is partitioned into groups and integrated by measuring the consistency of group configurations arising from different model views. Savarese and Fei-Fei [16] recognize the class label and pose of each object instance by learning a model for each class. 
The model consists of a collection of canonical “diagnostic” parts that are viewed in the most frontal position and linked with geometry consistency constraints. The linkage structure of canonical parts is built from multiple viewpoints. Kushal et al. [10] represent object parts as partial surface models (PSMs), which are dense, locally rigid assemblies of texture patches. In the model-based vehicle recognition domain, [13] and [6] build 3D generic vehicle models with templates by projecting 2D features to 3D and clustering 3D features over a sequence of frames. [9] employs a 3D generic vehicle model parameterized by 12 length parameters to instantiate different vehicles; line segments from the image are matched to 2D model edge segments obtained by projecting a 3D polyhedral model of the vehicle into the image plane, and an illumination model is used to handle lighting changes and shadows. This method works well when sufficient image resolution is available. Another model-based approach is proposed in [8], where a simple sedan model and a probabilistic line-feature grouping scheme are used for fast vehicle detection; the approach is more suitable for nadir (top-down) view detection. [18] also uses 3D CAD vehicle models and other sensor modalities for target identification, although the number of vehicles under consideration is limited in their application. In [7], a quasi-rigid 3D model is used to establish dense matching from line correspondences; the scheme can reliably match objects across viewpoint changes of up to about 30 to 40 degrees. Similar 3D model analysis-by-synthesis loop approaches have also been proposed for face recognition systems [1, 2]. Markov Random Field (MRF) models provide a robust and unified framework for early vision problems such as stereo and image restoration. Inference algorithms based on belief propagation have been found to yield accurate results [19, 20], but despite recent advances these methods are often too slow for practical use. 
Several techniques [3] have been proposed to substantially improve the running time of loopy belief propagation. Our approach is close in spirit to [12], where a high-resolution face is synthesized from a low-resolution input using a two-step approach that integrates both a global ...
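The core idea of the piecewise MRF in this excerpt, combining a per-pixel data cost with a consistency (smoothness) cost, can be made concrete with exact min-sum message passing on a 1D chain. This is a deliberate simplification: the paper uses a memory-efficient loopy BP on a 2D piecewise MRF, and the Potts-style pairwise costs below are assumed for illustration:

```python
import numpy as np

def min_sum_chain(unary, pairwise):
    """Exact min-sum (MAP) inference on a chain MRF.
    unary:    (N, L) per-node data costs (e.g. per-pixel observation costs).
    pairwise: (L, L) cost for adjacent nodes taking a pair of labels.
    Returns the minimum-cost label assignment."""
    N, L = unary.shape
    msg = np.zeros((N, L))               # forward messages
    back = np.zeros((N, L), dtype=int)   # argmin backpointers
    for i in range(1, N):
        total = msg[i - 1] + unary[i - 1]        # best cost up to node i-1 per label
        cand = total[:, None] + pairwise         # add the transition cost
        back[i] = np.argmin(cand, axis=0)
        msg[i] = np.min(cand, axis=0)
    labels = np.empty(N, dtype=int)
    labels[-1] = np.argmin(msg[-1] + unary[-1])
    for i in range(N - 1, 0, -1):                # trace back the optimal assignment
        labels[i - 1] = back[i, labels[i]]
    return labels
```

With a strong smoothness weight the chain prefers one consistent label (analogous to pixels agreeing with their semantic part); with a weak weight, each node follows its own observation.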

Similar publications

Article
Full-text available
Semantic labeling of road scenes using colorized mobile LiDAR point clouds is of great significance in a variety of applications, particularly intelligent transportation systems. However, many challenges, such as incompleteness of objects caused by occlusion, overlapping between neighboring objects, interclass local similarities, and computational...

Citations

... Model-based vehicle recognition uses the adaptive model [37], the approximate model [38], and the 3D model [39]. In [39], Prokaj and Medioni adopt the model-based approach and project the pose of a 3D CAD vehicle model to a 2D vehicle image to calculate the similarity score. ...
Article
Full-text available
Citation: Hayee, S.; Hussain, F.; Yousaf, M.H. A Novel FDLSR Based Technique for View-Independent Vehicle Make and Model Recognition. Sensors 2023, 23, 7920. Abstract: Vehicle make and model recognition (VMMR) is an important aspect of intelligent transportation systems (ITS). In VMMR systems, surveillance cameras capture vehicle images for real-time vehicle detection and recognition. These captured images pose challenges, including shadows, reflections, changes in weather and illumination, occlusions, and perspective distortion. Another significant challenge in VMMR is multiclass classification. This scenario has two main categories: (a) multiplicity and (b) ambiguity. Multiplicity concerns the issue of different forms among car models manufactured by the same company, while the ambiguity problem arises when multiple models from the same manufacturer have visually similar appearances or when vehicle models of different makes have visually comparable rear/front views. This paper introduces a novel and robust VMMR model that can address the above-mentioned issues with accuracy comparable to state-of-the-art methods. Our proposed hybrid CNN model selects the best descriptive fine-grained features with the help of Fisher Discriminative Least Squares Regression (FDLSR). These features are extracted from a deep CNN model fine-tuned on the fine-grained vehicle datasets Stanford-196 and BoxCars21k. Using ResNet-152 features, our proposed model outperformed the SVM and FC layers in accuracy by 0.5% and 4% on Stanford-196 and 0.4% and 1% on BoxCars21k, respectively. Moreover, this model is well-suited for small-scale fine-grained vehicle datasets.
... To mitigate this problem, researchers have attempted to accomplish this challenging task in a variety of ways (Guo et al., 2008;Hou et al., 2009;Shan et al., 2008). Traditionally, vehicle matching is done with various types of sensors, such as magnetic sensors and inductive loop detectors. ...
... In recent years, studies on computer vision and intelligent transportation systems have mainly focused on vehicle detection, vehicle tracking and vehicle classification. Therefore, vehicle matching technology has not received extensive attention, and only a small number of studies (Guo et al., 2008;Hou et al., 2009;Shan et al., 2008) have been conducted. Overall, the methods used in these studies are mainly based on the traditional image-processing methods. ...
... Overall, the methods used in these studies are mainly based on traditional image-processing techniques. Guo et al. (2008) proposed a model-based approach for vehicle matching. They used approximate 3D models to address pose transformation and a piecewise Markov Random Field model to estimate the texture of occluded parts. ...
Article
Full-text available
There are a large number of cameras in modern transportation systems that capture numerous vehicle images continuously. Therefore, automatic analysis of these vehicle images is helpful for traffic flow management, criminal investigations and vehicle inspections. Vehicle matching, which aims to determine whether two input images depict an identical vehicle, is one of the core tasks in vehicle analysis. Recent relevant studies have focused on local feature extraction instead of global extraction, since local details can provide crucial cues to distinguish between cars. However, these methods do not select local features; that is, they do not assign weights to local features. Therefore, in this research, we systematically study the vehicle matching task, and present a novel annotation‐free local‐based deep learning method called Adaptive super‐pixel discriminative feature‐selective learning (ASDFL) to address this issue. In ASDFL, vehicle images are segmented into clusters of super‐pixels of similar size by considering the location and colour similarities of pixels without using any component‐level annotation. These super‐pixels are deemed to be the virtual components of vehicles. Moreover, a convolutional neural network is used to extract the deep features of these virtual components. Thereafter, an instance‐specific mask generation module driven by the extracted global features is enhanced to produce a mask to select the most distinctive virtual components of each vehicle image pair in the feature space. Finally, the vehicle matching task is accomplished by classifying the selected virtual component features of each imaged vehicle pair. Extensive experiments on two popular vehicle identification benchmarks demonstrate that our method is 1.57% and 0.8% more accurate than the previous baselines in a vehicle matching task on the VeRi and VehicleID datasets, respectively, which demonstrates the effectiveness of our method.
... With the development of deep CNNs, researchers have been able to achieve impressive results for semantic segmentation and object detection with supervised or weakly supervised methods. These works represent an object as a parameterized 3D bounding box [22], [23], [24], a coarse wire-frame skeleton [25], [26], a voxel-based representation [27], [28], or select from a small set of exemplar models [29], [30], [31], [32]. In [33], Wang et al. proposed to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. ...
Preprint
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image for autonomous driving. Our approach combines the strengths of deep learning and the elegance of traditional techniques from part-based deformable model representation to produce high-quality 3D models in the presence of severe occlusions. We present a new part-based deformable vehicle model that is used for instance segmentation and automatically generate a dataset that contains dense correspondences between 2D images and 3D models. We also present a novel end-to-end deep neural network to predict dense 2D/3D mapping and highlight its benefits. Based on the dense mapping, we are able to compute precise 6-DoF poses and 3D reconstruction results at almost interactive rates on a commodity GPU. We have integrated these algorithms with an autonomous driving system. In practice, our method outperforms the state-of-the-art methods for all major vehicle parsing tasks: 2D instance segmentation by 4.4 points (mAP), 6-DoF pose estimation by 9.11 points, and 3D detection by 1.37. Moreover, we have released all of the source code, dataset, and the trained model on Github.
... Variation in pose or change in camera position can reduce the performance of methods in which the pose of vehicles is estimated using shape features, as in [9], but the authors do not discuss occlusion, lighting changes or viewpoint changes. Model-based recognition of vehicles includes the adaptive model [11], the approximate model [12] and the 3D model [11,13]. In [11], to classify vehicles, a deformable 3D vehicle shape model is used to detect different vehicle parts and then to recover shape information. ...
Article
Full-text available
Vehicle make and model recognition (VMMR) is a key task for automated vehicular surveillance (AVS) and various intelligent transport system (ITS) applications. In this paper, we propose and study the suitability of the bag of expressions (BoE) approach for VMMR-based applications. The method includes neighborhood information in addition to visual words. BoE improves the existing power of a bag of words (BOW) approach, including occlusion handling, scale invariance and view independence. The proposed approach extracts features using a combination of different keypoint detectors and a Histogram of Oriented Gradients (HOG) descriptor. An optimized dictionary of expressions is formed using visual words acquired through k-means clustering. The histogram of expressions is created by computing the occurrences of each expression in the image. For classification, multiclass linear support vector machines (SVM) are trained over the BoE-based features representation. The approach has been evaluated by applying cross-validation tests on the publicly available National Taiwan Ocean University-Make and Model Recognition (NTOU-MMR) dataset, and experimental results show that it outperforms recent approaches for VMMR. With multiclass linear SVM classification, promising average accuracy and processing speed are obtained using a combination of keypoint detectors with HOG-based BoE description, making it applicable to real-time VMMR systems.
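Since BoE extends the standard bag-of-visual-words pipeline, a minimal BoW sketch helps fix ideas: cluster local descriptors into a dictionary, then quantise each image's descriptors into a normalised histogram. This omits BoE's neighbourhood "expressions" and the HOG/SVM stages, and all parameters are illustrative rather than taken from the paper:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means to build the visual-word dictionary from local descriptors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)   # squared distances
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)       # recompute each word
    return centers

def bow_histogram(descriptors, centers):
    """Quantise an image's descriptors against the dictionary and count
    word occurrences (BoE would additionally pool neighbouring words)."""
    d = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
    words = d.argmin(1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)                    # L1-normalised histogram
```

The resulting histograms would then feed a multiclass linear SVM, as the abstract describes.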
... Given a query vehicle image, a vehicle re-ID method attempts to find all images containing that vehicle across multiple non-overlapping cameras. It can be seen from the initial sensor-based methods [1][2][3] for re-ID, hand-crafted-feature-based methods, [4][5][6][7][8][9] and deep-feature-based methods [10][11][12][13][14][15] that the ability to express the acquired features from images is rapidly improving. However, owing to the range of camera capture angles, the orientations of the vehicle images may differ, and these vehicle images often have differences in visual effects. ...
Article
Vehicle re-identification, which aims to retrieve information regarding a vehicle from different cameras with non-overlapping views, has recently attracted extensive attention in the field of computer vision owing to the development of smart cities. This task can be regarded as a type of retrieval problem, where re-ranking is important for performance enhancement. In the vehicle re-identification ranking list, images whose orientations are dissimilar to that of the query image must preferably be optimized on priority. However, traditional methods are incompatible with such samples, resulting in unsatisfactory vehicle re-identification performances. Therefore, in this study, we propose a vehicle re-identification re-ranking method with orientation-guide query expansion to optimize the initial ranking list obtained by a re-identification model. In the proposed method, we first find the nearest neighbor image whose orientation is dissimilar to the queried image and then fuse the features of the query and neighbor images to obtain new features for information retrieval. Experiments are performed on two public data sets, VeRi-776 and VehicleID, and the effectiveness of the proposed method is confirmed.
... The matching problem is formulated as a same-different classification problem, which aims to compute the probability of vehicle images from two distinct cameras being from the same vehicle or different vehicle(s). Guo et al. (2008) and Hou et al. (2009) proposed 3D models for V-reID. They deal with large variations of pose and illumination in a better way. ...
... Sample images of different vehicles are presented from the PKU-VD dataset. These are recently published methods in both categories. The 8 hand-crafted feature based methods are: the 3d and color information (3DCI) Woesler (2003), the edge-map distances (EMD) Shan et al. (2005), the 3d and piecewise model (3DPM) Guo et al. (2008), the 3d pose and illumination model (3DPIM) Hou et al. (2009), the attribute based model (ABM) Feris et al. (2012), the multi-pose model (MPM) Zheng et al. (2015), the bounding box model (BBM) Zapletal and Herout (2016), and the license number plate (LNP) Watchar (2017). The 12 deep feature methods are: the progressive vehicle re-identification (PROVID) Liu et al. (2016c), the deep relative distance learning (DRDL) Liu et al. (2016a), the deep color and texture (DCT) Liu et al. (2016b), the orientation invariant model (OIM) Wang et al. (2017), the visual spatio-temporal model (VSTM) Shen et al. (2017), the cross-level vehicle recognition (CLVR) Kanacı et al. (2017), the triplet-wise training (TWT) Zhang et al. (2017), the feature fusing model (FFM) Tang et al. (2017), the deep joint discriminative learning (DJDL) Li et al. (2017b), the Null space based Fusion of Color and Attribute feature (NuFACT) Liu et al. (2018), the multi-view feature (MVF) Zhou et al. (2018), and the group sensitive triplet embedding (GSTE) Bai et al. (2018). ...
Preprint
Vehicle re-identification (V-reID) has become significantly popular in the community due to its applications and research significance. In particular, the V-reID is an important problem that still faces numerous open challenges. This paper reviews different V-reID methods including sensor based methods, hybrid methods, and vision based methods which are further categorized into hand-crafted feature based methods and deep feature based methods. The vision based methods make the V-reID problem particularly interesting, and our review systematically addresses and evaluates these methods for the first time. We conduct experiments on four comprehensive benchmark datasets and compare the performances of recent hand-crafted feature based methods and deep feature based methods. We present the detail analysis of these methods in terms of mean average precision (mAP) and cumulative matching curve (CMC). These analyses provide objective insight into the strengths and weaknesses of these methods. We also provide the details of different V-reID datasets and critically discuss the challenges and future trends of V-reID methods.
... They used color histogram and histogram of oriented gradients followed by linear regressors. These methods (Hou et al., 2009;Guo et al., 2008;Zapletal and Herout, 2016) are computationally expensive due to constraints of 3D models. Moreover, the performance of appearance based approaches is limited due to different colors and shapes of vehicles. ...
... These are recently published methods in both categories. The 8 hand-crafted feature based methods are: the 3d and color information (3DCI) Woesler (2003), the edge-map distances (EMD) (Shan et al., 2005), the 3d and piecewise model (3DPM) (Guo et al., 2008), the 3d pose and illumination model (3DPIM) (Hou et al., 2009), the attribute based model (ABM) (Feris et al., 2012), the multi-pose model (MPM) (Zheng et al., 2015), the bounding box model (BBM) (Zapletal and Herout, 2016), and the license number plate (LNP) (Watchar, 2017). The 12 deep feature methods are: the progressive vehicle re-identification (PROVID) (Liu et al., 2016c), the deep relative distance learning (DRDL) (Liu et al., 2016a), the deep color and texture (DCT) (Liu et al., 2016b), the orientation invariant model (OIM) , the visual spatio-temporal model (VSTM) (Shen et al., 2017), the cross-level vehicle recognition (CLVR) (Kanacı et al., 2017), the triplet-wise training (TWT) (Zhang et al., 2017), the feature fusing model (FFM) , the deep joint discriminative learning (DJDL) (Li et al., 2017b), the Null space based Fusion of Color and Attribute feature (NuFACT) , the multi-view feature (MVF) (Zhou et al., 2018), and the group sensitive triplet embedding (GSTE) (Bai et al., 2018). ...
Article
Vehicle re-identification (V-reID) has become significantly popular in the community due to its applications and research significance. In particular, the V-reID is an important problem that still faces numerous open challenges. This paper reviews different V-reID methods including sensor based methods, hybrid methods, and vision based methods which are further categorized into hand-crafted feature based methods and deep feature based methods. The vision based methods make the V-reID problem particularly interesting, and our review systematically addresses and evaluates these methods for the first time. We conduct experiments on four comprehensive benchmark datasets and compare the performances of recent hand-crafted feature based methods and deep feature based methods. We present the detail analysis of these methods in terms of mean average precision (mAP) and cumulative matching curve (CMC). These analyses provide objective insight into the strengths and weaknesses of these methods. We also provide the details of different V-reID datasets and critically discuss the challenges and future trends of V-reID methods.
... Most 3D models have a common use of geometric components, and they differ mainly in the granularity of the model. For example, when the popular polyhedral model is considered, a vehicle may be characterised by a simple 3D cuboid model with only 6 faces [19,[26][27][28] or 80,000 faces in a very elaborate CAD model as in [29]. Simple models are not able to capture the shape details of the vehicles, while complex CAD models may result in unnecessary computational complexity [25]. ...
Article
Full-text available
In this paper, we propose and develop a multiple-camera 3D vehicle tracking system for traffic data collection at intersections. Assuming a simple 3D cuboid model for the vehicle, the developed system allows 3D vehicle dimension estimation using fusion of information from multiple cameras. Using a common rectangular road pattern, each camera is first individually calibrated and then jointly post-optimised. Then, the developed 3D vehicle tracking system takes synchronised images from multiple cameras as inputs and processes 2D image frames using object segmentation techniques to derive vehicle silhouettes. After 2D vehicle segmentation, objects in the 2D image frames are projected to the 3D real world to allow estimation of vehicle length and width. The height of the object is sought in the image view that would create the top quadrilateral of the vehicle that has the edge furthest away from the vehicle base quadrilateral. With Kalman filter based vehicle tracking, interested traffic data, such as vehicle count, are derived from the vehicle trajectory. Real-world experimental results for an intersection with two cameras have shown that the developed 3D vehicle tracking system can reliably estimate 3D vehicle dimensions and improve accuracy of traffic data collection compared to a single-camera system. © 2019 Institution of Engineering and Technology. All Rights Reserved.
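The 2D-to-3D projection step in such systems can be illustrated by back-projecting image points onto the ground plane: for world points with z = 0, the full camera projection reduces to a plane homography. A minimal sketch under assumed calibrated intrinsics `K` and camera pose `(R, t)`; the function name is illustrative, not from the paper:

```python
import numpy as np

def image_to_ground(uv, K, R, t):
    """Back-project an image point onto the ground plane z = 0 in world
    coordinates, given intrinsics K and camera pose (R, t). For z = 0 the
    projection collapses to the homography H = K [r1 r2 t]."""
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    xyw = np.linalg.solve(H, np.array([uv[0], uv[1], 1.0]))   # invert the homography
    return xyw[:2] / xyw[2]                                   # (X, Y) on the ground
```

Back-projecting, say, the front and rear base points of a segmented vehicle silhouette and taking `np.linalg.norm(p_front - p_rear)` gives a length estimate, in line with the dimension-estimation step the abstract describes.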
... The ViBe algorithm incorporates a memoryless update policy and is resilient to noisy data. b) Model-based segmentation: this consists of identifying possible foreground vehicles in an image by fitting predefined 2D or projected 3D vehicle shapes to image regions, without any knowledge of a background model [25] [90]. However, these direct matching approaches are unrealistic because it is impossible to have a model for every possible vehicle that may be present in the scene. ...
... Once a vehicle hypothesis has been verified, it can be used dynamically as a new template if its correlation score or reliability exceeds a certain threshold [123]. Three-dimensional templates can be projected into 2D templates and matched with image regions [90], along with probabilistic frameworks [92]. In [25], the convex hulls of 3D vehicle models were generated in the image. ...
Article
Full-text available
Visual surveillance of dynamic objects, particularly vehicles on the road, has been, over the past decade, an active research topic in computer vision and intelligent transportation systems communities. In the context of traffic monitoring, important advances have been achieved in environment modeling, vehicle detection, tracking, and behavior analysis. This paper is a survey that addresses particularly the issues related to vehicle monitoring with cameras at road intersections. In fact, the latter has variable architectures and represents a critical area in traffic. Accidents at intersections are extremely dangerous, and most of them are caused by drivers' errors. Several projects have been carried out to enhance the safety of drivers in the special context of intersections. In this paper, we provide an overview of vehicle perception systems at road intersections and representative related data sets. The reader is then given an introductory overview of general vision-based vehicle monitoring approaches. Subsequently and above all, we present a review of studies related to vehicle detection and tracking in intersection-like scenarios. Regarding intersection monitoring, we distinguish and compare roadside (pole-mounted, stationary) and in-vehicle (mobile platforms) systems. Then, we focus on camera-based roadside monitoring systems, with special attention to omnidirectional setups. Finally, we present possible research directions that are likely to improve the performance of vehicle detection and tracking at intersections.