Low-cost implementation of bird's-eye view system for camera-on-vehicle

Authors: Lin-Bo Luo, In-Sung Koh, Kyeong-Yuk Min, Jun Wang, and Jong-Wha Chong (Member, IEEE)
Abstract

Many papers concentrate on calculating the 3×3 perspective transformation matrix of a bird's-eye view system, but few discuss the implementation of the whole system. In this paper, a low-cost bird's-eye view system is proposed, which adopts an elaborate software/hardware cooperative system structure. It can be applied directly to all kinds of vision-based systems in vehicles as a ready-made module.
I. INTRODUCTION
In general, the image from a camera mounted on a vehicle suffers from a severe perspective effect, as in Fig. 1(a). Because of this perspective effect, the driver cannot judge distances correctly, and advanced image processing or analysis also becomes difficult. Consequently, as in Fig. 1(b), a perspective transformation is necessary: the raw images have to be transformed into a bird's-eye view, as in Fig. 1(c).
The bird's-eye view image is the perspective projection of the original image, as in Fig. 1. The relationship between a pixel (x, y) of the bird's-eye view image and a pixel (u, v) of the original image can be formulated by a 3×3 homography matrix as in (1):
$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ w \end{bmatrix} \qquad (1)$$

where x = x'/w' and y = y'/w', as in [1].
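For illustration, a minimal sketch of how (1) maps a pixel with a 3×3 homography is given below; the matrix values are placeholders, not a calibrated matrix from the paper.

```python
import numpy as np

# Placeholder homography; in the real system this comes from Module I.
H = np.array([[1.0, 0.2, 5.0],
              [0.0, 1.5, 3.0],
              [0.0, 0.001, 1.0]])

def warp_point(H, u, v):
    # Homogeneous product of Eq. (1): [x', y', w'] = H [u, v, 1]
    xp, yp, wp = H @ np.array([u, v, 1.0])
    # Normalization: x = x'/w', y = y'/w'
    return xp / wp, yp / wp

print(warp_point(H, 100, 200))
```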
Much research has been done on calculating the transformation matrix. For example, [2] used the camera height, focal length and mounting angle, and [3] used the vanishing point of lanes to calculate it. [4] proposed a vision-based method that uses three patterns to obtain the matrix.
However, previous papers only discussed the transformation matrix. In practice, even once the matrix is obtained, a real-time hardware implementation is still needed. This paper therefore addresses the implementation of the bird's-eye view system.
II. PROPOSED SYSTEM
As shown in Fig. 2, the proposed system includes three modules. Module I calculates the transformation matrix by software. Module II generates the look-up tables (LUTs) of address mapping as an initialization of the system. Module III transforms the input original images into bird's-eye view images by hardware.

(This work was supported by Hynix Semiconductor, and also sponsored by the Seoul R&BD Program (10560) and the ETRI SoC Industry Development Center, Human Resource Development Project for IT SoC Architect. The IDEC provided research facilities for this study.)
A. Perspective transformation matrix calculation
Because the calculation of the transformation matrix is not the emphasis of this paper, we adopt the algorithm of [4] to obtain the matrix directly. And because the matrix only needs to be calculated once, at system initialization, Module I is implemented in software rather than in hardware.
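As a quick software stand-in for Module I, the sketch below estimates a 3×3 homography from four ground-plane point correspondences with OpenCV; this is not the pattern-based calibration of [4], and the correspondence coordinates are illustrative placeholders only.

```python
import numpy as np
import cv2

# Four ground-plane correspondences: original image pixels -> desired
# bird's-eye view pixels. The coordinates are illustrative placeholders.
src = np.float32([[230, 300], [410, 300], [620, 470], [20, 470]])
dst = np.float32([[120, 0], [520, 0], [520, 479], [120, 479]])

# 3x3 homography of Eq. (1), computed once at system initialization.
H = cv2.getPerspectiveTransform(src, dst)
print(H)
```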
B. Initialization
After the transformation matrix is obtained, a straightforward realization of the bird's-eye view system consists of three steps: perspective mapping, shearing, and enlarging, as in Fig. 3.
To reduce the complexity of the implementation, we propose three optimized algorithms based on the straightforward approach.
(1) Real-number multiplications and divisions are replaced by LUTs when the coordinates are calculated by (1). Accordingly, as in Fig. 2, forward and backward mapping LUTs are used.
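As an illustration, the backward-mapping LUT could be precomputed along the following lines. This is a minimal software sketch: the function name build_backward_lut, the use of NumPy, and the floating-point LUT entries are our assumptions, not the paper's fixed-point implementation.

```python
import numpy as np

def build_backward_lut(H, out_w, out_h):
    """Precompute, once at initialization, the source position (u, v) in the
    original image for every bird's-eye view pixel (x, y), so the per-pixel
    runtime needs only table lookups and interpolation weights."""
    H_inv = np.linalg.inv(H)          # output -> input direction
    lut = np.empty((out_h, out_w, 2), np.float32)
    for y in range(out_h):
        for x in range(out_w):
            up, vp, wp = H_inv @ np.array([x, y, 1.0])
            lut[y, x] = (up / wp, vp / wp)
    # A forward-mapping LUT (input -> output) is built analogously with H itself.
    return lut
```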
(2) The three steps are combined into a one-step transformation, as shown in Fig. 4. Suppose that the inverse perspective transformation matrix, the shearing matrix and the enlarging matrix are H1, H2 and H3, respectively, as in (2):
$$H_1 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad H_2 = \begin{bmatrix} 1 & s_1 & 0 \\ s_2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad H_3 = \begin{bmatrix} e_1 & 0 & 0 \\ 0 & e_2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2)$$
[Fig. 1. Illustration of perspective transformation in a parking lot scene: (a) perspective view image from the on-vehicle camera, mounted at height h with tilt angle θ above the ground plane; (b) geometry of the perspective transformation between the perspective view and the virtual bird's-eye view; (c) bird's-eye view image, with pixel (u, v) of the original image mapped to pixel (x, y).]

[Fig. 2. Structure of the proposed bird's-eye view system: Module I (perspective transformation matrix calculation, software), Module II (initialization: LUT generator, forward and backward mapping LUTs), and Module III (transformation, ASIC/FPGA: ring buffer, control unit with counter, LUT access and address generation, parallel bilinear interpolation, and FIFO).]

[Fig. 3. Straightforward transformation procedure (perspective mapping, shearing, enlarging): (a) input image; (b) bird's-eye view image; (c) sheared image; (d) output image.]

The combined matrix Hcom can then be expressed as follows:
$$H_{com} = H_1 H_2 H_3 = \begin{bmatrix} e_1(a_{11} + s_2 a_{12}) & e_2(s_1 a_{11} + a_{12}) & a_{13} \\ e_1(a_{21} + s_2 a_{22}) & e_2(s_1 a_{21} + a_{22}) & a_{23} \\ e_1(a_{31} + s_2 a_{32}) & e_2(s_1 a_{31} + a_{32}) & a_{33} \end{bmatrix} \qquad (3)$$
The combination is done in software and is calculated only once. This scheme greatly reduces the hardware cost.
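The one-step combination itself is a single matrix product. A minimal sketch, with placeholder shearing and enlarging parameters s1, s2, e1, e2 and a placeholder H1, is given below.

```python
import numpy as np

# Placeholder parameters; the real values come from the shearing and
# enlarging steps of the straightforward procedure (Fig. 3).
s1, s2, e1, e2 = 0.1, 0.05, 2.0, 2.0

H1 = np.array([[1.0, 0.2, 5.0],          # inverse perspective transform (placeholder)
               [0.0, 1.5, 3.0],
               [0.0, 0.001, 1.0]])
H2 = np.array([[1.0, s1, 0.0],           # shearing matrix of Eq. (2)
               [s2, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
H3 = np.diag([e1, e2, 1.0])              # enlarging matrix of Eq. (2)

Hcom = H1 @ H2 @ H3                      # computed once in software, Eq. (3)
print(Hcom)
```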
(3) The image area needed for the perspective transformation should be calculated in advance. Suppose the image size is m×n; then the one-step transformation of Fig. 4 only needs the last (m-1-v0) rows of the image in Fig. 4(a). If we pre-evaluate the value of v0, the first v0 rows of the input image need not be stored or processed, so both memory size and computational cost are reduced.
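One simple way to pre-evaluate v0 is sketched below: map every output pixel back to the input image and take the smallest source row. The function name first_needed_row and the dense mapping over all output pixels are our assumptions, not the paper's procedure.

```python
import numpy as np

def first_needed_row(Hcom, out_w, out_h):
    """Map every output pixel back to the input image and return the
    smallest source row index v0; the first v0 input rows can then be
    skipped entirely."""
    H_inv = np.linalg.inv(Hcom)
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    up, vp, wp = H_inv @ pts              # backward mapping of all output pixels
    return int(np.floor((vp / wp).min()))
```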
C. Pipelined hardware structure
As shown in Module III of Fig. 2, the proposed pipelined hardware structure includes four blocks: ring buffer, control, interpolation and FIFO.
The ring buffer stores input pixels so that the pixels of a 2×2 window can be fetched at a time for bilinear interpolation.
The control unit consists of a counter, LUT access and address generation. The counter counts input pixels, and the control unit starts to work when the input reaches pixel (u0, v0) of Fig. 4(a).
LUT access adopts the window operation of [5]. A 2×1 forward mapping (FWD) window and an n×1 backward mapping (BWD) window are used here, as shown in Fig. 5.
For example, following the raster-scanning order of the input image, the steps of LUT access are: first, find the forward mappings of the inputs (ui, vi) and (ui+1, vi) by accessing the FWD LUT, i.e., (xi, yi) and (xi+1, yi) in the output coordinate system, as in Fig. 5(b); next, select the output points whose x values lie between xi and xi+1 and whose y values equal roof(yi), i.e., (xi, yi), ..., (xi+n-1, yi) in the blue dashed rectangle of Fig. 5(b); finally, find their backward mappings (ui, vi), ..., (ui+n-1, vi) in the coordinate system of the input image by accessing the BWD LUT, as in Fig. 5(a).
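A behavioral software model of this windowed LUT access might look as follows; the dictionary-based LUTs and the function name lut_access_step are illustrative assumptions, not the hardware implementation.

```python
import math

def lut_access_step(fwd_lut, bwd_lut, ui, vi, n):
    """Behavioral model of one LUT-access step in raster-scan order.

    fwd_lut[(u, v)] -> (x, y): forward mapping of an input pixel.
    bwd_lut[(x, y)] -> (u, v): backward mapping of an output pixel.
    Returns up to n output pixels on row roof(yi) lying between the
    forward mappings of (ui, vi) and (ui+1, vi), paired with the
    real-valued input positions at which they must be interpolated.
    """
    xi, yi = fwd_lut[(ui, vi)]            # 2x1 forward-mapping window
    xi1, _ = fwd_lut[(ui + 1, vi)]
    row = math.ceil(yi)                   # 'roof' of yi
    lo, hi = sorted((xi, xi1))
    xs = range(math.ceil(lo), math.floor(hi) + 1)
    return [((x, row), bwd_lut[(x, row)]) for x in list(xs)[:n]]
```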
After backward mapping, the points (ui, vi), ..., (ui+n-1, vi) usually fall in the space between the real input pixels, as in Fig. 5(a), so bilinear interpolation is used to improve the output image quality, as in Fig. 6. For interpolation, the coordinates of the backward mapping (ui, vi) are divided into an integer part and an offset. The integer part determines the location of the 2×2 window for bilinear interpolation, whereas the offset serves as the interpolation weight. To reduce the required system frequency, the n interpolations are operated in parallel, as in Fig. 2.
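The integer/offset split and the bilinear weighting can be sketched in software as follows; this is a behavioral model only, not the fixed-point hardware datapath of Fig. 6.

```python
def bilinear_sample(img, u, v):
    """Bilinear interpolation at the real-valued input position (u, v).

    The integer part of (u, v) selects the 2x2 window; the fractional
    offsets a and b act as the interpolation weights. img is indexed as
    img[row][column] = img[v][u].
    """
    iu, iv = int(u), int(v)               # integer part -> window location
    a, b = u - iu, v - iv                 # offsets (weights)
    p00 = img[iv][iu]
    p01 = img[iv][iu + 1]
    p10 = img[iv + 1][iu]
    p11 = img[iv + 1][iu + 1]
    top = p00 + a * (p01 - p00)           # blend along u in row iv
    bot = p10 + a * (p11 - p10)           # blend along u in row iv+1
    return top + b * (bot - top)          # blend along v
```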
Finally, a FIFO is used to convert the parallel data back to a sequential stream and to synchronize the output pixels with the input frequency.
III. EXPERIMENT
Our experimental conditions are as follows: a CMOS rear camera with a 120° horizontal view angle, camera height h = 95 cm, mounting angle θ = 45°, and a display of 640×480 pixels. The memory size and computational cost saved by the algorithms described above, compared with the straightforward algorithm, are shown in Table 1.
TABLE 1. COST SAVING BY THE PROPOSED ALGORITHMS

Proposed algorithm       | Memory or cost reduction
------------------------ | ------------------------------------------------
LUT                      | Free of real-number multiplication and division
One-step transformation  | Memory size ca. 67%; computational cost ca. 67%
Pre-evaluation           | Memory size ca. 40%
IV. CONCLUSION
This paper proposed a real-time bird's-eye view system, which performs the perspective transformation with a low-cost pipelined hardware structure. The module can be used directly in automobile parking assist systems and can be applied to other vision-based automobile systems.
REFERENCES
[1] G. Wolberg, Digital Image Warping. Los Alamitos, CA: IEEE Computer Society Press, 1990, pp. 52-56.
[2] M. Bertozzi and A. Broggi, "GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection," IEEE Trans. Image Processing, vol. 7, pp. 62-81, 1998.
[3] C. Zhaoxue and S. Pengfei, "Efficient method for camera calibration in traffic scenes," Electron. Lett., vol. 40, pp. 368-369, 2004.
[4] H. Kano, et al., "Precise top view image generation without global metric information," IEICE Trans. Inf. & Syst., vol. E91-D, pp. 1893-1898, 2008.
[5] S. Oh and G. Kim, "FPGA-based fast image warping with data-parallelization schemes," IEEE Trans. Consumer Electron., vol. 54, pp. 2053-2059, Nov. 2008.
[Fig. 4. Combined one-step perspective transformation: (a) original image, with (u0, v0) marking the first pixel needed by the transformation; (b) bird's-eye view image.]

[Fig. 5. 2×1 forward mapping and n×1 backward mapping of LUT access: (a) coordinate system of the input image, showing the input pixel locations and the backward mappings (ui, vi), ..., (ui+n-1, vi) of the output pixels; (b) coordinate system of the bird's-eye view image, showing the output pixel locations and the forward mappings (xi, yi) and (xi+1, yi) of the input pixels.]

[Fig. 6. Proposed hardware structure of LUT access and interpolation: counter, line buffer (m+2), delays, 2×1 FWM and n×1 BWM LUT access, splitting of (ui, vi) into floor(ui), floor(vi) and the offsets ai = ui - floor(ui) and bi = vi - floor(vi), bilinear interpolation datapath (multipliers, adders, subtractors over the 2×2 window Vi-1,j-1, Vi-1,j, Vi,j-1, Vi,j), and output FIFO.]