Figure 5 - uploaded by Xiang Bai
Statistics of instances in DOTA. AR denotes the aspect ratio. (a) AR of horizontal bounding boxes. (b) AR of oriented bounding boxes. (c) Histogram of the number of annotated instances per image.


Source publication
Preprint
Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to transfer to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also...

Contexts in source publication

Context 1
... ratio is an essential factor for anchor-based models, such as Faster R-CNN [26] and YOLOv2 [25]. We count two kinds of aspect ratio for all the instances in our dataset to provide a reference for better model design: 1) the aspect ratio of the minimally circumscribed horizontal rectangle bounding box, and 2) the aspect ratio of the original quadrangle bounding box. Fig. 5 illustrates the distributions of these two types of aspect ratio for instances in our dataset. We can see that instances vary greatly in aspect ratio. Moreover, there are a large number of instances with a large aspect ratio in our ...
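The two aspect-ratio statistics counted in this excerpt can be sketched as follows. The input is one annotation's four corner points; the assumption that corners are ordered around the box, and the averaging of opposite side lengths for the oriented box, are choices of this sketch, not the paper's exact procedure.

```python
import numpy as np

def horizontal_ar(quad):
    """AR of the minimally circumscribed horizontal (axis-aligned) box."""
    w = quad[:, 0].max() - quad[:, 0].min()
    h = quad[:, 1].max() - quad[:, 1].min()
    return max(w, h) / min(w, h)

def oriented_ar(quad):
    """AR of the oriented quadrangle: each pair of opposite sides is
    averaged into one edge length (assumes corners are in order)."""
    sides = np.linalg.norm(np.roll(quad, -1, axis=0) - quad, axis=1)
    a = (sides[0] + sides[2]) / 2  # mean of one pair of opposite sides
    b = (sides[1] + sides[3]) / 2  # mean of the other pair
    return max(a, b) / min(a, b)

# A 10x2 axis-aligned rectangle: both ARs agree.
axis_quad = np.array([[0, 0], [10, 0], [10, 2], [0, 2]], dtype=float)
print(horizontal_ar(axis_quad), oriented_ar(axis_quad))  # 5.0 5.0

# The same rectangle rotated 45 degrees: the horizontal box becomes a
# square (AR ~1) while the oriented AR is unchanged (AR 5) - this is
# why the two distributions in Fig. 5(a) and 5(b) differ.
c = np.sqrt(0.5)
rot = np.array([[c, -c], [c, c]])
rot_quad = axis_quad @ rot.T
print(round(horizontal_ar(rot_quad), 2), round(oriented_ar(rot_quad), 2))
```

The rotated example illustrates why counting only horizontal-box ARs would understate the elongation of oriented instances such as ships or vehicles.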
Context 2
... is common for aerial images to contain thousands of instances, which is different from natural images. For example, images in ImageNet [6] contain on average 2 categories and 2 instances, while MSCOCO contains 3.5 categories and 7.7 instances. Our dataset is much richer in instances per image, with up to 2,000 instances in a single image. Fig. 5 illustrates the number of instances in our DOTA ...
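The per-image statistics described in this excerpt (instances per image and the histogram in Fig. 5(c)) can be reproduced with a short sketch; the flat list of annotation dicts keyed by "image_id" is a hypothetical format for illustration, not DOTA's actual annotation schema.

```python
from collections import Counter
from statistics import mean

def instance_histogram(annotations):
    """Count instances per image, then bucket images by instance count."""
    per_image = Counter(a["image_id"] for a in annotations)
    # histogram[k] = number of images containing exactly k instances
    histogram = Counter(per_image.values())
    return per_image, histogram

# Toy example: three instances on one image, one on another.
anns = [{"image_id": "P0001"}] * 3 + [{"image_id": "P0002"}]
per_image, hist = instance_histogram(anns)
print(mean(per_image.values()))  # average number of instances per image
```

The same counting, run over a full annotation set, yields the per-image averages quoted above (2 for ImageNet, 7.7 for MSCOCO) and the long-tailed histogram reported for DOTA.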

Similar publications

Conference Paper
Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to transfer to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also...
Article
Land cover information plays an important role in mapping ecological and environmental changes in Earth's diverse landscapes for ecosystem monitoring. Remote sensing data have been widely used for the study of land cover, enabling efficient mapping of changes on the Earth's surface from space. Although the availability of high-resolution remote sensi...
Article
Deep Learning (DL) based identification and detection of elements in urban spaces through Earth Observation (EO) datasets have been widely researched and discussed. Such studies have developed state-of-the-art methods to map urban features like building footprints or roads in detail. This study delves deeper into combining multiple such studies to i...

Citations

... Especially, Faster R-CNN (Ren et al., 2015) proposes the region proposal network (RPN) to localize possible objects instead of traditional sliding-window search methods, and achieves state-of-the-art performance on different datasets in terms of accuracy. However, these existing state-of-the-art detectors cannot be directly applied to detect vehicles in aerial images, due to the different characteristics of ground-view images and aerial-view images (Xia et al., 2017). The appearance of the vehicles is monotone, as shown in Figure 1. ...
Article
The detection of vehicles in aerial images is widely applied in many applications. Compared with object detection in ground-view images, vehicle detection in aerial images remains a challenging problem because of small vehicle size, monotone appearance and complex backgrounds. In this paper, we propose a novel double focal loss convolutional neural network framework (DFL-CNN). In the proposed framework, skip connections are used in the CNN structure to enhance feature learning. Also, the focal loss function is used to substitute for the conventional cross-entropy loss function in both the region proposal network and the final classifier. We further introduce the first large-scale vehicle detection dataset ITCVD with ground truth annotations for all the vehicles in the scene. We demonstrate the performance of our model on the existing benchmark DLR 3K dataset as well as the ITCVD dataset. The experimental results show that our DFL-CNN outperforms the baselines on vehicle detection.