August 2019
Object detection is a computer vision technique for locating instances of semantic objects of a given class in digital images and videos. It is the backbone of many practical applications of computer vision, including image retrieval, self-driving cars, face recognition, object tracking, and video surveillance, and is therefore relevant to a wide range of fields. Object detection can be achieved through traditional machine learning approaches, such as those based on histogram of oriented gradients (HOG) or scale-invariant feature transform (SIFT) features, and through deep learning approaches, which fall into two broad categories. The first comprises two-stage architectures built around region proposals (R-CNN, Fast R-CNN, and Faster R-CNN); the second comprises single shot detectors, which include You Only Look Once (YOLO) and the Single Shot MultiBox Detector (SSD). YOLO and SSD are considerably faster than R-CNN and its derivatives. YOLO uses Darknet for feature extraction, followed by convolutional layers for object localization, while SSD uses VGG-16 for feature extraction. Although object detection is attracting growing attention from the research community, most work has concentrated on improving existing detection algorithms. Detecting objects of unseen classes, on which the networks were never trained, has been largely overlooked. In this work, an attempt is made to understand the YOLO architecture and answer various questions related to it, and to extend existing single shot detectors such as YOLO and SSD so that they can classify unseen classes in real time through incremental learning. Such a capability is valuable because it is very difficult to retrain these large convolutional networks each time new classes are added, let alone to do so in real time.
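To make the idea of adding classes without retraining concrete, the following is a minimal sketch of a nearest-class-mean classifier. It is an illustrative stand-in, not the method used in this work: the class name `NearestClassMean`, its methods, and the assumption that each detected region has already been turned into a fixed-length feature vector by a frozen backbone are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


class NearestClassMean:
    """Toy incremental classifier (hypothetical): one running mean per class."""

    def __init__(self) -> None:
        self.sums: Dict[str, List[float]] = {}   # per-class sum of feature vectors
        self.counts: Dict[str, int] = {}         # per-class number of examples

    def add_example(self, label: str, feature: Sequence[float]) -> None:
        # A previously unseen label just creates a new prototype;
        # no existing parameters are retrained.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(feature)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], feature)]
        self.counts[label] += 1

    def predict(self, feature: Sequence[float]) -> str:
        if not self.sums:
            raise ValueError("no classes have been added yet")

        def distance(label: str) -> float:
            n = self.counts[label]
            mean = [s / n for s in self.sums[label]]
            return math.dist(mean, feature)  # Euclidean distance to the class mean

        return min(self.sums, key=distance)


# Assumed toy features; in practice these would come from a detector backbone.
clf = NearestClassMean()
clf.add_example("cat", [1.0, 0.0])
clf.add_example("cat", [0.8, 0.2])
clf.add_example("dog", [0.0, 1.0])
first = clf.predict([0.9, 0.1])

# A new class is added later with a single example and no retraining.
clf.add_example("bird", [5.0, 5.0])
second = clf.predict([4.5, 5.2])
```

Because each class is summarized only by a running sum and a count, adding a class costs one vector update, which is what makes this family of approaches attractive for real-time settings. Whether it is accurate enough depends entirely on how well the frozen features separate the new classes.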