January 2025
IEEE Access
Existing object detection and annotation methods in surveillance systems often suffer from inefficiencies due to manual labeling and a lack of accurate distance estimation, which limits their effectiveness in large-scale environments. These limitations reduce the speed and accuracy required for real-time surveillance, especially in scenarios that require simultaneous monitoring of multiple feeds. To address these challenges, this paper proposes a framework for automated object detection and annotation designed specifically for surveillance applications. The framework offers both manual and automatic modes, providing flexibility in object labeling. A synthetic dataset, created with the Blender tool to emulate real-world security and surveillance environments, is used to train a YOLO model for fast and accurate object recognition and identification. In addition, a depth estimation model computes the distance to each detected object. By combining the two models, the proposed architecture supports both manual and automatic annotation; the automatic mode increases the efficiency of large-scale surveillance and monitoring applications while reducing manual labor. The model's performance is compared against state-of-the-art models to assess automatic detection and recognition. Precision and recall above 90% show that the fine-tuned model achieves improved results on synthetic data. Grounded in real-time surveillance and risk assessment, this method effectively overcomes the drawbacks of existing approaches.
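The detection-plus-distance pipeline described in the abstract can be approximated with off-the-shelf components. The sketch below pairs an Ultralytics YOLO detector with the MiDaS monocular depth model to emit per-object labels, bounding boxes, and rough distances. It is a minimal illustration under stated assumptions, not the authors' implementation: the stock checkpoint names, the `DEPTH_SCALE` constant, and the input image path are all hypothetical placeholders.

```python
# Illustrative sketch only: approximates an auto-annotation pipeline using
# off-the-shelf components (Ultralytics YOLO + MiDaS depth). Checkpoints,
# paths, and the depth-scaling constant are assumptions, not the authors'
# actual configuration.
import cv2
import numpy as np
import torch
from ultralytics import YOLO

# Stock detector checkpoint; the paper instead fine-tunes YOLO on synthetic
# Blender-rendered surveillance scenes.
detector = YOLO("yolov8n.pt")

# MiDaS produces relative inverse depth; converting it to metres needs a
# scene-specific scale, stubbed here with a placeholder constant.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

DEPTH_SCALE = 10.0  # hypothetical scale; a real system would calibrate this per scene


def annotate_frame(frame_bgr):
    """Detect objects in one frame and attach an approximate distance to each box."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

    # Run the detector once over the frame.
    det = detector(rgb, verbose=False)[0]

    # Run monocular depth estimation and resize the map back to the frame size.
    with torch.no_grad():
        pred = midas(midas_transform(rgb))
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().cpu().numpy()

    annotations = []
    for box, cls, conf in zip(det.boxes.xyxy, det.boxes.cls, det.boxes.conf):
        x1, y1, x2, y2 = map(int, box.tolist())
        region = depth[y1:y2, x1:x2]
        if region.size == 0:
            continue
        # Median inverse depth inside the box; invert and scale to a rough distance.
        inv_depth = float(np.median(region))
        distance = DEPTH_SCALE / max(inv_depth, 1e-6)
        annotations.append({
            "label": det.names[int(cls)],
            "confidence": float(conf),
            "bbox": [x1, y1, x2, y2],
            "distance_m": round(distance, 2),
        })
    return annotations


if __name__ == "__main__":
    frame = cv2.imread("surveillance_frame.jpg")  # hypothetical input path
    for ann in annotate_frame(frame):
        print(ann)
```

In this sketch the automatic mode corresponds to running `annotate_frame` over every incoming feed; a manual mode would instead surface the same detections to an operator for correction before the annotations are saved.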