Conference Paper

An assault detection system based on human Pose Tracking for video surveillance

The development of new technologies for video surveillance and automatic violence detection can bring more security to our daily lives. Previously published state-of-the-art solutions have presented techniques to detect violence in movie scenes, sports matches, or crowds. In this work, we propose a novel system architecture based on human pose tracking for detecting evidence of assaults in real-world closed-circuit television (CCTV) videos from Brazilian lottery agencies. The results showed that our method can identify individuals with their hands up and individuals lying down with accuracy rates of up to 85%. We believe that real-time detection of potentially risky situations is a crucial tool in the fight against crime.
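The abstract does not say how pose-tracking output becomes evidence of "hands up" or "lying down". As a hedged illustration only, the sketch below computes two simple geometric quantities from 2D keypoints: the angle at a joint (for example at the elbow) and the tilt of the torso relative to the image vertical, where a value near 90 degrees indicates a roughly horizontal body. The function names `joint_angle` and `torso_tilt`, the use of neck and mid-hip keypoints, and all coordinates are hypothetical assumptions, not the authors' method.

```python
import math


def joint_angle(a, b, c):
    """Return the angle ABC in degrees, formed at keypoint b by segments b->a and b->c.

    Each keypoint is an (x, y) pixel coordinate from a 2D pose estimator (assumed format).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate angle: coincident keypoints")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def torso_tilt(neck, mid_hip):
    """Angle in degrees between the neck->mid-hip segment and the image vertical axis.

    0 means an upright torso; values near 90 mean a roughly horizontal torso.
    (Hypothetical feature; keypoint names are assumptions about the estimator's output.)
    """
    dx = mid_hip[0] - neck[0]
    dy = mid_hip[1] - neck[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))


# Hypothetical keypoints: a straight arm (shoulder, elbow, wrist), then two torsos.
print(round(joint_angle((100, 200), (100, 150), (100, 100)), 1))
print(round(torso_tilt((100, 100), (100, 200)), 1))  # upright person
print(round(torso_tilt((100, 200), (200, 200)), 1))  # person lying on the floor
```

Features like these are scale-independent, which is one reason angles are a common complement to raw keypoint coordinates when people appear at different distances from a CCTV camera.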

... The SVM inputs were features (25 skeleton keypoints, 6 angles, and human contact detection) extracted by a pose estimation algorithm [10], which is a key tool for analyzing human actions in video. An SVM with a radial basis function (RBF) kernel was also used to classify human poses and identify individuals' actions, such as hands up and lying down [11]. To learn complex motion structures, a recurrent pose-attention network (RPAN) was used to learn human-part features by partially sharing attention parameters across semantically related human joints [12]. ...
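The excerpt mentions an SVM with an RBF kernel applied to pose features. As a minimal sketch (not the cited authors' code), a trained binary kernel SVM scores a feature vector x with f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b, where K(x, z) = exp(-gamma * ||x - z||^2) and s_i are the support vectors. The 57-value feature layout (25 keypoints as (x, y) pairs, 6 angles, 1 contact flag), the helper `build_feature_vector`, and every numeric parameter below are hypothetical; in practice the support vectors, coefficients, bias, and gamma come from training, for example with scikit-learn's `SVC(kernel="rbf")`.

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma) -> float:
    """Decision value f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b of a trained binary SVM.

    dual_coefs holds the signed products alpha_i * y_i, one per support vector.
    A positive value is read as the positive class, a negative value as the other.
    """
    return sum(c * rbf_kernel(s, x, gamma)
               for c, s in zip(dual_coefs, support_vectors)) + bias


def build_feature_vector(keypoints: List[Tuple[float, float]],
                         angles: Sequence[float], contact: bool) -> List[float]:
    """HYPOTHETICAL layout: 25 (x, y) keypoints + 6 joint angles + 1 contact flag = 57 values."""
    if len(keypoints) != 25 or len(angles) != 6:
        raise ValueError("expected 25 keypoints and 6 angles")
    flat = [coord for point in keypoints for coord in point]
    return flat + [float(a) for a in angles] + [1.0 if contact else 0.0]


# Toy 2D model with made-up parameters, chosen only to show the mechanics.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i
for sample in [(0.1, 0.0), (0.9, 1.0)]:
    score = svm_decision(sample, support_vectors, dual_coefs, bias=0.0, gamma=1.0)
    print(sample, "positive" if score > 0 else "negative")
```

Each sample is pulled toward the class of the nearer support vector, which is the local, similarity-based behavior that makes the RBF kernel a common default for non-linearly separable pose features. For multi-class pose labels, libraries typically combine several such binary decision functions.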
Conference Paper
Violence detection has been investigated extensively in the literature. Recently, IoT-based violence video surveillance has become an intelligent component integrated into the security systems of smart buildings. A violence video detector is a specific kind of detection model that must be highly accurate, achieving high sensitivity while keeping the false alarm rate low. This paper proposes a novel CNN-LSTM (Convolutional Neural Network - Long Short-Term Memory) architecture that can run on a low-cost Internet of Things (IoT) device such as a Raspberry Pi board. The CNN learns spatial features from video frames, and these features are fed to the LSTM, which classifies each video as violent or non-violent. A challenging dataset combining two public datasets, RWF-2000 and RLVS-2000, was used for model training and evaluation. The video content includes crowds and chaos, small objects at a far distance, low resolution, and transient actions. Additionally, the videos were captured in various environments, such as streets, prisons, and schools, and show human actions such as playing football, basketball, and tennis, swimming, and eating. The experimental results show good performance of the proposed violence detection model, with average metrics of 73.35% accuracy, 76.90% recall, 72.53% precision, 74.01% F1 score, 23.10% false negative rate, 30.20% false positive rate, and 82.0% AUC. The proposed CNN-LSTM balances good performance with a low number of parameters and can thus be implemented on a low-cost IoT node.
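The metrics listed above follow the standard confusion-matrix definitions for a binary violence/non-violence classifier. The sketch below computes them from raw counts; the `Confusion` class, the `metrics` helper, and the example counts are invented for illustration and are not the paper's data. Two identities are worth noting: the false negative rate equals 1 - recall, which matches the reported 76.90% recall and 23.10% FNR, and the F1 of averaged precision and recall (about 74.65% here) need not equal an F1 score averaged over several runs or folds, such as the reported 74.01%.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # violent clips predicted violent
    fp: int  # non-violent clips predicted violent (false alarms)
    tn: int  # non-violent clips predicted non-violent
    fn: int  # violent clips predicted non-violent (missed detections)


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict:
    recall = _ratio(c.tp, c.tp + c.fn)  # sensitivity / true positive rate
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "recall": recall,
        "precision": precision,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fnr": _ratio(c.fn, c.fn + c.tp),  # equals 1 - recall
        "fpr": _ratio(c.fp, c.fp + c.tn),  # false alarm rate
    }


# Invented counts for illustration only (not from the paper).
example = Confusion(tp=80, fp=20, tn=70, fn=30)
for name, value in metrics(example).items():
    print(f"{name}: {value:.2%}")
```

AUC is omitted because it requires per-clip scores rather than a single confusion matrix; it summarizes the trade-off between FPR and recall across all decision thresholds.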