CCTV cameras have long been used to enforce security, e.g. to detect fights arising from many different situations. However, their effectiveness is questionable because they rely on continuous, specialized human supervision, which creates a demand for automated solutions. Previous work is either too superficial (classification of short clips) or unrealistic (movies, sports, fake fights). None has performed detection of actual fights in long-duration CCTV recordings. In this work, we tackle this problem by first proposing CCTV-Fights (http://rose1.ntu.edu.sg/Datasets/cctvFights.asp), a novel and challenging dataset containing 1,000 videos of real fights, with more than 8 hours of annotated CCTV footage. We then propose a pipeline on which we assess the impact of different feature extractors (Two-stream CNN, 3D CNN and a local interest point descriptor) as well as different classifiers, such as end-to-end CNN, LSTM and SVM. The results confirm how challenging the problem is and highlight the importance of explicit motion information for improving performance.