Football Player Detection in Video Broadcast
Abstract
The paper describes a novel segmentation system for football video based on the combination of Histogram of Oriented Gradients (HOG) descriptors
and linear Support Vector Machine (SVM) classification. HOG methods have recently been widely used for pedestrian
detection. The presented experimental results show that the combination of HOG and SVM is also very promising for locating and
segmenting players. The proposed system introduces dominant color based segmentation for football playfield detection and 3D playfield
modeling based on the Hough transform. The system is evaluated experimentally on SD (720×576) and HD (1280×720)
test sequences. Additionally, we test the performance of the proposed system under different lighting conditions (non-uniform pitch lighting,
multiple player shadows) as well as for various positions of the cameras used for acquisition.
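The dominant color idea behind playfield detection can be illustrated with a minimal sketch. It assumes that the playfield is the most frequent hue in the frame and marks every pixel whose hue lies close to it. The function name `playfield_mask`, the 36-bin hue histogram, and the one-bin tolerance are illustrative assumptions, not parameters taken from the paper.

```python
import colorsys
from collections import Counter

def playfield_mask(pixels, bins=36, tol_bins=1):
    """Mark pixels whose hue is within tol_bins of the dominant hue bin.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    Returns a list of rows of booleans (True = likely playfield).
    """
    def hue_bin(rgb):
        r, g, b = (c / 255.0 for c in rgb)
        h, _, _ = colorsys.rgb_to_hsv(r, g, b)
        return int(h * bins) % bins

    hue_bins = [[hue_bin(p) for p in row] for row in pixels]
    # The most frequent hue bin is assumed to be the grass color.
    counts = Counter(b for row in hue_bins for b in row)
    dominant = counts.most_common(1)[0][0]

    def near(b):
        d = abs(b - dominant)
        return min(d, bins - d) <= tol_bins  # hue is circular

    return [[near(b) for b in row] for row in hue_bins]

# Tiny synthetic frame: green field with two red "player" pixels.
green, red = (30, 160, 40), (200, 20, 20)
frame = [
    [green, green, green, green],
    [green, red, red, green],
    [green, green, green, green],
]
mask = playfield_mask(frame)
```

Pixels left unmarked inside the field region are candidate player pixels. A practical system would also have to handle shadows and gray or white pixels, whose hue is unreliable.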
... Traditional model player detection includes connected component analysis [8], shallow convolutional neural networks [9], histogram of orientated gradients and support vector machines (HOG-SVM) [10], and deformable part model (DPM) [11]. Figure 1 shows different situations in football player detection. ...
... Traditional models generally can detect players in this situation, while they can hardly detect adjacent players (Figures 1(b)-1(d)) correctly in a harder situation. Besides, HOG-SVM needs domain knowledge and more labor work in order to conduct background segmentation [10]. Non-maximum suppression restricts the performance of DPM when detecting close players [12]. ...
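The remark about non-maximum suppression (NMS) can be made concrete with a short sketch of generic greedy NMS. It is not the exact procedure of any cited detector. The box coordinates, scores, and the 0.5 IoU threshold are made-up illustrative values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two different players standing close together, plus one isolated player.
boxes = [(100, 50, 140, 150), (110, 50, 150, 150), (300, 60, 340, 160)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the two adjacent boxes overlap with an IoU of 0.6, so the lower-scoring one is discarded even though it covers a different player. This is the failure mode described for close players.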
The main task of football video analysis is to detect and track players. In this work, we propose a deep convolutional neural network-based football video analysis algorithm. This algorithm aims to detect the football player in real time. First, five convolution blocks were used to extract a feature map of football players with different spatial resolution. Then, features from different levels are combined together with weighted parameters to improve detection accuracy and adapt the model to input images with various resolutions and qualities. Moreover, this algorithm can be extended to a framework for detecting players in any other sports. The experimental results assure the effectiveness of our algorithm.
... Several studies have appeared in this domain. In order to tackle the drawbacks of RGB based field recognition [12,15] and edge detection based methods [11], we have appointed the corner points on the two sides of the field. In the first step, we took a frame from the videos and undistorted them with the distortion matrix of the camera lens to get straight lines on the image. ...
... Player detection is a broad research area in sport video analysis. It can be based on Histogram of Oriented Gradients (HOG) [11], or some feature selection algorithms such as dominant color-based background subtraction [9], and edge detection [3]. Since our videos were recorded from stationary points, we opted for a movement-based background-foreground separation [6] method to separate moving objects from the background. ...
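For a stationary camera, movement-based background-foreground separation can be sketched with a running-average background model. This is a deliberately simple stand-in, not the method of reference [6]. The learning rate `alpha` and the difference `threshold` are assumed values.

```python
def foreground_mask(background, frame, threshold=30):
    """Flag pixels that differ from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background with learning rate alpha."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

# Grayscale toy example: a static 3x4 background and one changed pixel.
background = [[100.0] * 4 for _ in range(3)]
frame = [row[:] for row in background]
frame[1][2] = 200.0  # a bright moving object enters one pixel
mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

The mask is computed before the model is updated, so a newly appearing object is detected first and only slowly absorbed into the background.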
Sports analytics are on the rise in European football; however, due to the high cost, so far only the top-tier leagues and championships have had the privilege of collecting high-precision data to build upon. We believe that this opportunity should be available to everyone, especially youth teams, to develop and recognize talent earlier. We therefore set the goal of creating a low-cost player tracking system that could be applied in a wide base of football clubs and pitches, which in turn would widen the reach of sports analytics, ultimately assisting the work of scouts and coaches in general. In this paper, we present a low-cost optical tracking solution based on cheap action cameras and cloud-deployed data processing. As we build on existing research results in terms of methods for player detection, i.e., background-foreground separation, and for tracking, i.e., Kalman filter, we adapt those algorithms with the aim of sacrificing as little accuracy as possible while keeping costs low. The results are promising: our system yields significantly better accuracy than a standard deep learning based tracking model at a fraction of its cost. In fact, at a cost of $2.4 per match spent on cloud processing of videos for real-time results, all players can be tracked with an 11-meter precision on average.
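The Kalman filter tracking step can be sketched for a single coordinate with a constant-velocity model. This is a textbook formulation under assumed noise settings (`q`, `r`, `p0`), not the adapted filter of the paper. The class name is hypothetical.

```python
class ConstantVelocityKalman1D:
    """Tracks position and velocity along one axis from position readings."""

    def __init__(self, q=0.01, r=1.0, p0=100.0):
        self.p, self.v = 0.0, 0.0          # state: position, velocity
        self.P = [[p0, 0.0], [0.0, p0]]    # state covariance
        self.q, self.r = q, r              # process / measurement noise (assumed)

    def predict(self, dt=1.0):
        """Propagate the state and covariance one time step ahead."""
        P = self.P
        self.p += self.v * dt
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]

    def update(self, z):
        """Correct the prediction with a measured position z."""
        P = self.P
        s = P[0][0] + self.r                 # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - self.p                       # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]

# A player moving at 2 units per frame, observed without noise.
kf = ConstantVelocityKalman1D()
for k in range(20):
    kf.predict()
    kf.update(2.0 * k)
```

After a few frames the estimated velocity `kf.v` settles near 2, although only positions were measured. A 2D tracker would run the same model on both pitch coordinates.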
... In 2010, Sławomir Maćkowiak and his team developed a system employing Histogram of Oriented Gradients (HOG) descriptors and Support Vector Machine (SVM) classification to detect football players in broadcast videos [3]. This method incorporated playfield detection and player tracking, showing promising results in dynamic scenes, though it faced challenges with high occlusion and rapid camera movements. ...
This study presents an advanced YOLOv10n-based method for the automatic detection of football players and balls directly from match videos. We enhance the YOLOv10 architecture with several significant improvements, including additional detection heads, the integration of C2f_faster and C3_faster modules for enhanced processing speed and accuracy, and the inclusion of BotNet modules with self-attention mechanisms for managing complex visual scenes. Further, we incorporate GhostConv modules to reduce computational overhead while maintaining effective feature extraction. These architectural modifications ensure robust detection capabilities in real-time sports environments, addressing challenges such as high-speed movements, frequent occlusions, and variable lighting conditions typical of both indoor and outdoor stadiums. Validation on internet-sourced images from football matches demonstrates the practicality and effectiveness of our model.
... This method employs the head and shoulder HOG characteristics of football players as the research object, intercepts the positive samples that contain a moving head and shoulder and the negative samples that do not, trains the SVM classifier with the HOG feature of the samples, and scans the foreground region in various scales to detect the dynamic moving target. An innovative segmentation approach for detecting football players in broadcast footage is provided in the research [11]. The system is built on a mix of histogram of oriented gradient descriptors and linear support vector machine classification. ...
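The HOG feature at the core of these detectors can be illustrated by computing the orientation histogram of a single cell. This minimal sketch uses centered differences and hard bin assignment. It omits the vote interpolation and block normalization of a full HOG descriptor, and the SVM training step. The 9 unsigned orientation bins are the commonly used setting and are assumed here.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0..180 deg).

    cell: list of rows of grayscale intensities; border pixels are skipped.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += mag
    return hist

# An 8x8 cell whose intensity grows left to right (a vertical edge pattern).
ramp_x = [[10 * x for x in range(8)] for _ in range(8)]
hist_x = cell_histogram(ramp_x)

# The same ramp rotated: intensity grows top to bottom.
ramp_y = [[10 * y for _ in range(8)] for y in range(8)]
hist_y = cell_histogram(ramp_y)
```

The two ramps put all their gradient energy into different bins, which is what lets a linear classifier separate shapes such as a head-and-shoulder outline from background texture.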
The study presents a significantly improved version of the YOLOv5 real-time object detection model for football player recognition. The proposed technique includes feature-tuning and hyper-parameter optimization methods that have been carefully selected to enhance both speed and accuracy, resulting in a superior real-time performance of the YOLOv5 architecture. Furthermore, the YOLOv5 model incorporates a SimSPPF module that enables multi-scale feature extraction with less computational power, making it a highly efficient and effective solution. We selected the GhostNet module to reduce complexity and the Slim scale detection layer for precise bounding box prediction. Our tests, conducted with recordings of multiple football matches, demonstrate that our model accurately detects both the football and players even in complex scenarios with occlusions and dynamic illumination. The suggested method outperforms the original YOLOv5n model in terms of precision, recall, and mean average precision at 0.5 IoU. It is also more computationally efficient. This method has potential applications in live broadcasting, player monitoring, and sports analytics. The upgraded YOLOv5 model demonstrates superior accuracy and efficiency compared to previous methods that rely on traditional image processing techniques or two-stage detectors. This makes it highly suitable for practical, real-world deployments.
... Traditional object detection models for sports video usually aim to identify the players. They use techniques such as connected component analysis [10], shallow convolutional neural networks [11], histogram of oriented gradients and support vector machines (HOG-SVM) [12], and the deformable part model (DPM) [13]. Fig. 1 includes four images (a)-(d) that show various frames of sports video targeted for object detection. ...
Object detection is the most common task in Sports Video Analysis.
This task requires accurate object detection that can handle a variety of objects of different sizes that are partially occluded, have poor lighting, or are presented in complicated surroundings. Object detection in field sports includes detecting each player's team and the ball; this is a difficult task because of the rapid movement of the players and the speed of the object of concern. This paper proposes a pre-trained YOLOv3 deep learning-based object detection model. We have prepared a hockey dataset consisting of four main entities: Team 1 (AUS), Team 2 (BEL), Hockey Ball, and Umpire. We constructed our own dataset because there are no existing field hockey datasets available. Experimental results indicate that the pre-trained YOLOv3 deep learning model generates comparative results on this dataset by modifying the hyperparameters of this pre-trained model.
... In the previous methods presented in this field, an attempt has been made to detect the playfield using the presence of a dominant color in the far view frames. Accordingly, some researchers have tried to use lighting-independent image features by converting RGB space to HSI (Spagno et al., 2007 [7]), YCbCr (Bernard et al., 2004 [24]), and YIQ (Palo et al., 2008 [15]) to detect the playfield more accurately. ...
Today, due to the growth of data and the development of receiving and storing technologies, large datasets have been created in various fields, such as soccer video datasets. Since obtaining the information manually from large datasets is very difficult, an automated system to capture important information from soccer videos is strongly needed. Automated analysis of soccer videos includes many applications such as: analyzing team tactics, confirming referees’ decisions, summarizing videos, etc.
In this paper, a forward-backward algorithm is proposed to increase the performance of player detection and tracking. The purpose of this algorithm is to identify and resolve the occlusions among the players and improve the preprocessing steps (playfield extraction and field lines elimination). We also proposed a new method for each preprocessing step to improve the performance of the tracking system. The evaluations show that our tracking algorithm has performed better than previous methods (89% locally and 78% globally).
... The accuracy rate is 93%. J.K. Slawomir Mackowiak et al proposed a dominant color-based segmentation method for football playfield detection [11]. A 3D playfield modeling based on Hough transform was introduced. ...
... This feature vector can be used to classify objects into different classes, e.g., player, background, and ball. This method is used by Mackowiak et al. [2010] and Cheshire et al. [2015]. ...
Sports analysis has gained paramount importance for coaches, scouts, and fans. Recently, computer vision researchers have taken on the challenge of collecting the necessary data by proposing several methods of automatic player and ball tracking. Building on the gathered tracking data, data miners are able to perform quantitative analysis on the performance of players and teams. With this survey, our goal is to provide a basic understanding for quantitative data analysts about the process of creating the input data and the characteristics thereof. Thus, we summarize the recent methods of optical tracking by providing a comprehensive taxonomy of conventional and deep learning methods, separately. Moreover, we discuss the preprocessing steps of tracking, the most common challenges in this domain, and the application of tracking data to sports teams. Finally, we compare the methods by their cost and limitations, and conclude the work by highlighting potential future research directions.
Camera calibration has attracted much attention, as a primary step of various soccer game analyses, such as real-world player tracking and offside detection. Moreover, metric evaluation of player detection and tracking can be provided via camera calibration, which is a crucial task for player localization assessment. Camera calibration is also challenging due to the blur and duplicated lines, freely moving camera in broadcast streams and lack of well distributed 3D features on the playfield. The main goal of this paper is to review the state-of-the-art in camera calibration and its different prior steps, including transformation computation, feature extraction, image to model registration, image to image registration, camera parameters estimation and conclude future research directions.
In this paper, a novel multiple objects detection and tracking approach based on support vector machine and particle filter is proposed to track players in broadcast sports video. Compared with previous work, the contributions of this paper are focused on three aspects. First, an improved particle filter called SVR particle filter is proposed as the player tracker by integrating support vector regression (SVR) into sequential Monte Carlo framework. SVR particle filter enhances the performance of classical particle filter with small sample set and improves the efficiency of tracking system. Second, support vector classification combined with playfield segmentation is employed to automatically detect the players in sports video as the initialization of tracker. Third, a unified framework for automatic object detection and tracking is proposed based on support vector machine and particle filter. The experimental results are encouraging and demonstrate that our approach is effective.
A novel thin line detection algorithm for use in low-altitude aerial vehicles is presented. This algorithm is able to detect thin obstacles such as cables, power lines, and wires. The system is intended to be used during urban search and rescue operations, capable of dealing with low-quality images, robust to image clutter, bad weather, and sensor artifacts. The detection process uses motion estimation at the pixel level, combined with edge detection, followed by a windowed Hough transform. The evidence of lines is tracked over time in the resulting parameter spaces using a dynamic line movement model. The algorithm's receiver operating characteristic curve (ROC) is shown, based on a multi-site dataset with 86 videos with 10160 wires spanning in 5576 frames.
Classifying video content into different semantic granularities is a possible way for flexible video indexing, browsing and retrieval. In this paper, a placed kick refinement algorithm is proposed after semantic based event detection or manually annotation. The placed kick event is further classified into following three types: free kick, corner kick and penalty according to the ball and field lines detection and their relationships determination. Firstly, we carry out ball detection in the global shot of the placed kick event. According to the ball detection results, we further determine whether to detect field lines using Hough transform. Finally, the ball and field lines detection results are integrated in decision making stage. Experimental results show the effectiveness of the proposed method.
Detecting pedestrians accurately is the first fundamental step for many computer vision applications such as video surveillance, smart vehicles, intersection traffic analysis and so on. The authors present an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine (SVM) classifiers. The performance of pedestrian detection using region covariance, histogram of oriented gradients (HOG) and local receptive fields (LRF) feature descriptors is experimentally evaluated. The experiments are performed on the DaimlerChrysler benchmarking data set, the MIT CBCL data set and the Institut National de Recherche en Informatique et Automatique (INRIA) data set. All can be publicly accessed. The experimental results show that region covariance features with radial basis function kernel SVM and HOG features with quadratic kernel SVM outperform the combination of LRF features with quadratic kernel SVM. Furthermore, the results reveal that both covariance and HOG features perform very well in the context of pedestrian detection.
Detecting lines in a digital image is very important in image processing. An efficient line detection method based on a randomized approach is presented. Unlike previous HT (Hough Transform) based methods, which vote on a parameter space, this algorithm does not need an accumulator to represent the parameter space. The main concept of the proposed method is as follows: first, it selects two different edge points from an edge image to form a candidate line; second, under a given distance tolerance, a strip-shaped image region along the candidate line direction is obtained, and the number of edge points in this strip region is counted; and last, threshold rules are applied to determine whether the candidate line is the desired one. Experimental results show that this approach can accurately find lines in noisy images. Compared with HT and RHT (Randomized Hough Transform), the proposed algorithm has the advantages of lower storage requirements and shorter computational time.
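The three steps of the randomized method (sample two edge points, count edge points in a strip around the candidate line, apply a threshold) can be sketched as follows. The function name `detect_line`, the tolerance, the vote threshold, and the number of trials are assumptions for illustration and are not the values of the paper.

```python
import math
import random

def detect_line(edge_points, tol=1.0, min_votes=8, trials=200, seed=0):
    """Return ((a, b, c), votes) for a line a*x + b*y + c = 0, or None.

    No accumulator is used: each candidate is checked directly against
    the edge points that fall inside a strip of half-width tol.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        # Step 1: two distinct edge points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(edge_points, 2)
        a, b = y2 - y1, x1 - x2
        c = -(a * x1 + b * y1)
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # duplicate coordinates, no line defined
        # Step 2: count edge points inside the strip around the candidate.
        votes = sum(1 for x, y in edge_points
                    if abs(a * x + b * y + c) / norm <= tol)
        # Step 3: accept the candidate if enough points support it.
        if votes >= min_votes:
            return (a / norm, b / norm, c / norm), votes
    return None

# Ten points on the line y = x plus three outliers.
points = [(i, i) for i in range(10)] + [(0, 9), (7, 1), (3, 8)]
result = detect_line(points)
```

Because only a counter per candidate is stored, memory use does not depend on the resolution of a parameter space, which is the storage advantage the abstract refers to.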
The detection of lines in an image is an important task. The well-known Standard Hough Transform (SHT) and Progressive Probabilistic Hough Transform (PPHT) are two of the most efficient algorithms for line detection. SHT can detect almost straight lines in the image; moreover, it is highly resistant to noise. Line segments are found effectively by PPHT, but there are a few problems, resulting in this algorithm having lower accuracy than SHT. This paper proposes an extension of this robust algorithm to detect line segments accurately. The proposal contains three extensions: the technique of accumulation, the application of a local maxima rule in the SHT space, and detection of line segments. The PPHT algorithm is used to compare the experimental results to the results of the proposed method.
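For comparison, the voting scheme of the Standard Hough Transform can be sketched in a few lines. This shows only the basic accumulator over (rho, theta) and a global maximum search, not the extensions proposed in the abstract. The one-degree angular step and the integer rounding of rho are assumed discretization choices.

```python
import math

def hough_peak(edge_points, n_theta=180):
    """Vote in (rho, theta) space and return (rho, theta_index, votes) of a peak.

    Lines are parameterized as rho = x*cos(theta) + y*sin(theta),
    with theta = pi * theta_index / n_theta.
    """
    acc = {}
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    (rho, t), votes = max(acc.items(), key=lambda kv: kv[1])
    return rho, t, votes

# Twenty edge points on the horizontal line y = 5.
points = [(x, 5) for x in range(20)]
rho, t, votes = hough_peak(points)
```

With this coarse discretization, neighboring angle bins around 90 degrees can collect the same number of votes for a short line. This is one reason practical detectors apply a local maxima rule rather than a single global maximum.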
In this paper, we propose an original approach in order to improve the results of color image segmentation by pixel classification. We define a new kind of color space by selecting a set of color components which can belong to any of the different classical color spaces. Such spaces, which have neither psycho-visual nor physical color significance, are named hybrid color spaces. We propose to classify pixels represented in the hybrid color space which is specifically designed to yield the best discrimination between the pixel classes. This space, which is called the adapted hybrid color space, is built by means of a sequential supervised feature selection scheme. This procedure determines the adapted hybrid color space associated with a given family of images. Its dimension is not always equal to three, as for classical color spaces. The effectiveness of our color segmentation method is assessed in the framework of soccer image analysis. The team of each player is identified by the colors of its soccer suit. The aim of the segmentation procedure is to extract meaningful regions representing the players and to recognize their teams.
We present a statistical approach for parsing football video structures. Based on video production conventions, a new generic structure called attack is identified, which is an equivalent of scene in other video domains. We define four video segments to construct it, namely play, focus, replay and break. Two middle level visual features, play field ratio and zoom size, are also computed. The detection process includes a two-pass classifier, a combination of Gaussian Mixture Model and Hidden Markov Models. A general suffix tree is introduced to identify and organize attack. In experiments, video structure classification accuracy of about 86% is achieved on broadcasting World Cup 2002 video data.
The dominant color descriptor (DCD), one of the MPEG-7 color descriptors, is widely applied in image retrieval. DCD describes the representative color distributions and features in an image or a region of interest in an effective, compact and intuitive format. A novel image retrieval method based on an MPEG-7 dominant color descriptor with a fixed number of colors is proposed. The feature extraction process does not require a threshold value, and the number of dominant colors is fixed at eight. The histogram intersection algorithm is used to compare features, which simplifies the similarity computation. The experimental results show that the precision and recall of this method are higher than those of the dominant color retrieval method with a non-fixed number of colors.
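Histogram intersection, the similarity measure named in the abstract, can be sketched as below. The example assumes each image is summarized by eight color weights, matching the fixed number of dominant colors. The assumption that both histograms index the same eight colors is a simplification made here, and the weights are made-up values.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of the normalized histograms."""
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))

# Eight-bin color weights for two hypothetical images.
image_a = [8, 4, 4, 0, 0, 0, 0, 0]   # dominated by three colors
image_b = [2, 2, 2, 2, 2, 2, 2, 2]   # evenly spread over all eight colors
similarity = histogram_intersection(image_a, image_b)
self_similarity = histogram_intersection(image_a, image_a)
```

Each comparison needs only one pass over the eight bins, which keeps the similarity computation cheap for large image collections.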