... Sports are an exciting domain for machine learning due to their global appeal, well-defined rules, and multitude of machine learning tasks, such as win probability estimation, trajectory prediction, and tactic identification. Especially in the last decade, sports analytics has garnered much interest from teams, leagues, and fans [4]. However, while conventional sports, like soccer, basketball, and baseball, produce large and varied data, such as videos or player locations, the data suffer from various drawbacks that hamper its usability in research [10]. ...
Sports, due to their global reach and impact-rich prediction tasks, are an exciting domain to deploy machine learning models. However, data from conventional sports is often unsuitable for research use due to its size, veracity, and accessibility. To address these issues, we turn to esports, a growing domain that encompasses video games played in a capacity similar to conventional sports. Since esports data is acquired through server logs rather than peripheral sensors, esports provides a unique opportunity to obtain a massive collection of clean and detailed spatiotemporal data, similar to those collected in conventional sports. To parse esports data, we develop awpy, an open-source esports game log parsing library that can extract player trajectories and actions from game logs. Using awpy, we parse 8.6m actions, 7.9m game frames, and 417k trajectories from 1,558 game logs from professional Counter-Strike tournaments to create the Esports Trajectory and Actions (ESTA) dataset. ESTA is one of the largest and most granular publicly available sports data sets to date. We use ESTA to develop benchmarks for win prediction using player-specific information. The ESTA data is available at https://github.com/pnxenopoulos/esta and awpy is made public through PyPI.
... Sports analytics has been gaining increased traction as detailed spatio-temporal player event and trajectory data become readily available, enabled by enhanced data acquisition systems [1]. To acquire these data, practitioners rely on methods that range from manual annotation [2] to automated tracking systems that record player locations at high frequencies. ...
Predicting outcomes in sports is important for teams, leagues, bettors, media, and fans. Given the growing amount of player tracking data, sports analytics models are increasingly utilizing spatially-derived features built upon player tracking data. However, player-specific information, such as location, cannot readily be included as features themselves, since common modeling techniques rely on vector input. Accordingly, spatially-derived features are commonly constructed in relation to anchor objects, such as the distance to a ball or goal, through global feature aggregations, or via role-assignment schemes, where players are designated a distinct role in the game. In doing so, we sacrifice inter-player and local relationships in favor of global ones. To address this issue, we introduce a sport-agnostic graph-based representation of game states. We then use our proposed graph representation as input to graph neural networks to predict sports outcomes. Our approach preserves permutation invariance and allows for flexible player interaction weights. We demonstrate how our method provides statistically significant improvements over the state of the art for prediction tasks in both American football and esports, reducing test set loss by 9% and 20%, respectively. Additionally, we show how our model can be used to answer "what if" questions in sports and to visualize relationships between players.
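The graph-based game-state idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' model: the node features (x, y, team), the inverse-distance edge weights, and the single aggregation step followed by a sum readout are hypothetical choices that merely show why such a representation does not depend on player ordering.

```python
import math


def build_game_graph(players):
    """Build a fully connected player graph from one game state.

    `players` is a list of dicts with hypothetical keys 'x', 'y', 'team'.
    Returns (node_features, edge_weights).
    """
    features = [[p["x"], p["y"], float(p["team"])] for p in players]
    n = len(players)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(
                    (players[i]["x"], players[i]["y"]),
                    (players[j]["x"], players[j]["y"]),
                )
                # Assumed weighting: closer players interact more strongly.
                weights[i][j] = 1.0 / (1.0 + d)
    return features, weights


def graph_readout(features, weights):
    """One round of weighted neighbor aggregation, then a sum over nodes.

    Summing over nodes makes the result independent of player order.
    """
    n, dim = len(features), len(features[0])
    pooled = [0.0] * dim
    for i in range(n):
        for k in range(dim):
            neighbor_sum = sum(weights[i][j] * features[j][k] for j in range(n))
            pooled[k] += features[i][k] + neighbor_sum
    return pooled


# Illustrative game state (made-up coordinates and team labels).
players = [
    {"x": 0.0, "y": 0.0, "team": 0},
    {"x": 3.0, "y": 4.0, "team": 0},
    {"x": 6.0, "y": 8.0, "team": 1},
]
readout = graph_readout(*build_game_graph(players))
shuffled_readout = graph_readout(*build_game_graph(list(reversed(players))))
```

Up to floating-point rounding, `readout` and `shuffled_readout` agree, which is the permutation-invariance property the abstract refers to. A graph neural network would replace the fixed weights and sum with learned functions.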
... The revolution of using the numbers to play the game more effectively started in baseball through the famous Moneyball book (Lewis, 2004) and movie, and this approach has been adapted by almost all sports. However, it has only recently been utilized to facilitate the operation of sports franchises (Assunção & Pelechrinis, 2019). One of the main reasons for this is the technological advancements that have allowed us to collect more fine-grained data and more sport informatics and analytics resources to be publicly available. ...
Sports big data has been an emerging research area in recent years. The purpose of this study was to ascertain the most frequent research topics, application areas, data sources, and data usage characteristics in the existing literature, in order to understand the development of data-driven baseball research and the multidisciplinary participation in the big data era. A scoping review was conducted, focusing on the diversity of uses of publicly available Major League Baseball data. Next, bibliometric co-occurrence analysis was used to present a knowledge map of the reviewed literature. Finally, we propose a comprehensive baseball data research domain framework to visualize the ecosystem of publicly available sports data applications mapped to the four application domains in the big data maturity model. After a search and screening process across the Web of Science, Science Direct, and SPORTDiscus databases, 48 relevant papers with clearly indicated data sources and data fields were selected and fully reviewed for advanced analysis. The most relevant research hotspots for sports data are, in order, economics and finance, sports injury, and sports performance evaluation. Subjects studied included pitchers, position players, catchers, umpires, batters, free agents, and attendees. The most popular data sources are PITCHf/x, the Lahman Baseball Database, and baseball-reference.com. This review can serve as a valuable starting point for researchers to plan research strategies, to discover opportunities for cross-disciplinary research innovations, and to categorize their work in the context of the state of research.
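As a rough illustration of the co-occurrence analysis mentioned in the abstract above, the sketch below counts how often pairs of keywords appear together in the same paper. The function name and the sample keyword lists are hypothetical, and the review itself may have used dedicated bibliometric software rather than anything like this.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count keyword pairs that appear together in the same paper.

    `papers` is a list of keyword lists. Pairs are stored as sorted tuples
    so that (a, b) and (b, a) are counted as the same pair.
    """
    counts = Counter()
    for keywords in papers:
        unique_sorted = sorted(set(keywords))
        for pair in combinations(unique_sorted, 2):
            counts[pair] += 1
    return counts


# Made-up keyword lists, for illustration only.
sample_papers = [
    ["pitching", "injury"],
    ["injury", "economics", "pitching"],
    ["economics", "attendance"],
]
pair_counts = cooccurrence_counts(sample_papers)
```

The resulting pair counts can serve as edge weights in a keyword network, which is one common way to draw a knowledge map.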
... The key factor affecting the development of involved analytics in esports is the lack of easily accessible and clean esports data. Improved data capture, along with publicly accessible data, has traditionally enabled analytics to grow in a sport [2]. Since esports are played entirely virtually, one would expect data capture to be straightforward. ...
Esports, despite its expanding interest, lacks fundamental sports analytics resources such as accessible data or proven and reproducible analytical frameworks. Even Counter-Strike: Global Offensive (CSGO), the second most popular esport, suffers from these problems. Thus, quantitative evaluation of CSGO players, a task important to teams, media, bettors and fans, is difficult. To address this, we introduce (1) a data model for CSGO with an open-source implementation; (2) a graph distance measure for defining distances in CSGO; and (3) a context-aware framework to value players' actions based on changes in their team's chances of winning. Using over 70 million in-game CSGO events, we demonstrate our framework's consistency and independence compared to existing valuation frameworks. We also provide use cases demonstrating high-impact play identification and uncertainty estimation.
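The idea of valuing actions by changes in a team's chance of winning, as described in the abstract above, can be sketched as follows. This is a simplified illustration under stated assumptions: the win probabilities are taken as given inputs (in the paper they would come from a fitted model), and the field names `player`, `win_prob_before`, and `win_prob_after` are hypothetical.

```python
from collections import defaultdict


def value_actions(actions):
    """Credit each player with the summed change in win probability.

    Each action is a dict with hypothetical keys 'player',
    'win_prob_before', and 'win_prob_after', where the probabilities
    refer to the acting player's own team.
    """
    totals = defaultdict(float)
    for action in actions:
        delta = action["win_prob_after"] - action["win_prob_before"]
        totals[action["player"]] += delta
    return dict(totals)


# Made-up sequence of actions within one round.
round_actions = [
    {"player": "A", "win_prob_before": 0.50, "win_prob_after": 0.62},
    {"player": "B", "win_prob_before": 0.62, "win_prob_after": 0.55},
    {"player": "A", "win_prob_before": 0.55, "win_prob_after": 0.70},
]
player_values = value_actions(round_actions)
```

Here player A accumulates a positive value and player B a negative one, reflecting whether their actions raised or lowered the team's estimated chance of winning.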
... Big data can be used to quantify the factors that affect athletes' performance in training by using audio-visual analysis equipment. Simulation technology and big data are used to simulate the movement methods of elite athletes, quantify each strength, angle and power, analyze the physiological indicators and physical characteristics of elite athletes, and provide a scientific basis for the selection of athletes and training diagnosis [5]. Managers, coaches, athletes and parents can learn about the training effect and communicate with each other on the platform. ...
This research takes the training, competition, management and talent database of college sports teams as its entry point and draws on the intersection of computer science, sports and management theory, aiming to build an intelligent service management platform for college sports teams. To achieve this goal, the study combines survey results with big data analysis to produce a detailed program design for the training, competition and management of college sports teams. Then, using data collected by wearable devices, an intelligent service management platform for college sports teams is designed and built. The platform helps sports teams carry out competition, training and management in a more convenient, efficient and scientific way, provides data support for scientific decision-making, and provides integrated information resources, an interactive information network and personalized information services for the scientific training, competition and management of college sports teams. It replaces the current cumbersome offline processes and working modes of college sports teams, collects all data online, improves overall control and data accumulation, improves the quality of the teams' reserves, and consolidates the foundation of reserve sports talent.
... [2,5,24] In addition to privacy, safety and voluntary consent, which are commonly addressed in academic data collection ethics, there are a number of issues transferrable to the contexts of big data and healthcare. Examples include [25][26][27]: legal aspects of data ownership, access, sensitive information, potential exploitation and data misuse, data collection by employers, insurance companies, gambling parties, and other stakeholders combined with wearable technology in sport analytics [28], preventative medical benefits, optimisation of human performance while reducing the risk of injury. Thus, if legislation is lagging behind technological advancements and current trends towards private-public sector partnerships [2, 3], it is important to consider possible opportunities for exchange of sensitive information. ...
Big data in healthcare has made a positive difference in advancing analytical capabilities and lowering the costs of medical care. In addition to providing analytical capabilities on platforms supporting current and near-future AI with machine-learning and data-mining algorithms, there is also a need for ethical considerations mandating new ways to preserve privacy, all of which are preconditioned by the growing body of regulations and expectations. The purpose of this study is to improve existing clinical care by implementing a big data platform for the Czech Republic National Health Service. Based on the achieved performance and its compliance with mandatory guidelines, the reported big-data platform was selected as the winning solution from the Czech Republic national tender (Tender Id. VZ0036628, No. Z2017-035520). The platform, based on analytical Vertica NoSQL database for massive data processing, complies with the TPC-H1 for decision support benchmark, the European Union (EU) and the Czech Republic requirements, well-exceeding defined system performance thresholds. The reported artefacts and concepts are transferrable to healthcare systems in other countries and are intended to provide personalised autonomous assessment from big data in a cost-effective, scalable and high-performance manner. The implemented platform allows: (1) scalability; (2) further implementations of newly-developed machine learning algorithms for classification and predictive analytics; (3) security improvements related to Electronic Health Records (EHR) by using automated functions for data encryption and decryption; and (4) the use of big data to allow strategic planning in healthcare.
Vexing political questions of power, inequality and coloniality permeate the tech sector and its growing use of global ‘virtual’ assembly lines that see them penetrate even refugee camps in efforts to extract value. As a response, tech companies have been expanding non-commercial activities within a presumed framework of humanitarianism, in part, trying to outweigh the negative implications of unjust business practices often characterised by third-party avoidance of responsibility. This commentary focuses on tech companies’ engagement with people in the Global South – not as recipients of tech beneficence – but as labourers who make tech possible. First, we document why companies are brought into humanitarian crises, and then we briefly chart examples of the practices of tech companies in the Global South. Then, we argue that ‘tech for good’, often presumed as altruistic, instead reproduces an expansive history of questionable corporate social responsibility efforts that sustain inequalities more than assuaging them. We conclude by reflecting on the impact of commodifying compassion for humanitarian helping and argue that tech companies should stop trying to ‘help’ through self-perceived altruistic activities. Instead, corporations should focus on remaking their core business practices in an image of justice, protection, and equal value creation, particularly in contexts characterised by vulnerability.
The collection, processing, storage and circulation of data are fundamental elements of contemporary societies. While the positivistic literature on ‘data revolution’ finds it essential for improving development delivery, critical data studies stress the threats of datafication. In this article, we demonstrate that datafication has been happening continuously through history, driven by political and economic pressures. We use historical examples to show how resource and personal data were extracted, accumulated and commodified by colonial empires, national governments and trade organizations, and argue that similar extractive processes are a present-day threat in the Global South. We argue that the decoupling of earlier and current datafication processes obscures the underlying, complex power dynamics of datafication. Our historical perspective shows how, once aggregated, data may become imperishable and can be appropriated for problematic purposes in the long run by both public and private entities. Using historical case studies, we challenge the current regulatory approaches that view data as a commodity and frame it instead as a mobile, non-perishable, yet ideally inalienable right of people.
Developed countries develop their production sites within the scope of Industry 4.0 technology components and undergo constant change and transformation to establish economic superiority. This allows them to produce more in various fields and thus to rise to a more economically advantageous position. Industry 4.0 technology affects areas within the sports industry such as sports tourism, athlete performance, athlete health, sports publishing, sports textile products, sports education and training, and sports management and human resources, and creates an environment of international competition in terms of production and performance. This study aims to examine research on the areas of use of Industry 4.0 in sports. To this end, research on the subject has been presented using the bibliographic method. In the conclusion section, the weaknesses and possibilities of youth sociology were discussed, and efforts were made to present a projection on what to do about the field. In this respect, an evaluation of youth sociology was attempted on the prominent topics, forgotten aspects and themes left incomplete in youth sociology studies. An extended English summary is at the end of the full-text PDF (Turkish) file.
Abstract (translated from the Turkish Özet): Developed countries are developing their production sites within the scope of Industry 4.0 technology components and undergo constant change and transformation in order to establish economic superiority. This enables them to produce more in various fields and thus to rise to a more economically advantageous position. Industry 4.0 technology affects areas within the sports industry such as sports tourism, athlete performance, athlete health, sports broadcasting, sports textile products, sports education and training, and sports management and human resources, creating an environment of competition between countries in terms of production and performance. This study aims to examine research on the areas of use of Industry 4.0 in sports. To this end, research on the subject has been presented using the bibliographic method. In the conclusion section, the areas of use of Industry 4.0 in sports are discussed, and its contributions to the field and its negative effects are evaluated.
With the widespread use of Internet big data, all walks of life are developing rapidly, and the sports industry has also encountered new development opportunities and challenges. Set against the background of the big data era, this paper studies financial risk control in the sports industry. The paper first introduces what big data is and describes the main characteristics of the big data era; it then analyzes and discusses the potential financial risks in the sports industry; finally, it studies the development of the sports industry and how to control financial risks in the era of big data. Through an investigation of the development trend of the fitness industry, the analysis shows that the fitness industry faces risks, and that these risks can be fatal to it. Then, based on the background of big data, the paper puts forward corresponding suggestions on risk control in the fitness industry, in the hope that this study can provide a reference for the healthy development of the fitness industry.
The traditional sports industry structure has undergone radical changes against the background of big data. Big data, backed by leading technology, provides sports consumers with more personalized, customized and precise services than the traditional sports industry could imagine. Against this background, the development of the sports industry should accelerate the industrialization of sports practice; focus on sports market positioning, updated operating concepts and a solid talent pool; take the initiative to improve platforms for customizing personalized sports content; build the sports industry chain with software and information services at its core; concentrate on core elements to optimize the development environment of the sports industry; and use the value of big data to seize development opportunities for the sports industry.