Adria Recasens’s research while affiliated with Google Inc. and other places

Publications (38)


A bird’s eye overview of TacticAI
A How corner kick situations are converted to a graph representation. Each player is treated as a node in a graph, with node, edge and graph features extracted as detailed in the main text. Then, a graph neural network operates over this graph by performing message passing; each node’s representation is updated using the messages sent to it from its neighbouring nodes. B How TacticAI processes a given corner kick. To ensure that TacticAI’s answers are robust in the face of horizontal or vertical reflections, all possible combinations of reflections are applied to the input corner, and these four views are then fed to the core TacticAI model, where they are able to interact with each other to compute the final player representations—each internal blue arrow corresponds to a single message passing layer from (A). Once player representations are computed, they can be used to predict the corner’s receiver, whether a shot has been taken, as well as assistive adjustments to player positions and velocities, which increase or decrease the probability of a shot being taken.
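To make the reflection scheme concrete, here is a minimal sketch of D2 view averaging in PyTorch. It is an illustration only, not the authors' code: `model` is a hypothetical per-player predictor, and whereas TacticAI lets the four views interact inside its message-passing layers, this sketch simply averages four independent forward passes.

```python
import torch

def d2_views(xy: torch.Tensor) -> list:
    """Return the four reflections of pitch coordinates xy: (players, 2)."""
    h = xy * torch.tensor([-1.0, 1.0])   # horizontal reflection
    v = xy * torch.tensor([1.0, -1.0])   # vertical reflection
    return [xy, h, v, -xy]               # identity, h-flip, v-flip, both

def equivariant_predict(model, xy: torch.Tensor) -> torch.Tensor:
    """Average per-player outputs over all four views, making the prediction
    invariant to horizontal/vertical reflections of the corner kick."""
    return torch.stack([model(view) for view in d2_views(xy)]).mean(dim=0)
```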
Corner kicks represented in the latent space shaped by TacticAI
We visualise the latent representations of attacking and defending teams in 1024 corner kicks using t-SNE. A latent team embedding in one corner kick sample is the mean of the latent player representations on the same attacking (A–C) or defending (D) team. Given the reference corner kick sample (A), we retrieve another corner kick sample (B) with respect to the closest distance of their representations in the latent space. We observe that (A) and (B) are both out-swing corner kicks and share similar patterns of their attacking tactics, which are highlighted with rectangles having the same colours, although they bear differences with respect to the absolute positions and velocities of the players. All the while, the latent representation of an in-swing attack (C) is distant from both (A) and (B) in the latent space. The red arrows are only used to demonstrate the difference between in- and out-swing corner kicks, not the actual ball trajectories.
Example of refining a corner kick tactic with TacticAI
TacticAI makes it possible for human coaches to redesign corner kick tactics in ways that help maximise the probability of a positive outcome for either the attacking or the defending team by identifying key players, as well as by providing temporally coordinated tactic recommendations that take all players into consideration. As demonstrated in the present example (A), for a corner kick in which there was a shot attempt in reality (B), TacticAI can generate a tactically-adjusted setting in which the shot probability has been reduced, by adjusting the positioning of the defenders (D). The suggested defender positions result in reduced receiver probability for attacking players 2–5 (see bottom row), while the receiver probability of Attacker 1, who is distant from the goalpost, has been increased (C). The model is capable of generating multiple such scenarios. Coaches can inspect the different options visually and additionally consult TacticAI’s quantitative analysis of the presented tactics.
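The refinement workflow described above is essentially generate-then-rank. A hedged sketch follows, with `generator` and `shot_model` as hypothetical stand-ins for TacticAI's generative and predictive components:

```python
import torch

def refine_corner(generator, shot_model, corner, n_samples=32, k=4):
    """Sample candidate player adjustments and keep the k candidates with
    the lowest predicted shot probability (the defending team's objective)."""
    candidates = [generator.sample(corner) for _ in range(n_samples)]
    shot_probs = torch.tensor([shot_model(c) for c in candidates])
    best = torch.argsort(shot_probs)[:k]
    return [(candidates[int(i)], float(shot_probs[i])) for i in best]
```

A coach-facing tool would present these k scenarios alongside the per-player receiver probabilities, as in panels (C) and (D).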
Statistical analysis for the case study tasks
In task 1, we tested the statistical difference between the real corner kick samples and the synthetic ones generated by TacticAI in two respects: (A.1) the distributions of their assigned ratings, and (A.2) the corresponding histograms of the rating values. Analogously, in task 2 (receiver prediction), (B.1) we track the distributions of the top-3 accuracy of receiver prediction using those samples, and (B.2) the corresponding histogram of the mean rating per sample. No statistical difference in the mean was observed in either case ((A.1): z = −0.34, p > 0.05; (B.1): z = 0.97, p > 0.05). Additionally, we observed a statistically significant difference between the ratings of different raters on receiver prediction, with three clear clusters emerging (C). Specifically, Raters A and E had similar ratings (z = 0.66, p > 0.05), and Raters B and D also rated in similar ways (z = −1.84, p > 0.05), while Rater C responded differently from all other raters. This suggests a good level of variety among the human raters with respect to their perceptions of corner kicks. In task 3—identifying similar corners retrieved in terms of salient strategic setups—there were no significant differences among the distributions of the ratings by different raters (D), suggesting a high level of agreement on the usefulness of TacticAI’s capability of retrieving similar corners (F(1,4) = 1.01, p > 0.1). Finally, in task 4, we compared the ratings of TacticAI’s strategic refinements across the human raters (E) and found that the raters also agreed on the general effectiveness of the refinements recommended by TacticAI (F(1,4) = 0.45, p > 0.05). Note that the violin plots used in B.1 and C–E model a continuous probability distribution and hence assign nonzero probabilities to values outside of the allowed ranges. We only label y-axis ticks for the possible set of ratings.
Examples of the tactic refinements recommended by TacticAI
These examples are selected from our case study with human experts, to illustrate the breadth of tactical adjustments that TacticAI suggests to teams defending a corner. The density of the yellow circles coincides with the number of times that the corresponding change is recognised as constructive by human experts. Instead of optimising the movement of one specific player, TacticAI can recommend improvements for multiple players in one generation step through suggesting better positions to block the opposing players, or better orientations to track them more efficiently. Some specific comments from expert raters follow. In A, according to raters, TacticAI suggests more favourable positions for several defenders, and improved tracking runs for several others—further, the goalkeeper is positioned more deeply, which is also beneficial. In B, TacticAI suggests that the defenders furthest away from the corner make improved covering runs, which was unanimously deemed useful, with several other defenders also positioned more favourably. In C, TacticAI recommends improved covering runs for a central group of defenders in the penalty box, which was unanimously considered salient by our raters. And in D, TacticAI suggests substantially better tracking runs for two central defenders, along with a better positioning for two other defenders in the goal area.
TacticAI: an AI assistant for football tactics
  • Article
  • Full-text available

March 2024 · 2,106 Reads · 48 Citations

Zhe Wang · Petar Veličković · [...]
Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI’s model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.


Zorro: the masked multimodal transformer

January 2023 · 32 Reads · 1 Citation

Attention-based models are appealing for multimodal processing because inputs from multiple modalities can be concatenated and fed to a single backbone network, thus requiring very little fusion engineering. The resulting representations are however fully entangled throughout the network, which may not always be desirable: in learning, contrastive audio-visual self-supervised learning requires independent audio and visual features to operate, otherwise learning collapses; in inference, evaluation of audio-visual models should be possible on benchmarks having just audio or just video. In this paper, we introduce Zorro, a technique that uses masks to control how inputs from each modality are routed inside Transformers, keeping some parts of the representation modality-pure. We apply this technique to three popular transformer-based architectures (ViT, Swin and HiP) and show that with contrastive pre-training Zorro achieves state-of-the-art results on most relevant benchmarks for multimodal tasks (AudioSet and VGGSound). Furthermore, the resulting models are able to perform unimodal inference on both video and audio benchmarks such as Kinetics-400 or ESC-50.
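The core mechanism is easy to state in code. Below is a minimal sketch of a Zorro-style attention mask (token counts and layout are illustrative assumptions): audio and video tokens may only attend within their own modality, while fusion tokens attend to everything, which keeps the unimodal streams modality-pure.

```python
import torch

def zorro_mask(n_audio: int, n_video: int, n_fusion: int) -> torch.Tensor:
    """Boolean attention mask; True means attention is allowed."""
    n = n_audio + n_video + n_fusion
    mask = torch.zeros(n, n, dtype=torch.bool)
    a = slice(0, n_audio)
    v = slice(n_audio, n_audio + n_video)
    f = slice(n_audio + n_video, n)
    mask[a, a] = True   # audio tokens see only audio
    mask[v, v] = True   # video tokens see only video
    mask[f, :] = True   # fusion tokens see all tokens
    return mask
```

Such a mask can be passed, for example, as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions allowed to take part in attention.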


TAP-Vid: A Benchmark for Tracking Any Point in a Video

November 2022 · 14 Reads

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
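As an illustration of how such a benchmark might be scored, here is a hedged sketch of a position-accuracy metric in the spirit of TAP-Vid: the fraction of visible points predicted within a pixel threshold of the ground truth, averaged over several thresholds. The threshold values and array layout are assumptions, not the benchmark's exact protocol.

```python
import numpy as np

def position_accuracy(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """pred, gt: (num_points, num_frames, 2) pixel coordinates;
    visible: boolean (num_points, num_frames) ground-truth visibility."""
    dist = np.linalg.norm(pred - gt, axis=-1)   # per-point, per-frame error
    accs = [(dist[visible] <= t).mean() for t in thresholds]
    return float(np.mean(accs))
```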



Stylized visualization of the multiagent time-series imputation setting. (a) Agent trajectories up to and including time t. Dark blue indicates trajectory portions that are observed (with light indicating otherwise); the camera field of view at the current time t is indicated in grey. (b) Visualization of masks $\boldsymbol{m}$ for all timesteps, where $m^i_t = 1$ where dark, and $m^i_t = 0$ where light. The mask at time t, which corresponds to the frame shown in (a), is highlighted in grey.
Graph Imputer model. Our model imputes missing information at each timestep using a combination of bidirectional LSTMs and graph networks. An exposition of a forward-direction update (corresponding to the directional update in Algorithm 1 in the “Methods” section) is provided in the left portion of the figure. Dark blue boxes indicate trajectory segments that are observed for each agent (with light blue indicating otherwise). In each direction, agent-specific temporal context is updated via LSTMs with shared parameters. All agents’ LSTM hidden states, $\vec{\boldsymbol{h}}_{t-1}$, are subsequently used as node features in variational graph networks to ensure information-sharing across agents. This enables learning of a distribution over agent state deviations, $\Delta\vec{\boldsymbol{x}}_t$. The process is likewise repeated in the backward direction (right portion of the figure), with the directional updates fused to produce an imputed estimate $\hat{\boldsymbol{x}}_t$ at each time t. The dotted line indicates that the GraphNet encoder is used only at training time, with the GraphNet prior being used for the final evaluations conducted at test time.
Trajectory visualizations (best viewed when zoomed in). Each column provides an example trajectory sequence, with the first row illustrating the ground truth, and subsequent rows showing results from various models, including the Graph Imputer (ours). For all examples, the Graph Imputer trajectories seamlessly adhere to the boundary value constraints imposed at the moments of disappearance and reappearance of players.
Pitch control error visualizations. The first column shows the ground truth pitch control field, player positions, and the camera field of view. Each remaining column provides a visualization of the absolute error between pitch control fields based on predicted model outputs and ground truth.
Predicted vs. ground truth pitch control [14] Mean Absolute Error (MAE) across models, under partially observable settings. Means and standard deviations are reported over all trajectories in our validation dataset. The Graph Imputer model yields the lowest pitch control error across all baselines. Note that the Bidir. Role-invariant VRNN model, which comes closest to our Graph Imputer model in terms of performance, was handcrafted by us specifically for the football domain.
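For readers unfamiliar with pitch control, a deliberately simplified sketch follows, assuming the classic time-to-arrival formulation: each grid cell is scored by which team can reach it first, smoothed by a sigmoid. The full model cited in the caption has additional terms (velocities, ball dynamics); this version only illustrates why imputed off-screen positions feed directly into the metric.

```python
import numpy as np

def pitch_control(attack_xy, defend_xy, speed=5.0, beta=1.0, grid=(105, 68)):
    """attack_xy, defend_xy: (players, 2) positions in metres.
    Returns a (68, 105) map of attacking-team control probabilities."""
    xs, ys = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]))
    cells = np.stack([xs, ys], axis=-1).astype(float)            # (H, W, 2)

    def min_arrival(players):
        d = np.linalg.norm(cells[None] - players[:, None, None], axis=-1)
        return (d / speed).min(axis=0)                           # fastest agent

    dt = min_arrival(defend_xy) - min_arrival(attack_xy)
    return 1.0 / (1.0 + np.exp(-beta * dt))   # ~1 where attackers arrive first
```

The MAE reported above can then be computed as `np.abs(pc_imputed - pc_true).mean()` between control maps built from imputed and ground-truth trajectories.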
Multiagent off-screen behavior prediction in football

May 2022 · 741 Reads · 24 Citations

In multiagent worlds, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents’ dynamic behaviors, make such systems complex and interesting to study from a decision-making perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. In many settings, only sporadic observations of agents may be available in a given trajectory sequence. In football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation in the context of human football play, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses past and future information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We demonstrate our approach on multiagent settings involving players that are partially observable, using the Graph Imputer to predict the behaviors of off-screen players. To quantitatively evaluate the approach, we conduct experiments on football matches with ground truth trajectory data, using a camera module to simulate the off-screen player state estimation setting. We subsequently use our approach for downstream football analytics under partial observability using the well-established framework of pitch control, which traditionally relies on fully observed data. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football, across all considered metrics.
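A pseudocode-level sketch of one forward-direction update, under stated assumptions: mean-pooling stands in for the paper's graph network, the variational encoder/prior is omitted, and all module names are illustrative rather than the authors' code. The backward pass mirrors this step in reverse, with the two directional estimates fused.

```python
import torch
import torch.nn as nn

class DirectionalUpdate(nn.Module):
    """One forward-direction imputation step for all agents at once."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTMCell(2, dim)                  # shared across agents
        self.graph = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, 2)                    # per-agent deviation

    def step(self, x_t, x_obs_next, mask_next, state):
        h, c = self.lstm(x_t, state)                     # temporal context
        pooled = h.mean(0, keepdim=True).expand_as(h)    # cross-agent sharing
        x_hat = x_t + self.head(self.graph(torch.cat([h, pooled], dim=-1)))
        keep = mask_next.bool().unsqueeze(-1)            # next frame observed?
        return torch.where(keep, x_obs_next, x_hat), (h, c)
```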


Figure 3. Example masked auto-encoding results on one ImageNet image using an 85% masking rate for groupwise masking. On the left we show the original image, in the middle the corresponding masked image, and on the right the outputs of the 16-group model. Note that the masks were shared across groups (groups in HiP-16 for 224x224 images are sequences of 14 consecutive pixel rows), and this is visible as a vertically recurring pattern.
Hierarchical Perceiver

February 2022 · 108 Reads

General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by exclusively using global attention operations. This however hinders them from scaling up to the input sizes required to process raw high-resolution images or video. In this paper, we show that some degree of locality can be introduced back into these models, greatly improving their efficiency while preserving their generality. To scale them further, we introduce a self-supervised approach that enables learning dense low-dimensional positional embeddings for very large signals. We call the resulting model a Hierarchical Perceiver (HiP). HiP retains the ability to process arbitrary modalities, but now at higher resolution and without any specialized preprocessing, improving over flat Perceivers in both efficiency and accuracy on the ImageNet, AudioSet and PASCAL VOC datasets.
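A minimal sketch of the locality idea, under assumptions: tokens are split into groups that attend only internally, and the groups are merged again for the next level of the hierarchy. Real HiP blocks cross-attend onto a smaller set of learned latents per group; plain self-attention is used here for brevity, so treat it as illustrative only.

```python
import torch
import torch.nn as nn

class GroupedAttention(nn.Module):
    """Attention restricted to local groups of tokens (the token count must
    be divisible by the number of groups)."""

    def __init__(self, dim: int = 64, heads: int = 4, groups: int = 16):
        super().__init__()
        self.groups = groups
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (batch, tokens, dim)
        b, t, d = x.shape
        x = x.reshape(b * self.groups, t // self.groups, d)
        x, _ = self.attn(x, x, x)                # attend within each group
        return x.reshape(b, t, d)                # merge for the next level
```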


Towards Learning Universal Audio Representations

November 2021 · 81 Reads

The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learning systems on that benchmark. We discover that previous sound event classification or speech models do not generalize outside of their domains. We observe that more robust audio representations can be learned with the SimCLR objective; however, the model's transferability depends heavily on the model architecture. We find the Slowfast architecture is good at learning rich representations required by different domains, but its performance is affected by the normalization scheme. Based on these findings, we propose a novel normalizer-free Slowfast NFNet and achieve state-of-the-art performance across all domains.
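The evaluation recipe implied by such a suite can be sketched as a frozen-encoder linear probe, looped over tasks. Everything below is a hypothetical harness, not the paper's code: `embed_fn` is an assumed frozen audio encoder, and the task dictionary is a placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(embed_fn, tasks):
    """tasks: name -> (train_audio, train_labels, test_audio, test_labels)."""
    scores = {}
    for name, (xa, ya, xb, yb) in tasks.items():
        clf = LogisticRegression(max_iter=1000)
        clf.fit(np.stack([embed_fn(a) for a in xa]), ya)
        scores[name] = clf.score(np.stack([embed_fn(b) for b in xb]), yb)
    return scores  # accuracy per downstream task, compared across domains
```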



Game Plan: What AI can do for Football, and What Football can do for AI

May 2021 · 586 Reads · 100 Citations

Journal of Artificial Intelligence Research

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players’ and coordinated teams’ behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
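To give one concrete flavour of the game-theoretic analysis mentioned above, here is a sketch of a 2x2 zero-sum penalty-kick game between kicker and goalkeeper, solved for the kicker's equalising mixed strategy. The scoring probabilities are made-up placeholders, not estimates from the paper.

```python
import numpy as np

def kicker_mixed_strategy(P: np.ndarray) -> np.ndarray:
    """P[i, j] = scoring probability when the kicker plays side i and the
    keeper dives to side j. Closed-form equaliser for a 2x2 zero-sum game;
    assumes an interior (fully mixed) equilibrium exists."""
    (a, b), (c, d) = P
    p = (d - c) / (a - b - c + d)        # probability of kicking to side 0
    return np.array([p, 1.0 - p])

P = np.array([[0.58, 0.93],              # placeholder values only
              [0.95, 0.70]])
print(kicker_mixed_strategy(P))          # kicker's equilibrium mixture
```

Coupling such payoff matrices with player-specific scoring probabilities learned from data is precisely the kind of statistical-learning/game-theory combination the paper advocates.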


Figure 2: Density functions of the β distributions we consider for the mixing ratio α.
Multimodal Self-Supervised Learning of General Audio Representations

April 2021 · 88 Reads

We present a multimodal framework to learn general audio representations from videos. Existing contrastive audio representation learning methods mainly focus on using the audio modality alone during training. In this work, we show that additional information contained in video can be utilized to greatly improve the learned features. First, we demonstrate that our contrastive framework does not require high resolution images to learn good audio features. This allows us to scale up the training batch size, while keeping the computational load incurred by the additional video modality to a reasonable level. Second, we use augmentations that mix together different samples. We show that this is effective to make the proxy task harder, which leads to substantial performance improvements when increasing the batch size. As a result, our audio model achieves a state-of-the-art of 42.4 mAP on the AudioSet classification downstream task, closing the gap between supervised and self-supervised methods trained on the same dataset. Moreover, we show that our method is advantageous on a broad range of non-semantic audio tasks, including speaker identification, keyword spotting, language identification, and music instrument classification.
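The mixing augmentation, with the ratio drawn from a Beta distribution as in the Figure 2 caption above, reduces to a few lines. The Beta parameters here are assumptions for illustration, not the paper's exact values.

```python
import numpy as np

def mix_samples(x1, x2, beta_param=0.4, rng=None):
    """Blend two audio examples with a Beta-distributed mixing ratio alpha."""
    rng = rng if rng is not None else np.random.default_rng()
    alpha = rng.beta(beta_param, beta_param)   # mixing ratio in [0, 1]
    return alpha * x1 + (1.0 - alpha) * x2, alpha
```

According to the abstract above, this harder proxy task pays off mainly at large batch sizes.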


Citations (21)


... Among them, soccer, widely celebrated as "the beautiful game", holds a particularly prominent position, engaging billions of fans worldwide through its universal appeal and intricate strategies. Recent advances in artificial intelligence (AI) are transforming soccer understanding and viewing experiences by enabling automated tactical analysis [47,53] and enriching fan engagement through automatic content generation [38,40,43,44]. ...

Reference:

Multi-Agent System for Comprehensive Soccer Understanding
TacticAI: an AI assistant for football tactics

... The behaviours of agents (players and the ball) in soccer form a rich and important testbed for the study of multiagent adversarial systems (Yeh et al. 2019;Tuyls et al. 2021;Omidshafiei et al. 2022;Wang et al. 2024). In this paper, we model the fine-grained spatiotemporal behaviours of agents in professional soccer games. ...

Multiagent off-screen behavior prediction in football

... Evaluating such models requires testing their capacity to accurately encode diverse properties, which can be human-given labels or characteristics collected together with the data. In the case of audio, this was pioneered by NOSS [13], LeBenchmark [14], and SUPERB [15], which focus on speech tasks, and first extended beyond speech by HARES [16]. The HEAR [17] and LAPE [18] benchmarks followed, addressing the comprehensiveness of evaluation tasks and low-resource environments, respectively. ...

Towards Learning Universal Audio Representations
  • Citing Conference Paper
  • May 2022

... Several works have focused on improving contrastive learning through better sampling and data augmentation strategies. This includes active learning approaches for mining hard negatives [26], robust sampling to handle temporal misalignment [28], and multi-view techniques [32,33,39,41] that leverage both global and local temporal context. Recent work has explored making representations more robust by relaxing temporal synchronicity constraints [35] and introducing equivariance [21]. ...

Broaden Your Views for Self-Supervised Video Learning
  • Citing Conference Paper
  • October 2021

... • In computer vision tasks [4][5][6], including image analysis, video processing, and point cloud interpretation, DNNs excel in scenarios with limited computational resources. They achieve higher accuracy levels, showcasing their adaptability to resource-constrained environments [5,[59][60][61][62][63][64][65][66][67][68][69][70][71][72]. • In natural language processing [7][8][9], dynamic models stand out by identifying critical features that convey emotional states, enhancing the creation of emotionally intelligent interfaces. ...

Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks
  • Citing Article
  • January 2018

... Although current models can achieve high accuracy when tailored to specific datasets or individuals [57], they consistently show significant performance degradation in novel environments. This challenge is evidenced by the persistent difficulty in training one single model to achieve high accuracy across various gaze estimation datasets [20,37,43,89,90]. Despite decades of research on data collection and model architecture, the problem of generalization across variations in head pose, identity, and lighting conditions remains largely unsolved. ...

Gaze360: Physically Unconstrained Gaze Estimation in the Wild
  • Citing Article
  • January 2019

... Detailed information also includes links to papers and the corresponding datasets. [A flattened table listing visualization datasets with their reference numbers and sizes (MASSVIS, FigureQA, DVQA, VizNet, ChartQA, and others) is omitted here.] Fig. 2: Relationship of different datasets in "What". The underlying data are objects for visual mapping and rendering to build visualization components. ...

Parsing and Summarizing Infographics with Synthetically Trained Icon Detection
  • Citing Conference Paper
  • April 2021

... Artificial intelligence and reinforcement learning (RL) have proven to excel in complex games, such as Chess [30], Go [29], Starcraft [37], and Minecraft [21]. Apart from board and video games, computer vision models have recently started playing an important role in sports with several applications in generating sports analytics [13] and analyzing game strategies and tactics [36,24]. ...

Game Plan: What AI can do for Football, and What Football can do for AI
  • Citing Article
  • May 2021

Journal of Artificial Intelligence Research

... In addition to data augmentation techniques, some researchers have explored the idea of training multimodal models while removing certain modalities, with the goal of avoiding overfitting on the original dataset. This approach has shown promising results in some cases, but the degree of improvement can vary greatly depending on the characteristics of the dataset [4]. These findings suggest that there may be other factors, beyond the procedural details of the model itself, that influence the performance of SSL algorithms [5]. ...

Self-Supervised MultiModal Versatile Networks
  • Citing Preprint
  • June 2020

... Our architecture addresses this challenge through a hybrid convolutional-transformer design with a dual head-eye cross-attention (DHECA) module that processes eye and head image inputs in static and temporal (dynamic) settings. Furthermore, by integrating image super-resolution (SR) and multiscale processing into the method, DHECA-SuperGaze achieves state-of-the-art (SOTA) performance across the Gaze360 [16] and GFIE [14] datasets. ...

Gaze360: Physically Unconstrained Gaze Estimation in the Wild
  • Citing Conference Paper
  • October 2019