Mattia Soldan

King Abdullah University of Science and Technology | KAUST · Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE)

Master of Science
PhD student at King Abdullah University of Science and Technology (KAUST).


8 Publications
566 Reads
7 Citations


Publications (8)
Technical Report
In Large-Scale Sensor Networks where powered sinks are not present, sensors must not only collect data but also store it until an external entity gathers the sensed information. A smart approach is to consider a network configuration in which K out of the N total nodes are able to sense the environment and collect information packets while the rema...
Preprint
In this paper, we introduce the task of retrieving relevant video moments from a large corpus of untrimmed, unsegmented videos given a natural language query. Our task poses unique challenges as a system must efficiently identify both the relevant videos and localize the relevant moments in the videos. This task is in contrast to prior work that lo...
Preprint
Smartphones and wearable devices are fast growing technologies that, in conjunction with advances in wireless sensor hardware, are enabling ubiquitous sensing applications. Wearables are suitable for indoor and outdoor scenarios, can be placed on many parts of the human body and can integrate a large number of sensors capable of gathering physiolog...
Preprint
Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query. The solution to this challenging task demands the understanding of videos' and queries' semantic content and the fine-grained reasoning about their multi-modal interactions. Our key idea is to recast this challenge into...
Preprint
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitat...
Preprint
In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). In particular, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pi...
Preprint
Video-Language Pretraining (VLP), which aims to learn transferable representations to advance a wide range of video-text downstream tasks, has recently received increasing attention. Dominant works that achieve strong performance rely on large-scale, 3rd-person video-text datasets, such as HowTo100M. In this work, we exploit the recently released Ego4D...
