Vasileios Mezaris's research while affiliated with Information Technologies Institute (ITI) and other places

Publications (229)

Preprint
In this paper, a pure-attention bottom-up approach called ViGAT is proposed, which utilizes an object detector together with a Vision Transformer (ViT) backbone network to derive object and frame features, and a head network that processes these features for the task of event recognition and explanation in video. The ViGAT head consists of graph attent...
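The snippet above describes the pipeline only at a high level. Below is a minimal sketch of a single graph-attention block over per-frame object features; the tensor shapes, layer sizes, and the final mean pooling are illustrative assumptions, not the actual ViGAT head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionBlock(nn.Module):
    """Single-head graph attention over a fully connected graph of object nodes.
    Hypothetical stand-in for the kind of head the abstract describes; all sizes are assumptions."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_objects, in_dim) object features, e.g. from a ViT backbone
        h = self.proj(x)                                   # (B, N, D)
        B, N, D = h.shape
        hi = h.unsqueeze(2).expand(B, N, N, D)             # node i broadcast over j
        hj = h.unsqueeze(1).expand(B, N, N, D)             # node j broadcast over i
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)  # (B, N, N)
        a = torch.softmax(e, dim=-1)                       # attention weights per node
        return a @ h                                       # (B, N, D) aggregated node features

# Example: 9 detected objects per frame, 768-dim features, pooled into one frame vector.
objects = torch.randn(2, 9, 768)
frame_repr = GraphAttentionBlock(768, 512)(objects).mean(dim=1)   # (2, 512)
```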
Article
Full-text available
This survey considers the vision of TV broadcasting where content is personalised and personalisation is data-driven, looks at the AI and data technologies making this possible and surveys the current uptake and usage of those technologies. We examine the current state-of-the-art in standards and best practices for data-driven technologies and iden...
Article
Full-text available
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades, and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on th...
Chapter
Full-text available
Artificial Intelligence brings exciting innovations in all aspects of life and creates new opportunities across industry sectors. At the same time, it raises significant questions in terms of trust, ethics, and accountability. This paper offers an introduction to the AI4Media project, which aims to build on recent advances of AI in order to offer i...
Chapter
Migration, and especially irregular migration, is a critical issue for border agencies and society in general. Migration-related situations and decisions are influenced by various factors, including the perceptions about migration routes and target countries. An improved understanding of such factors can be achieved by systematic automated analyses...
Chapter
Full-text available
In this paper, we present an overview of the MOVING platform, a user-driven approach that enables young researchers, decision makers, and public administrators to use machine learning and data mining tools to search, organize, and manage large-scale information sources on the web such as scientific publications, videos of research talks, and social...
Chapter
This paper presents VERGE, an interactive video search engine that supports efficient browsing and searching into a collection of images or videos. The framework involves a variety of retrieval approaches as well as reranking and fusion capabilities. A Web application enables users to create queries and view the results in a fast and friendly manne...
Preprint
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on the...
Book
The two-volume set LNCS 12572 and 12573 constitutes the thoroughly refereed proceedings of the 27th International Conference on MultiMedia Modeling, MMM 2021, held in Prague, Czech Republic, in June 2021. Of the 211 submitted regular papers, 40 papers were selected for oral presentation and 33 for poster presentation; 16 special session papers were a...
Book
The two-volume set LNCS 12572 and 12573 constitutes the thoroughly refereed proceedings of the 27th International Conference on MultiMedia Modeling, MMM 2021, held in Prague, Czech Republic, in June 2021. Of the 211 submitted regular papers, 40 papers were selected for oral presentation and 33 for poster presentation; 16 special session papers were a...
Article
This paper presents a new method for unsupervised video summarization. The proposed architecture embeds an Actor-Critic model into a Generative Adversarial Network and formulates the selection of important video fragments (that will be used to form the summary) as a sequence generation task. The Actor and the Critic take part in a game that increme...
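As a rough illustration of the "actor selects frames, critic scores the resulting selection" idea (not the paper's architecture), a minimal actor-critic pair over frame features could look like the sketch below; the module sizes, the GRU choice, and the soft selection are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Emits a per-frame selection probability from frame features."""
    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(frames)                           # (B, T, hidden)
        return torch.sigmoid(self.head(out)).squeeze(-1)    # (B, T) selection probabilities

class Critic(nn.Module):
    """Scores how well the (softly) selected fragments represent the full video."""
    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, frames: torch.Tensor, probs: torch.Tensor) -> torch.Tensor:
        weighted = frames * probs.unsqueeze(-1)              # soft selection of frames
        _, h = self.rnn(weighted)
        return self.score(h[-1]).squeeze(-1)                 # (B,) scalar value per video

frames = torch.randn(1, 120, 1024)                           # 120 frames, assumed 1024-dim features
actor, critic = Actor(1024), Critic(1024)
probs = actor(frames)
value = critic(frames, probs)                                 # usable as a learning signal for the actor
```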
Conference Paper
People with intellectual disabilities (ID) encounter several problems regarding the interaction with their environment in terms of their daily needs, activities and communication. In this context, an interactive support system with multiple functionalities is proposed, aiming at optimizing the opportunities provided to people with ID in order to co...
Chapter
This paper presents a new video summarization approach that integrates an attention mechanism to identify the significant parts of the video, and is trained in an unsupervised manner via generative adversarial learning. Starting from the SUM-GAN model, we first develop an improved version of it (called SUM-GAN-sl) that has a significantly reduced number of l...
Chapter
During minibatch gradient-based optimization, the contribution of observations to the updating of the deep neural network’s (DNN’s) weights for enhancing the discrimination of certain classes can be small, despite the fact that these classes may still have a large generalization error. This happens, for instance, due to overfitting, i.e. to classes...
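One straightforward way to boost the contribution of such classes, which may or may not coincide with the chapter's scheme, is to scale each sample's loss by a running estimate of its class's error; the sketch below assumes per-class error rates measured on a held-out split.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(logits: torch.Tensor,
                    targets: torch.Tensor,
                    class_error: torch.Tensor) -> torch.Tensor:
    """Cross-entropy where each sample is scaled by its class's current error estimate.

    class_error: (num_classes,) running per-class error rates, e.g. from a validation
    split; classes that still generalise poorly get a larger weight. Generic sketch,
    not the specific method of the chapter.
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")   # (B,)
    weights = 1.0 + class_error[targets]                              # emphasise weak classes
    return (weights * per_sample).mean()

logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
class_error = torch.tensor([0.05, 0.40, 0.10, 0.55, 0.02])
reweighted_loss(logits, targets, class_error).backward()
```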
Chapter
This paper demonstrates VERGE, an interactive video retrieval engine for browsing a collection of images or videos and searching for specific content. The engine integrates a multitude of retrieval methodologies that include visual and textual searches and further capabilities such as fusion and reranking. All search options and results appear in a...
Conference Paper
In this paper we present our work on improving the efficiency of adversarial training for unsupervised video summarization. Our starting point is the SUM-GAN model, which creates a representative summary based on the intuition that such a summary should make it possible to reconstruct a video that is indistinguishable from the original one. We buil...
Conference Paper
Technological developments in comprehensive video understanding - detecting and identifying the visual elements of a scene, combined with audio understanding (music, speech), aligned with textual information such as captions, subtitles, etc., and with background knowledge - have been undergoing a significant revolution during recent years. The wor...
Chapter
This chapter is focused on methods and tools for video fragmentation and reverse search on the web. These technologies can assist journalists when they are dealing with fake news—which nowadays is rapidly spread via social media platforms—that relies on the reuse of a previously posted video from a past event with the intention to mislead the...
Chapter
This chapter presents the techniques researched and developed within InVID for the forensic analysis of videos, and the detection and localization of forgeries within User-Generated Videos (UGVs). Following an overview of state-of-the-art video tampering detection techniques, we observed that the bulk of current research is mainly dedicated to fram...
Chapter
Modern newsroom tools offer advanced functionality for automatic and semi-automatic content collection from the web and social media sources to accompany news stories. However, the content collected in this way often tends to be unstructured and may include irrelevant items. An important step in the verification process is to organize this content,...
Chapter
This chapter analyses the literature and presents the research efforts for improving concept-based and event-based video search. It focuses on feature extraction using hand-crafted and deep convolutional neural network (DCNN)-based descriptors, dimensionality reduction using accelerated generalised subclass discriminant analysis (AGSDA), cascades...
Chapter
The MOVING platform enables its users to improve their information literacy by training them to exploit data and text mining methods in their daily research tasks. In this paper, we show how it can support researchers in various tasks, and we introduce its main features, such as text and video retrieval and processing, advanced visualizations, and t...
Book
The two-volume set LNCS 11295 and 11296 constitutes the thoroughly refereed proceedings of the 25th International Conference on MultiMedia Modeling, MMM 2019, held in Thessaloniki, Greece, in January 2019. Of the 172 submitted full papers, 49 were selected for oral presentation and 47 for poster presentation; in addition, 6 demonstration papers, 5...
Chapter
Full-text available
This paper describes the combination of advanced technologies for social-media-based story detection, story-based video retrieval and concept-based video (fragment) labeling under a novel approach for multimodal video annotation. This approach involves textual metadata, structural information and visual concepts - and a multimodal analytics dashboa...
Book
This book presents the latest technological advances and practical tools for discovering, verifying and visualizing social media video content, and managing related rights. The digital media revolution is bringing breaking news to online video platforms, and news organizations often rely on user-generated recordings of new and developing events sha...
Article
Full-text available
Automatic understanding and analysis of groups has attracted increasing attention in the vision and multimedia communities in recent years. However, little attention has been paid to the automatic analysis of non-verbal behaviors and how this can be utilized for the analysis of group membership, i.e., recognizing which group each individual is part...
Article
In this work we propose a DCNN (Deep Convolutional Neural Network) architecture that addresses the problem of video/image concept annotation by exploiting concept relations at two different levels. At the first level, we build on ideas from multi-task learning, and propose an approach to learn concept-specific representations that are sparse, linear...
Article
In the Big Data era, people can access vast amounts of information, but often lack the time, strategies and tools to efficiently extract the necessary knowledge from it. Research and innovation staff needs to effectively obtain an overview of publications, patents, products, funding opportunities, etc., to derive an innovation strategy. The MOVING...
Chapter
Multimedia content and especially personal multimedia content is created in abundance today. Short- to mid-term storage of this content is typically no problem due to decreased storage prices and the availability of storage services. However, for the long-term perspective, i.e., preservation, adequate technologies and best practices for keeping the...
Chapter
As multimedia applications have become part of our life, preservation and long-term access to the multimedia elements that are continuously produced is a major consideration, both for many organizations that generate or collect and need to maintain digital content, and for individuals. In this chapter, we focus primarily on the following multimedia...
Chapter
Without context, words have no meaning, and the same is true for documents, in that often a wider context is required to fully interpret the information they contain. For example, a family photo is practically useless if you do not know who the people portrayed in it are, and likewise, a document that refers to the president of the US is of little...
Chapter
Full-text available
This paper presents an algorithm for the temporal segmentation of user-generated videos into visually coherent parts that correspond to individual video capturing activities. The latter include camera pan and tilt, change in focal length and camera displacement. The proposed approach identifies the aforementioned activities by extracting and evalua...
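To make the idea of detecting capturing activities more concrete, here is a sketch of flagging one such activity (a horizontal camera pan) from dense optical flow with OpenCV; the threshold and the mean-flow statistic are illustrative assumptions, not the paper's actual feature extraction.

```python
import cv2
import numpy as np

def pan_activity(video_path: str, flow_threshold: float = 2.0) -> list[bool]:
    """Flag frames whose dominant horizontal motion suggests a camera pan.

    Illustrative only: threshold and statistic are assumptions, not the paper's criteria.
    """
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flags = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mean_dx = float(np.mean(flow[..., 0]))        # average horizontal displacement
        flags.append(abs(mean_dx) > flow_threshold)   # True while the camera pans
        prev_gray = gray
    cap.release()
    return flags
```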
Article
Full-text available
In this paper, we propose a maximum margin classifier that deals with uncertainty in data input. More specifically, we reformulate the SVM framework such that each training example can be modeled by a multi-dimensional Gaussian distribution described by its mean vector and its covariance matrix -- the latter modeling the uncertainty. We address the...
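For a Gaussian input x ~ N(mu, Sigma), the decision value w·x + b has mean w·mu + b and variance wᵀΣw, so the covariance can be folded into the margin. The sketch below is a generic robust-margin construction built on that observation, not necessarily the paper's exact formulation.

```python
import numpy as np

def robust_hinge_loss(w: np.ndarray, b: float,
                      means: np.ndarray, covs: np.ndarray,
                      labels: np.ndarray, kappa: float = 1.0) -> float:
    """Hinge loss for inputs modelled as Gaussians N(mean_i, cov_i).

    The margin of each example is penalised by the standard deviation of the
    decision value along w; kappa controls how conservative this is. Generic
    sketch, not the paper's exact SVM-with-uncertainty objective.
    """
    total = 0.0
    for mu, sigma, y in zip(means, covs, labels):
        margin_mean = y * (w @ mu + b)
        spread = np.sqrt(w @ sigma @ w)
        total += max(0.0, 1.0 - margin_mean + kappa * spread)
    return total / len(labels) + 0.5 * float(w @ w)   # plus the usual regulariser

means = np.array([[1.0, 2.0], [-1.5, -0.5]])
covs = np.array([np.eye(2) * 0.1, np.eye(2) * 0.4])
labels = np.array([1, -1])
print(robust_hinge_loss(np.array([0.3, -0.2]), 0.0, means, covs, labels))
```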
Conference Paper
Full-text available
This paper presents a novel open-source browser plug-in that aims at supporting journalists and news professionals in their efforts to verify user-generated video. The plug-in, which is the result of an iterative design thinking methodology, brings together a number of sophisticated multimedia analysis components and third party services, with the...
Conference Paper
Educational and Knowledge Technologies (EdTech), especially in connection to multimedia content and the vision of mobile and personalized learning, is a hot topic in both academia and the business start-ups ecosystem. The driver and enabler of this is, on the one side, the development and widespread availability of multimedia materials and MOOCs, whi...
Conference Paper
In this paper, a novel incremental dimensionality reduction (DR) technique called incremental accelerated kernel discriminant analysis (IAKDA) is proposed. By combining the eigenvalue decomposition of a relatively small matrix with the recursive block Cholesky factorization of the kernel matrix, a nonlinear DR transformation is efficiently comp...
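The block Cholesky idea mentioned above can be illustrated on its own: when new samples arrive and the kernel matrix grows, the existing factor can be extended instead of recomputed. The sketch below shows this generic update (toy kernel, not the IAKDA implementation).

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def extend_cholesky(L: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Extend a lower-triangular Cholesky factor L of K when the kernel matrix
    grows to [[K, B], [B.T, C]] after new samples arrive.

    Illustrative block Cholesky update, not the IAKDA implementation.
    """
    M = solve_triangular(L, B, lower=True).T          # M = B.T @ inv(L).T
    S = cholesky(C - M @ M.T, lower=True)             # factor of the Schur complement
    n, m = L.shape[0], C.shape[0]
    L_new = np.zeros((n + m, n + m))
    L_new[:n, :n] = L
    L_new[n:, :n] = M
    L_new[n:, n:] = S
    return L_new

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
K = X @ X.T                                           # toy linear kernel, 6 samples
L = cholesky(K[:4, :4], lower=True)                   # factor of the first 4 samples
L6 = extend_cholesky(L, K[:4, 4:], K[4:, 4:])         # add the remaining 2 samples
assert np.allclose(L6 @ L6.T, K)
```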
Conference Paper
Full-text available
This paper gives an overview of the First International Workshop on Multimedia Verification, organized as part of the 2017 ACM Multimedia Conference. The paper outlines the current verification scene and needs, discusses the goals of the workshop, and presents the workshop's program, consisting of two invited keynote talks and three presentations o...
Conference Paper
Full-text available
Automatic understanding and analysis of groups has attracted increasing attention in the vision and multimedia communities in recent years. However, little attention has been paid to the automatic analysis of group membership, i.e., recognizing which group the individual in question is part of. This paper presents a novel two-phase Support Vector Ma...
Article
Full-text available
Deep Learning (DL) has become a crucial technology for multimedia computing. It offers a powerful instrument to automatically produce high-level abstractions of complex multimedia data, which can be exploited in a number of applications, including object detection and recognition, speech-to-text, media retrieval, multimodal data analysis, and so o...
Conference Paper
Full-text available
This paper presents the VideoAnalysis4ALL tool that supports the automatic fragmentation and concept-based annotation of videos, and the exploration of the annotated video fragments through an interactive user interface. The developed web application decomposes the video into two different granularities, namely shots and scenes, and annotates each...
Conference Paper
The diffusion of digital photography lets people take hundreds of photos during personal events, such as trips and ceremonies. Many methods have been developed for summarizing such large personal photo collections. However, they usually emphasize the coverage of the original collection, without considering which photos users would select, i.e. thei...
Conference Paper
This paper presents a fully-automatic method that combines video concept detection and textual query analysis in order to solve the problem of ad-hoc video search. We present a set of NLP steps that analyse different parts of the query in order to convert it to related semantic concepts, and we propose a new method for transforming concept-bas...
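A toy illustration of the general idea of mapping a free-text query onto a predefined concept vocabulary is shown below; the vocabulary, synonyms, and matching rule are hypothetical, and the paper's actual NLP analysis of query parts is not reproduced here.

```python
# Hypothetical concept vocabulary with a few synonyms per concept.
CONCEPT_VOCAB = {
    "dog": {"dog", "puppy", "canine"},
    "beach": {"beach", "seaside", "sand", "shore"},
    "running": {"run", "running", "jogging"},
}

def query_to_concepts(query: str) -> list[str]:
    """Map an ad-hoc textual query onto the predefined concept vocabulary.

    Toy keyword/synonym matching only, as a stand-in for a richer query analysis.
    """
    tokens = {t.strip(".,").lower() for t in query.split()}
    return [c for c, synonyms in CONCEPT_VOCAB.items() if tokens & synonyms]

print(query_to_concepts("A puppy running on the sand"))   # ['dog', 'beach', 'running']
```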
Conference Paper
Zero-example event detection is a problem where, given an event query as input but no example videos for training a detector, the system retrieves the most closely related videos. In this paper we present a fully-automatic zero-example event detection method that is based on translating the event description to a predefined set of concepts for whic...
Conference Paper
Full-text available
In this study we compare three different fine-tuning strategies in order to investigate the best way to transfer the parameters of popular deep convolutional neural networks that were trained for a visual annotation task on one dataset, to a new, considerably different dataset. We focus on the concept-based image/video annotation problem and use Im...
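The snippet does not spell out the three strategies. The sketch below shows three commonly compared transfer options for an ImageNet-pretrained CNN (replace and freeze, full fine-tuning, partial freezing), which may or may not coincide with the paper's exact setups; model and layer names follow torchvision's ResNet-50.

```python
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes: int, strategy: str) -> nn.Module:
    """Generic transfer strategies for an ImageNet-pretrained CNN.

      'last_layer'  - freeze everything, retrain only a new classifier head
      'full'        - fine-tune all layers (typically at a small learning rate)
      'top_blocks'  - freeze early blocks, fine-tune the last block and the head
    """
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # new task-specific head

    if strategy == "last_layer":
        for name, p in model.named_parameters():
            p.requires_grad = name.startswith("fc")
    elif strategy == "top_blocks":
        for name, p in model.named_parameters():
            p.requires_grad = name.startswith(("layer4", "fc"))
    # 'full': leave every parameter trainable
    return model

model = build_finetune_model(num_classes=345, strategy="top_blocks")
```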
Conference Paper
This paper presents VERGE interactive video retrieval engine, which is capable of browsing and searching into video content. The system integrates several content-based analysis and retrieval modules including concept detection, clustering, visual similarity search, object-based search, query analysis and multimodal and temporal fusion.
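Of the listed modules, the visual similarity search component is the easiest to illustrate generically: nearest-neighbour retrieval over precomputed keyframe descriptors. The features and index below are placeholders, not VERGE's actual descriptors or indexing scheme.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder descriptors for the keyframes in the collection, e.g. pooled CNN features.
keyframe_features = np.random.rand(10_000, 512).astype(np.float32)

index = NearestNeighbors(n_neighbors=20, metric="cosine").fit(keyframe_features)

def visually_similar(query_feature: np.ndarray) -> np.ndarray:
    """Return indices of the 20 keyframes closest to the query image's descriptor."""
    _, idx = index.kneighbors(query_feature.reshape(1, -1))
    return idx[0]

hits = visually_similar(keyframe_features[42])      # the query keyframe itself ranks first
```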
Article
Full-text available
This paper presents a web-based interactive tool for time-efficient instance-level spatiotemporal labeling of videos, based on the re-detection of manually selected objects of interest that appear in them. The developed tool allows the user to select a number of instances of the object that will be used for annotating the video via detecting and sp...
Conference Paper
In this paper, a combined nonlinear dimensionality reduction and multiclass classification framework is proposed. Specifically, a novel discriminant analysis (DA) technique, called accelerated kernel subclass discriminant analysis (AKSDA), derives a discriminant subspace, and a linear multiclass support vector machine (MSVM) computes a set of separ...
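AKSDA itself is not sketched here; as a rough stand-in for the overall structure the abstract describes (a nonlinear discriminant projection followed by a linear multiclass SVM on the projected data), one can chain a kernel projection with a linear SVM in scikit-learn. KernelPCA below replaces AKSDA purely for illustration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in pipeline: only the structure (nonlinear DR, then a linear multiclass SVM)
# matches the abstract; the DR method and all hyperparameters are assumptions.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(KernelPCA(n_components=30, kernel="rbf", gamma=0.001),
                    LinearSVC())
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```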
Conference Paper
In this work we propose a method that integrates multi-task learning (MTL) and deep learning. Our method appends an MTL-like loss to a deep convolutional neural network, in order to jointly learn the relations between tasks, and also incorporates the label correlations between pairs of tasks. We apply the proposed method on a trans...
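One way such a correlation-aware multi-task loss can be constructed is sketched below: a standard multi-label loss plus a pairwise term weighted by label correlations. This is a generic construction for illustration, not the exact loss appended to the DCNN in the paper.

```python
import torch
import torch.nn.functional as F

def mtl_loss(logits: torch.Tensor,
             targets: torch.Tensor,
             label_corr: torch.Tensor,
             lam: float = 0.1) -> torch.Tensor:
    """Binary multi-label loss plus a pairwise term driven by label correlations.

    logits/targets: (batch, num_concepts); label_corr: (num_concepts, num_concepts)
    with entries in [-1, 1]. The pairwise term pulls the predictions of positively
    correlated concepts towards each other. Generic sketch, not the paper's loss.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    diff = probs.unsqueeze(2) - probs.unsqueeze(1)          # (B, C, C) pairwise gaps
    pairwise = (label_corr.clamp(min=0) * diff.pow(2)).mean()
    return bce + lam * pairwise

logits = torch.randn(4, 6, requires_grad=True)
targets = torch.randint(0, 2, (4, 6)).float()
label_corr = torch.eye(6)                                    # placeholder correlation matrix
mtl_loss(logits, targets, label_corr).backward()
```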
Article
In this paper we address the issue of photo gallery synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, therefore limiting the applicabi...