Antonis A. Argyros

University of Crete | UOC · Department of Computer Science

Professor of Computer Science

About

347
Publications
79,084
Reads
8,989
Citations
Citations since 2017
95 Research Items
4,676 Citations
[Chart: citations by year, 2017 to 2023; vertical axis 0 to 800]
Introduction
Antonis Argyros is a Professor at the Computer Science Department, University of Crete and an Associate Researcher at the Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas (FORTH) in Heraklion, Crete, Greece.
Additional affiliations
September 2006 - present
University of Crete
Position
  • Professor (Associate)
September 2006 - present
Foundation for Research and Technology - Hellas
Position
  • Researcher
September 1996 - February 1997
KTH Royal Institute of Technology
Position
  • PostDoc Position
Education
December 1992 - November 1996
University of Crete
Field of study
  • Computer Science, Computer Vision
September 1989 - December 1992
University of Crete
Field of study
  • Computer Science
September 1985 - September 1989
University of Crete
Field of study
  • Computer Science

Publications

Publications (347)
Preprint
Full-text available
We investigate the problem of Object State Classification (OSC) as a zero-shot learning problem. Specifically, we propose the first Object-agnostic State Classification (OaSC) method that infers the state of a certain object without relying on the knowledge or the estimation of the object class. In that direction, we capitalize on Knowledge Graphs...
Article
Full-text available
A roadmap is proposed that defines a systematic approach for craft preservation and its evaluation. The proposed roadmap aims to deepen craft understanding so that blueprints of appropriate tools that support craft documentation, education, and training can be designed while achieving preservation through the stimulation and diversification of prac...
Preprint
Full-text available
A roadmap is proposed that defines a systematic approach for craft preservation and its evaluation. The proposed roadmap aims at deepening craft understanding, so blueprints of appropriate tools that support craft documentation, education, and training can be designed while achieving preservation through the stimulation and diversification of pract...
Chapter
Full-text available
Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate...
Conference Paper
Full-text available
Human and robot collaboration in assembly tasks is an integral part of modern manufactories. Robots provide advantages in both process and productivity with their repeatability and usability in different tasks, while human operators provide flexibility and can act as safeguards. However, process complexity increases, which can lower the overall qual...
Conference Paper
Nowadays, deep learning approaches lead the state-of-the-art scores in human activity recognition (HAR). However, the supervised nature of these approaches still relies heavily on the size and the quality of the available training datasets. The complexity of activities of existing HAR video datasets ranges from simple coarse actions, such as sittin...
Chapter
We present a novel approach for the visual prediction of human-object interactions in videos. Rather than forecasting the human and object motion or the future hand-object contact points, we aim at predicting (a) the class of the on-going human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that wi...
Preprint
Full-text available
We present a novel approach for the visual prediction of human-object interactions in videos. Rather than forecasting the human and object motion or the future hand-object contact points, we aim at predicting (a) the class of the on-going human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that wil...
Conference Paper
Full-text available
We present an overview of the SignGuide project. Its main goal is to develop a prototype interactive museum guide system for deaf visitors using mobile devices that will be able to receive visitors’ questions in their native (sign) language with regard to the exhibits and to provide additional content also in sign language using an avatar or video,...
Chapter
Vision transformer architectures have been demonstrated to work very effectively for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction. In this paper we investigate the use of a pure transformer architecture (i.e., one with no CNN backbone) for the pro...
Article
Full-text available
In this paper, a representation based on digital assets and semantic annotations is established for Traditional Craft instances, in a way that captures their socio-historic context and preserves both their tangible and intangible Cultural Heritage dimensions. These meaningful and documented experiential presentations are delivered to the target aud...
Article
Full-text available
In the field of human action recognition (HAR), the recognition of actions with large duration is hindered by the memorization capacity limitations of the standard probabilistic and recurrent neural network (R-NN) approaches that are used for temporal sequence modeling. The simplest remedy is to employ methods that reduce the input sequence length,...
Preprint
Full-text available
The detection of object states in images (State Detection - SD) is a problem of both theoretical and practical importance and it is tightly interwoven with other important computer vision problems, such as action recognition and affordance detection. It is also highly relevant to any entity that needs to reason and act in dynamic domains, such as r...
Preprint
Full-text available
Vision transformer architectures have been demonstrated to work very effectively for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction. In this paper we investigate the use of a pure transformer architecture (i.e., one with no CNN backbone) for the pro...
Chapter
Full-text available
Action prediction is defined as the inference of an action label while the action is still ongoing. Such a capability is extremely useful for early response and further action planning. In this paper, we consider the problem of action prediction in scenarios involving humans interacting with objects. We formulate an approach that builds time series...
Article
Efficiently coordinating different types of robots is an important enabler for many commercial and industrial automation tasks. Here, we present a distributed framework that enables a team of heterogeneous robots to dynamically generate actions from a common, user-defined goal specification. In particular, we discuss the integration of various robo...
Preprint
Full-text available
The amount and quality of datasets and tools available in the research field of hand pose and shape estimation are evidence of the significant progress that has been made. We find that there is still room for improvement on both fronts, and even beyond. Even the datasets of the highest quality, reported to date, have shortcomings in annotation....
Preprint
Full-text available
We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators. The first one targets spike timing dependent plasticity (STDP). It combines lazy- with event-driven plasticity and efficiently facilitates the computation of pre- and post-synaptic spikes using bitfields and integer intrinsics. It offers higher b...
Conference Paper
Full-text available
Action Quality Assessment (AQA) is a video understanding task aiming at the quantification of the execution quality of an action. One of the main challenges in relevant, deep learning-based approaches is the collection of training data annotated by experts. Current methods perform fine-tuning on pre-trained backbone models and aim to improve perfor...
Article
In social and industrial facilities of the future such as hospitals, hotels, and warehouses, teams of robots will be deployed to assist humans in accomplishing everyday tasks like object handling, transportation, or pickup and delivery operations. In such a context, different robots (e.g., mobile platforms, static manipulators, or mobile manipulato...
Article
Full-text available
Most people touch their faces unconsciously, for instance to scratch an itch or to rest their chin in their hands. To reduce the spread of the novel coronavirus (COVID-19), public health officials recommend against touching one’s face, as the virus is transmitted through mucous membranes in the mouth, nose and eyes. Students, office workers, medica...
Preprint
Full-text available
We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive...
Article
Full-text available
The solutions to many computer vision problems, including that of 6D object pose estimation, are dominated nowadays by the explosion of the learning-based paradigm. In this paper, we investigate 6D object pose estimation in a practical, real-world setting in which a mobile device (smartphone/tablet) needs to be localized in front of a museum exhibit...
Preprint
Full-text available
We present an SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike transmission algorithm, 2) a model parallel multi-GPU distribution scheme, and 3) a static, yet very effective load balancing strategy. The simulator further features an easy-to-use API and the abili...
Article
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable...
Chapter
Existing, fully supervised methods for person re-identification (ReID) require annotated data acquired in the target domain in which the method is expected to operate. This includes the IDs as well as images of persons in that domain. This is an obstacle in the deployment of ReID methods in novel settings. For solving this problem, semi-supervised...
Article
Background and Objective: The study of small vessels allows for the analysis and diagnosis of diseases with strong vasculopathy. This type of vessels can be observed non-invasively in the retina via fundoscopy. The analysis of these vessels can be facilitated by applications built upon Retinal Image Registration (RIR), such as mosaicing, Super Reso...
Chapter
MuseLearn is a platform that enhances the presentation of the exhibits of a museum with multimedia-rich content that is adapted and recommended for certain visitor profiles and playbacks on their mobile devices. The platform consists mainly of a content management system that stores and prepares multimedia material for the presentation of exhibits;...
Preprint
Full-text available
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable...
Article
Objective: In-vivo assessment of small vessels can promote accurate diagnosis and monitoring of diseases related to vasculopathy, such as hypertension and diabetes. The eye provides a unique, open, and accessible window for directly imaging small v...
Article
Full-text available
We present a sensor-fusion method that exploits a depth camera and a gyroscope to track the articulation of a hand in the presence of excessive motion blur. In case of slow and smooth hand motions, the existing methods estimate the hand pose fairly accurately and robustly, despite challenges due to the high dimensionality of the problem, self-occlu...
Article
Full-text available
In this work, we present a novel framework to perform single-shot hand pose estimation using depth data as input. The method follows a coarse to fine strategy and employs several radial basis function networks (RBFNs) that are trained on a dataset containing only synthetically generated depth maps. Thus, compared to most contemporary deep learning...
Preprint
Object perception is a fundamental sub-field of Computer Vision, covering a multitude of individual areas and having contributed high-impact results. While Machine Learning has been traditionally applied to address related problems, recent works also seek ways to integrate knowledge engineering in order to expand the level of intelligence of the vi...
Article
In this article, we present results obtained from field trials with the Hobbit robotic platform, an assistive, social service robot aiming at enabling prolonged independent living of older adults in their own homes. Our main contribution lies within the detailed results on perceived safety, usability, and acceptance from field trials with autonomou...
Preprint
Full-text available
We present a clock-driven Spiking Neural Network simulator which is up to 3x faster than the state of the art while, at the same time, being more general and requiring less programming effort on both the user's and maintainer's side. This is made possible by designing our pipeline around "work queues" which act as interfaces between stages and grea...
Chapter
We present an unsupervised method for the detection of all temporal segments of videos or motion capture data, that correspond to periodic motions. The proposed method is based on the detection of similar segments (commonalities) in different parts of the input sequence and employs a two-stage approach that operates on the matrix of pairwise distan...
Chapter
Full-text available
This paper addresses the problem of 3D hand pose estimation by modeling specific hand actions using probabilistic Principal Component Analysis. For each of the considered actions, a parametric subspace is learned based on a dataset of sample action executions. The developed method tracks the 3D hand pose either in the case of unconstrained hand mot...
Chapter
Preprocessing and enhancement is a prerequisite for a wide range of retinal image analysis methods. The goals of such tasks are to improve images and facilitate their subsequent analysis. Registration of retinal images enables the generation of images of higher definition retinal mosaics and facilitates the comparison of images from different exami...
Preprint
Full-text available
We address the problem of temporal localization of repetitive activities in a video, i.e., the problem of identifying all segments of a video that contain some sort of repetitive or periodic motion. To do so, the proposed method represents a video by the matrix of pairwise frame distances. These distances are computed on frame representations obtai...
Article
Full-text available
We present RFOVE, a region-based method for approximating an arbitrary 2D shape with an automatically determined number of possibly overlapping ellipses. RFOVE is completely unsupervised, operates without any assumption or prior knowledge on the object's shape and extends and improves the Decremental Ellipse Fitting Algorithm (DEFA) [1]. Both RFOVE...
Conference Paper
Full-text available
We present an unsupervised method for the detection of all temporal segments of videos or motion capture data, that correspond to periodic motions. The proposed method is based on the detection of similar segments (commonalities) in different parts of the input sequence and employs a two-stage approach that operates on the matrix of pairwise distan...
Conference Paper
Full-text available
We present a method for 3D hand tracking that exploits spatial constraints in the form of end effector (fingertip) locations. The method follows a generative, hypothesize-and-test approach and uses a hierarchical particle filter to track the hand. In contrast to state of the art methods that consider spatial constraints in a soft manner, the propos...
Conference Paper
Full-text available
We present MocapNET, an ensemble of SNN [28] encoders that estimates the 3D human body pose based on 2D joint estimations extracted from monocular RGB images. MocapNET provides an efficient divide and conquer strategy for supervised learning. It outputs skeletal information directly into the BVH [41] format which can be rendered in real-time or imp...
Poster
Full-text available
We present MocapNET, an ensemble of SNN encoders that estimates the 3D human body pose based on 2D joint estimations extracted from monocular RGB images. MocapNET provides BVH file output which can be rendered in real-time or imported without any additional processing in most popular 3D animation software. The proposed architecture achieves 3D huma...
Chapter
This report outlines the proceedings of the Fourth International Workshop on Observing and Understanding Hands in Action (HANDS 2018). The fourth instantiation of this workshop attracted significant interest from both academia and the industry. The program of the workshop included regular papers that are published as the workshop’s proceedings, ext...
Chapter
Full-text available
In this paper, we present a novel framework for horizon line (HL) detection that can be effectively used for Unmanned Air Vehicle (UAV) navigation. Our scheme is based on a Canny edge and a Hough detector along with an optimization step performed by a Particle Swarm Optimization (PSO) algorithm. The PSO’s objective function is based on a variation...
Book
This book constitutes the refereed proceedings of the 12th International Conference on Computer Vision Systems, ICVS 2019, held in Thessaloniki, Greece, in September 2019. The 72 papers presented were carefully reviewed and selected from 114 submissions. The papers are organized in the following topical sections: hardware accelerated and real time...
Preprint
Full-text available
We present a novel approach for 2D hand keypoint localization from regular color input. The proposed approach relies on an appropriately designed Convolutional Neural Network (CNN) that computes a set of heatmaps, one per hand keypoint of interest. Extensive experiments with the proposed method compare it against state of the art approaches and dem...
Preprint
Full-text available
We propose the first approach to the problem of inferring the depth map of a human hand based on a single RGB image. We achieve this with a Convolutional Neural Network (CNN) that employs a stacked hourglass model as its main building block. Intermediate supervision is used in several outputs of the proposed architecture in a staged approach. To ai...
Article
Full-text available
We present a comparative study of three matrix completion and recovery techniques based on matrix inversion, gradient descent, and Lagrange multipliers, applied to the problem of human pose estimation. 3D human pose estimation algorithms may exhibit noise or may completely fail to provide estimates for some joints. A post-process is often employed...
Preprint
Full-text available
This report outlines the proceedings of the Fourth International Workshop on Observing and Understanding Hands in Action (HANDS 2018). The fourth instantiation of this workshop attracted significant interest from both academia and the industry. The program of the workshop included regular papers that are published as the workshop's proceedings, ext...
Conference Paper
This paper presents the HealthSign project, which deals with the problem of sign language recognition with focus on medical interaction scenarios. The deaf user will be able to communicate in his native sign language with a physician. The continuous signs will be translated to text and presented to the physician. Similarly, the speech will be recog...
Conference Paper
We present a comparative study of three matrix completion and recovery techniques, applied to the problem of human pose estimation. Human pose estimation algorithms may exhibit estimation noise or may completely fail to provide estimates for some joints. A post-process is often employed to recover the missing joints' locations from the available on...
Conference Paper
Full-text available
We present a region-based method for segmenting and splitting images of cells in an automatic and unsupervised manner. The detection of cell nuclei is based on Bradley's method. False positives are automatically identified and rejected based on shape and intensity features. Additionally, the proposed method is able to automatically detect and s...
Conference Paper
Full-text available
We present a solution to the problem of discovering all periodic segments of a video and of estimating their period in a completely unsupervised manner. These segments may be located anywhere in the video, may differ in duration, speed, and period, and may represent unseen motion patterns of any type of objects (e.g., humans, animals, machines, etc.). Th...