
Konstantinos Avgerinakis
Catalink Ltd · Computer Vision and Deep Learning Laboratory
Computer and Communication Engineering
About
61
Publications
70,456
Reads
780
Citations
Citations since 2017
Introduction
He received his Diploma degree in Computer and Telecommunication Engineering from the University of Thessaly in 2009 and his PhD degree in Electrical Engineering from the University of Surrey in 2015. He has been the Chief Technical Officer (CTO) and head of the computer vision and deep learning research and development department at Catalink Ltd since December 2017. His current research interests include computer vision and statistical video processing for event detection and recognition, human activity recognition, and facial expression recognition. He has co-authored more than 40 publications in refereed journals and international conferences, and has served as a reviewer for numerous international journals and conferences.
Additional affiliations
March 2010 - present
March 2009 - present
March 2009 - April 2015
Education
January 2010
September 2003 - April 2009
Publications
Publications (61)
The recognition of activities of daily living (ADLs) refers to the classification of activities commonly carried out in daily life, which are of particular interest in numerous applications, such as health monitoring, smart home environments and surveillance systems. We introduce a novel method for activity recognition, which achieves high recognit...
Combining multimodal concept streams from heterogeneous sensors is a problem that has been only superficially explored for activity recognition. Most studies explore simple sensors in nearly perfect conditions, where temporal synchronization is guaranteed. Sophisticated fusion schemes adopt problem-specific graphical representations of events that are generally deepl...
Human activity recognition has gained a lot of attention in the computer vision community, due to its usefulness in numerous contexts. This work focuses on the recognition of Activities of Daily Living (ADL), which involves recordings constrained to specific daily activities that are of interest in assisted living or smart home environments. We pres...
The latest advancements in Machine Learning have led to impressive capabilities in distinguishing emotions from facial expressions, allowing computers and smart devices to accurately detect and interpret human emotions through computer vision. While a lot of work has been conducted on understanding human expressions by utilizing visual information,...
Central nervous system diseases (CNSDs) lead to significant disability worldwide. Mobile app interventions have recently shown the potential to facilitate monitoring and medical management of patients with CNSDs. In this direction, the characteristics of the mobile apps used in research studies and their level of clinical effectiveness need to be e...
This paper presents Zenon, an affective, multi-modal conversational agent (chatbot) specifically designed for treatment of brain diseases like multiple sclerosis and stroke. Zenon collects information from patients in a non-intrusive way and records user sentiment using two different modalities: text and video. A user-friendly interface is designed...
Autonomous driving is undoubtedly the future of the automotive industry. However, the complete migration to those technologies is not a short-term goal and it will not happen simultaneously throughout the world. Thus, existing technologies such as Driver State Monitoring systems are still relevant and could potentially foresee and prevent the occu...
Semantic Web technologies are increasingly being deployed in various e-health scenarios, prominently due to their inherent capacity to harmonize heterogeneous information from diverse sources and devices, as well as their capability to provide meaningful interpretations and higher-level insights. This paper reports on ongoing work in the recently s...
Nowadays, vast amounts of multimedia content are being produced, archived, and digitized, resulting in great troves of data of interest. Examples include user-generated content, such as images, videos, text, and audio posted by users on social media and wikis, or content provided through official publishers and distributors, such as digital librari...
Driver's drowsiness and inattention have been proven to be two of the major contributing factors to car accidents. Therefore, both academia and industry have directed their research interest to the development of Driver State Monitoring solutions to foresee and prevent the occurrence of such incidents. To this end, in this paper we present IRIS, a...
A novel first-person human activity recognition framework is proposed in this work. Our proposed methodology is inspired by the central role moving objects have in egocentric activity videos. Using a Deep Convolutional Neural Network we detect objects and develop discriminant object flow histograms in order to represent fine-grained micro-actions d...
In the last few years, terrorism, organized crime and cybercrime have gained more ground as society is becoming increasingly digitized. In this landscape, Law Enforcement Agencies (LEAs) should adapt by applying cross-domain expertise and follow an integrated approach that supersedes the traditional barriers in policing practices. In this light, th...
Although many Ambient Intelligence frameworks either address heterogeneous ambient sensing or computer vision techniques, very limited work integrates both techniques in the scope of activity recognition in pervasive environments. This paper presents such a framework that integrates both a computer vision component and heterogeneous sensors with un...
Recognition of daily actions is an essential part of Ambient Assisted Living (AAL) applications and still not fully solved. In this work, we propose a novel framework for the recognition of actions of daily living from depth-videos. The framework is based on low-level human pose movement descriptors extracted from 3D joint trajectories as well as d...
Oil spill is considered one of the main threats to marine and coastal environments. Efficient monitoring and early identification of oil slicks are vital for the corresponding authorities to react expediently, confine the environmental pollution and avoid further damage. Synthetic aperture radar (SAR) sensors are commonly used for this objective du...
Object detection is a hot topic with various applications in computer vision, e.g., image understanding, autonomous driving, and video surveillance. Much of the progress has been driven by the availability of object detection benchmark datasets, including PASCAL VOC, ImageNet, and MS COCO. However, object detection on the drone platform is still...
Single-object tracking, also known as visual tracking, on the drone platform has attracted much attention recently, with various applications in computer vision such as filming and surveillance. However, the lack of commonly accepted annotated datasets and a standard evaluation platform has hindered the development of algorithms. To address this issue, the Vi...
Drones equipped with cameras have been rapidly deployed in a wide range of applications, such as agriculture, aerial photography, fast delivery, and surveillance. As the core steps in those applications, video object detection and tracking have attracted much research effort in recent years. However, the current video object detection and tracking algorithm...
Oil spill pollution poses a significant threat to oceanic and coastal ecosystems. A continuous monitoring framework with automatic detection capabilities could be valuable as an early warning system, so as to minimize the response time of the authorities and prevent any environmental disaster. The usage of Synthetic Aperture Radar (SAR) data...
This work reports the methodology that the CERTH-ITI team developed so as to recognize the emotional impact that movies have on their viewers in terms of valence/arousal and fear. More specifically, deep convolutional neural networks and several machine learning techniques are utilized to extract visual features and classify them based on the predicted...
This paper presents the algorithms that the CERTH-ITI team deployed to tackle flood detection and road passability from social media and satellite data. Computer vision and deep learning techniques are combined in order to analyze social media and satellite images, while word2vec is used to analyze textual data. Multimodal fusion is also deployed in CE...
are in great need of acquiring, re-using and re-purposing visual and textual data to recreate, renovate or produce a novel target space, building or element. This aligns with the recently observed abrupt increase in the use of immersive VR environments and the great technological advances that can be found in the acquisition and mani...
A novel work for Ambient Assisted Living applications is presented here. More specifically, this paper focuses on activity recognition from recordings of daily living captured by wearable cameras. It constructs a discriminant object centric motion descriptor for representing the micro-actions within the viewpoint of the action maker so as to later...
The identification of flooded areas over Earth Observation (EO) satellite images has paved the way to monitor damaged areas and take effective actions. Classifying all pixels of a satellite image as a flooded area or not allows for creating maps which are then used by civil protection agencies and first responders. In this work, a method, firstly i...
This work focuses on detecting and localizing a wide range of dynamic textures in video sequences captured by surveillance cameras. Their reliable and robust analysis constitutes a challenging task for traditional computer vision methods, due to barriers like occlusions, the highly non-rigid nature of the moving entities and the complex stochastic...
The analysis of dynamic scenes in video is a very useful task especially for the detection and monitoring of natural hazards such as floods and fires. In this work, we focus on the challenging problem of real-world dynamic scene understanding, where videos contain dynamic textures that have been recorded in the "wild". These videos feature large il...
LBP-Flow for dynamic scene understanding presentation
Deep Nets (visual) and DBpedia spotlight (textual) were combined under a late fusion technique to detect flood events in satellite and social media data
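As an illustration of this kind of late fusion, a minimal sketch is given below. All names, scores, and the weighting are hypothetical and this is not the exact pipeline used in the work: per-item scores from a visual detector and a textual detector are min-max normalised and then combined by a weighted average.

```python
import numpy as np

def late_fusion(visual_scores, textual_scores, w_visual=0.6):
    """Hypothetical late-fusion sketch: combine per-item relevance
    scores from a visual model and a textual model by a weighted
    average. Scores are min-max normalised first so that the two
    modalities are comparable."""
    def normalise(s):
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

    v = normalise(visual_scores)
    t = normalise(textual_scores)
    return w_visual * v + (1.0 - w_visual) * t

# Three hypothetical items scored by each modality
fused = late_fusion([0.2, 0.9, 0.5], [0.1, 0.4, 0.8])
flood_items = fused > 0.5  # boolean mask of items flagged as flood-related
```

The weight `w_visual=0.6` makes the fused score lean towards the visual modality; in practice such a weight would be tuned on a validation set.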
This paper presents the algorithms that CERTH team deployed in order to tackle disaster recognition tasks and more specifically Disaster Image Retrieval from Social Media (DIRSM) and Flood-Detection in Satellite images (FDSI). Visual and textual analysis, as well as late fusion of their similarity scores, were deployed in social media images, while...
CERTH traffic management presentation
Surveillance and more specifically traffic management technologies constitute one of the most intriguing aspects of smart city applications. In this work we investigate the applicability of an object detector for vehicle detection and propose a novel hybrid shallow-deep representation to surpass its limits. Furthermore, we leverage the detector's o...
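A hybrid shallow-deep descriptor of the kind mentioned above could be sketched as follows; this is an illustrative assumption, not the paper's formulation, and the function and feature names are hypothetical. Each part is L2-normalised before concatenation so that neither modality dominates the joint representation.

```python
import numpy as np

def hybrid_descriptor(shallow_feat, deep_feat):
    """Illustrative sketch (names hypothetical): fuse a shallow
    hand-crafted descriptor (e.g. HOG) with a deep CNN embedding by
    L2-normalising each part and concatenating them."""
    def l2(v):
        v = np.asarray(v, dtype=float)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(shallow_feat), l2(deep_feat)])

# Toy example: a 2-D "HOG" vector and a 2-D "CNN" embedding
d = hybrid_descriptor([3.0, 4.0], [0.0, 5.0])
```

The resulting vector could then be fed to a standard classifier (e.g. a linear SVM) for vehicle detection.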
Human activity detection from video that is recorded continuously over time has been gaining increasing attention due to its use in applications like security monitoring, smart homes and assisted living setups. The analysis of continuous videos for the detection of specific activities, called Activities of Interest (AoI) in this work, is particular...
This work proposes a framework for the efficient recognition of activities of daily living (ADLs), captured by static color cameras, applicable in real world scenarios. Our method reduces the computational cost of ADL recognition in both compressed and uncompressed domains by introducing system level improvements in State-of-the-Art activity recogn...
The objective of Dem@Care is the development of a complete system providing personal health services to people with dementia, as well as medical professionals and caregivers, by using a multitude of sensors, for context-aware, multi-parametric monitoring of lifestyle, ambient environment, and health parameters. Multi-sensor data analysis, combined...
This paper presents VERGE interactive search engine, which is capable of browsing and searching into video content. The system integrates content-based analysis and retrieval modules such as video shot segmentation, concept detection, clustering, as well as visual similarity and object-based search.
Detection and recognition for Activities of Daily Living (ADLs) from visual data is a useful tool for enabling unobtrusive home environment monitoring. ADLs are detected spatio-temporally in long videos, while activity recognition is applied for the purposes of human behaviour analysis and life logging. We propose a novel ADL detection schema for t...
The abrupt expansion of the Internet use over the last decade led to an uncontrollable amount of media stored in the Web. Image, video and news information has flooded the pool of data that is at our disposal and advanced data mining techniques need to be developed in order to take full advantage of them. The focus of this thesis is mainly on devel...
This paper presents VERGE interactive video retrieval engine, which is capable of searching into video content. The system integrates several content-based analysis and retrieval modules such as video shot boundary detection, concept detection, clustering and visual similarity search.
This paper presents VERGE interactive video retrieval engine, which is capable of searching and browsing video content. The system integrates several content-based analysis and retrieval modules such as video shot segmentation and scene detection, concept detection, hierarchical clustering and visual similarity search into a user friendly interface...
This paper provides an overview of the tasks submitted to TRECVID 2013 by ITI-CERTH. ITI-CERTH participated in the Semantic Indexing (SIN), the Event Detection in Internet Multimedia (MED), the Multimedia Event Recounting (MER) and the Instance Search (INS) tasks. In the SIN task, techniques are developed, which combine new video representations (...
Activity recognition is one of the most active topics within computer vision. Despite its popularity, its application in real life scenarios is limited because many methods are not entirely automated and consume high computational resources for inferring information. In this work, we contribute two novel algorithms: (a) one for automatic video sequ...
Computer vision technologies, and more specifically activity recognition, can be considered one of the most helpful tools that computer science can place at society's disposal. Activity recognition deals with the visual analysis of video sequences and provides semantic information about the activities that may occur within them. In state-of-th...
The recognition of Activities of Daily Living (ADL) from video can prove particularly useful in assisted living and smart home environments, as behavioral and lifestyle profiles can be constructed through the recognition of ADLs over time. Often, existing methods for recognition of ADLs have a very high computational cost, which makes them unsuitab...
This paper presents a new method for human action recognition which exploits advantages of both trajectory and space-time based approaches in order to identify action patterns in given sequences. Videos with both a static and moving camera can be tackled, where camera motion effects are overcome via motion compensation. Only pixels undergoing chang...
We present a novel algorithm for detecting and localizing smoke in video. The first step of our method focuses on detecting the presence of smoke in video frames, while in the second part localization of smoke particles in the scene takes place. In our implementation, we take advantage of both appearance and motion information, so that we can e...
A novel algorithm for helping patients with dementia, based on computer vision and machine learning technologies, is presented. Static and wearable cameras are used in order to record the activities that an elderly person performs throughout the day. The goal of this task is to recognize the daily activities of patients with dementia in order to develop behavio...
An original approach for real time detection of changes in motion is presented, for detecting and recognizing events. Current video change detection focuses on shot changes, based on appearance, not motion. Changes in motion are detected in pixels that are found to be active, and this motion is input to sequential change detection, which detects ch...
An original approach for real time detection of changes in motion is presented, which can lead to the detection and recognition of events. Current video change detection focuses on shot changes which depend on appearance, not motion. Changes in motion are detected in pixels that are found to be active via the kurtosis. Statistical modeling of the m...
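The kurtosis-based activity test mentioned in the abstract could be sketched roughly as follows; this is a hedged approximation (the `frames` layout and the threshold are assumptions, not the authors' exact statistical model). Pixels whose temporal intensity signal has sample kurtosis well above the Gaussian baseline of 3 are flagged as active.

```python
import numpy as np

def active_pixels(frames, threshold=3.0):
    """Hedged sketch (not the authors' exact method): flag pixels whose
    temporal intensity signal has high sample kurtosis, a common
    indicator of non-Gaussian (active) behaviour.
    `frames` is assumed to be a (T, H, W) array of grayscale frames."""
    x = frames.astype(float)
    mu = x.mean(axis=0)                      # per-pixel temporal mean
    centred = x - mu
    var = (centred ** 2).mean(axis=0)        # per-pixel variance
    var = np.where(var == 0, 1e-12, var)     # guard constant pixels
    # Pearson kurtosis: fourth moment over squared variance
    # (a Gaussian signal has kurtosis = 3)
    kurt = (centred ** 4).mean(axis=0) / var ** 2
    return kurt > threshold                  # boolean (H, W) activity mask
```

A pixel that is mostly constant with an occasional large jump has very high kurtosis and is flagged, while steady oscillation or pure silence is not.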
The use of multimedia data has expanded into many domains and applications beyond technical usage, such as surveillance, home monitoring, health supervision, and judicial applications. This work is concerned with the application of video processing techniques to judicial trials in order to extract useful information from them. The automated process...
Questions
Questions (11)
I was wondering if anyone knows of or has published a technique that successfully combines shallow (HOG, SIFT, LBP) with deep (GoogLeNet) representations? I am interested in both image and video cases.
Hello everyone,
Does anyone know if I can use video sequences from movies in my research? More specifically, is it legal to trim video segments from movies and run computer vision algorithms on them, or would I have copyright issues when I present my experimental results at an international conference? Is there a company that could grant permits for digital media? Does the same law apply to EU projects?
Mirror neuron and facial expressions when watching TV.
I would like to ask if there is a benchmark dataset with video samples that are shown to human subjects and induce specific emotions, moods, or sentiments (happiness, sadness, anger, etc.)?
The purpose of my work is to gather a group of people and monitor their reactions with facial expression software while they watch specific videos.
The experiment would resemble the one in the film A Clockwork Orange.
Which are the best available facial expression techniques? Is there any literature/survey, code, or dataset freely available online? Has there been any progress with the Kinect sensor?
Particle filtering, PSO, mean shift, and Kalman filters are used for tracking objects within video sequences. In your opinion, which one is the most accurate? Do you know of any available code?