Lazaros Vrysis

Aristotle University of Thessaloniki | AUTH · Department of Cinema

Doctor of Engineering
Lecturer - Researcher - Consultant

About

Publications: 59
Reads: 9,737
Citations: 420
Education
February 2012 - July 2019
Aristotle University of Thessaloniki
Field of study
  • Electrical & Computer Engineering

Publications (59)
Article
Semantic audio analysis has become a fundamental task in modern audio applications, making the improvement and optimization of classification algorithms a necessity. Standard frame-based audio classification methods have been optimized and modern approaches introduce engineering methodologies that capture the temporal dependency between successive...
Article
Full-text available
Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and the development of emerging systems and applications, such as mobile phone telecommunication and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have se...
Article
Full-text available
Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. In this direction, a web interface for the creat...
Article
Modern feature-based methodologies in semantic audio applications attempt to capture the temporal dependency of successive feature observations, which form the so-called texture windows. This paper proposes an enhancement of this type of processing, known as temporal feature integration, by employing and testing alternative deployable strategies. S...
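Temporal feature integration over texture windows is easiest to picture in its simplest common form: summarising successive frame-level feature vectors by their per-dimension mean and standard deviation. The sketch below is a minimal illustration under that assumption. The function name `texture_window_features` and its `window`/`hop` parameters are hypothetical, and it does not reproduce the alternative integration strategies proposed in the paper.

```python
import statistics

def texture_window_features(frames, window, hop):
    """Summarise frame-level feature vectors over texture windows.

    frames: list of equal-length feature vectors, one per analysis frame.
    window: number of frames per texture window; hop: frames between window starts.
    Returns one vector per texture window: the per-dimension means followed by
    the per-dimension population standard deviations.
    """
    out = []
    for start in range(0, len(frames) - window + 1, hop):
        block = frames[start:start + window]
        columns = list(zip(*block))  # one tuple of values per feature dimension
        means = [statistics.fmean(c) for c in columns]
        stds = [statistics.pstdev(c) for c in columns]
        out.append(means + stds)
    return out
```

Mean and standard deviation are only the baseline; richer integration schemes replace these two statistics with models of the feature trajectory inside each window.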
Preprint
Full-text available
Deep learning has been applied to diverse audio semantics tasks, enabling the construction of models that learn hierarchical levels of features from high-dimensional raw data, delivering state-of-the-art performance. But do these algorithms perform similarly in real-world conditions, or just at the benchmark, where their high learning capability as...
Conference Paper
Reverberation is ubiquitous in everyday listening environments, from meeting rooms to concert halls and recording studios. While reverberation is usually described by the reverberation time, getting further insight into the characteristics of a room requires conducting acoustic measurements and calculating each reverberation parameter manuall...
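The reverberation time mentioned above is conventionally derived from a room impulse response by Schroeder backward integration followed by a line fit over part of the energy decay. A minimal sketch, assuming a clean single-channel impulse response given as a list of samples and a T30-style fit range of -5 to -35 dB. The function names are hypothetical, and this is not the tool described in the paper.

```python
import math

def schroeder_decay_db(ir):
    """Schroeder backward-integrated energy decay curve in dB, normalised so the first value is 0 dB."""
    edc = []
    total = 0.0
    for x in reversed(ir):  # integrate squared samples from the end backwards
        total += x * x
        edc.append(total)
    edc.reverse()
    ref = edc[0]
    # Clamp to avoid log10(0) at the very end of the response
    return [10.0 * math.log10(max(e / ref, 1e-12)) for e in edc]

def rt60_from_decay(ir, fs, start_db=-5.0, end_db=-35.0):
    """Estimate RT60 from a least-squares line fitted to the decay between
    start_db and end_db, extrapolated to a 60 dB drop."""
    decay = schroeder_decay_db(ir)
    pts = [(i / fs, d) for i, d in enumerate(decay) if end_db <= d <= start_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_d = sum(d for _, d in pts) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    var = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = cov / var  # dB per second, negative for a decaying response
    return -60.0 / slope
```

Fitting over -5 to -35 dB rather than the full 60 dB keeps the estimate usable when the measured response hits the noise floor before decaying 60 dB.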
Conference Paper
Full-text available
In the present work, a crowdsourcing approach is designed to investigate the correlation between air and noise pollution in urban areas. Citizens are requested to provide air quality measurements and audio recordings using a prototype mobile application specially designed to motivate them to undertake the task of audiovisual capturing. Different u...
Article
Full-text available
This manuscript discusses the robustness to noise of deep learning models for two audio classification tasks. The first task is a speaker recognition application, trying to identify five different speakers. The second is a speech command identification task, where the goal is to classify ten voice commands. These two tasks are very important to make...
Article
Full-text available
Media authentication relies on the detection of inconsistencies that may indicate malicious editing in audio and video files. Traditionally, authentication processes are performed by forensics professionals using dedicated tools. There is rich research on the automation of this procedure, but the results do not yet guarantee the feasibility of prov...
Article
The current paper introduces a multimodal framework to provide Web-TV automations for live broadcasting and overall big streaming data management. The term "indexing" refers to the spatiotemporal localization of speakers participating in a discussion panel. Multiple modalities acting in parallel form the data-driven decision-making pipeline. The auto...
Article
Full-text available
Speech Emotion Recognition (SER) is an important part of Affective Computing and emotionally aware Human-Computer Interaction. Emotional expression may vary depending on the language, culture, and the speaker’s personality and vocal attributes. Speaker-adaptive systems can address this issue. In real-world applications, it is not feasible to obtain...
Conference Paper
In this research, a novel deep learning architecture is proposed for the problem of speech command recognition. The problem is examined in the context of the Internet of Things, where most devices have limited resources in terms of computation and memory. The uniqueness of the architecture is that it uses a new feature pooling mechanism, named entropy p...
Article
Full-text available
The current paper focuses on the development of an enhanced Mobile Journalism (MoJo) model for soundscape heritage crowdsourcing, data-driven storytelling, and management in the era of big data and the semantic web. Soundscapes and environmental sound semantics have a great impact on cultural heritage, also affecting the quality of human life, from...
Article
In this paper, an audio-driven, multimodal approach for speaker diarization in multimedia content is introduced and evaluated. The proposed algorithm is based on semi-supervised clustering of audio-visual embeddings, generated using deep learning techniques. The two modes, audio and video, are separately addressed; a long short-term memory Siamese...
Conference Paper
Digital storytelling is a topic of great academic interest, due to its countless uses in a variety of areas, educational or not. Its impact on learning contexts has become the center of research for a number of studies, trying to examine the field from a diverse set of perspectives. The latest technological achievements, including the adoption of m...
Conference Paper
During recent years, convolutional neural networks have become the standard in audio semantics, surpassing traditional classification approaches which employed hand-crafted feature engineering as front-end and various classifiers as back-end. Early studies were based on prominent 2D convolutional topologies for image recog...
Conference Paper
Prototypical part network (ProtoPNet) is a novel method proposed for the task of image classification, offering the ability to interpret the network’s reasoning process during classification. The subject of this work is the examination of ProtoPNet as an unsupervised anomaly detection method, through its application at the Detection and Classificat...
Article
A model for Speech Emotion Recognition, based on a Convolutional Neural Network (CNN) architecture, is proposed and evaluated. Recognition is performed on successive time frames of continuous speech. The dataset used for training and testing the model is the Acted Emotional Speech Dynamic Database (AESDD), while data augmentation techniques are ap...
Conference Paper
A collaborative mobile platform called MoJo-MATE (Mobile Journalism Machine AssisTEd reporting) has been proposed for application in newsrooms and their working cycles. The challenge with such a client-server service is that it should cope with the traditional journalistic practices in the mainstream media organizations. Because of the large expans...
Article
Full-text available
MoJo refers to the emerging framework of covering news reporting workflows using smart mobile devices with dedicated software or hardware. However, a fully integrated and dedicated computing environment that can address the individual needs of both professional and citizen journalists is still missing. We introduce MoJo–MATE, (Mobile Journalism Mac...
Article
Full-text available
Art and technology have always been very tightly intertwined, presenting strong influences on each other. On the other hand, technological evolution led to today’s digital media landscape, elaborating mediated communication tools, thus providing new creative means of expression (i.e., new-media art). Rich-media interaction can expedite the whole pr...
Article
Full-text available
Over the past decade, mobile news production has become increasingly prevalent and has been established as a new type by the modern journalism industry. Journalists understand content capturing and sharing as parts of their role in newsrooms. Mobile journalism (mojo) is an evolving form of reporting in which people use only a smartphone to create and...
Poster
This work focuses on the interpretation of the spectrotemporal parameters of the auditory brainstem response (ABR) through a machine learning approach to predict the relationship between the ABR waveform shape and perceived loudness in different degrees of hearing loss. A total of 397 tone-burst (1 & 4 kHz) auditory brainstem responses from 8 heari...
Article
Full-text available
During the last years, there has been a growing multidisciplinary interest in alternative educational approaches, such as serious games, aiming at enhancing thinking skills and media literacy. Likewise, the objective of this study is to present the design and the development of an educational web application for learning the necessary steps towards...
Article
Full-text available
Temporal feature integration refers to a set of strategies attempting to capture the information conveyed in the temporal evolution of the signal. It has been extensively applied in the context of semantic audio showing performance improvements against the standard frame-based audio classification methods. This paper investigates the potential of a...
Conference Paper
MoJo refers to the emerging framework of covering news reporting workflows using smart mobile devices with dedicated software or hardware. However, a fully integrated and dedicated computing environment that can address the individual needs of both professional and citizen journalists is still missing. We introduce MoJo-MATE, (Mobile Journalism Mac...
Article
Objective: To develop and evaluate a software application capable of conducting Pure-Tone Audiometry tests in clinical practice. Design: We designed and developed a mobile software application for iPad devices that performs Pure-Tone Audiometry according to ANSI and IEC standards. The application is proposed to be operated by a trained audiologist...
Conference Paper
Full-text available
Hearing-impaired (HI) listeners often struggle to follow conversations when exposed to a complex acoustic environment. This is partly due to the reduced ability to recover the target speech Temporal Envelope (ENV) cues from Temporal Fine Structure (TFS). This study investigates the enhancement of speech intelligibility in HI listeners, by proces...
Conference Paper
The evaluation of sound level measuring mobile applications shows that the development of a sophisticated audio analysis framework for voice-recording purposes may be useful for journalists. In many audio recording scenarios, repeating the procedure is not an option, and under adverse conditions the quality of the capturing is possibly de...
Conference Paper
Full-text available
Some of the main issues that emerge during crisis news reporting concern the information quality and credibility, as well as the efficiency of delivering news to the public. In many paradigms, citizens using their smartphones become reporters and contribute potentially valuable audiovisual information via social media platforms from the ground duri...
Conference Paper
Early components of the auditory evoked potentials (AEP) reflect the neural processing of acoustic stimuli in the brainstem and the sub-cortical regions. Relating AEP patterns to their stimulus characteristics is a notoriously difficult task, due to the variability of their morphology. In this study, tone-burst evoked auditory brainstem and middle-...
Conference Paper
Evaluation of mobile applications serving sound measurement procedures demonstrates that establishing a "smart" framework for sound signal management of recordings for journalism/reporting needs is valid. In most scenarios concerning sound capturing for journalistic purposes, repetition of recording is not possible, and, thus, error detectio...
Conference Paper
Full-text available
The current work focuses on the implementation of an online wavelet-domain Wiener filter denoiser (on the cloud), proposed for speech enhancement purposes. The ultimate goal of the current project is to provide a quick and direct way to denoise real-world speech. Optimum configuration and adapta...
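As a rough illustration of wavelet-domain Wiener filtering, the sketch below applies a one-level orthonormal Haar transform, scales the detail coefficients by an empirical Wiener gain, and inverts the transform. It assumes the noise variance is known and uses a single Haar level purely for brevity. The wavelet family, decomposition depth, and noise estimation of the actual cloud denoiser are not specified here, and all function names are hypothetical.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar DWT of an even-length sequence."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) * s, (a - d) * s]
    return out

def wiener_shrink(coeffs, noise_var):
    """Scale each coefficient by the empirical Wiener gain
    max(c^2 - noise_var, 0) / c^2 (estimated clean power over noisy power)."""
    out = []
    for c in coeffs:
        power = c * c
        gain = max(power - noise_var, 0.0) / power if power > 0 else 0.0
        out.append(gain * c)
    return out

def denoise(x, noise_var):
    """Shrink only the detail band; the approximation band is kept unchanged."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, wiener_shrink(detail, noise_var))
```

With `noise_var = 0` the signal is reconstructed unchanged; as the assumed noise variance grows, small detail coefficients are suppressed first, which is the usual behaviour of Wiener-type shrinkage.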
Conference Paper
Continuous technological advances and growing adoption of technology by society highlight the importance of integrating educational technology in learning activities. This integration can be introduced to existing teaching tools but can also be used to design new tools that focus on new emerging learning needs and new media literacies. Game base...
Conference Paper
Full-text available
Speech emotion is an important paralinguistic element of speech communication, which undoubtedly involves a high level of subjectivity, without concrete modeling of the implicated emotional states. Specifically, emotional expression varies greatly among different spoken languages and persons. The current work is focused on the investig...
Article
The current paper investigates the design of a collaborative Mobile Cloud Computing model to support the workflow of collecting, editing and publishing news reporting material, aiming at better managing technology and human resources. While semantic services and tools have made tremendous progress in both academic and applied level, journalists don...
Conference Paper
Full-text available
The industrialization and mobilization of human endeavor have led to increased noise production. Low-frequency noise is a major component of occupational noise, which is emitted from a variety of sources. Reduced perception abilities and risks to workers’ health and safety are some of the effects caused by the exposure to low-frequency noise. The m...
Conference Paper
The current paper investigates the design of a collaborative Mobile Cloud Computing model to support the workflow of collecting, editing and publishing news reporting material, aiming at better managing technology and human resources. While semantic services and tools have made tremendous progress in both academic and applied level, journalists don...
Article
Full-text available
The present paper focuses on high-accuracy block-based sub-pixel motion estimation utilizing a straightforward error minimization approach. In particular, the mathematics of bilinear interpolation are utilized for the selection of the candidate motion vectors that minimize the error criterion, by estimating local minima in the error surface with ar...
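For orientation, a plain fractional-pel block-matching baseline can be sketched as follows: it evaluates the sum of absolute differences (SAD) against bilinearly interpolated reference samples over a fixed quarter-pel grid. This is the conventional exhaustive search, not the paper's error-surface minimisation. Block size, search radius, and step are assumptions, grayscale frames are given as lists of rows, and the function names are hypothetical.

```python
def bilinear(img, y, x):
    """Sample img (list of rows) at fractional (y, x) by bilinear interpolation."""
    y0, x0 = int(y), int(x)  # assumes non-negative coordinates inside the image
    dy, dx = y - y0, x - x0
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom

def sad(cur, ref, by, bx, size, vy, vx):
    """SAD between the block at (by, bx) in cur and the ref block displaced by (vy, vx)."""
    return sum(abs(cur[by + i][bx + j] - bilinear(ref, by + i + vy, bx + j + vx))
               for i in range(size) for j in range(size))

def subpixel_search(cur, ref, by, bx, size, radius=1.0, step=0.25):
    """Exhaustive search over a fractional-pel grid; returns the (vy, vx) with minimum SAD."""
    n = int(round(radius / step))
    candidates = [(i * step, j * step) for i in range(-n, n + 1) for j in range(-n, n + 1)]
    return min(candidates, key=lambda v: sad(cur, ref, by, bx, size, *v))
```

The cost grows with the square of `radius / step`, which is exactly why methods that locate the error-surface minimum analytically are attractive.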
Article
Sub-pixel motion estimation plays a vital role in a multitude of video applications, including encoding, audiovisual archiving/heritage and super-resolution enhancement. Most existing block-based methods rely on the implicit assumption that blocks can be accurately predicted through appropriate shifts. In particular, shifted blocks in the target fr...
Conference Paper
Recent technological advances and the continuous increase of software and hardware integration in people’s daily lives have extended research interest in the field of human-computer interaction. User involvement in the design of software, of which games constitute a large part, has also been in researchers’ spotlight, even from early childhood. The...
Article
Full-text available
In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this contex...
Conference Paper
Semantic audio analysis has become a fundamental task in contemporary audio applications; consequently, further improvement and optimization of classification algorithms has also become a necessity. During the recent years, standard frame-based audio classification methods have been optimized and modern approaches introduced additional feature engi...
Article
The task of general audio detection and segmentation is quite common in contemporary audio applications where computationally intensive processes are frequently involved. Machine learning is usually employed along with user-enabled data labeling that is intended to detect, segment, and semantically annotate the relevant audio events. This work focu...
Conference Paper
Audiometric equipment has remained expensive and largely unaffected by recent technological progress. This paper presents the design and development of an iOS-based application, which can be used to conduct audiometric tests comparable to a diagnostic audiometer without additional external equipment, thereby updating the procedure and reducing the cost...
Poster
In order to estimate speech onset latencies in naming experiments, it is common to use technologies that rely on sound pressure changes, such as voice-key devices. These devices are used in online experiments and are prone to data loss. Moreover, several studies have revealed that voice-key devices suffer from low accuracy due to poor detection of...
Conference Paper
As technology infiltrates every aspect of students' daily lives, game design has an increasingly influential impact on shaping students' personalities and the development of learning competencies. Considering the continuously increasing research on the use of educational games in classrooms, the researcher identifies another great interest in th...
Conference Paper
In recent years, technology has infiltrated the daily lives of a great number of people. During this revolution, teachers and instructional designers have been introduced to a variety of new tools, such as computers, tablets, interactive tabletops and surfaces, augmented and virtual reality kits, etc. Consequently, schools, instituti...
Conference Paper
Full-text available
With this submission, a set of ensemble learning based methods for the MIREX 2015 Speech / Music Classification and Detection task is proposed and evaluated. The main algorithm for the Detection task employs a self-similarity matrix analysis technique to detect homogeneous segments of audio that can be subsequently classified as music or speech by...
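Self-similarity analysis of this kind is often illustrated with Foote's checkerboard-kernel novelty curve, whose peaks mark boundaries between homogeneous segments. The sketch assumes cosine similarity between frame feature vectors and an unweighted (non-Gaussian) kernel of hypothetical half-width `half`. It is a generic illustration, not the MIREX submission itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def self_similarity(features):
    """Full self-similarity matrix of a sequence of feature vectors."""
    return [[cosine(a, b) for b in features] for a in features]

def novelty(ssm, half):
    """Correlate a checkerboard kernel along the SSM diagonal; frames too close
    to either edge for a full kernel get a score of 0."""
    n = len(ssm)
    scores = [0.0] * n
    for c in range(half, n - half):
        s = 0.0
        for i in range(-half, half):
            for j in range(-half, half):
                # +1 for same-side (within-segment) quadrants, -1 for cross quadrants
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                s += sign * ssm[c + i][c + j]
        scores[c] = s
    return scores
```

Inside a homogeneous region the positive and negative quadrants cancel, so the score stays near zero; it peaks where the frames before and after the centre are mutually dissimilar.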
Conference Paper
Full-text available
The task of general audio detection and segmentation by means of machine learning is a very popular and demanding procedure nowadays. Most relevant works in the last decade aim at modelling audio in order to conduct semantic analysis and high-level categorization. A generic strategy that would detect audio events as means of transitio...
Conference Paper
Full-text available
Multimedia semantic analysis is a key element in managing the exponentially growing amount of produced multimedia content available on the web and social media. Towards this direction, a semantically enhanced Web-TV environment providing video-on-demand and simulcast streaming services is proposed. The system offers content management and ana...
Conference Paper
Full-text available
This paper investigates methods aiming at the automatic recognition and classification of discrete environmental sounds, for the purpose of subsequently applying these methods to the recognition of soundscapes. Research in audio recognition has traditionally focused on the domains of speech and music. Comparatively little research has been done tow...
Conference Paper
Full-text available
Music structure analysis has been one of the challenging problems in the field of music information retrieval during the last decade. Advances in the field in past years have contributed towards the establishment and standardization of a framework covering repetition-, homogeneity- and novelty-based approaches. With this paper an optimized fusion algori...
Conference Paper
Full-text available
Spatial thinking is an important mental ability that contributes to the development of mathematical thinking. The importance of developing competencies related to spatial thinking, such as orientation and navigation in space, from preschool age onwards is significant for the development of other mathematical skills, as well as geometric thinking. M...
Conference Paper
Full-text available
The purpose of this paper is the review and evaluation of state of the art techniques and tools for the semantic analysis of audio content using Machine Learning algorithms. Available tools and techniques for annotation, audio features extraction and application of machine learning algorithms are referenced and investigated. In addition to reportin...
Conference Paper
This paper presents the structure and first applications of SpeakGreek, an online biofeedback speech training tool that can be used in second/foreign language education and in clinical intervention for individuals with speech disorders. The tool provides training in the perception and production of key segmental and suprasegmental aspects of Greek....
Conference Paper
Full-text available
This paper presents the implementation of a mobile software environment that provides a suite of professional-grade audio and acoustic analysis tools for smartphones and tablets. The suite includes sound level monitoring, real-time time-frequency analysis, reverberation time, and impulse response measurements, whereas feature-based intelligent cont...
Conference Paper
Full-text available
This work involves the design, development and evaluation of a software sound level meter application for smartphones (iPhone). The paper investigates the potential of implementing a flexible and user-friendly environment for measuring sound levels, which can easily be used by non-specialists. The resulting software focuses on providing similar fun...
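A software sound level meter ultimately converts a block of calibrated pressure samples into decibels relative to 20 µPa. The sketch below computes an unweighted equivalent level, Leq = 10·log10(mean(p²) / p_ref²). `pa_per_unit` is a hypothetical calibration factor mapping sample values to pascals; a real meter would also apply frequency weighting (e.g. A-weighting) and time weighting, which are omitted here.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (0 dB SPL)

def spl_db(samples, pa_per_unit=1.0):
    """Unweighted equivalent sound pressure level of a block of samples, in dB re 20 µPa.

    pa_per_unit is a hypothetical calibration factor; on a phone it would come
    from calibrating the microphone against a reference sound level meter.
    """
    mean_square = sum((s * pa_per_unit) ** 2 for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / P_REF ** 2)
```

For example, a 1 kHz tone with a peak amplitude of 1 Pa has an RMS pressure of about 0.707 Pa, which corresponds to roughly 91 dB SPL.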

Projects (6)
Project
We are investigating the potential of Speech Emotion Recognition (SER) for multimedia interaction, mediated communication and artistic performance. Database formulation, real-time applications, personalized Speech Emotion multi-modal repositories, audio features and machine learning techniques are aspects that are taken into consideration.
Project
The current project investigates the design of a collaborative Mobile Cloud Computing model to support the workflow of collecting, editing and publishing news reporting material, aiming at better managing technology and human resources. While semantic services and tools have made tremendous progress at both the academic and the applied level, journalists don’t seem to make the most of modern technological possibilities. With the proposed framework, journalists, reporters, technical experts and editors can cooperate remotely and simultaneously on the cloud, collaboratively producing and publishing timely, authentic and high-quality content, with proper documentation. Context- and location-aware semantic metadata, provided by mobile devices, guide the field reporter, while also serving annotation and authentication purposes. State-of-the-art mobile publishing tools are used for capturing and processing multimodal assets, which are then uploaded to the cloud. Augmented interaction tools (speech-to-text, voice commands, etc.) can boost usability to overcome the functional constraints of mobile devices, thus facilitating reporting services and improving the overall media experience.
Project
Semantic video analysis has become a fundamental task in contemporary video applications; consequently, further improvement and optimization of classification algorithms has also become a necessity. Under this scope, new methods for video event detection and classification are investigated and evaluated against the existing state of the art in video classification tasks.