Lazaros Vrysis
  • Doctor of Engineering
  • Lecturer at Aristotle University of Thessaloniki

AI & IT Strategist

About

87
Publications
15,609
Reads
793
Citations
Current institution
Aristotle University of Thessaloniki
Current position
  • Lecturer
Education
February 2012 - July 2019
Aristotle University of Thessaloniki
Field of study
  • Electrical & Computer Engineer

Publications (87)
Article
Semantic audio analysis has become a fundamental task in modern audio applications, making the improvement and optimization of classification algorithms a necessity. Standard frame-based audio classification methods have been optimized and modern approaches introduce engineering methodologies that capture the temporal dependency between successive...
Article
Full-text available
Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and the development of emerging systems and applications, such as mobile phone telecommunication and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have se...
Article
Full-text available
Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. In this direction, a web interface for the creat...
Article
Modern feature-based methodologies in semantic audio applications attempt to capture the temporal dependency of successive feature observations, which form the so-called texture windows. This paper proposes an enhancement of this type of processing, known as temporal feature integration, by employing and testing alternative deployable strategies. S...
Preprint
Full-text available
Deep learning has been applied to diverse audio semantics tasks, enabling the construction of models that learn hierarchical levels of features from high-dimensional raw data, delivering state-of-the-art performance. But do these algorithms perform similarly in real-world conditions, or just at the benchmark, where their high learning capability as...
Article
Full-text available
This research investigates the utilization of entertainment approaches, such as serious games and gamification technologies, to address various challenges and implement targeted tasks. Specifically, it details the design and development of an innovative gamified application named “J-Plus”, aimed at both professionals and non-professionals in journa...
Conference Paper
This work presents a framework for intelligent processing and management automation of sports content utilizing algorithmic techniques and Artificial Intelligence (AI) methods, such as Machine/Deep Learning (ML/DL). In the modern digital media landscape, sports data is among the most popular news/informing categories, favoring mediated communicatio...
Article
Full-text available
The majority of sound events that occur in everyday life, like those caused by animals or household devices, can be included in the environmental sound family. This audio category has not been researched as much as music or speech recognition. One main bottleneck in the design of environmental data-driven monitoring automation is the lack of suffic...
Conference Paper
Location scouting is an integral part of the pre-production filming process. The SCENE program aims to implement an integrated system that meets the needs of this investigation. In the current paper, the room acoustic simulation tool is presented. It is an environment where users can provide space records related to heritage sites, following a set...
Conference Paper
This paper introduces AudioScout, a sound management application designed to upgrade the work of sound designers. The application facilitates efficient recording, organization, editing, and playback of audio data. Sound designers face the challenge of storing and managing numerous recordings to create the appropriate audio backdrop for each product...
Conference Paper
This work is carried out within the research project SHAZAAM (Science Hoaxes to Avoid Alienation in Generation Z - A Media literacy Approach). SHAZAAM aims to support the youth of Generation Z to combat the spread of pseudoscientific content. From the scientific literature on the phenomenon of misinformation in texts, it appears that to a large ext...
Conference Paper
This study investigates the potential of acoustic virtual navigation in a room/place through an acoustic simulation of the space’s response for different source and receiver positions. The usefulness of this functionality is to provide indicative information regarding, on the one hand, the suitability of the place for conducting recordings, and on...
Conference Paper
Artificial Intelligence (AI) is revolutionizing video editing and music composition, offering tools that enhance creative processes and operational efficiency. This literature review aims to provide a detailed examination of the specific AI tools currently shaping these fields, exploring their technological advancements and the opportunities they c...
Conference Paper
In the digital age, the authenticity of media content has become a critical concern due to the ease and frequency of manipulations in text, images, audio and videos. This study explores the application of advanced artificial intelligence (AI) techniques to detect such tampered content, addressing a significant challenge in journalism and media inte...
Article
Full-text available
Speaker diarization consists of answering the question of “who spoke when” in audio recordings. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by the exploitation of spatial features. This work proposes a framework designed to assess the effectiveness of combining speaker embedding...
Conference Paper
The SHAZAAM project aims to combat misinformation targeted at Generation Z through the integration of state-of-the-art technological tools. Evidence from the literature indicates that misinformation videos share common emotional patterns. Automated models using machine learning have shown promise in identifying misinformation in videos, particularl...
Conference Paper
The SCENE project aims to create a platform based on a data lake architecture to facilitate collaboration and knowledge sharing among film production stakeholders. Within the overall functionality, the platform will provide access to 3D models of European cultural heritage locations, with lighting and audio simulation modules. The audio simulation...
Conference Paper
Speech includes paralinguistic elements that reflect language, personality, intention, and emotion. Advances in machine learning and deep learning have improved Speech Emotion Recognition (SER) systems, which detect emotions in speech. However, SER systems still face challenges, requiring large amounts of data for their training and performance. Th...
Article
Full-text available
This study presents a novel audio compression technique, tailored for environmental monitoring within multi-modal data processing pipelines. Considering the crucial role that audio data play in environmental evaluations, particularly in contexts with extreme resource limitations, our strategy substantially decreases bit rates to facilitate efficien...
Article
Full-text available
The usage of smartphones is increasingly widespread, and the usefulness of mobile applications as low-vision aids is evident but not thoroughly examined. In this study, we surveyed people with low vision to assess the usability of common, preloaded mobile applications, to evaluate the usage of typical assistive technologies of smartphones, and to m...
Conference Paper
Full-text available
The current paper presents a serious game approach to engage citizens in monitoring pollution, mainly in urban areas, and raise awareness regarding its effects on modern human lifestyle. A prototype mobile application is proposed, urging users to capture images showing polluting sources (e.g., traffic jams, trucks) and their audio footprint (if app...
Conference Paper
Speech Emotion Recognition (SER) systems have become indispensable tools in human-computer interaction, communication, and knowledge representation. Most SER systems are based on machine learning techniques and require large amounts of ground-truth data to be trained and achieve decent performance. This work presents a serious game and crowdsourcin...
Conference Paper
Full-text available
This paper investigates the effect of prior knowledge of a room's geometry on the accuracy of predicting its acoustic properties. The motivation comes from the need for the creation of a framework that reduces the complexity of modeling real spaces. The project aims to model cultural heritage spaces for film-making applications, like location scout...
Conference Paper
Full-text available
In urban areas, the levels of air pollution play a significant role in the quality of life. These levels are rapidly increasing due to exponential urbanization. Although the field of real-time monitoring and forecasting of air pollution levels by fusing multiple modalities has been studied extensively, there is limited work on considering environme...
Article
Full-text available
Social media platforms have led to the creation of a vast amount of information produced by users and published publicly, facilitating participation in the public sphere, but also giving the opportunity for certain users to publish hateful content. This content mainly involves offensive/discriminative speech towards social groups or individuals (ba...
Conference Paper
One of the main components of cinematic sound is the ambiences. Ambiences are the sounds that are characterized by continuity and allow the coexistence of all the other sounds that make up the whole cinematic sound (dialogues, effects, music). They are essential components of sound composition. In this work, the uses of atmospheres are distinguishe...
Conference Paper
Full-text available
Urban areas are those in which high levels of environmental noise and low levels of air quality are observed. Properly informed citizens become much more aware and active in participating in actions towards limiting such problems. The present research is based on the assumption that there is a correlation between air and noise pollution, up to an e...
Conference Paper
This paper presents a prototype software application, which is designed to automatically find audio recordings that match a given image. The application facilitates the ordinary workflow of sound design. In this direction, the work includes an initial analysis of the users’ requirements, along with the design of a prototype application, which extra...
Conference Paper
Drum pattern generation is a part of the wider research field of algorithmic composition which focuses on the music element of the rhythm. Related research offers various methodologies, however there is a lack of approaches relying on audio data to generate rhythmic drum sequences. The current paper addresses this issue by introducing a system whic...
Conference Paper
This work investigates the possibilities of using serious games and gamification technologies as means of entertainment and informal education. The focus is on Speech Emotion Recognition (SER) systems, which can be a useful tool both in industry and in academia. Most commonly, SER systems rely on machine learning algorithms that require hug...
Article
Full-text available
Hate speech spreading online is a matter of growing concern since social media allows for its rapid, uncontrolled, and massive dissemination. For this reason, several researchers are already working on the development of prototypes that allow for the detection of cyberhate automatically and on a large scale. However, most of them are developed to d...
Article
Full-text available
This manuscript discusses the robustness to noise of deep learning models for two audio classification tasks. The first task is a speaker recognition application, trying to identify five different speakers. The second one is a speech command identification where the goal is to classify ten voice commands. These two tasks are very important to make...
Conference Paper
Reverberation is ubiquitous in everyday listening environments, from meeting rooms to concert halls and recording studios. While reverberation is usually described by the reverberation time, getting further insight concerning the characteristics of a room requires conducting acoustic measurements and calculating each reverberation parameter manuall...
Conference Paper
Full-text available
In the present work, a crowdsourcing approach is designed, to investigate the correlation between air and noise pollution in urban areas. Citizens are requested to provide air quality measurements and audio recordings using a prototype mobile application specially designed to motivate them to undertake the task of audiovisual capturing. Different u...
Article
Full-text available
Media authentication relies on the detection of inconsistencies that may indicate malicious editing in audio and video files. Traditionally, authentication processes are performed by forensics professionals using dedicated tools. There is rich research on the automation of this procedure, but the results do not yet guarantee the feasibility of prov...
Article
The current paper introduces a multimodal framework to provide Web-TV automations for live broadcasting and overall big streaming data management. The term indexing refers to the spatiotemporal localization of speakers participating in a discussion panel. Multiple modalities acting in parallel form the data-driven decision-making pipeline. The auto...
Article
Full-text available
Speech Emotion Recognition (SER) is an important part of Affective Computing and emotionally aware Human-Computer Interaction. Emotional expression may vary depending on the language, culture, and the speaker’s personality and vocal attributes. Speaker-adaptive systems can address this issue. In real-world applications, it is not feasible to obtain...
Conference Paper
In this research a novel deep learning architecture is proposed for the problem of speech commands recognition. The problem is examined in the context of internet-of-things where most devices have limited resources in terms of computation and memory. The uniqueness of the architecture is that it uses a new feature pooling mechanism, named entropy p...
Article
Full-text available
The current paper focuses on the development of an enhanced Mobile Journalism (MoJo) model for soundscape heritage crowdsourcing, data-driven storytelling, and management in the era of big data and the semantic web. Soundscapes and environmental sound semantics have a great impact on cultural heritage, also affecting the quality of human life, from...
Article
In this paper, an audio-driven, multimodal approach for speaker diarization in multimedia content is introduced and evaluated. The proposed algorithm is based on semi-supervised clustering of audio-visual embeddings, generated using deep learning techniques. The two modes, audio and video, are separately addressed; a long short-term memory Siamese...
Conference Paper
Digital storytelling is a topic of great academic interest, due to its countless uses in a variety of areas, educational or not. Its impact on learning contexts has become the center of research for a number of studies, trying to examine the field from a diverse set of perspectives. The latest technological achievements, including the adoption of m...
Conference Paper
In recent years, convolutional neural networks have been the standard in audio semantics, surpassing traditional classification approaches which employed hand-crafted feature engineering as front-end and various classifiers as back-end. Early studies were based on prominent 2D convolutional topologies for image recog...
Conference Paper
Prototypical part network (ProtoPNet) is a novel method proposed for the task of image classification, offering the ability to interpret the network’s reasoning process during classification. The subject of this work is the examination of ProtoPNet as an unsupervised anomaly detection method, through its application at the Detection and Classificat...
Article
A model for Speech Emotion Recognition, based on a Convolutional Neural Network (CNN) architecture, is proposed and evaluated. Recognition is performed on successive time frames of continuous speech. The dataset used for training and testing the model is the Acted Emotional Speech Dynamic Database (AESDD), while data augmentation techniques are ap...
Conference Paper
A collaborative mobile platform called MoJo-MATE (Mobile Journalism Machine AssisTEd reporting) has been proposed for application in newsrooms and their working cycles. The challenge with such a client-server service is that it should cope with the traditional journalistic practices in the mainstream media organizations. Because of the large expans...
Article
Full-text available
MoJo refers to the emerging framework of covering news reporting workflows using smart mobile devices with dedicated software or hardware. However, a fully integrated and dedicated computing environment that can address the individual needs of both professional and citizen journalists is still missing. We introduce MoJo-MATE (Mobile Journalism Mac...
Article
Full-text available
Art and technology have always been very tightly intertwined, presenting strong influences on each other. On the other hand, technological evolution led to today’s digital media landscape, elaborating mediated communication tools, thus providing new creative means of expression (i.e., new-media art). Rich-media interaction can expedite the whole pr...
Article
Full-text available
Over the past decade, mobile news production has become increasingly prevalent and has been established as a new type by the modern journalism industry. Journalists understand content capturing and sharing as parts of their role in newsrooms. Mobile journalism (mojo) is an evolving form of reporting in which people use only a smartphone to create and...
Poster
This work focuses on the interpretation of the spectrotemporal parameters of the auditory brainstem response (ABR) through a machine learning approach to predict the relationship between the ABR waveform shape and perceived loudness in different degrees of hearing loss. A total of 397 tone-burst (1 & 4 kHz) auditory brainstem responses from 8 heari...
Article
Full-text available
During the last years, there has been a growing multidisciplinary interest in alternative educational approaches, such as serious games, aiming at enhancing thinking skills and media literacy. Likewise, the objective of this study is to present the design and the development of an educational web application for learning the necessary steps towards...
Article
Full-text available
Temporal feature integration refers to a set of strategies attempting to capture the information conveyed in the temporal evolution of the signal. It has been extensively applied in the context of semantic audio showing performance improvements against the standard frame-based audio classification methods. This paper investigates the potential of a...
Conference Paper
MoJo refers to the emerging framework of covering news reporting workflows using smart mobile devices with dedicated software or hardware. However, a fully integrated and dedicated computing environment that can address the individual needs of both professional and citizen journalists is still missing. We introduce MoJo-MATE (Mobile Journalism Mac...
Article
Objective: To develop and evaluate a software application capable of conducting Pure-Tone Audiometry tests in clinical practice. Design: We designed and developed a mobile software application for iPad devices that performs Pure-Tone Audiometry according to ANSI and IEC standards. The application is proposed to be operated by a trained audiologist...
Conference Paper
Full-text available
Hearing-impaired (HI) listeners often struggle to follow conversations when exposed to a complex acoustic environment. This is partly due to the reduced ability to recover the target speech Temporal Envelope (ENV) cues from Temporal Fine Structure (TFS). This study investigates the enhancement of speech intelligibility in HI listeners, by proces...
Conference Paper
The evaluation of mobile applications for sound level measurement shows that the development of a sophisticated audio analysis framework for voice-recording purposes may be useful for journalists. In many audio recording scenarios, the repetition of the procedure is not an option, and under unwanted conditions the quality of the capturing is possibly de...
Conference Paper
Full-text available
Some of the main issues that emerge during crisis news reporting concern the information quality and credibility, as well as the efficiency of delivering news to the public. In many paradigms, citizens using their smartphones become reporters and contribute potentially valuable audiovisual information via social media platforms from the ground duri...
Conference Paper
Early components of the auditory evoked potentials (AEP) reflect the neural processing of acoustic stimuli in the brainstem and the sub-cortical regions. Relating AEP patterns to their stimulus characteristics is a notoriously difficult task, due to the variability of their morphology. In this study, tone-burst evoked auditory brainstem and middle-...
Conference Paper
Evaluation of mobile applications serving sound measurement procedures demonstrates that the establishment of a "smart" framework for sound signal management of recordings for journalism/reporting needs is valid. In most scenarios concerning sound capturing for journalistic purposes, repetition of recording is not possible, and, thus, error detectio...
Conference Paper
Full-text available
The current work focuses on the implementation of an online wavelet-domain Wiener filter denoiser (in the cloud) that is proposed for speech enhancement purposes. The ultimate goal of the current project is to provide a quick and easy route to real-world speech denoising, in the most direct way feasible. Optimum configuration and adapta...
Conference Paper
Continuous technological advances and growing adoption of technology by society highlight the importance of integrating educational technology in learning activities. This integration can be introduced to existing teaching tools but can also be used to design new tools that will focus on new emerging learning needs and new media literacies. Game base...
Conference Paper
Full-text available
Speech emotion is an important paralinguistic element of speech communication, which undoubtedly involves high level of subjectivity, without concrete modeling of the implicated emotional states. Specifically, sentimental expression varies in great proportions among different spoken languages and persons. The current work is focused on the investig...
Article
The current paper investigates the design of a collaborative Mobile Cloud Computing model to support the workflow of collecting, editing and publishing news reporting material, aiming at better managing technology and human resources. While semantic services and tools have made tremendous progress at both the academic and applied levels, journalists don...
Conference Paper
Full-text available
The industrialization and mobilization of human endeavor have led to increased noise production. Low-frequency noise is a major component of occupational noise, which is emitted from a variety of sources. Reduced perception abilities and risks to workers’ health and safety are some of the effects caused by the exposure to low-frequency noise. The m...
Conference Paper
The current paper investigates the design of a collaborative Mobile Cloud Computing model to support the workflow of collecting, editing and publishing news reporting material, aiming at better managing technology and human resources. While semantic services and tools have made tremendous progress at both the academic and applied levels, journalists don...
Article
Full-text available
The present paper focuses on high-accuracy block-based sub-pixel motion estimation utilizing a straightforward error minimization approach. In particular, the mathematics of bilinear interpolation are utilized for the selection of the candidate motion vectors that minimize the error criterion, by estimating local minima in the error surface with ar...
Article
Sub-pixel motion estimation plays a vital role in a multitude of video applications, including encoding, audiovisual archiving/heritage and super-resolution enhancement. Most existing block-based methods rely on the implicit assumption that blocks can be accurately predicted through appropriate shifts. In particular, shifted blocks in the target fr...
Conference Paper
Recent technological advances and the continuous increase of software and hardware integration in people’s daily lives have extended research interest in the field of human-computer interaction. User involvement in the design of software, of which games constitute a large part, has also been in researchers’ spotlight, even from early childhood. The...
Article
Full-text available
In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this contex...
Conference Paper
Semantic audio analysis has become a fundamental task in contemporary audio applications; consequently, further improvement and optimization of classification algorithms have also become a necessity. In recent years, standard frame-based audio classification methods have been optimized and modern approaches introduced additional feature engi...
Article
The task of general audio detection and segmentation is quite common in contemporary audio applications where computationally intensive processes are frequently involved. Machine learning is usually employed along with user-enabled data labeling that is intended to detect, segment, and semantically annotate the relevant audio events. This work focu...
Conference Paper
In recent years, audiometric equipment remains expensive and unaffected by technological progress. This paper presents the design and development of an iOS-based application, which can be used to conduct audiometric tests comparable to a diagnostic audiometer, without additional external equipment, ergo updating the procedure and reducing the cost...
Poster
In order to estimate speech onset latencies in naming experiments, it is common to use technologies that rely on sound pressure changes, such as voice-key devices. These devices are used in online experiments and are prone to data loss. Moreover, several studies have revealed that voice-key devices suffer from low accuracy due to poor detection of...
Conference Paper
As technology infiltrates every aspect of students' daily lives, game design acquires a bigger and more influential impact on shaping students' personalities and development of learning competencies. Considering the continuously increasing research on the use of educational games in classrooms, the researcher identifies another great interest in th...
Conference Paper
During the previous years, technology has infiltrated the daily lives of a great number of people. During this revolution, teachers and instructional designers have been introduced to a variety of new tools, such as computers, tablets, interactive tabletops and surfaces, augmented and virtual reality kits, etc. Consequently, schools, instituti...
Conference Paper
Full-text available
With this submission, a set of ensemble learning based methods for the MIREX 2015 Speech / Music Classification and Detection task is proposed and evaluated. The main algorithm for the Detection task employs a self-similarity matrix analysis technique to detect homogeneous segments of audio that can be subsequently classified as music or speech by...
Conference Paper
Full-text available
The task of general audio detection and segmentation by means of machine learning is a very popular and highly demanding procedure nowadays. Most relevant works in the last decade aim at modelling audio in order to conduct a semantic analysis and a high-level categorization. A generic strategy that would detect audio events as means of transitio...
Conference Paper
Full-text available
Multimedia semantic analysis is a key element in managing the exponentially growing amount of produced multimedia content, available on the web and the social media. Towards this direction, a semantically enhanced Web-TV environment providing video-on-demand and simulcast streaming services, is proposed. The system offers content management and ana...
Conference Paper
Full-text available
This paper investigates methods aiming at the automatic recognition and classification of discrete environmental sounds, for the purpose of subsequently applying these methods to the recognition of soundscapes. Research in audio recognition has traditionally focused on the domains of speech and music. Comparatively little research has been done tow...
Conference Paper
Full-text available
Music structure analysis has been one of the challenging problems in the field of music information retrieval during the last decade. Advances in the field in past years have contributed towards the establishment and standardization of a framework covering repetition, homogeneity and novelty based approaches. With this paper, an optimized fusion algori...
Conference Paper
Full-text available
Spatial thinking is an important mental ability that contributes to the development of mathematical thinking. The importance of developing competencies related to spatial thinking, such as orientation and navigation in space, from preschool age onwards is significant for the development of other mathematical skills, as well as geometric thinking. M...
Conference Paper
Full-text available
The purpose of this paper is the review and evaluation of state-of-the-art techniques and tools for the semantic analysis of audio content using Machine Learning algorithms. Available tools and techniques for annotation, audio feature extraction and application of machine learning algorithms are referenced and investigated. In addition to reportin...
Conference Paper
This paper presents the structure and first applications of SpeakGreek, an online biofeedback speech training tool that can be used in second/foreign language education and in clinical intervention for individuals with speech disorders. The tool provides training in the perception and production of key segmental and suprasegmental aspects of Greek....
Conference Paper
Full-text available
This paper presents the implementation of a mobile software environment that provides a suite of professional-grade audio and acoustic analysis tools for smartphones and tablets. The suite includes sound level monitoring, real-time time-frequency analysis, reverberation time, and impulse response measurements, whereas feature-based intelligent cont...
Conference Paper
Full-text available
This work involves the design, development and evaluation of a software sound level meter application for smartphones (iPhone). The paper investigates the potential of implementing a flexible and user-friendly environment for measuring sound levels, which can easily be used by non-specialists. The resulting software focuses on providing similar fun...
