Aki Härmä

Maastricht University | UM · DACS

PhD

About

148
Publications
42,498
Reads
2,441
Citations
Additional affiliations
May 2023 - present
Maastricht University
Position
  • Assistant Professor


Publications (148)
Preprint
Full-text available
Large pretrained self-attention neural networks, or transformers, have been very successful in various tasks recently. The performance of a model on a given task depends on its ability to memorize and generalize the training data. Large transformer models, which may have billions of parameters, in theory have a huge capacity to memorize content. Ho...
Preprint
Recent years have witnessed an increase in technologies that use speech for the sensing of the health of the talker. This survey paper proposes a general taxonomy of the technologies and a broad overview of current progress and challenges. Vocal biomarkers are often secondary measures that are approximating a signal of another sensor or identifying...
Article
Full-text available
The use of machine learning in Healthcare has the potential to improve patient outcomes as well as broaden the reach and affordability of Healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natura...
Preprint
Full-text available
Purpose: This study aims to investigate the impact of personalized health insights generated from wearable device data on users' health behaviors. The primary objective is to assess whether user feedback-driven algorithms enhance the relevance and effectiveness of health insights, ultimately influencing positive changes in users' daily activities....
Preprint
Full-text available
We present a novel method for mining opinions from text collections using generative language models trained on data collected from different populations. We describe the basic definitions, methodology and a generic algorithm for opinion insight mining. We demonstrate the performance of our method in an experiment where a pre-trained generative mod...
Preprint
Full-text available
In natural language generation (NLG), insight mining is seen as a data-to-text task, where data is mined for interesting patterns and verbalised into 'insight' statements. An 'over-generate and rank' paradigm is intuitively used to generate such insights. The multidimensionality and subjectivity of this process make it challenging. This paper intro...
Preprint
Full-text available
A respiratory chest belt sensor can be used to measure the respiratory rate and other respiratory health parameters. Virtual Respiratory Belt (VRB) algorithms estimate the belt sensor waveform from speech audio. In this paper we compare the detection of inspiration events (IE) from respiratory belt sensor data using a novel neural VRB algorithm and t...
Preprint
Full-text available
Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the...
Article
Full-text available
Background: Developing predictive models for precision psychiatry is challenging because of the unavailability of the necessary data: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this...
Article
Full-text available
In this paper we give an overview of the field of patient simulators and provide qualitative and quantitative comparison of different modeling and simulation approaches. Simulators can be used to train human caregivers but also to develop and optimize algorithms for clinical decision support applications and test and validate interventions. In this...
Article
Intelligent systems are increasingly part of our everyday lives and have been integrated seamlessly to the point where it is difficult to imagine a world without them. Physical manifestations of those systems on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to fu...
Preprint
Full-text available
Intelligent systems are increasingly part of our everyday lives and have been integrated seamlessly to the point where it is difficult to imagine a world without them. Physical manifestations of those systems on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to fu...
Article
Respiration is an essential and primary mechanism for speech production. We first inhale and then produce speech while exhaling. When we run out of breath, we stop speaking and inhale. Though this process is involuntary, speech production involves a systematic outflow of air during exhalation characterized by linguistic content and prosodic factors...
Article
Full-text available
Insights derived from wearable sensors in smartwatches or sleep trackers can help users in approaching their healthy lifestyle goals. These insights should indicate significant inferences from user behaviour and their generation should adapt automatically to the preferences and goals of the user. In this paper, we propose a neural network model tha...
Preprint
Full-text available
Automatic programming, the task of generating computer programs compliant with a specification without a human developer, is usually tackled either via genetic programming methods based on mutation and recombination of programs, or via neural language models. We propose a novel method that combines both approaches using a concept of a virtual neuro...
Preprint
Full-text available
Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, where it is often difficult to incorporate expert knowledge into the models or let experts review and validate the learned decision mechanisms. Knowledge-insertion and model review are important requirements in many applications involvi...
Conference Paper
Full-text available
In the light of the current COVID-19 pandemic, the need for remote digital health assessment tools is greater than ever. This statement is especially pertinent for elderly and vulnerable populations. In this regard, the INTERSPEECH 2020 Alzheimer’s Dementia Recognition through Spontaneous Speech (ADReSS) Challenge offers competitors the opportunity...
Patent
Application, link: https://worldwide.espacenet.com/patent/search/family/063517652/publication/EP3614313A1?q=pn%3DEP3614313A1
Poster
Background: A Personal Emergency Response Service (PERS) enables an aging population to receive help quickly when an emergency situation occurs. The reasons that trigger a PERS alert are varied, including a sudden worsening of a chronic condition, a fall, or other injury. Every PERS case is documented by the response center using a combination of st...
Article
Full-text available
Ambient intelligence (AmI) is intrinsically and thoroughly connected with artificial intelligence (AI). Some even say that it is, in essence, AI in the environment. AI, on the other hand, owes its success to the phenomenal development of the information and communication technologies (ICTs), based on principles such as Moore's law. In this paper we...
Chapter
A full comprehension of how topics change within a psychotherapeutic conversation is key for assessment and for the therapeutic strategies to be adopted by the counselor for the patients. That might enable artificial intelligence (AI) approaches to recommend the most suitable strategy for a new patient. Basically, understanding the topics dynamics of previous ca...
Chapter
In automated health services based on text and voice interfaces, there is a need to be able to understand what the user is talking about, and what is the attitude of the user towards a subject. Typical machine learning methods for text analysis require a lot of annotated data for the training. This is often a problem in addressing specific and poss...
Conference Paper
Full-text available
In automated health services based on text and voice interfaces, there is a need to be able to understand what the user is talking about, and what is the attitude of the user towards a subject. Typical machine learning methods for text analysis require a lot of annotated data for the training. This is often a problem in addressing specific and possib...
Conference Paper
Full-text available
One of the key aspects in a psychotherapeutic conversation is the understanding of topics dynamics driving the dialogue. This may provide insights on the therapeutic strategy adopted by the counselor for the specific patient, providing the opportunity of building up artificial intelligence (AI) based methods for recommending the most appropriate th...
Conference Paper
Full-text available
Automatic lifestyle profiling to categorize users according to their daily routine-based lifestyles is an unexplored area. Despite the current trends on having wearable devices that generate large amounts of heterogeneous data, figuring out the lifestyle patterns of people is not a trivial task. We present Lifestyles-KG, a knowledge graph (fuzzy on...
Conference Paper
Full-text available
In the health self-management services it is beneficial to identify and address the already existing healthy activity patterns of the user. Some of these healthy activity patterns might be of a utilitarian nature, e.g. commuting to work by bike or on foot, or might be for leisure, like taking a walk in a park. In this paper we discuss one possibili...
Preprint
Full-text available
In the health self-management services it is beneficial to identify and address the already existing healthy activity patterns of the user. Some of these healthy activity patterns might be of a utilitarian nature, e.g. commuting to work by bike or on foot, or might be for leisure, like taking a walk in a park. In the paper we discuss one possibilit...
Preprint
Full-text available
In the health self-management services it is beneficial to identify and address the already existing healthy activity patterns of the user. Some of these healthy activity patterns might be of a utilitarian nature, e.g. commuting to work by bike or on foot, or might be for leisure, like taking a walk in a park. In the paper we discuss one possibilit...
Conference Paper
In connected health services automatic discovery of recurring patterns and correlations, or insights, provides many interesting opportunities for the personalization of the services. In this paper the focus is on insight mining for a health coaching service. The basic idea in the proposed method is to generate a large number of insight candidates w...
Conference Paper
Full-text available
Many health programs aim to encourage and support healthy behaviors, such as increasing one's physical activity and improving one's dietary patterns. Tailoring such programs to the constantly changing user state is a challenge most programs struggle with. Conventionally, this tailoring has been the job of healthcare professionals and coaches who u...
Article
Full-text available
The objective of this study is to evaluate subjective quality of spectral envelope quantization in audio coding. In this paper, the spectral envelope is modeled using Linear Prediction (LP) and Warped Linear Prediction (WLP). The advantage of WLP compared to LP is that the frequency resolution of analysis can be modified so that it follows perceptu...
Patent
Full-text available
The method of the invention enables selection of an item (21) from a plurality of items (21, 23, 25). The method comprises the steps of visually representing a selected item (21) and reproducing at least part of an audio segment representing the selected item (21). The method further comprises applying a visual spatial effect to the visual represen...
Conference Paper
Full-text available
Involuntary movements of arms and legs reflect neural and metabolic processes in the human body. In this paper the focus is on the properties of physiological tremor, shivering, and tremors caused by physical fatigue measured in fingers of a subject. Three different signal modeling paradigms are compared in the paper using accelerometer data. It is...
Patent
Full-text available
A surround sound system comprises a receiver (301) for receiving a multichannel spatial signal that comprises at least one surround channel. A directional ultrasound transducer (305) is used for emitting ultrasound towards a surface to reach a listening position (111) via a reflection of the surface. The ultrasound signal may specifically reach the...
Conference Paper
Full-text available
Falls in nursing homes and hospitals often take place immediately after a bed exit of a patient. An alarm signaling the exit from the bed may already be too late for staff to react. In this paper we explore the possibilities of detecting the sequences of preparatory movements before the bed exit and in this way create an early warning of the prepar...
Patent
Full-text available
A method of controlling a system which includes the steps of obtaining at least one signal representative of information communicated by a user via an input device in an environment of the user, wherein a signal from a first source is available in a perceptible form in the environment; estimating at least a point in time when a transition between i...
Conference Paper
The use of visual user interfaces in smartphones and other personal media devices (PMD) leads to decreased situational awareness, for example, in city traffic. It is proposed in the paper that many menu navigation functions in PMDs can be replaced by an eyes-free auditory interface and an input device based on acoustic recognition of tactile gestur...
Conference Paper
Full-text available
Since the evaluation of audio systems or processing schemes is time-consuming and resource-expensive, alternative objective evaluation methods have attracted considerable research interest. However, current perceptual models are not yet capable of replacing a human listener especially when the test stimulus is complex, for example, a sound scene consis...
Conference Paper
This paper proposes a collaborative vision network that leverages a personal webcam and cameras of the workplace to provide feedback relating to an office-worker's adherence to ergonomic guidelines. This can lead to increased well-being for the individual and better productivity in their work. The proposed system is evaluated with a recorded multi-...
Article
In smart environments, the embedded sensing systems should intelligently adapt to the behavior of the users. Many interesting types of behavior are characterized by repetition of actions such as certain activities or movements. A generic methodology to detect and classify repetitions that may occur at different scales is introduced in this paper. T...
Conference Paper
The application of Compressive Sensing is explored in three signal categories: footstep sounds, hand tremors and speech. An investigation of the reconstruction performance of various dictionaries is undertaken. It is demonstrated that these signal categories are reconstructed with higher SNR performance using K-SVD dictionaries than other fixed dic...
Patent
Full-text available
An audio reproduction system includes an arrangement of audio speakers of a first kind having a first degree of directivity in combination with at least one audio speaker of a second kind having a second degree of directivity. In order to create a virtual sound source at a desired distance to a listener's position, the second degree of directivity...
Article
Full-text available
In this study we have developed a digital guitar body mode modulation technique where the modulation can be controlled through one driving parameter. The filtering and modulation are done with frequency-warped recursive filters that have been implemented in real-time on a modern DSP processor. By changing the warping parameter the perceived size o...
Article
Full-text available
This paper highlights research into using virtual worlds, intelligent environments and mixed reality to create artificial control systems for simulated humans. Following a brief explanation of this project, the beneficial contribution provided by virtual ...
Article
Full-text available
Low-delay audio coding is a somewhat new trend in perceptual wideband audio coding. Low coding delay is important, e.g., in applications based on bidirectional real-time audio transmission. The technical aspects and psychoacoustics of such applications are reviewed and an audio codec with a coding delay of 2 ms is i...
Conference Paper
Full-text available
In binaural sound reproduction applications using head-related transfer functions (HRTFs) it is beneficial that the properties of the HRTFs correspond to the personal characteristics of the real HRTFs of the user. In this paper we propose a method to choose HRTFs using a relative localization test. This allows to make the selection of the bes...
Article
It is often desired to detect some particular short sound events from an audio recording. For example, in music analysis and processing, one may be interested in detection of percussive events. In environmental audio analysis one may look for individual sound events related to some activity, for example, sounds of footsteps from a walking person. G...
Article
Full-text available
A classification of time-frequency (TF) regions in stereo audio data by the type of mixture the region represents is presented. The detection of the type of mixing is necessary, for example, in stereo-to-multichannel upmixing, audio enhancement, and audio manipulation applications. A generic signal model for a stereo signal is proposed and it is us...
Article
Full-text available
A method to detect the distance of a speaker from a single microphone in a room environment is proposed. Several features, related to statistical parameters of speech source excitation signals, are introduced and are shown to depend on the distance between source and receiver. Those features are used to train a pattern recognizer for distance detec...
Conference Paper
Full-text available
Stereo audio enhancement and upmixing techniques require spatial analysis of the mixture in order to work optimally for different types of contents. In this paper a method is proposed which classifies the time-frequency regions in stereo audio data into six different classes. The individual classes represent special cases of a generic stereo signal...
Article
Full-text available
A method to estimate the distance of a speaker from a single microphone in a room environment is studied. Several features, related to statistical parameters of speech source excitation signals, are introduced and are shown to depend on the distance between source and receiver. Those features are used to train a pattern recognizer for distance estimati...
Conference Paper
Full-text available
Many applications demand the automatic induction of the tempo of a musical excerpt. The tempo estimation systems follow a general scheme that consists of two main steps: the creation of a feature list and the detection of periodicities on this list. In this study, we propose a new method for the implementation of the first step, along with the addi...
Conference Paper
Stereo audio enhancement and upmixing techniques require spatial analysis of the mixture in order to work optimally for different types of contents. In this paper a method is proposed which classifies each time-frequency region in stereo audio data into six different classes. The individual classes represent special cases of a generic stereo signa...
Article
Full-text available
Stereo audio signal is often modeled as a mixture of instantaneously mixed primary components and uncorrelated ambience components. This paper focuses on the estimation of the primary-to-ambience energy ratio, PAR. This measure is useful for signal decomposition in stereo and multichannel audio coding, format conversion, and spatial audio enhan...
Conference Paper
Detection and extraction of the center vocal source is important for many audio format conversion and manipulation applications. First, we study some generic properties of stereo signals containing sources panned exactly to the center of the stereo image and propose an algorithm for the separation of a stereo audio signal into a center and side cha...
Conference Paper
Full-text available
The focus of the paper is on studying five different methods to combine multi-view data from an uncalibrated smart camera network for human activity recognition. The multi-view classification scenarios studied can be divided into two categories: view selection and view fusion methods. Selection uses a single view to classify, whereas fusion merges mu...
Chapter
Full-text available
In the current technological landscape colored by environmental and security concerns the logic of replacing traveling by technical means of communications is undisputable. For example, consider a comparison between a normal family car and a video conference system with two laptop computers connected over the Internet. The power consumption of the...
Conference Paper
A classification of time-frequency (TF) regions in stereo audio data by the type of mixture the region represents is presented. The detection of the type of mixing is necessary, for example, in stereo-to-multichannel upmixing, audio enhancement, and audio manipulation applications. A generic signal model for a stereo signal is proposed and it is us...
Conference Paper
In this paper, we present the result of listening tests where the width of the sound stage was compared between conventional 2-channel stereophony and 3-channel reproduction with an additional centre loudspeaker. When listeners were seated at the axis of symmetry, there was no significant difference between the two cases. On the off-axis positions...
Conference Paper
Full-text available
The experience of telephonic communication in the home environment has remained very similar for decades: practical, but intrusive, and providing little experience of social presence. This paper presents work aimed at improving the experience of social presence in telephony. We present the results of several user studies on telephon...
Article
Full-text available
In some visual communication applications it is not possible or even desired to aim at a photorealistic representation of the remote person. One possibility is to aim at stylized visual representations of remote persons, e.g., as avatars shown on a display device or as shadows in lighting. In this paper we introduce a system for persistent and ambi...