
Aki Härmä, PhD
Maastricht University | UM · DACS
Publications: 148
Reads: 42,498
Citations: 2,441
Additional affiliations
May 2023 - present
Publications (148)
Large pretrained self-attention neural networks, or transformers, have been very successful in various tasks recently. The performance of a model on a given task depends on its ability to memorize and generalize the training data. Large transformer models, which may have billions of parameters, in theory have a huge capacity to memorize content. Ho...
Recent years have witnessed an increase in technologies that use speech for the sensing of the health of the talker. This survey paper proposes a general taxonomy of the technologies and a broad overview of current progress and challenges. Vocal biomarkers are often secondary measures that are approximating a signal of another sensor or identifying...
The use of machine learning in healthcare has the potential to improve patient outcomes as well as broaden the reach and affordability of healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natura...
Purpose: This study aims to investigate the impact of personalized health insights generated from wearable device data on users' health behaviors. The primary objective is to assess whether user feedback-driven algorithms enhance the relevance and effectiveness of health insights, ultimately influencing positive changes in users' daily activities....
We present a novel method for mining opinions from text collections using generative language models trained on data collected from different populations. We describe the basic definitions, methodology and a generic algorithm for opinion insight mining. We demonstrate the performance of our method in an experiment where a pre-trained generative mod...
In natural language generation (NLG), insight mining is seen as a data-to-text task, where data is mined for interesting patterns and verbalised into 'insight' statements. An 'over-generate and rank' paradigm is intuitively used to generate such insights. The multidimensionality and subjectivity of this process make it challenging. This paper intro...
A respiratory chest belt sensor can be used to measure the respiratory rate and other respiratory health parameters. Virtual Respiratory Belt (VRB) algorithms estimate the belt sensor waveform from speech audio. In this paper we compare the detection of inspiration events (IE) from respiratory belt sensor data using a novel neural VRB algorithm and t...
Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the...
Background
Developing predictive models for precision psychiatry is challenging because of the unavailability of the necessary data: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this...
In this paper we give an overview of the field of patient simulators and provide a qualitative and quantitative comparison of different modeling and simulation approaches. Simulators can be used to train human caregivers but also to develop and optimize algorithms for clinical decision support applications and to test and validate interventions. In this...
Intelligent systems are increasingly part of our everyday lives and have been integrated seamlessly to the point where it is difficult to imagine a world without them. Physical manifestations of those systems on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to fu...
Respiration is an essential and primary mechanism for speech production. We first inhale and then produce speech while exhaling. When we run out of breath, we stop speaking and inhale. Though this process is involuntary, speech production involves a systematic outflow of air during exhalation characterized by linguistic content and prosodic factors...
Insights derived from wearable sensors in smartwatches or sleep trackers can help users in approaching their healthy lifestyle goals. These insights should indicate significant inferences from user behaviour and their generation should adapt automatically to the preferences and goals of the user. In this paper, we propose a neural network model tha...
Automatic programming, the task of generating computer programs compliant with a specification without a human developer, is usually tackled either via genetic programming methods based on mutation and recombination of programs, or via neural language models. We propose a novel method that combines both approaches using a concept of a virtual neuro...
Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, where it is often difficult to incorporate expert knowledge into the models or let experts review and validate the learned decision mechanisms. Knowledge-insertion and model review are important requirements in many applications involvi...
In the light of the current COVID-19 pandemic, the need for remote digital health assessment tools is greater than ever. This statement is especially pertinent for elderly and vulnerable populations. In this regard, the INTERSPEECH 2020 Alzheimer’s Dementia Recognition through Spontaneous Speech (ADReSS) Challenge offers competitors the opportunity...
Application, link: https://worldwide.espacenet.com/patent/search/family/063517652/publication/EP3614313A1?q=pn%3DEP3614313A1
Background
A Personal Emergency Response Service (PERS) enables an aging population to receive help quickly when an emergency situation occurs. The reasons that trigger a PERS alert are varied, including a sudden worsening of a chronic condition, a fall, or other injury. Every PERS case is documented by the response center using a combination of st...
Ambient intelligence (AmI) is intrinsically and thoroughly connected with artificial intelligence (AI). Some even say that it is, in essence, AI in the environment. AI, on the other hand, owes its success to the phenomenal development of the information and communication technologies (ICTs), based on principles such as Moore's law. In this paper we...
A full understanding of how topics change within a psychotherapeutic conversation is key to assessment and to the therapeutic strategies the counselor adopts for patients. That might enable artificial intelligence (AI) approaches to recommend the most suitable strategy for a new patient. Basically, understanding the topic dynamics of previous ca...
In automated health services based on text and voice interfaces, there is a need to be able to understand what the user is talking about, and what is the attitude of the user towards a subject. Typical machine learning methods for text analysis require a lot of annotated data for the training. This is often a problem in addressing specific and poss...
One of the key aspects of a psychotherapeutic conversation is the understanding of the topic dynamics driving the dialogue. This may provide insights into the therapeutic strategy adopted by the counselor for the specific patient, providing the opportunity of building up artificial intelligence (AI) based methods for recommending the most appropriate th...
Automatic lifestyle profiling to categorize users according to their daily routine-based lifestyles is an unexplored area. Despite the current trend toward wearable devices that generate large amounts of heterogeneous data, figuring out people's lifestyle patterns is not a trivial task. We present Lifestyles-KG, a knowledge graph (fuzzy on...
In health self-management services it is beneficial to identify and address the already existing healthy activity patterns of the user. Some of these healthy activity patterns might be of a utilitarian nature, e.g. commuting to work by bike or on foot, or might be for leisure, like taking a walk in a park. In this paper we discuss one possibili...
In health self-management services it is beneficial to identify and address the already existing healthy activity patterns of the user. Some of these healthy activity patterns might be of a utilitarian nature, e.g. commuting to work by bike or on foot, or might be for leisure, like taking a walk in a park. In the paper we discuss one possibilit...
In connected health services, automatic discovery of recurring patterns and correlations, or insights, provides many interesting opportunities for the personalization of the services. In this paper the focus is on insight mining for a health coaching service. The basic idea in the proposed method is to generate a large number of insight candidates w...
Many health programs aim to encourage and support healthy behaviors, such as increasing one's physical activity and improving one's dietary patterns. Tailoring such programs to the constantly changing user state is a challenge most programs struggle with. Conventionally, this tailoring has been the job of healthcare professionals and coaches who u...
The objective of this study is to evaluate the subjective quality of spectral envelope quantization in audio coding. In this paper, the spectral envelope is modeled using Linear Prediction (LP) and Warped Linear Prediction (WLP). The advantage of WLP compared to LP is that the frequency resolution of analysis can be modified so that it follows perceptu...
The method of the invention enables selection of an item (21) from a plurality of items (21, 23, 25). The method comprises the steps of visually representing a selected item (21) and reproducing at least part of an audio segment representing the selected item (21). The method further comprises applying a visual spatial effect to the visual represen...
Involuntary movements of arms and legs reflect neural and metabolic processes in the human body. In this paper the focus is on the properties of physiological tremor, shivering, and tremors caused by physical fatigue measured in fingers of a subject. Three different signal modeling paradigms are compared in the paper using accelerometer data. It is...
A surround sound system comprises a receiver (301) for receiving a multichannel spatial signal that comprises at least one surround channel. A directional ultrasound transducer (305) is used for emitting ultrasound towards a surface to reach a listening position (111) via a reflection of the surface. The ultrasound signal may specifically reach the...
Falls in nursing homes and hospitals often take place immediately after a bed exit of a patient. An alarm signaling the exit from the bed may already be too late for staff to react. In this paper we explore the possibilities of detecting the sequences of preparatory movements before the bed exit and in this way create an early warning of the prepar...
A method of controlling a system which includes the steps of obtaining at least one signal representative of information communicated by a user via an input device in an environment of the user, wherein a signal from a first source is available in a perceptible form in the environment; estimating at least a point in time when a transition between i...
The use of visual user interfaces in smartphones and other personal media devices (PMD) leads to decreased situational awareness, for example, in city traffic. It is proposed in the paper that many menu navigation functions in PMDs can be replaced by an eyes-free auditory interface and an input device based on acoustic recognition of tactile gestur...
Since the evaluation of audio systems or processing schemes is time-consuming and resource-expensive, alternative objective evaluation methods have attracted considerable research interest. However, current perceptual models are not yet capable of replacing a human listener, especially when the test stimulus is complex, for example, a sound scene consis...
This paper proposes a collaborative vision network that leverages a personal webcam and cameras of the workplace to provide feedback relating to an office-worker's adherence to ergonomic guidelines. This can lead to increased well-being for the individual and better productivity in their work. The proposed system is evaluated with a recorded multi-...
In smart environments, the embedded sensing systems should intelligently adapt to the behavior of the users. Many interesting types of behavior are characterized by repetition of actions such as certain activities or movements. A generic methodology to detect and classify repetitions that may occur at different scales is introduced in this paper. T...
The application of Compressive Sensing is explored in three signal categories: footstep sounds, hand tremors and speech. An investigation of the reconstruction performance of various dictionaries is undertaken. It is demonstrated that these signal categories are reconstructed with higher SNR performance using K-SVD dictionaries than other fixed dic...
An audio reproduction system includes an arrangement of audio speakers of a first kind having a first degree of directivity in combination with at least one audio speaker of a second kind having a second degree of directivity. In order to create a virtual sound source at a desired distance to a listener's position, the second degree of directivity...
In this study we have developed a digital guitar body mode modulation technique where the modulation can be controlled through one driving parameter. The filtering and modulation are done with frequency-warped recursive filters that have been implemented in real-time on a modern DSP processor. By changing the warping parameter the perceived size o...
This paper highlights research into using virtual worlds, intelligent environments and mixed reality to create artificial control systems for simulated humans. Following a brief explanation of this project, the beneficial contribution provided by virtual ...
Low-delay audio coding is a somewhat new trend in perceptual wideband audio coding. Low coding delay is important, e.g., in applications based on bidirectional real-time audio transmission. The technical aspects and psychoacoustics of such applications are reviewed and an audio codec with a coding delay of 2 ms is i...
In binaural sound reproduction applications using head-related transfer functions (HRTFs), it is beneficial that the properties of the HRTFs correspond to the personal characteristics of the real HRTFs of the user. In this paper we propose a method to choose HRTFs using a relative localization test. This allows making the selection of the bes...
It is often desired to detect some particular short sound events from an audio recording. For example, in music analysis and processing, one may be interested in detection of percussive events. In environmental audio analysis one may look for individual sound events related to some activity, for example, sounds of footsteps from a walking person. G...
A classification of time-frequency (TF) regions in stereo audio data by the type of mixture the region represents is presented. The detection of the type of mixing is necessary, for example, in stereo-to-multichannel upmixing, audio enhancement, and audio manipulation applications. A generic signal model for a stereo signal is proposed and it is us...
A method to detect the distance of a speaker from a single microphone in a room environment is proposed. Several features, related to statistical parameters of speech source excitation signals, are introduced and are shown to depend on the distance between source and receiver. Those features are used to train a pattern recognizer for distance detec...
Stereo audio enhancement and upmixing techniques require spatial analysis of the mixture in order to work optimally for different types of contents. In this paper a method is proposed which classifies the time-frequency regions in stereo audio data into six different classes. The individual classes represent special cases of a generic stereo signal...
A method to estimate the distance of a speaker from a single microphone in a room environment is studied. Several features, related to statistical parameters of speech source excitation signals, are introduced and are shown to depend on the distance between source and receiver. Those features are used to train a pattern recognizer for distance estimati...
Many applications demand the automatic induction of the tempo of a musical excerpt. The tempo estimation systems follow a general scheme that consists of two main steps: the creation of a feature list and the detection of periodicities on this list. In this study, we propose a new method for the implementation of the first step, along with the addi...
Stereo audio enhancement and upmixing techniques require spatial analysis of the mixture in order to work optimally for different types of contents. In this paper a method is proposed which classifies each time-frequency region in stereo audio data into six different classes. The individual classes represent special cases of a generic stereo signa...
A stereo audio signal is often modeled as a mixture of instantaneously mixed primary components and uncorrelated ambience components. This paper focuses on the estimation of the primary-to-ambience energy ratio, PAR. This measure is useful for signal decomposition in stereo and multichannel audio coding, format conversion, and spatial audio enhan...
Detection and extraction of the center vocal source is important for many audio format conversion and manipulation applications. First, we study some generic properties of stereo signals containing sources panned exactly to the center of the stereo image and propose an algorithm for the separation of a stereo audio signal into a center and side cha...
The focus of the paper is on studying five different methods to combine multi-view data from an uncalibrated smart camera network for human activity recognition. The multi-view classification scenarios studied can be divided into two categories: view selection and view fusion methods. Selection uses a single view to classify, whereas fusion merges mu...
In the current technological landscape colored by environmental and security concerns, the logic of replacing traveling by technical means of communications is indisputable. For example, consider a comparison between a normal family car and a video conference system with two laptop computers connected over the Internet. The power consumption of the...
A classification of time-frequency (TF) regions in stereo audio data by the type of mixture the region represents is presented. The detection of the type of mixing is necessary, for example, in stereo-to-multichannel upmixing, audio enhancement, and audio manipulation applications. A generic signal model for a stereo signal is proposed and it is us...
In this paper, we present the results of listening tests where the width of the sound stage was compared between conventional 2-channel stereophony and 3-channel reproduction with an additional centre loudspeaker. When listeners were seated at the axis of symmetry, there was no significant difference between the two cases. At the off-axis positions...
The experience of telephonic communication in the home environment has remained very similar for decades: practical, but intrusive, and providing little experience of social presence. This paper presents work aimed at improving the experience of social presence in telephony. We present the results of several user studies on telephon...
In some visual communication applications it is not possible or even desired to aim at a photorealistic representation of the remote person. One possibility is to aim at stylized visual representations of remote persons, e.g., as avatars shown on a display device or as shadows in lighting. In this paper we introduce a system for persistent and ambi...