Manu Airaksinen, University of Helsinki | HY
DSc (Tech)
About
55
Publications
9,774
Reads
830
Citations
Publications (55)
Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL method...
Study objectives
To develop a non-invasive and practical wearable method for long-term tracking of infants’ sleep.
Methods
An infant wearable, NAPping PAnts (NAPPA), was constructed by combining a diaper cover and a movement sensor (triaxial accelerometer and gyroscope), allowing either real-time data streaming to mobile devices or offline feature...
Assessing infant carrying and holding (C/H), or physical infant-caregiver interaction, is important for a wide range of contexts in developmental research. Automated detection and quantification of infant C/H is particularly needed in long-term at-home studies where the development of infants’ neurobehavior is measured using wearable devices. Here, we...
The recently developed infant wearable MAIJU provides a means to automatically evaluate infants' motor performance in an objective and scalable manner in out-of-hospital settings. This information could be used for developmental research and to support clinical decision-making, such as detection of developmental problems and guiding of their therap...
Background
Early neurodevelopmental care and research are in urgent need of practical methods for quantitative assessment of early motor development. Here, the performance of a wearable system in early motor assessment was validated and compared with the developmental tracking provided by physical growth charts.
Methods
Altogether 1358 h of spontaneous movement...
Infant motility assessment using intelligent wearables is a promising new approach for assessing infant neurophysiological development, and one in which efficient signal analysis plays a central role. This study investigates the use of different end-to-end neural network architectures for processing infant motility data from wearable sensors. We focus...
Background
Electroencephalogram (EEG) monitoring is recommended as routine in newborn neurocritical care to facilitate early therapeutic decisions and outcome predictions. EEG's larger-scale implementation is, however, hindered by the shortage of expertise needed for the interpretation of spontaneous cortical activity, the EEG background. We develo...
When domain experts are needed to perform data annotation for complex machine-learning tasks, reducing annotation effort is crucial in order to cut down time and expenses. For cases when there are no annotations available, one approach is to utilize the structure of the feature space for clustering-based active learning (AL) methods. However, these...
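The general idea of clustering-based active learning, where the feature-space structure decides which unlabeled samples an expert should annotate first, can be sketched as follows. This is a minimal illustration only, assuming plain k-means with Euclidean distance and a "label the sample nearest each centroid" query rule; the function names `kmeans` and `select_queries` are hypothetical, and this is not the method proposed in the paper, whose details are truncated here.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids


def select_queries(points, k, seed=0):
    """Return indices of the samples closest to each k-means centroid.

    These are the samples that would be sent to a domain expert for labeling.
    """
    centroids = kmeans(points, k, seed=seed)
    queries = []
    for c in centroids:
        best = min(range(len(points)), key=lambda i: math.dist(points[i], c))
        if best not in queries:
            queries.append(best)
    return queries
```

For two well-separated groups, e.g. `select_queries([(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)], k=2)`, one representative index from each group is returned, so only two of the six samples need expert annotation.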
Background
Early neurodevelopmental care needs better, effective and objective solutions for assessing infants’ motor abilities. Novel wearable technology opens possibilities for characterizing spontaneous movement behavior. This work seeks to construct and validate a generalizable, scalable, and effective method to measure infants’ spontaneous mot...
Neonatal brain monitoring in neonatal intensive care units (NICUs) requires continuous review of the spontaneous cortical activity, i.e., the electroencephalogram (EEG) background activity. This calls for bedside methods for automated assessment of the EEG background activity. In this paper, we present the development of the key com...
Objective
To develop a non-invasive and clinically practical method for long-term monitoring of infant sleep cycling in the intensive care unit.
Methods
Forty-three infant polysomnography recordings were performed at 1–18 weeks of age, including a piezo-element bed mattress sensor to record respiratory and gross-body movements. The hypnogram sco...
Infants’ spontaneous and voluntary movements mirror developmental integrity of brain networks since they require coordinated activation of multiple sites in the central nervous system. Accordingly, early detection of infants with atypical motor development holds promise for recognizing those infants who are at risk for a wide range of neurodevelopm...
This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies fall into two groups: additive noise and channel-based augmentation, and vocoder-based augmentation. In vocoder-based augmentation, a glottal vocoder is used...
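Of the augmentation strategies mentioned, additive noise is the simplest to illustrate: a noise signal is rescaled so that the mixture reaches a chosen signal-to-noise ratio (SNR). The sketch below assumes mean-power SNR in dB and equal-length inputs; the function names are hypothetical, and it does not cover the channel-based or vocoder-based augmentation described in the abstract.

```python
import math
import random


def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so that the result has the requested SNR in dB.

    The noise is rescaled so that 10 * log10(P_clean / P_scaled_noise) == snr_db,
    where P is mean power over the whole signal.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Noise power required for the target SNR, then the amplitude gain that gives it.
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    return [x + gain * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """SNR of `noisy` relative to the known `clean` signal, in dB."""
    residual = [y - x for x, y in zip(clean, noisy)]
    p_clean = sum(x * x for x in clean)
    p_res = sum(r * r for r in residual)
    return 10.0 * math.log10(p_clean / p_res)


if __name__ == "__main__":
    rng = random.Random(1)
    fs = 16000
    clean = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]  # 1 s, 200 Hz tone
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]                   # white Gaussian noise
    noisy = add_noise_at_snr(clean, noise, snr_db=5.0)
    print(measure_snr_db(clean, noisy))  # approximately 5.0
```

In a training pipeline, `snr_db` would typically be drawn at random per example so the F0 estimator sees a range of noise conditions; that sampling policy is an assumption here, not something the abstract specifies.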
In this article, three adaptation methods are compared based on how well they change the speaking style of a neural-network-based text-to-speech (TTS) voice. The speaking style conversion adopted here is from normal to Lombard speech. The selected adaptation methods are: auxiliary features (AF), learning hidden unit contribution (LHUC), and fine-tu...
Glottal inverse filtering (GIF) refers to techniques for estimating the source of voiced speech, the glottal flow, from speech signals. When a new GIF algorithm is proposed, its accuracy needs to be evaluated. However, the evaluation of GIF is problematic because the ground truth, the real glottal volume velocity signal generated by the vocal folds, c...
Estimation of glottal source information can be performed non-invasively from speech by using glottal inverse filtering (GIF) methods. However, the existing GIF methods are sensitive even to slight distortions in speech signals under different realistic scenarios, for example, in coded telephone speech. Therefore, there is a need for robust GIF met...
Feature extraction of speech signals is typically performed in short-time frames by assuming that the signal is stationary within each frame. For the extraction of the spectral envelope of speech, which conveys the formant frequencies produced by the resonances of the slowly varying vocal tract, a frame length of 20–30 ms is commonly used. Howeve...
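The conventional short-time framing that this abstract takes as its starting point can be sketched as below. It assumes a 25 ms frame (inside the 20–30 ms range above), a 10 ms hop and a Hamming window; the hop, the window and the function name `frame_signal` are assumptions for illustration, and the sketch shows only the baseline the paper questions, not its proposed alternative.

```python
import math


def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping, Hamming-windowed short-time frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    # Symmetric Hamming window applied to every frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = x[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

At fs = 16 kHz this gives 400-sample frames every 160 samples, so one second of audio yields 98 full frames; each frame is then treated as stationary when its spectral envelope is estimated.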
A vocoder is used to express a speech waveform with a controllable parametric representation that can be converted back into a speech waveform. Vocoders representing their main categories (mixed excitation, glottal, sinusoidal vocoders) were compared in this study with formal and crowd-sourced listening tests. Vocoder quality was measured within th...
Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features. These models have been shown to improve the generated speech quality over classical vocoders in many tasks, such as text-to-speech synthesis and voice conversion. Furthermore, conditionin...
This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net....
Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially in computing WLP models...
A new method is proposed for solving the glottal inverse filtering (GIF) problem. The goal of GIF is to separate an acoustical speech signal into two parts: the glottal airflow excitation and the vocal tract filter. To recover such information one has to deal with a blind deconvolution problem. This ill-posed inverse problem is solved under a deter...
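The source-filter separation that GIF aims at can be illustrated with its simplest ingredient: if an all-pole vocal-tract model 1/A(z) were known, filtering the speech with the FIR filter A(z) would remove the vocal-tract contribution, and a leaky integrator would approximately undo the differentiating effect of lip radiation. This is a textbook-style sketch under stated assumptions, not the method of this or any other paper listed here: the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the integrator constant `rho`, and the function names are all hypothetical, and real GIF methods must estimate A(z) blindly from the speech, which is the ill-posed part.

```python
def inverse_filter(x, a):
    """Apply A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p to x (FIR, zero initial state)."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        y.append(acc)
    return y


def all_pole_filter(e, a):
    """Synthesize x from excitation e with 1/A(z); the inverse of inverse_filter."""
    x = []
    for n in range(len(e)):
        acc = e[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * x[n - k]
        x.append(acc)
    return x


def leaky_integrate(d, rho=0.99):
    """Apply 1 / (1 - rho z^-1), approximately cancelling a differentiator."""
    out, prev = [], 0.0
    for v in d:
        prev = v + rho * prev
        out.append(prev)
    return out
```

With a known, stable model such as `a = [-1.3, 0.6]`, `inverse_filter(all_pole_filter(e, a), a)` recovers the excitation `e` up to rounding, and `leaky_integrate(inverse_filter(speech, a))` would be a crude glottal-flow estimate once lip radiation is modeled as a differentiator.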
Linear prediction (LP) is a prevalent source-filter separation method for speech. One of the drawbacks of conventional LP-based approaches is the biasing of estimated formants by harmonic peaks. Methods such as discrete all-pole modeling and weighted LP have been proposed to overcome this problem, but they all use a linear frequency scale...
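For reference, the conventional LP baseline this abstract contrasts against (linear frequency scale, no temporal weighting) can be computed with the autocorrelation method and the Levinson–Durbin recursion. This is a standard textbook sketch; the function names and the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are choices made here, and it does not implement the warped or weighted variants discussed in the paper.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a (windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation r.

    Returns (a, err): a = [a_1, ..., a_p] are the coefficients of
    A(z) = 1 + sum_k a_k z^-k, and err is the final prediction-error power.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for the step from order i to order i + 1.
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        # Update a_1..a_i from the previous order's coefficients.
        prev = a[:i]
        for j in range(i):
            a[j] = prev[j] + k * prev[i - 1 - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

A typical call is `a, err = levinson_durbin(autocorrelation(frame, p), p)` on a windowed frame, with the order p often chosen by the rule of thumb of the sampling rate in kHz plus about 2; the peaks of the resulting all-pole envelope 1/A(z) serve as formant estimates.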
This study proposes an approach for glottal inverse filtering of acoustic speech signals using quadratic programming (QPR). The method aims to jointly model the effect of vocal tract and lip radiation with a single filter whose coefficients are optimized using QPR. This optimization is based on the principles of closed phase analysis, where the con...
Achieving high quality and naturalness in statistical parametric synthesis of female voices remains difficult despite recent advances in the study area. Vocoding is a key element of all statistical speech synthesizers and is known to affect synthesis quality and naturalness. The present study focuses on a special type of vocod...
In the analysis of speech production, glottal inverse filtering has proved to be an effective yet non-invasive method for obtaining information about the voice source. One of the main challenges of the existing methods is blind estimation of the contribution of the lip radiation, which must often be manually determined. To obtain a fully automatic...
Parameterization of the glottal flow is a process where the glottal flow is represented in terms of a few numerical values. This study proposes a novel parameterization technique called the phase plane symmetry (PPS) parameter that utilizes the symmetrical properties of the phase plane plot. The phase plane is a way to graphically visualize the glottal...
This study presents a new glottal inverse filtering (GIF) technique based on closed phase analysis over multiple fundamental periods. The proposed quasi closed phase (QCP) analysis method utilizes weighted linear prediction (WLP) with a specific attenuated main excitation (AME) weight function that attenuates the contribution of the glottal source...
This study presents a new glottal inverse filtering (GIF) technique based on the closed phase analysis over multiple fundamental periods. The proposed Quasi Closed Phase Analysis (QCP) method utilizes Weighted Linear Prediction (WLP) with a specific Attenuated Main Excitation (AME) weighting function that attenuates the contribution of the glottal...
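The core of weighted linear prediction (WLP) is to minimize a temporally weighted prediction error, the sum over n of w[n] e[n]^2, so that samples near the main glottal excitation contribute less to the vocal-tract estimate. The sketch below solves the weighted normal equations directly. Its weight function is a deliberately simplified, hypothetical stand-in for AME (a constant low value `d` for `width` samples starting at each glottal closure instant, 1 elsewhere); the published AME function has its own shape and parameters that the truncated abstracts do not give, and the GCI positions are assumed to be known in advance.

```python
def solve_linear(m, b):
    """Gaussian elimination with partial pivoting for a small dense system m x = b."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def simple_ame_like_weight(length, gcis, width, d=0.001):
    """Hypothetical AME-like weight: d for `width` samples from each GCI, 1 elsewhere."""
    w = [1.0] * length
    for g in gcis:
        for n in range(g, min(g + width, length)):
            w[n] = d
    return w


def weighted_lp(x, w, order):
    """Minimize sum_n w[n] * (x[n] + sum_k a_k x[n-k])^2 over a_1..a_p.

    Covariance formulation (n runs from `order` to len(x) - 1); returns a with
    the convention A(z) = 1 + sum_k a_k z^-k.
    """
    idx = range(order, len(x))
    # Weighted normal equations R a = -c, with R[i][j] = sum_n w[n] x[n-i] x[n-j].
    R = [[sum(w[n] * x[n - i] * x[n - j] for n in idx) for j in range(1, order + 1)]
         for i in range(1, order + 1)]
    c = [sum(w[n] * x[n] * x[n - i] for n in idx) for i in range(1, order + 1)]
    return solve_linear(R, [-v for v in c])
```

With uniform weights (`w = [1.0] * len(x)`) this reduces to ordinary covariance-method LP; down-weighting the samples right after each GCI lets the estimate rely mostly on the closed-phase-like parts of several fundamental periods, which is the intuition behind the quasi closed phase idea described in the abstracts.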
In this study, the acoustic properties of shouted speech are analyzed in relation to normal speech, and various synthesis techniques for shouting are investigated. The analysis shows large differences between the two styles, which make synthesis difficult. Analysis-synthesis experiments show that the use of spectral estimation methods th...
This paper presents a new glottal inverse filtering (GIF) method that utilizes a Markov chain Monte Carlo (MCMC) algorithm. First, initial estimates of the vocal tract and glottal flow are obtained with an existing GIF method, iterative adaptive inverse filtering (IAIF). Simultaneously, the initially estimated glottal flow is synthesized using the R...