Ricardo Gutierrez-Osuna

Texas A&M University | TAMU · Department of Computer Science and Engineering

PhD

About

190
Publications
45,342
Reads
4,727
Citations
Additional affiliations
July 2002 - present
Texas A&M University
Position
  • Professor (Full)

Publications (190)
Preprint
Full-text available
Sustained high levels of blood glucose in type 2 diabetes (T2DM) can have disastrous long-term health consequences. An essential component of clinical interventions for T2DM is monitoring dietary intake to keep plasma glucose levels within an acceptable range. Yet, current techniques to monitor food intake are time intensive and error prone. To add...
Article
Long-term application of chemical sensor arrays for continuous monitoring is challenging as a result of sensor drift. Drift correction often requires periodic recalibration, which may not be feasible for sensors deeply embedded and deployed for uninterrupted continuous monitoring. In this paper, we propose a multi-calibration ensemble approach to c...
Article
Full-text available
Mid-infrared (mid-IR) sensors consisting of silicon nitride (SiN) waveguides were designed and tested to detect volatile organic compounds (VOCs). SiN thin films, prepared by low-pressure chemical vapor deposition (LPCVD), have a broad mid-IR transparent region and a lower refractive index (n SiN = 2.0) than conventional materials such as Si (n Si...
Article
Diet monitoring is an essential intervention component for a number of diseases, from type 2 diabetes to cardiovascular diseases. However, current methods for diet monitoring are burdensome and often inaccurate. In prior work, we showed that continuous glucose monitors (CGMs) may be used to predict the macronutrients in a meal (e.g., carbohydrates,...
Article
Full-text available
Working in a fast-paced environment can lead to shallow breathing, which can exacerbate stress and anxiety. To address this issue, this study aimed to develop micro-interventions that can promote deep breathing in the presence of stressors. First, we examined two types of breathing guides to help individuals learn deep breathing: providing their br...
Article
Foreign accent conversion (FAC) aims to create a new voice that has the voice identity of a given second-language (L2) speaker but with a native (L1) accent. Previous FAC approaches usually require training a separate model for each L2 speaker and, more importantly, generally require considerable speech data from each L2 speaker for training. To ad...
Article
This article provides an up-to-date review of technological advances in 3 key areas related to diet monitoring and precision nutrition. First, we review developments in mobile applications, with a focus on food photography and artificial intelligence to facilitate the process of diet monitoring. Second, we review advances in 2 types of wearable and...
Article
Foreign accent conversion (FAC) is the problem of generating a synthetic voice that has the voice identity of a second-language (L2) learner and the pronunciation patterns of a native (L1) speaker. This synthetic voice has been referred to as a “golden-speaker” in the pronunciation-training literature. FAC is generally achieved by building a voice-...
Article
Background: The amounts of the macronutrients protein and carbohydrate (CHO) in a mixed meal are known to affect each other's digestion, absorption, and subsequent metabolism. While the effect of the amount of dietary protein and fat on the glycemic response is well studied, the ability of postprandial plasma amino acid patterns to predict the meal co...
Article
Digital games can make speech therapy exercises more enjoyable for children and increase their motivation during therapy. However, many such games developed to date have not been designed for long-term use. To address this issue, we developed Apraxia World, a speech therapy game specifically intended to be played over extended periods. In this stud...
Article
Methods to measure work stress generally rely on subjective measures from questionnaires or require dedicated sensors that are cumbersome to wear and interfere with the task. To address this problem, we propose a method to detect stress unobtrusively using commodity devices (keyboards, mice) instrumented with pressure sensors. We propose a minimali...
Article
Full-text available
This work focuses on the development of nanoparticle-based layer-by-layer (LbL) coatings for enhancing the detection sensitivity and selectivity of volatile organic compounds (VOCs) using on-chip mid-infrared (MIR) waveguides (WGs). First, we demonstrate construction of conformal coatings of polymer/mesoporous silica nanoparticles (MSNs) on the sur...
Article
Functionalization of optical waveguides with submicron coatings of zinc peroxide (ZnO2) and silica (SiO2) nanoparticles (NPs) is reported. The coatings enabled selective concentration of acetone vapors in the vicinity of the waveguide, boosting the sensitivity of a mid-infrared (MIR) on-chip detector. Controlled thickness was achieved by introducing precise...
Conference Paper
Full-text available
This paper presents a methodology to study the role of non-native accents on talker recognition by humans. The methodology combines a state-of-the-art accent-conversion system to resynthesize the voice of a speaker with a different accent of her/his own, and a protocol for perceptual listening tests to measure the relative contribution of accent an...
Article
Purpose: One of the key principles of motor learning supports using knowledge of results feedback (KR, i.e., whether a response was correct/incorrect only) during high-intensity motor practice, rather than knowledge of performance (KP, i.e., whether and how a response was correct/incorrect). In the future, mobile technology equipped with automatic...
Article
The accurate identification of likely segmental pronunciation errors produced by nonnative speakers of English is a longstanding goal in pronunciation teaching. Most lists of pronunciation errors for speakers of a particular first language (L1) are based on the experience of expert linguists or teachers of English as a second language (ESL) and Eng...
Conference Paper
Full-text available
Working in an environment with constant interruptions is known to affect stress, but how do interruptions affect emotional expression? Emotional expression can have significant impact on interactions among coworkers. We analyzed the video of 26 participants who performed an essay task in a laboratory while receiving either continual email interrupt...
Article
Sparse-coding techniques for voice conversion assume that an utterance can be decomposed into a sparse code that only carries linguistic contents, and a dictionary of atoms that captures the speakers' characteristics. However, conventional dictionary-construction and sparse-coding algorithms rarely meet this assumption. The result is that the spars...
Article
Full-text available
We describe a controlled experiment, aiming to study productivity and stress effects of email interruptions and activity interactions in the modern office. The measurement set includes multimodal data for n = 63 knowledge workers who volunteered for this experiment and were randomly assigned into four groups: (G1/G2) Batch email interruptions with/...
Article
The type of voice model used in Computer Assisted Pronunciation Instruction is a crucial factor in the quality of practice and the amount of uptake by language learners. As an example, prior research indicates that second-language learners are more likely to succeed when they imitate a speaker with a voice similar to their own, a so-called “golden...
Conference Paper
Automatic speech recognition (ASR) technology can be a useful tool in mobile apps for child speech therapy, empowering children to complete their practice with limited caregiver supervision. However, little is known about the feasibility of performing ASR on mobile devices, particularly when training data is limited. In this study, we investigated...
Article
Full-text available
Several unobtrusive sensors have been tested in studies to capture physiological reactions to stress in workplace settings. Lab studies tend to focus on assessing sensors during a specific computer task, while in situ studies tend to offer a generalized view of sensors’ efficacy for workplace stress monitoring, without discriminating different task...
Article
Accent conversion (AC) aims to transform non-native utterances to sound as if the speaker had a native accent. This can be achieved by mapping source speech spectra from a native speaker into the acoustic space of the target non-native speaker. In prior work, we proposed an AC approach that matches frames between the two speakers based on their aco...
Article
Video sharing sites have become keepers of de-facto digital libraries of sign language content, being used to store videos including the experiences, knowledge, and opinions of many in the deaf or hard of hearing community. Due to limitations of term-based search over metadata, these videos can be difficult to find, reducing their value to the comm...
Conference Paper
Full-text available
Workplace environments are characterized by frequent interruptions that can lead to stress. However, measures of stress due to interruptions are typically obtained through self-reports, which can be affected by memory and emotional biases. In this paper, we use a thermal imaging system to obtain objective measures of stress and investigate personali...
Article
Purpose: To assist in remote treatment, speech-language pathologists (SLPs) rely on mobile games, which though entertaining, lack feedback mechanisms. Games integrated with automatic speech recognition (ASR) offer a solution where speech productions control gameplay. We therefore performed a feasibility study to assess children's and SLPs' experie...
Article
Full-text available
In this paper, we introduce L2-ARCTIC, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection. This initial release includes recordings from ten non-native speakers of English whose first languages (L1s) are Hindi, Korean, Mandarin, Spanish, and Arabic, each L1 cont...
Article
Purpose: A systematic search and review of published studies was conducted on the use of automated speech analysis (ASA) tools for analysing and modifying speech of typically-developing children learning a foreign language and children with speech sound disorders to determine (i) types, attributes, and purposes of ASA tools being used; (ii) accura...
Article
This paper investigates the effect of reinforcement schedules on biofeedback games for stress self-regulation. In particular, it examines whether partial reinforcement can improve resistance to extinction of relaxation behaviors, i.e., once biofeedback is removed. Namely, we compare two types of reinforcement schedules (partial and continuous) in a...
Conference Paper
This paper presents Apraxia World, a remote therapy tool for speech sound disorders that integrates speech exercises into an engaging platformer-style game. In Apraxia World, the player controls the avatar with virtual buttons/joystick, whereas speech input is associated with assets needed to advance from one level to the next. We tested performanc...
Conference Paper
In previous work we presented a Sparse, Anchor-Based Representation of speech (SABR) that uses phonemic “anchors” to represent an utterance with a set of sparse non-negative weights. SABR is speaker-independent: combining weights from a source speaker with anchors from a target speaker can be used for voice conversion. Here, we present an extension...
Article
Biofeedback games are an attractive alternative to standard techniques for learning short-term relaxation skills, especially for young adults. In this paper, we present the design, implementation, and evaluation of three respiratory biofeedback games. To validate these games, we compared breathing rate across 100 male-only participants ( $23\; \text...
Article
Real-time gas analysis on-a-chip was demonstrated using a mid-infrared (mid-IR) microcavity. Optical apertures for the microcavity were made of ultrathin silicate membranes embedded in a silicon chip using the complementary metal-oxide-semiconductor (CMOS) process. Fourier transform infrared spectroscopy (FTIR) shows that the silicate membrane is t...
Conference Paper
Sign language is the primary medium of communication for many people who are deaf or hard of hearing. Members of this community access online sign language (SL) content posted on video sharing sites to stay informed. Unfortunately, locating SL videos can be difficult since the text-based search on video sharing sites is based on metadata rather tha...
Article
This paper compares the effectiveness of two biofeedback mechanisms to promote acquisition and transfer of deep-breathing skills using a casual videogame. The first biofeedback mechanism, game adaptation, delivers respiratory information by altering an internal parameter of the game; the second, visual biofeedback, displays respiratory information...
Article
This paper presents an approach to use commercial videogames for biofeedback training. It consists of intercepting signals from the game controller and adapting them in real-time based on physiological measurements from the player. We present three sample implementations and a case study for teaching stress self-regulation via an immersive car raci...
Article
This article presents a wavelength selection framework for mixture identification problems. In contrast with multivariate calibration, where the mixture constituents are known and the goal is to estimate their concentration, in mixture identification the goal is to determine which of a large number of chemicals is present. Due to the combinatorial...
Conference Paper
Full-text available
We present SABR (Sparse, Anchor-Based Representation), an analysis technique to decompose the speech signal into speaker-dependent and speaker-independent components. Given a collection of utterances for a particular speaker, SABR uses the centroid for each phoneme as an acoustic "anchor," then applies Lasso regularization to represent each speec...
Conference Paper
Full-text available
Four commercial e-nose instruments (Multisensor Systems, Alpha MOS, iSense, and Nordic Sensors Technologies) and a trained human panel tested cabin odors generated by heat cycling four new automobiles. Odor samples were collected at Hyundai Motor Group (HMG) and express-shipped to four university partners for analysis by an aggregate of 155 gas sen...
Conference Paper
The Internet provides access to content in almost all languages through a combination of crawling, indexing, and ranking capabilities. The ability to locate content on almost any topic has become expected for most users. But it is not the case for those whose primary language is a sign language. Members of this community communicate via the Interne...
Conference Paper
Full-text available
We describe a method for adapting a physical vocal tract model's anatomical and gestural parameters using acoustic information to match a target speaker. Physical vocal tract models are hard to adjust to match a speaker, as doing so requires information that is difficult to capture, such as X-ray or MRI data. We propose an analysis-by-synth...
Article
This article presents an active wavelength selection algorithm for multicomponent analysis with tunable infrared sensors. Traditional techniques for wavelength selection operate off-line; as a result, the resulting feature subset is fixed and only optimal for the specific mixtures and noise levels in the training set. To address this limitation, th...
Article
Odor quality in the cabin air of automobiles can be a significant factor in the decision to purchase a vehicle and the overall customer satisfaction with the vehicle over time. A current standard practice uses a human panel to rate the vehicle cabin odors on intensity, irritation, and pleasantness. However, human panels are expensive, time-consumin...
Article
Odor quality in the cabin air of automobiles can be a significant factor in the decision to purchase a vehicle and the overall customer satisfaction with the vehicle over time. Current standard practice uses a human panel to rate the vehicle cabin odors on intensity, irritation, and pleasantness. However, human panels are expensive, time-consuming...
Conference Paper
Full-text available
This paper introduces a pronunciation verification method to be used in an automatic assessment therapy tool of child disordered speech. The proposed method creates a phone-based search lattice that is flexible enough to cover all probable mispronunciations. This allows us to verify the correctness of the pronunciation and detect the incorrect phone...
Article
We present an adaptive biofeedback game that aims to maintain the player’s arousal by modifying game difficulty in response to the player’s physiological state, as measured with wearable sensors. Our approach models the interaction between human physiology and game difficulty during gameplay as a control problem, where game difficulty is the system...
Article
Tracking the performance and health of capacitor banks in distribution systems is a challenging task due to their high number and the widespread geographical distribution of feeder circuits. In this work we propose a signal processing technique capable of identifying and characterizing the number of capacitor banks connected to a standard North-Ame...
Conference Paper
We present an auditory biofeedback technique that may be used as a tool for stress management. The technique encourages slow breathing by adjusting the quality of a music recording in proportion to the user's respiration rate. We propose two forms of acoustic degradation, one that adds white noise to the recording if the user's breathing deviates f...
Article
Full-text available
We describe and compare three methods that can be used to normalize articulatory data across speakers. The methods seek to explain systematic anatomical differences between a source and target speaker without modifying the articulatory velocities of the source speaker. The first method is the classical Procrustes transform, which allows for a globa...
Conference Paper
Locating sign language (SL) videos on video sharing sites (e.g., YouTube) is challenging because search engines generally do not use the visual content of videos for indexing. Instead, indexing is done solely based on textual content (e.g., title, description, metadata). As a result, untagged SL videos do not appear in the search results. In this p...
Article
Video sharing sites enable members of the sign language community to record and share their knowledge, opinions, and worries on a wide range of topics. As a result, these sites have formative digital libraries of sign language content hidden within their large overall collections. This article explores the problem of locating these sign language (S...
Article
This paper describes a new semi-supervised learning algorithm for intra-class clustering (ICC). ICC partitions each class into sub-classes in order to minimize overlap across clusters from different classes. This is achieved by allowing partitioning of a certain class to be assisted by data points from other classes in a context-dependent fashion....
Conference Paper
We present Chill-Out, an adaptive biofeedback game that teaches relaxation skills by monitoring the breathing rate of the player. The game uses a positive feedback loop that penalizes fast breathing by means of a proportional derivative control law: rapid (and/or increasing) breathing rates increase game difficulty and reduce the final score of the...
Conference Paper
New sensor technologies such as Fabry-Pérot interferometers (FPI) offer low-cost and portable alternatives to traditional infrared absorption spectroscopy for chemical analysis. However, with FPIs the absorption spectrum has to be measured one wavelength at a time. In this work, we propose an active-sensing framework to select a subset of wavelengt...
Conference Paper
Certain speech modifications, such as changes in foreign/regional accents or articulatory styles, are performed more effectively in the articulatory domain than in the acoustic domain. Though measuring articulators is cumbersome, articulatory parameters may be estimated from acoustics through inversion. In this paper, we study the impact on synthes...
Conference Paper
Registration between low-resolution images is a crucial step in super-resolution. Conventional methods tend to separate scale estimation from translation and rotation estimation. This is because the scale parameter is inherently related to the image resolution. In this paper, we present an area-based image registration technique that can simultaneo...
Conference Paper
The ability to measure a person's physiological parameters in a contactless fashion (i.e., without attaching electrodes to the skin) has tremendous potential in a number of applications, from affective interfaces to healthcare delivery. In this paper, we present a proof-of-concept method for measuring one such vital parameter, heart rate variabili...