Maureen Stone

University of Maryland, Baltimore, Baltimore, Maryland, United States


Publications (123) · 113.82 Total Impact

  •
    ABSTRACT: A new contour-tracking algorithm is presented for ultrasound tongue image sequences, which can follow the motion of tongue contours over long durations with good robustness. To cope with missing segments caused by noise, or by the tongue midsagittal surface being parallel to the direction of ultrasound wave propagation, active contours with a contour-similarity constraint are introduced, which can be used to provide 'prior' shape information. Also, in order to address the accumulation of tracking errors over long sequences, we present an automatic re-initialization technique based on the complex wavelet image similarity index. Experiments on synthetic data and on real data recorded at 60 frames per second (fps) from different subjects demonstrate that the proposed method gives good contour tracking for ultrasound image sequences even over durations of minutes, which can be useful in applications such as speech recognition where very long sequences must be analyzed in their entirety.
    No preview · Article · Jan 2016 · Clinical Linguistics & Phonetics
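The contour-similarity constraint described in the abstract above can be illustrated with a toy active-contour update in which, besides the usual smoothness and image terms, each point is pulled toward a prior shape (for instance the contour tracked in the previous frame). This is a minimal numerical sketch, not the published algorithm; the weights and the `image_force` input are placeholder assumptions.

```python
import numpy as np

def snake_step(contour, prior, image_force, alpha=0.1, beta=0.5, step=0.2):
    """One gradient-descent update of a closed active contour (N x 2 points).

    alpha weights internal smoothness (discrete second differences); beta
    weights a contour-similarity term that pulls points toward a prior
    shape when image evidence is missing (e.g. dropped ultrasound segments).
    """
    # Internal energy gradient: second difference approximates curvature.
    smooth = np.roll(contour, -1, axis=0) - 2 * contour + np.roll(contour, 1, axis=0)
    # Prior-shape term: attraction toward, e.g., the previous frame's contour.
    prior_pull = prior - contour
    return contour + step * (image_force + alpha * smooth + beta * prior_pull)
```

With `beta > 0`, points in regions with no image evidence (zero force) still relax toward the prior contour instead of drifting, which is the role the similarity constraint plays for missing ultrasound segments.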
  •
    ABSTRACT: Quantitative characterization and comparison of tongue motion during speech and swallowing present fundamental challenges because of striking variations in tongue structure and motion across subjects. A reliable and objective description of dynamic tongue motion requires consistent handling of inter-subject variability to detect subtle changes across populations. To this end, in this work, we present for the first time an approach to constructing an unbiased spatio-temporal atlas of the tongue during speech, based on cine-MRI from twenty-two normal subjects. First, we create a common spatial space using images from the reference time frame, a neutral position, in which the unbiased spatio-temporal atlas can be created. Second, we transport images from all time frames of all subjects into this common space via a single transformation. Third, we construct atlases for each time frame via groupwise diffeomorphic registration, which serves as the initial spatio-temporal atlas. Fourth, we update the spatio-temporal atlas by realigning each time sequence based on the Lipschitz norm on diffeomorphisms between each subject and the initial atlas. We evaluate and compare different configurations, such as similarity measures, used to build the atlas. Our proposed method permits accurate and objective characterization of the main patterns of tongue surface motion.
    No preview · Article · Jul 2015 · Information processing in medical imaging: proceedings of the ... conference
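The alternating "average, then re-register everyone to the average" loop at the heart of unbiased atlas construction can be shown with a deliberately tiny stand-in: 1-D signals, integer circular shifts instead of diffeomorphisms, and cross-correlation instead of an image similarity metric. Purely illustrative; none of this is the paper's registration machinery.

```python
import numpy as np

def build_atlas(signals, n_iter=5):
    """Toy groupwise 'atlas' of 1-D signals by alternating averaging and
    re-registration (integer circular shifts chosen by cross-correlation)."""
    atlas = signals.mean(axis=0)
    shifts = np.zeros(len(signals), dtype=int)
    for _ in range(n_iter):
        aligned = []
        for i, s in enumerate(signals):
            # Best circular shift of s onto the current atlas.
            corr = [np.dot(np.roll(s, k), atlas) for k in range(len(s))]
            shifts[i] = int(np.argmax(corr))
            aligned.append(np.roll(s, shifts[i]))
        # Unbiased update: the atlas is the mean of the aligned signals.
        atlas = np.mean(aligned, axis=0)
    return atlas, shifts
```

The naive (unaligned) average blurs the shared structure, while the groupwise loop recovers a sharp atlas, which is the same motivation given for transporting all subjects into a common space before averaging.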
  • Source
    Chuyang Ye · Emi Murano · Maureen Stone · Jerry L Prince
    ABSTRACT: The tongue is a critical organ for a variety of functions, including swallowing, respiration, and speech. It contains intrinsic and extrinsic muscles that play an important role in changing its shape and position. Diffusion tensor imaging (DTI) has been used to reconstruct tongue muscle fiber tracts. However, previous studies have been unable to reconstruct the crossing fibers that occur where the tongue muscles interdigitate, which is a large percentage of the tongue volume. To resolve crossing fibers, multi-tensor models on DTI and more advanced imaging modalities, such as high angular resolution diffusion imaging (HARDI) and diffusion spectrum imaging (DSI), have been proposed. However, because of the involuntary nature of swallowing, there is not enough time to acquire the number of diffusion gradient directions needed to resolve crossing fibers while the in vivo tongue is held in a fixed position. In this work, we address the challenge of distinguishing interdigitated tongue muscles from limited diffusion magnetic resonance imaging by using a multi-tensor model with a fixed tensor basis and incorporating prior directional knowledge. The prior directional knowledge provides information on likely fiber directions at each voxel, and is computed with anatomical knowledge of tongue muscles. The fiber directions are estimated within a maximum a posteriori (MAP) framework, and the resulting objective function is solved using a noise-aware weighted ℓ1-norm minimization algorithm. Experiments were performed on a digital crossing phantom and on in vivo tongue diffusion data including three control subjects and four patients with glossectomies. On the digital phantom, the effects of parameters, noise, and prior direction accuracy were studied, and parameter settings for real data were determined. The results on the in vivo data demonstrate that the proposed method is able to resolve interdigitated tongue muscles with limited gradient directions. The distributions of the computed fiber directions in the controls and the patients were also compared, suggesting a potential clinical use for this imaging and image analysis methodology.
    Full-text · Article · Jul 2015 · Computerized medical imaging and graphics: the official journal of the Computerized Medical Imaging Society
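The weighted ℓ1-norm minimization step can be sketched as an ISTA loop solving min_f ||D f − s||²/2 + λ Σ_i w_i f_i with f ≥ 0, where the columns of D are the signal responses of a fixed tensor basis and small prior weights w_i make anatomically likely directions cheap to use. The basis, weights, and parameters below are invented for illustration and are not the paper's noise-aware solver.

```python
import numpy as np

def weighted_l1_ista(D, s, w, lam=0.05, n_iter=500):
    """Non-negative mixture weights f over a fixed basis D, minimising
    ||D f - s||^2 / 2 + lam * sum_i w_i * f_i  subject to f >= 0,
    by proximal gradient descent (ISTA).

    Small w_i = weak penalty = 'likely' direction under the prior.
    """
    f = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ f - s)               # gradient of the data term
        # Proximal step for the weighted L1 penalty, then project onto f >= 0.
        f = np.maximum(f - step * grad - step * lam * w, 0.0)
    return f
```

On noiseless data with a sparse ground truth, the loop concentrates weight on the basis directions favoured by both the data and the prior.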
  •
    ABSTRACT: This paper deals with new capturing technologies to safeguard and transmit endangered intangible cultural heritage, including the Corsican multipart singing technique. The described work, part of the European FP7 i-Treasures project, aims at increasing our knowledge of rare singing techniques. This paper includes (i) a presentation of our light hyper-helmet with 5 non-invasive sensors (microphone, camera, ultrasound sensor, piezoelectric sensor, electroglottograph), (ii) the data acquisition process and software modules for visualization and data analysis, and (iii) a case study on acoustic analysis of voice quality for the UNESCO-labelled traditional Cantu in Paghjella. We have identified specific features for this singing style, such as changes in vocal quality, especially concerning the energy in the speaking and singing formant frequency region, a nasal vibration that seems to occur during singing, as well as laryngeal mechanism characteristics. These capturing and analysis technologies will help define relevant features for a future educational platform.
    No preview · Article · Jun 2015
  •
    ABSTRACT: Biomechanical models of the oropharynx are beneficial for treatment planning of speech impediments because they provide valuable insight into speech functions such as motor control. In this paper, we develop a subject-specific model of the oropharynx and investigate its utility in speech production. Our approach adapts a generic tongue-jaw-hyoid model [Stavness I, Lloyd JE, Payan Y, Fels S. 2011. Coupled hard-soft tissue simulation with contact and constraints applied to jaw-tongue-hyoid dynamics. Int J Numer Method Biomed Eng. 27(3):367-390] to fit and track dynamic volumetric MRI data of a normal speaker, subsequently coupled to a source-filter-based acoustic synthesiser. We demonstrate our model's ability to track tongue tissue motion, simulate plausible muscle activation patterns, and generate acoustic results that have comparable spectral features to the associated recorded audio. Finally, we propose a method to adjust the spatial resolution of our subject-specific tongue model to match the fidelity level of our MRI data and speech synthesiser. Our findings suggest that a higher resolution tongue model - using a similar muscle fibre definition - does not show a significant improvement in acoustic performance for our speech utterance at this level of fidelity; however, we believe that our approach enables further refinements of the muscle fibres suitable for studying longer speech sequences and finer muscle innervation using higher resolution dynamic data.
    No preview · Article · May 2015
  • Source
    ABSTRACT: The paper presents an interactive game-like application to learn, perform and evaluate modern contemporary singing, using the Human Beat Box (HBB) as a case study. The game consists of two main modules. The sensor module is a portable helmet-based system containing an ultrasonic (US) transducer to capture tongue movements, a video camera for the lips, a Kinect camera for face gestures, and a microphone for sound. The 3D environment game module visualizes a 3D recording studio as the game world, with all of its characteristic elements (guitars, mixer, amplifier, speakers) and a microphone in front of the 3D avatar to simulate the recording ambience. The game also features a 2D virtual tutor that helps the learner by giving oral and written feedback during the game and during practice sessions to improve the student’s performance. The game is still in its early stages of development and has been tested using simple HBB plosive sounds for percussion such as “PTK”.
    Full-text · Conference Paper · Mar 2015
  • Fangxu Xing · Chuyang Ye · Jonghye Woo · Maureen Stone · Jerry L Prince
    ABSTRACT: The human tongue is composed of multiple internal muscles that work collaboratively during the production of speech. Assessment of muscle mechanics can help understand the creation of tongue motion, interpret clinical observations, and predict surgical outcomes. Although various methods have been proposed for computing the tongue's motion, associating motion with muscle activity in an interdigitated fiber framework has not been studied. In this work, we aim to develop a method that reveals different tongue muscles' activities in different time phases during speech. We use four-dimensional tagged magnetic resonance (MR) images and static high-resolution MR images to obtain tongue motion and muscle anatomy, respectively. Then we compute strain tensors and local tissue compression along the muscle fiber directions in order to reveal their shortening pattern. This process relies on multiple image analysis methods, including super-resolution volume reconstruction from MR image slices, segmentation of internal muscles, tracking the incompressible motion of tissue points using tagged images, propagation of muscle fiber directions over time, and calculation of strain along the line of action. We evaluated the method on a control subject and two post-glossectomy patients in a controlled speech task. The normal subject's tongue muscle activity shows high correspondence with the production of speech at different time instants, while both patients' muscle activities show patterns that differ from the control's due to their resected tongues. This method shows potential for relating overall tongue motion to particular muscle activity, which may provide novel information for future clinical and scientific studies.
    No preview · Article · Feb 2015 · Proceedings of SPIE - The International Society for Optical Engineering
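The core quantity above, local tissue compression along a muscle fiber direction, reduces to projecting a strain tensor onto the fiber direction. A minimal sketch using the Green-Lagrange strain (the paper may use a different strain measure):

```python
import numpy as np

def fiber_strain(F, fiber):
    """Green-Lagrange strain projected onto a unit fiber direction.

    F is the 3x3 deformation gradient at a tissue point. The scalar
    e = f^T E f is negative when the tissue shortens along the fiber,
    which is the 'shortening pattern' sought in the tongue muscles.
    """
    E = 0.5 * (F.T @ F - np.eye(3))       # Green-Lagrange strain tensor
    f = fiber / np.linalg.norm(fiber)     # ensure a unit direction
    return float(f @ E @ f)
```

For an incompressible uniaxial deformation that shortens the x-axis, the projected strain is negative along x and positive transversally.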
  • Source
    ABSTRACT: Magnetic resonance imaging (MRI) is an essential tool in the study of muscle anatomy and functional activity in the tongue. Objective assessment of similarities and differences in tongue structure and function has been performed using unnormalised data, but this is biased by the differences in size, shape and orientation of the structures. To remedy this, we propose a methodology to build a 3D vocal tract atlas based on structural MRI volumes from 20 normal subjects. We first constructed high-resolution volumes from three orthogonal stacks. We then removed extraneous data so that all 3D volumes contained the same anatomy. We used an unbiased diffeomorphic groupwise registration using a cross-correlation similarity metric. Principal component analysis was applied to the deformation fields to create a statistical model from the atlas. Various evaluations and applications were carried out to show the behaviour and utility of the atlas.
    Full-text · Article · Jan 2015
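The last step described above, principal component analysis applied to the deformation fields, can be sketched in a few lines: flatten each subject's field to a row, subtract the mean, and take the SVD. Array shapes here are illustrative.

```python
import numpy as np

def deformation_pca(fields, n_modes=2):
    """PCA over a set of deformation fields (one subject per row after
    flattening) via SVD, a sketch of building a statistical model from
    the atlas's deformation fields.

    Returns the mean field, the principal modes, and per-subject scores.
    """
    X = fields.reshape(fields.shape[0], -1)           # subjects x (voxels * 3)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = Vt[:n_modes]                              # principal variation modes
    scores = (X - mean) @ modes.T                     # subject coordinates
    return mean, modes, scores
```

Each subject's field is then approximated by the mean plus a small number of weighted modes, which is what makes the atlas statistically useful.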
  • Source
    Jonghye Woo · Maureen Stone · Jerry L. Prince
    ABSTRACT: Multimodal image registration is a class of algorithms that find correspondences between images from different modalities. Since different modalities do not exhibit the same characteristics, finding accurate correspondences remains a challenge. To deal with this, mutual information (MI) based registration has been a preferred choice, as MI is based on the statistical relationship between the two volumes to be registered. However, MI has some limitations. First, MI based registration often fails when there are local intensity variations in the volumes. Second, MI considers only the statistical intensity relationships between the volumes and ignores the spatial and geometric information about the voxels. In this work, we propose to address these limitations by incorporating spatial and geometric information via a 3D Harris operator. Specifically, we focus on the registration between a high-resolution image and a low-resolution image. The MI cost function is computed in the regions where there are large spatial variations, such as corners or edges. In addition, the MI cost function is augmented with geometric information derived from the 3D Harris operator applied to the high-resolution image. The robustness and accuracy of the proposed method were demonstrated using experiments on synthetic and clinical data, including the brain and the tongue. The proposed method provided accurate registration and yielded better performance than standard registration methods.
    Full-text · Article · Dec 2014 · IEEE Transactions on Image Processing
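A rough sketch of the two ingredients above: a Harris response to select the corner/edge regions where MI is evaluated, and MI computed from a joint histogram restricted to those regions. Shown in 2-D with a crude box filter for brevity; the paper's operator is 3-D and the details below are assumptions.

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response (2-D sketch of the paper's 3-D operator)."""
    gy, gx = np.gradient(img.astype(float))

    def box(a):
        # 3x3 box filter with edge padding.
        p = np.pad(a, 1, mode='edge')
        return sum(np.roll(np.roll(p, i, 0), j, 1)
                   for i in (-1, 0, 1) for j in (-1, 0, 1))[1:-1, 1:-1] / 9.0

    Ixx, Iyy, Ixy = box(gx * gx), box(gy * gy), box(gx * gy)
    det = Ixx * Iyy - Ixy ** 2          # determinant of the structure tensor
    trace = Ixx + Iyy
    return det - k * trace ** 2

def masked_mutual_info(a, b, mask, bins=16):
    """Mutual information of two images restricted to the masked pixels."""
    h, _, _ = np.histogram2d(a[mask], b[mask], bins=bins)
    p = h / h.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())
```

Restricting the histogram to high-response voxels is the spirit of computing the MI cost only where there is strong spatial variation.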
  • Source
    ABSTRACT: Imaging and quantification of tongue anatomy is helpful in surgical planning, post-operative rehabilitation of tongue cancer patients, and the study of how humans adapt and learn new strategies for breathing, swallowing and speaking to compensate for changes in function caused by disease, medical interventions or aging. In vivo acquisition of high-resolution three-dimensional (3D) magnetic resonance (MR) images with clearly visible tongue muscles is currently not feasible because of breathing and involuntary swallowing motions that occur over lengthy imaging times. However, recent advances in image reconstruction now allow the generation of super-resolution 3D MR images from sets of orthogonal images, acquired at a high in-plane resolution and combined using super-resolution techniques. This paper presents, to the best of our knowledge, the first attempt towards automatic tongue muscle segmentation from MR images. We compiled a database of ten super-resolution 3D MR images, in which the genioglossus and inferior longitudinalis tongue muscles were manually segmented and annotated with landmarks. We demonstrate the feasibility of segmenting the muscles of interest automatically by applying the landmark-based game-theoretic framework (GTF), in which a landmark detector based on Haar-like features and an optimal assignment-based shape representation were integrated. The obtained segmentation results were validated against an independent manual segmentation performed by a second observer, as well as against B-splines and demons atlasing approaches. The segmentation performance resulted in mean Dice coefficients of 85.3%, 81.8%, 78.8% and 75.8% for the second observer, GTF, B-splines atlasing and demons atlasing, respectively. The obtained level of segmentation accuracy indicates that computerized tongue muscle segmentation may be used in surgical planning and treatment outcome analysis of tongue cancer patients, and in studies of normal subjects and subjects with speech and swallowing problems.
    Full-text · Article · Nov 2014 · Medical Image Analysis
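The Dice coefficient used for validation above is simple to state; a minimal reference implementation:

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary segmentation masks: 2|A∩B| / (|A|+|B|).

    1.0 means perfect agreement; the convention for two empty masks
    (returning 1.0) is a choice made here for the degenerate case.
    """
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * inter / total if total else 1.0
```

A mean Dice of 81.8% for the GTF thus means the automatic masks overlap the manual ones at roughly the level quoted for the second human observer (85.3%).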
  • Source

    Full-text · Article · Nov 2014
  •
    ABSTRACT: The production of speech includes considerable variability in speech gestures despite our perception of very repeatable sounds. Variability is seen in vocal tract shapes and tongue contours when different speakers produce the same sound. This study asks whether internal tongue motion patterns for a specific sound are similar across subjects, or whether they indicate multiple gestures. There are two variants of the sound /s/, which may produce two gestures or may represent a multitude of gestures. The first goal of this paper was to quantify internal tongue differences between these allophones in normal speakers. The second goal was to test how these differences are affected by subjects expected to have different speech gestures: normal controls and subjects who have had tongue cancer surgery. The study used tagged MRI to capture midsagittal tongue motion patterns and principal components analyses to identify patterns of variability that define subject groups and /s/ types. Results showed no motion differences between apical and laminal controls in either the tongue tip or whole tongue. These results did not support unique tongue behaviours for apical and laminal /s/. The apical patients, however, differed from all other speakers and were quite uniform as a group. They had no elevation and considerable downward/backward motion of the tongue tip. This was consistent with difficulty in maintaining the tip–blade region at the proper distance from the palate.
    No preview · Article · Oct 2014
  • Source
    ABSTRACT: This paper presents an early version of an open, extendable research and educational platform to support users in learning and mastering different types of rare singing. The platform is interfaced with a portable helmet to synchronously capture multiple signals during singing in a non-laboratory environment. Collected signals reflect articulatory movements and induced vibrations. The platform consists of four main modules: i) a capture and recording module, ii) a data replay (post-processing) module, iii) an acoustic auto-adaptation learning module, and iv) a 3D visualization sensory-motor learning module. Our demo will focus on the first two modules. The system has been tested on two rare endangered singing musical styles, the Corsican “Cantu in Paghjella” and the Byzantine hymns from Mount Athos, Greece. The versatility of the approach is further demonstrated by capturing a contemporary singing style known as “Human Beat Box.”
    Full-text · Conference Paper · Sep 2014
  •
    ABSTRACT: INTRODUCTION: Human swallowing and its disorders (dysphagia) are still poorly understood, and yet many speech-language pathologists (SLPs) need to be trained to recognize correct, incorrect, and potentially dangerous swallows. The anatomy of the head and neck region is notoriously complex and difficult to visualize and study. Currently, training programs that teach SLPs to recognize swallowing disorders use artistically derived animations of swallowing, rendered at fixed viewpoints, to help students visualize the anatomy of the head and neck region. This work improves on these animations by using state-of-the-art medical images to create a dynamic, interactive, 3D simulation of human swallowing. Images of a male subject during swallowing were captured in a single shot using a 320-slice CT scanner [Inamoto et al. 2011]. The images have very high spatial resolution (0.5 x 0.5 x 0.5 mm) but low temporal resolution (10 Hz). The low temporal resolution resulted in blurring of the fluid being swallowed, making automatic segmentation and visualization of the fluid difficult. APPROACH: A moving airway boundary was segmented from the medical images, and this in turn was used as the input to a Smoothed Particle Hydrodynamics (SPH) simulation. The low temporal resolution of the images meant that interpolation between time frames was required. In order to generate the moving airway, an initial surface mesh representation was created from the first 3D volume of the sequence using the commercial software package Amira. This initial mesh was deformably registered to subsequent volumes in the time sequence using the Blender modelling software. Using the “Sculpting” features of Blender keeps the mesh topology constant between time frames, which allows the airway positions to be interpolated between time frames by simply interpolating the vertex positions in time. In this work, cubic interpolation was used. The liquid being swallowed was simulated using a standard, viscous, weakly-compressible SPH formulation. There is good agreement between the predicted fluid positions and the positions shown in the images. CONCLUSIONS AND FUTURE WORK: Using the ArtiSynth simulation platform, one is now able to visualize the swallow in any 3D orientation, make use of transparencies, scrub back and forth through time, and arbitrarily slice the geometry in order to isolate a particular view of the airway. This work can be extended to allow SLPs-in-training to modify the airway boundary timings and visualize the results. Provided the simulations are adequately validated, this will provide an excellent platform for asking “what-if” types of questions and give a much deeper understanding of the importance of swallowing timings.
    No preview · Conference Paper · Aug 2014
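Because the sculpted meshes above keep the same topology, interpolating the airway between frames reduces to interpolating each vertex's position in time. One common cubic scheme is Catmull-Rom; the abstract says only "cubic interpolation", so treat this as an illustrative choice rather than the scheme actually used.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Cubic (Catmull-Rom) interpolation of mesh vertex positions between
    frames p1 and p2, using neighbouring frames p0 and p3. Valid here
    because a constant mesh topology means vertex i corresponds across
    all time frames; t runs from 0 (frame p1) to 1 (frame p2)."""
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)
```

The spline passes exactly through p1 and p2 and reproduces uniform motion exactly, so linearly moving vertices stay on their straight-line paths.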
  • Source
    ABSTRACT: Generic biomechanical models of the oral, pharyngeal and laryngeal structures have been adopted into the ArtiSynth simulation framework (www.artisynth.org). Forward-dynamics tracking of an FE model of the tongue was previously addressed through solving the inverse problem (Stavness, Lloyd, and Fels 2012). The estimated biomechanics were evaluated using either the average motion reported in the literature or those of a different subject. We expand the existing generic platform to allow for subject-specific simulations, in order to (1) better evaluate the simulated biomechanics, (2) investigate the inter-subject variability and (3) provide additional insight into speech production.
    Full-text · Conference Paper · May 2014
  • Source

    Full-text · Article · May 2014
  • Jonghye Woo · Fangxu Xing · Maureen Stone · Jerry L Prince
    ABSTRACT: The human tongue produces oromotor behaviors such as speaking, swallowing, and breathing, which are executed by deforming local functional units using a complex muscular array. Therefore, identifying functional units and understanding the mechanisms of coupling among them in relation to the underlying anatomical structures can aid significantly in the understanding of normal motor control and the development of clinical diagnoses and surgical procedures. Magnetic resonance imaging (MRI) has been widely used to observe detailed structures in the vocal tract and to measure internal tissue motion of the tongue. This work aims at determining functional units from tagged MRI, with muscle maps extracted from high-resolution MRI. A non-negative matrix factorization method with a sparsity constraint is utilized to extract an activation map for each tissue point using a set of motion quantities extracted from tagged MRI, including information from point trajectories (i.e., displacement, angle, and curvature) and strain. The activation map is then used to determine coherent regions using spectral clustering, revealing functional units and their relations to the underlying muscles. We test our algorithm on simple protrusion and speech tasks, demonstrating that the proposed algorithm can determine correlated patterns in the tissue-point trajectories and strain.
    No preview · Article · Apr 2014 · The Journal of the Acoustical Society of America
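The sparsity-constrained non-negative matrix factorization can be sketched with multiplicative updates carrying an L1 penalty on the activation matrix H. The dimensions, penalty form, and update rules below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def sparse_nmf(V, rank, sparsity=0.05, n_iter=300, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H with an L1 penalty
    on H via multiplicative updates. Sparse rows of H act like the
    per-tissue-point activation maps described in the abstract."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 1e-3
    H = rng.random((rank, n)) + 1e-3
    for _ in range(n_iter):
        # L1 penalty on H enters the update's denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
        # Fix the scale ambiguity: unit-sum columns of W, compensated in H.
        scale = W.sum(axis=0, keepdims=True)
        W /= scale
        H *= scale.T
    return W, H
```

Columns of V would hold the per-tissue-point motion quantities; clustering the columns of H (e.g. spectrally) then groups points with similar activations into candidate functional units.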
  •
    ABSTRACT: This study aims to ascertain the effects of tongue cancer surgery (glossectomy) on tongue motion during the speech sounds "s" and "sh." Subjects were one control and three glossectomy patients. The first patient had surgery closed with sutures. The second had sutures plus radiation, which produces fibrosis and stiffness. The third was closed with an external free flap, and is of particular interest since he has no direct motor control of the flap. Cine and tagged-MRI data were recorded in axial, coronal and sagittal orientations at 26 fps. 3D tissue point motion was tracked at every time-frame in the word. 3D displacement fields were calculated at each time-frame to show tissue motion during speech. A previous pilot study showed differences in "s" production [Pedersen et al., JASA (2013)]. Specifically, subjects differed in internal tongue motion pattern, and the flap patient had unusual genioglossus lengthening patterns. The "s" requires a midline tongue groove, which is challenging for the patients. This study continues that effort by adding the motion of "sh," because "sh" does not require a midline groove and may be easier for the patients to pronounce. We also add more muscles, to determine how they interact to produce successful motion. [This study was supported by NIH R01CA133015.].
    No preview · Article · Apr 2014 · The Journal of the Acoustical Society of America
  •
    ABSTRACT: Assessment of tongue muscle mechanics during speech helps interpret clinical observations and provides data that can predict optimal surgical outcomes. Magnetic resonance imaging (MRI) is a non-invasive method for imaging the tongue that provides information about anatomy and motion. In this work, we aim to develop a pipeline to track 4D (3D space with time) muscle mechanics in order to measure motion similarities and differences in normal and glossectomy speakers. The pipeline comprises several modules, including super-resolution volume reconstruction of high-resolution MRI (hMRI) and cine-MRI, deformable registration of hMRI with cine-MRI to establish muscle correspondences, tracking of tissue points using the incompressible motion estimation algorithm (IDEA) from tagged MRI, and calculation of muscle mechanics including displacement, rotation, and elongation. IDEA estimates 3D motion from the harmonic phase (HARP) motion obtained from tagged MRI. The proposed pipeline was evaluated on five subjects, including both normal and glossectomy speakers, yielding accurate tracking results as visually assessed. In addition, we were able to differentiate normal and abnormal muscle mechanics, potentially providing invaluable information for interpreting clinical observations and predicting surgical outcomes.
    No preview · Article · Apr 2014 · The Journal of the Acoustical Society of America
  • Xinhui Zhou · Jonghye Woo · Maureen Stone · Carol Espy-Wilson
    ABSTRACT: It is unclear in glossectomy whether a flap will improve or impair speech, at least in moderate-sized (T2) tongue tumors. To gain some insight into this question, we studied the speech production of a post-glossectomy speaker who had a T2 tumor surgically removed from the left side of his tongue and closed with a radial forearm free flap (RFFF). Our acoustic analysis showed that this speaker had a significantly smaller vowel space and a significantly higher center of gravity in "sh", but not in "s", compared with the averages of normal controls or post-glossectomy speakers with primary closures. Based on cine and tagged magnetic resonance (MR) images, we analyzed the vocal tract shapes of two vowels and two fricatives and studied the tongue motion in transitions between phonemes for this speaker and two controls. We will compare the vocal tract models between the flap patient and the controls. [This study was supported by NIH R01CA133015.].
    No preview · Article · Apr 2014 · The Journal of the Acoustical Society of America
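One common way to quantify the "vowel space" in such acoustic analyses is the area of the polygon spanned by the corner vowels in the (F1, F2) plane. The shoelace formula below is an illustrative metric, not necessarily the one used in the study.

```python
def vowel_space_area(formants):
    """Area of the vowel space polygon from (F1, F2) points in Hz,
    via the shoelace formula. Points must be given in polygon order
    (clockwise or counter-clockwise); a smaller area corresponds to
    a reduced vowel space."""
    n = len(formants)
    s = 0.0
    for i in range(n):
        x1, y1 = formants[i]
        x2, y2 = formants[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

Comparing this area between a patient and control averages is one way a "significantly smaller vowel space" can be made concrete.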

Publication Stats

1k Citations
113.82 Total Impact Points

Institutions

  • 1996-2016
    • University of Maryland, Baltimore
      • Department of Neural and Pain Sciences
      • Department of Orthodontics
      • Department of Medicine
      Baltimore, Maryland, United States
  • 2014
    • Loyola University Maryland
      Baltimore, Maryland, United States
  • 2013
    • University of Maryland, College Park
      CGS, Maryland, United States
  • 2012
    • NPS
      Sydney, New South Wales, Australia
  • 2008
    • Institute of Electrical and Electronics Engineers
      Washington, Washington, D.C., United States
  • 2004
    • Yale University
      • Haskins Laboratories
      New Haven, Connecticut, United States
  • 1998-2000
    • University of Delaware
      • Department of Computer and Information Sciences
      Newark, DE, United States