Maureen Stone

University of Maryland, Baltimore, Baltimore, Maryland, United States

Publications (77), 61.12 Total Impact Points

  • ABSTRACT: The production of speech includes considerable variability in speech gestures, despite our perception of very repeatable sounds. Variability is seen in vocal tract shapes and tongue contours when different speakers produce the same sound. This study asks whether internal tongue motion patterns for a specific sound are similar across subjects or whether they indicate multiple gestures. The sound /s/ has two variants, apical and laminal, which may reflect two distinct gestures or a multitude of gestures. The first goal of this paper was to quantify internal tongue differences between these allophones in normal speakers. The second goal was to test how these differences are affected in subjects expected to have different speech gestures: normal controls and subjects who have had tongue cancer surgery. The study used tagged MRI to capture midsagittal tongue motion patterns and principal component analysis (PCA) to identify patterns of variability that define subject groups and /s/ types. Results showed no motion differences between apical and laminal controls in either the tongue tip or the whole tongue. These results did not support unique tongue behaviours for apical and laminal /s/. The apical patients, however, differed from all other speakers and were quite uniform as a group. Their tongue tips showed no elevation and considerable downward and backward motion. This was consistent with difficulty in maintaining the tip–blade region at the proper distance from the palate.
    Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. 10/2014; 2(4).
  • ABSTRACT: This paper presents an early version of an open, extensible research and educational platform that supports users in learning and mastering different types of rare singing. The platform is interfaced with a portable helmet that synchronously captures multiple signals during singing in a non-laboratory environment. The collected signals reflect articulatory movements and induced vibrations. The platform consists of four main modules: i) a capture and recording module, ii) a data replay (post-processing) module, iii) an acoustic auto-adaptation learning module, and iv) a 3D visualization sensory-motor learning module. Our demo will focus on the first two modules. The system has been tested on two rare, endangered singing styles: the Corsican “Cantu in Paghjella” and the Byzantine hymns from Mount Athos, Greece. The versatility of the approach is further demonstrated by capturing a contemporary singing style known as “Human Beat Box.”
    Interspeech 2014 conference, Singapore; 09/2014
  • ABSTRACT: This study aims to ascertain the effects of tongue cancer surgery (glossectomy) on tongue motion during the speech sounds "s" and "sh." Subjects were one control and three glossectomy patients. The first patient's surgery was closed with sutures. The second had sutures plus radiation, which produces fibrosis and stiffness. The third was closed with an external free flap and is of particular interest, since he has no direct motor control of the flap. Cine and tagged-MRI data were recorded in axial, coronal, and sagittal orientations at 26 fps. 3D tissue point motion was tracked at every time frame in the word, and 3D displacement fields were calculated at each time frame to show tissue motion during speech. A previous pilot study showed differences in "s" production [Pedersen et al., JASA (2013)]. Specifically, subjects differed in internal tongue motion patterns, and the flap patient had unusual genioglossus lengthening patterns. The "s" requires a midline tongue groove, which is challenging for the patients. This study continues that effort by adding the motion of "sh," because "sh" does not require a midline groove and may be easier for the patients to pronounce. We also analyze additional muscles to determine how they interact to produce successful motion. [This study was supported by NIH R01CA133015.]
    The Journal of the Acoustical Society of America 04/2014; 135(4):2389. · 1.65 Impact Factor
  • ABSTRACT: Assessment of tongue muscle mechanics during speech helps interpret clinical observations and provides data that can predict optimal surgical outcomes. Magnetic resonance imaging (MRI) is a non-invasive method for imaging the tongue that provides information about anatomy and motion. In this work, we aim to develop a pipeline to track 4D (3D space plus time) muscle mechanics in order to measure motion similarities and differences in normal and glossectomy speakers. The pipeline comprises several modules: super-resolution volume reconstruction of high-resolution MRI (hMRI) and cine-MRI; deformable registration of hMRI with cine-MRI to establish muscle correspondences; tracking of tissue points from tagged MRI using the incompressible deformation estimation algorithm (IDEA); and calculation of muscle mechanics, including displacement, rotation, and elongation. IDEA estimates 3D motion from the harmonic phase (HARP) motion obtained from tagged MRI. The proposed pipeline was evaluated on five subjects, including both normal and glossectomy speakers, and yielded tracking results that were accurate on visual assessment. In addition, we were able to differentiate normal and abnormal muscle mechanics, potentially providing invaluable information for interpreting clinical observations and predicting surgical outcomes.
    The Journal of the Acoustical Society of America 04/2014; 135(4):2196. · 1.65 Impact Factor
  • ABSTRACT: The human tongue performs oromotor behaviors such as speaking, swallowing, and breathing, which are executed by deforming local functional units using a complex muscular array. Therefore, identifying functional units and understanding the mechanisms of coupling among them in relation to the underlying anatomical structures can aid significantly in understanding normal motor control and in developing clinical diagnoses and surgical procedures. Magnetic resonance imaging (MRI) has been widely used to observe detailed structures in the vocal tract and to measure internal tissue motion of the tongue. This work aims to determine functional units from tagged MRI, relating them to muscle maps extracted from high-resolution MRI. A non-negative matrix factorization method with a sparsity constraint is used to extract an activation map for each tissue point from a set of motion quantities derived from tagged MRI, including point-trajectory information (i.e., displacement, angle, and curvature) and strain. The activation map is then used to determine coherent regions using spectral clustering, revealing functional units and their relations to the underlying muscles. We test our algorithm on simple protrusion and speech tasks, demonstrating that the proposed algorithm can determine the correlated patterns of the tissue point tracking trajectories and strain.
    The Journal of the Acoustical Society of America 04/2014; 135(4):2196. · 1.65 Impact Factor
  • ABSTRACT: It is unclear whether, after glossectomy, a flap will improve or impair speech, at least for moderate-sized (T2) tongue tumors. To gain some insight into this question, we studied the speech production of a post-glossectomy speaker who had a T2 tumor surgically removed from the left side of his tongue, with closure by a radial forearm free flap (RFFF). Our acoustic analysis showed that this speaker had a significantly smaller vowel space and a significantly higher center of gravity in "sh", but not in "s", compared with the averages of normal controls or of post-glossectomy speakers with primary closures. Based on cine and tagged magnetic resonance (MR) images, we analyzed the vocal tract shapes of two vowels and two fricatives and studied tongue motion during phoneme transitions for this speaker and two controls. We will compare the vocal tract models of the flap patient and the controls. [This study was supported by NIH R01CA133015.]
    The Journal of the Acoustical Society of America 04/2014; 135(4):2195. · 1.65 Impact Factor
  • ABSTRACT: PURPOSE: Accurate tissue motion tracking within the tongue can help to diagnose and treat vocal tract related disorders, evaluate speech quality before and after surgery, and conduct various scientific studies. We compared tissue tracking results from four widely used deformable registration (DR) methods applied to cine-MRI with harmonic phase (HARP)-based tracking applied to tagged-MRI. METHOD: Ten subjects repeated the words "a geese" multiple times while sagittal images of the head were collected at 26 Hz, first in a tagged-MRI data set and then in a cine-MRI data set. HARP tracked the motion of eight specified tissue points in the tagged data set. Four DR methods, namely diffeomorphic demons and cubic B-spline free-form deformations with three different similarity measures, were used to track the same eight points in the cine-MRI data set. Individual points were tracked, and length changes of several muscles were calculated, using both the DR and HARP-based tracking methods. RESULTS: The DR tracking errors were non-systematic and varied in direction, amount, and timing both across and within speakers. Comparison of HARP and DR tracking with manual tracking showed better tracking results for HARP except at the tongue surface, where mistracking caused greater errors in HARP than in DR. CONCLUSIONS: Tissue point tracking using DR methods contains non-systematic errors within and across subjects, making it less successful than tagged-MRI tracking within the tongue. However, HARP sometimes mistracks points at the tongue surface of tagged MRI because of its limited bandpass filter and tag pattern fading, so DR on cine-MRI measures surface tissue points more successfully than HARP does. Therefore, a hybrid method is being explored.
    Journal of Speech Language and Hearing Research 02/2014; · 1.97 Impact Factor
  • ABSTRACT: Measuring the internal muscular motion and deformation of the tongue during natural human speech is of high interest to head and neck surgeons and speech-language pathologists. A pipeline for calculating 3D tongue motion from dynamic cine and tagged magnetic resonance (MR) images during speech has been developed. This paper presents a complete analysis of the global tongue motion during speech of eleven subjects (seven normal controls and four glossectomy patients), obtained through MR imaging and processed through the tongue motion analysis pipeline. The data are regularized into a common framework for comparison. A generalized two-step principal component analysis is used to show the major differences between patients' and controls' tongue motions. A test demonstrates the ability of this process to distinguish patient data from control data and shows the potential of the quantitative analysis that the tongue motion pipeline makes possible.
    Proceedings / IEEE International Symposium on Biomedical Imaging: from nano to macro. IEEE International Symposium on Biomedical Imaging 12/2013; 2013:816-819.
  • ABSTRACT: Accurate segmentation is an important preprocessing step for measuring the internal deformation of the tongue during speech and swallowing using 3D dynamic MRI. In an MRI stack, manually segmenting every 2D slice and time frame is time-consuming because of the large number of volumes captured over the entire task cycle. In this paper, we propose a semi-automatic segmentation workflow for processing 3D dynamic MRI of the tongue. The steps comprise seeding a few slices, propagating the seeds by deformable registration, and random walker segmentation of the temporal stack of images and of 3D super-resolution volumes. This method was validated on the tongues of two subjects carrying out the same speech task, with multi-slice 2D dynamic cine-MR images obtained in three orthogonal orientations and 26 time frames. The resulting semi-automatic segmentations of 52 volumes showed an average Dice similarity coefficient (DSC) of 0.9, with reduced segmented-volume variability compared with manual segmentations.
    Proceedings / IEEE International Symposium on Biomedical Imaging: from nano to macro. IEEE International Symposium on Biomedical Imaging 12/2013; 2013:1465-1468.
  • ABSTRACT: This study examined a control subject and three patients who had surgery to remove tongue cancer. One patient's surgery was closed with sutures, one with a radial forearm free flap reconstruction, and one with sutures plus radiation. This study aims to ascertain the effects of these traumas on internal and surface tongue motion during speech. The flap consists of soft tissue that is vascularized but not innervated; it increases tongue bulk but has no direct motor control. The other two patients have missing tissue and a scar where the cut regions were sewn together. This morphological change may make it harder to form proper palatal contacts. The supplemental radiation treatment may cause additional muscle stiffness due to fibrosis. The cine and tagged datasets were recorded in axial, coronal, and sagittal orientations using identical parameters so that their data could be overlaid. Each dataset was reconstructed into 3D volumes, one for each time frame in the word. From each cine-MRI volume, the 3D tongue surface was segmented and used as a "mask" in the tagged-MRI volume. In the tagged-MRI volumes, 3D displacement fields were calculated to show the motion of each tissue point inside the tongue mask during the speech task.
    The Journal of the Acoustical Society of America 11/2013; 134(5):4168. · 1.65 Impact Factor
  • ABSTRACT: Magnetic resonance imaging has been widely used in speech production research. Often only one image stack (sagittal, axial, or coronal) is used for vocal tract modeling, so complementary information from the other available stacks is not utilized. To overcome this, a recently developed super-resolution technique was applied to integrate three orthogonal low-resolution stacks into one isotropic volume. The results on vowels show that the super-resolution volume produces better vocal tract visualization than any of the low-resolution stacks. Its derived area functions generally produce formant predictions closer to the ground truth, particularly for formants sensitive to area perturbations at constrictions.
    The Journal of the Acoustical Society of America 06/2013; 133(6):EL439-45. · 1.65 Impact Factor
  • ABSTRACT: Magnetic resonance imaging (MRI) is a widely used technology for non-invasive tongue imaging. MRI can detail tongue and muscle shapes and their variability in both healthy and diseased populations. Such detail can aid significantly in interpreting muscle interactions in the tongue and their role in normal and disordered speech production. However, the size and shape of the tongue and muscles may vary from one subject to another, and there is no comprehensive, systematic framework for assessing the differences and variability of the tongue and muscles in a normalized space. In the present work, we built a multi-subject atlas from structural MRI of 20 normal subjects to offer a normalized space onto which all subjects from a target population can be mapped and compared. To find accurate one-to-one correspondences, we bounded the tongue so that each volume contained the same vocal tract features. For registration, we used symmetric diffeomorphic image registration with cross-correlation, which is widely used in brain image analysis. The atlas facilitates template-based segmentation for assigning anatomical labels to the images. The tongue atlas is unprecedented and opens new vistas for exploring normal and diseased oral structures and function.
    The Journal of the Acoustical Society of America 05/2013; 133(5):3567. · 1.65 Impact Factor
  • ABSTRACT: Production of fricatives involves a narrow supraglottal constriction along the vocal tract. Air flows through the constriction and generates turbulent noise source(s) by impinging on obstacles downstream. In post-glossectomy speakers, the production of /s/ and /sh/ is often problematic, mainly because the surgery changes tongue properties such as volume, motility, and symmetry, preventing the tongue from creating proper constrictions. The purpose of this study was to gain insight into how the vocal tract is shaped for abnormal /s/ and /sh/ and what the corresponding acoustic consequences are. Based on cine magnetic resonance images, we built 3D vocal tract models for /s/ and /sh/ from two post-glossectomy speakers (one with abnormal /s/ and the other with abnormal /sh/). Because part of the tongue is missing, the reconstructed vocal tracts are asymmetric, with either an airflow bypass or a side branch formed near the constrictions. Two coupled physics submodels are included in the 3D FEM acoustic simulation: incompressible potential flow for the mean airflow and aeroacoustics for the distributed noise sources. The resulting acoustic spectra and the acoustic roles of the airflow bypass or side branch will be discussed. [This study was supported by NIH R01CA133015.]
    The Journal of the Acoustical Society of America 05/2013; 133(5):3606. · 1.65 Impact Factor
  • ABSTRACT: Understanding the deformation of the tongue during human speech is important for head and neck surgeons and speech and language scientists. Tagged magnetic resonance (MR) imaging can be used to image 2D motion, and data from multiple image planes can be combined via post-processing to yield estimates of 3D motion. However, because it lacks boundary information, this approach yields inaccurate estimates near the tongue surface. This paper describes a method that combines two sources of information to improve the estimation of 3D tongue motion. The method uses the harmonic phase (HARP) algorithm to extract motion from tags and diffeomorphic demons to provide surface deformation. It then uses an incompressible deformation estimation algorithm to incorporate both sources of displacement information into an estimate of 3D whole-tongue motion. Experimental results show that combining the information improves motion estimation near the tongue surface, a region previously reported as problematic in HARP analysis, while preserving accurate internal motion estimates. Results on both normal and abnormal tongue motions are shown.
    Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention. 01/2013; 16(Pt 3):41-8.
  • ABSTRACT: One of the issues in speech motor control is the nature of the variance that occurs when multiple speakers produce a single speech task. Do they all essentially use the same gesture, fine-tuned to their own anatomy, dialect, etc., or are there quite different, motor-equivalent ways to produce the same speech sound? In American English, /s/ is known to be produced in two ways: apical or laminal. Apical /s/ primarily elevates the tongue tip, while laminal /s/ uses the tip and blade. Both gestures are found frequently in normal speakers. The present study uses principal component analysis of midsagittal velocity fields to identify the patterns of variance in the internal tongue motion of 10 normal speakers. Palate height will also be examined, as preliminary evidence suggests that low-palate speakers prefer apical /s/, while high-palate speakers use either. Tagged MRI was used to record 'a geese', and the motion between /g/ and /s/ was studied for the amount and direction of variance. The goal is to identify stable features of the motion and the effects of /s/ type and palate height.
    The Journal of the Acoustical Society of America 09/2012; 132(3):2087. · 1.65 Impact Factor
  • ABSTRACT: Magnetic resonance imaging has been widely used in speech production research for vocal tract reconstruction and modeling. To observe detailed structures in the vocal tract, three orthogonal image stacks (sagittal, coronal, and axial) are usually acquired. Due to many constraints, each stack typically has an in-plane resolution much better than its out-of-plane resolution. Vocal tract modeling is usually based on just one of these three stacks, so the additional useful information revealed by the other two datasets is excluded from the vocal tract model. This study aims to improve vocal tract reconstruction and modeling by integrating information from all three stacks. To do so, a recently developed super-resolution reconstruction method that generates an isotropic image volume is used to integrate the three orthogonal stacks. Based on the ATR MRI database of vowel production, vocal tract models were built from high-resolution, low-resolution (simulated through downsampling), and super-resolution MR images and compared. The improvement in vocal tract modeling due to the super-resolution technique will be demonstrated on five vowels in terms of visualization and acoustic responses. [This research was supported by NIH R01 CA133015.]
    The Journal of the Acoustical Society of America 09/2012; 132(3):2088. · 1.65 Impact Factor
  • ABSTRACT: Glossectomy is a surgical procedure to remove a cancerous tumor of the tongue. After glossectomy, the tongue is sutured closed or a flap is inserted to reconstruct the tongue volume. As a result, the properties of the tongue are affected by the surgery to a greater or lesser degree. These changes may also affect the speech production abilities of the post-glossectomy speaker. This study examined the production of the fricative consonants /s/ and /sh/ in normal and post-glossectomy speakers. The data analyzed consisted of audio and magnetic resonance images from dozens of normal and glossectomy speakers. An acoustic analysis showed that the average centers of gravity of /s/ and /sh/ in glossectomy speakers are significantly lower than in normal speakers. This difference may be explained by a more posterior constriction in glossectomees due to the surgery. Examination of tongue shapes in midsagittal MR images showed that glossectomees tend to produce laminal /s/ more often than apical /s/. 3D vocal tracts of /s/ and /sh/ were reconstructed for three glossectomy speakers whose /s/ and /sh/ cannot easily be discriminated in listening tests. Details of the 3D vocal tract shapes, along with their acoustic implications, will be discussed for the glossectomy and normal speakers. [This research was supported by NIH R01 CA133015.]
    The Journal of the Acoustical Society of America 09/2012; 132(3):2089. · 1.65 Impact Factor
  • ABSTRACT: Recent estimates suggest that 34,000 people are diagnosed with oral cancer each year. The lateral border of the tongue is one of the most common sites for lingual cancer, and surgery there resects both muscles and nerves leading to the tongue tip. One sound that is typically impaired is /s/, because it requires precise tongue shape, location, and palatal contact, and small errors are acoustically salient. This study uses principal component analysis (PCA) to compare motion patterns of the internal tongue during the word 'geese'. The study will compare 4 subjects: one glossectomy patient and a matched control who produce an apical /s/, and another pair that produces a laminal /s/. A PCA will be run for all subjects and the principal patterns of variance will be determined. These patterns will be used to identify stable features of the motion and variations due to /s/ type and patient versus control status. The complexity of each subject's motion pattern will be studied to determine whether the patients have more variability in their motion due to strategies of motor adaptation, or less variability due to scarring and increased rigidity.
    The Journal of the Acoustical Society of America 09/2012; 132(3):2090. · 1.65 Impact Factor
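
The semi-automatic segmentation abstract reports agreement with manual segmentation as an average Dice similarity coefficient (DSC) of 0.9. For readers unfamiliar with the metric, here is a minimal sketch of DSC = 2|A ∩ B| / (|A| + |B|), assuming each segmentation is represented as a set of voxel coordinates. It is an illustration, not the authors' code; the function name `dice_similarity` and the empty-mask convention are hypothetical choices.

```python
from typing import AbstractSet, Hashable


def dice_similarity(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) of two binary masks.

    Each mask is assumed (for this sketch) to be a set of voxel coordinates,
    e.g. (slice, row, column) tuples labelled as tongue.
    """
    total = len(a) + len(b)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2 * len(a & b) / total


if __name__ == "__main__":
    manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
    automatic = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
    # 3 shared voxels out of 4 + 4 labelled voxels -> 2 * 3 / 8
    print(dice_similarity(manual, automatic))  # 0.75
```

A DSC of 1 means the two masks are identical and 0 means they do not overlap at all, so an average of 0.9 indicates close but not perfect agreement.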
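
Two of the acoustic abstracts (the flap-patient study and the /s/ and /sh/ study of normal and post-glossectomy speakers) compare fricatives by their spectral center of gravity. A common definition, assumed here rather than taken from the abstracts, is the power-weighted mean frequency of an analysis frame, CoG = Σ f_k |X_k|² / Σ |X_k|². The following is a minimal Python sketch under that assumption; the function name `spectral_center_of_gravity` is hypothetical, and the authors' windowing and weighting choices are not stated.

```python
import cmath
import math
from typing import Sequence


def spectral_center_of_gravity(samples: Sequence[float], sample_rate: float) -> float:
    """Power-weighted mean frequency (Hz) of one analysis frame.

    Sketch only: a naive O(N^2) DFT keeps it dependency-free, and no
    window is applied; a real analysis would window the frame and use an FFT.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty frame")
    weighted_sum = 0.0
    total_power = 0.0
    # For a real-valued signal, bins 0..N/2 cover all non-negative frequencies.
    for k in range(n // 2 + 1):
        x_k = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(x_k) ** 2          # weight each bin by |X_k|^2 (assumed)
        freq_hz = k * sample_rate / n  # center frequency of bin k
        weighted_sum += freq_hz * power
        total_power += power
    if total_power == 0.0:
        raise ValueError("silent frame has no defined center of gravity")
    return weighted_sum / total_power


if __name__ == "__main__":
    fs = 8000.0
    n = 64
    # Equal-amplitude tones at 1000 Hz and 2000 Hz (exact DFT bins 8 and 16).
    frame = [
        math.sin(2 * math.pi * 1000 * t / fs) + math.sin(2 * math.pi * 2000 * t / fs)
        for t in range(n)
    ]
    print(f"{spectral_center_of_gravity(frame, fs):.1f}")  # 1500.0
```

Moving spectral energy toward lower frequencies lowers the value, which is how a lower center of gravity for a fricative shows up in this measure.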

Publication Stats

309 Citations
61.12 Total Impact Points


  • 2002–2014
    • University of Maryland, Baltimore
      • Department of Neural and Pain Sciences
      • Department of Orthodontics
      Baltimore, Maryland, United States
  • 2012
    • University of Maryland, College Park
      • Department of Electrical & Computer Engineering
      College Park, MD, United States
  • 2005–2009
    • Johns Hopkins University
      • Department of Cognitive Science
      • Department of Electrical and Computer Engineering
      Baltimore, Maryland, United States
    • University of Delaware
      • Department of Computer and Information Sciences
      Newark, DE, United States
  • 2008
    • Institute of Electrical and Electronics Engineers
      Washington, D.C., United States
  • 2007
    • Indiana University Bloomington
      • Department of Psychological and Brain Sciences
      Bloomington, IN, United States
  • 2006
    • Japan Advanced Institute of Science and Technology
      Ishikawa, Japan