ABSTRACT: Magnetic resonance imaging (MRI) is an essential tool in the study of muscle anatomy and functional activity in the tongue. Objective assessment of similarities and differences in tongue structure and function has been performed using unnormalised data, but this is biased by the differences in size, shape and orientation of the structures. To remedy this, we propose a methodology to build a 3D vocal tract atlas based on structural MRI volumes from 20 normal subjects. We first constructed high-resolution volumes from three orthogonal stacks. We then removed extraneous data so that all 3D volumes contained the same anatomy. We used an unbiased diffeomorphic groupwise registration using a cross-correlation similarity metric. Principal component analysis was applied to the deformation fields to create a statistical model from the atlas. Various evaluations and applications were carried out to show the behaviour and utility of the atlas.
Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. 01/2015; 3(1).
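As an illustration of the statistical-model step in the abstract above, the following is a minimal sketch of principal component analysis applied to a stack of deformation fields. The function names (`deformation_pca`, `synthesize`), the array layout and the number of modes are assumptions made for this example only; the abstract does not specify how the authors implemented this step.

```python
import numpy as np

def deformation_pca(fields, n_modes=3):
    """PCA over deformation fields.

    fields: array of shape (n_subjects, ...), one deformation field per subject
            (hypothetical layout, e.g. (n_subjects, X, Y, Z, 3)).
    Returns the mean field (flattened), the first n_modes principal
    directions (rows), and the variance explained by each mode.
    """
    X = np.asarray(fields, dtype=float).reshape(len(fields), -1)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data matrix: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (X.shape[0] - 1)
    return mean, Vt[:n_modes], variances[:n_modes]

def synthesize(mean, modes, variances, coeffs):
    """Build a field as mean + sum_k coeffs[k] * sqrt(variances[k]) * modes[k]."""
    coeffs = np.asarray(coeffs, dtype=float)
    return mean + (coeffs * np.sqrt(variances)) @ modes
```

Setting all coefficients to zero returns the mean deformation; varying one coefficient at a time (for example between -2 and +2 standard deviations) shows the shape variation captured by that mode.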
ABSTRACT: Multimodal image registration is a class of algorithms to find correspondence from different modalities. Since different modalities do not exhibit the same characteristics, finding accurate correspondence still remains a challenge. In order to deal with this, mutual information (MI) based registration has been a preferred choice as MI is based on the statistical relationship between both volumes to be registered. However, MI has some limitations. First, MI based registration often fails when there are local intensity variations in the volumes. Second, MI only considers the statistical intensity relationships between both volumes and ignores the spatial and geometric information about the voxel. In this work, we propose to address these limitations by incorporating spatial and geometric information via a 3D Harris operator. Specifically, we focus on the registration between a high-resolution image and a low-resolution image. The MI cost function is computed in the regions where there are large spatial variations, such as corners or edges. In addition, the MI cost function is augmented with geometric information derived from the 3D Harris operator applied to the high-resolution image. The robustness and accuracy of the proposed method were demonstrated using experiments on synthetic and clinical data including the brain and the tongue. The proposed method provided accurate registration and yielded better performance than standard registration methods.
IEEE Transactions on Image Processing 12/2014; · 3.11 Impact Factor
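For readers unfamiliar with the MI cost function mentioned in the abstract above, here is a minimal sketch of mutual information estimated from a joint intensity histogram. The optional `mask` argument is a hypothetical stand-in for restricting the computation to selected regions; it does not implement the 3D Harris operator or the geometric augmentation the abstract describes, and the bin count is an arbitrary assumption.

```python
import numpy as np

def mutual_information(a, b, bins=32, mask=None):
    """Histogram-based mutual information (in nats) between two images.

    mask: optional boolean array selecting the voxels to include
          (hypothetical; e.g. voxels with large spatial variation).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if mask is not None:
        a, b = a[mask], b[mask]
    # Joint histogram of paired intensities, normalised to a joint probability.
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

A registration method would evaluate this quantity repeatedly while adjusting the transformation, keeping the alignment that maximises it.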
ABSTRACT: Imaging and quantification of tongue anatomy is helpful in surgical planning, post-operative rehabilitation of tongue cancer patients, and the study of how humans adapt and learn new strategies for breathing, swallowing and speaking to compensate for changes in function caused by disease, medical interventions or aging. In vivo acquisition of high-resolution three-dimensional (3D) magnetic resonance (MR) images with clearly visible tongue muscles is currently not feasible because of breathing and involuntary swallowing motions that occur over lengthy imaging times. However, recent advances in image reconstruction now allow the generation of super-resolution 3D MR images from sets of orthogonal images, acquired at a high in-plane resolution and combined using super-resolution techniques. This paper presents, to the best of our knowledge, the first attempt towards automatic tongue muscle segmentation from MR images. We devised a database of ten super-resolution 3D MR images, in which the genioglossus and inferior longitudinalis tongue muscles were manually segmented and annotated with landmarks. We demonstrate the feasibility of segmenting the muscles of interest automatically by applying the landmark-based game-theoretic framework (GTF), where a landmark detector based on Haar-like features and an optimal assignment-based shape representation were integrated. The obtained segmentation results were validated against an independent manual segmentation performed by a second observer, as well as against B-splines and demons atlasing approaches. The segmentation performance resulted in mean Dice coefficients of 85.3%, 81.8%, 78.8% and 75.8% for the second observer, GTF, B-splines atlasing and demons atlasing, respectively.
The obtained level of segmentation accuracy indicates that computerized tongue muscle segmentation may be used in surgical planning and treatment outcome analysis of tongue cancer patients, and in studies of normal subjects and subjects with speech and swallowing problems.
Medical Image Analysis 11/2014; 20(1). · 3.68 Impact Factor
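The Dice coefficient used to report segmentation accuracy in the abstract above is twice the overlap of two masks divided by the sum of their sizes. A minimal sketch follows; the function name and the convention of returning 1.0 for two empty masks are assumptions of this example.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom
```

Multiplying the result by 100 gives the percentage form in which the abstract states its scores.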
ABSTRACT: The production of speech includes considerable variability in speech gestures despite our perception of very repeatable sounds. Variability is seen in vocal tract shapes and tongue contours when different speakers produce the same sound. This study asks whether internal tongue motion patterns for a specific sound are similar across subjects, or whether they indicate multiple gestures. There are two variants of the sound /s/, which may produce two gestures or may represent a multitude of gestures. The first goal of this paper was to quantify internal tongue differences between these allophones in normal speakers. The second goal was to test how these differences are affected by subjects expected to have different speech gestures: normal controls and subjects who have had tongue cancer surgery. The study used tagged MRI to capture midsagittal tongue motion patterns and principal components analyses to identify patterns of variability that define subject groups and /s/ types. Results showed no motion differences between apical and laminal controls in either the tongue tip or whole tongue. These results did not support unique tongue behaviours for apical and laminal /s/. The apical patients, however, differed from all other speakers and were quite uniform as a group. They had no elevation and considerable downward/backward motion of the tongue tip. This was consistent with difficulty in maintaining the tip–blade region at the proper distance from the palate.
Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. 10/2014; 2(4).
ABSTRACT: This paper presents an early version of an open extendable research and educational platform to support users in learning and mastering the different types of rare singing. The platform is interfaced with a portable helmet to synchronously capture multiple signals during singing in a non-laboratory environment. Collected signals reflect articulatory movements and induced vibrations. The platform consists of four main modules: i) a capture and recording module, ii) a data replay (post processing) module, iii) an acoustic auto adaptation learning module, and iv) a 3D visualization sensory motor learning module. Our demo will focus on the first two modules.
The system has been tested on two rare endangered singing musical styles, the Corsican “Cantu in Paghjella”, and the Byzantine hymns from Mount Athos, Greece. The versatility of the approach is further demonstrated by capturing a contemporary singing style known as “Human Beat Box.”
ABSTRACT: Generic biomechanical models of the oral, pharyngeal and laryngeal structures have been adopted into the ArtiSynth simulation framework (www.artisynth.org). Forward-dynamics tracking of an FE model of the tongue was previously addressed through solving the inverse problem (Stavness, Lloyd, and Fels 2012). The estimated biomechanics were evaluated using either the average motion reported in the literature or those of a different subject. We expand the existing generic platform to allow for subject-specific simulations, in order to (1) better evaluate the simulated biomechanics, (2) investigate the inter-subject variability and (3) provide additional insight into speech production.
10th International Conference in Speech Production; 05/2014
ABSTRACT: It is unclear in glossectomy whether a flap will improve or impair speech, at least in the moderate sized (T2) tongue tumors. To gain some insights into this question, we studied the speech production of a post-glossectomy speaker, who had a T2 tumor surgically removed from the left side of his tongue and closed with a radial forearm free flap (RFFF). Our acoustic analysis showed that this speaker had a significantly smaller vowel space and a significantly higher center of gravity in "sh", but not in "s", compared with the averages of normal controls or post-glossectomy speakers with primary closures. Based on cine and tagged magnetic resonance (MR) images, we analyzed the vocal tract shapes of two vowels and two fricatives and studied the tongue motion during phoneme transitions in this speaker and two controls. We will compare the vocal tract models between the flap patient and the controls. [This study was supported by NIH R01CA133015.]
The Journal of the Acoustical Society of America 04/2014; 135(4):2195. · 1.65 Impact Factor
ABSTRACT: This study aims to ascertain the effects of tongue cancer surgery (glossectomy) on tongue motion during the speech sounds "s" and "sh." Subjects were one control and three glossectomy patients. The first patient had surgery closed with sutures. The second had sutures plus radiation, which produces fibrosis and stiffness. The third was closed with an external free flap, and is of particular interest since he has no direct motor control of the flap. Cine and tagged-MRI data were recorded in axial, coronal and sagittal orientations at 26 fps. 3D tissue point motion was tracked at every time-frame in the word. 3D displacement fields were calculated at each time-frame to show tissue motion during speech. A previous pilot study showed differences in "s" production [Pedersen et al., JASA (2013)]. Specifically, subjects differed in internal tongue motion pattern, and the flap patient had unusual genioglossus lengthening patterns. The "s" requires a midline tongue groove, which is challenging for the patients. This study continues that effort by adding the motion of "sh," because "sh" does not require a midline groove and may be easier for the patients to pronounce. We also add more muscles, to determine how they interact to produce successful motion. [This study was supported by NIH R01CA133015.]
The Journal of the Acoustical Society of America 04/2014; 135(4):2389. · 1.65 Impact Factor
ABSTRACT: Assessment of tongue muscle mechanics during speech helps interpret clinical observations and provides data that can predict optimal surgical outcomes. Magnetic resonance imaging (MRI) is a non-invasive method for imaging the tongue that provides information about anatomy and motion. In this work, we aim to develop a pipeline to track 4D (3D space with time) muscle mechanics in order to measure motion similarities and differences in normal and glossectomy speakers. The pipeline comprises several modules: super-resolution volume reconstruction of high-resolution MRI (hMRI) and cine-MRI, deformable registration of hMRI with cine-MRI to establish muscle correspondences, tracking of tissue points from tagged MRI using the incompressible motion estimation algorithm (IDEA), and calculation of muscle mechanics including displacement, rotation, elongation, etc. IDEA estimates 3D motion from the harmonic phase (HARP) motion obtained from tagged MRI. The proposed pipeline was evaluated on five subjects including both normal and glossectomy speakers, yielding accurate tracking results as visually assessed. In addition, we were able to differentiate normal and abnormal muscle mechanics, potentially providing invaluable information for interpreting clinical observations and predicting surgical outcomes.
The Journal of the Acoustical Society of America 04/2014; 135(4):2196. · 1.65 Impact Factor
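One of the muscle mechanics quantities named in the abstract above, elongation, can be illustrated with a short sketch. It assumes, purely for this example, that a muscle is represented as an ordered chain of tracked tissue points and that elongation is the relative change in chain length after the displacements are applied; the abstract does not state the definition the authors used, and both function names are hypothetical.

```python
import numpy as np

def polyline_length(points):
    """Total length of an ordered chain of 3D points, shape (n_points, 3)."""
    p = np.asarray(points, dtype=float)
    return float(np.linalg.norm(np.diff(p, axis=0), axis=1).sum())

def elongation(ref_points, displacements):
    """Relative length change of the chain after adding per-point displacements.

    Positive values mean lengthening, negative values mean shortening.
    """
    ref = np.asarray(ref_points, dtype=float)
    cur = ref + np.asarray(displacements, dtype=float)
    l0 = polyline_length(ref)
    return (polyline_length(cur) - l0) / l0
```

Evaluating `elongation` at every time-frame of a tracked displacement sequence would give a lengthening curve for the chain over the course of the word.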
ABSTRACT: The human tongue produces oromotor behaviors such as speaking, swallowing, and breathing, which are executed by deforming local functional units using a complex muscular array. Therefore, identifying functional units and understanding the mechanisms of coupling among them in relation to the underlying anatomical structures can aid significantly in the understanding of normal motor control and the development of clinical diagnoses and surgical procedures. Magnetic resonance imaging (MRI) has been widely used to observe detailed structures in the vocal tract and to measure internal tissue motion of the tongue. This work aims to determine the functional units from tagged MRI, with muscle maps extracted from high-resolution MRI. A non-negative matrix factorization method with a sparsity constraint is utilized to extract an activation map for each tissue point using a set of motion quantities extracted from tagged MRI including information from point trajectories (i.e., displacement, angle, and curvature) and strain. The activation map is then used to determine the coherent region using spectral clustering, revealing functional units and their relations to the underlying muscles. We test our algorithm on simple protrusion and speech tasks, demonstrating that the proposed algorithm can determine the correlated patterns of the tissue point tracking trajectories and strain.
The Journal of the Acoustical Society of America 04/2014; 135(4):2196. · 1.65 Impact Factor
ABSTRACT: PURPOSE Accurate tissue motion tracking within the tongue can help to diagnose and treat vocal tract related disorders, evaluate speech quality before and after surgery, and conduct various scientific studies. We have compared tissue tracking results from four widely used deformable registration (DR) methods applied to Cine-MRI with harmonic phase (HARP)-based tracking applied to tagged-MRI. METHOD Ten subjects repeated the words "a geese" multiple times while sagittal images of the head were collected at 26 Hz, first in a tagged-MRI data set, and then in a Cine-MRI data set. HARP tracked the motion of eight specified tissue points in the tagged data set. Four DR methods including diffeomorphic demons and free-form deformations based on cubic B-spline with three different similarity measures were used to track the same eight points in the Cine-MRI data set. Individual points were tracked and length changes of several muscles were calculated using the DR and HARP based tracking methods. RESULTS Results showed that the DR tracking errors were non-systematic and varied in direction, amount, and timing across speakers and within speakers. Comparison of HARP and DR tracking with manual tracking showed better tracking results for HARP except at the tongue surface, where mistracking caused greater errors in HARP than DR. CONCLUSIONS Tissue point tracking using DR tracking methods contains non-systematic tracking errors within and across subjects, making it less successful than tagged-MRI tracking within the tongue. However, HARP sometimes mistracks points at the tongue surface of tagged MRI due to its limited bandpass filter and tag pattern fading, so that DR has better success measuring surface tissue points on Cine-MRI than HARP does. Therefore a hybrid method is being explored.
Journal of Speech Language and Hearing Research 02/2014; · 1.97 Impact Factor
ABSTRACT: Tongue motion during speech and swallowing involves synergies of locally deforming regions, or functional units. Motion clustering during tongue motion can be used to reveal the tongue's intrinsic functional organization. A novel matrix factorization and clustering method for tissues tracked using tagged magnetic resonance imaging (tMRI) is presented. Functional units are estimated using a graph-regularized sparse non-negative matrix factorization framework, learning latent building blocks and the corresponding weighting map from motion features derived from tissue displacements. Spectral clustering using the weighting map is then performed to determine the coherent regions (i.e., functional units) defined by the tongue motion. Two-dimensional image data is used to verify that the proposed algorithm clusters the different types of images accurately. Three-dimensional tMRI data from five subjects carrying out simple non-speech/speech tasks are analyzed to show how the proposed approach defines a subject/task-specific functional parcellation of the tongue in localized regions.
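The two-stage idea in the abstract above (factorize non-negative motion features, then cluster the weighting map) can be sketched in simplified form. This is not the authors' method: the sketch uses plain multiplicative-update NMF with an L1 penalty on the weights and omits graph regularization entirely, and the clustering step is a two-way spectral split using the Fiedler vector rather than a general k-way spectral clustering. All function names, the column normalisation heuristic and the median-distance kernel width are assumptions of this example.

```python
import numpy as np

def sparse_nmf(V, k, sparsity=0.1, n_iter=500, seed=0):
    """Factorize non-negative V (features x points) as W @ H.

    W holds k building blocks (columns); H holds the weighting map (k x points).
    An L1 penalty of strength `sparsity` on H encourages sparse weights.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-9
    n_feat, n_pts = V.shape
    W = rng.random((n_feat, k)) + eps
    H = rng.random((k, n_pts)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Heuristic: unit-norm building blocks, rescaling H so W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

def spectral_bipartition(features):
    """Split points (rows of `features`) into two groups with the Fiedler vector."""
    X = np.asarray(features, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(d2[d2 > 0])          # kernel width from median distance
    A = np.exp(-d2 / (2.0 * sigma2))        # Gaussian affinity matrix
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # Symmetric normalised graph Laplacian.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt       # second-smallest eigenvector
    return (fiedler > 0).astype(int)
```

In this simplified setting, the weighting map would be clustered by calling `spectral_bipartition(H.T)`, so that each tissue point is described by its k weights.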
ABSTRACT: Accurate segmentation is an important preprocessing step for measuring the internal deformation of the tongue during speech and swallowing using 3D dynamic MRI. In an MRI stack, manual segmentation of every 2D slice and time frame is time-consuming due to the large number of volumes captured over the entire task cycle. In this paper, we propose a semi-automatic segmentation workflow for processing 3D dynamic MRI of the tongue. The steps comprise seeding a few slices, seed propagation by deformable registration, random walker segmentation of the temporal stack of images and 3D super-resolution volumes. This method was validated on the tongue of two subjects carrying out the same speech task with multi-slice 2D dynamic cine-MR images obtained at three orthogonal orientations and 26 time frames. The resulting semi-automatic segmentations of 52 volumes showed an average Dice similarity coefficient (DSC) score of 0.9 with reduced segmented volume variability compared to manual segmentations.
Proceedings / IEEE International Symposium on Biomedical Imaging: from nano to macro. IEEE International Symposium on Biomedical Imaging 12/2013; 2013(8):1465-1468.
ABSTRACT: Measuring the internal muscular motion and deformation of the tongue during natural human speech is of high interest to head and neck surgeons and speech language pathologists. A pipeline for calculating 3D tongue motion from dynamic cine and tagged Magnetic Resonance (MR) images during speech has been developed. This paper presents the result of a complete analysis of eleven subjects' (seven normal controls and four glossectomy patients) global tongue motion during speech obtained through MR imaging and processed through the tongue motion analysis pipeline. The data is regularized into the same framework for comparison. A generalized two-step principal component analysis is used to show the major difference between patients' and controls' tongue motions. A test is performed to demonstrate the ability of this process to distinguish patient data from control data and to show the potential power of quantitative analysis that the tongue motion pipeline can achieve.
Proceedings / IEEE International Symposium on Biomedical Imaging: from nano to macro. IEEE International Symposium on Biomedical Imaging 12/2013; 2013:816-819.
ABSTRACT: This study examined a control subject and three patients who had surgery to remove tongue cancer. One patient's surgery was closed with sutures, one with a radial forearm free flap reconstruction, and one with sutures plus radiation. This study aims to ascertain the effects of these traumas on internal and surface tongue motion during speech. The flap consists of soft tissue that is vascularized, but not innervated; it increases tongue bulk, but has no direct motor control. The other two patients have missing tissue and a scar where the cut regions were sewn together. This morphological change may increase difficulty creating properly formed palatal contacts. The supplemental radiation treatment may cause additional muscle stiffness due to fibrosis. The cine and tagged datasets were recorded in axial, coronal, and sagittal orientations using identical parameters so their data could be overlaid. Each dataset was reconstructed into 3D volumes, one for each time-frame in the word. From each cine-MRI volume, the 3D tongue surface was segmented and used as a "mask" in the tagged-MRI volume. In the tagged-MRI volumes, 3D displacement fields were calculated to show motion of each tissue point inside the tongue mask during the speech task.
The Journal of the Acoustical Society of America 11/2013; 134(5):4168. · 1.65 Impact Factor
ABSTRACT: Magnetic resonance imaging has been widely used in speech production research. Often only one image stack (sagittal, axial, or coronal) is used for vocal tract modeling. As a result, complementary information from other available stacks is not utilized. To overcome this, a recently developed super-resolution technique was applied to integrate three orthogonal low-resolution stacks into one isotropic volume. The results on vowels show that the super-resolution volume produces better vocal tract visualization than any of the low-resolution stacks. Its derived area functions generally produce formant predictions closer to the ground truth, particularly for those formants sensitive to area perturbations at constrictions.
The Journal of the Acoustical Society of America 06/2013; 133(6):EL439-45. · 1.65 Impact Factor
ABSTRACT: Magnetic resonance imaging (MRI) is a widely used technology for non-invasive tongue imaging. MRI can detail tongue and muscle shapes and their variability in both healthy and diseased populations. Such detail can aid significantly in the interpretation of muscle interactions in the tongue, and their relation to normal and disordered speech production. However, the size or shape of the tongue and muscles may vary from one subject to another. In addition, there exists no comprehensive and systematic framework to assess the difference and variability of tongue and muscles in a normalized space. In the present work, we built a multi-subject atlas from structural MRI volumes of 20 normal subjects to offer a normalized space on which all subjects from a target population can be mapped and compared. In order to find accurate one-to-one correspondences, we bound the tongue so that each volume had the same vocal tract features. For registration, we utilize symmetric diffeomorphic image registration with cross-correlation, which is widely used in brain image analysis. The atlas facilitates a template-based segmentation in assigning anatomical labels in the images. The tongue atlas is unprecedented and opens new vistas for exploring normal and diseased oral structures and function.
The Journal of the Acoustical Society of America 05/2013; 133(5):3567. · 1.65 Impact Factor
ABSTRACT: Production of fricatives involves a narrow supraglottal constriction along the vocal tract. Air flows through the constriction, and generates turbulent noise source(s) by impinging on some obstacles downstream. In post-glossectomy speakers, the production of /s/ and /sh/ is often problematic. It is mainly caused by the tongue surgery, which changes tongue properties such as volume, motility, and symmetry, preventing the tongue from creating proper constrictions. The purpose of this study was to gain some insights into how the vocal tracts of abnormal /s/ and /sh/ are shaped and what their corresponding acoustic consequences are. Based on cine magnetic resonance images, we built 3-D vocal tract models for /s/ and /sh/ from two post-glossectomy speakers (one with abnormal /s/ and the other with abnormal /sh/). Due to the missing part of the tongue, the reconstructed vocal tracts are asymmetric with either an air-flow bypass or a side branch formed near the constrictions. Two coupled physics submodels are included in the 3-D FEM acoustic simulation: incompressible potential flow for the mean air flow and aeroacoustics for the distributed noise sources. The resulting acoustic spectra and acoustic roles of air flow bypass or side branch will be discussed. [This study was supported by NIH R01CA133015.]
The Journal of the Acoustical Society of America 05/2013; 133(5):3606. · 1.65 Impact Factor