Chapter

Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project

... The eSpeak system is multilingual; a new language can be added by supplying a language module (Panjabi [13], Yiddish [8], Japanese [15], Albanian [kastrati2014opportunity]). The eSpeak system uses modular language data files, which are easy-to-understand text files. ...
Article
Text to speech (TTS) is a crucial tool in many domains, mainly for visually impaired users. The availability of open-source TTS improves access to computers and enables more valuable applications. eSpeak provides support for several languages. It is a tool that provides rules and phoneme files for more than 50 languages; it is also light and fast, has low memory consumption, and runs on multiple platforms. In this paper, we explore the possibility of adapting the existing text-to-speech converters in eSpeak to Arabic. We attempt to define new text-to-speech conversion rules, adapting existing phonemes and adding missing phonemes for Arabic under eSpeak. The contributions are quite significant; the software's developers will be able to integrate these enhancements into a new version, so that users with visual impairments or children with special needs can make use of this development of eSpeak. The availability of such support opens new fields for using Arabic in TTS environments, especially for blind persons.
... The speech produced by eSpeak is highly configurable, but it is not as natural-sounding as a unit-based synthesizer [16]. eSpeak is a multilingual system that allows a new language to be added simply by adding its module (Panjabi [14], Yiddish [8], Japanese [16], Albanian [13]). The eSpeak system uses modular language data files, which are easy-to-understand text files [16]. ...
Preprint
Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along dimensions of language, accent, dialect, variety, and speech impairment - and the intersectionality of speech features with socioeconomic and demographic features. Furthermore, there is often a lack of oversight on the underlying training data - commonly built on massive web-crawling and/or publicly available speech - with regard to the ethics of such data collection. To encourage standardized documentation of such speech data components, we introduce an augmented datasheet for speech datasets, which can be used in addition to "Datasheets for Datasets". We then exemplify the importance of each question in our augmented datasheet based on in-depth literature reviews of speech data used in domains such as machine learning, linguistics, and health. Finally, we encourage practitioners - ranging from dataset creators to researchers - to use our augmented datasheet to better define the scope, properties, and limits of speech datasets, while also encouraging consideration of data-subject protection and user community empowerment. Ethical dataset creation is not a one-size-fits-all process, but dataset creators can use our augmented datasheet to reflexively consider the social context of related SLT applications and data sources in order to foster more inclusive SLT products downstream.
Book
HTK is a toolkit for building Hidden Markov Models (HMMs). HMMs can be used to model any time series and the core of HTK is similarly general-purpose. However, HTK is primarily designed for building HMM-based speech processing tools, in particular recognisers. Thus, much of the infrastructure support in HTK is dedicated to this task. As shown in the picture above, there are two major processing stages involved. Firstly, the HTK training tools are used to estimate the parameters of a set of HMMs using training utterances and their associated transcriptions. Secondly, unknown utterances are transcribed using the HTK recognition tools.
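The recognition stage described above can be illustrated with a minimal Viterbi decoding sketch in plain Python. This is not HTK code and does not use HTK's tools or file formats. The two states (`sil`, `speech`), the two observation symbols (`low`, `high`), and all probabilities are hypothetical values set by hand. In a real system the training stage would estimate them from transcribed utterances.

```python
import math


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        key: to_log(value) if isinstance(value, dict) else math.log(value)
        for key, value in table.items()
    }


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: silence versus speech, observed as frame energy.
STATES = ["sil", "speech"]
LOG_START = to_log({"sil": 0.8, "speech": 0.2})
LOG_TRANS = to_log({
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
})
LOG_EMIT = to_log({
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
})

frames = ["low", "low", "high", "high", "low"]
path = viterbi(frames, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(list(zip(frames, path)))
```

For this toy input the decoded path labels the two high-energy frames as `speech` and the rest as `sil`. Forced alignment relies on the same dynamic-programming idea, with the state sequence constrained to follow a known transcription.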
Cavar, D., Cavar, M., and Cruz, H. (2016a). Endangered language documentation: Bootstrapping a Chatino speech corpus, forced aligner, ASR. In Proceedings of LREC 2016. ELRA.
Cavar, D., Cavar, M., and Moe, L. (2016b). Global Open Resources and Information for Language and Linguistic Analysis (GORILLA). In Proceedings of LREC 2016. ELRA.
Creative Commons. (2016). Creative Commons Attribution-ShareAlike 4.0 International.
Kerler, D.-B. (2014). Surviving remnants of Yiddish folksinging and creativity in contemporary Ukraine [in Yiddish].
Sloetjes, H. and Wittenburg, P. (2008). Annotation by category: ELAN and ISO DCR. In Nicoletta Calzolari, et al., editors, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May. European Language Resources Association (ELRA). http://www.lrecconf.org/proceedings/lrec2008/.
Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., and Sloetjes, H. (2006). Elan: a professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), pages 1556-1559.