The fifth hit for the query "idea is", with an extra 2.5 seconds on both sides and a user-defined region to play the relevant portion of the audio

Source publication
Article
In spite of the wide agreement among linguists as to the significance of spoken language data, actual speech data have not formed the basis of empirical work on English as much as one would think. The present paper is intended to contribute to changing this situation, on a theoretical and on a practical level. On a theoretical level, we discuss dif...

Context in source publication

Context 1
... feature allows users to select the exact length of the audio clip to be played, thus making it possible in cases of more severe misalignment to play audio outside the maximum window provided by the audio control buttons. Figure 3 displays the wave form of the audio for the 5th hit of the query result after adding 2.5 seconds on both sides of the (slightly faulty) timestamps for the query hit. In addition, a part of the wave form has been selected using 'click and drag'; this region corresponds to the exact position in the audio where the words "idea is" are uttered. ...
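
The padding and region selection described above can be illustrated with a minimal sketch. It is not the interface's actual implementation: the function names, the example timestamps, and the clamping behaviour are assumptions made for illustration. Only the 2.5-second padding comes from the text.

```python
def padded_window(start, end, padding=2.5, duration=None):
    """Widen a hit's (start, end) timestamps, in seconds, by `padding` on both sides.

    Hypothetical helper: the window is clamped at 0 and, if the total
    recording length `duration` is given, at the end of the recording.
    """
    if end < start:
        raise ValueError("end must not precede start")
    play_start = max(0.0, start - padding)
    play_end = end + padding
    if duration is not None:
        play_end = min(duration, play_end)
    return play_start, play_end


def clamp_selection(sel_start, sel_end, window):
    """Restrict a 'click and drag' selection to the displayed window.

    Accepts the selection in either drag direction.
    """
    lo, hi = window
    a, b = sorted((sel_start, sel_end))
    return max(lo, a), min(hi, b)


# Made-up timestamps for a query hit, in seconds.
window = padded_window(12.0, 12.5)
region = clamp_selection(14.0, 11.0, window)
```

Here `window` is the span shown as a wave form, and `region` is the portion the user selected for playback within it.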

Similar publications

Article
Mandarin tone 3 sandhi is a phonological alternation in which the initial tone 3 (i.e., low tone) syllable changes to a tone 2 (i.e., rising tone) when followed by another tone 3. The present study used a cross-modal syllable-morpheme matching experiment to examine how native speakers process the sandhi sequences derived from verb reduplication and...

Citations

... It consists of both monologues and dialogues from different speech genres of a number of British English varieties, and contains about 7.5 million words. We extracted the data via its web interface (Hoffmann & Arndt-Lappe, 2021; Hoffmann & Evert, 2018). The QuakeBox corpus (Walsh et al., 2013) consists of mainly monologues spoken by inhabitants of Christchurch, New Zealand, who tell the interviewer about their experiences surrounding the 2010-2011 Canterbury earthquakes. ...
Article
Morphological segmentability, i.e., the degree to which complex words can be decomposed into their morphological constituents, has been considered an important factor in research on morphological processing and is expected to affect acoustic duration (e.g., Hay, 2001, 2003). One way of operationalizing segmentability is through the relative frequency of a complex word to its base word. However, relative frequency has failed to affect duration for different affix categories in many previous studies. One potential reason is the fact that complex words vary in their prosodic structure, depending on the prosodic integration of the affix (Plag & Ben Hedia, 2018). In a large corpus study with three different corpora and eight affixes each, we investigate how prosodic word structure and relative frequency influence duration, and how these two factors interact. First, we find that prosodic structure does not significantly interact with relative frequency. Second, we show that relative frequency effects on duration do not emerge consistently across a large number of affixes. Third, not only does prosodic word structure not explain the absence of relative frequency effects, it also often cannot account for durational differences as such. We discuss these findings in light of phonological theory and speech production models.
Chapter
Corpus linguistics continues to be a vibrant methodology applied across highly diverse fields of research in the language sciences. With the current steep rise in corpus sizes, computational power, statistical literacy and multi-purpose software tools, and inspired by neighbouring disciplines, approaches have diversified to an extent that calls for an intensification of the accompanying critical debate. Bringing together a team of leading experts, this book follows a unique design, comparing advanced methods and approaches current in corpus linguistics, to stimulate reflective evaluation and discussion. Each chapter explores the strengths and weaknesses of different datasets and techniques, presenting a case study and allowing readers to gauge methodological options in practice. Contributions also provide suggestions for further reading, and data and analysis scripts are included in an online appendix. This is an important and timely volume, and will be essential reading for any linguist interested in corpus-linguistic approaches to variation and change.
Article
This article aims to describe key challenges of preparing and releasing audio material for spoken data and to propose solutions to these challenges. We draw on our experience of compiling the new London-Lund Corpus 2 (LLC-2), where transcripts are released together with the audio files. However, making the audio material publicly available required careful consideration of how to, most effectively, 1) align the transcripts with the audio and 2) anonymise personal information in the recordings. First, audio-to-text alignment was solved through the insertion of timestamps in front of speaker turns in the transcription stage, which, as we show in the article, may later be used as a valuable complement to more robust automatic segmentation. Second, anonymisation was done by means of a Praat script, which replaced all personal information with a sound that made the lexical information incomprehensible but retained the prosodic characteristics. The public release of the LLC-2 audio material is a valuable feature of the corpus that allows users to extend the corpus data relative to their own research interests and, thus, broaden the scope of corpus linguistics. To illustrate this, we present three studies that have successfully used the LLC-2 audio material.