
Abstract

Language arranges itself along a continuous line, either in time (speech) or in space (text). As human working memory is finite and presumably limited to approximately 4 units of information (Cowan 2001), it is by now largely uncontested that language processing must proceed in units or chunks of some kind (see Christiansen and Chater 2016). What these online chunks are is more debatable. We report on an experiment investigating the segmentation of language into large high-level units, or chunking. Participants (45 students of the University of Helsinki, 32 females, aged 20-39) with no previous linguistic training listened to short (approx. 30 seconds) audio extracts of authentic data while simultaneously marking what they intuitively felt were natural boundaries on broad orthographic transcripts (see Vetchinnikova et al. 2017). The perceived boundaries delimited ‘chunks’. The experiment was carried out via a web-based application presented on a tablet. Participants marked boundaries by tapping an interactive tilde sign shown between all orthographic words in the transcript; they could remove a mark by tapping the symbol again. To assess inter-participant agreement in chunking behaviour, we first used a conventional measure of inter-rater reliability, i.e. the proportion of all pairs of individual participants who agreed on the presence (1) or absence (0) of a boundary. We computed this proportion for every space between two orthographic words; the mean value across these spaces was treated as the overall measure of agreement. Values close to 0 indicate weak agreement, while values close to 1 indicate strong agreement. Since this measure does not account for chance agreement, we also calculated a second measure of inter-rater reliability, Fleiss’ kappa, which corrects for chance agreement. The total number of possible boundaries in our dataset is 4692.
On average, a participant marked 455 boundaries (SD = 211.2). Agreement according to the first measure was 0.9, indicating strong agreement in chunking behaviour across participants. Fleiss’ kappa was lower, at 0.45, indicating moderate agreement. We hypothesised that this reduction is due to the highly unequal distribution of 0s and 1s in the dataset. Given that the difference between the proportion measure and Fleiss’ kappa can be explained by the unequal distribution of boundaries and non-boundaries in the data (agreement on a non-boundary occurs much more often than agreement on a boundary), we claim that the online chunking task used in the study allows us to capture consensus in segmentation strategies.
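As a rough consistency check on the two agreement figures, the arithmetic below assumes that the chance term of Fleiss' kappa is computed from the pooled share of marked boundaries (455 marks on average out of 4692 possible) and uses the observed agreement of 0.904 reported on the slides. The symbols $p_1$, $P_e$ and $\bar{P}$ are introduced here for illustration only.

```latex
\begin{align*}
p_1 &= \frac{455}{4692} \approx 0.097
  && \text{(pooled share of marked boundaries)} \\
P_e &= p_1^2 + (1 - p_1)^2 \approx 0.0094 + 0.8155 \approx 0.825
  && \text{(chance agreement)} \\
\kappa &= \frac{\bar{P} - P_e}{1 - P_e}
  \approx \frac{0.904 - 0.825}{1 - 0.825} \approx 0.45
  && \text{(with observed agreement } \bar{P} = 0.904\text{)}
\end{align*}
```

Under these assumptions, most of the 0.90 raw agreement is already expected by chance, because roughly nine in ten word gaps are left unmarked, which is in line with the explanation given in the abstract.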
UK Cognitive Linguistics conference,
Consensus in an intuitive
chunking task
Dobrego A., Konina A.,
Vetchinnikova S., Williams N.,
Mauranen A.
(Cutler & Norris 1988)
(Christiansen & Chater 2016)
Spontaneous speech?
Speech is processed in chunks
Do you segment the speech into smaller parts while listening?
Probably yes, due to working memory limitations ~ 4 units*
*Cowan 2001
How can we measure segmentation choices?
Do we agree on these segments?
Our study
Aim: to investigate speech segmentation into large high-level units
45 speakers of English: 32 females, aged 20-39, no prior knowledge of linguistics
66 short audio extracts: ELFA corpus*, 20-40 seconds each
Method: intuitive chunking task
Listen and see the text
Mark boundaries by tapping the screen
Background/feedback questionnaire
Yes/no comprehension questions
*Mauranen 2008
I’d ~ like ~ to ~ point ~ out ~ that ~ er ~ er ~ the ~ profiles ~ user ~
profiles ~ and ~ the ~ reasons ~ for ~ communicating ~ are ~ a ~
little ~ bit ~ different ~ er ~ in ~ the ~ Honiara ~ internet ~ café ~
and ~ the ~ nowadays ~ internet ~ centre ~ than ~ what ~ they ~
are ~ in ~ the ~ rural ~ e-mail ~ stations ~ but ~ the ~ main ~
thing ~ is ~ to ~ communicate ~ with ~ family ~ members ~
relatives ~ mhm ~ mhm ~ the ~ er ~ family ~ and ~ kinship ~ it’s ~
still ~ central ~ thing ~ in ~ the ~ Solomon ~ Islands ~ societies ~
in ~ all ~ of ~ them ~ all ~ these ~ communities ~ are ~ very ~
social ~ and ~ very ~ communal ~ there ~ isn’t ~ much ~ privacy ~
there ~ isn’t ~
The output
Between every two words:
If marked -> 1 -> boundary
If unmarked -> 0 -> non-boundary
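The 0/1 coding above can be sketched in a few lines. This is a minimal illustration, not the application's actual code: the function name `encode_boundaries`, the tap list, and the indexing of gaps (gap `i` sits between word `i` and word `i + 1`) are all assumptions. Tapping the same gap twice removes the mark, as described in the abstract.

```python
def encode_boundaries(words, taps):
    """Return one 0/1 value per gap between adjacent words.

    `taps` is the (hypothetical) sequence of gap indices a participant
    tapped, in order. A second tap on the same gap removes the mark.
    """
    n_gaps = len(words) - 1
    marks = [0] * n_gaps
    for gap in taps:
        if not 0 <= gap < n_gaps:
            raise ValueError(f"gap index {gap} is outside 0..{n_gaps - 1}")
        marks[gap] ^= 1  # toggle: unmarked -> marked, marked -> unmarked
    return marks


# Toy input: nine words from the sample transcript, i.e. eight gaps.
words = "the main thing is to communicate with family members".split()
taps = [2, 5, 2, 3]  # invented taps; gap 2 is marked and then unmarked

vector = encode_boundaries(words, taps)
print(vector)
```

Each participant then contributes one such vector per extract, and stacking the vectors gives a raters-by-gaps table of 0s and 1s on which agreement can be computed.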
Analysis 1
Number of rater pairs in agreement
as a proportion of the total possible
number of rater pairs
0.904 => strong*
Perhaps it’s random?
*Landis and Koch 1977
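A minimal sketch of the proportion measure, following the description in the abstract (share of agreeing participant pairs at each word gap, averaged over gaps). The function name and the toy ratings are invented for illustration; this is not the authors' analysis script.

```python
from math import comb


def pairwise_agreement(ratings):
    """Mean share of rater pairs that agree, taken over all gaps.

    `ratings` holds one 0/1 vector per rater, all of the same length.
    """
    n_raters = len(ratings)
    total_pairs = comb(n_raters, 2)
    per_gap = []
    for column in zip(*ratings):  # one column per gap between words
        ones = sum(column)
        zeros = n_raters - ones
        # Pairs agreeing on "boundary" plus pairs agreeing on "non-boundary".
        agreeing = comb(ones, 2) + comb(zeros, 2)
        per_gap.append(agreeing / total_pairs)
    return sum(per_gap) / len(per_gap)


# Toy data: 4 raters, 6 gaps (values invented).
toy = [
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
agreement = pairwise_agreement(toy)
print(round(agreement, 3))
```

In the toy data, the gaps that nobody marks each contribute a perfect score of 1, which shows how a large share of unmarked gaps can push this measure up on its own.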
Analysis 2
Fleiss’ Kappa (κ)
The difference between the observed and
chance agreement divided by the agreement
attainable above chance
κ = 0.45 => moderate*
95% CI [0.451, 0.453]
*Landis and Koch 1977
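The verbal definition above corresponds to the standard Fleiss' kappa formula, sketched here for the two-category case (boundary vs non-boundary). The function name and toy data are assumptions made for illustration, and the sketch does not reproduce the confidence interval reported on the slide.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for binary ratings: (P_bar - P_e) / (1 - P_e).

    `ratings` holds one 0/1 vector per rater, all of the same length.
    Undefined (division by zero) if every rating is the same value.
    """
    n_raters = len(ratings)
    n_gaps = len(ratings[0])
    ordered_pairs = n_raters * (n_raters - 1)

    p_bar = 0.0       # observed agreement, averaged over gaps
    total_ones = 0    # number of boundary marks in the whole table
    for column in zip(*ratings):
        ones = sum(column)
        zeros = n_raters - ones
        p_bar += (ones * (ones - 1) + zeros * (zeros - 1)) / ordered_pairs
        total_ones += ones
    p_bar /= n_gaps

    p1 = total_ones / (n_raters * n_gaps)   # pooled share of 1s
    p_e = p1 ** 2 + (1 - p1) ** 2           # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 4 raters, 6 gaps (values invented).
toy = [
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
kappa = fleiss_kappa(toy)
print(round(kappa, 3))
```

On this toy table the raw pairwise agreement is about 0.72 while kappa is about 0.26, a small-scale version of the gap between 0.904 and 0.45 reported for the study.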
Skewed distribution?*
Solution: compare to null distribution
* Feinstein & Cicchetti 1990
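The slides do not say how the null distribution was built. One common option, sketched here purely as an assumption, is a permutation test: shuffle each rater's marks across gaps independently (keeping the number of marks per rater fixed), recompute kappa many times, and see where the observed value falls. All names, the toy data, and the number of permutations are invented for illustration.

```python
import random


def fleiss_kappa(ratings):
    """Fleiss' kappa for binary ratings (one 0/1 vector per rater)."""
    n_raters, n_gaps = len(ratings), len(ratings[0])
    ordered_pairs = n_raters * (n_raters - 1)
    p_bar, total_ones = 0.0, 0
    for column in zip(*ratings):
        ones = sum(column)
        zeros = n_raters - ones
        p_bar += (ones * (ones - 1) + zeros * (zeros - 1)) / ordered_pairs
        total_ones += ones
    p_bar /= n_gaps
    p1 = total_ones / (n_raters * n_gaps)
    p_e = p1 ** 2 + (1 - p1) ** 2
    return (p_bar - p_e) / (1 - p_e)


def null_kappas(ratings, n_perm=200, seed=0):
    """Kappa values after shuffling each rater's marks independently."""
    rng = random.Random(seed)
    return [
        fleiss_kappa([rng.sample(r, len(r)) for r in ratings])
        for _ in range(n_perm)
    ]


def make_rater(n_gaps, marked):
    """Build a 0/1 vector with 1s at the given gap indices."""
    return [1 if gap in marked else 0 for gap in range(n_gaps)]


# Toy data: 5 raters, 40 gaps, about 1 in 8 gaps marked (skewed, as in the study).
N_GAPS = 40
toy = [make_rater(N_GAPS, {4, 12, 20, 28, 36}) for _ in range(4)]
toy.append(make_rater(N_GAPS, {4, 12, 20, 28, 35}))  # one rater deviates once

observed = fleiss_kappa(toy)
null = null_kappas(toy)
# Share of shuffled datasets with kappa at least as high as observed
# (with the usual +1 correction so the estimate is never exactly 0).
p_value = (sum(k >= observed for k in null) + 1) / (len(null) + 1)
print(round(observed, 3), round(max(null), 3), round(p_value, 4))
```

Because shuffling within a rater leaves the share of marked gaps unchanged, the skew is the same in the observed and the shuffled data, so any difference in kappa reflects where the marks fall and not how many there are.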
Konina et al forthc.
Finnish κ = 0.43
Swedish κ = 0.41
Russian κ = 0.40
Therefore, our method captures consensus
in different languages
INSTR κ = 0.55
NORM κ = 0.54
N OF SPEAKERS κ = 0.53
Therefore, our method captures consensus
in different conditions
Dobrego et al forthc.
Konina et al forthc.
examples of κ?
American English, κ = 0.51
Indian English, κ = 0.23
American English, κ = 0.43
Aim: to look at prosodic boundaries
in different cohorts of annotators*
*Cole et al 2017
Ventsov & Kasevich, 1994
Paper and pencil
working on a tablet is much faster
-> a better window into online processing
Cole, Mahrt, & Roy 2017
Language Markup and Experimental Design Software
aims to elicit naive prosodic analysis
-> instead, our method captures natural process
We can operationalize speech segmentation
through agreement on segmentation choices
made by listeners
We can use the intuitive chunking task for this
The values for each language are quite close to
each other, but still vary across the sample –
perhaps language-specific differences?
Further directions…
Explore intuitive chunking on L1 / L2 speakers
Explore other ways to calculate agreement
Explore individual differences in several languages
Thank you!