Habitual aspect as a property of text spans

Abstract

The Oceanic languages of Melanesia are generally small, low-resource languages, for which very little primary data is available. For our study on tense, aspect, and modality (TAM), we have access to richly annotated corpora from seven endangered Oceanic languages. In this paper, we describe some of the methods we used to investigate the category of habitual aspect in these languages. We show that information can be recovered from the English translations and from metadata on genres. For a more in-depth study, we relied on clause-based tags labelling clause type, tense, aspect, mood, and polarity. The process of tagging aspect, in particular, revealed the theoretically and practically important fact that habituality is sometimes a property of larger spans of text (Carlson and Spejewski, 1997), rather than just a property of clauses, and can combine with more specific clause-level aspect.
MelaTAMP project Data Method Inter-annotator agreement Habituality Conclusion
Annika Tjuka, Lena Weißmann, and Kilu von Prince
May 19th, 2019
The 6th LTC Workshop on Less-Resourced Languages
Outline
The MelaTAMP project
Data
Method
Results of Inter-Annotator Agreement
Habituality
Conclusion
The MelaTAMP project
Figure 1: Subject languages of the MelaTAMP project (map labels: Saliba-Logea, Mavea, Daakaka, Dalkalaen, Daakie, Nafsan, North Ambrym).
comparative research
based on corpus data
investigation of modality, aspect, tense, and polarity (TAMP)
in Oceanic languages
The focus of this talk is on our study of habitual aspect.
Habituals in Oceanic
(1) a. I teach at college.
    b. Lions eat meat.
Question: How do Oceanic languages express habitual aspect?
primarily through reduplication or
imperfective aspect marking or
a combination of both;
sometimes not at all. (von Prince et al., 2019)
Examples
(2) Ka  ya=p    du   es-esi    ngok teenem a   nye ka  na  w=i     ten         dóór      kyun.
    ASR 3PL=POT stay REDUP-see 2SG  home   and 1SG ASR 1SG POT=COP assigned.to dark.bush just
    “[People] shall see you in the village and I, I will go to the bush.” (Daakaka: 1347)
(3) me  ro   nno me  ko-lo-suruvu   atano  na  me  ko-lo-taua   otoli na  atano
    FUT then 2SG FUT 2SG-IMPF-sleep ground but FUT 2SG-IMPF-put egg   LOC ground
    “You will sleep on the ground, and you will lay eggs on the ground.” (Mavea 06016.060)
Data
Corpora
Corpora of the following languages were considered in this study: Daakaka, Dalkalaen, Mavea, Nafsan, Saliba-Logea.
Texts were recorded during fieldwork sessions with speakers of the respective language.
Annotation includes morpheme-by-morpheme glosses, part-of-speech tags, translations into English, and metadata on speakers, text genre, and the circumstances of the recording.
Identifying comparable Texts
1. Translation-based searches for keywords: used to / would / always / usually / often
2. Genre-based searches with text-level data: “explanation nature”, “story”
3. Biographic and historic accounts as sources for past habituals
Result: A list of 26 similar stories, plots and tropes with significant overlap between corpora.
Overview
                      Total            Tagged
Language         #Texts   #Tok.   #Texts   #Clauses
Daakaka             119     68k        5        141
Dalkalaen           114     34k        6        658
Mavea                61     45k        3        634
Nafsan              110     65k        6        363
Saliba-Logea        214   150k*        6        157
Total               618    362k       26       1953
Table 1: Corpora included in this study; Tok: tokens; tag.: tagged; *of the 150k tokens in this corpus, about 70k are fully annotated.
Method
Segmentation
Prioritization of the comparable subcorpus (26 texts)
Segmentation of the texts into annotation units, which often correspond to sentences
Further subdivision of these units into clauses for TAMP annotation (1953 clauses in total)
Tagging
Category          Name      Tags
Clause type       clause    assertion, question, directive;
                            embedded: proposition, conditional, e.question,
                            temporal, adverbial, attributive
Temporal domain   time      past, future, present
Modal domain      mood      factual, counterfactual, possible
Aspectual domain  event     bounded, ongoing, repeated, stative
Polarity          polarity  positive, negative
Table 2: Tag set of the MelaTAMP project, see https://wikis.hu-berlin.de/melatamp/Main_page.
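To make the structure of the tag set concrete, here is a minimal Python sketch that stores Table 2 as a dictionary and checks a clause annotation against it. The names `TAG_SET` and `validate_annotation` are hypothetical, and the rule that every clause carries exactly one value per category is an assumption of this sketch rather than something stated in the project documentation.

```python
# Tag names and values copied from Table 2; the data layout is an assumption.
TAG_SET = {
    "clause": {
        "assertion", "question", "directive",
        # embedded clause types
        "proposition", "conditional", "e.question",
        "temporal", "adverbial", "attributive",
    },
    "time": {"past", "future", "present"},
    "mood": {"factual", "counterfactual", "possible"},
    "event": {"bounded", "ongoing", "repeated", "stative"},
    "polarity": {"positive", "negative"},
}


def validate_annotation(annotation):
    """Return a list of problems; an empty list means the annotation is valid."""
    problems = []
    # Assumed rule: each category must be present with one allowed value.
    for name, allowed in TAG_SET.items():
        if name not in annotation:
            problems.append(f"missing category: {name}")
        elif annotation[name] not in allowed:
            problems.append(f"invalid value for {name}: {annotation[name]}")
    # Flag categories that are not part of the tag set at all.
    for name in annotation:
        if name not in TAG_SET:
            problems.append(f"unknown category: {name}")
    return problems


# Illustrative annotation of a past habitual assertion.
example = {
    "clause": "assertion",
    "time": "past",
    "mood": "factual",
    "event": "repeated",
    "polarity": "positive",
}
problems = validate_annotation(example)
```

A check like this catches typos and out-of-set values, but it cannot decide whether `repeated` is the right aspectual choice for a given clause.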
Tags in the Aspectual Domain
For our study on habituals, the aspectual category repeated
was especially relevant.
(4) ya=t    du   tiye barar tevy-an         na   ya=t    du   se   tóó
    3P=DIST stay kill pig   side.of-3S.POSS COMP 3P=DIST stay hook wild.cane
    “They used to kill a pig for the occasion of spear-throwing.” (Daakaka: 5210)
The same tag also covers iterative contexts (The children
were bouncing up and down on.)
Tagging Example
(5) ya=t    du   tiye barar tevy-an         na   ya=t    du   se   tóó
    3P=DIST stay kill pig   side.of-3S.POSS COMP 3P=DIST stay hook wild.cane
    “They used to kill a pig for the occasion of spear-throwing.” (Daakaka: 5210)
clause: assertion
time: past
mood: factual
event: repeated
polarity: positive
Results of Inter-Annotator Agreement
Results in each Category
Figure 2: Percentages of total inter-annotator consistencies (yellow) and
inconsistencies (green) in each TAM category of the tag set.
Inter-Annotator Agreement Score for each Category
Polarity: κ = 0.91
Mood: κ = 0.86
Clause: κ = 0.85
Time: κ = 0.85
Event: κ = 0.79
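The slides report κ per category without naming the variant. Assuming Cohen's kappa for two annotators (the reference list cites Carletta 1996 on the kappa statistic), a score for one category could be computed as in the following sketch. The function name `cohen_kappa` and the sample labels are invented for illustration and are not project data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items where both annotators chose the same tag.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal tag frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[tag] * counts_b[tag] for tag in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical tag throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented example: two annotators tagging the event category of six clauses.
annotator_1 = ["repeated", "bounded", "stative", "repeated", "ongoing", "bounded"]
annotator_2 = ["repeated", "bounded", "stative", "bounded", "ongoing", "bounded"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In the invented example the two annotators disagree on one clause out of six, where one chose `repeated` and the other `bounded`; this is the kind of disagreement that lowers the score for the event category.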
Results in the Event Category
Figure 3: Percentages of total inter-annotator consistencies (yellow) and
inconsistencies (green) in each tag of the event category.
Inter-annotator consistency was particularly low for the
repeated aspectual tag.
The main reason for this is that habituality can be a property
of passages, which combines with a variety of clause-level
aspects.
Habituality
Observations
A habitual narrative is described in consecutive sentences.
Individual clauses within the passage differ with respect to their local aspectual values.
(6) a. hinage ta        dup-paisowa
       also   1INCL.SBJ DUP-work
       “we work hard too” (Saliba: Tautolowaiya_01AG_0048)
    b. kamna-da           te      se      yababa
       feeling-1INCL.POSS near.SP 3PL.SBJ bad
       “and we feel tired” (Saliba: Tautolowaiya_01AG_0049)
Habituality as a Property of Text Passages
(7) a. My grandmother used to bake the most wonderful pies
every Saturday.
b. She went to the orchard on Shady Lane early in the
morning.
b’. The alarm clock would have gone off at 6 a.m.
c. She then would pick a basket each of apples and
peaches.
c’. Cows would be in the orchard mooing at her.
(Carlson and Spejewski, 1997)
Theoretical Implications
Habitual passages might just be one special case of a much more general phenomenon that also includes:
modal subordination,
sequence of tense,
reported speech,
and present-in-the-past.
Our experience highlights that tense, aspect, and mood are properties that apply at the clause level, but also, to some extent, to larger passages, and that passage-level TAM values are partially independent of clause-level TAM values.
Implications for Tagging Aspect Categories
Passage-wide properties need to be considered depending on
the intended scope and degree of granularity.
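One way to picture this is a two-layer annotation in which clause-level aspect tags and passage-level spans are stored separately and combined on demand. The Python sketch below is purely hypothetical: the classes `Clause` and `PassageSpan`, the function `combine_layers`, the label `habitual`, and the clause ids are all invented for illustration and do not describe the MelaTAMP annotation format.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    clause_id: str
    event: str  # clause-level aspect tag: bounded, ongoing, repeated, or stative


@dataclass
class PassageSpan:
    start: int  # index of the first clause covered (inclusive)
    end: int    # index of the last clause covered (inclusive)
    label: str  # passage-level property, e.g. "habitual" (hypothetical label)


def combine_layers(clauses, spans):
    """Pair each clause's own event tag with the labels of all spans covering it."""
    combined = []
    for index, clause in enumerate(clauses):
        covering = [s.label for s in spans if s.start <= index <= s.end]
        combined.append((clause.clause_id, clause.event, covering))
    return combined


# Invented passage: the first two clauses sit inside a habitual passage,
# the third clause lies outside it.
clauses = [
    Clause("c1", "repeated"),
    Clause("c2", "stative"),
    Clause("c3", "bounded"),
]
spans = [PassageSpan(start=0, end=1, label="habitual")]
layers = combine_layers(clauses, spans)
```

Keeping the layers apart lets a clause such as `c2` stay `stative` at the clause level while still belonging to a habitual passage, which is the combination that caused disagreement when only a single clause-level tag was available.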
Conclusion
Tagging TAM categories gives insights into the use of aspectual categories for specific text genres.
Differences between passage-level aspect and clause-level aspect can be found by comparing the annotation of comparable texts.
Habituality can span sequences of several clauses.
Thank you!
Appendix
References
Carletta, Jean, 1996. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, 22(2):249–254.
Carlson, Greg N. and Beverly Spejewski, 1997. Generic passages. Natural Language Semantics, 5(2):101.
Dickey, Stephen M., 2000. Parameters of Slavic aspect: A cognitive approach. Center for the Study of Language and Information.
Druskat, Stephan, 2018. ToolboxTextModules (Version 1.1.0).
Franjieh, Michael, 2013. A documentation of North Ambrym, a language of Vanuatu. London: SOAS, ELAR.
Guérin, Valérie, 2006. Documentation of Mavea. London: SOAS, ELAR.
Guérin, Valérie, 2011. A grammar of Mav̋ea: An Oceanic language of Vanuatu. Honolulu: University of Hawai’i Press.
Klecha, Peter, 2011. Optional and obligatory modal subordination. In Proceedings of Sinn und Bedeutung, volume 15.
Krause, Thomas and Amir Zeldes, 2016. ANNIS3: A new architecture for generic corpus query and visualization. Digital Scholarship in the Humanities, 31(1):118–139.
Krifka, Manfred, 2013. Daakie, The Language Archive. Nijmegen: MPI for Psycholinguistics.
Margetts, Anna, Andrew Margetts, and Carmen Dawuda, 2017. Saliba/Logea. The Language Archive.
MelaTAMP, 2017. Primary data repository – MelaTAMP. https://wikis.hu-berlin.de/melatamp.
von Prince, Kilu, 2013a. Daakaka, The Language Archive. Nijmegen: MPI for Psycholinguistics.
von Prince, Kilu, 2013b. Dalkalaen, The Language Archive. Nijmegen: MPI for Psycholinguistics.
von Prince, Kilu, Ana Krajinovic, Manfred Krifka, Valérie Guérin, and Michael Franjieh, 2018. Mapping Irreality: Storyboards for Eliciting TAM contexts. In Anja Gattnar, Robin Hörnig, and Melanie Störzer (eds.), Proceedings of Linguistic Evidence 2018.
von Prince, Kilu and Anna Margetts, to appear. Expressing possibility in Saliba-Logea and Daakaka. Studies in Language.
Roberts, Craige, 1989. Modal subordination and pronominal anaphora in discourse. Linguistics and Philosophy, 12:683–721.
Thieberger, Nick, 2006. Dictionary and texts in South Efate. Digital collection managed by PARADISEC.