I thought pointing is rude: A dialogue-semantic analysis of pointing at the addressee¹
Jonathan GINZBURG — CNRS, Université de Paris, Laboratoire de Linguistique Formelle
Andy LÜCKING — CNRS, Université de Paris, Laboratoire de Linguistique Formelle; Text
Technology Lab, Goethe University Frankfurt
Abstract. A pilot corpus study on the use of pointing gestures in dialogue yielded 44 instances
of pointing at the addressee. In none of these instances is the addressee the gesture’s referent,
however. Rather, such discourse pointings are bound up with dialogue management: they control
the addressee’s attention and her view of the status of these referents in the incrementally
emergent context. We distinguish four classes of addressee pointings, descriptively glossed
utterance anaphora, common ground, something’s coming to mind, and grab turn. We exemplify
each class by means of empirical data and provide a dialogue semantics analysis. In this
way, we extend the taxonomy of uses of pointings currently discussed in semantics and argue
that the linguistic competence revealed by discourse pointings is inherently dialogical, adding
evidence for extending the domain of grammar from well-formedness and truth conditions to
include micro-level elements of conversational interaction.
Keywords: pointing, interactive gesture, discourse, common ground, utterance anaphora.
1. Introduction
That (manual) gestures belong in the grammar is by now well accepted (cf. Alahverdzhieva
et al., 2017; Fricke, 2012; Lücking, 2013 from the view of formal grammar theory and Kendon,
1980; McNeill, 1985 from the field of communication psychology). Of all types of manual
gestures, deictic or pointing gestures have received the most attention in semantics. This
predominance is very likely due to the idea that pointing gestures are bound up with reference,
a key notion in semantics and pragmatics. The standard account in this respect is still the
incompleteness and direct reference view of Kaplan (1989), where pointing gestures act as
demonstratum-donating means for otherwise referentially deficient demonstratives.² Closer to
modern grammar-based approaches is the neo-Peirce-Wittgenstein-Quine view that both pointing
gestures and their relation to language belong to the object language (Rieser, 2004).³ These
referent-identification accounts focus on concrete index-finger pointings such as that in (1):
(1) Can you jump over this spout?
[Figure: © A. Lücking]
The demonstration act in (1) is concrete since the thing indicated (pointed at) and the thing
referred to (described) are identical.⁴ However, this need not be the case, as is evinced
in cases of deferred reference (Nunberg, 1993), which exploit a metonymic relation between
index and referent. An example is given in (2), taken from Clark (1996: 168), where the book
is indicated, but its author is referred to:
(2) That man was a friend of mine.
[Figure: © A. Lücking, book icon from T. Tantau (LaTeX beamer class)]
Deferred reference still involves a concrete index. But pointing gestures can also be used in an
abstract way (McNeill et al., 1993). An example is given in (3), where the speaker virtually
draws a map in gesture space and points to virtual locations on this map.
(3) Then you do not exit here, but there. [Translated from German original]
Taken from dialogue V9, 6:56 (Lücking et al., 2010)
In formal semantics abstract deixis is modeled as a mathematical projection from gesture space
into the described situation (Lascarides and Stone, 2009: 408).
¹ Many thanks to the insightful comments of three reviewers for the Gestures@SuB25 workshop, as well as to the
virtual but very alert and warm audience at the workshop. We acknowledge support by a public grant overseen
by the French National Research Agency (ANR) as part of the program “Investissements d’Avenir” (reference:
ANR-10-LABX-0083). It contributes to the IdEx Université de Paris — AN–1–IDE–0001. We also acknowledge
a senior fellowship from the Institut Universitaire de France to Ginzburg.
² While Kaplan (1978) developed a particular technical account, the general idea can be traced back throughout
the philosophical literature – consider, for instance, Frege (1918: 64): “In allen solchen Fällen [i.e., demonstrative
expressions; the authors] ist der bloße Wortlaut, wie er schriftlich festgehalten werden kann, nicht der vollständige
Ausdruck des Gedankens, sondern man bedarf zu dessen richtiger Auffassung noch der Kenntnis gewisser das
Sprechen begleitender Umstände, die dabei als Mittel des Gedankenausdrucks benutzt werden. Dazu können
auch Fingerzeige, Handbewegungen, Blicke gehören.” (In all such cases [i.e., demonstrative expressions; the
authors] the mere wording, as it can be recorded in writing, is not the complete expression of the thought; rather,
to grasp it correctly one also needs knowledge of certain circumstances accompanying the speech, which are
used as means of expressing the thought. These can include pointing with a finger, hand movements, and glances.)
³ Most notably, Rieser developed the notion of region pointing, where a deictic gesture indicates the spatial position
of the value of a verbal reference marker. This view has been empirically tested and further developed by Lücking
et al. (2015). A framework according to which speech and gesture occupy different informational channels and
cohere in a spatial model has been formulated by Kühnlein (1999). Further evidence has been provided by Grosz
(2019) in terms of different kinds of pronoun-aligned pointing gestures (in addition to intonation).
⁴ And since the gesture is produced in the context of a verbal demonstrative it contributes at-issue information
(Ebert and Ebert, 2016).
In sum, the received view on pointing gestures in semantics/pragmatics is that they can be used
in one of three ways:
(4) a. deictically (i.e., locating the semantic value of a discourse referent (DR) in the
perceptible environment); concrete deixis is prototypically affiliated with demonstratives
like “this” or “over there” in speech;
b. as spatial proxy (projection from gesture space to real world). Abstract deixis is
well known from gesture studies, where a location in gesture space represents a location
from the events talked about (Fricke, 2007);⁵
c. in deferred reference (exploiting a metonymical relation between demonstratum and
referent).
In what follows we argue that data from pointing gestures in natural interactions – most notably,
pointing at the addressee – force us to extend this semantic taxonomy. Moreover, in order to
provide a precise, semantic analysis of addressee pointing – as will become clear in its informal
discussion in Sec. 2 – a detailed model of utterance context is needed. Such a model has been
developed within KoS (Ginzburg, 2012), which is briefly introduced alongside some of its
logical underpinnings in Sec. 3. KoS provides the analytical means to spell out the semantic
significance of various forms of addressee pointing; the corresponding lexical entries are given
in Sec. 4. Understanding discourse pointing involves understanding how it manages attention and how it
marks the grounding status of discourse referents in an incrementally unfolding conversational context.
Accordingly, a grammar that includes addressee pointings must be intrinsically dialogical – a
demand that probably applies to grammar in general (Ginzburg and Poesio, 2016).
Such an analysis is required all the more so because pointing at people is usually considered
to be rude: in many cultures one finds rules of etiquette along the lines of It is bad manners to
point at dressed people with naked fingers! For instance, the entry of the verb point in Harrap’s
English Dictionary explicitly mentions that “You mustn’t point at people like that” (Higgleton
and Seaton, 1996: 732). Accordingly, an explanation of the fact that discourse pointing at the
addressee is nonetheless not offensive is needed.
2. Corpus study
The empirical evidence for the class of gestures called discourse pointing or pointing at the
addressee is based on a corpus study. We surveyed six route-giving dialogues from the SaGA
(Speech and Gesture Alignment) corpus (Lücking et al., 2010). A summary of this SaGA subset
is given in Tab. 1. It contains 2,192 gestures in total. The gestures have been assigned to
the gesture classes beat, deictic (i.e., pointing), discourse, iconic, and possible combinations
thereof. Hand and arm movements that do not seem to constitute a gesture have been labeled
move – see the annotation manual (Bergmann et al., 2014) for details. As can be seen in Tab. 1,
about one seventh of the gestures in this SaGA subset are deictic gestures (288 of 2,192) and
about one fifth are discourse gestures (457 of 2,192). Within the six dialogues, we found 44
instances of discourse pointing. That is, about one in seventeen of the discourse or deictic
gestures (44 of 745) is an addressee pointing.
⁵ Abstract deixis has already been discussed as “Deixis am Phantasma” by Bühler (1934: § 8).
gesture class           V2    V4    V5    V6    V7   V25     sum
beat                     1    25     6     7     2     0      41
deictic                 22    73    53    64    39    37     288
deictic-beat             1     9     8     7     4     2      31
discourse               43    74   107   210    10    13     457
iconic                  27   185   344   154   124    57     891
iconic-beat              1    29    16    23     1    10      80
iconic-deictic          13    26    95    26    49     9     218
iconic-deictic-beat      0     6     3     2     0     0      11
move                     5    77    25    26    15    23     171
unclear                  3     0     0     0     1     0       4
sum                    116   504   657   519   245   151   2,192
Table 1: Number of gestures (counted over both left and right hand/arm) in the six SaGA
dialogues used in the corpus survey, summed up for both participants.
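The proportions quoted in the running text can be recomputed from the "sum" column of Table 1. The following is a minimal sketch for checking the arithmetic; the variable names are our own, and the denominator for addressee pointing is assumed to be the deictic and discourse rows taken together.

```python
from fractions import Fraction

# Row totals copied from the "sum" column of Table 1.
ROW_TOTALS = {
    "beat": 41, "deictic": 288, "deictic-beat": 31, "discourse": 457,
    "iconic": 891, "iconic-beat": 80, "iconic-deictic": 218,
    "iconic-deictic-beat": 11, "move": 171, "unclear": 4,
}
ADDRESSEE_POINTINGS = 44  # instances of discourse pointing reported in the text

total = sum(ROW_TOTALS.values())
deictic_share = Fraction(ROW_TOTALS["deictic"], total)
discourse_share = Fraction(ROW_TOTALS["discourse"], total)
# Assumption: "discourse or pointing gestures" = deictic row + discourse row.
pointing_share = Fraction(
    ADDRESSEE_POINTINGS, ROW_TOTALS["deictic"] + ROW_TOTALS["discourse"]
)

print(total)                                # 2192
print(round(float(deictic_share), 3))       # 0.131, roughly one seventh
print(round(float(discourse_share), 3))     # 0.208, roughly one fifth
print(round(float(pointing_share), 3))      # 0.059, roughly one seventeenth
```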
We looked through each of the dialogue recordings and collected and transcribed instances of
discourse pointing (examples are presented below). Each occurrence was assigned to one of
the following, bottom-up defined classes:
(5) Sub-kinds of discourse pointing
a. “UTT” (utterance anaphora): indicating a discourse referent (DR) of the current
utterance (in contrast to CG); occurs with topic (DR) introduction, affirmation of an
utterance of the other interlocutor, clarification or information requests, or corrections;
we found formal variation between pointing at the interlocutor and index finger raising;
corpus frequency: 20 (topic intro: 3, affirm: 12, self-correction: 3, request: 2)
b. “CG” (common ground): shared information pointing; indicating a DR which has already
been grounded; corpus frequency: 13
c. “SCTM” (something’s coming to mind): pointing gesture associated with having an
idea or recollection (in this latter case it is also CG); often affiliated with expressives
(e.g., ah!) in speech; corpus frequency: 9
d. “GrabTurn”: often realized by index finger raising; affiliated with turn-taking
expressions (wait; I have a question); corpus frequency: 2
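As a quick consistency check on the frequencies in (5), the class counts should add up to the 44 instances reported, and the UTT sub-counts to 20. A small sketch (the dictionary names are ours):

```python
# Corpus frequencies as listed in (5).
CLASS_COUNTS = {"UTT": 20, "CG": 13, "SCTM": 9, "GrabTurn": 2}
UTT_SUBTYPES = {"topic intro": 3, "affirm": 12, "self-correction": 3, "request": 2}

n_total = sum(CLASS_COUNTS.values())
n_utt = sum(UTT_SUBTYPES.values())

# Share of each class among all addressee pointings, in percent.
shares = {name: round(100 * n / n_total, 1) for name, n in CLASS_COUNTS.items()}

print(n_total, n_utt)
print(shares)
```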
An instance of CG (common ground pointing) is shown in (6) – more examples for each of
the classes just introduced are given in Sec. 4. (6) is taken from SaGA dialogue V5, starting
at video time 13:58. The transcription follows a minimal transcription according to the
GAT-2 (Gesprächsanalytisches Transkriptionssystem 2) standard for spoken German discourse
(Selting et al., 2009). We use “R” to indicate the route-giver and “F” to indicate the
route-follower. The original German transcription is given first, followed by a free English
translation to the right of a slash “/”.
(6) Context: F is recapitulating the route which has just been described to him by R. Now he is
trying to recall the landmark at a certain point of the route (turn 1). Due to his hesitation
(“die (.) die”), R completes the utterance (turn 2) while discourse pointing at F. The
completion was successful since it was accepted by F (turn 3).
1 F: da steht die (.) die / there is the the
2 R: die SKULptur ((pointing at F)) / the sculpture
3 F: die skulptur drauf / the sculpture on top
Neither F nor the sculpture talked about is a plausible candidate for a located object, so the
referent identification use (use 4a) has to be excluded. This holds literally as well as
projectively, since the space F occupies has not been assigned to some outer-world scene. That is,
abstract deixis (use 4b), mapping a region of gesture space to the physical space of the
described situation, has to be excluded, too. Likewise, a deferred interpretation (use 4c) licensed
by a metonymic relation between the sculpture and F is not available: a reasonable contiguity
relation between the index (F) and the referent (the sculpture) is simply lacking. Hence, CG
pointing cannot be subsumed within the classes in (4).
In order to provide an analysis we have to take a different route. Following functional
analyses from gesture studies, we take CG gestures to be a kind of shared information gesture
(Bavelas et al., 1992), which can be construed as markers of common ground (Holler, 2010).
What then is the CG gesture’s contribution? Its affiliated expression in turn 2 in (6) is an NP,
die Skulptur, which, due to its definiteness, either has to be linked to an already familiar DR or
has to be accommodated. The CG gesture resolves this interpretive fork by cancelling the
accommodation branch and signaling that the DR is indeed part of the CG of the interlocutors.
Concretely, the CG pointing in (6) indicates for some constituent of the current utterance (a
contextual parameter of CG’s meaning) that it is a constituent of a grounded proposition. A more
precise characterization of this informal interpretation is given in Sec. 4. Before we can spell
out our semantic formalization of the discourse pointing subclasses, we have to provide background
about the framework in which this formalization is couched.
3. KoS/TTR
We formulate our account within the framework of KoS (Ginzburg, 1994; Ginzburg and Cooper,
2004; Larsson, 2002; Purver, 2006; Fernández, 2006; Ginzburg and Fernández, 2010; Ginzburg,
2012).⁶ KoS is a theory that combines an approach to semantics inspired by situation semantics
and dynamic semantics with a view of interaction influenced by Conversation Analysis. In KoS,
instead of a single context, analysis is formulated at a level of cognitive states, one per
conversational participant. Each cognitive state consists of two ‘parts’: a private part and the
dialogue gameboard, which represents information that arises from publicized interactions and on
which we focus here. The structure of the dialogue gameboard is given in (7): the spkr and addr
fields allow one to track turn ownership, Facts represents conversationally shared assumptions,
VisSit keeps track of the visual situation including the focus of visual attention, Pending and
Moves represent, respectively, moves that are in the process of being grounded or have been
grounded, QUD tracks the questions currently under discussion, and mood tracks certain emotive
aspects, important for the analysis of non-verbal signals such as laughter, smiling, and frowning
(Ginzburg et al., 2020). Of these contextual parameters at least one, VisSit, is probably never
entirely identical across participants, since distinct interlocutors do not share the same pair of
eyes, and moreover much of the time interlocutors have each other as their focus of attention.
Nonetheless, there are various devices, most prominently perhaps pointing, to effect alignment.
⁶ KoS is a toponym – the name of an island in the Dodecanese archipelago – bearing a loose connection to
conversation-oriented semantics.
(7) DGBType =def
[ spkr     : Ind
  addr     : Ind
  utt-time : Time
  c-utt    : addressing(spkr,addr,utt-time)
  Facts    : Set(Proposition)
  VisSit   : [InAttention : Ind]
  Pending  : list(locutionaryProposition)
  Moves    : list(illocutionaryProposition)
  QUD      : poset(Question)
  mood     : Appraisal ]
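For readers who find a data-structure view helpful, the gameboard in (7) can be approximated by a plain Python dataclass. This is only an illustrative sketch, not part of KoS: the class name `DGB`, the field encodings (strings for individuals, lists with the most recent or maximal element first) and the omission of the `c-utt` constraint are all our assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DGB:
    """Toy dialogue gameboard with the fields of (7); encodings are assumptions."""
    spkr: str                                    # spkr : Ind
    addr: str                                    # addr : Ind
    utt_time: float                              # utt-time : Time
    facts: set = field(default_factory=set)      # Facts : Set(Proposition)
    in_attention: Optional[str] = None           # VisSit.InAttention : Ind
    pending: list = field(default_factory=list)  # Pending, most recent first
    moves: list = field(default_factory=list)    # Moves, most recent first
    qud: list = field(default_factory=list)      # QUD, poset flattened to a list
    mood: str = "neutral"                        # mood : Appraisal

    @property
    def latest_move(self):
        return self.moves[0] if self.moves else None

    @property
    def max_qud(self):
        return self.qud[0] if self.qud else None


# One gameboard per participant: R and F each keep their own copy,
# and the copies may differ, e.g. in what is visually attended to.
dgb_r = DGB(spkr="R", addr="F", utt_time=0.0, in_attention="F")
dgb_f = DGB(spkr="R", addr="F", utt_time=0.0, in_attention="R")
dgb_r.facts.add("route passes a sculpture")  # hypothetical shared assumption
```

Keeping one object per participant mirrors the point made above that VisSit is rarely identical across interlocutors.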
To better understand the specification in (7), we offer a short digression concerning the logical
underpinnings of KoS. KoS is formulated within the framework of Type Theory with Records
(TTR) (Cooper, 2005, 2012; Cooper and Ginzburg, 2015; Cooper, 2021). TTR is a model-theoretic
descendant of the by and large proof-theoretic Martin-Löf Type Theory (Ranta, 1994;
Betarte and Tasistro, 1998) and of situation semantics (Barwise and Perry, 1983; Cooper and
Poesio, 1994; Seligman and Moss, 1997; Ginzburg and Sag, 2000). For current purposes, the
key notions of TTR are the notion of a judgement and the notion of a record.
The typing judgement: a : T classifies an object a as being of type T. Examples are given
in (8). (8a) and (8b) involve the basic “atomic” types IND(ividual) and TIME. In (8c),
run(arg1_IND = b, arg2_TIME = t) is a p(redicate)-type that arises by assigning the entities b
and t, respectively, to the argument roles of run; arg1_IND requires its fillers to be of type IND,
whereas arg2_TIME requires its fillers to be of type TIME; we will usually notate such types as
in (8d). Ranta (1994) proposed that elements such as s in (8d) be viewed as events or situations.
(8) a. b : IND
b. t : TIME
c. run(arg1_IND = b, arg2_TIME = t)
d. s : run(b,t)
Records: A record is a set of fields assigning entities to labels, of the form (9a), partially
ordered by a notion of dependence between the fields – dependent fields must follow fields on
which their values depend. A concrete instance is exemplified in (9b). This is a record with four
fields, x, e-time, e-loc, and ctemp-at-in, to which are assigned, respectively, a number, a time,
a location, and a situation sit1; the example is further discussed in (11). Records are used here
to model events and states, including utterances, and dialogue gameboards.
(9) a.
[ l1 = val1
  l2 = val2
  ...
  ln = valn ]
b.
[ x = 28
  e-time = 2AM, Feb 17, 2019
  e-loc = Nome
  ctemp-at-in = sit1 ]
Record Types: a record type is a record where each field represents a judgement of the kind in
(8a) rather than an assignment; its general form is given in (10a). The basic relationship between
records and record types is that a record r is of type RT if each value in r assigned to a given
label li satisfies the typing constraints imposed by RT on li. More precisely, as in (10b):
(10) a.
[ l1 : T1
  l2 : T2
  ...
  ln : Tn ]
b. The record
[ l1 = a1
  l2 = a2
  ...
  ln = an ]
is of type
[ l1 : T1
  l2 : T2
  ...
  ln : Tn ]
iff a1 : T1, a2 : T2, ..., an : Tn
To exemplify this, (11a) is a possible type for (9b), assuming the conditions in (11b) hold.
Record types are used to model utterance types (often referred to in formal grammar as signs)
and to express rules of conversational interaction.
(11) a.
[ x : Ind
  e-time : Time
  e-loc : Loc
  ctemp-at-in : temp_at_in(e-time,e-loc,x) ]
b. 28 : Ind; 2AM, Feb 17, 2019 : Time; Nome : Loc; sit1 : temp_at_in(2AM,
Feb 17, 2019, Nome, 28)
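The relation between records and record types in (10b) and (11) can be sketched computationally. In the toy encoding below, a record is a dict from labels to values and a record type is a dict from labels to checking functions; the function names, the stand-ins for Ind, Time and Loc, and the table of "witnessing" situations are all hypothetical simplifications rather than a TTR implementation.

```python
def of_type(record, record_type):
    """Return True iff every label required by record_type is present in
    record and its value passes the corresponding check (cf. (10b))."""
    return all(
        label in record and check(record[label], record)
        for label, check in record_type.items()
    )


# Hypothetical stand-ins for the basic types Ind, Time and Loc.
def is_ind(value, _record):
    return isinstance(value, (int, float))

def is_time(value, _record):
    return isinstance(value, str)

def is_loc(value, _record):
    return isinstance(value, str)

# Situations assumed to witness temp_at_in(time, location, temperature).
WITNESSES = {("2AM, Feb 17, 2019", "Nome", 28): {"sit1"}}

def is_temp_at_in(value, record):
    # Dependent field: its type mentions the values of the earlier fields.
    key = (record.get("e-time"), record.get("e-loc"), record.get("x"))
    return value in WITNESSES.get(key, set())


record_9b = {"x": 28, "e-time": "2AM, Feb 17, 2019",
             "e-loc": "Nome", "ctemp-at-in": "sit1"}
type_11a = {"x": is_ind, "e-time": is_time,
            "e-loc": is_loc, "ctemp-at-in": is_temp_at_in}

print(of_type(record_9b, type_11a))               # True
print(of_type({**record_9b, "x": 30}, type_11a))  # False: sit1 no longer fits
```

Note that a record with additional fields still counts as being of the type, since only the labels mentioned in the record type are checked.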
Contextual reasoning will be important here in several ways. First, we characterize dialogue
regularities (e.g., A’s assertion p gives rise to the possibility that B accepts p or alternatively
that B initiates discussion of the question p?) in terms of conversational rules, mappings between
two cognitive states, the precond(ition)s and the effects. Conversational rules can come in two
flavours: rules that each interlocutor applies in the same way to their cognitive state (participant
neutral) and rules that are specified only for particular interlocutors (participant sensitive). The
latter kind of specification is, in principle, more general and is particularly important for an
algorithmic perspective involving generation (see, e.g., Larsson, 2002; Cooper, 2021). Most of
the conversational rules we will specify will be participant neutral, as exemplified in the rules
given in (12):
(12) a. Ask QUD-incrementation: given a question q and Ask(A,B,q) being the LatestMove,
one can update QUD with q as MaxQUD.
[ pre : [ q : Question
          LatestMove = Ask(spkr,addr,q) : IllocProp ]
  effects : [ QUD = ⟨q, pre.QUD⟩ : poset(Question) ] ]
b. QSPEC: this rule characterizes the contextual background of reactive queries and
assertions – if q is MaxQUD, then subsequent to this either conversational participant
may make a move constrained to be q-specific (i.e., either About or Influencing q).
[ pre : [ QUD = ⟨q, Q⟩ : poset(Question) ]
  effects : [ r : Question ∨ Prop
              R : IllocRel
              LatestMove = R(spkr,addr,r) : IllocProp
              c1 : Qspecific(r,q) ] ]
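A conversational rule of this kind can be read as a function from one gameboard state to the next. The sketch below illustrates Ask QUD-incrementation only; the dict encoding of the gameboard, the tuple encoding of moves and the example question are our own assumptions.

```python
def ask_qud_incrementation(dgb):
    """Apply (12a) to a gameboard given as a dict; return it unchanged if
    the precondition (LatestMove is an Ask move) is not met."""
    move = dgb.get("LatestMove")
    if not move or move[0] != "Ask":
        return dgb
    _rel, _spkr, _addr, q = move
    # effects: QUD = <q, pre.QUD>, i.e. q becomes the maximal element.
    return {**dgb, "QUD": [q] + dgb["QUD"]}


before = {
    "spkr": "F", "addr": "R",
    "LatestMove": ("Ask", "F", "R", "?must_go_into_park(F)"),  # hypothetical
    "QUD": ["?route(F)"],
}
after = ask_qud_incrementation(before)
print(after["QUD"])  # ['?must_go_into_park(F)', '?route(F)']
```

The rule is participant neutral in the sense used above: the same function would be applied to each interlocutor's gameboard.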
We exemplify a participant sensitive rule that relates to one of the most basic communicative
interactions from infancy, namely visual attention directing, where A directs B to an object
o (Lücking, 2018). This is a visual situation update rule, analogous to the QUD and FACTS
update rules above. The sole difference is that in this case B needs to modify her visual
situation so that it includes o as the visual focus, whereas A must already have updated his visual
situation to effect such an act. The notation we use for such rules is exemplified in (13a), where
the rule applies to the dialogue gameboard of the current addressee, with the obvious change in the
case where it applies to the current speaker. (13b) provides the specification for the visual
situation update rule:
(13) a.
[ tcs = [ dgb : DGBType
          private : Private ] : TCS
  B = dgb.addr : IND
  B.pre = T1 : DGBType
  B.effects = T2 : DGBType ]
b. Visual situation update:
[ tcs = [ dgb : DGBType
          private : Private ] : TCS
  B = dgb.addr : IND
  B.pre : [ o : Ind
            LatestMove = DirectAttention(spkr,addr,o) : IllocProp ]
  B.effects : [ VisSit.InAttention = o : Ind ] ]
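The participant-sensitive character of this rule can be made concrete: the update fires only on the gameboard owned by the addressee of the DirectAttention move. A minimal sketch, assuming a dict-based gameboard, tuple-encoded moves and a hypothetical `owner` argument naming whose gameboard is being updated:

```python
def visual_situation_update(dgb, owner):
    """Update the gameboard of participant `owner` after a DirectAttention
    move; only the addressee of the move shifts visual focus."""
    move = dgb.get("LatestMove")
    if not move or move[0] != "DirectAttention":
        return dgb
    _rel, _spkr, addr, obj = move
    if owner != addr:
        return dgb  # the speaker is assumed to attend to obj already
    return {**dgb, "VisSit": {"InAttention": obj}}


move = ("DirectAttention", "A", "B", "spout")  # hypothetical object
dgb_b = {"LatestMove": move, "VisSit": {"InAttention": "A"}}
dgb_a = {"LatestMove": move, "VisSit": {"InAttention": "spout"}}

print(visual_situation_update(dgb_b, owner="B")["VisSit"])  # {'InAttention': 'spout'}
print(visual_situation_update(dgb_a, owner="A") is dgb_a)   # True
```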
The final logical notion we introduce is the situation semantics notion of an Austinian
proposition (Barwise and Etchemendy, 1987). Deriving from Austin’s (1950) theory of truth (a true
assertion involving a situation token matching a situation type), they were originally proposed
to explicate assertions and relatedly beliefs. In TTR they are identified with records of the form
(14a) whose truth conditions are defined in (14b):
(14) a.
[ sit = s
  sit-type = T ]
b. A proposition p =
[ sit = s0
  sit-type = ST0 ]
is true iff s0 : ST0
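The truth condition in (14b) amounts to a single type check. As a rough illustration (with a situation encoded as a dict and a situation type as a predicate over situations, both of which are our simplifications):

```python
def is_true(proposition):
    """(14b): a proposition is true iff its sit is of its sit-type."""
    return proposition["sit-type"](proposition["sit"])


# Hypothetical situation token and situation type, echoing run(b,t) in (8).
s0 = {"event": "run", "agent": "b", "time": "t"}

def run_b_t(sit):
    return (sit.get("event"), sit.get("agent"), sit.get("time")) == ("run", "b", "t")

def walk_b_t(sit):
    return (sit.get("event"), sit.get("agent"), sit.get("time")) == ("walk", "b", "t")

p_true = {"sit": s0, "sit-type": run_b_t}
p_false = {"sit": s0, "sit-type": walk_b_t}

print(is_true(p_true), is_true(p_false))  # True False
```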
Subsequently, such propositions have been used in modelling utterance processing (Ginzburg,
2012). Ginzburg (2012) proposes that dialogue interaction is, to a large extent, structured by
a series of branching points where an utterance is either grounded (Clark, 1996) or gives rise
to clarification interaction or repair. Ginzburg (2012) shows that the specific conditions for
grounding and possibilities for repair of an utterance u can be read off the locutionary
proposition defined by u and a grammatical type Tu, intuitively the sign (in the Saussurean sense)
associated with u. That the locutionary proposition involves the entire sign and not merely its
semantic components is motivated, in part, by the fact that this enables the locutionary
proposition to characterize the forms that are possible means to ground or request clarification
about u, and these exhibit significant syntactic and phonological parallelism with u (Ginzburg and
Cooper, 2004). (15b) exemplifies lexical entries we will posit below for laughter and its ilk.
It is a somewhat simplified lexical entry for the particle mmh used by B to acknowledge
understanding of a prior utterance by A. It has fields for phonological and syntactic types, as
well as for the contextual parameters of the utterance (DGB-PARAMS) needed to resolve the
content of an utterance of mmh on a given use. In this case the contextual parameters are an
utterance token and the conversational participants:
(15) a. A locutionary proposition
[ sit = u0
  sit-type = Tu0 ]
is true iff u0 : Tu0, in other words iff the sign fully classifies the utterance;
otherwise, repair interaction ensues.
b.
[ phon : mmh
  syncat : interjection
  dgb-params : [ spkr : Ind
                 addr : Ind
                 u : sign
                 c1 : address(addr,spkr,u) ]
  cont = Acknowledge(u,spkr) : IllocProp ]
4. Lexicalizing addressee pointing gestures
4.1. CG
We already encountered a CG example in (6) in Sec. 2. We left the discussion by observing that
the CG gesture indicates for some constituent of the current utterance that it is a constituent
of a common ground proposition. Having the tools from Sec. 3 at our disposal, we can make this
interpretation precise, since both the constituents of an utterance and FACTS (i.e., CG) are
contextual parameters, among others. The working of CG pointing can be captured in terms of
the lexical entry in (16). Concretely, it indicates of some sub-utterance that is a constituent of
the (maximally) pending utterance – an utterance still in the process of being grounded – that
it fills an argument role of an already grounded proposition p (p is part of FACTS, see c2).
(16) CG ↦
[ Shape : PointType
  dgb-params : [ MaxPending : LocProp
                 u : sign
                 c1 : In(u, MaxPending.constits)
                 R : Rel
                 a : IND
                 p = R(a) : Prop
                 c2 : In(p, FACTS) ]
  cont = [ c3 : =(u.cont, a) ] : RecType ]
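The constraints c1 to c3 in (16) can be read as a small resolution procedure: given a sub-utterance u, check that it is a constituent of the pending utterance and look for a grounded proposition whose argument is u's content. The sketch below assumes a deliberately crude encoding (signs as dicts, propositions as relation-argument pairs); the relation name `landmark_on_route` and the example facts are hypothetical and not taken from the corpus annotation.

```python
def cg_pointing_resolves(u, max_pending, facts):
    """Return the grounded propositions p = R(a) that license CG pointing
    at sub-utterance u, or an empty list if the constraints fail."""
    if u not in max_pending["constits"]:       # c1: u is a constituent
        return []
    # c2 and c3: p is in FACTS and its argument a equals the content of u.
    return [(rel, a) for (rel, a) in facts if a == u["cont"]]


facts = {("landmark_on_route", "sculpture"), ("landmark_on_route", "pillars")}
u = {"phon": "die Skulptur", "cont": "sculpture"}
new_ref = {"phon": "der Brunnen", "cont": "fountain"}  # not yet grounded
max_pending = {"constits": [u, new_ref]}

print(cg_pointing_resolves(u, max_pending, facts))
print(cg_pointing_resolves(new_ref, max_pending, facts))  # []
```

An empty result corresponds to the case where the definite would have to be accommodated, which is exactly the branch that CG pointing rules out.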
Note that the lexical entries we provide here are simplified in that they abstract over different
tiers or channels. They can be embedded into a tier-based framework of dialogue gameboards,
however (Lücking and Ginzburg, 2020). Of course, the pointing gesture alone is not able
to discern the constituent which is indicated to be already familiar. This is achieved by the
accompanying speech that in case of example (6) involves a repetition of the constituent’s
PHON type.
However, identification of the grounded constituent does not necessarily involve segmental
repetition. In example (17) (SaGA V2, 9:16) the constituent in question is identified by means
of its order of appearance in the route direction:
(17)
F: ok_nochmal beim anfang dieses <<pointing at R> mit den säulen scheint ja
irgendwie was komplizierter zu sein ja? (-)>
/ ok back to the start, the thing (CG pointing) with the pillars seems to be a bit more complicated, doesn’t it?
How does this work? The FACTS field is populated inter alia by (descriptive contents of)
grounded moves. MOVES are stored within a list. The addressee of the CG pointing in (17)
just has to identify the first move from the route direction list and retrieve the constituent(s) it
introduces.
A related example is given in (18) (SaGA V4, 9:43):
(18)
F: auf jeden fall (.) DANN ((pointing at R)) muss ich in den park gehen?
/ anyhow, THEN (CG pointing) I have to go into the park?
The difference from (17) is that the constituent-relevant move is indicated in a relative manner
(namely after some other route direction component) by then, rather than according to its order
of appearance.
4.2. Utterance anaphora
While CG pointing indicates that a given constituent is already known, Utt pointing (utterance
anaphora) emphasizes a DR of the current utterance. Accordingly, Utt pointing often occurs with
topic (DR) introduction, an affirmation of the utterance of the other interlocutor, clarification
requests, or corrections – see the frequencies collected in (5) in Sec. 2. It should be noted
that Utt pointing is formally realized not only by pointing at R or F, but also by index finger
raising, which is not a proper pointing at the addressee. We cannot explore such form/function
variations further here, though this is a potentially important consideration for future
work.
A real-world example is shown in (19) (SaGA V2, 7:30):
(19)
R: hh und dann kommen halt äh (-) die ((pointing at F)) BÄUme
/ and then there will just eh be the (UTT pointing) TREES
Albeit with a kinematically modest Utt pointing, R, while prosodically stressing trees, points at F
(the addressee). Once more, F is not the index of the pointing gesture. Rather, the gesture
puts emphasis on a DR of the accompanying utterance – in the case of (19) this is the constituent
associated with the plural noun trees. In the context of the route direction this Utt pointing
highlights a new landmark. It is therefore bound up with topic introduction or topic switch and
contributes to the structure of the ongoing conversation.
The lexical entry we provide in (20) assigns as the content of Utt uses the speech event
associated with a sub-utterance of the (maximally) Pending utterance.
(20) Utt ↦
[ Shape : PointType
  dgb-params : [ MaxPending : LocProp
                 u : sign
                 c1 : member(u, MaxPending.sit.constits) ]
  cont = u.sit : Rec ]
4.3. SCTM
SCTM pointing (something’s coming to mind) indicates just that: the speaker suddenly recalls
something different from what he or she is talking about. SCTM is best illustrated by means of
an example (SaGA V4, 5:23):
(21)
R: da gehst du rein (-) h da kommt n SEE: / there you enter, and there is a LAKE
R: ah gut ((pointing at F)) (.) ich glaub es kam doch erst der park
/ well (SCTM pointing) I guess there was the park first
In (21) the direction giver R continues her route description by introducing what she believes
to be the next landmark/topic (namely the lake). She then recognizes that she was confused:
the park was before the lake. The point of recall is indicated by particles in speech (ah gut) and
by addressee pointing. Intuitively SCTM pointing signals something like wait a moment and
there will be a modification/repair.
The bottom line is that SCTM involves topic change. More precisely, SCTM pointing is akin to a
forward looking disfluency (Ginzburg et al., 2014), a discourse particle that provides indications
about a looming utterance, in this case that the issue it concerns is distinct from the current one.
In this case we capture the effect in terms of a lexical entry that expresses the move effected
by the pointing and a conversational rule that indicates the subsequent contextual update that
such a move underwrites.
(22) SCTM ↦
[ Shape : PointType
  dgb-params : [ spkr : Ind
                 addr : Ind
                 utt-time : Time
                 c1 : Address(spkr,addr,utt-time)
                 Pending.cont : IllocProp
                 q : Question
                 c2 : About(Pending.cont,q) ]
  cont = ChangeTopic(spkr,q) : IllocProp ]
(23) SCTM conversational rule ↦
[ preconds : [ q : Question
               LatestMove = ChangeTopic(spkr,q) : IllocProp ]
  effects : [ spkr = preconds.spkr : Ind
              Pending.cont : IllocProp
              c2 : ¬About(Pending.cont,preconds.q) ] ]
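The effect of the SCTM rule can be paraphrased as a constraint on the next pending utterance: once a ChangeTopic move about q is the latest move, what the same speaker says next must not be about q. The sketch below encodes this reading under strong simplifying assumptions: moves are tuples, utterance contents carry an explicit `topic` field, and `about` is a hypothetical stand-in for the About relation.

```python
def about(content, question):
    """Hypothetical stand-in for About: content addresses the question."""
    return content["topic"] == question


def sctm_rule_satisfied(latest_move, next_pending_cont):
    """Check the effect of (23): after ChangeTopic(spkr, q), the next
    pending content must not be about q. Other moves are unconstrained."""
    rel, _spkr, q = latest_move
    if rel != "ChangeTopic":
        return True
    return not about(next_pending_cont, q)


# Loosely modelled on (21): R was talking about the lake, then points.
latest = ("ChangeTopic", "R", "?lake_next")
repair = {"topic": "?park_first", "text": "es kam doch erst der park"}
same_topic = {"topic": "?lake_next", "text": "da kommt n see"}

print(sctm_rule_satisfied(latest, repair))      # True
print(sctm_rule_satisfied(latest, same_topic))  # False
```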
We found SCTM in two variants: as addressee pointing and as index finger raising.
4.4. GrabTurn
Probably the most straightforward kind of discourse pointing is GrabTurn: it effectuates turn
change. Accordingly, it is affiliated with turn-taking expressions – in (24), for instance, a
request to pose a clarification question. With just two instances, GrabTurn was the least frequent
kind of discourse pointing in our corpus, and both occurrences were produced by index finger
raising. So it remains to be seen whether there is also an addressee pointing variant, as we
suspect. The first occurrence of GrabTurn in our sample is the following (SaGA V4, 4:28):
(24)
R: du bleibst auf jeden fall auf der straße wo du bist und gehst geradeaus h
/ in any case you stay on the street where you are and go straight ahead
F: <<index raised, repeated>ich frage nochmal kurz was nach> (.) also ähm
/ I have a quick clarification request ehm
F interrupts R with a raised index finger. F tries to catch R’s attention both visually and
acoustically. The reason for the interruption is then explained. As with SCTM, we analyze GrabTurn
by means of a lexical entry and a conversational rule that gives rise to turn change:
(25) GrabTurn ↦
[ Shape : PointType
  dgb-params : [ spkr : Ind
                 addr : Ind
                 utt-time : Time
                 c1 : Address(spkr,addr,utt-time) ]
  cont = GrabTurn(addr,utt-time) : IllocProp ]
(26) GrabTurn conversational rule ↦
[ Preconds : [ spkr : Ind
               addr : Ind
               utt-time : Time
               LatestMove = GrabTurn(addr,utt-time) : IllocProp ]
  Effects : [ spkr = pre.addr : Ind
              addr = pre.spkr : Ind ] ]
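Read procedurally, the GrabTurn rule swaps the speaker and addressee roles on the gameboard once a GrabTurn move by the current addressee is the latest move. A minimal sketch, assuming a dict-based gameboard and tuple-encoded moves; encoding the video time 4:28 as 268 seconds is our own convention.

```python
def grab_turn_rule(dgb):
    """Apply (26): if the latest move is GrabTurn by the current addressee,
    swap the spkr and addr roles; otherwise leave the gameboard unchanged."""
    move = dgb.get("LatestMove")
    if not move or move[0] != "GrabTurn" or move[1] != dgb["addr"]:
        return dgb
    return {**dgb, "spkr": dgb["addr"], "addr": dgb["spkr"]}


# Loosely modelled on (24): R holds the turn, F raises the index finger.
before = {"spkr": "R", "addr": "F", "LatestMove": ("GrabTurn", "F", 268.0)}
after = grab_turn_rule(before)

print(after["spkr"], after["addr"])  # F R
```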
5. Conclusions
In sum, the significance of pointing gestures consists not only in locating referents but also
in controlling the addressee’s attention and her view of the status of these referents in the
incrementally emergent context. Accordingly, a dialogical notion of grammar is required, in
terms of which discourse pointing can be analyzed.
It is tempting to think about a coherent framework for the various uses of pointing gestures:
identification, abstract, deferred, and discourse. A common theme seems to be that in any of
these uses the pointing gesture acts as an instruction for the addressee to find the referent (which
in turn is further described in speech or by contextual salience). Only the search domain differs:
visual domain in concrete deictic pointing
knowledge domain for indirect classification in deferred reference
geometric projection in abstract deixis
utterance context and dialogue management in discourse pointing
We leave the development of such a coherent framework to future work. This includes extended
corpus work to obtain a better quantitative picture of the distribution of discourse pointing and to
identify potential further uses of addressee pointing. We have covered gaze and intonation only
very briefly; a truly multimodal analysis will have to examine the functional interaction of
discourse pointing with other verbal and non-verbal signals. Multi-tier extensions of dialogue
gameboards provide a starting point (Lücking and Ginzburg, 2020).
References
Alahverdzhieva, K., A. Lascarides, and D. Flickinger (2017). Aligning speech and co-speech
gesture in a constraint-based grammar. Journal of Language Modelling 5(3), 421–464.
Austin, J. L. (1950). Truth. In Proceedings of the Aristotelian Society. Supplementary,
Volume xxiv, pp. 111–128. Reprinted in John L. Austin: Philosophical Papers. 2. ed. Oxford:
Clarendon Press, 1970.
Barwise, J. and J. Etchemendy (1987). The Liar. New York: Oxford University Press.
Barwise, J. and J. Perry (1983). Situations and Attitudes. Bradford Books. Cambridge: MIT
Press.
Bavelas, J. B., N. Chovil, D. A. Lawrie, and A. Wade (1992). Interactive gestures. Discourse
Processes 15(4), 469–489.
Bergmann, K., O. Damm, F. Freigang, C. Fröhlich, F. Hahn, S. Klett, A. Lücking, J. Kopp,
S. Letetzki, H. Rieser, N. Thomas, and N. Wittwer (2014). Documentation – Sagaland.
Bielefeld University: SFB 673, Project B1.
https://www.phonetik.uni-muenchen.de/Bas/BasSaGAeng.html.
Betarte, G. and A. Tasistro (1998). Martin-Löf’s type theory with record types and subtyping.
In G. Sambin and J. M. Smith (Eds.), 25 Years of Constructive Type Theory. Proceedings of
a Congress held in Venice, October 1995. New York: Oxford University Press.
Bühler, K. (1934). Sprachtheorie: Die Darstellungsfunktion der Sprache. Jena: Gustav Fischer
Verlag.
Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.
Cooper, R. (2005). Austinian truth, attitudes and type theory. Research on Language and
Computation 3(4), 333–362.
Cooper, R. (2012). Type theory and semantics in flux. In R. Kempson, N. Asher, and
T. Fernando (Eds.), Handbook of the Philosophy of Science, Volume 14: Philosophy of Linguistics,
pp. 271–323. Amsterdam: Elsevier.
Cooper, R. (2021). From perception to communication: An analysis of meaning and action
using a theory of types with records (TTR). https://github.com/robincooper/ttl.
Book Draft.
Cooper, R. and J. Ginzburg (2015). Type theory with records for natural language semantics. In
C. Fox and S. Lappin (Eds.), Handbook of Contemporary Semantic Theory, second edition.
Oxford: Blackwell.
Cooper, R. and M. Poesio (1994). Situation theory. In Fracas Deliverable D8. Centre for
Cognitive Science, Edinburgh: The Fracas Consortium.
Ebert, C. and C. Ebert (2016). The semantic behaviour of co-speech gestures and their role
in demonstrative reference. Invited talk given at Institut Jean-Nicod, Département d’Études
Cognitives Ecole Normale Supérieure, Paris.
Fernández, R. (2006). Non-Sentential Utterances in Dialogue: Classification, Resolution and
Use. Ph. D. thesis, King’s College, London.
Frege, G. (1918). Der Gedanke. Beiträge zur Philosophie des deutschen Idealismus 1(2),
58–77.
Fricke, E. (2007). Origo, Geste und Raum. Number 24 in Linguistik – Impulse & Tendenzen.
Berlin, New York: Walter de Gruyter.
Fricke, E. (2012). Grammatik multimodal. Wie Wörter und Gesten zusammenwirken.
Number 40 in Linguistik – Impulse und Tendenzen. Berlin, Boston: De Gruyter.
Ginzburg, J. (1994). An update semantics for dialogue. In H. Bunt (Ed.), Proceedings of the
1st International Workshop on Computational Semantics. Tilburg: ITK, Tilburg University.
Ginzburg, J. (2012). The Interactive Stance: Meaning for Conversation. Oxford: Oxford
University Press.
Ginzburg, J. and R. Cooper (2004). Clarification, ellipsis, and the nature of contextual updates.
Linguistics and Philosophy 27(3), 297–366.
Ginzburg, J. and R. Fernández (2010). Computational models of dialogue. In A. Clark, C. Fox,
and S. Lappin (Eds.), Handbook of Computational Linguistics and Natural Language.
Oxford: Blackwell.
Ginzburg, J., R. Fernández, and D. Schlangen (2014). Disfluencies as intra-utterance dialogue
moves. Semantics and Pragmatics 7(9), 1–64.
Ginzburg, J., C. Mazzocconi, and Y. Tian (2020). Laughter as language. Glossa: a journal of
general linguistics 5(1).
Ginzburg, J. and M. Poesio (2016). Grammar is a system that characterizes talk in interaction.
Frontiers in Psychology 7, 1938.
Ginzburg, J. and I. A. Sag (2000). Interrogative Investigations: the form, meaning and use
of English Interrogatives. Number 123 in CSLI Lecture Notes. Stanford, California: CSLI
Publications.
Grosz, P. G. (2019). Pronominal typology and reference to the external world. In Proceedings
of the Amsterdam Colloquium 2019, AC’19, pp. 563–573.
Higgleton, E. and A. Seaton (Eds.) (1996). Harper’s Essential English Dictionary. New Delhi,
India: Allied Chambers.
Holler, J. (2010). Speakers’ use of interactive gestures as markers of common ground. In
S. Kopp and I. Wachsmuth (Eds.), Proceedings of Gesture Workshop 2009, Number 5934 in
Lecture Notes in Artificial Intelligence, pp. 11–22. Springer.
Kaplan, D. (1978). Dthat. In P. Cole (Ed.), Pragmatics, Number 9 in Syntax and Semantics,
pp. 221–243. New York, San Francisco, London: Academic Press.
Kaplan, D. (1989). Demonstratives. In J. Almog, J. Perry, and H. Wettstein (Eds.), Themes
from Kaplan, pp. 481–563. New York, Oxford: Oxford University Press.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In
M. R. Key (Ed.), The Relationship of Verbal and Nonverbal Communication, Number 25 in
Contributions to the Sociology of Language, pp. 207–227. The Hague: Mouton.
Kühnlein, P. (1999). Dynamics of complex information. In E. André, M. Poesio, and H. Rieser
(Eds.), Proceedings of the Workshop on Deixis, Demonstration and Deictic Belief at ESSLLI
XI. Paper 11.
Larsson, S. (2002). Issue based Dialogue Management. Ph. D. thesis, Gothenburg University.
Lascarides, A. and M. Stone (2009). A formal semantic analysis of gesture. Journal of
Semantics 26(4), 393–449.
Lücking, A. (2013). Ikonische Gesten. Grundzüge einer linguistischen Theorie. Berlin, Boston:
De Gruyter. Zugl. Diss. Univ. Bielefeld (2011).
Lücking, A. (2018). Witness-loaded and witness-free demonstratives. In M. Coniglio,
A. Murphy, E. Schlachter, and T. Veenstra (Eds.), Atypical Demonstratives. Syntax, Semantics and
Pragmatics, Number 568 in Linguistische Arbeiten. De Gruyter.
Lücking, A. and J. Ginzburg (2020). Towards the score of communication. In Proceedings of
The 24th Workshop on the Semantics and Pragmatics of Dialogue, SemDial/WatchDial.
Lücking, A., T. Pfeiffer, and H. Rieser (2015). Pointing and reference reconsidered. Journal of
Pragmatics 77, 56–79.
Lücking, A., K. Bergmann, F. Hahn, S. Kopp, and H. Rieser (2010). The Bielefeld speech
and gesture alignment corpus (SaGA). In Multimodal Corpora: Advances in Capturing,
Coding and Analyzing Multimodality, LREC 2010, pp. 92–98. 7th International Conference
for Language Resources and Evaluation.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review 92(3),
350–371.
McNeill, D., J. Cassell, and E. T. Levy (1993). Abstract deixis. Semiotica 95(1-2), 5–19.
Nunberg, G. (1993). Indexicality and deixis. Linguistics and Philosophy 16(1), 1–43.
Purver, M. (2006). Clarie: Handling clarification requests in a dialogue system. Research on
Language & Computation 4(2), 259–288.
Ranta, A. (1994). Type Theoretical Grammar. Oxford: Oxford University Press.
Rieser, H. (2004). Pointing in dialogue. In Proceedings of the Eighth Workshop on the
Semantics and Pragmatics of Dialogue, Catalog ’04, pp. 93–100.
Seligman, J. and L. Moss (1997). Situation Theory. In J. van Benthem and A. ter Meulen
(Eds.), Handbook of Logic and Linguistics. Amsterdam: North Holland.
Selting, M., P. Auer, D. Barth-Weingarten, J. Bergmann, P. Bergmann, K. Birkner,
E. Couper-Kuhlen, A. Deppermann, P. Gilles, S. Günthner, M. Hartung, F. Kern, C. Mertzlufft,
C. Meyer, M. Morek, F. Oberzaucher, J. Peters, U. Quasthoff, W. Schütte, A. Stukenbrock,
and S. Uhmann (2009). Gesprächsanalytisches Transkriptionssystem 2 (GAT 2).
Gesprächsforschung – Online-Zeitschrift zur verbalen Interaktion 10, 353–402.