Poster · PDF available

Connectionist Semantic Systematicity in Language Production

Authors: Jesús Calvillo, Harm Brouwer, Matthew Crocker (Saarland University)

Abstract

A defining characteristic of human language is systematicity: “the ability to produce/understand some sentences is intrinsically connected to the ability to produce/understand certain others” (Fodor & Pylyshyn, 1988). Further, Fodor and Pylyshyn (1988) argue that connectionist models are not able to display systematicity without implementing a classical symbol system. The connectionist comprehension model developed by Frank, Haselager, and van Rooij (2009), however, challenges this highly debated assertion: it is argued to achieve relevant levels of systematicity without implementing a classical symbol system. Their model constructs a situation model (see Zwaan & Radvansky, 1998) of the state-of-affairs described by a sentence that also incorporates world knowledge-driven inferences. When the model processes a sentence like ‘a boy plays soccer’, for instance, it not only recovers the explicit, literal propositional content, but also constructs a more complete situation model in which a boy is likely playing outside on a field, with a ball, with others, and so forth. Crucially, Frank et al.’s (2009) model generalizes to both sentences and situations that it has not seen during training, exhibiting different levels of semantic systematicity, and is argued to provide an important step toward psychologically plausible models of language comprehension. In the present paper, we examine whether the approach developed by Frank et al. (2009) is equally well suited to language production, and present a connectionist production model that generates sentences from these situation models.
Jesús Calvillo, Harm Brouwer, Matthew Crocker
jesusc; brouwer; crocker@coli.uni-saarland.de
Saarland University
Connectionist Semantic Systematicity in Language Production
!"#$"#%"&'()*+%,)#-&&
Mapping'''
Seman+cs'à'Sentences.'
!./$"01,%2$.-&
Generaliza+on'to'
unseen'sentences/
seman+cs.'
'
Conclusion

High overall performance of the model shows that the DSS-based representations are suitable for modeling language production.

The model is able to generate novel sentences for semantically known situations but with a different voice (cond. 1 & 2), showing Syntactic Systematicity.

The model is able to generate sentences for unseen areas in the semantic space (cond. 3 & 5), showing Semantic Systematicity.
!"01#,%&!71%"& 92#6+2/,%&!71%"&
Similar'situa+ons'are'close'
to'each'other.'
Con+nuous'
Space'
Similar'situa+ons'are'
assigned'linguis+cally'
similar'realiza+ons.'
Generaliza+on'to'unseen'areas'is'
possible'if'the'model'learns'an'
abstrac+on'of'the'topology'of'the'
spaces'and'their'mapping,'as'proposed'
by'Frank'et'al.'(2009).'
:)#*2,)#& ;+"(.& !20241(2$.&<=>& '"(5"%$&?1$%@&<=>&
1' pas' 97.66' 92.86'
2' act' 97.58' 93.57'
3' act' 98.35' 93.57'
3' pas' 96.79' 83.57'
5' act' 95.08' 85.0'
Average'Test' -' AB8C& DD8EB&
*10-fold'cross'valida+on'averages'(90%'training,'10%'tes+ng)'
*Levenshtein'Similarity'
*Condi+on'4'had'no'passive'sentences'to'compare'with,'thus'no'similarity'scores'could''
''be'calculated.'
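The word-level Levenshtein similarity used above can be sketched as follows; this is a minimal illustration of the standard metric, not the authors' evaluation code, and the tokenization and normalization (1 − distance / length of the longer sentence) are assumptions.

```python
# Sketch: word-level Levenshtein similarity between a produced and an
# expected sentence (normalization choice is an assumption).

def levenshtein_distance(a, b):
    """Edit distance between two word sequences."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def levenshtein_similarity(produced, expected):
    dist = levenshtein_distance(produced, expected)
    return 1.0 - dist / max(len(produced), len(expected))

output   = "Sophia beats Heidi with ease at hide_and_seek .".split()
expected = "Sophia beats Heidi with ease at hide_and_seek in the bedroom .".split()
print(levenshtein_similarity(output, expected))  # ~0.73: an underspecified output
```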
F+$7+$& Sophia'beats'Heidi'with'ease'at'hide_and_seek.'
GH7"%$"*& Sophia'beats'Heidi'with'ease'at'hide_and_seek'2#&$@"&3"*())0.'
F+$7+$& Sophia'beats'someone'at'hide_and_seek'in'the'bedroom.'
GH7"%$"*& someone'loses'to'Sophia'at'hide_and_seek'in'the'bedroom.'
PP-aZachment'
31.6%'
'
The errors of 5 folds were manually inspected (38 errors).

With a couple of exceptions, all sentences are syntactically correct and semantically felicitous.

Mistakes occur when the model produces a sentence that is semantically highly similar to the one expected.
'1//2I"&F+$7+$& @2*"J1#*J/""K'is'won'with'ease'by'L"2*2'in'the'playground.'
M%,I"&!"#$"#%"& L"2*2'beats'Sophia'with'ease'in'the'playground'at'@2*"J1#*J/""K.'
'1//2I"&&F+$7+$& 1&$).'is'played'with'in'the'playground'by'Sophia.'
M%,I"&!"#$"#%"& Sophia'plays'in'the'playground.'
Output of 3 folds was manually inspected (84 situations).

- Mostly correct and coherent with the given semantics.
- The model learns that:
  - passive sentences begin with the object of the action;
  - the object is never a person.
Model Architecture

Comprehension model (Frank et al., 2009): Words (40 units) → Hidden (120 units) → DSS (150 units).

Production model: DSS (45 units) → Hidden (120 units, tanh) → Words (43 units, softmax), with monitoring units that feed the previously produced word back into the network.
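As an illustration of the production architecture summarized above, here is a minimal NumPy sketch of a recurrent network with the stated layer sizes, in which monitoring units carry the previous word. The wiring, the end-of-sentence index, and the greedy decoding are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Layer sizes taken from the poster; the wiring below is an assumption.
DSS_UNITS, HIDDEN_UNITS, WORD_UNITS = 45, 120, 43

rng = np.random.default_rng(0)
W_dss = rng.normal(0, 0.1, (HIDDEN_UNITS, DSS_UNITS))     # DSS -> hidden
W_mon = rng.normal(0, 0.1, (HIDDEN_UNITS, WORD_UNITS))    # monitoring (previous word) -> hidden
W_rec = rng.normal(0, 0.1, (HIDDEN_UNITS, HIDDEN_UNITS))  # hidden -> hidden recurrence
W_out = rng.normal(0, 0.1, (WORD_UNITS, HIDDEN_UNITS))    # hidden -> word output
b_h, b_o = np.zeros(HIDDEN_UNITS), np.zeros(WORD_UNITS)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def produce(dss, max_len=15, eos=42):
    """Greedy word-by-word production from a DSS vector (illustrative only;
    eos is a hypothetical end-of-sentence index)."""
    h = np.zeros(HIDDEN_UNITS)
    prev_word = np.zeros(WORD_UNITS)  # monitoring units: one-hot of previous word
    sentence = []
    for _ in range(max_len):
        h = np.tanh(W_dss @ dss + W_mon @ prev_word + W_rec @ h + b_h)
        p = softmax(W_out @ h + b_o)
        w = int(p.argmax())
        sentence.append(w)
        if w == eos:
            break
        prev_word = np.zeros(WORD_UNITS)
        prev_word[w] = 1.0
    return sentence

print(produce(rng.random(DSS_UNITS)))  # untrained: arbitrary word indices
```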
Testing Conditions

Situations are divided into setAP (424 situations, associated with both active and passive sentences) and setA (358 situations, associated with active sentences only). Conditions 1-5 hold out different combinations of active and passive sentences from these sets for testing.
Microlanguage

43 words:
- 40 original words + 2 determiners and an end-of-sentence marker

8,201 lawful sentences:
- 83% in active voice
- 17% in passive voice

782 unique DSS representations:
- 424 related to active and passive sentences
- 358 related only to active sentences

Frank et al. (2009)'s grammar does not define passive sentences for situations where:
- the object of the action is a person (“Heidi beats Charlie.”)
- or undefined (“Charlie plays.”).
Input/Output Example

DSS representation: 0.1, 0, 1.0, 0.03, …, 0.8

Active →
someone plays chess .
someone plays chess inside .
…
a girl plays chess inside .
a girl plays chess in the bedroom .

Passive →
chess is played .
chess is played by someone .
…
chess is played by a girl inside .
chess is played by a girl in the bedroom .
Training Procedure

- Cross-entropy backpropagation (Rumelhart, Hinton & Williams, 1986).
- Weight updates after each word.
- Weights initialized with random values drawn from N(0, 0.1).
- Bias weights initialized to zero.
- At time t, monitoring units were set to what the model was supposed to produce at t-1 given the training item.
- Initial learning rate of 0.124, halved each time there was no improvement of performance on the training set during 15 epochs.
- Training halted after 200 epochs or if there was no performance improvement on the training set over a 40-epoch interval.
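The schedule above can be summarized in the following Python sketch. `model.train_word` and `evaluate` are hypothetical placeholders for a word-level cross-entropy backpropagation step and a training-set performance measure (the poster does not specify this interface), and the exact improvement criterion is an assumption.

```python
# Sketch of the training schedule from the poster (interface is hypothetical).

def train(model, training_items, evaluate,
          initial_lr=0.124, patience_halve=15, patience_stop=40, max_epochs=200):
    lr = initial_lr
    best_score = float("-inf")
    epochs_since_improvement = 0

    for epoch in range(max_epochs):
        for dss, sentence in training_items:
            prev_word = None  # monitoring units: target word at t-1
            for target_word in sentence:
                # one cross-entropy backprop update per word
                model.train_word(dss, prev_word, target_word, lr)
                prev_word = target_word

        score = evaluate(model, training_items)
        if score > best_score:
            best_score = score
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement % patience_halve == 0:
                lr /= 2.0   # halve LR after 15 epochs without improvement
            if epochs_since_improvement >= patience_stop:
                break       # stop after 40 epochs without improvement
```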
F+$7+$& Sophia'wins'with'ease'at'a'game'2#&$@"&/$(""$.'
GH7"%$"*& Sophia'wins'with'ease'at'a'game')+$/2*".'
overspecifica+on'
23.5%'
underspecifica+on'
39.9%'
Semantics

Many samples of microworld situations constitute a “situation-state space”:
- Columns represent observations (states-of-affairs).
- Rows represent situation vectors for basic events.

Complex event vectors can be obtained by combining basic event vectors through logical operations (see the sketch after the table below).
                       k=1   k=2   k=3   …   k=25000
play(heidi,chess)       1     1     0   …      0
place(sophia,street)    1     0     0   …      0
lose(heidi)             0     1     0   …      0
⋮
win(charlie)            0     1     0   …      1

Rows: 44 basic events (e.g., “charlie plays soccer”); columns: microworld observations.
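As referenced above, here is a minimal sketch of how complex event vectors follow from basic event vectors via logical operations. The toy space below is randomly generated rather than sampled from the actual microworld, and the event names simply mirror the table.

```python
import numpy as np

# Toy situation-state space: rows are basic events, columns are sampled
# observations (1 = the event is the case in that observation).
rng = np.random.default_rng(1)
observations = 25000
basic_events = {
    "play(heidi,chess)":    rng.integers(0, 2, observations),
    "place(sophia,street)": rng.integers(0, 2, observations),
    "lose(heidi)":          rng.integers(0, 2, observations),
    "win(charlie)":         rng.integers(0, 2, observations),
}

# Complex events = logical combinations of basic event vectors.
def conj(a, b):  # a AND b
    return a & b

def disj(a, b):  # a OR b
    return a | b

def neg(a):      # NOT a
    return 1 - a

# e.g. "heidi loses at chess" ~ play(heidi,chess) AND lose(heidi)
heidi_loses_at_chess = conj(basic_events["play(heidi,chess)"],
                            basic_events["lose(heidi)"])
print(heidi_loses_at_chess[:10])
```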
Distributed Situation Space (DSS) model (Frank et al., 2009)

A state-of-affairs (situation) in a microworld is defined in terms of basic events that can be assigned a state (i.e., they can be the case or not the case).

Example: “heidi loses at chess”. States-of-affairs are combinations of basic events.

So now we have a way to represent events (basic and complex) in terms of the situations in which they are true.
Example situation vector for “heidi loses at chess” (one dimension per observation): 0, 0, 1, 0, …, 1, 1
Belief Vectors

Situation vectors encode event probabilities; similar events are represented by similar vectors.

Define the meaning of an event in terms of the basic events with which it appears → belief vectors:
- Dimensionality := number of basic events
- Each dimension: P(basic event | complex event)

Propositional logic semantics are translated into belief vectors, e.g.:
0.3, 0.5, 0.0, …, 0.75, 1.0
0.0, 1.0, 0.1, …, 0.5, 0.8
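A minimal sketch of the belief-vector computation described above, estimating each dimension as P(basic event | complex event) from co-occurrence counts over the sampled observations; the toy matrix is invented for illustration.

```python
import numpy as np

def belief_vector(complex_event, basic_event_matrix):
    """Belief vector: one dimension per basic event, each dimension the
    estimated P(basic event | complex event) over the sampled observations.

    complex_event:      (n_observations,) 0/1 vector of a (possibly complex) event
    basic_event_matrix: (n_basic_events, n_observations) 0/1 situation vectors
    """
    support = complex_event.sum()  # observations in which the event is the case
    if support == 0:
        raise ValueError("event is never the case in the sampled observations")
    return (basic_event_matrix @ complex_event) / support

# Toy example with 3 basic events and 6 observations:
basic = np.array([[1, 1, 0, 0, 1, 0],
                  [0, 1, 1, 0, 1, 1],
                  [0, 1, 1, 1, 0, 1]])
complex_event = basic[0] & basic[1]         # conjunction of events 0 and 1
print(belief_vector(complex_event, basic))  # [1.  1.  0.5]
```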
Citations

... After training, the model successfully learned to map sequences of word representations (representing sentences) onto meaning vectors from S_{M×P} that describe the semantics of these sentences. Since the aim is to investigate how information-theoretic metrics can be derived from the processing behavior of the model, the effects need to be tightly controlled, which is why the model is not tested using a separate set of unseen test sentences (note, however, that other models employing similar meaning representations have shown generalization to unseen sentences and semantics, in both comprehension [37] and production [42]). Instead, the performance of the model was evaluated using a comprehension score comprehension(a,b) [37] that indicates how well meaning vector a is understood to be the case from meaning vector b, resulting in a score that ranges from −1 (perfectly understood not to be the case) to +1 (perfectly understood to be the case). ...
Article · Full-text available
Language is processed on a more or less word-by-word basis, and the processing difficulty induced by each word is affected by our prior linguistic experience as well as our general knowledge about the world. Surprisal and entropy reduction have been independently proposed as linking theories between word processing difficulty and probabilistic language models. Extant models, however, are typically limited to capturing linguistic experience and hence cannot account for the influence of world knowledge. A recent comprehension model by Venhuizen, Crocker, and Brouwer (2019, Discourse Processes) improves upon this situation by instantiating a comprehension-centric metric of surprisal that integrates linguistic experience and world knowledge at the level of interpretation and combines them in determining online expectations. Here, we extend this work by deriving a comprehension-centric metric of entropy reduction from this model. In contrast to previous work, which has found that surprisal and entropy reduction are not easily dissociated, we do find a clear dissociation in our model. While both surprisal and entropy reduction derive from the same cognitive process (the word-by-word updating of the unfolding interpretation), they reflect different aspects of this process: state-by-state expectation (surprisal) versus end-state confirmation (entropy reduction).
... Moreover, we have shown how these meaning representations can be derived on a word-by-word basis in a neurocomputational model of language processing (see also Frank et al., 2009). Building in this direction, we are currently employing the framework to increase the coverage of a neurocomputational model of the electrophysiology of language comprehension (Brouwer, 2014; Brouwer et al., 2015), to model script-based surprisal (Venhuizen et al., 2016), and to model language production (Calvillo et al., 2016). Scalability. The meaning representations that we employed in our neurocomputational model were derived from a DSS constituted of observations sampled from a microworld. ...
Chapter · Full-text available
The study of language is ultimately about meaning: How can meaning be constructed from linguistic signal, and how can it be represented? The human language comprehension system is highly efficient and accurate at attributing meaning to linguistic input. Hence, in trying to identify computational principles and representations for meaning construction, we should consider how these could be implemented at the neural level in the brain. Here, we introduce a framework for such a neural semantics. This framework offers meaning representations that are neurally plausible (can be implemented in neural hardware), expressive (capture negation, quantification, and modality), compositional (capture complex propositional meaning as the sum of its parts), graded (are probabilistic in nature), and inferential (allow for inferences beyond literal propositional content). Moreover, it is shown how these meaning representations can be constructed incrementally, on a word-by-word basis in a neurocomputational model of language processing.
Article · Full-text available
Natural language semantics has recently sought to combine the complementary strengths of formal and distributional approaches to meaning. More specifically, proposals have been put forward to augment formal semantic machinery with distributional meaning representations, thereby introducing the notion of semantic similarity into formal semantics, or to define distributional systems that aim to incorporate formal notions such as entailment and compositionality. However, given the fundamentally different ‘representational currency’ underlying formal and distributional approaches—models of the world versus linguistic co-occurrence—their unification has proven extremely difficult. Here, we define a Distributional Formal Semantics that integrates distributionality into a formal semantic system on the level of formal models. This approach offers probabilistic, distributed meaning representations that are also inherently compositional, and that naturally capture fundamental semantic notions such as quantification and entailment. Furthermore, we show how the probabilistic nature of these representations allows for probabilistic inference, and how the information-theoretic notion of “information” (measured in terms of Entropy and Surprisal) naturally follows from it. Finally, we illustrate how meaning representations can be derived incrementally from linguistic input using a recurrent neural network model, and how the resultant incremental semantic construction procedure intuitively captures key semantic phenomena, including negation, presupposition, and anaphoricity.
Chapter · Full-text available
Formal semantics and distributional semantics offer complementary strengths in capturing the meaning of natural language. As such, a considerable amount of research has sought to unify them, either by augmenting formal semantic systems with a distributional component, or by defining a formal system on top of distributed representations. Arriving at such a unified framework has, however, proven extremely challenging. One reason for this is that formal and distributional semantics operate on a fundamentally different ‘representational currency’: formal semantics defines meaning in terms of models of the world, whereas distributional semantics defines meaning in terms of linguistic co-occurrence. Here, we pursue an alternative approach by deriving a vector space model that defines meaning in a distributed manner relative to formal models of the world. We will show that the resulting Distributional Formal Semantics offers probabilistic distributed representations that are also inherently compositional, and that naturally capture quantification and entailment. We moreover show that, when used as part of a neural network model, these representations allow for capturing incremental meaning construction and probabilistic inferencing. This framework thus lays the groundwork for an integrated distributional and formal approach to meaning.