Content uploaded by Filip Ilievski
Author content
All content in this area was uploaded by Filip Ilievski on Oct 20, 2018
Content may be subject to copyright.
All content in this area was uploaded by Filip Ilievski on Aug 09, 2017
Moving away from Semantic Overfitting
Marten Postma, Filip Ilievski, Piek Vossen, and Marieke van Erp
http://www.understandinglanguagebymachines.org
Vrije Universiteit Amsterdam
Entities and events in the world have no frequency.
In communication, language expressions and meanings follow a Zipfian distribution:
⇒ a small number of very frequent observations
⇒ a very long tail of low-frequency observations
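The head-versus-tail contrast can be illustrated with a minimal sketch. It assumes an idealized Zipf law in which the probability of the item at rank r is proportional to 1/r; the exponent of 1, the inventory size of 10,000, and the function names are illustrative assumptions, not values from the poster.

```python
def zipf_probabilities(n_items, exponent=1.0):
    """Return normalized probabilities p(rank) proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_mass(probs, head_size):
    """Share of all observations covered by the head_size most frequent items."""
    return sum(probs[:head_size])


# Hypothetical inventory of 10,000 expressions (assumption for illustration).
probs = zipf_probabilities(10_000)
print(f"top 10 items:      {head_mass(probs, 10):.1%} of observations")
print(f"remaining 9,990:   {1 - head_mass(probs, 10):.1%} of observations")
```

Under these assumptions a handful of head items absorbs a large share of the observations, while each individual tail item is rarely seen.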
[Figure: POPULARITY panels for 2005 and 2015; labels: Cristiano Ronaldo, Ronaldo de Lima, Ronaldo, Ronaldinho]
NLP datasets sample texts but not the world:
⇒ a lack of representativeness of long-tail phenomena:
– models overfit semantically to head phenomena of time-bound training data
– models underfit semantically to tail phenomena of time-bound target data
Task motivation
⇒ incentivize deep semantic processing linked to the head and the tail phenomena
⇒ the set of references in the long tail is enormous, excessively ambiguous, and context-dependent
⇒ no QA task has deliberately addressed the problem of long-tail (co)reference
Approach
⇒ an event-driven QA task
⇒ high referential complexity
⇒ represent both global and local events
Requirements
• Multiple event instances per event topic, e.g. the murder of John Doe and the murder of Jane Roe
• Multiple event mentions per event instance within the same document
• Multiple documents with varying document creation times in which the same event instance is described, to capture topical information over time
• Event confusability by combining one or multiple confusion factors, e.g. polysemy, location, participants
• Representation of non-dominant events and entities, i.e. instances that receive little media coverage
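The first three requirements are structural and can be checked mechanically. The sketch below is one hypothetical way to do so; the `Mention` record, its field names, and the example identifiers and dates are assumptions made for illustration and do not describe the authors' data format.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    topic: str      # event topic, e.g. "murder"
    instance: str   # hypothetical identifier of one event instance
    doc_id: str     # document containing the mention
    doc_date: str   # document creation time (ISO date string)


def check_requirements(mentions):
    """Report which of the three structural requirements the mentions satisfy."""
    instances_per_topic = defaultdict(set)
    mentions_per_instance_doc = defaultdict(int)
    docs_per_instance = defaultdict(set)
    dates_per_instance = defaultdict(set)
    for m in mentions:
        instances_per_topic[m.topic].add(m.instance)
        mentions_per_instance_doc[(m.instance, m.doc_id)] += 1
        docs_per_instance[m.instance].add(m.doc_id)
        dates_per_instance[m.instance].add(m.doc_date)
    return {
        "multiple_instances_per_topic": any(
            len(s) > 1 for s in instances_per_topic.values()),
        "multiple_mentions_within_a_document": any(
            n > 1 for n in mentions_per_instance_doc.values()),
        "multiple_documents_over_time": any(
            len(docs_per_instance[i]) > 1 and len(dates_per_instance[i]) > 1
            for i in docs_per_instance),
    }


# Illustrative data only: identifiers and dates are made up.
mentions = [
    Mention("murder", "murder_of_john_doe", "doc1", "2016-06-01"),
    Mention("murder", "murder_of_john_doe", "doc1", "2016-06-01"),
    Mention("murder", "murder_of_john_doe", "doc2", "2016-06-05"),
    Mention("murder", "murder_of_jane_roe", "doc3", "2016-10-10"),
]
report = check_requirements(mentions)
print(report)
```

Confusability and non-dominance are not covered here, since they depend on judgments about content and media coverage rather than on counts.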
Confusion factors
Confusion factor            Example
ambiguity of event forms    John Smith fires a gun / John Smith fires an employee
variance of event forms     John Smith kills John Doe / John Smith murders John Doe
time                        murder A that happened in June / murder B in October
participants                murder A committed by John Doe / murder B committed by the Roe couple
location                    murder A that happened in Zaire / murder B in Oklahoma
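The time, participants, and location factors can be read as properties on which two events of the same topic differ. The sketch below makes that reading concrete; the `EventInstance` class and the `distinguishing_factors` function are hypothetical, and the example combines values from separate table rows purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInstance:
    topic: str               # e.g. "murder"
    time: str                # coarse time expression
    location: str
    participants: frozenset  # names of the participants


def distinguishing_factors(a, b):
    """List the properties that tell apart two events of the same topic."""
    if a.topic != b.topic:
        raise ValueError("events must share a topic to be confusable")
    factors = []
    if a.time != b.time:
        factors.append("time")
    if a.location != b.location:
        factors.append("location")
    if a.participants != b.participants:
        factors.append("participants")
    return factors


# Values borrowed from the table's examples and combined arbitrarily.
murder_a = EventInstance("murder", "June", "Zaire", frozenset({"John Doe"}))
murder_b = EventInstance("murder", "October", "Oklahoma",
                         frozenset({"the Roe couple"}))
print(distinguishing_factors(murder_a, murder_b))
```

The fewer properties that differ, the more precisely a system has to interpret the question to pick the right event; the two lexical factors (ambiguity and variance of event forms) would need lexical resources and are left out of this sketch.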
Task creation
Pick a subset of ECB+ topics
Select one or more confusability factors
Increase the number of events for an event topic
Retrieve multiple event mentions for each event
We favor seminal events (e.g. murder) whose surface forms have a high lexical ambiguity and/or variance.
Based on the confusability factors.
We use local news sources to ensure low dominance.