Proceedings of the 2021 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, pages 338–351
June 6–11, 2021. ©2021 Association for Computational Linguistics
Unifying Cross-Lingual Semantic Role Labeling
with Heterogeneous Linguistic Resources
Simone Conia Andrea Bacciu Roberto Navigli
Sapienza NLP Group
Department of Computer Science
Sapienza University of Rome
{first.lastname}@uniroma1.it
Abstract
While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic formalism, from PropBank for English to AnCora for Spanish and PDT-Vallex for Czech, inter alia. In this work, we address this issue and present a unified model to perform cross-lingual SRL over heterogeneous linguistic resources. Our model implicitly learns a high-quality mapping for different formalisms across diverse languages without resorting to word alignment and/or translation techniques. We find that, not only is our cross-lingual system competitive with the current state of the art but that it is also robust to low-data scenarios. Most interestingly, our unified model is able to annotate a sentence in a single forward pass with all the inventories it was trained with, providing a tool for the analysis and comparison of linguistic theories across different languages. We release our code and model at https://github.com/SapienzaNLP/unify-srl.
1 Introduction
Semantic Role Labeling (SRL) – a long-standing open problem in Natural Language Processing (NLP) and a key building block of language understanding (Navigli, 2018) – is often defined as the task of automatically addressing the question “Who did what to whom, when, where, and how?” (Gildea and Jurafsky, 2000; Màrquez et al., 2008). While the need to manually engineer and fine-tune complex feature templates severely limited early work (Zhao et al., 2009), the great success of neural networks in NLP has resulted in impressive progress in SRL, thanks especially to the ability of recurrent networks to better capture relations over sequences (He et al., 2017; Marcheggiani et al., 2017). Owing to the recent wide availability of robust multilingual representations, such as multilingual word embeddings (Grave et al., 2018) and multilingual language models (Devlin et al., 2019; Conneau et al., 2020), researchers have been able to shift their focus to the development of models that work on multiple languages (Cai and Lapata, 2019b; He et al., 2019; Lyu et al., 2019).
A robust multilingual representation is nevertheless just one piece of the puzzle: a key challenge in multilingual SRL is that the task is tightly bound to linguistic formalisms (Màrquez et al., 2008) which may present significant structural differences from language to language (Hajic et al., 2009). In the recent literature, it is standard practice to sidestep this issue by training and evaluating a model on each language separately (Cai and Lapata, 2019b; Chen et al., 2019; Kasai et al., 2019; He et al., 2019; Lyu et al., 2019). Although this strategy allows a model to adapt itself to the characteristics of a given formalism, it is burdened by the non-negligible need for training and maintaining one model instance for each language, resulting in a set of monolingual systems.
Instead of dealing with heterogeneous linguistic theories, another line of research consists in actively studying the effect of using a single formalism across multiple languages through annotation projection or other transfer techniques (Akbik et al., 2015, 2016; Daza and Frank, 2019; Cai and Lapata, 2020; Daza and Frank, 2020). However, such approaches often rely on word aligners and/or automatic translation tools which may introduce a considerable amount of noise, especially in low-resource languages. More importantly, they rely on the strong assumption that the linguistic formalism of choice, which may have been developed with a specific language in mind, is also suitable for other languages.
In this work, we take the best of both worlds
and propose a novel approach to cross-lingual SRL.
Our contributions can be summarized as follows:
• We introduce a unified model to perform cross-lingual SRL with heterogeneous linguistic resources;
• We find that our model is competitive against state-of-the-art systems on all the 6 languages of the CoNLL-2009 benchmark;
• We show that our model is robust to low-resource scenarios, thanks to its ability to generalize across languages;
• We probe our model and demonstrate that it implicitly learns to align heterogeneous linguistic resources;
• We automatically build and release a cross-lingual mapping that aligns linguistic formalisms from diverse languages.
We hope that our unified model will further advance cross-lingual SRL and represent a tool for the analysis and comparison of linguistic theories across multiple languages.
2 Related Work
End-to-end SRL. The SRL pipeline is usually divided into four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. While early research focused its efforts on addressing each step individually (Xue and Palmer, 2004; Björkelund et al., 2009; Zhao et al., 2009), recent work has successfully demonstrated that tackling some of these subtasks jointly with multitask learning (Caruana, 1997) is beneficial. In particular, He et al. (2018) and, subsequently, Cai et al. (2018), Li et al. (2019) and Conia et al. (2020), indicate that predicate sense signals aid the identification of predicate-argument relations. Therefore, we follow this line and propose an end-to-end system for cross-lingual SRL.
Multilingual SRL. Current work in multilingual SRL revolves mainly around the development of novel neural architectures, which fall into two broad categories, syntax-aware and syntax-agnostic ones. On the one hand, the quality and diversity of the information encoded by syntax is an enticing prospect that has resulted in a wide range of contributions: Marcheggiani and Titov (2017) made use of Graph Convolutional Networks (GCNs) to better capture relations between neighboring nodes in syntactic dependency trees; Strubell et al. (2018) demonstrated the effectiveness of linguistically-informed self-attention layers in SRL; Cai and Lapata (2019b) observed that syntactic dependencies often mirror semantic relations and proposed a model that jointly learns to perform syntactic dependency parsing and SRL; He et al. (2019) devised syntax-based pruning rules that work for multiple languages. On the other hand, the complexity of syntax and the noisy performance of automatic syntactic parsers have deterred other researchers who, instead, have found methods to improve SRL without syntax: Cai et al. (2018) took advantage of an attentive biaffine layer (Dozat and Manning, 2017) to better model predicate-argument relations; Chen et al. (2019) and Lyu et al. (2019) obtained remarkable results in multiple languages by capturing predicate-argument interactions via capsule networks and iteratively refining the sequence of output labels, respectively; Cai and Lapata (2019a) proposed a semi-supervised approach that scales across different languages.
While we follow the latter trend and develop a syntax-agnostic model, we underline that both the aforementioned syntax-aware and syntax-agnostic approaches suffer from a significant drawback: they require training one model instance for each language of interest. Their two main limitations are, therefore, that i) the number of trainable parameters increases linearly with the number of languages, and ii) the information available in one language cannot be exploited to make SRL more robust in other languages. In contrast, one of the main objectives of our work is to develop a unified cross-lingual model which can mitigate the paucity of training data in some languages by exploiting the information available in other, resource-richer languages.
Cross-lingual SRL. A key challenge in performing cross-lingual SRL with a single unified model is the dissimilarity of predicate sense and semantic role inventories between languages. For example, the multilingual dataset distributed as part of the CoNLL-2009 shared task (Hajic et al., 2009) adopts the English Proposition Bank (Palmer et al., 2005) and NomBank (Meyers et al., 2004) to annotate English sentences, the Chinese Proposition Bank (Xue and Palmer, 2009) for Chinese, the AnCora (Taulé et al., 2008) predicate-argument structure inventory for Catalan and Spanish, the German Proposition Bank which, differently from the other PropBanks, is derived from FrameNet (Hajic et al., 2009), and PDT-Vallex (Hajic et al., 2003) for Czech. Many of these inventories are not aligned with each other as they follow and implement different linguistic theories which, in turn, may pose different challenges.
Padó and Lapata (2009), and Akbik et al. (2015, 2016) worked around these issues by making the English PropBank act as a universal predicate sense and semantic role inventory and projecting PropBank-style annotations from English onto non-English sentences by means of word alignment techniques applied to parallel corpora such as Europarl (Koehn, 2005). These efforts resulted in the creation of the Universal PropBank, a multilingual collection of semi-automatically annotated corpora for SRL, which is actively in use today to train and evaluate novel cross-lingual methods such as word alignment techniques (Aminian et al., 2019). In the absence of parallel corpora, annotation projection techniques can still be applied by automatically translating an annotated corpus and then projecting the original labels onto the newly created silver corpus (Daza and Frank, 2020; Fei et al., 2020), whereas Daza and Frank (2019) have recently found success in training an encoder-decoder architecture to jointly tackle SRL and translation.
While the foregoing studies have greatly advanced the state of cross-lingual SRL, they suffer from an intrinsic downside: using translation and word alignment techniques may result in a considerable amount of noise, which automatically puts an upper bound to the quality of the projected labels. Moreover, they are based on the strong assumption that the English PropBank provides a suitable formalism for non-English languages, and this may not always be the case. Among the numerous studies that adopt the English PropBank as a universal predicate-argument structure inventory for cross-lingual SRL, the work of Mulcaire et al. (2018) stands out for proposing a bilingual model that is able to perform SRL according to two different inventories at the same time, although with significantly lower results compared to the state of the art at the time. With our work, we go beyond current approaches to cross-lingual SRL and embrace the diversity of the various representations made available in different languages. In particular, our model has three key advantages: i) it does not rely on word alignment or machine translation tools; ii) it learns to perform SRL with multiple linguistic inventories; iii) it learns to link resources that would otherwise be disconnected from each other.
3 Model Description
In the wake of recent work in SRL, our model falls
into the broad category of end-to-end systems as
it learns to jointly tackle predicate identification,
predicate sense disambiguation, argument identi-
fication and argument classification. The model
architecture can be roughly divided into the follow-
ing components:
• A universal sentence encoder whose parameters are shared across languages and which produces word encodings that capture predicate-related information (Section 3.2);
• A universal predicate-argument encoder whose parameters are also shared across languages and which models predicate-argument relations (Section 3.3);
• A set of language-specific decoders which indicate whether words are predicates, select the most appropriate sense for each predicate, and assign a semantic role to every predicate-argument couple, according to several different SRL inventories (Section 3.4).
Unlike previous work, our model does not require
any preexisting cross-resource mappings, word
alignment techniques, translation tools, other an-
notation transfer techniques, or parallel data, to
perform high-quality cross-lingual SRL, as it relies
solely on implicit cross-lingual knowledge transfer.
3.1 Input representation
Pretrained language models such as ELMo (Peters et al., 2018), BERT (Devlin et al., 2019) and XLM-RoBERTa (Conneau et al., 2020), inter alia, are becoming the de facto input representation method, thanks to their ability to encode vast amounts of knowledge. Following recent studies (Hewitt and Manning, 2019; Kuznetsov and Gurevych, 2020; Conia and Navigli, 2020), which show that different layers of a language model capture different syntactic and semantic characteristics, our model builds a contextual representation for an input word by concatenating the corresponding hidden states of the four top-most inner layers of a language model. More formally, given a word $w_i$ in a sentence $\mathbf{w} = \langle w_0, w_1, \dots, w_i, \dots, w_{n-1} \rangle$ of $n$ words and its hidden state $\mathbf{h}_i^k = l^k(w_i \mid \mathbf{w})$ from the $k$-th inner layer $l^k$ of a language model with $K$ layers, the model computes the word encoding $\mathbf{e}_i$ as follows:

$$
\mathbf{h}_i = \mathbf{h}_i^{K} \oplus \mathbf{h}_i^{K-1} \oplus \mathbf{h}_i^{K-2} \oplus \mathbf{h}_i^{K-3}
$$

$$
\mathbf{e}_i = \mathrm{Swish}(\mathbf{W}^{w} \mathbf{h}_i + \mathbf{b}^{w})
$$

where $\mathbf{x} \oplus \mathbf{y}$ is the concatenation of the two vectors $\mathbf{x}$ and $\mathbf{y}$, and $\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x)$ is a non-linear activation which was found to produce smoother gradient landscapes than the more traditional ReLU (Ramachandran et al., 2018).
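
The word encoding defined in this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the released implementation: the hidden states, the projection matrix `W_w` and the bias `b_w` are made-up toy values, whereas in practice the hidden states would come from a pretrained language model such as m-BERT or XLM-R.

```python
import math

def swish(x):
    """Swish(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def word_encoding(layer_states, W_w, b_w):
    """Concatenate the four top-most hidden states of one word and project them.

    layer_states: list of K vectors for word w_i, ordered from bottom to top layer.
    W_w, b_w: hypothetical projection weights (one row of W_w per output unit).
    """
    top_four = layer_states[-1:-5:-1]                # h^K, h^{K-1}, h^{K-2}, h^{K-3}
    h_i = [v for state in top_four for v in state]   # concatenation
    return [swish(sum(w * h for w, h in zip(row, h_i)) + b)
            for row, b in zip(W_w, b_w)]

# Toy example: K = 5 layers, hidden size 2, a single output unit.
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
W_w = [[0.5] * 8]   # 4 layers x hidden size 2 = 8 inputs
b_w = [0.0]
e_i = word_encoding(states, W_w, b_w)
```

Note that the bottom layer of the toy model is ignored, since only the four top-most layers contribute to the concatenation.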
3.2 Universal sentence encoder
Expanding on the seminal intuition of Fillmore (1968), who suggests the existence of deep semantic relations between a predicate and other sentential constituents, we argue that such semantic relations may be preserved across languages. With this reasoning in mind, we devise a universal sentence encoder whose parameters are shared across languages. Intuitively, the aim of our universal sentence encoder is to capture sentence-level information that is not formalism-specific and spans across languages, such as information about predicate positions and predicate senses. In our case, we implement this universal sentence encoder as a stack of BiLSTM layers (Hochreiter and Schmidhuber, 1997), similarly to Marcheggiani et al. (2017), Cai et al. (2018) and He et al. (2019), with the difference that we concatenate the output of each layer to its input in order to mitigate the problem of vanishing gradients. More formally, given a sequence of word encodings $\mathbf{e} = \langle \mathbf{e}_0, \mathbf{e}_1, \dots, \mathbf{e}_{n-1} \rangle$, the model computes a sequence of timestep encodings $\mathbf{t}$ as follows:

$$
\mathbf{t}_i^{j} =
\begin{cases}
\mathbf{e}_i & \text{if } j = 0 \\
\mathbf{t}_i^{j-1} \oplus \mathrm{BiLSTM}_i^{j}(\mathbf{t}^{j-1}) & \text{otherwise}
\end{cases}
$$

$$
\mathbf{t} = \langle \mathbf{t}_0^{K'}, \mathbf{t}_1^{K'}, \dots, \mathbf{t}_{n-1}^{K'} \rangle
$$

where $\mathrm{BiLSTM}_i^{j}(\cdot)$ is the $i$-th timestep of the $j$-th BiLSTM layer and $K'$ is the total number of layers in the stack. Starting from each timestep encoding $\mathbf{t}_i$, the model produces a predicate representation $\mathbf{p}_i$, which captures whether the corresponding word $w_i$ is a predicate, and a sense representation $\mathbf{s}_i$ which encodes information about the sense of a predicate at position $i$:

$$
\mathbf{p}_i = \mathrm{Swish}(\mathbf{W}^{p} \mathbf{t}_i + \mathbf{b}^{p})
$$

$$
\mathbf{s}_i = \mathrm{Swish}(\mathbf{W}^{s} \mathbf{t}_i + \mathbf{b}^{s})
$$
We stress that the vector representations obtained
for each timestep, each predicate and each sense lie
in three spaces that are shared across the languages
and formalisms used to perform SRL.
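
The recurrence used by the universal sentence encoder, in which the output of each layer is concatenated to its input, can be illustrated as follows. The sketch below is written under a strong simplifying assumption: `toy_bidirectional_layer` is a hypothetical stand-in that only mimics the left-to-right and right-to-left flow of a BiLSTM with running sums; it is not an LSTM and has no trainable parameters. The point is to show how every layer widens the timestep encoding while leaving the original word encoding untouched.

```python
def toy_bidirectional_layer(seq):
    """Stand-in for a BiLSTM layer (NOT an LSTM): for each timestep, return the
    running sum of the first feature read left-to-right and right-to-left."""
    firsts = [vec[0] for vec in seq]
    return [[sum(firsts[: i + 1]), sum(firsts[i:])] for i in range(len(seq))]

def encode(e, layers):
    """t^0 = e; t^j_i = t^{j-1}_i concatenated with layer_j(t^{j-1})_i."""
    t = [list(vec) for vec in e]
    for layer in layers:
        out = layer(t)
        t = [prev + new for prev, new in zip(t, out)]
    return t

e = [[1.0], [2.0], [3.0]]   # toy word encodings, one feature each
t = encode(e, [toy_bidirectional_layer, toy_bidirectional_layer])
```

With an input of size 1 and two layers that each emit 2 features, every timestep encoding has 1 + 2 × 2 = 5 features, and its first feature is still the original word encoding.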
3.3 Universal predicate-argument encoder
In the same vein, and for the same reasoning that motivated the design of the above universal sentence encoder, our model includes a universal predicate-argument encoder whose parameters are also shared across languages. The objective of this second encoder is to capture the relations between each predicate-argument couple that appears in a sentence, independently of the input language. Similarly to the universal sentence encoder, we implement this universal predicate-argument encoder as a stack of BiLSTM layers. More formally, let $w_p$ be a predicate in the input sentence $\mathbf{w} = \langle w_0, w_1, \dots, w_p, \dots, w_{n-1} \rangle$, then the model computes a sequence of predicate-specific argument encodings $\mathbf{a}$ as follows:

$$
\mathbf{a}_i^{j} =
\begin{cases}
\mathbf{t}_p \oplus \mathbf{t}_i & \text{if } j = 0 \\
\mathbf{a}_i^{j-1} \oplus \mathrm{BiLSTM}_i^{j}(\mathbf{a}^{j-1}) & \text{otherwise}
\end{cases}
$$

$$
\mathbf{a} = \langle \mathbf{a}_0^{K''}, \mathbf{a}_1^{K''}, \dots, \mathbf{a}_{n-1}^{K''} \rangle
$$

where $\mathbf{t}_i$ is the $i$-th timestep encoding from the universal sentence encoder and $K''$ is the total number of layers in the stack. Starting from each predicate-specific argument encoding $\mathbf{a}_i$, the model produces a semantic role representation $\mathbf{r}_i$ for word $w_i$:

$$
\mathbf{r}_i = \mathrm{Swish}(\mathbf{W}^{r} \mathbf{a}_i + \mathbf{b}^{r})
$$

Similarly to the predicate and sense representations $\mathbf{p}$ and $\mathbf{s}$, since the predicate-argument encoder is one and the same for all languages, the semantic role representation $\mathbf{r}$ obtained must draw upon cross-lingual information in order to abstract from language-specific peculiarities.
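
The base case of the predicate-argument encoder, in which the encoding of the predicate is concatenated to the encoding of every word, implies that one input sequence is built for each predicate in the sentence. A minimal sketch, assuming toy timestep encodings and hypothetical predicate positions:

```python
def predicate_specific_inputs(t, p):
    """Return a^0 for the predicate at position p: t_p concatenated with each t_i."""
    return [t[p] + t_i for t_i in t]

t = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # toy timestep encodings
predicate_positions = (1, 2)               # hypothetical predicate positions

# One sequence of predicate-specific inputs per predicate in the sentence.
a0_by_predicate = {p: predicate_specific_inputs(t, p) for p in predicate_positions}
```

Each of these sequences would then be passed through the shared BiLSTM stack, so the same word can receive a different role representation for each predicate.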
3.4 Language-specific decoders
The aforementioned predicate encodings $\mathbf{p}$, sense encodings $\mathbf{s}$ and semantic role encodings $\mathbf{r}$ are shared across languages, forcing the model to learn from semantics rather than from surface-level features such as word order, part-of-speech tags and syntactic rules, all of which may differ from language to language. Ultimately, however, we want our model to provide semantic role annotations according to an existing predicate-argument structure inventory, e.g., PropBank, AnCora, or PDT-Vallex. Our model, therefore, includes a set of linear decoders that indicate whether a word $w_i$ is a predicate, what the most appropriate sense for a predicate $w_p$ is, and what the semantic role of a word $w_r$ with respect to a specific predicate $w_p$ is, for each language $l$:

$$
\sigma_p(w_i \mid l) = \mathbf{W}^{p|l} \mathbf{p}_i + \mathbf{b}^{p|l}
$$

$$
\sigma_s(w_p \mid l) = \mathbf{W}^{s|l} \mathbf{s}_i + \mathbf{b}^{s|l}
$$

$$
\sigma_r(w_r \mid w_p, l) = \mathbf{W}^{r|l} \mathbf{r}_i + \mathbf{b}^{r|l}
$$
Although we could have opted for more complex
decoding strategies, in our case linear decoders
have two advantages: 1) they keep the language-
specific part of the model as simple as possible,
pushing the model into learning from its univer-
sal encoders; 2) they can be seen as linear probes,
providing an insight into the quality of the cross-
lingual knowledge that the model can capture.
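
To make the role of the language-specific decoders concrete, the sketch below applies several linear decoders to one and the same shared role representation, which is how a single forward pass can yield labels in every inventory. It is an illustration only: the two-label inventories, the weights and the input vector are hypothetical toy values, and `decode_all` is not a function of the released code.

```python
def linear(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def decode_all(r_i, decoders):
    """Apply every language-specific role decoder to the same shared vector r_i."""
    predictions = {}
    for name, (labels, W, b) in decoders.items():
        scores = linear(W, b, r_i)
        best = max(range(len(scores)), key=scores.__getitem__)
        predictions[name] = labels[best]
    return predictions

# Hypothetical two-label inventories and weights, for illustration only.
decoders = {
    "EN-PropBank": (["A0", "A1"], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "ES-AnCora": (["arg0-agt", "arg1-pat"], [[2.0, -1.0], [-1.0, 2.0]], [0.0, 0.0]),
}
agent_like = decode_all([0.9, 0.1], decoders)
patient_like = decode_all([0.1, 0.9], decoders)
```

Because the decoders are purely linear, any agreement between inventories on the same input has to come from the shared representation rather than from the decoders themselves.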
3.5 Training objective
The model is trained to jointly minimize the sum of the categorical cross-entropy losses on predicate identification, predicate sense disambiguation and argument identification/classification over all the languages in a multitask learning fashion. More formally, given a language $l$ and the corresponding predicate identification loss $\mathcal{L}_{p|l}$, predicate sense disambiguation loss $\mathcal{L}_{s|l}$ and argument identification/classification loss $\mathcal{L}_{r|l}$, the cumulative loss $\mathcal{L}$ is:

$$
\mathcal{L} = \sum_{l \in L} \left( \mathcal{L}_{p|l} + \mathcal{L}_{s|l} + \mathcal{L}_{r|l} \right)
$$

where $L$ is the set of languages – and the corresponding formalisms – in the training set.
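
The cumulative loss can be sketched as a plain sum of cross-entropy terms, one per task and per language. The example below assumes a single toy prediction per task, with uniform scores over two classes; the dictionary layout and the function names are illustrative choices, not those of the released code.

```python
import math

def cross_entropy(scores, gold):
    """Categorical cross-entropy of unnormalized scores against a gold index."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold]

def cumulative_loss(batches):
    """Sum L_{p|l} + L_{s|l} + L_{r|l} over all languages l.

    batches: {language: {"p": (scores, gold), "s": (...), "r": (...)}}
    """
    return sum(cross_entropy(*task)
               for tasks in batches.values()
               for task in tasks.values())

# Toy batch: two languages, uniform scores over two classes for every task.
batches = {
    "en": {"p": ([0.0, 0.0], 1), "s": ([0.0, 0.0], 0), "r": ([0.0, 0.0], 1)},
    "cs": {"p": ([0.0, 0.0], 0), "s": ([0.0, 0.0], 1), "r": ([0.0, 0.0], 0)},
}
loss = cumulative_loss(batches)
```

With uniform scores every term equals log 2, so the total over two languages and three tasks is 6 log 2.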
4 Experiments
We evaluate our model in dependency-based mul-
tilingual SRL. The remainder of this Section de-
scribes the experimental setup (Section 4.1), pro-
vides a brief overview of the multilingual dataset
we use for training, validation and testing (Sec-
tion 4.2), and shows the results obtained on each
language (Section 4.3).
4.1 Experimental Setup
We implemented the model in PyTorch¹ and PyTorch Lightning², and used the pretrained language models for multilingual BERT (m-BERT) and XLM-RoBERTa (XLM-R) made available by the Transformers library (Wolf et al., 2020). We trained each model configuration for 30 epochs using Adam (Kingma and Ba, 2015) with a “slanted triangle” learning rate scheduling strategy which linearly increases the learning rate for 1 epoch and then linearly decreases the value for 15 epochs. We did not perform hyperparameter tuning and opted instead for standard values used in the literature; we provide more details about our model configuration and its hyperparameter values in Appendix A. In the remainder of this Section, we report the F1 scores of the best models selected according to the highest F1 score obtained on the validation set at the end of a training epoch.³

¹ https://pytorch.org
² https://www.pytorchlightning.ai
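
The “slanted triangle” schedule described above can be written as a small function of the training step. The sketch below follows the 1-epoch warm-up and 15-epoch decay stated in the text; the peak and minimum learning rates, and the choice of keeping the rate constant once the decay ends, are assumptions made for illustration (the actual hyperparameter values are those of Appendix A).

```python
def slanted_triangle_lr(step, steps_per_epoch, peak_lr=1e-3, min_lr=1e-5,
                        warmup_epochs=1, cooldown_epochs=15):
    """Linear warm-up to peak_lr, then linear decay back to min_lr.

    peak_lr and min_lr are assumed values, not taken from the paper.
    """
    warmup = warmup_epochs * steps_per_epoch
    cooldown = cooldown_epochs * steps_per_epoch
    if step < warmup:
        return min_lr + (peak_lr - min_lr) * step / warmup
    if step < warmup + cooldown:
        return peak_lr - (peak_lr - min_lr) * (step - warmup) / cooldown
    return min_lr   # assumption: hold the minimum rate for the remaining epochs

# Learning rate at the start of each of the 30 epochs, with 100 steps per epoch.
schedule = [slanted_triangle_lr(epoch * 100, 100) for epoch in range(30)]
```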
4.2 Dataset
To the best of our knowledge, the dataset provided as part of the CoNLL-2009 shared task (Hajic et al., 2009) is the largest and most diverse collection of human-annotated sentences for multilingual SRL. It comprises 6 languages⁴, namely, Catalan, Chinese, Czech, English, German and Spanish, which belong to different linguistic families and feature significantly varying amounts of training samples, from 400K predicate instances in Czech to only 17K in German; we provide an overview of the statistics of each language in Appendix B. CoNLL-2009 is the ideal testbed for evaluating the ability of our unified model to generalize across heterogeneous resources since each language adopts its own linguistic formalism, from English PropBank to PDT-Vallex, from Chinese PropBank to AnCora. We also include VerbAtlas (Di Fabio et al., 2019), a recently released resource for SRL⁵, with the aim of understanding whether our model can learn to align inventories that are based on “distant” linguistic theories; indeed, VerbAtlas is based on clustering WordNet synsets into frames that share similar semantic behavior, whereas PropBank-based resources enumerate and define the possible senses of a lexeme.
As a final note, we did not evaluate our model on Universal PropBank⁶ since i) it was semi-automatically generated through annotation projection techniques, and ii) it uses the English PropBank for all languages, which goes against our interest in capturing cross-lingual knowledge over heterogeneous inventories.

³ Hereafter, all the results of our experiments are computed by the official scorer of the CoNLL-2009 shared task, available at https://ufal.mff.cuni.cz/conll2009-st/scorer.html.
⁴ The CoNLL-2009 shared task originally included a seventh language, Japanese, which is not available anymore on LDC due to licensing issues.
⁵ We build a training set for VerbAtlas using the mapping from PropBank available at http://verbatlas.org.
⁶ https://github.com/System-T/UniversalPropositions

CoNLL-2009 - Multilingual - In-domain       CA    CZ    DE    EN    ES    ZH
CoNLL-2009 ST best                          80.3  85.4  79.7  85.6  80.5  78.6
Marcheggiani et al. (2017)                  -     86.0  -     87.7  80.3  81.2
Chen et al. (2019)                          81.7  88.1  76.4  91.1  81.3  81.7
Cai and Lapata (2019b)                      -     -     82.7  90.0  81.8  83.6
Cai and Lapata (2019a)                      -     -     83.8  91.2  82.9  85.0
Lyu et al. (2019)                           80.9  87.5  75.8  90.1  80.5  83.3
He et al. (2019)                            86.0  89.7  81.1  90.9  85.2  86.9
This work, m-BERT frozen / monolingual      86.2  90.0  85.2  90.5  85.0  86.4
This work, m-BERT / monolingual             86.8  90.3  85.8  90.7  85.3  86.9
This work, m-BERT / cross-lingual           87.1  90.8  86.5  91.0  85.6  87.3
This work, XLM-R frozen / monolingual       86.8  90.4  86.5  90.8  85.2  86.9
This work, XLM-R / monolingual              87.8  91.6  87.6  91.6  86.0  87.5
This work, XLM-R / cross-lingual            88.0  91.5  88.0  91.8  86.3  87.7

Table 1: F1 scores on the in-domain evaluation of CoNLL-2009 with gold pre-identified predicates. “CoNLL-2009 ST best” refers to the best results obtained (by different systems) during the Shared Task. We include all the systems, both syntax-aware and syntax-agnostic, that reported results in at least 4 languages.

CoNLL-2009 - OOD                CZ    DE    EN
CoNLL-2009 ST best              85.4  65.9  73.3
Zhao et al. (2009)              82.7  67.8  74.6
Marcheggiani et al. (2017)      87.2  -     77.7
Li et al. (2019)                -     -     81.5
Chen et al. (2019)              -     -     82.7
Lyu et al. (2019)               86.0  65.7  82.2
This work, m-BERT / mono        90.4  72.6  84.6
This work, m-BERT / cross       91.0  73.0  85.0
This work, XLM-R / mono         90.8  73.9  83.7
This work, XLM-R / cross        91.1  74.2  84.3

Table 2: F1 scores on the out-of-domain evaluation of CoNLL-2009 with gold pre-identified predicates.
4.3 Results
Cross-lingual SRL. Table 1 compares the results obtained by our unified cross-lingual model against the state of the art in multilingual SRL, including both syntax-agnostic and syntax-aware architectures, on the in-domain test sets of CoNLL-2009 when using gold pre-identified predicates, rather than the predicates identified by the model itself, as standard in the CoNLL-2009 shared task. While proposing a state-of-the-art architecture is not the focus of this work, we believed it was important to build our cross-lingual approach starting from a strong and consistent baseline. For this reason, Table 1 includes the results obtained when training a separate instance of our model for each language, using the same strategy adopted by current multilingual systems (Cai and Lapata, 2019a; He et al., 2019; Lyu et al., 2019) and showing results that are competitive with He et al. (2019), inter alia. Remarkably, thanks to its universal encoders shared across languages and formalisms, our unified cross-lingual model outperforms our state-of-the-art baseline in all the 6 languages at a fraction of the cost in terms of number of trainable parameters (a single cross-lingual model against six monolingual models, each trained on a different language). Similar results can be seen in Table 2 where our cross-lingual approach improves over the state of the art on the out-of-domain evaluation of CoNLL-2009, especially in the German and English test sets which were purposely built to include predicates that do not appear in the training set. These results confirm empirically our initial hunch that semantic role labeling relations are deeply rooted beyond languages, independently of their surface realization and their predicate-argument structure inventories.
Finally, for completeness, Appendix E includes the results of our system on the individual subtasks, namely, predicate identification and predicate sense disambiguation.

CoNLL-2009 - In-domain                                          CA    CZ    DE    EN     ES    ZH
This work, XLM-R / monolingual / 10% training                   52.7  79.9  60.2  81.7   49.2  72.9
This work, XLM-R / cross-lingual / 10% training                 78.2  84.0  69.9  84.3   76.1  78.6
This work, XLM-R / monolingual / 1-shot learning                44.5  21.8  40.9  67.4   46.5  72.1
This work, XLM-R / cross-lingual / 1-shot learning              63.2  28.9  50.1  70.2   62.6  73.6
This work, XLM-R / cross-lingual / 1-shot learning / 100% EN    66.4  29.6  55.5  91.6*  64.3  76.7

Table 3: F1 scores on the in-domain evaluation of CoNLL-2009 with gold pre-identified predicates for low-resource (top) and one-shot learning (bottom) scenarios. *: the result in EN on the last line is not directly comparable with those above as we use the full English training set.
Low-resource cross-lingual SRL. We evaluate the robustness of our model in low-resource cross-lingual SRL by artificially reducing the training set of each language to 10% of its original size. Table 3 (top) reports the results obtained by our model when trained separately on the reduced training set of each language (monolingual), and the results obtained by the same model when trained on the union of the reduced training sets (cross-lingual). The improvements of our cross-lingual approach compared to the more traditional monolingual baseline are evident, especially in lower-resource scenarios, with absolute improvements in F1 score of 25.5%, 9.7% and 26.9% on the Catalan, German and Spanish test sets, respectively. This is thanks to the ability of the model to use the knowledge from a language to improve its performance on other languages.
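
As a quick check, the absolute improvements quoted above can be recomputed from the two 10% training rows of Table 3; the snippet simply subtracts the monolingual F1 score from the cross-lingual one for each language.

```python
# F1 scores from Table 3 (top): XLM-R trained on 10% of each training set.
mono = {"CA": 52.7, "CZ": 79.9, "DE": 60.2, "EN": 81.7, "ES": 49.2, "ZH": 72.9}
cross = {"CA": 78.2, "CZ": 84.0, "DE": 69.9, "EN": 84.3, "ES": 76.1, "ZH": 78.6}

# Absolute gain of the cross-lingual model over the monolingual one.
gains = {lang: round(cross[lang] - mono[lang], 1) for lang in mono}
largest = sorted(gains, key=gains.get, reverse=True)[:3]
```

The three largest gains fall on Spanish, Catalan and German, while the gains for Czech, English and Chinese are smaller.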
One-shot cross-lingual SRL. An interesting open question in SRL is whether a system can learn to model the semantic relations between a predicate sense $s$ and its arguments, given a limited number of training samples in which $s$ appears. In particular in our case, we are interested in understanding how the model fares in a synthetic scenario where each sense appears at most once in the training set, that is, we evaluate our model in a one-shot learning setting. As we can see from Table 3 (bottom), our cross-lingual approach outperforms its monolingual counterpart trained on each synthetic dataset separately by a wide margin, once again providing strong absolute improvements – 18.7% in Catalan, 9.2% in German and 16.1% in Spanish in terms of F1 score – for languages where the number of training instances is smaller.
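
A synthetic one-shot training set of this kind could be built with a simple filter that keeps at most one instance per predicate sense. The sketch below is an assumption about how such a set might be constructed: the text does not specify which instance of each sense is retained, so keeping the first occurrence is an arbitrary choice, and the sense labels are made-up examples.

```python
def one_shot_subset(instances):
    """Keep only the first training instance of every predicate sense.

    instances: iterable of (sentence_id, predicate_sense) pairs.
    Keeping the first occurrence is an assumption made for illustration.
    """
    seen = set()
    kept = []
    for sentence_id, sense in instances:
        if sense not in seen:
            seen.add(sense)
            kept.append((sentence_id, sense))
    return kept

# Hypothetical instances: the sense "throw.01" appears twice.
training = [(0, "throw.01"), (1, "throw.01"), (2, "refuse.01")]
subset = one_shot_subset(training)
```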
It is not uncommon for supervised cross-lingual
tasks to feature different amounts of data for each
language, depending on how difficult it is to get
manual annotations for each language of interest.
We simulate this setting in SRL by training our
model on 100% of the training data available for
the English language, while keeping the one-shot
learning setting for all the other languages. As
Table 3 (bottom) shows, non-English languages
exhibit further improvements as the number of
English training samples increases, lending fur-
ther credibility to the idea that SRL can be learnt
across languages even when using heterogeneous
resources. Not only do these results suggest that
a cross-lingual/cross-resource approach might mit-
igate the need for a large training set in each lan-
guage, but also that reasonable cross-lingual re-
sults may be obtained by maintaining a single large
dataset for a high-resource language, together with
several small datasets for low-resource languages.
5 Analysis and Discussion
Cross-formalism SRL.
In contrast to existing
multilingual systems, a key benefit of our unified
cross-lingual model is its ability to provide annota-
tions for predicate senses and semantic roles in any
linguistic formalism. As we can see from Figure 1
(left), given the English sentence “the cat threw its
ball out of the window”, our language-specific de-
coders produce predicate sense and semantic role
labels not only according to the English PropBank
inventory, but also for all the other resources, as it
correctly identifies the agentive and patientive con-
stituents independently of the formalism of inter-
est. And this is not all, our model may potentially
work on any of the 100 languages supported by
the underlying language model (m-BERT or XLM-
RoBERTa), e.g., in Italian, as shown in Figure 1
(right). This is vital for those languages for which
a predicate-argument structure inventory has not
yet been developed – an endeavor that may take
345
Figure 1: Thanks to its universal encoders, our unified cross-lingual model is able to provide predicate sense and
semantic role labels according to several linguistic formalisms. Left: SRL labels for an English input sentence.
Right: SRL labels for an Italian input sentence, which can be translated into English as “The president refuses the
help of the opponents”. Notice that Italian is not among the languages in the training set.
[Figure 2 graphic omitted. Legend: EN - English PropBank; EN - VerbAtlas; ES - AnCora; CA - AnCora; DE - German PropBank; ZH - Chinese PropBank. Chinese node labels that survive extraction: 开始, 批评, 攻击, 报道, 提醒, 警告, 指责.]
Figure 2: Visualization of the cross-resource mapping learnt by our model. Left: Mapping from Chinese PropBank,
German PropBank and AnCora (both Catalan and Spanish) to English PropBank. Right: Mapping from English
PropBank, German PropBank, Chinese PropBank and AnCora (both Spanish and Catalan) to VerbAtlas.
years to come to fruition – and, therefore, manually
annotated data are unavailable. Thus, as long as a
large amount of pretraining data is openly accessi-
ble, our system provides a robust cross-lingual tool
to compare and analyze different linguistic theories
and formalisms across a wide range of languages,
on the one hand, and to overcome the issue of per-
forming SRL on languages where no inventory is
available, on the other.
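The idea of encoding a sentence once and then decoding it under every inventory can be illustrated with a toy sketch. Everything below is hypothetical and is not the released implementation: the hash-derived "weights" are untrained stand-ins, the label sets are tiny illustrative subsets, and, for brevity, the toy labels tokens directly rather than predicate-argument pairs. It only shows the control flow: one shared representation, one decoder per inventory.

```python
import hashlib

DIM = 8  # size of the toy shared representation (hypothetical)


def pseudo_vector(text, dim=DIM):
    """Deterministic stand-in for learned parameters: `dim` values in [0, 1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 256.0 for i in range(dim)]


def universal_encoder(tokens):
    """Shared encoder stand-in: one vector per token, whatever the inventory."""
    return [pseudo_vector(token.lower()) for token in tokens]


class InventoryDecoder:
    """Decoder stand-in: a linear scorer over the label set of one inventory."""

    def __init__(self, inventory, labels):
        self.labels = list(labels)
        # One (untrained, hash-derived) weight vector per label.
        self.weights = [pseudo_vector(f"{inventory}/{label}") for label in self.labels]

    def decode(self, vectors):
        predictions = []
        for vector in vectors:
            scores = [sum(w * x for w, x in zip(row, vector)) for row in self.weights]
            predictions.append(self.labels[scores.index(max(scores))])
        return predictions


def annotate(tokens, decoders):
    """Encode once, then let every inventory-specific decoder label the sentence."""
    shared = universal_encoder(tokens)
    return {name: decoder.decode(shared) for name, decoder in decoders.items()}


# Illustrative label subsets only; the real inventories are far larger.
LABEL_SETS = {
    "English PropBank": ["_", "A0", "A1"],
    "VerbAtlas": ["_", "Agent", "Theme"],
    "AnCora": ["_", "arg0-agt", "arg1-pat"],
}
decoders = {name: InventoryDecoder(name, labels) for name, labels in LABEL_SETS.items()}

tokens = "the cat threw its ball out of the window".split()
annotations = annotate(tokens, decoders)
for inventory, labels in annotations.items():
    print(inventory, labels)
```

Because the weights are untrained, the printed labels are arbitrary; the point is that adding an inventory only adds a decoder, while the encoder is run a single time per sentence.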
Aligning heterogeneous resources. As briefly
mentioned previously, the universal encoders in
the model architecture force our system to learn
cross-lingual features that are important across
different formalisms. A crucial consequence of
this approach is that the model learns to implic-
itly align the resources it is trained on, without
the aid of word aligners and translation tools, even
when these resources may be designed around spe-
cific languages and, therefore, present significant
differences. In order to bring to light what our
model implicitly learns to align in its shared cross-
lingual space (see Sections 3.2 and 3.3), we exploit
its language-specific decoders to build a mapping
from any source inventory, e.g., AnCora, to a target
inventory, e.g., the English PropBank. In particular,
we use our cross-lingual model to label a training
set originally tagged with a source inventory to
produce silver annotations according to a target
inventory, similarly to what is shown in Figure 1.
While producing the silver annotations, we keep
track of the number of times each predicate sense
in the source inventory is associated by the model
with a predicate sense of the target inventory. As
a result, we produce a weighted directed graph in
which the nodes are predicate senses and an edge
(a, b) with weight w indicates that our model maps
the source predicate sense a to the target predicate
sense b at least w times. A portion of this graph
is displayed in Figure 2 where, for visualization
purposes, we show the most frequent alignments
for each language, i.e., the top-3 edges with largest
weight from the nodes of each inventory to the
nodes of the English PropBank (Figure 2, left) and
to the nodes of VerbAtlas (Figure 2, right).7
For example, Figure 2 (left) shows that our
model learns to map the Spanish AnCora sense
empezar.c1 and the German PropBank sense starten.2
to the English PropBank sense start.01, but also
that, depending on the context, the Chinese
PropBank sense 开始.01 can correspond to both start.01
and begin.01. Figure 2 (right) also shows that
our model learns to map senses from different lan-
guages and formalisms to the coarse-grained senses
of VerbAtlas, even though the latter formalism is
quite distant from the others as its frames are based
on clustering WordNet synsets – sets of synony-
mous words – that share similar semantic behavior,
rather than enumerating and defining all the possi-
ble senses of a lexeme as in the English and Chi-
nese PropBanks. To the best of our knowledge, our
unified model is the first transfer-based tool to auto-
matically align diverse linguistic resources across
languages without relying on human supervision.
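The counting procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released code: the function names are hypothetical, the input is assumed to be a list of (source sense, predicted target sense) pairs collected while producing the silver annotations, the counts in the example are invented, and "top-k" is read here as the k heaviest edges of the graph passed in.

```python
from collections import Counter, defaultdict


def build_alignment_graph(pairs):
    """Count how often each source sense is mapped to each target sense.

    `pairs` holds one (source_sense, predicted_target_sense) tuple per
    predicate occurrence. Returns {source_sense: Counter({target_sense: weight})}.
    """
    graph = defaultdict(Counter)
    for source, target in pairs:
        graph[source][target] += 1
    return dict(graph)


def top_k_edges(graph, k=3):
    """Return the k heaviest edges as (source, target, weight) tuples."""
    edges = [
        (source, target, weight)
        for source, counts in graph.items()
        for target, weight in counts.items()
    ]
    # Sort by decreasing weight; break ties alphabetically for determinism.
    edges.sort(key=lambda edge: (-edge[2], edge[0], edge[1]))
    return edges[:k]


# Toy silver annotations with invented counts.
silver_pairs = [
    ("开始.01", "start.01"),
    ("开始.01", "begin.01"),
    ("开始.01", "start.01"),
    ("empezar.c1", "start.01"),
]
graph = build_alignment_graph(silver_pairs)
print(top_k_edges(graph, k=2))
```

Keeping the full counters rather than only the best target per sense preserves context-dependent alignments, such as one source sense mapping to two target senses.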
6 Conclusion and Future Work
On the one hand, recent research in multilingual SRL
has focused mainly on proposing novel model ar-
chitectures that achieve state-of-the-art results, but
require a model instance to be trained on and for
each language of interest. On the other hand, the
latest developments in cross-lingual SRL have re-
volved around using the English PropBank inven-
tory as a universal resource for other languages
through annotation transfer techniques. Following
our hunch that semantic relations may be deeply
rooted beyond the surface realizations that distin-
guish one language from another, we propose a
new approach to cross-lingual SRL and present a
model which learns from heterogeneous linguistic
resources in order to obtain a deeper understanding
of sentence-level semantics. To achieve this objective,
we equip our model architecture with “universal”
encoders which share their weights across
languages and are, therefore, forced to learn knowledge
that spans across varying formalisms.
7 We release the full alignment and the corresponding
graph at https://github.com/SapienzaNLP/unify-srl.
Our unified cross-lingual model, evaluated on
the gold multilingual benchmark of CoNLL-2009,
outperforms previous state-of-the-art multilingual
systems over 6 diverse languages, ranging from
Catalan to Czech, from German to Chinese, and,
at the same time, also considerably reduces the
number of trainable parameters required to support
different linguistic formalisms. And this is not all.
We find that our approach is robust to low-resource
scenarios where the model is able to exploit the
complementary knowledge contained in the train-
ing set of different languages.
Most importantly, our model is able to provide
predicate sense and semantic role labels according
to 7 predicate-argument structure inventories in a
single forward pass, facilitating comparisons be-
tween different linguistic formalisms and investiga-
tions about interlingual phenomena. Our analysis
shows that, thanks to the prior knowledge encoded
in recent pretrained language models and our focus
on learning from cross-lingual features, our model
can be used on languages that were never seen at
training time, opening the door to alignment-free
cross-lingual SRL on languages where a predicate-
argument structure inventory is not yet available.
Finally, we show that our model implicitly learns
to align heterogeneous resources, providing useful
insights into inter-resource relations. We leave an
in-depth qualitative and quantitative analysis of the
learnt inter-resource mappings for future work.
We hope that our work can serve as a stepping
stone for future developments towards the unification
of heterogeneous SRL. We release the
code to reproduce our experiments and the checkpoints
of our best models at
https://github.com/SapienzaNLP/unify-srl.
Acknowledgments
The authors gratefully acknowledge the support of the
ERC Consolidator Grant MOUSSE No. 726487 under the
European Union’s Horizon 2020 research and innovation
programme.
This work was supported in part by the MIUR
under grant “Dipartimenti di eccellenza 2018-
2022” of the Department of Computer Science of
Sapienza University.
References
Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yun-
yao Li, Shivakumar Vaithyanathan, and Huaiyu Zhu.
2015. Generating high quality proposition banks for
multilingual Semantic Role Labeling. In Proceed-
ings of ACL.
Alan Akbik, Vishwajeet Kumar, and Yunyao Li. 2016.
Towards semi-automatic generation of proposition
banks for low-resource languages. In Proceedings
of EMNLP.
Maryam Aminian, Mohammad Sadegh Rasooli, and
Mona Diab. 2019. Cross-lingual transfer of seman-
tic roles: From raw text to semantic roles. In Pro-
ceedings of IWCS.
Anders Björkelund, Love Hafdell, and Pierre Nugues.
2009. Multilingual Semantic Role Labeling. In Pro-
ceedings of CoNLL.
Jiaxun Cai, Shexia He, Zuchao Li, and Hai Zhao. 2018.
A full end-to-end semantic role labeler, syntactic-
agnostic over syntactic-aware? In Proceedings of
COLING.
Rui Cai and Mirella Lapata. 2019a. Semi-supervised
Semantic Role Labeling with cross-view training. In
Proceedings of EMNLP.
Rui Cai and Mirella Lapata. 2019b. Syntax-aware
Semantic Role Labeling without parsing. Transactions
of ACL (TACL), 7:343–356.
Rui Cai and Mirella Lapata. 2020. Alignment-free
cross-lingual Semantic Role Labeling. In Proceed-
ings of EMNLP.
Rich Caruana. 1997. Multitask learning. Machine
Learning, 28(1):41–75.
Xinchi Chen, Chunchuan Lyu, and Ivan Titov. 2019.
Capturing argument interaction in Semantic Role La-
beling with capsule networks. In Proceedings of
EMNLP.
Simone Conia, Fabrizio Brignone, Davide Zanfardino,
and Roberto Navigli. 2020. InVeRo: Making Se-
mantic Role Labeling accessible with intelligible
verbs and roles. In Proceedings of EMNLP.
Simone Conia and Roberto Navigli. 2020. Bridg-
ing the gap in multilingual semantic role labeling:
a language-agnostic approach. In Proceedings of
COLING.
Alexis Conneau, Kartikay Khandelwal, Naman Goyal,
Vishrav Chaudhary, Guillaume Wenzek, Francisco
Guzmán, Edouard Grave, Myle Ott, Luke Zettle-
moyer, and Veselin Stoyanov. 2020. Unsupervised
cross-lingual representation learning at scale. In
Proceedings of ACL.
Angel Daza and Anette Frank. 2019. Translate and la-
bel! an encoder-decoder approach for cross-lingual
semantic role labeling. In Proceedings of EMNLP,
pages 603–615.
Angel Daza and Anette Frank. 2020. X-SRL: A paral-
lel cross-lingual Semantic Role Labeling dataset. In
Proceedings of EMNLP.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Kristina Toutanova. 2019. BERT: Pre-training of
deep bidirectional Transformers for language under-
standing. In Proceedings of NAACL.
Andrea Di Fabio, Simone Conia, and Roberto Navigli.
2019. VerbAtlas: a novel large-scale verbal seman-
tic resource and its application to Semantic Role La-
beling. In Proceedings of EMNLP.
Timothy Dozat and Christopher D. Manning. 2017.
Deep biaffine attention for neural dependency pars-
ing. In Proceedings of ICLR.
Hao Fei, Meishan Zhang, and Donghong Ji. 2020.
Cross-lingual Semantic Role Labeling with high-
quality translated training corpus. In Proceedings
of ACL.
Charles J. Fillmore. 1968. The case for case. Univer-
sals in Linguistic Theory.
Daniel Gildea and Daniel Jurafsky. 2000. Automatic
labeling of semantic roles. In Proceedings of ACL.
Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Ar-
mand Joulin, and Tomas Mikolov. 2018. Learning
word vectors for 157 languages. In Proceedings of
LREC.
Jan Hajic, Massimiliano Ciaramita, Richard Johans-
son, Daisuke Kawahara, Maria Antònia Martí, Lluís
Màrquez, Adam Meyers, Joakim Nivre, Sebastian
Padó, Jan Stepánek, Pavel Stranák, Mihai Surdeanu,
Nianwen Xue, and Yi Zhang. 2009. The CoNLL-
2009 shared task: Syntactic and semantic depen-
dencies in multiple languages. In Proceedings of
CoNLL.
Jan Hajic, J. Panevová, Zdenka Uresová, Alevtina Bé-
mová, V. Kolárová, and P. Pajas. 2003. PDT-Vallex:
Creating a large-coverage valency lexicon for tree-
bank annotation.
Luheng He, Kenton Lee, Omer Levy, and Luke Zettle-
moyer. 2018. Jointly predicting predicates and ar-
guments in neural Semantic Role Labeling. In Pro-
ceedings of ACL.
Luheng He, Kenton Lee, Mike Lewis, and Luke Zettle-
moyer. 2017. Deep Semantic Role Labeling: What
works and what’s next. In Proceedings of ACL.
Shexia He, Zuchao Li, and Hai Zhao. 2019. Syntax-
aware multilingual Semantic Role Labeling. In Pro-
ceedings of EMNLP.
John Hewitt and Christopher D. Manning. 2019. A
structural probe for finding syntax in word represen-
tations. In Proceedings of NAACL.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long
short-term memory. Neural Comput., 9(8).
Jungo Kasai, Dan Friedman, Robert Frank,
Dragomir R. Radev, and Owen Rambow. 2019.
Syntax-aware neural Semantic Role Labeling with
supertags. In Proceedings of NAACL.
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A
method for stochastic optimization. In Proceedings
of ICLR.
Philipp Koehn. 2005. Europarl: A parallel corpus for
statistical machine translation. In MT Summit, vol-
ume 5, pages 79–86.
Ilia Kuznetsov and Iryna Gurevych. 2020. A matter
of framing: The impact of linguistic formalism on
probing results. In Proceedings of EMNLP.
Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhu-
osheng Zhang, Xi Zhou, and Xiang Zhou. 2019. De-
pendency or span, end-to-end uniform semantic role
labeling. In Proceedings of AAAI.
Chunchuan Lyu, Shay B. Cohen, and Ivan Titov. 2019.
Semantic Role Labeling with iterative structure re-
finement. In Proceedings of EMNLP.
Diego Marcheggiani, Anton Frolov, and Ivan Titov.
2017. A simple and accurate syntax-agnostic neural
model for dependency-based Semantic Role Label-
ing. In Proceedings of CoNLL.
Diego Marcheggiani and Ivan Titov. 2017. Encoding
sentences with graph convolutional networks for Se-
mantic Role Labeling. In Proceedings of EMNLP.
Lluís Màrquez, Xavier Carreras, Kenneth C. Litkowski,
and Suzanne Stevenson. 2008. Semantic Role Labeling:
An introduction to the special issue. Computational
Linguistics, 34(2):145–159.
Adam Meyers, Ruth Reeves, Catherine Macleod,
Rachel Szekely, Veronika Zielinska, Brian Young,
and Ralph Grishman. 2004. The NomBank project:
An interim report. In Proceedings of the Workshop
Frontiers in Corpus Annotation.
Phoebe Mulcaire, Swabha Swayamdipta, and Noah A.
Smith. 2018. Polyglot Semantic Role Labeling. In
Proceedings of ACL.
Roberto Navigli. 2018. Natural Language Understand-
ing: Instructions for (present and future) use. In Pro-
ceedings of IJCAI.
Sebastian Padó and Mirella Lapata. 2009. Cross-lingual
annotation projection for semantic roles. J.
Artif. Intell. Res., 36:307–340.
Martha Palmer, Daniel Gildea, and Paul Kingsbury.
2005. The Proposition Bank: An annotated corpus
of semantic roles. Computational Linguistics,
31(1):71–106.
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt
Gardner, Christopher Clark, Kenton Lee, and Luke
Zettlemoyer. 2018. Deep contextualized word repre-
sentations. In Proceedings of NAACL.
Prajit Ramachandran, Barret Zoph, and Quoc V. Le.
2018. Searching for activation functions. In Pro-
ceedings of ICLR.
Emma Strubell, Patrick Verga, Daniel Andor,
David Weiss, and Andrew McCallum. 2018.
Linguistically-informed self-attention for Semantic
Role Labeling. In Proceedings of EMNLP.
Mariona Taulé, Maria Antònia Martí, and Marta Recasens.
2008. AnCora: Multilevel annotated corpora
for Catalan and Spanish. In Proceedings of LREC.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien
Chaumond, Clement Delangue, Anthony Moi, Pier-
ric Cistac, Tim Rault, Remi Louf, Morgan Funtow-
icz, Joe Davison, Sam Shleifer, Patrick von Platen,
Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu,
Teven Le Scao, Sylvain Gugger, Mariama Drame,
Quentin Lhoest, and Alexander Rush. 2020. Trans-
formers: State-of-the-art natural language process-
ing. In Proceedings of EMNLP.
Nianwen Xue and Martha Palmer. 2004. Calibrating
features for semantic role labeling. In Proceedings
of EMNLP.
Nianwen Xue and Martha Palmer. 2009. Adding semantic
roles to the Chinese Treebank. Nat. Lang.
Eng., 15(1):143–172.
Hai Zhao, Wenliang Chen, Jun’ichi Kazama, Kiyotaka
Uchimoto, and Kentaro Torisawa. 2009. Multilin-
gual dependency learning: Exploiting rich features
for tagging syntactic and semantic dependencies. In
Proceedings of CoNLL.
A Model Hyperparameters
Table 4 reports the hyperparameter values we
chose for our model configuration and experiments.
Hyperparameter                                     Value
d_w    Size of e_i                                 512
K'     Universal sentence encoder layers           3
d_t    Size of t_i                                 512
K''    Universal pred.-arg. encoder layers         1
d_z    Size of a_i                                 256
d_sp   Size of p_i                                 32
d_ss   Size of s_i                                 512
d_sa   Size of r_i                                 512
       Batch size                                  32
       Batch size when fine-tuning                 128
       Max learning rate                           10^-3
       Min learning rate                           10^-5
       Max lr for LM fine-tuning                   10^-5
       Min lr for LM fine-tuning                   10^-6
       Warmup epochs                               1
       Cooldown epochs                             15
       Training epochs                             30
Table 4: Hyperparameter values for our model architec-
ture. We use the same hyperparameter values for our
monolingual and cross-lingual experiments.
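To make the interplay of the warmup, cooldown and learning-rate entries concrete, the sketch below shows one possible schedule. The table does not specify the shape of the schedule, so the piecewise-linear form (linear warmup, constant plateau, linear cooldown over the final epochs) is purely an assumption, as is the function name; the maximum and minimum learning rates are taken to be 10^-3 and 10^-5.

```python
MAX_LR, MIN_LR = 1e-3, 1e-5
WARMUP_EPOCHS, COOLDOWN_EPOCHS, TRAINING_EPOCHS = 1, 15, 30


def learning_rate(epoch_progress):
    """Learning rate at a (possibly fractional) epoch in [0, TRAINING_EPOCHS].

    Assumed shape: linear warmup from MIN_LR to MAX_LR, a constant plateau,
    then a linear cooldown back to MIN_LR over the last COOLDOWN_EPOCHS.
    """
    if epoch_progress < WARMUP_EPOCHS:
        fraction = epoch_progress / WARMUP_EPOCHS
        return MIN_LR + fraction * (MAX_LR - MIN_LR)
    cooldown_start = TRAINING_EPOCHS - COOLDOWN_EPOCHS
    if epoch_progress <= cooldown_start:
        return MAX_LR
    fraction = min(1.0, (epoch_progress - cooldown_start) / COOLDOWN_EPOCHS)
    return MAX_LR + fraction * (MIN_LR - MAX_LR)


for epoch in (0, 0.5, 1, 15, 22.5, 30):
    print(epoch, learning_rate(epoch))
```

Under this reading, the fine-tuned language model would follow the same shape with its own bounds (10^-5 and 10^-6).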
B Data Statistics
Tables 5, 6 and 7 provide an overview of the training,
development and test sets provided as part of the
CoNLL-2009 shared task, with statistics about
sentences, predicates and arguments.
C Hardware Infrastructure
All the experiments were performed on an x86-64
architecture with 64GB of RAM, an 8-core CPU running
at 3.60GHz, and a single Nvidia RTX 2080Ti
with 11GB of VRAM.
D Training Details
Training was performed using half-precision via
Apex.8 Training times varied considerably depending
on the experiment setting: the shortest experiment
lasted 26 minutes (training m-BERT on 10%
of the Catalan training set), whereas the longest
experiment lasted 46 hours (training XLM-RoBERTa
on the union of all the datasets of all
the languages).
8 https://github.com/NVIDIA/apex
E Other Results
Predicate identification. In Table 8 we report
the results of our model on predicate identification.
Predicate sense disambiguation. In Table 9 we
report the results of our model on predicate sense
disambiguation.
F Alignment Examples
Figure 3 provides two more examples, one in
French (left), the other in Catalan (right). We remark
that the training set of CoNLL-2009 does not
include sentences in French; nevertheless, our
cross-lingual model correctly outputs SRL tags according
to the seven language-specific decoders.
CoNLL-2009   Sentences                       Predicates          Arguments
             Total_s   Annotated  Avg. Len.  Total_p   Senses    Total_a   Roles
CA           13,200    12,873     30.2       37,431    3,554     84,367    38
CZ           38,727    38,578     16.9       414,237   9,135     365,255   60
DE           36,020    14,282     22.2       17,400    1,271     34,276    10
EN           39,279    37,847     25.0       179,014   8,237     393,699   52
ES           14,329    13,835     30.7       43,824    4,534     99,054    43
ZH           22,277    21,071     28.5       102,813   12,587    231,869   36
Table 5: Overview of the CoNLL-2009 training sets. For each dataset we report the number of sentences (Total_s),
the number of sentences with at least one annotated predicate (Annotated), the average number of tokens per
sentence (Avg. Len.), the number of predicates (Total_p) and predicate senses (Senses), and also the number of
arguments (Total_a) and argument roles (Roles).
CoNLL-2009   Sentences                       Predicates          Arguments
             Total_s   Annotated  Avg. Len.  Total_p   Senses    Total_a   Roles
CA           1,724     1,675      31.5       5,105     1,436     11,529    34
CZ           5,228     5,210      16.9       55,517    3,467     49,071    54
DE           2,000     532        19.7       588       255       1,169     9
EN           1,334     1,283      25.7       6,390     1,990     13,865    32
ES           1,655     1,588      31.4       5,076     1,565     11,600    36
ZH           1,762     1,663      29.5       8,103     2,535     18,554    24
Table 6: Overview of the CoNLL-2009 development datasets. For each dataset we report the number of sentences
(Total_s), the number of sentences with at least one annotated predicate (Annotated), the average number of tokens
per sentence (Avg. Len.), the number of predicates (Total_p) and predicate senses (Senses), and also the number of
arguments (Total_a) and argument roles (Roles).
CoNLL-2009   Sentences                       Predicates          Arguments
             Total_s   Annotated  Avg. Len.  Total_p   Senses    Total_a   Roles
CA           1,862     1,802      29.4       5,001     1,425     11,275    32
CZ           4,213     4,196      16.8       44,585    3,018     39,223    55
DE           2,000     506        20.1       550       238       1,073     8
EN           2,000     1,913      25.0       8,987     2,254     19,949    35
ES           1,725     1,663      30.2       5,175     1,623     11,824    33
ZH           2,556     2,400      30.1       12,282    3,458     27,712    26
Table 7: Overview of the CoNLL-2009 testing datasets. For each dataset we report the number of sentences
(Total_s), the number of sentences with at least one annotated predicate (Annotated), the average number of tokens
per sentence (Avg. Len.), the number of predicates (Total_p) and predicate senses (Senses), and also the number of
arguments (Total_a) and argument roles (Roles).
CoNLL-2009 - Predicate identification       CA    CZ    DE    EN    ES    ZH
This work m-BERT frozen / monolingual       97.9  98.6  90.5  93.8  97.8  94.3
This work m-BERT / monolingual              98.3  98.9  91.4  94.3  98.4  95.0
This work m-BERT / cross-lingual            98.3  99.0  91.6  94.4  98.4  95.1
This work XLM-R frozen / monolingual        97.9  98.9  90.5  93.9  98.0  94.7
This work XLM-R / monolingual               98.3  99.2  91.5  94.3  98.4  95.2
This work XLM-R / cross-lingual             98.5  99.3  91.9  94.6  98.6  95.4
Table 8: F1 scores on the predicate identification subtask, which is not part of the CoNLL-2009 shared task setting.
CoNLL-2009 - Predicate disambiguation       CA    CZ    DE    EN    ES    ZH
This work m-BERT frozen / monolingual       90.0  93.2  86.9  96.8  87.3  94.9
This work m-BERT / monolingual              90.3  93.5  87.3  97.2  87.5  95.0
This work m-BERT / cross-lingual            90.3  93.5  87.3  97.2  87.6  95.3
This work XLM-R frozen / monolingual        90.1  93.6  86.8  96.8  87.4  95.2
This work XLM-R / monolingual               90.4  93.7  87.3  97.1  87.6  95.6
This work XLM-R / cross-lingual             90.5  93.9  87.5  97.2  87.8  95.8
Table 9: Accuracy on the predicate sense disambiguation subtask computed by the official CoNLL-2009 scorer
which, by default, takes into account only the sense numbers, e.g., 01 of eat.01.
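The effect of comparing only sense numbers can be illustrated with a short sketch. This is not the official CoNLL-2009 scorer; the function names are hypothetical, and the sketch only mimics the behavior described in the caption, namely that the part after the last dot (e.g., 01 of eat.01) is what gets compared.

```python
def sense_number(sense):
    """Return the part after the last dot, e.g. '01' for 'eat.01'."""
    return sense.rsplit(".", 1)[-1]


def sense_number_accuracy(gold, predicted):
    """Fraction of predicates whose predicted sense number matches the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(sense_number(g) == sense_number(p) for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = ["eat.01", "start.01", "run.02"]
predicted = ["consume.01", "start.02", "run.02"]
accuracy = sense_number_accuracy(gold, predicted)
print(round(accuracy, 3))
```

In this toy example the first prediction counts as correct even though the lemma differs, because only the sense number is compared; this is worth keeping in mind when reading the accuracies in Table 9.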
Figure 3: Output of our cross-lingual system for a French (left) and a Catalan (right) sentence.