Classification-Based Strategies for Combining Multiple 5-W Question
Answering Systems
Sibel Yaman1, Dilek Hakkani-Tur1, Gokhan Tur2, Ralph Grishman3,
Mary Harper4, Kathleen R. McKeown5, Adam Meyers3, Kartavya Sharma5
1International Computer Science Institute, 2SRI International
3Computer Science Department, New York University
4Hopkins HLT Center of Excellence, University of Maryland
5Computer Science Department, Columbia University
Abstract
We describe and analyze inference strategies for combining outputs from multiple question answering systems, each of which was developed independently. Specifically, we address the DARPA-funded GALE information distillation Year 3 task of finding answers to the 5-W questions (who, what, when, where, and why) for each given sentence. The approach we take revolves around determining the best system using discriminative learning. In particular, we train support vector machines with a set of novel features that encode systems' capabilities of returning as many correct answers as possible. We analyze two combination strategies: one combines multiple systems at the granularity of sentences, and the other at the granularity of individual fields. Our experimental results indicate that the proposed features and combination strategies were able to improve the overall performance by 22% to 36% relative to a random selection, 16% to 35% relative to a majority voting scheme, and 15% to 23% relative to the best individual system.
Index Terms: Question answering, Systems for spoken language understanding
1. Introduction
Information distillation aims to analyze and interpret massive
amounts of multilingual speech and text archives and provide
queried information to the user. In general, a distillation sys-
tem first processes the given user query using an information
retrieval (IR) system to find the relevant documents among huge
document collections. Each sentence in these documents is then
processed to determine whether or not it answers the user's query.
The goal in the third year of the DARPA-funded GALE
information distillation task is to predict answers to the 5-W
questions (who, what, when, where, and why) for the top-level
predicate, i.e., the main predicate, for each and every given sen-
tence. More specifically, the answer to who, the actor/agent,
should be the logical subject of the sentence, which is indicated
with a “by clause” in passive sentences. The answer to what
should contain the main predicate plus its logical object. The
answer to when, the temporal argument, should include specific
times (‘3 days ago’), non-exact times (‘prior to his term’), and
adverbs of frequency/duration (e.g. ‘for a year’, ‘sometimes’,
‘often’). The answer to where, the locative argument, refers to
a physical location (’behind the building’) or a metaphorical lo-
cation (‘in the speech’, ‘in the process’). Finally, the answer to
why, the causative argument, should be an explicitly triggered
expression for reason/cause (‘as a result of the crisis’, ‘because
of the explosion’).
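As a concrete illustration, the expected output for a single sentence can be pictured as one answer span per question. The sentence below is a hypothetical example constructed from the argument types listed above, not one from the evaluation corpus:

```python
# Hypothetical sentence (not from the paper's corpus) illustrating the
# expected 5-W output: one span per question, null when no answer exists.
sentence = ("The workers rebuilt the bridge behind the building "
            "yesterday because of the explosion.")

answers = {
    "who":   "The workers",               # logical subject (actor/agent)
    "what":  "rebuilt the bridge",        # main predicate plus logical object
    "when":  "yesterday",                 # temporal argument
    "where": "behind the building",       # locative argument
    "why":   "because of the explosion",  # explicitly triggered cause
}

# Each non-null answer is a contiguous span of the original sentence.
for question, span in answers.items():
    assert span is None or span in sentence, question
```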
The 5-W extraction task is important in many respects.
First, it eliminates the errors that originate from the IR com-
ponent of a typical distillation system. Second, since there is
only one sentence for which each and every 5-W question is
answered, it is possible to evaluate the success of a system in
finding answers to specific questions from the same sentence.
Third, it also becomes possible to evaluate the premise of the
state-of-the-art techniques for the isolated task of extracting an-
swers, because, for the task described, it is critical that a sophis-
ticated syntactic, semantic, and contextual processing of docu-
ments and queries is performed.
While document and information retrieval in response to
user queries has been well studied, finding exact answers to a
question is less well-studied. Answering factoid questions (i.e.,
questions like What is the capital of France?) using web or huge
data collections typically makes use of the redundancy of infor-
mation [1]. In addition to the work on written texts, many sys-
tems have been developed that can operate with either spoken
queries or spoken document collections [2]. The combination
strategies developed in this paper are similar to those developed
for combining multiple systems for semantic role labeling [3],
which is the task of finding the arguments of each of the predicates in a sentence. In this task, the nodes of a parse tree are marked with semantic role labels (SRLs), such as ARG0 to denote a subject. Combining various SRL systems differs in that it involves heavy analysis of parse trees.
In this paper, we first describe three systems to extract an-
swers to 5-W questions, and then we move our focus to the
problem of combining these systems. We designed a feature ex-
traction module that makes use of novel features. Using the resulting feature vectors, we developed two combination strate-
gies to predict the system that would return as many correct
answers as possible. One primary contribution is the analysis
we performed to answer the specific questions of (i) what kind
of features provide useful information for this task, (ii) what
kind of a combination strategy works best, and (iii) in which
question and in which genre are the combination strategies most
effective. Our experimental results indicate a significant im-
provement over any individual system, as well as combination
strategies that are not based on learning.
Copyright © 2009 ISCA, 6-10 September, Brighton UK
2. Answering 5-W Questions
The GALE information distillation task specifies finding an-
swers to sentences in one of four genres: newswire (NW), web
text (WT), broadcast news (BN), and broadcast conversations
(BC). Answers are judged “correct” only if they either correctly
identify a null answer (i.e., there is no answer to be returned) or
correctly extract an answer that is present in the sentence. Answers are not penalized for including extra text, provided that the extra text does not come from another answer or from another top-level predicate.
2.1. UMD Parser
We have found that training a parser that matches the genre was
of critical importance. For this reason, in the University of Maryland (UMD) parser, we have re-implemented and enhanced the
Berkeley parser [4] in several ways. While the enhancements
to the parser are important, it is even more important to train
grammars matched to the conditions of use. Therefore, our
genre-specific English grammars were trained by subsampling
from available treebanks (WSJ, BN, Brown, Fisher, and Switch-
board) after some genre-specific text transformations. For in-
formal speech, we replaced symbolic expressions with ver-
bal forms and removed punctuation, edits, and case. We also
utilized a large amount (210K to 300K sentences) of genre-
matched self-labeled training parses in training these grammars.
We also trained grammars with a subset of function tags anno-
tated in the treebank that indicate case role information (e.g.,
SBJ, OBJ, LOC, TMP, MNR) in order to produce function tags.
Using these parsers, we obtained F1 measures of 91.7% on WSJ section 23 when training on WSJ (not including dev and test), 90.45% on 10% of the labeled data drawn from English BN, and 87.84% on 10% of the labeled data drawn from English Fisher.
2.2. Individual Systems for Answering 5-W Questions
We developed three systems to answer 5-W questions: one at
ICSI/SRI, one at NYU, and one at Columbia University. We
will refer to these systems as System A, System B, and System
C, respectively.
System A works in two steps: The first step is a cascade
of several operations to determine one top-level predicate using the UMD parser, including detecting and marking quotations,
removing conditional clauses, processing conjunction of sen-
tences and conjunction of verb phrases so that only one top-
level predicate remains, and detecting passive sentences. The
second step starts with an analysis of the sentence structure
since the positions of constituents in the sentence depend on
the sentence structure. Once the sentence is categorized accord-
ing to its structure, a set of linguistically-motivated handcrafted
rules is applied to extract answers. These rules make use of the
bracketed syntactic parse trees with function tags produced by
the UMD parser.
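System A's handcrafted rules operate on bracketed parse trees with function tags; the actual rules are not given in the paper, but the general mechanism can be sketched as follows. The helper functions and the single toy rule (NP-SBJ maps to who, VP to what) are illustrative assumptions, not the paper's rule set:

```python
import re

def parse_tree(s):
    """Parse a bracketed parse string like '(S (NP-SBJ the dog) (VP barked))'
    into (label, children) tuples; leaves are plain word strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    def helper(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                node, i = helper(i)
                children.append(node)
            else:
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1
    node, _ = helper(0)
    return node

def words(node):
    """Collect the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]

def extract_who_what(tree):
    """Toy stand-in for System A's rules: the NP-SBJ constituent (function
    tag SBJ from the UMD parser) answers who, and the VP answers what."""
    who = what = None
    for child in tree[1]:
        if isinstance(child, str):
            continue
        if child[0].startswith("NP-SBJ"):
            who = " ".join(words(child))
        elif child[0].startswith("VP"):
            what = " ".join(words(child))
    return who, what

tree = parse_tree("(S (NP-SBJ the committee) (VP approved (NP the plan)))")
print(extract_who_what(tree))  # ('the committee', 'approved the plan')
```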
System B uses a Grammatical and Logical Argument
Framework (GLARF)-based program [5] to recognize logical
relations between words, such as subject, object, and indirect
object. The GLARF system regularizes these relations across
passive, relative clauses, coordinate structures, and so on. The
5Ws are read off of these relations, mapping the logical sub-
ject to who, the verb plus one core argument to what, temporal
adverbials to when, and so on. 5-W output was calculated for
GLARF output of both the Charniak parser [6] and the UMD
parser, using heuristics to choose between the two (the one with
more non-null answers, etc.). If neither of these systems pro-
duced output that included a verb in the WHAT slot, a secondary
string-based set of heuristics was used to fill the 5W slots. Typ-
ically, the backup system fired only when both parsers failed or
produced nonsensical output.
System C was developed specifically to handle the noisy sentences resulting from ASR, using a series of fall-back systems.
We developed a set of information extraction patterns, trained
over speech data, to identify the top level predicate using the de-
pendency tree and function tags produced by the UMD parser
tuned for speech. If a unique predicate was found, then the
function tags, along with time and location named entities, were
used to extract the 5Ws from the sentence. If no unique pred-
icate was found, then a disfluency removal algorithm was ap-
plied and the same IE patterns were then applied to the depen-
dency tree produced by the Stanford Parser to identify the top-
level predicate and its arguments. If this method failed, then System C fell back to the use of NYU's Jet system to perform
chunking and a different set of information extraction patterns.
3. Combining Multiple Answers
When only automatically generated syntactic parse trees are
available, the quality of the available information substantially
degrades. The automatic transcriptions of audio input present
many challenges as they do not contain punctuation, include
disfluencies, and often have segmentation issues. Under such
real-world conditions, a combination of several modules each of
which presents a different view provides a mechanism to over-
come the shortcomings of individual modules.
3.1. Feature Extraction
It was found that the performance of a 5-W question answer-
ing system correlates with some higher-level features, which
are summarized in Table 1. These features can be categorized
as follows:
System Level Features: The features numbered (1) through (6) encode system-specific answer construction mechanisms. For instance, a system might fail to answer who in passive sentences, which can be detected by checking for a non-null answer to who (feature (2)).
System Agreement Features: Features of type (7) compare
answers from different systems. The rationale for these features
is that if any two systems have overlapping answers, then it is
more likely that their answers are correct. In the meantime, if
two systems agree on their predicates, then some of their an-
swers should agree or overlap.
Syntactic Features: The feature numbered (8) captures whether one should expect a locative, temporal, or causative argument to be non-null.
Sentence Level Features: The features numbered (9) and
(10) are found by analyzing the given sentence itself.
System level features are extracted from each of the three systems separately, as these depend on the system. These features are then concatenated with the other features and used to train the classifiers.
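A minimal sketch of how a few of the Table 1 features might be computed from the three systems' answers. The function name and exact encodings are assumptions; the paper's full feature set is richer:

```python
def extract_features(answers_by_system, sentence):
    """Compute a few of the Table 1 features for one sentence.
    answers_by_system maps a system name to a dict of 5-W answers."""
    questions = ("who", "what", "when", "where", "why")
    feats = {}
    for name, ans in answers_by_system.items():
        # (1) number of non-null answers per system
        feats[f"{name}_non_null"] = sum(ans[q] is not None for q in questions)
        # (3) number of words in each answer
        for q in questions:
            feats[f"{name}_len_{q}"] = len(ans[q].split()) if ans[q] else 0
    # (7) pairwise agreement: number of exactly matching answers
    names = sorted(answers_by_system)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            feats[f"agree_{a}_{b}"] = sum(
                answers_by_system[a][q] == answers_by_system[b][q]
                for q in questions)
    # (10) sentence length
    feats["sent_len"] = len(sentence.split())
    return feats
```

The resulting dictionary can be vectorized in a fixed key order to form the feature vector x used below.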
3.2. Training Discriminative Classifiers
The answers are selected from one system only depending on
some prediction score that indicates how likely it is that a se-
lected system will return as many correct answers as possible.
Since it is possible for more than one system to be the highest-scoring, this is a multiclass multilabel classification problem.
Since the arguments are dependent on the predicate cho-
sen, we do not mix and match fields. Even when two sys-
tems select the same top-level predicate, their answers might
be different, because, in some cases, constituents may answer
more than one question. For instance, for the sentence “She
was depressed after the car accident”, the constituent “after the
car accident” is allowed to answer either one of the when and
Table 1: The rationale of features extracted for combining multiple 5-W systems. The classification-based approach takes system level features, sentence level features, system agreement features, and syntactic features to combine multiple systems at the granularity of sentences or at the granularity of individual fields.
(1) Number of non-null answers: A system returning fewer non-null answers is likely to be answering all 5-W questions.
(2) Positions of the answers: The position of an answer should not be too far from the positions of other answers.
(3) Number of words in each answer: The length of an answer is typically dependent on the field; for instance, the length of the answer to what is typically greater than one, as the correct answer to what should include the main predicate and the object (if any).
(4) The answer to who is null: There are cases when the answer to who should be null (e.g., imperative sentences) and other cases when it should not be null (e.g., simple declarative sentences).
(5) WordNet analysis of the answer to who: The answer to who should include a noun.
(6) WordNet analysis of the answer to what: The answer to what should include a verb, and typically a noun; it typically starts with a verb, and the verb comes before a noun.
(7) Agreement among systems: The answers to who, what, when, where, and why, as well as the predicates, would agree when all the systems consider the same predicate. If the predicates do not agree, the answers should not be expected to agree.
(8) Number of arguments in the sentence: The number of times "LOC", "TMP", and "PRP" arguments occur in the sentence evidences whether one should expect non-null answers.
(9) Voice of the sentence: The answers to who and what depend on whether the so-found top-level predicate is in its active or passive form.
(10) Length of the sentence: Answers are typically more reliable for short sentences.
(11) Quotations: The quotations should appear as they are.
why questions. Furthermore, as our experimental results show,
the systems we developed make complementary mistakes and
therefore the ideal combination strategy that selects the best sys-
tem for each sentence was able to give an accuracy of as much
as 96% overall.
3.2.1. 5-W Corpus
Since each sentence may have multiple correct sets of 5-Ws, it
is not straightforward to produce a gold-standard corpus for au-
tomatic evaluation. One would have to specify answers for each
possible top-level predicate, as well as which parts of the sen-
tence are optional and which are not allowed. This also makes creating training data for system development problematic.
In this work, we selected the first five (reasonable) sen-
tences from 20 English documents. The answers for these 100
sentences were presented side by side in a Java-based inter-
face and graded by human judges. The interface showed the
sentence, the system responses, and options for selecting “cor-
rect”,“incorrect”, and “partial” options for each answer. For
BN and BC sentences, the graders were presented the manual
and automatic transcriptions, as the evaluation should be per-
formed against the manual transcription but the answers should
be extracted from the automatic transcriptions. Each sentence
was graded by two human judges, and disagreements were re-
solved by further discussions. Since the annotated data we collected was not large, we ran a 20-fold cross-validation and report averages in our experiments.
3.2.2. System Level Combination
In this combination strategy, we trained binary SVM classifiers to solve this multiclass multilabel classification problem. Three binary classifiers, called SVM_A, SVM_B, and SVM_C, predict how likely it is that system s ∈ {A, B, C} would return as many correct answers as possible for a given sentence.
Let x denote the concatenation of the features extracted from the answers of the three systems, from the sentence itself, and from its syntactic analysis as described in Section 3.1. Let y_s denote the labels that are used to train the SVMs. The labels, y_s, are such that

    y_s = 1, if System s is a highest-scoring system,
          0, otherwise.    (1)
In the test stage, the scores from the three SVMs are compared,
the system with the highest prediction score is selected, and its
answers are returned.
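The selection step can be sketched as follows; plain linear scorers stand in here for the trained SVM decision functions, and all names and weights are illustrative:

```python
# Sketch of the system-level combination (Sec. 3.2.2): one binary
# classifier per system scores the shared feature vector x, and the
# answers of the highest-scoring system are returned.
def select_system(classifiers, x):
    """Return the name of the highest-scoring system and all scores."""
    scores = {name: score_fn(x) for name, score_fn in classifiers.items()}
    return max(scores, key=scores.get), scores

def linear_scorer(weights, bias=0.0):
    """A linear decision function w . x + b, standing in for an SVM."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

# Toy stand-ins for SVM_A, SVM_B, SVM_C over a 2-dimensional feature vector.
classifiers = {
    "A": linear_scorer([0.5, -0.2]),
    "B": linear_scorer([0.1, 0.9]),
    "C": linear_scorer([-0.3, 0.4]),
}
best, scores = select_system(classifiers, [1.0, 1.0])
print(best)  # B: 0.1 + 0.9 = 1.0 beats 0.3 (A) and 0.1 (C)
```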
3.2.3. Answer Level Combination
In this combination strategy, the responses of those systems that
received a “correct” grade for a target question are labeled with
a “1”, and those of lower-scoring systems are labeled with a
“0”. The feature vectors are used to train five binary SVMs, one
for each of the 5-W questions. Each of these SVMs predicts
whether a given answer (rather than a given system) is correct or
not. Let x denote feature vectors as before, and let z_s^q denote the binary labels attached to the answers of System s ∈ {A, B, C} to the question q ∈ {who, what, when, where, why}. The feature vectors are labeled such that

    z_s^q = 1, if System s's answer to q is correct,
            0, otherwise.    (2)
The prediction scores of these binary SVMs are then used to form new feature vectors to train three binary SVMs that predict the best system for the given sentence. These feature vectors, p̂_s, are composed of the sums of the prediction scores of System s on the 5-W questions. The corresponding labels are such that if System s is a highest-scoring system, its label is 1, and 0 otherwise. The final decision is made by comparing all three prediction scores and selecting the highest one.
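A simplified sketch of this answer-level strategy: five per-question classifiers score each system's answer, the scores are summed per system, and the best total wins. In the paper the sums feed a second stage of binary SVMs; a plain argmax over the summed scores is used here as a stand-in for that second stage, and all scores below are made up:

```python
# Sketch of the answer-level combination (Sec. 3.2.3), simplified.
QUESTIONS = ("who", "what", "when", "where", "why")

def answer_level_combine(question_scores):
    """question_scores[s][q] = prediction score for System s's answer to q.
    Returns the system with the highest summed score, plus all totals."""
    totals = {s: sum(qs[q] for q in QUESTIONS)
              for s, qs in question_scores.items()}
    return max(totals, key=totals.get), totals

scores = {
    "A": {"who": 0.9, "what": 0.2, "when": 0.8, "where": 0.7, "why": 0.6},
    "B": {"who": 0.4, "what": 0.9, "when": 0.9, "where": 0.8, "why": 0.7},
    "C": {"who": 0.5, "what": 0.3, "when": 0.4, "where": 0.6, "why": 0.5},
}
best, totals = answer_level_combine(scores)
print(best)  # B (total 3.7 vs 3.2 for A and 2.3 for C)
```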
4. Experiments and Results
We ran experiments to evaluate the two combination strategies
and compare their performance against individual systems as
well as combination strategies that would be taken in case no
annotated data was available.
4.1. Performance Evaluation Per Genre
Table 2 compares the error rates of the different systems and
combination strategies on different genres. “Random” stands
for a strategy in which the answers of a randomly selected sys-
tem are returned. “Oracle” stands for the ideal strategy that
Table 2: Error rates per genre. The answer level combination reduces the error rate of the majority-voting based combination by 17% to 35%.

                  NW     WT     BN     BC  Pooled
Sys A           14.8   17.0   12.0   12.8    14.2
Sys B           12.6   17.1   15.9   15.9    15.4
Sys C           22.0   22.8    9.2   11.8    16.2
Random          16.2   19.5   12.1   14.7    15.6
Majority        16.4   16.3   11.1   15.2    14.7
Oracle           3.9    7.2    2.7    2.2     4.0
System Level    13.0   13.3    9.1   12.6    12.0
Answer Level    10.6   13.6    8.0   11.4    10.9
δ_Random        36.1   30.6   33.9   22.7    30.5
δ_Majority      35.4   16.6   27.9   25.0    25.9
δ_Best         15.87   20.0   13.0    3.4    23.2
has the knowledge of which system would perform best for a
given sentence, and hence denotes the best that can be expected
from any combination scheme. δ_Random stands for the relative improvement of the answer level combination strategy over "Random", δ_Majority stands for that over "Majority", and δ_Best stands for that over the best individual system. The column "Pooled" shows the error rate when answers from all genres and all question types are considered.
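The δ rows are plain relative error-rate reductions; for example, the pooled δ_Best entry follows from the pooled error rates of the best individual system (Sys A, 14.2) and the answer-level combination (10.9):

```python
def relative_reduction(baseline_err, system_err):
    """Relative error-rate reduction (in percent) over a baseline,
    as used for the delta rows of Table 2."""
    return 100.0 * (baseline_err - system_err) / baseline_err

# Pooled column of Table 2: answer-level combination (10.9) vs. the best
# individual system (Sys A, 14.2) and vs. majority voting (14.7).
print(round(relative_reduction(14.2, 10.9), 1))  # 23.2 (delta_Best)
print(round(relative_reduction(14.7, 10.9), 1))  # 25.9 (delta_Majority)
```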
If we had no annotated data and hence no knowledge about
how these individual systems perform, we could take two ap-
proaches. As a first approach, we could just select a system
randomly for each given sentence and return its answers. This
totally “ignorant” system combination would correspond to the
“Random” strategy. As an alternative, we could take a more in-
telligent strategy and test if any systems agree on any answer.
If there is any agreement, for each sentence, we could score
any agreement with a “+1” and then return the answers of the
highest-scoring system. This combination would correspond to
the “Majority” strategy in Table 2.
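The "Majority" scoring just described can be sketched as follows; the function name and the toy answers are illustrative assumptions:

```python
# Sketch of the "Majority" baseline: each pairwise agreement on an answer
# adds +1 to both systems' scores, and the answers of the highest-scoring
# system are returned.
QUESTIONS = ("who", "what", "when", "where", "why")

def majority_select(answers_by_system):
    names = list(answers_by_system)
    score = {s: 0 for s in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for q in QUESTIONS:
                if answers_by_system[a][q] == answers_by_system[b][q]:
                    score[a] += 1
                    score[b] += 1
    return max(score, key=score.get), score

answers = {
    "A": {"who": "he", "what": "left", "when": None, "where": None, "why": None},
    "B": {"who": "he", "what": "left", "when": None, "where": None, "why": None},
    "C": {"who": "she", "what": "ran", "when": "now", "where": None, "why": None},
}
best, score = majority_select(answers)
print(best)  # A: A and B tie at 7 (max keeps the first); C scores 4
```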
As seen in Table 2, the answer level combination strategy
outperformed the “Random” strategy by 22.7% to 36.1% rela-
tive and the “Majority” strategy by 16.6% to 35.4% relative. If
some annotated data were made available, then we could eval-
uate the performance of individual systems and select the best
one for each genre. As seen in Table 2, the best system in NW is
System B, in WT it is System A, and in BN and BC it is System
C. The answer level combination strategy was able to improve
all these systems and, more specifically, the best individual sys-
tem by 3.4% to 20% relative. The system level combination
performed slightly better than answer level combination in WT
but the difference is not significant.
4.2. Performance Evaluation Per Question Type
As seen in Table 3, the performance on the answers to what is substantially lower than that on the answers to the other questions. One
of the most important reasons is that the answers to what tend
to contain extra text, which quite often contains arguments of
a predicate other than the top-level predicate. It was also quite
common that the answer to what contained text that would cor-
rectly answer other questions. For instance, if a passive sen-
tence detector fails, then the answer to who is “incorrect” and
the answer to what is “partial” at best, and similarly, if the an-
swer to when is contained in the answer to what, then the an-
swers to both when and what are “incorrect”.
Table 3 shows that the answer level combination strategy
was able to significantly improve all “Random”, “Majority”,
and "Best system" strategies. These improvements were most obvious in the answers to what, suggesting that
Table 3: Error rates per question type. The answer level combination reduces the error rate of the majority-voting based combination by 16% to 40%.

                 who   what   when  where    why
Sys A           12.6   23.8   11.8   16.7    5.3
Sys B           15.8   30.9   13.7   10.8    4.9
Sys C           22.3   26.4   21.0   13.1    9.4
Random          14.1   28.2   16.1   13.8    5.4
Majority        13.7   24.1   13.1   14.0    6.0
Oracle           4.6    6.4    9.5    9.5    2.3
System Level    10.0   21.3   11.5   12.6    4.6
Answer Level     8.2   18.6   11.0   11.0    4.4
δ_Random        41.8   34.0   31.8   19.7   19.1
δ_Majority      40.1   22.8   16.0   21.4   26.7
δ_Best          33.4   21.8    6.5   33.9   17.1
the most dramatic improvement would be obtained in the worst-performing question type.
A major conclusion that we draw from Tables 2 and 3 is that the combination strategy that tackles the problem at the granularity of individual fields is more successful than the strategy that works at the granularity of sentences. A similar observation was made in combining systems for semantic role labeling [3].
5. Conclusions
We describe two combination strategies, one of which operates
at the granularity of sentences and the other at the granularity of
answers. The specific task we addressed was the 5-W task, in which the goal is to answer the who, what, when, where, and why questions for each given sentence. We trained SVM classifiers with
a set of novel features. We evaluated the proposed strategies us-
ing a set of text and audio sentences. Our experimental results
indicated that the proposed features and combination strategies
were successful at utilizing the strengths of each component
system. The combination strategies described in this paper can
be used to make effective use of multiple answers originating
from independent systems in information extraction, semantic role labeling, and related tasks.
6. Acknowledgements
The authors thank Sara Stolbach for developing a graphical user inter-
face for data annotation and Bob Coyne for participating in data anno-
tation. This work was supported by DARPA HR0011-06-C-0023. Any
opinions, findings and/or recommendations expressed in this paper are
those of the authors and do not necessarily reflect the views of the fund-
ing agencies.
7. References
[1] E. W. D. Whittaker, J. Mrozinski, and S. Furui, “Factoid ques-
tion answering with web, mobile and speech interfaces,” in
NAACL/HLT, 2006.
[2] L. Lamel, S. Rosset, C. Ayache, D. Mostefa, J. Turmo, and P. Co-
mas, "Question answering on speech transcriptions: the QAST evaluation in CLEF," in LREC, 2008.
[3] M. Surdeanu, L. Marquez, X. Carreras, and P. R. Comas, “Com-
bination strategies for semantic role labeling,” Journal of Artificial
Intelligence Research, vol. 29, pp. 105–151, 2007.
[4] S. Petrov and D. Klein, “Improved inference for unlexicalized pars-
ing,” in NAACL/HLT, 2007.
[5] A. Meyers, M. Kosaka, N. Xue, H. Ji, A. Sun, S. Liao, and W. Xu,
“Automatic Recognition of Logical Relations for English, Chinese
and Japanese,” in SEW-2009 at NAACL-HLT-2009, 2009.
[6] E. Charniak, “Immediate-head parsing for language models,” in
Meeting of the ACL, 2001.
... The second category of approaches is highly specialized on task-specific event properties, such as the number of dead or injured people for crisis monitoring [32] or the number of protestors in demonstrations [26]. Approaches of the third category extract explicit event descriptors but are not publicly available [29,[34][35][36]. ...
... The task is closely related to closed-domain question answering, which is why some authors call their approaches 5W question answering (QA) systems. Systems for 5W QA on news texts typically perform three tasks to determine the article's main event: (1) preprocessing, (2) phrase extraction, and (3) candidate scoring [34,35]. The input data to QA systems is usually text, such as a full article including headline, lead paragraph, and main text [30], or a single sentence, e.g., in news ticker format [36]. ...
... The input data to QA systems is usually text, such as a full article including headline, lead paragraph, and main text [30], or a single sentence, e.g., in news ticker format [36]. Other systems use automatic speech recognition (ASR) to convert broad casts into text [35]. The outcomes of the process are five phrases, one for each of the 5W, which together represent the main event of a given news text, as exemplarily highlighted in Fig. 1.The preprocessing task (1) performs sentence splitting, tokenizes them, and often applies further NLP methods, including part-of-speech (POS) tagging, coreference resolution [30], NER [12], parsing [24], or semantic role labeling (SRL) [8]. ...
Conference Paper
Full-text available
Extraction of event descriptors from news articles is a commonly required task for various tasks, such as clustering related articles, summarization, and news aggregation. Due to the lack of generally usable and publicly available methods optimized for news, many researchers must redundantly implement such methods for their project. Answers to the five journalistic W questions (5Ws) describe the main event of a news article, i.e., who did what, when, where, and why. The main contribution of this paper is Giveme5W, the first open-source, syntax-based 5W extraction system for news articles. The system retrieves an article's main event by extracting phrases that answer the journalistic 5Ws. In an evaluation with three assessors and 60 articles, we find that the extraction precision of 5W phrases is p = 0.7.
... -Extraction de phrases : cette étape a pour but d'extraire les phrases candidates pour répondre à chacune des questions 5W1H à partir du texte prétraité. Pour ce faire, plusieurs méthodes et stratégies sont proposées dans la littérature [157,198,98,178,73]. Par exemple, les auteurs dans [157,198,98] proposent des méthodes basées sur des règles linguistiques établies manuellement. ...
... Pour ce faire, plusieurs méthodes et stratégies sont proposées dans la littérature [157,198,98,178,73]. Par exemple, les auteurs dans [157,198,98] proposent des méthodes basées sur des règles linguistiques établies manuellement. Dans le système proposé dans [98], les syntagmes nominaux sont identifiés comme candidats Who (c.-à-d., les expressions candidates à la réponse à la question Who), tandis que les syntagmes verbaux adjacents sont identifiés comme candidats What. ...
Full-text available
En raison de leur grand potentiel pour l'amélioration de la sécurité, du confort, de la productivité et des économies d'énergie, les environnements connectés sont devenus omniprésents dans notre vie quotidienne, ils ont eu un impact sur différents secteurs, tels que, les hôpitaux, les centres commerciaux, les fermes et les véhicules. Afin d'améliorer encore plus la qualité de vie dans ces environnements, beaucoup d'applications proposant des services basés sur l’exploitation des données collectées par les capteurs ont vu le jour. La détection d’événements est l’un de ces services (par exemple, la détection d’un incendie, la détection d’un accident vasculaire cérébral pour les patients, la détection de la pollution atmosphérique). Généralement, quand un événement est déclenché dans un environnement connecté, la réaction naturelle d’un responsable est d’essayer de comprendre ce qui s’est passé et pourquoi l’évènement s’est déclenché. Pour trouver des réponses à ces questions, l'approche traditionnelle est d'interroger manuellement les différents sources de données (système d'informations réseaux de capteurs et système d'informations corpus de documents), ce qui peut s'avérer très fastidieux, très coûteux en matière de temps et nécessite un énorme effort de compilation. Cette thèse, s’intéresse à l’explication des événements détectés dans les environnements connectés, et plus précisément à ceux qui se produisent dans des environnements disposant de systèmes d’information (SI) hétérogènes (SI documents et SI réseau de capteurs). Nous proposons le framework intitulé ISEE (Information System for Event Explanation). ISEE est basé sur:(i) un modèle pour la définition d'évènements dans les environnements hybride, ce dernier permet au utilisateurs de définir les événements qu'ils souhaitent détecter selon différents axes de description (document et réseaux de capteurs); (ii) un processus pour l'interconnexion ciblée du ..... 
, son rôle est d'exploiter les données issues des évènements (définition et déclenchement) pour construire des connexions sensibles aux contexte (explication des événements). Ces connexions vont servir à rapprocher les différentes sources d'information et guider le processus de recherche d'explication; (iii) un modèle inspiré de la technique 5W1H (what, who, when, where, how, why) pour structurer les explications d'une façon simple, intuitive et facile à comprendre par n'importe quel type d'utilisateurs. Nous proposons une solution générique qui peut être appliquée dans différents domaines d’applications métiers. Néanmoins, trois expérimentions ont été conduite pour valider cette proposition dans le contexte d'un grand bâtiment de recherche.
... [126] uses three independent subsystems to extract 5Ws answers which are then combined using a trained SVM model. Even though many models [60,126] utilize supervised learning models for 5Ws question answering, it is difficult to obtain ground truth data for domain-specific articles. In this thesis, we focus on lexico-syntactic features in combination with domain knowledge to better extract the 5Ws corresponding to an event from text. ...
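The field-level combination strategy contrasted with sentence-level selection in the abstract above can be sketched as a toy per-field majority vote (a minimal Python illustration; the field names, example answers, and tie-breaking rule are assumptions, not details from [126] or the paper itself, whose actual method trains an SVM over system-capability features):

```python
from collections import Counter

FIELDS = ["who", "what", "when", "where", "why"]

def combine_fields(system_outputs):
    """Field-level combination: for each W, keep the answer most systems
    agree on. Ties fall back to the first system's answer, since Counter
    preserves first-seen order among equal counts."""
    combined = {}
    for field in FIELDS:
        votes = Counter(s.get(field) for s in system_outputs)
        combined[field] = votes.most_common(1)[0][0]
    return combined

# Hypothetical outputs from three independent 5-W systems for one sentence.
outputs = [
    {"who": "the mayor", "what": "announced a plan", "when": "Tuesday",
     "where": "City Hall", "why": None},
    {"who": "the mayor", "what": "announced a plan", "when": "yesterday",
     "where": "City Hall", "why": None},
    {"who": "a mayor", "what": "announced a plan", "when": "Tuesday",
     "where": "downtown", "why": None},
]
print(combine_fields(outputs)["who"])  # → the mayor
```

A sentence-level scheme would instead pick one system's entire answer set; the field-level vote can mix fields from different systems, which is the granularity distinction the paper evaluates.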
Textual understanding is the process of automatically extracting accurate, high-quality information from text. The amount of textual data available from sources such as news, blogs, and social media is growing exponentially. These data encode significant latent information which, if extracted accurately, can be valuable in a variety of applications such as medical report analysis, news understanding, and societal studies. Natural language processing techniques are often employed to develop customized algorithms to extract such latent information from text. The journalistic 5Ws refer to the basic information in news articles that describes an event: where, when, who, what, and why. Extracting them accurately may facilitate a better understanding of many social processes, including social unrest, human rights violations, propaganda spread, and population migration. Furthermore, the 5W information can be combined with socio-economic and demographic data to analyze the state and trajectory of these processes. In this thesis, a data-driven pipeline has been developed to extract the 5Ws from text using syntactic and semantic cues. First, a classifier is developed to identify articles specifically related to social unrest; it has been trained on a dataset of over 80K news articles. NLP algorithms are then used to generate a set of candidates for the 5Ws, and a series of extraction algorithms is developed. These heuristic algorithms leverage specific words and parts of speech, customized for each W, to compute candidate scores; the heuristics are based on the syntactic structure of the document as well as syntactic and semantic representations of individual words and sentences. The scores are then combined and ranked to obtain the best answers to the journalistic 5Ws. The classification accuracy of the algorithms is validated using a manually annotated dataset of news articles.
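The heuristic candidate scoring described in this abstract can be illustrated with a minimal sketch for the "when" field (the cue list, weights, and candidates are invented for illustration; the thesis' actual heuristics also exploit syntactic structure and sentence-level semantics):

```python
import re

# Toy cue lexicon for temporal expressions; a real system would use a much
# larger list plus part-of-speech and dependency information.
TIME_CUES = {"monday", "tuesday", "yesterday", "today", "january", "morning"}

def score_when(candidate):
    """Score a candidate phrase for the 'when' slot: weight cue-word hits
    and the presence of a numeric token (e.g. a year)."""
    tokens = re.findall(r"\w+", candidate.lower())
    cue_hits = sum(t in TIME_CUES for t in tokens)
    has_digit = any(t.isdigit() for t in tokens)
    return 2.0 * cue_hits + 1.0 * has_digit

candidates = ["the city council", "on Tuesday morning", "in 2020"]
best = max(candidates, key=score_when)
print(best)  # → on Tuesday morning
```

Analogous scorers for the other Ws (different cue lexicons and features) would be combined and ranked, as the abstract describes.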
... A score that combines these two LMs is computed for the answers of each of the modules and the 5-W answers of the higher-scoring module are accepted. An alternative approach to the problem of multiple 5-W answers is presented in [3]. The authors develop three independent systems and extract useful features from the answers returned by each system. ...
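The LM-based selection summarized in this excerpt can be sketched as follows (a toy illustration; the interpolation weight, module names, and log-probability values are placeholders, not values from the cited work):

```python
def combined_score(logp_lm1, logp_lm2, weight=0.5):
    """Interpolate two language-model log-probabilities for one module's
    5-W answers. The 0.5 weight is an assumption for illustration."""
    return weight * logp_lm1 + (1 - weight) * logp_lm2

def select_module(module_scores):
    """module_scores: {module_name: (logp_lm1, logp_lm2)}.
    Return the name of the module whose answers score highest."""
    return max(module_scores, key=lambda m: combined_score(*module_scores[m]))

# Hypothetical scores for two competing 5-W modules (higher = better).
scores = {"parser_based": (-42.1, -39.8), "chunker_based": (-45.0, -44.2)}
print(select_module(scores))  # → parser_based
```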
... Our first 5-W question answering system uses the syntactic parses with function tags produced by the University of Maryland parser that was specifically designed to handle ASR outputs (see [5,3] for details) and works in two steps. The first step is a cascade of several operations to determine one top-level predicate: detecting and marking quotations, removing conditional clauses, processing conjunction of sentences and conjunction of verb phrases so that only one top-level predicate remains, and detecting passive sentences. ...
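The cascade in the first step can be caricatured with string-level stand-ins (the real system operates on the syntactic parse, not on regexes; the patterns and example sentence below are purely illustrative):

```python
import re

def mark_quotations(sentence):
    """Replace quoted spans with a placeholder so they do not compete
    for the top-level predicate. Simplified: double quotes only."""
    return re.sub(r'"[^"]*"', "<QUOTE>", sentence)

def remove_conditional_clause(sentence):
    """Strip a leading 'If ...,' clause. Simplified: sentence-initial only."""
    return re.sub(r'^\s*If [^,]+,\s*', "", sentence)

def cascade(sentence):
    # Apply the simplification steps in order, leaving one main clause.
    for step in (mark_quotations, remove_conditional_clause):
        sentence = step(sentence)
    return sentence

print(cascade('If it rains, the mayor said "we will cancel" the parade.'))
# → the mayor said <QUOTE> the parade.
```

The remaining main clause is then what the second step mines for the 5-W fields.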
Full-text available
This chapter details the first component of person-oriented framing analysis: target concept analysis. This component aims to find and resolve mentions of persons, which can be subject to media bias. The chapter introduces and discusses two approaches for this task. First, the chapter introduces an approach for event extraction. The approach extracts answers to the journalistic 5W1H questions, i.e., who did what, when, where, why, and how. The in-text answers to these questions describe a news article's main event. Afterward, the chapter introduces an approach that is the first to resolve highly context-dependent coreferences across news articles as they commonly occur in the presence of sentence-level bias forms. Our approach can resolve mentions that are coreferential only within coverage of the same event and that may otherwise even be contradictory, such as "attack" or "self-defense" and "riot" or "protest." Lastly, the chapter argues for using the latter approach for the target concept analysis component, in particular because of its high classification performance. Another reason for our decision is that using the event extraction approach in the target concept analysis component would require the development of a subsequent approach, i.e., one to compare the events extracted from individual articles and resolve them across all articles.
Full-text available
In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in the recent years and the main focus of interest these days. This description is classified in five main areas: audio processing including speech, speaker characterization, speech and language processing, text to speech conversion and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities developed in the last years.
Full-text available
We present GLARF, a framework for representing three linguistic levels and systems for generating this representation. We focus on a logical level, like LFG's F-structure, but compatible with Penn Treebanks. While less fine-grained than typical semantic role labeling approaches, our logical structure has several advantages: (1) it includes all words in all sentences, regardless of part of speech or semantic domain; and (2) it is easier to produce accurately. Our systems achieve 90% for English/Japanese News and 74.5% for Chinese News - these F-scores are nearly the same as those achieved for treebank-based parsing.
Full-text available
In this paper we describe the web and mobile-phone interfaces to our multi-language factoid question answering (QA) system together with a prototype speech interface to our English-language QA system. Using a statistical, data-driven approach to factoid question answering has allowed us to develop QA systems in five languages in a matter of months. In the web-based system, which is accessible at, we have combined the QA system output with standard search-engine-like results by integrating it with an open-source web search engine. The prototype speech interface is based around a VoiceXML application running on the Voxeo developer platform. Recognition of the user's question is performed on a separate speech recognition server dedicated to recognizing questions. An adapted version of the Sphinx-4 recognizer is used for this purpose. Once the question has been recognized correctly it is passed to the QA system and the resulting answers are read back to the user by speech synthesis. Our approach is modular and makes extensive use of open-source software. Consequently, each component can be easily and independently improved and easily extended to other languages.
Conference Paper
Full-text available
This paper reports on the QAST track of CLEF aiming to evaluate Question Answering on Speech Transcriptions. Accessing information in spoken documents provides additional challenges to those of text-based QA, needing to address the characteristics of spoken language, as well as errors in the case of automatic transcriptions of spontaneous speech. The framework and results of the pilot QAst evaluation held as part of CLEF 2007 are described, illustrating some of the additional challenges posed by QA in spoken documents relative to written ones. The current plans for future multiple-language and multiple-task QAst evaluations are described.
Conference Paper
We present several improvements to unlexicalized parsing with hierarchically state-split PCFGs. First, we present a novel coarse-to-fine method in which a grammar's own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar without a treebank. In our experiments, hierarchical pruning greatly accelerates parsing with no loss in empirical accuracy. Second, we compare various inference procedures for state-split PCFGs from the standpoint of risk minimization, paying particular attention to their practical tradeoffs. Finally, we present multilingual experiments which show that parsing with hierarchical state-splitting is fast and accurate in multiple languages and domains, even without any language-specific tuning.
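The coarse-to-fine pruning idea can be illustrated with a toy relative-threshold filter over coarse-grammar scores (the labels, scores, and threshold are invented; the actual method prunes chart items by their scores under the grammar's hierarchical projections):

```python
def prune(coarse_scores, threshold=1e-3):
    """Keep chart items whose coarse-pass score is within a relative
    `threshold` of the best item; only the survivors are re-scored by
    the finer (more state-split) grammar."""
    best = max(coarse_scores.values())
    return {item for item, s in coarse_scores.items() if s >= best * threshold}

# Hypothetical coarse scores for three constituents over span [0, 2].
scores = {"NP[0:2]": 0.9, "VP[0:2]": 0.0005, "PP[0:2]": 0.2}
print(sorted(prune(scores)))  # → ['NP[0:2]', 'PP[0:2]']
```

Pruning with progressively finer projections is what yields the incremental speed-up the abstract reports.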
We present two language models based upon an "immediate-head" parser --- our name for a parser that conditions all events below a constituent c upon the head of c. While all of the most accurate statistical parsers are of the immediate-head variety, no previous grammatical language model uses this technology. The perplexity of both of these models improves significantly upon the trigram-model baseline as well as the best previous grammar-based language model. For the better of our two models these improvements are 24% and 14% respectively. We also suggest that improvement of the underlying parser should significantly improve the model's perplexity and that even in the near term there is a lot of potential for improvement in immediate-head language models.