Experiments with Interactive Question-Answering.
ABSTRACT This paper describes a novel framework for interactive question-answering (Q/A) based on predictive questioning. Generated off-line from topic representations of complex scenarios, predictive questions represent requests for information that capture the most salient (and diverse) aspects of a topic. We present experimental results from large user studies (featuring a fully-implemented interactive Q/A system named FERRET) that demonstrate that surprising performance is achieved by integrating predictive questions into the context of a Q/A dialogue.
Proceedings of the 43rd Annual Meeting of the ACL, pages 205–214,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
Sanda Harabagiu, Andrew Hickl, John Lehmann, and Dan Moldovan
Language Computer Corporation
Richardson, Texas USA
1 Introduction
In this paper, we propose a new architecture for interactive question-answering based on predictive questioning. We present experimental results from a currently-implemented interactive Q/A system, named FERRET, that demonstrate that surprising performance is achieved by integrating sources of topic information into the context of a Q/A dialogue.
In interactive Q/A, professional users engage in extended dialogues with automatic Q/A systems in order to obtain information relevant to a complex scenario. Unlike Q/A in isolation, where the performance of a system is evaluated in terms of how well answers returned by a system meet the specific information requirements of a single question, the performance of interactive Q/A systems has traditionally been evaluated by analyzing aspects of the dialogue as a whole. Q/A dialogues have been evaluated in terms of (1) efficiency, defined as the number of questions that the user must pose to find particular information, (2) effectiveness, defined by the relevance of the answers returned, and (3) user satisfaction.
In order to maximize performance in these three
areas, interactive Q/A systems need a predictive di-
alogue architecture that enables them to propose re-
lated questions about the relevant information that
could be returned to a user, given a domain of inter-
est. We argue that interactive Q/A systems depend
on three factors: (1) the effective representation of
the topic of a dialogue, (2) the dynamic recognition
of the structure of the dialogue, and (3) the ability to
return relevant answers to a particular question.
In this paper, we describe results from experiments we conducted with our own interactive Q/A system, FERRET, under the auspices of the ARDA AQUAINT¹ program, involving 8 different dialogue scenarios and more than 30 users. The results presented here illustrate the role of predictive questioning in enhancing the performance of Q/A interactions.
In the remainder of this paper, we describe a new architecture for interactive Q/A. Section 2 presents the functionality of several of FERRET’s modules and describes the NLP techniques it relies upon. In Section 3, we present one of the dialogue scenarios and the topic representations we have employed. Section 4 highlights the management of the interaction between the user and FERRET, while Section 5 presents the results of evaluating our proposed model, and Section 6 summarizes the conclusions.

¹ AQUAINT is an acronym for Advanced QUestion Answering for INTelligence.

Figure 1: FERRET - A Predictive Interactive Question-Answering Architecture. (Panels: Online Question Answering; Off-line Question Answering.)
2 Interactive Question-Answering
We have found that the quality of interactions pro-
duced by an interactive Q/A system can be greatly
enhanced by predicting the range of questions that
a user might ask in the context of a given topic.
If a large database of topic-relevant questions were
available for a wide variety of topics, the accuracy
of a state-of-the-art Q/A system such as (Harabagiu
et al., 2003) could be enhanced.
In FERRET, our interactive Q/A system, we store
such “predicted” pairs of questions and answers in a
database known as the Question Answer Database
(or QUAB). FERRET uses this large set of topic-
relevant question-and-answer pairs to improve the
interaction with the user by suggesting new ques-
tions. For example, when a user asks a question
like (Q1) (as illustrated in Table 1), FERRET returns
an answer to the question (A1) and proposes (Q2),
(Q3), and (Q4) as suggestions of possible continua-
tions of the dialogue. Users then choose how to con-
tinue the interaction by either (1) ignoring the sug-
gestions made by the system and proposing a differ-
ent question, or by (2) selecting one of the proposed
questions and examining its answer.
Figure 1 illustrates the architecture of FERRET. The interactions are managed by a dialogue shell, which processes questions by transforming them into their corresponding predicate-argument structures². The data collection used in our experiments was made available by the Center for Non-Proliferation Studies (CNS)³.

² We have employed the same representation of predicate-argument structures as those encoded in PropBank. We use a semantic parser (described in (Surdeanu et al., 2003)) that recognizes predicate-argument structures.

(Q1) What weapons are included in Egypt’s stockpiles?
(A1) The Israelis point to comments made by former President Anwar Sadat, who in 1970 stated that Egypt has biological weapons stored in refrigerators ready to use against Israel if need be. The program might include "plague, botulism toxin, encephalitis virus, anthrax, Rift Valley fever and mycotoxicosis."
(Q2) Where did Egypt inherit its first stockpiles of chemical weapons?
(Q3) Is there evidence that Egypt has dismantled its stockpiles of weapons?
(Q4) Where are Egypt’s weapons stockpiles located?
(Q5) Who oversees Egypt’s weapons stockpiles?
Table 1: User question and proposed questions from QUABs
Modules from FERRET’s dialogue shell interact with modules from the predictive dialogue block. Central to the predictive dialogue is the topic representation for each scenario, which enables the population of a Predictive Dialogue Network (PDN). The PDN consists of a large set of questions that were asked or predicted for each topic. It is a network because questions are related by “similarity” links, which are computed by the Question Similarity module. The topic representation enables an Information Extraction module based on (Surdeanu and Harabagiu, 2002) to find topic-relevant information in the document collection and to use it as answers for the QUABs. The questions associated with each predicted answer are generated from patterns that are related to the extraction patterns used for identifying topic-relevant information. The quality of the dialogue between the user and FERRET depends on the quality of the topic representations and the coverage of the QUABs.
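As a rough illustration of how a Predictive Dialogue Network might be organized, the sketch below links QUAB questions whose similarity exceeds a threshold, finds the stored question closest to the user's question, and proposes it together with its neighbours. The Jaccard word-overlap measure, the threshold value, and every function name here are assumptions made for illustration; the text does not specify how the Question Similarity module computes its links.

```python
from itertools import combinations


def tokens(question):
    """Lower-case a question and split it into a set of word tokens."""
    return {w.strip("?.,'\"").lower() for w in question.split()} - {""}


def jaccard(q1, q2):
    """Word-overlap similarity in [0, 1]; a stand-in for the real metric."""
    a, b = tokens(q1), tokens(q2)
    return len(a & b) / len(a | b) if a | b else 0.0


def build_pdn(questions, threshold=0.3):
    """Return an adjacency map linking questions with similarity >= threshold."""
    links = {q: set() for q in questions}
    for q1, q2 in combinations(questions, 2):
        if jaccard(q1, q2) >= threshold:
            links[q1].add(q2)
            links[q2].add(q1)
    return links


def suggest(user_question, links, k=3):
    """Propose the closest stored question, then its linked neighbours."""
    best = max(links, key=lambda q: jaccard(user_question, q))
    neighbours = sorted(
        links[best], key=lambda q: jaccard(user_question, q), reverse=True
    )
    return ([best] + neighbours)[:k]


# Hypothetical QUAB questions, three of them taken from Table 1.
quab_questions = [
    "Where are Egypt's weapons stockpiles located?",
    "Who oversees Egypt's weapons stockpiles?",
    "Is there evidence that Egypt has dismantled its stockpiles of weapons?",
    "What is the population growth rate?",
]
pdn = build_pdn(quab_questions)
suggestions = suggest("What weapons are included in Egypt's stockpiles?", pdn, k=2)
```

A word-overlap measure is only a placeholder; any similarity function with the same signature could be swapped in without changing the network construction.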
³ The Center for Non-Proliferation Studies at the Monterey Institute of International Studies distributes collections of print and online documents on weapons of mass destruction. More information at: http://cns.miis.edu.
Serving as a background to the scenarios, the following list contains subject areas that may be relevant
to the scenarios under examination, and it is provided to assist the analyst in generating questions.
1) Country Profile
2) Government: Type of, Leadership, Relations
3) Military Operations: Army, Navy, Air Force, Leaders, Capabilities, Intentions
4) Allies/Partners: Coalition Forces
5) Weapons: Chemical, Biological, Materials, Stockpiles, Facilities, Access, Research Efforts, Scientists
6) Citizens: Population, Growth Rate, Education
7) Industrial: Major Industries, Exports, Power Sources
8) Economics: Gross Domestic Product, Growth Rate, Imports
9) Threat Perception: Border and Surrounding States, International, Terrorist Groups
10) Behaviour: Threats, Invasions, Sponsorship and Harboring of Bad Actors
11) Transportation Infrastructure: Kilometers of Road, Rail, Air Runways, Harbors and Ports, Rivers
12) Beliefs: Ideology, Goals, Intentions
14) Behaviour: Threats to use WMDs, Actual Usage, Sophistication of Attack, Anecdotal or Simultaneous
15) Weapons: Chemical, Biological, Materials, Stockpiles, Facilities, Access
SCENARIO: Assessment of Egypt’s Biological Weapons
As terrorist Activity in Egypt increases, the Commander
of the United States Army believes a better understanding
of Egypt’s Military capabilities is needed. Egypt’s
biological weapons database needs to be updated to
correspond with the Commander’s request. Focus your
investigation on Egypt’s access to old technology,
assistance received from the Soviet Union for development
of their pharmaceutical infrastructure, production of
toxins and BW agents, stockpiles, exportation of these
materials and development technology to Middle Eastern
countries, and the effect that this information will have on
the United States and Coalition Forces in the Middle East.
Please incorporate any other related information to
Figure 2: Example of a Dialogue Scenario.
3 Modeling the Dialogue Topic
Our experiments in interactive Q/A were based on several scenarios that were presented to us as part of the ARDA Metrics Challenge Dialogue Workshop. Figure 2 illustrates one of these scenarios. Note that the general background consists of a list of subject areas, whereas the scenario is a narration in which several sub-topics are identified (e.g. production of toxins or exportation of materials). The creation of scenarios for interactive Q/A requires several different types of domain-specific knowledge and a level of operational expertise not available to most system developers. In addition to identifying a particular domain of interest, scenarios must specify the set of relevant actors, outcomes, and related topics that are expected to operate within the domain of interest, the salient associations that may exist between entities and events in the scenario, and the specific timeframe and location that bound the scenario in space and time. Real-world scenarios also need to identify certain operational parameters, such as the identity of the scenario’s sponsor (i.e. the organization sponsoring the research) and audience (i.e. the organization receiving the information), as well as a series of evidence conditions which specify how much verification information must be subject to before it can be accepted as fact. We assume the set of sub-topics mentioned in the general background and the scenario can be used together to define a topic structure that will govern future interactions with the Q/A system. In order to model this structure, the topic representation that we create considers separate topic signatures for each sub-topic.
The notion of topic signatures was first introduced in (Lin and Hovy, 2000). For each sub-topic in a scenario, given (a) documents relevant to the sub-topic and (b) documents not relevant to the sub-topic, a statistical method based on the likelihood ratio is used to discover a weighted list of the most topic-specific concepts, known as the topic signature. Later work by (Harabagiu, 2004) demonstrated that topic signatures can be further enhanced by discovering the most relevant relations that exist between pairs of concepts. However, both of these types of topic representations are limited by the fact that they require the identification of topic-relevant documents prior to the discovery of the topic signatures. In our experiments, we were only presented with a set of documents relevant to a particular scenario; no further relevance information was provided for individual subject areas or sub-topics.
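To make the likelihood-ratio idea concrete, the following sketch weights each term by Dunning's log-likelihood ratio statistic (-2 log λ), computed from its counts in relevant and non-relevant text, and keeps the highest-scoring terms as a signature. The whitespace tokenization, the function names, and the toy documents are assumptions for illustration only; this is not the implementation used by the cited systems.

```python
import math
from collections import Counter


def log_likelihood_ratio(k1, n1, k2, n2):
    """Dunning's -2 log(lambda) for a term seen k1 times among n1 relevant
    tokens and k2 times among n2 non-relevant tokens."""
    def ll(k, n, p):
        # Binomial log-likelihood; zero-count terms contribute nothing.
        total = 0.0
        if k > 0:
            total += k * math.log(p)
        if n - k > 0:
            total += (n - k) * math.log(1 - p)
        return total

    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (ll(k1, n1, p1) + ll(k2, n2, p2) - ll(k1, n1, p) - ll(k2, n2, p))


def topic_signature(relevant_docs, other_docs, top_n=5):
    """Return the top_n (term, weight) pairs, most topic-specific first."""
    rel = Counter(w for d in relevant_docs for w in d.lower().split())
    non = Counter(w for d in other_docs for w in d.lower().split())
    n1, n2 = sum(rel.values()), sum(non.values())
    scored = []
    for term, k1 in rel.items():
        k2 = non[term]
        if k1 / n1 > k2 / n2:  # keep terms over-represented in relevant text
            scored.append((term, log_likelihood_ratio(k1, n1, k2, n2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy documents, invented for this example.
relevant = ["egypt produce toxins", "egypt produce mycotoxins", "facility produce toxins"]
others = ["egypt economy growth", "egypt exports cotton", "population growth rate"]
signature = topic_signature(relevant, others)
```

In this toy collection "egypt" is equally frequent on both sides, so it is excluded, while "produce" receives the largest weight.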
In order to solve the problem of finding relevant documents for each subtopic, we considered four approaches:
Approach 1: All documents in the CNS col-
lection were initially clustered using K-Nearest
Neighbor (KNN) clustering (Dudani, 1976).
Each cluster that contained at least one key-
word that described the sub-topic was deemed
relevant to the topic.
Approach 2: Since individual documents may
contain discourse segments pertaining to differ-
ent sub-topics, we first used TextTiling (Hearst,
1994) to automatically segment all of the doc-
uments in the CNS collection into individual
text tiles. These individual discourse segments
then served as input to the KNN clustering al-
gorithm described in Approach 1.
Approach 3: In this approach, relevant documents were discovered simultaneously with the discovery of topic signatures. First, we associated a binary seed relation $r_i$ with each subtopic $S_i$. (Seed relations were created both by hand and using the method presented in (Harabagiu, 2004).) Since seed relations are by definition relevant to a particular subtopic, they can be used to determine a binary partition of the document collection $C$ into (1) a relevant set of documents $R_i$ (that is, the documents relevant to relation $r_i$) and (2) a set of non-relevant documents $C - R_i$. Inspired by the method presented in (Yangarber et al., 2000), a topic signature (as calculated by (Harabagiu, 2004)) is then produced for the set of documents in $R_i$. For each subtopic $S_i$ defined as part of the dialogue scenario, documents relevant to a corresponding seed relation $r_i$ are added to $R_i$ if the relation $r_i$ meets the density criterion (as defined in (Yangarber et al., 2000)). If $H$ represents the set of documents where $r_i$ is recognized, then the density criterion can be defined as $\frac{|H \cap R_i|}{|H|} \gg \frac{|R_i|}{|C|}$. Once $H$ is added to $R_i$, then a new topic signature is calculated for $R_i$. Relations extracted from the new topic signature can then be used to determine a new document partition by re-iterating the discovery of the topic signature and of the documents relevant to each subtopic.
Approach 4: Approach 4 implements the technique described in Approach 3, but operates at the level of discourse segments (or text tiles) rather than at the level of full documents. As with Approach 2, segments were produced using the TextTiling algorithm.
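The bootstrapping idea behind Approaches 3 and 4 can be sketched as a loop that starts from a seed relation, accepts the candidate relation that is most concentrated in the current relevant set, and adds the documents (or segments) containing it. This is a simplified sketch under stated assumptions: documents are represented as sets of relation strings, the density test compares a relation's concentration in the relevant set against the relevant set's share of the collection, and the `factor` parameter is a hypothetical stand-in for "much greater than". It does not recompute a full topic signature on each pass.

```python
def bootstrap_relevant_docs(docs, seed, factor=1.5, max_iters=10):
    """docs maps a document id to the set of relations found in it.
    Returns (relevant_ids, signature) grown from a single seed relation."""
    signature = [seed]
    relevant = {d for d, rels in docs.items() if seed in rels}
    for _ in range(max_iters):
        best, best_density, best_hits = None, 0.0, set()
        candidates = set().union(*(docs[d] for d in relevant)) - set(signature)
        baseline = len(relevant) / len(docs)
        for rel in sorted(candidates):
            hits = {d for d, rels in docs.items() if rel in rels}
            density = len(hits & relevant) / len(hits)
            # Assumed density test: the relation must be clearly more
            # concentrated in the relevant set than the collection average.
            if density > factor * baseline and density > best_density:
                best, best_density, best_hits = rel, density, hits
        if best is None:
            break  # no remaining relation passes the test
        signature.append(best)
        relevant |= best_hits
    return relevant, signature


# Toy collection, invented for this example.
docs = {
    "d1": {"produce-TOXIN", "cultivate-TOXIN"},
    "d2": {"produce-TOXIN", "cultivate-TOXIN", "acquire-FACILITY"},
    "d3": {"cultivate-TOXIN", "acquire-FACILITY"},
    "d4": {"export-COTTON", "acquire-FACILITY"},
    "d5": {"export-COTTON"},
    "d6": {"train-PERSON"},
}
relevant, signature = bootstrap_relevant_docs(docs, "produce-TOXIN")
```

Here the seed selects d1 and d2, "cultivate-TOXIN" passes the test and pulls in d3, and "acquire-FACILITY" is rejected because it is spread too widely across the collection.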
In modeling the dialogue scenarios, we considered three types of topic-relevant relations: (1) structural relations, which represent hypernymy or meronymy relations between topic-relevant concepts, (2) definition relations, which uncover the characteristic properties of a concept, and (3) extraction relations, which model the most relevant events or states associated with a sub-topic. Although structural relations and definition relations are discovered reliably using patterns available from our Q/A system (Harabagiu et al., 2003), we found only extraction relations to be useful in determining the set of documents relevant to a subtopic. Structural relations were available from concept ontologies implemented in the Q/A system. The definition relations were identified by patterns used for processing definition questions.
Extraction relations are discovered by processing
documents in order to identify three types of
relations: (1) syntactic attachment relations
(including subject-verb, object-verb, and verb-PP
relations), (2) predicate-argument relations, and (3)
salience-based relations that can be used to encode
long-distance dependencies between topic-relevant
concepts. (Salience-based relations are discovered
using a technique first reported in (Harabagiu, 2004)
which approximates a Centering Theory-style ap-
proach (Kameyama, 1997) to the resolution of
Subtopic: Egypt’s production of toxins and BW agents
produce − phosphorus trichloride (TOXIN)
house − ORGANIZATION
cultivate − non−pathogenic Bacillus subtilis (TOXIN)
produce − mycotoxins (TOXIN)
acquire − FACILITY
Subtopic: Egypt’s allies and partners
provide − COUNTRY
cultivate − COUNTRY
supply − precursors
cooperate − COUNTRY
train − PERSON
supply − know−how
Figure 3: Example of two topic signatures acquired
for the scenario illustrated in Figure 2.
We made the extraction relations associated with each topic signature more general (a) by replacing words with their (morphological) root form (e.g. wounded with wound, weapons with weapon), (b) by replacing lexemes with their subsuming category from an ontology of 100,000 words (e.g. truck is replaced by VEHICLE, ARTIFACT, or OBJECT), and (c) by replacing each name with its name class (Egypt with COUNTRY). Figure 3 illustrates the topic signatures resulting for the scenario illustrated in Figure 2.
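The three generalization steps can be illustrated with a small lookup-based sketch. The dictionaries below are toy stand-ins invented for this example; they are not the 100,000-word ontology, the morphological analyzer, or the name recognizer used by the system, and the function names are hypothetical.

```python
# Toy lookup tables; every entry is an illustrative assumption.
ROOT_FORMS = {"wounded": "wound", "weapons": "weapon", "produces": "produce"}
ONTOLOGY = {"truck": "VEHICLE", "anthrax": "TOXIN"}
NAME_CLASSES = {"Egypt": "COUNTRY", "Anwar Sadat": "PERSON"}


def generalize_term(term):
    """Apply name-class, root-form, and ontology replacement to one term."""
    if term in NAME_CLASSES:  # (c) names become their name class
        return NAME_CLASSES[term]
    root = ROOT_FORMS.get(term.lower(), term.lower())  # (a) root form
    return ONTOLOGY.get(root, root)  # (b) subsuming category, if known


def generalize_relation(relation):
    """Generalize every element of a relation given as a tuple of terms."""
    return tuple(generalize_term(t) for t in relation)


generalized = [
    generalize_relation(("produces", "anthrax")),
    generalize_relation(("supply", "Egypt")),
    generalize_relation(("drive", "truck")),
]
```

Names are checked first so that a proper noun is never lower-cased and looked up as an ordinary word.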
Once extraction relations were obtained for a particular set of documents, the resulting set of relations was ranked according to a method proposed in (Yangarber, 2003). Under this approach, the score associated with each relation $r$ is given by: $Score(r) = \frac{Sup(r)}{|H|} \cdot \log Sup(r)$, where $|H|$ represents the cardinality of the set of documents where the relation is identified, and $Sup(r)$ represents the support associated with the relation $r$. $Sup(r)$ is defined as the sum of the relevance of each document in $H$. The relevance of a document that contains a topic-significant relation can be defined as: $Rel(d) = 1 - \prod_{r \in TS} (1 - Prec(r))$, where $TS$ represents the topic signature of the subtopic⁴. The accuracy of the relation, then, is given by: $Prec(r) = \frac{1}{|H|} \sum_{d \in H} \left( Rel_{S_i}(d) - \sum_{j \neq i} Rel_{S_j}(d) \right)$. Here, $Rel_{S_i}(d)$ measures the relevance of a subtopic $S_i$ to a particular document $d$, while $Rel_{S_j}(d)$ measures the relevance of $d$ to another subtopic $S_j$.

We use a different learner for each subtopic in order to train simultaneously on each iteration. (The calculation of topic signatures continues to iterate until there are no more relations that can be added to the overall topic signature.) When the precision of a relation to a subtopic $S_i$ is computed, it takes into account the negative evidence of its relevance to any other subtopic $S_j$. If $Prec(r) \le 0$, the relation is not included in the topic signature, where relations are ranked by the score $Score(r)$.
Representing topics in terms of relevant concepts and relations is important for the processing of questions asked within the context of a given topic. For interactive Q/A, however, the ideal topic-structured representation would be in the form of question-answer pairs (QUABs) that model the individual segments of the scenario. We have created two sets of QUABs: a handcrafted set and an automatically-generated set. For the manually-created set of QUABs, 4 linguists manually generated 3210 question-answer pairs for each of the 8 dialogue scenarios considered in our experiments.

In a separate effort, we devised a process for automatically populating the QUAB for each scenario. In order to generate question-answer pairs for each subtopic, we first identified relevant text passages in the document collection to serve as “answers” and then generated individual questions that could be answered by each answer passage.

⁴ Initially, the topic signature contains only the seed relation. Additional relations can be added with each iteration.
Answer Identification: We defined an answer passage as a contiguous sequence of sentences with a positive answer rank and a passage price of $< 4$. To select answer passages for each subtopic $S_i$, we calculate an answer rank, $rank(a) = \sum_{r \in a} Score(r)$, that sums across the scores of each relation from the topic signature that is identified in the same text window. Initially, the text window is set to one sentence. (If the sentence is part of a quote, however, the text window is immediately expanded to encompass the entire sentence that contains the quote.) Each passage with $rank(a) > 0$ is then considered to be a candidate answer passage. The text window of each candidate answer passage is then expanded to include the following sentence. If the answer rank does not increase with the addition of the succeeding sentence, then the price ($p$) of the candidate answer passage is incremented by 1, otherwise it is decremented by 1. The text window of each candidate answer passage continues to expand until $p = 4$. Before the ranked list of candidate answers can be considered by the Question Generation module, answer passages with a positive price are stripped of the last sentences.

Answer: In the early 1970s, Egyptian President Anwar Sadat validates that Egypt has a BW stockpile.
Entities:
E1: "in the early 1970s"; Category: TIME
E2: "Egyptian President Anwar Sadat"; Category: PERSON
E3: "Egypt"; Category: COUNTRY
E4: "BW stockpile"; Category: UNKNOWN
2 predicates: P1="validate"; P2="has"
Predicate-argument structures:
arguments: A0 = E2: Answer Type: Definition
A1 = P2: have
arguments: A0 = E3
A1 = E4
ArgM-TMP: E1: Answer Type: Time
References:
Reference 1 (definitional)
Egyptian President X
Reference 2 (metonymic)
Reference 3 (part-whole)
E5: BW program
Reference 4 (relational)
Generated questions:
Definition Pattern: Who is X?
Q1: Who is Anwar Sadat?
Pattern: When did E3 P1 to P2 E4?
Q2: When did Egypt validate to having BW stockpiles?
Pattern: When did E3 P3 to P2 E4?
Q3: When did Egypt admit to having BW stockpiles?
Pattern: When did E3 P3 to P2 E5?
Q4: When did Egypt admit to having a BW program?
Figure 4: Associating Questions with Answers.
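The window-expansion procedure described under Answer Identification can be sketched as follows. Several details are assumptions made for illustration: relations are matched as plain substrings, expansion stops when the price reaches a hypothetical limit `max_price` (or the text runs out), and a passage with a positive final price is trimmed back to the last sentence that raised its rank. The handling of quoted sentences is omitted.

```python
def answer_rank(sentences, relation_scores):
    """Sum the scores of every signature relation found in the window."""
    text = " ".join(sentences)
    return sum(score for rel, score in relation_scores.items() if rel in text)


def select_passage(sentences, start, relation_scores, max_price=4):
    """Grow a window from sentences[start]; return the trimmed passage or None."""
    end = start + 1
    rank = answer_rank(sentences[start:end], relation_scores)
    if rank <= 0:
        return None  # not a candidate answer passage
    price = 0
    last_gain_end = end  # window end after the last rank increase
    while price < max_price and end < len(sentences):
        new_rank = answer_rank(sentences[start:end + 1], relation_scores)
        end += 1
        if new_rank > rank:
            price -= 1
            last_gain_end = end
        else:
            price += 1
        rank = new_rank
    if price > 0:
        # Assumed trimming rule: drop trailing sentences that added no rank.
        end = last_gain_end
    return sentences[start:end]


# Toy sentences and relation scores, invented for this example.
sentences = [
    "Egypt began to produce mycotoxins in the 1960s.",
    "It later moved to acquire a new facility.",
    "The weather was mild that year.",
    "Tourism grew steadily.",
]
scores = {"produce mycotoxins": 2.0, "acquire a new facility": 1.5}
passage = select_passage(sentences, 0, scores)
```

In this example the second sentence raises the rank and is kept, while the two trailing sentences add nothing and are stripped.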
Question Generation: In order to automatically generate questions from answer passages, we considered the following two problems:
Problem 1: Every word in an answer passage can refer to an entity, a relation, or an event. In order for question generation to be successful, we must determine whether a particular reference