Proceedings of the 43rd Annual Meeting of the ACL, pages 205–214,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
Experiments with Interactive Question-Answering
Sanda Harabagiu, Andrew Hickl, John Lehmann, and Dan Moldovan
Language Computer Corporation
Richardson, Texas USA
Abstract

This paper describes a novel framework for interactive question-answering (Q/A) based on predictive questioning. Generated off-line from topic representations of complex scenarios, predictive questions represent requests for information that capture the most salient (and diverse) aspects of a topic. We present experimental results from large user studies (featuring a fully-implemented interactive Q/A system named FERRET) which demonstrate that surprising performance is achieved by integrating predictive questions into the context of a Q/A dialogue.
1 Introduction

In this paper, we propose a new architecture for interactive question-answering based on predictive questioning. We present experimental results from a fully-implemented interactive Q/A system, named FERRET, which demonstrate that surprising performance is achieved by integrating sources of topic information into the context of a Q/A dialogue.
In interactive Q/A, professional users engage in extended dialogues with automatic Q/A systems in order to obtain information relevant to a complex scenario. Unlike Q/A in isolation, where the performance of a system is evaluated in terms of how well answers returned by a system meet the specific information requirements of a single question, the performance of interactive Q/A systems has traditionally been evaluated by analyzing aspects of the dialogue as a whole. Q/A dialogues have been evaluated in terms of (1) efficiency, defined as the number of questions that the user must pose to find particular information, (2) effectiveness, defined by the relevance of the answers returned, and (3) user satisfaction.
In order to maximize performance in these three
areas, interactive Q/A systems need a predictive di-
alogue architecture that enables them to propose re-
lated questions about the relevant information that
could be returned to a user, given a domain of inter-
est. We argue that interactive Q/A systems depend
on three factors: (1) the effective representation of
the topic of a dialogue, (2) the dynamic recognition
of the structure of the dialogue, and (3) the ability to
return relevant answers to a particular question.
In this paper, we describe results from experiments we conducted with our own interactive Q/A system, FERRET, under the auspices of the ARDA AQUAINT¹ program, involving 8 different dialogue scenarios and more than 30 users. The results presented here illustrate the role of predictive questioning in enhancing the performance of Q/A interactions.
In the remainder of this paper, we describe a new architecture for interactive Q/A. Section 2 presents the functionality of several of FERRET's modules and describes the NLP techniques it relies upon. In Section 3, we present one of the dialogue scenarios and the topic representations we have employed. Section 4 highlights the management of the interaction between the user and FERRET, while Section 5 presents the results of evaluating our proposed model, and Section 6 summarizes the conclusions.

¹ AQUAINT is an acronym for Advanced QUestion Answering for INTelligence.

[Figure 1 diagram: blocks labeled "Online Question Answering" and "Off-line Question Answering"]
Figure 1: FERRET - A Predictive Interactive Question-Answering Architecture.
2 Interactive Question-Answering
We have found that the quality of interactions pro-
duced by an interactive Q/A system can be greatly
enhanced by predicting the range of questions that
a user might ask in the context of a given topic.
If a large database of topic-relevant questions were
available for a wide variety of topics, the accuracy
of a state-of-the-art Q/A system such as (Harabagiu
et al., 2003) could be enhanced.
In FERRET, our interactive Q/A system, we store
such “predicted” pairs of questions and answers in a
database known as the Question Answer Database
(or QUAB). FERRET uses this large set of topic-
relevant question-and-answer pairs to improve the
interaction with the user by suggesting new ques-
tions. For example, when a user asks a question
like (Q1) (as illustrated in Table 1), FERRET returns
an answer to the question (A1) and proposes (Q2),
(Q3), and (Q4) as suggestions of possible continua-
tions of the dialogue. Users then choose how to con-
tinue the interaction by either (1) ignoring the sug-
gestions made by the system and proposing a differ-
ent question, or by (2) selecting one of the proposed
questions and examining its answer.
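The interaction loop above can be sketched as a nearest-neighbour lookup over stored question-answer pairs. The following Python sketch assumes a toy QUAB built from the Table 1 questions and a crude word-overlap score; neither is FERRET's actual data structure or similarity measure.

```python
import re
from collections import Counter

# Hypothetical QUAB contents for the "Egypt's stockpiles" topic; the
# answer strings are shortened placeholders, not the collection's text.
QUAB = [
    ("Where did Egypt inherit its first stockpiles of chemical weapons?",
     "answer text"),
    ("Is there evidence that Egypt has dismantled its stockpiles of weapons?",
     "answer text"),
    ("Where are Egypt's weapons stockpiles located?",
     "answer text"),
    ("Who oversees Egypt's weapons stockpiles?",
     "answer text"),
]

def tokens(text: str) -> Counter:
    """Bag of lower-cased word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def overlap(q1: str, q2: str) -> float:
    """Crude word-overlap score standing in for the similarity module."""
    w1, w2 = tokens(q1), tokens(q2)
    shared = sum((w1 & w2).values())
    return shared / max(sum(w1.values()), sum(w2.values()))

def suggest(user_question: str, k: int = 3) -> list[str]:
    """Propose the k stored QUAB questions most similar to the user's."""
    ranked = sorted(QUAB, key=lambda qa: overlap(user_question, qa[0]),
                    reverse=True)
    return [question for question, _ in ranked[:k]]
```

With (Q1) as input, this sketch ranks (Q4) and (Q5) above the other stored questions, loosely mirroring the suggestions in Table 1.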
Figure 1 illustrates the architecture of FERRET. The interactions are managed by a dialogue shell, which processes questions by transforming them into their corresponding predicate-argument structures.²

² We have employed the same representation of predicate-argument structures as those encoded in PropBank. We use a semantic parser (described in (Surdeanu et al., 2003)) that recognizes predicate-argument structures.
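For concreteness, a PropBank-style predicate-argument structure can be modeled as a predicate plus a map from role labels to argument spans. The class and the role assignments below are illustrative assumptions; they are not the output of the semantic parser cited above.

```python
from dataclasses import dataclass, field

@dataclass
class PredicateArgumentStructure:
    """PropBank-style frame: a predicate plus its labeled arguments."""
    predicate: str
    args: dict[str, str] = field(default_factory=dict)

# Hypothetical analysis of question (Q1) from Table 1; the role labels
# are assumed for illustration only.
q1_structure = PredicateArgumentStructure(
    predicate="include",
    args={
        "ARG1": "what weapons",           # the thing included
        "ARG2": "in Egypt's stockpiles",  # the containing group
    },
)
```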
(Q1) What weapons are included in Egypt’s stockpiles?
(A1) The Israelis point to comments made by former President Anwar Sadat,
who in 1970 stated that Egypt has biological weapons stored in
refrigerators ready to use against Israel if need be. The program might
include ”plague, botulism toxin, encephalitis virus, anthrax,
Rift Valley fever and mycotoxicosis.”
(Q2) Where did Egypt inherit its first stockpiles of chemical weapons?
(Q3) Is there evidence that Egypt has dismantled its stockpiles of weapons?
(Q4) Where are Egypt’s weapons stockpiles located?
(Q5) Who oversees Egypt’s weapons stockpiles?
Table 1: User question and proposed questions from QUABs
The data collection used in our experiments was made available by the Center for Non-Proliferation Studies.³ Modules from FERRET's dialogue shell interact with modules from the predictive dialogue block.
Central to the predictive dialogue is the topic repre-
sentation for each scenario, which enables the pop-
ulation of a Predictive Dialogue Network (PDN).
The PDN consists of a large set of questions that
were asked or predicted for each topic. It is a net-
work because questions are related by “similarity”
links, which are computed by the Question Simi-
larity module. The topic representation enables an
Information Extraction module based on (Surdeanu
and Harabagiu, 2002) to find topic-relevant infor-
mation in the document collection and to use it as
answers for the QUABs. The questions associated
with each predicted answer are generated from pat-
terns that are related to the extraction patterns used
for identifying topic-relevant information. The quality of the dialogue between the user and FERRET depends on the quality of the topic representations and the coverage of the QUABs.
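As a rough illustration of how the PDN's similarity links might be computed, the sketch below scores question pairs with TF-IDF weighted cosine similarity and links those above a threshold; the weighting scheme and threshold are assumptions, not the paper's Question Similarity module.

```python
import math
import re
from collections import Counter

def tfidf_vectors(questions: list[str]) -> list[dict[str, float]]:
    """Weight each question's words by TF-IDF over the question set."""
    docs = [Counter(re.findall(r"[a-z']+", q.lower())) for q in questions]
    df = Counter(word for doc in docs for word in doc)
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in doc.items()}
            for doc in docs]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def similarity_links(questions: list[str],
                     threshold: float = 0.1) -> list[tuple[int, int]]:
    """Link every pair of questions whose similarity clears the threshold."""
    vecs = tfidf_vectors(questions)
    return [(i, j)
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
            if cosine(vecs[i], vecs[j]) > threshold]
```

Questions that share topic-bearing terms (e.g. the stockpile questions of Table 1) end up linked, while unrelated questions do not.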
³ The Center for Non-Proliferation Studies at the Monterey Institute of International Studies distributes collections of print and online documents on weapons of mass destruction. More information at: http://cns.miis.edu.
[Table 6: columns are "% of Top 5 Responses Relevant to User Q" and "% of Top 1 Responses Relevant to User Q"; cell values not recoverable]
Table 6: Effectiveness of dialogs
generated pairs were submitted to human assessors, who annotated each as "relevant" or "irrelevant" to the user's query. Aggregate scores are presented in Table 7.

Approach     % of Top 5 Relevant to User Q    [second column header not recoverable]
Approach 1   40.01%                           0.295
Approach 2   36.00%                           0.243
Approach 3   44.62%                           0.271
Table 7: Quality of QUABs acquired automatically
User Satisfaction Users were consistently satis-
fied with their interactions with FERRET. In all three
experiments, respondents claimed that they found
that FERRET (1) gave meaningful answers, (2) pro-
vided useful suggestions, (3) helped answer spe-
cific questions, and (4) promoted their general un-
derstanding of the issues considered in the scenario.
Complete results of this study are presented in Table 8.
Helped with specific questions
Make good use of questions
Gave new scenario insights
Gave good collection coverage
Stimulated user thinking
Easy to use
Gave meaningful answers
Helped with new search methods
Provided novel suggestions
Is ready for work environment
Would speed up work
Overall like of system
Table 8: User Satisfaction Survey Results (items rated from 1, "does not describe the system", to 5, "completely describes the system"; per-item scores not recoverable)
We believe that the quality of Q/A interactions de-
pends on the modeling of scenario topics. An ideal
model is provided by question-answer databases
(QUABs) that are created off-line and then used to make suggestions to a user of potentially relevant continuations of a discourse. In this paper, we have
presented FERRET, an interactive Q/A system which
makes use of a novel Q/A architecture that integrates
QUAB question-answer pairs into the processing of
questions. Experiments with FERRET have shown that QUAB pairs are rapidly adopted by users as valid suggestions, and that the incorporation of QUABs into Q/A can greatly improve the overall accuracy of an interactive Q/A dialogue.
S. Dudani. 1976. The distance-weighted k-nearest-neighbour
rule. IEEE Transactions on Systems, Man, and Cybernetics,
S. Harabagiu, D. Moldovan, C. Clark, M. Bowden, J. Williams,
and J. Bensley. 2003. Answer Mining by Combining Ex-
traction Techniques with Abductive Reasoning. In Proceed-
ings of the Twelfth Text Retrieval Conference (TREC 2003).
Sanda Harabagiu. 2004. Incremental Topic Representations. In Proceedings of the 20th COLING Conference, Geneva, Switzerland.
Marti Hearst. 1994. Multi-Paragraph Segmentation of Exposi-
tory Text. In Proceedings of the 32nd Meeting of the Associ-
ation for Computational Linguistics, pages 9–16.
Megumi Kameyama. 1997. Recognizing Referential Links: An
Information Extraction Perspective. In Workshop of Opera-
tional Factors in Practical, Robust Anaphora Resolution for
Unrestricted Texts, (ACL-97/EACL-97), pages 46–53.
Chin-Yew Lin and Eduard Hovy. 2000. The Automated Acqui-
sition of Topic Signatures for Text Summarization. In Pro-
ceedings of the 18th COLING Conference, pages 495–501.
S. Lytinen and N. Tomuro. 2002. The Use of Question Types
to Match Questions in FAQFinder. In Papers from the 2002
AAAI Spring Symposium on Mining Answers from Texts and
Knowledge Bases, pages 46–53.
Srini Narayanan and Sanda Harabagiu. 2004. Question An-
swering Based on Semantic Structures. In Proceedings of
the 20th COLING Conference, Geneva, Switzerland.
Mihai Surdeanu and Sanda M. Harabagiu. 2002. Infrastructure for Open-Domain Information Extraction. In Conference for Human Language Technology (HLT-2002).
Mihai Surdeanu, Sanda M. Harabagiu, John Williams, and Paul
Aarseth. 2003. Using predicate-argument structures for in-
formation extraction. In ACL, pages 8–15.
Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja
Huttunen. 2000. Automatic Acquisition of Domain Knowl-
edge for Information Extraction. In Proceedings of the 18th
COLING Conference, pages 940–946.
Roman Yangarber. 2003. Counter-Training in Discovery of
Semantic Patterns. In Proceedings of the 41st Meeting of the
Association for Computational Linguistics, pages 343–350.