Conference Paper

TREC 2005 Question Answering Experiments at Tokyo Institute of Technology.


Abstract

In this paper we describe Tokyo Institute of Technology's speech group's first attempt at the TREC2005 question answering track, which placed us eleventh overall among the best systems of the 30 participants in the track. All our evaluation systems were based on novel, non-linguistic, data-driven approaches to question answering. Our main focus was on the factoid task and we describe in detail one of the new models used in this year's evaluation runs. The list task was treated as a simple extension of the factoid task while the other question task was treated as an automatic summarization problem by important sentence selection. Our best system on the factoid task gave 21.3% correct in first place; our best result on the list task was an average F-score of 0.069 and on the other question task a best average F-score of 0.138.


... In this paper we present our unified approach to question answering in different languages and describe our experiments on the Japanese language NTCIR-3 Question Answering Challenge (QAC-1) tasks 1 and 2. The model we use for Japanese language question answering (QA) is identical to the one we have applied successfully to English language QA on the TREC tasks [10]. Our QA system is based on a statistical, non-linguistic, data-driven approach to question answering, which uses the N-gram statistics from a large collection of example questions with corresponding answers (q-and-a) and large amounts of text data in which to find an answer. ...
... In Section 2 we give the highlights of our statistical classification approach to QA which is described more completely in [10]. In Section 3 we describe the experimental setup and present the results obtained on NTCIR-3 QAC-1 tasks 1 and 2. In Section 4 we discuss the results and conclude in Section 5. ...
... This is guaranteed to give us the optimal answer in a maximum likelihood sense if the probability distribution is the correct one. Making various conditional independence assumptions as described in [10] to simplify modelling, we obtain the final optimisation criterion: ...
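The snippet truncates the criterion itself. As a hedged sketch of what this family of models optimises (reconstructed from the surrounding description, not copied from the paper), the answer Â maximises the answer posterior, factored under independence assumptions into a retrieval model over the question's information-bearing words W and a filter model over question-type features X:

```latex
\hat{A} \;=\; \arg\max_{A}\, p(A \mid W, X)
        \;\approx\; \arg\max_{A}\, \underbrace{p(A \mid W)}_{\text{retrieval model}} \;\cdot\; \underbrace{p(A \mid X)}_{\text{filter model}}
```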
Article
Full-text available
We present our unified approach to question answering in different languages and describe our experiments on the Japanese language NTCIR-3 Question Answering Challenge (QAC-1) tasks 1 and 2. The model we use for Japanese language question answering (QA) is identical to the one we have applied successfully on the English language TREC QA tasks, based on a novel statistical, non-linguistic and data-driven approach to question answering. Using this method on the formal run of QAC-1 we obtain an MRR of 0.340 on task 1 and an average F-score of 0.159. The top-1 accuracy of 26.5% compares very well with results obtained using an identical approach on the TREC evaluations.
... This section is re-produced verbatim from the paper "TREC2005 Question Answering Experiments at Tokyo Institute of Technology" [3]. ...
... Note that the value of i is simply the base-10 number that represents the binary encoding of the active features in X_i. A linear interpolation of models, which borrows directly from statistical language modeling techniques for speech recognition, was found to give retrieval performance approximately twice that of a naive-Bayes or log-linear formulation. ...
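As a toy illustration of the two mechanics mentioned in this snippet, indexing a model by the binary encoding of its active features and linearly interpolating component models might look as follows in Python (a sketch; `model_index` and `interpolate` are hypothetical names, not the authors' code):

```python
# Illustrative sketch (not the authors' code): index a model by the
# base-10 value of the binary encoding of its active features, then
# combine component model scores by linear interpolation, as in
# statistical language modeling.

def model_index(active_features):
    """Map a tuple of feature bits, e.g. (1, 0, 1), to its base-10 index."""
    return int("".join(str(b) for b in active_features), 2)

def interpolate(scores, weights):
    """Linear interpolation of component model scores.

    `scores` and `weights` are parallel lists; the weights are assumed
    non-negative and summing to one.
    """
    return sum(w * s for w, s in zip(weights, scores))

# Example: with three binary features, the model with features
# (1, 0, 1) active is X_5.
assert model_index((1, 0, 1)) == 5
print(interpolate([0.2, 0.7, 0.1], [0.5, 0.3, 0.2]))
```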
Article
Full-text available
In this paper, we give an overview of the data-driven and non-linguistic approach to open-domain factoid question answering (QA) that has been developed over the past 4 years at Tokyo Institute of Technology and which culminated this year in our participation in three international evaluations of QA technology at TREC (for English), CLEF (for Spanish and French) and NTCIR (for Japanese). In TREC2005 we placed 11th out of a total of 30 participants in the factoid QA task and in TREC2006 we came 9th out of 27 participants with an accuracy of 25.1%. While our performance in the official CLEF QA tracks was poor due to a large number of unsupported answers, the performance on an informal "real-time" Spanish QA exercise was one of the best. At the time of writing no results have yet been released for Japanese.
... In this paper, we describe the application of our data-driven and non-linguistic framework, applied successfully in the TREC2005 Question Answering (QA) track [6], to the factoid QA task of TREC2006. For convenience we copy verbatim the exposition of our mathematical model for question answering in Section 2. ...
... This section is re-produced verbatim from the paper "TREC2005 Question Answering Experiments at Tokyo Institute of Technology" [6]. ...
Article
In this paper we describe Tokyo Institute of Technology's speech group's second attempt at the TREC2006 question answering (QA) track. Keeping the same theoretical QA model as for the TREC2005 task, this year we investigated combinations of variations of models, focusing once again on the factoid QA task. An experimental run combining translated answers from separate English, French and Spanish systems proved inconclusive. However, our best combination of all component models gave us a factoid performance of 25.1% (which was well above the median of participating systems of 18.6%) and an overall performance including the results from the list and other question tasks of 11.6% (which was somewhat below the median of 13.4%).
... The systems presented by the Tokyo Institute of Technology [Whittaker et al. 2005a; Whittaker et al. 2006] are in a sense the opposite of those from LCC. The idea is to build a purely statistical system without any linguistic knowledge. ...
Article
The objective of this work is to introduce new robust approaches to handling the problem of Question Answering in an open-domain, interactive setup. Our first contribution is the design and implementation of a generic rule-based engine for language analysis. That engine is open to any kind of analysis, within the limits of its internal representation, and relies on heavy structuring of the analysis. Our second contribution is the design and implementation of a Question-Answering system whose main strengths are the flexibility of its input, its robustness and its explicit performance control. These characteristics were achieved through end-to-end integration of the language analysis results, allowing the system to manipulate only the structures built by that analysis, without having to go back to the individual words. Another advance, and one of its main originalities, is an abstraction of the request, enabling its flexibility and making diagnosis and maintenance easier. We participated in a number of international evaluation campaigns where our system achieved excellent results; in particular, it showed good robustness to errors induced by automatic speech recognition. Our aim has been reached: the Question-Answering system has the capabilities necessary for integration into an interactive system. It is used in the Ritel project and has enabled preliminary experiments aimed at studying human behavior in front of such a system, and human-machine interaction in general.
... The approach to factoid question answering (QA) that we adopt was first described in (Whittaker et al., 2005b) where the details of the mathematical model and how it was trained for English were given. The approach has been successfully evaluated in the 2005 text retrieval conference (TREC) question answering track evaluations (Voorhees and Trang Dang, 2005) where our group placed eleventh out of thirty participants (Whittaker et al., 2005a). Although the TREC QA task is substantially different to web-based QA this evaluation showed that our approach works and provides an objective assessment of its quality. ...
... The framework itself was covered in detail and initial results were presented in [9]. Since then we have taken part in the TREC 2005 QA system evaluations organised by NIST where we placed 11th out of the best systems from the 30 participants in the task [7] with an answer accuracy of around 26% when supporting document accuracy is ignored. A significant part of the motivation for our approach to QA was that it could be ported to other languages with minimal effort. ...
Article
In this paper we report on the progress we have made in developing a unified framework for automatic factoid question answering (QA). Our approach to question answering has now been successfully applied to the five distinct languages of English, Japanese, Chinese, Swedish and Russian. These systems form the core of our multilingual web-based QA system which has been online since December 2005 and is publicly accessible at http://asked.jp. In this paper we discuss, in particular, the improvements we have made to the system in terms of answering response speed.
Article
Full-text available
In this paper we describe the web and mobile-phone interfaces to our multi-language factoid question answering (QA) system together with a prototype speech interface to our English-language QA system. Using a statistical, data-driven approach to factoid question answering has allowed us to develop QA systems in five languages in a matter of months. In the web-based system, which is accessible at http://asked.jp, we have combined the QA system output with standard search-engine-like results by integrating it with an open-source web search engine. The prototype speech interface is based around a VoiceXML application running on the Voxeo developer platform. Recognition of the user's question is performed on a separate speech recognition server dedicated to recognizing questions. An adapted version of the Sphinx-4 recognizer is used for this purpose. Once the question has been recognized correctly it is passed to the QA system and the resulting answers read back to the user by speech synthesis. Our approach is modular and makes extensive use of open-source software. Consequently, each component can be easily and independently improved and easily extended to other languages.
... and largely language independent QA framework for the QAst track, which was similar but not identical to that which we used in previous QA evaluations such as TREC 2006, CLEF 2006 and NTCIR 2006. This approach, which is detailed in [11,12,13], centers on a noisy-channel model of the QA problem and, generally speaking, relies on the redundancy of answer data in the target corpus in order to identify and extract correct answers. ...
Conference Paper
Full-text available
In this paper we present the experiments performed at Tokyo Institute of Technology for the CLEF2006 Multiple Language Question Answering (QA@CLEF) track. Our approach to QA centres on a non-linguistic, data-driven, statistical classification model that uses the redundancy of the web to find correct answers. For the cross-language aspect we employed publicly available web-based text translation tools to translate the question from the source into the corresponding target language, then used the corresponding mono-lingual QA system to find the answers. The hypothesised correct answers were then projected back on to the appropriate closed-domain corpus. Correct and supported answer performance on the mono-lingual tasks was around 14% for both Spanish and French. Performance on the cross-language tasks ranged from 5% for Spanish-English, to 12% for French-Spanish. Our method of projecting answers onto documents was shown not to work well: in the worst case on the French-English task we lost 84% of our otherwise correct answers. Ignoring the need for correct support information the exact answer accuracy increased to 29% and 21% correct on the Spanish and French mono-lingual tasks, respectively.
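A minimal sketch of the cross-language pipeline as described: translate the question, answer it mono-lingually, then project the hypothesised answers back onto the closed-domain corpus. All helper functions (`translate`, `monolingual_qa`, `corpus_search`) are hypothetical stand-ins for the web translation tools, the per-language QA systems, and the document index mentioned in the abstract:

```python
# A minimal sketch of the cross-language QA pipeline, assuming
# hypothetical helpers for translation, mono-lingual QA, and
# closed-domain document search.

def cross_language_qa(question, source_lang, target_lang,
                      translate, monolingual_qa, corpus_search):
    # 1. Translate the question into the target language.
    translated = translate(question, source_lang, target_lang)
    # 2. Answer it with the mono-lingual QA system for that language.
    answers = monolingual_qa(translated, lang=target_lang)
    # 3. Project each hypothesised answer back onto the closed-domain
    #    corpus: keep only answers for which a supporting document
    #    containing the answer string can be found.
    supported = []
    for answer in answers:
        docs = corpus_search(answer, lang=target_lang)
        if docs:
            supported.append((answer, docs[0]))
    return supported
```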
... As all the modules produce a ranking in one way or another, we use Zipf's Law to convert ranks into probabilities. A simple version of this idea is an arithmetic average of inverse ranks, which was proposed in (Whittaker et al., 2005). ...
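A small sketch of the inverse-rank averaging idea referenced here, assuming each module returns a best-first candidate list and that a candidate missing from a module's list contributes a score of zero:

```python
# Fuse several module rankings by averaging inverse ranks per
# candidate (a Zipf-like rank-to-score mapping).

from collections import defaultdict

def fuse_by_inverse_rank(rankings):
    """`rankings` is a list of ranked candidate lists, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / rank
    n = len(rankings)
    return sorted(((s / n, c) for c, s in scores.items()), reverse=True)

# "b" wins: it is ranked first by two modules and second by one.
print(fuse_by_inverse_rank([["a", "b", "c"], ["b", "a"], ["b", "c"]]))
```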
Conference Paper
Full-text available
We present our new statistically-inspired open-domain Q&A research system that allows us to carry out a wide range of experiments easily and flexibly by modifying a central file containing an experimental "recipe" that controls the activation and parameter selection of a range of widely-used and custom-built components. Based on this, we report our experiments for the TREC 2006 question answering track, where we used a cascade of LM-based document retrieval, LM-based sentence extraction, and MaxEnt-based answer extraction over a dependency relation representation, followed by a fusion process that uses linear interpolation to integrate evidence from various data streams to detect answers to factoid questions more accurately than the median of all participants.
... The training is stopped after at least 40 matches from different pages have been identified. Although attempts to formalize the estimation of pattern and candidate-answer accuracies within a probabilistic framework exist (Downey et al., 2005; Whittaker et al., 2005), the suggested models have not been empirically shown to be superior to simple heuristic models such as the one used here. ...
Article
Full-text available
The World Wide Web has become a vital supplier of information that allows organizations to carry on such tasks as business intelligence, security monitoring, and risk assessments. Having a quick and reliable supply of correct facts is often mission critical. By following design science guidelines, we have explored ways to recombine facts from multiple sources, each with possibly different levels of responsiveness and accuracy, into one robust supply chain. Inspired by prior research on keyword-based meta-search engines (e.g., metacrawler.com), we have adapted existing question answering algorithms for the task of analysis and triangulation of facts. We present a first prototype for a meta approach to fact seeking. Our meta engine sends a user's question to several fact seeking services that are publicly available on the Web (e.g., ask.com, brainboost.com, answerbus.com, NSIR, etc.) and analyzes the returned results jointly to identify and present to the user those that are most likely to be factually correct. The results of our evaluation on the standard test sets widely used in prior research support the evidence for the following: 1) the value added of the meta approach: its performance surpasses the performance of each supplier; 2) the importance of using fact seeking services as suppliers to the meta engine rather than keyword driven search portals; and 3) the resilience of the meta approach: eliminating a single service does not noticeably impact the overall performance. We show that these properties make the meta approach a more reliable supplier of facts than any of the currently available stand-alone services.
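As a toy illustration of the triangulation idea (not the authors' algorithm), answers from several services could be normalised and fused by counting how many independent services corroborate each fact:

```python
# Toy fact triangulation: prefer answers corroborated by more
# independent services; each service votes at most once per answer.

from collections import Counter

def triangulate(answers_per_service):
    votes = Counter()
    for answers in answers_per_service:
        for answer in {a.strip().lower() for a in answers}:
            votes[answer] += 1
    return votes.most_common()

print(triangulate([["Mount Everest", "K2"],
                   ["mount everest"],
                   ["Everest", "Mount Everest"]]))
```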
Article
Full-text available
Question Answering (QA) systems add new capabilities to traditional search engines: the ability to find precise answers to user questions. Their objective is to enable easier information access by reducing the time and effort that the user requires to find a concrete piece of information among a list of relevant documents. In this thesis we have carried out two works related to QA systems. The first part introduces an architecture for QA systems for Spanish based on the combination and adaptation of different techniques from Information Retrieval (IR) and Information Extraction (IE). This architecture is composed of three modules covering question analysis, relevant passage retrieval, and answer extraction and selection. The appropriate processing of Named Entities (NE) has received special attention because of their importance as question themes and candidate answers. The proposed architecture has been implemented as part of the MIRACLE QA system. This system has taken part in independent evaluations such as the CLEF@QA track in the Cross-Language Evaluation Forum (CLEF). Results from the 2004 to 2007 campaigns, as well as the details and the evolution of the system, are described in depth. The MIRACLE QA system has obtained moderate performance, with a first-answer accuracy ranging between 20% and 30%. Nevertheless, it is important to highlight the results obtained in the 2005 main QA task and the RealTimeQA pilot task in 2006; the latter included response time as an important additional variable of the evaluation. These results support the proposed architecture as an option for QA over textual collections and confirm similar findings obtained for English and other languages. On the other hand, the analysis of the results across evaluation campaigns and the comparison with other QA systems point to problems with current systems and new challenges. According to our experience, it is more difficult to tailor QA systems to different domains and languages than IR systems. The problem is inherited from the use of complex language analysis tools like POS taggers, parsers and other semantic analyzers, such as NE Recognition and Classification (NERC) and Relation Detection and Characterization (RDC) tools.
The second part of this thesis tackles this problem and proposes a different approach to adapting QA systems to different languages and collections. The proposal focuses on acquiring knowledge for the semantic analyzers using lightly supervised approaches. The goal is to obtain useful resources that help to perform NERC or RDC using as few annotated resources as possible. Besides, we try to avoid dependencies on other language analysis tools so that these methods can be applied to different languages and domains. First of all, we have studied previous work on building NERC and RDC modules with little supervision, particularly bootstrapping methods. We propose a common framework for different bootstrapping systems that helps to unify different evaluation functions for intermediate results, both instances and patterns. The main proposal is a new algorithm that is able to simultaneously and iteratively acquire instances and patterns associated with a relation of interest. It also uses mutual exclusion among relations to reduce concept drift and achieve better results. A distinctive characteristic is that it uses a query-based exploration strategy of the text collection, which enables its use on larger collections. Candidate selection and evaluation are based on incrementally building a graph of instances and patterns, which also justifies our evaluation function. The discovery approach is analogous to the frontier of exploration in a web crawler and is able to find the instances most similar to the available seeds.
This algorithm has been implemented in the SPINDEL system. For evaluation we selected the task of acquiring resources for the most common NE classes: Person, Location and Organization. The objective is to acquire name instances that belong to each class as well as contextual patterns that help to detect mentions belonging to that class. We present results for the acquisition of resources from raw text in two different languages, Spanish and English. We also performed experiments for Spanish in two different collections: news, and texts from a collaborative encyclopedia, Wikipedia. Both cases are tackled with limited language analysis tools and resources. Starting from an initial list of fewer than 40 instance seeds per class, the bootstrapping process is able to acquire large name lists containing up to 30,000 instances of variable quality; large lists of indicative patterns associated with each entity class are obtained too. Our indirect evaluation confirms the utility of both resources for classifying Named Entities using a simple dictionary-based approach. The best configuration obtained an F-score of 67.17 for Spanish and 55.99 for English, and in both cases the acquired patterns help to improve coverage. The module requires much less development effort than supervised approaches, once the need for annotation is included, although its performance is not yet on par. This research is a first step towards the development of semantic applications, like QA, for a new language or domain with no annotated corpora and less adaptation effort.
Article
Full-text available
We present a preliminary analysis of the use of WordNet hypernyms for answering "What-is" questions. We analyse the approximately 130 definitional questions in the TREC10 corpus with respect to our technique of Virtual Annotation (VA), which has previously been shown to be effective on the TREC9 definitional question set and other questions. We discover that VA is effective on a subset of the TREC10 definitional questions, but that some of these questions seem to need a user model to generate correct answers, or at least answers that agree with the NIST judges. Furthermore, there remains a large enough subset of definitional questions that cannot benefit at all from the WordNet isa-hierarchy, prompting the need to investigate alternative external resources.
Conference Paper
Full-text available
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this paper we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR) using proximity and question type features achieves a total reciprocal document rank of .20 on the TREC 8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
Article
Full-text available
This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions. The learning process consists of inspecting a collection of answered questions and characterizing the relation between question and answer with a statistical model. For the purpose of learning this relation, we propose two sources of data: Usenet FAQ documents and customer service call-center dialogues from a large retail company. We will show that the task of "answer-finding" differs from both document retrieval and traditional question answering, presenting challenges different from those found in these problems. The central aim of this work is to discover, through theoretical and empirical investigation, those statistical techniques best suited to the answer-finding problem.
Article
Full-text available
The increased complexity of the TREC QA questions requires advanced text processing tools that rely on natural language processing and knowledge reasoning. This paper presents the suite of tools that account for the performance of the PowerAnswer question answering system. It is shown how questions, answers and world knowledge are first transformed into logic representations, followed by a systematic and rigorous logic proof that validly answers questions posed to the QA system. At TREC QA 2002, PowerAnswer obtained a confidence-weighted score of 0.856, answering correctly 415 out of 500 questions.
Article
Full-text available
In this paper, we document our efforts to extend our statistical question answering system for TREC-11. We incorporated a web search feature, novel extensions of statistical machine translation, and the extraction of lexical patterns for exact answers from a supervised corpus. Without modification to our base set of thirty-one categories, we were able to achieve a confidence weighted score of 0.455 and an accuracy of 29%. We improved our model for selecting exact answers by insisting on exact answers in the training corpus, and this resulted in a 7% gain on TREC-11 but a much larger gain of 46% on TREC-10.
Article
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
Conference Paper
We present a strategy for answering fact-based natural language questions that is guided by a characterization of real-world user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Knowledge annotation is an approach to answering large classes of frequently occurring questions by utilizing semi-structured and structured Web sources. Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges. We have integrated these two different paradigms into a question answering system capable of providing users with concise answers that directly address their information needs.
Conference Paper
In this paper we treat question answering (QA) as a classification problem. Our motivation is to build systems for many languages without the need for highly tuned linguistic modules. Consequently, word tokens and Web data are used extensively but no explicit linguistic knowledge is incorporated. A mathematical model for answer retrieval, answer classification and answer length prediction is derived. The TREC 2002 QA task is used for system development where 33% of questions are answered correctly. Performance is then evaluated on the factoid questions of the TREC 2003 QA task where 23% of questions were answered correctly, which would rank the system in the top 10 of contemporary QA systems on the same task.
Conference Paper
Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions involves manual determination of whether an information nugget appears in a system's response. The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
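A rough sketch of the nugget-matching idea that underlies such automatic evaluation; POURPRE itself scores nuggets by unigram co-occurrence, while the hard match threshold below is an illustrative simplification:

```python
# Score a system response against a list of information nuggets:
# a nugget counts as matched when enough of its terms appear in
# the response (threshold is an illustrative choice).

def nugget_recall(nuggets, response, threshold=0.5):
    response_terms = set(response.lower().split())
    matched = 0
    for nugget in nuggets:
        terms = set(nugget.lower().split())
        overlap = len(terms & response_terms) / len(terms)
        if overlap >= threshold:
            matched += 1
    return matched / len(nuggets) if nuggets else 0.0

print(nugget_recall(["first woman in space", "launched in 1963"],
                    "Tereshkova was the first woman in space"))
```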
Conference Paper
In this paper we describe and evaluate a Question Answering system that goes beyond answering factoid questions. We focus on FAQ-like questions and answers, and build our system around a noisy-channel architecture which exploits both a language model for answers and a transformation model for answer/question terms, trained on a corpus of 1 million question/answer pairs collected from the Web.
Conference Paper
This paper proposes a new automatic speech summarization method having two stages: important sentence extraction and sentence compaction. Relatively important sentences are extracted based on the amount of information and the confidence measures of constituent words, and the set of extracted sentences is compressed by our sentence compaction method. The sentence compaction is performed by selecting a word set that maximizes a summarization score consisting of the amount of information and the confidence measure of each word, the linguistic likelihood of word strings, and the word concatenation probability. The selected words are concatenated to create a summary. Effectiveness of the proposed method was confirmed by summarizing a spontaneous presentation.
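A hedged sketch of the kind of summarization score described here, combining per-word information content, recognition confidence, and a bigram language-model term; the weights and component functions are placeholders, not the paper's trained models:

```python
# Score a candidate compacted word sequence: per-word information
# and ASR confidence plus a bigram language-model term, with
# placeholder weights.

import math

def summarization_score(words, info, conf, bigram_logprob,
                        w_info=1.0, w_conf=1.0, w_lm=1.0):
    """`info` and `conf` map a word to its information content and
    recognition confidence; `bigram_logprob(w1, w2)` is log P(w2|w1)."""
    score = 0.0
    for i, w in enumerate(words):
        score += w_info * info[w] + w_conf * math.log(conf[w])
        if i > 0:
            score += w_lm * bigram_logprob(words[i - 1], w)
    return score

info = {"taxes": 2.5, "rose": 1.8}
conf = {"taxes": 0.9, "rose": 0.8}
print(summarization_score(["taxes", "rose"], info, conf,
                          lambda w1, w2: -1.0))  # stub bigram model
```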
Article
In this paper, we show that we can obtain a good baseline performance for Question Answering (QA) by using only 4 simple features. Using these features, we contrast two approaches used for a Maximum Entropy based QA system. We view the QA problem as a classification problem and as a re-ranking problem. Our results indicate that the QA system viewed as a re-ranker clearly outperforms the QA system used as a classifier. Both systems are trained using the same data.
Article
Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This suggests that perhaps a better indexing model would help solve the problem. However, we feel that making unwarranted parametric assumptions will not lead to better retrieval performance. Furthermore, making prior assumptions about the similarity of documents is not warranted either. Instead, we propose an approach to retrieval based on probabilistic language modeling. We estimate models for each document individually. Our approach to modeling is non-parametric and integrates document indexing and document retrieval into a single model. One advantage of our approach is that collection statistics which are used heuristically in many other retrieval models are an integral part of our model. We have...
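A minimal query-likelihood sketch of this language-modeling view of retrieval: each document is ranked by the probability of generating the query under its own unigram model, smoothed with collection statistics (the linear smoothing weight here is an illustrative choice):

```python
# Rank documents by smoothed query likelihood under per-document
# unigram models; collection statistics provide the smoothing mass.

import math
from collections import Counter

def score(query, doc_tokens, collection_tf, collection_len, lam=0.8):
    tf = Counter(doc_tokens)
    dl = len(doc_tokens)
    s = 0.0
    for term in query:
        p_doc = tf[term] / dl if dl else 0.0
        p_coll = collection_tf.get(term, 0) / collection_len
        # Floor guards against log(0) for out-of-collection terms.
        s += math.log(max(lam * p_doc + (1 - lam) * p_coll, 1e-12))
    return s

docs = {"d1": "the cat sat on the mat".split(),
        "d2": "dogs chase cats".split()}
coll = Counter(t for d in docs.values() for t in d)
n = sum(coll.values())
print(sorted(docs, reverse=True,
             key=lambda d: score("cat mat".split(), docs[d], coll, n)))
```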
Article
We describe the architecture of a question answering system and systematically evaluate contributions of different system components to accuracy. The system differs from most question answering systems in its dependency on data redundancy rather than sophisticated linguistic analyses of either questions or candidate answers. Because a wrong answer is often worse than no answer, we also explore strategies for predicting when the question answering system is likely to give an incorrect answer.
Article
This paper describes recent development in the Webclopedia QA system, focusing on the use of knowledge resources such as WordNet and a QA typology to improve the basic operations of candidate answer retrieval, ranking, and answer matching.
Article
This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online. Most question answering systems use a wide variety of linguistic resources. We focus instead on the redundancy available in large corpora as an important resource. We use this redundancy to simplify the query rewrites that we need to use, and to support answer mining from returned snippets. Our system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question answering accuracy can be greatly improved by analyzing more and more matching passages. Simple passage ranking and n-gram extraction techniques work well in our system making it efficient to use with many backend retrieval engines.
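A toy version of the redundancy-driven answer mining described here: n-grams are collected from retrieved snippets and voted on, so that frequently repeated phrases surface as candidates (query rewriting, answer typing, and n-gram tiling are omitted for brevity):

```python
# Mine candidate answers by voting over n-grams from snippets;
# n-grams that start or end with a stopword are discarded.

from collections import Counter

def mine_ngrams(snippets, max_n=3,
                stop=frozenset({"the", "a", "of", "in", "is"})):
    votes = Counter()
    for snippet in snippets:
        tokens = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in stop or gram[-1] in stop:
                    continue
                votes[" ".join(gram)] += 1
    return votes.most_common(5)

print(mine_ngrams(["Everest is the tallest mountain",
                   "the tallest mountain is Everest"]))
```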
Article
We introduce a probabilistic noisy-channel model for question answering and we show how it can be exploited in the context of an end-to-end QA system. Our noisy-channel system outperforms a state-of-the-art rule-based QA system that uses similar resources. We also show that the model we propose is flexible enough to accommodate within one mathematical framework many QA-specific resources and techniques, which range from the exploitation of WordNet, structured, and semi-structured databases to reasoning, and paraphrasing.
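For orientation, the noisy-channel decision rule referred to here is standardly written as follows (a generic sketch, not necessarily the paper's exact parameterisation): the best answer is the one from which the question is most plausibly "generated".

```latex
\hat{a} \;=\; \arg\max_{a}\, p(a \mid q) \;=\; \arg\max_{a}\, p(q \mid a)\, p(a)
```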