Article

Translating Wh-questions into Logical Queries


Abstract

In this paper we present ORAKEL, a natural language interface which translates wh-questions into logical queries and evaluates them with respect to a given knowledge base. The system is in principle able to deal with arbitrary logical languages and knowledge representation paradigms, i.e. relational models, frame-based models, etc. However, in this paper we present a concrete implementation based on F-Logic and Ontobroker as underlying inference engine.
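As a rough illustration of the idea (not ORAKEL's actual grammar or lexicon, which are compositional and LTAG-based; the relation names and the pattern rule below are invented), a wh-question can be mapped to an F-Logic-style query string like this:

```python
# Hypothetical sketch: turning a 'Who <verb> <object>?' question into an
# F-Logic-style query string. The lexicon and translation rule are
# illustrative only; they are not ORAKEL's actual implementation.

# Toy lexicon mapping verbs to assumed knowledge-base relation names.
LEXICON = {"wrote": "authorOf", "directed": "directorOf"}

def translate(question: str) -> str:
    """Translate a 'Who <verb> <object>?' question into a query string."""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "who" or words[1] not in LEXICON:
        raise ValueError("unsupported question pattern")
    relation = LEXICON[words[1]]
    obj = " ".join(words[2:])
    # Find all X standing in the relation to the named object.
    return f'FORALL X <- X[{relation} -> "{obj}"].'

print(translate("Who wrote Hamlet?"))
```

A real system composes the query from the semantics of each word rather than from a fixed pattern, but the input/output contract is the same: natural-language question in, logical query out.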


... The natural language interface to an inference engine, i.e. OntoBroker, is not part of this paper, as it is described in [5]. The objective of applying our method is to transform (a) an arbitrary input table into (b) an F-Logic frame with formalized data which (c) subsequently supports the ontology population and query answering using the inference engine OntoBroker [12]. ...
... Here we will not describe how the conversion of table data into an object base is performed, nor the translation of the natural language query into its F-Logic equivalent, which is dealt with in [5], but will rather give two query examples with respective results. For better intuitiveness we will present each query by the following items: ...
Article
Full-text available
The tremendous success of the World Wide Web is countervailed by efforts needed to search and find relevant information. For tabular structures embedded in HTML documents typical keyword or link-analysis based search fails. The Semantic Web relies on annotating resources such as documents by means of ontologies and aims to overcome the bottleneck of finding relevant information. Turning the current Web into a Semantic Web requires automatic approaches for annotation since manual approaches will not scale in general. Most efforts have been devoted to automatic generation of ontologies from text, but with quite limited success. However, tabular structures require additional efforts, mainly because understanding of table contents requires a table structures comprehension task and a semantic interpretation task, which exceeds in complexity the linguistic task. The focus of this paper is on automatic transformation and generation of semantic (F-Logic) frames from table-like structures. The presented work consists of a methodology, an accompanying implementation (called TARTAR) and a thorough evaluation. It is based on a grounded cognitive table model which is stepwise instantiated by the methodology. A typical application scenario is the automatic population of ontologies to enable query answering over arbitrary tables (e.g. HTML tables).
Article
The tremendous success of the World Wide Web is countervailed by efforts needed to search and find relevant information. For tabular structures embedded in HTML documents, typical keyword or link-analysis based search fails. The Semantic Web relies on annotating resources such as documents by means of ontologies and aims to overcome the bottleneck of finding relevant information. Turning the current Web into a Semantic Web requires automatic approaches for annotation since manual approaches will not scale in general. Most efforts have been devoted to automatic generation of ontologies from text, but with quite limited success. However, tabular structures require additional efforts, mainly because understanding of table contents requires the comprehension of the logical structure of the table on the one hand, as well as its semantic interpretation on the other. The focus of this paper is on the automatic transformation and generation of semantic (F-Logic) frames from table-like structures. The presented work consists of a methodology, an accompanying implementation (called TARTAR) and a thorough evaluation. It is based on a grounded cognitive table model which is stepwise instantiated by the methodology. A typical application scenario is the automatic population of ontologies to enable query answering over arbitrary tables (e.g. HTML tables).
... Currently, the SPARQL language essentially supports only conjunctive queries such that the above query would not be translatable to SPARQL. A direct translation to some target formalism as performed in [Cimiano, 2003] is also possible, but clearly such an approach is not as flexible as the one pursued within ORAKEL. Currently, our system supports two formalisms used in the Semantic Web: the Web Ontology Language (OWL) with the query language SPARQL, as well as F-Logic as ontology language together with its corresponding query language [Kifer et al., 1995]. ...
... In [Cimiano, 2003] the author presents an approach to map natural-language wh-questions into F(rame)-Logic queries based on Montague-style compositional semantics, where the semantic representation is constructed on the basis of Lexicalized Tree Adjoining ...
... A[x], in which [x] indicates that the variable x occurs free in A. We follow that line and consider concepts, and a fortiori contexts and actions, to be expressions of a lambda calculus with types. A typed version of the lambda calculus with two basic types, i.e., e for individuals and t for truth values [52], has been widely used in natural language processing [30,54,65,17]. However, it has been noticed that the expressiveness of such a theory could be significantly enhanced with the help of types [3], and better, with dependent types [57,18,6]. ...
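The two-typed lambda calculus mentioned in the snippet (type e for individuals, type t for truth values) can be sketched directly with Python functions standing in for lambda terms; the domain and the denotations below are invented toy examples, not taken from the paper:

```python
# Sketch of Montague-style composition in a two-typed lambda calculus:
# type e = individuals (here: strings), type t = truth values (bool).
# All denotations are toy examples chosen for illustration.

# 'sleeps' has type e -> t: a property of individuals.
sleeps = lambda x: x in {"john", "mary"}

# Quantifiers have the higher type (e -> t) -> t.
domain = {"john", "mary", "bill"}
everyone = lambda p: all(p(x) for x in domain)
someone = lambda p: any(p(x) for x in domain)

# Function application composes the meanings.
print(everyone(sleeps))  # bill does not sleep in this toy model
print(someone(sleeps))
```

Dependent types, as advocated in the cited work, would additionally let the type of one term depend on the value of another, which plain e/t typing cannot express.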
Article
In the area of knowledge representation, a challenging topic is the formalization of context knowledge on the basis of logical foundations and ontological semantics. However, most attempts to provide a formal model of contexts suffer from a number of difficulties, such as limited expressiveness of representation, restricted variable quantification, lack of (meta) reasoning about properties, etc. In addition, type theory originally developed for formal modeling of mathematics has also been successfully applied to the correct specification of programs and in the semantics of natural language. In this paper, we suggest a type theoretical approach to the problem of context and action modeling. Type theory is used both for representing the system's knowledge of the discourse domain and for reasoning about it. For that purpose, we extend an existing dependent type theory having nice properties, with context-based rules and appropriate inductive types. We claim that the resulting theory exploiting the power of dependent types is able to provide a very expressive system together with a unified theory allowing higher-order reasoning.
... tation behind the KAON2 system supports only conjunctive queries such that the above query would not be translatable to SPARQL in our system. A direct translation to some target formalism as performed in [12] is also possible, but clearly such an approach is not as flexible as the one pursued within ORAKEL. Currently, our system supports two formalisms used in the Semantic Web: the Web Ontology Language (OWL) with the query language SPARQL, as well as F-Logic as ontology language together with its corresponding query language [34]. ...
Article
The customization of a natural language interface to a certain application, domain or knowledge base still represents a major effort for end users given the current state-of-the-art. In this article, we present our natural language interface ORAKEL, describe its architecture, design choices and implementation. In particular, we present ORAKEL’s adaptation model which allows users who are not familiar with methods from natural language processing (NLP) or formal linguistics to port a natural language interface to a certain domain and knowledge base. The claim that our model indeed meets our requirement of intuitive adaptation is experimentally corroborated by diverse experiments with end users showing that non-NLP experts can indeed create domain lexica for our natural language interface leading to similar performance compared to lexica engineered by NLP experts.
Article
In the domain of ontology design, as well as in Knowledge Representation, modeling universals is a challenging problem. Most approaches that have addressed this problem rely on Description Logics (DLs), but many difficulties remain due to under-constrained representation, which reduces the inferences that can be drawn and further causes problems in expressiveness. In mathematical logic and program checking, type theories have proved to be appealing but, so far, they have not been applied in the formalization of ontologies. To bridge this gap, we present in this paper a theory for representing ontologies in a dependently-typed framework which relies on strong formal foundations, including both a constructive logic and a functional type system. The language of this theory defines in a precise way what ontological primitives, such as classes, relations, properties, and in particular roles, are. The first part of the paper details how these primitives are defined and used within the theory. In the second part, we focus on the formalization of the role primitive. A review of significant role properties leads to the specification of a role profile, and most of the remaining work details, through numerous examples, how the proposed theory is able to fully satisfy this profile. It is demonstrated that dependent types can model several non-trivial aspects of roles, including a formal solution for generalization hierarchies, identity criteria for roles and other contributions. A discussion is given on how the theory is able to cope with many of the constraints inherent in a good role representation.
Article
Cognitive situation awareness has recently caught the attention of the information fusion community. Some approaches have developed formalizations that are both ontology-based and underpinned by Situation Theory. While the semantics of Situation Theory is very attractive from the cognitive point of view, the languages that are used to express knowledge and to reason with it suffer from a number of limitations concerning both expressiveness and reasoning capabilities. In this paper we propose a more general formal foundation, denoted S-DTT (Situation-based Dependent Type Theory), that is expressed in the language of the Extended Calculus of Constructions (ECC), a widely used theory in mathematical formalization and in software validation. Situation awareness relies on small blocks of knowledge called situation fragment types, whose composition leads to a very expressive and unifying theory. The semantic part is provided by an ontology that is rooted in the S-DTT theory and on which higher-order reasoning can be performed. The basis of the theory is summarized and its expressive power is illustrated with numerous examples. A scenario in the healthcare context for patient safety issues is detailed, and a comparison with well-known approaches is discussed.
Conference Paper
In this paper we present ORAKEL, a natural language interface which translates wh-questions into logical queries and evaluates them with respect to a given knowledge base. For this purpose, ORAKEL makes use of a compositional approach in order to construct the semantics of a wh-question. The system is in principle able to deal with arbitrary logical languages and knowledge representation paradigms, i.e. relational models, frame-based models, etc. However, in this paper we present a concrete implementation based on F-Logic and Ontobroker as underlying inference engine.
Article
Full-text available
This paper presents the results of an experimental system aimed at performing a robust semantic analysis of analyzed speech input in the area of information system access. The goal of this experiment was to investigate the effectiveness of such a system in a pipelined architecture, where no control is possible over the morpho-syntactic analysis which precedes the semantic analysis and query formation. The approach taken used technology from robust syntactic parsing applied to a sequence of parse trees rather than a sequence of lexical items, together with a constraint-based logical inference system that evaluated logical hypotheses about the query using fuzzy combination criteria.
Article
Full-text available
We propose a novel formalism, called Frame Logic (abbr., F-logic), that accounts in a clean and declarative fashion for most of the structural aspects of object-oriented and frame-based languages. These features include object identity, complex objects, inheritance, polymorphic types, query methods, encapsulation, and others. In a sense, F-logic stands in the same relationship to the object-oriented paradigm as classical predicate calculus stands to relational programming. F-logic has a model-theoretic semantics and a sound and complete resolution-based proof theory. A small number of fundamental concepts that come from object-oriented programming have direct representation in F-logic; other, secondary aspects of this paradigm are easily modeled as well. The paper also discusses semantic issues pertaining to programming with a deductive object-oriented language based on a subset of F-logic.
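To make the formalism concrete, here are a few statements in F-Logic style (entity, attribute and course names are invented; the concrete notation follows the Ontobroker-style syntax with `:` for instance-of, `::` for subclassing, `->` for scalar and `->>` for set-valued attributes; syntactic details vary between F-Logic implementations):

```
% Subclassing and class membership.
student :: person.
john : student.

% An object with scalar and set-valued attribute values.
john[age -> 27, takes ->> {cs101, cs240}].

% A rule: everyone who takes some course is busy.
FORALL X, C  X[busy -> true] <- X[takes ->> C].

% A query: which students take cs101?
FORALL X <- X : student AND X[takes ->> cs101].
```

These frame-style molecules are exactly the object-identity, complex-object and typing features the abstract describes, expressed with a model-theoretic semantics rather than as an ad hoc data structure.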
Article
This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.
Article
This paper is an introduction to natural language interfaces to databases (NLIDBs). A brief overview of the history of NLIDBs is first given. Some advantages and disadvantages of NLIDBs are then discussed, comparing NLIDBs to formal query languages, form-based interfaces, and graphical interfaces. An introduction to some of the linguistic problems NLIDBs have to confront follows, for the benefit of readers less familiar with computational linguistics. The discussion then moves on to NLIDB architectures, portability issues, restricted natural language input systems (including menu-based NLIDBs), and NLIDBs with reasoning capabilities. Some less explored areas of NLIDB research are then presented, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs. The paper ends with reflections on the current state of the art.
Article
Ambiguity is ubiquitous and one of the major problems in NLP. It occurs at the lexical level, in certain syntactic constructions, in meaning assignment, and also pragmatically in determining the purpose of a particular sentence. Each component of an NLP system thus faces the problem of ambiguity control. At the levels of tokenizing, part-of-speech tagging and morphological analysis, the progress that has been made within linguistics and within computational linguistics over the past years has brought systems into reach that are able to cope with the ambiguity problem at their own level. This success is mainly due to the development of statistical disambiguation techniques and inductive methodologies to derive linguistic knowledge from large data samples. To cope with the ambiguity problem at the level of syntax, packed parse forests are used to represent sets of grammatical representations, and statistical algorithms allow in addition to put probabilistic weights on analyses. The experimental and inductive methodologies as well as the amount of linguistic knowledge needed for the development of a natural language interpretation system vary, however, between its different levels of analysis. The derivation of detailed lexical knowledge that is relevant to the construction of semantic representations is a significantly more complex task than the derivation of knowledge needed for a computational grammar or morphology. And so is the task of dealing with ambiguity control within semantics and pragmatics. Within semantics, the problem of ambiguities has ...
Article
This article describes TEAM, a transportable natural-language interface system. TEAM was constructed to test the feasibility of building a natural-language system that could be adapted to interface with new databases by users who are not experts in natural-language processing. An overview of the system design is presented, emphasizing those choices that were imposed by the demands of transportability. Several general problems of natural-language processing that were faced in constructing the system are discussed, including quantifier scoping, various pragmatic issues, and verb acquisition. TEAM is compared with several other transportable systems; this comparison includes a discussion of the range of natural language handled by each as well as a description of the approach taken to achieving transportability in each system.
Conference Paper
Aspects of an intelligent interface that provides natural language access to a large body of data distributed over a computer network are described. The overall system architecture is presented, showing how a user is buffered from the actual database management systems (DBMSs) by three layers of insulating components. These layers operate in series to convert natural language queries into calls to DBMSs at remote sites. Attention is then focused on the first of the insulating components, the natural language system. A pragmatic approach to language access that has proved useful for building interfaces to databases is described and illustrated by examples. Special language features that increase system usability, such as spelling correction, processing of incomplete inputs, and run-time system personalization, are also discussed. The language system is contrasted with other work in applied natural language processing, and the system’s limitations are analyzed.
Article
This paper is a discussion of the technical issues and solutions encountered in making the ASK System transportable. A natural language system can be “transportable” in a number of ways. Although transportability to a new domain is most prominent, other ways are also important if the system is to have viability in the commercial marketplace. On the one hand, transporting a system to a new domain may start with the system prior to adding any domain of knowledge and extend it to incorporate the new domain. On the other hand, one may wish to add to a system that already has knowledge of one domain the knowledge concerning a second domain, that is, to extend the system to cover this second domain. In the context of ASK, it has been natural to implement extending and then achieve transportability as a special case. In this paper, we consider six ways in which the ASK System can be extended to include new capabilities. Special-purpose applications, such as those to accommodate standard office tasks, would make use of these various means of extension.
Article
We present Logical Description Grammar (LDG), a model of grammar and the syntax-semantics interface based on descriptions in elementary logic. A description may simultaneously describe the syntactic structure and the semantics of a natural language expression, i.e., the describing logic talks about the trees and about the truth-conditions of the language described. Logical Description Grammars offer a natural way of dealing with underspecification in natural language syntax and semantics. If a logical description (up to isomorphism) has exactly one tree plus truth-conditions as a model, it completely specifies that grammatical object. More common is the situation, corresponding to underspecification, in which there is more than one model. A situation in which there are no models corresponds to an ungrammatical input.
Article
In this paper, we describe a tree generating system called tree-adjoining grammar (TAG) and state some of the recent results about TAGs. The work on TAGs is motivated by linguistic considerations. However, a number of formal results have been established for TAGs which, we believe, would be of interest to researchers in formal languages and automata, including those interested in tree grammars and tree automata.
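The core TAG operation, adjunction, splices an auxiliary tree into another tree at a node carrying a matching label, with the displaced subtree landing at the auxiliary tree's foot node. A compact sketch with trees as nested lists (the grammar fragment and the foot-node convention `label + "*"` are invented for illustration):

```python
# Sketch of TAG adjunction with trees as nested lists: [label, child, ...].
# The string "VP*" marks the foot node of the auxiliary tree.

def adjoin(tree, aux, target):
    """Adjoin auxiliary tree `aux` at the first node labeled `target`.

    The subtree rooted at the target node is placed at the foot node
    (labeled target + '*') of the auxiliary tree, and the resulting
    tree replaces the target node.
    """
    if not isinstance(tree, list):
        return tree
    if tree[0] == target:
        return _fill_foot(aux, target + "*", tree)
    return [tree[0]] + [adjoin(c, aux, target) for c in tree[1:]]

def _fill_foot(aux, foot_label, subtree):
    """Replace the foot node of `aux` with `subtree`."""
    if aux == foot_label:
        return subtree
    if not isinstance(aux, list):
        return aux
    return [aux[0]] + [_fill_foot(c, foot_label, subtree) for c in aux[1:]]

# Initial tree for "John sleeps" and auxiliary tree for "apparently".
initial = ["S", ["NP", "John"], ["VP", ["V", "sleeps"]]]
aux = ["VP", ["Adv", "apparently"], "VP*"]

print(adjoin(initial, aux, "VP"))
```

Adjunction is what gives TAGs their mildly context-sensitive power: the recursion happens inside the tree, not at its frontier as in context-free rewriting.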
Article
This report is mainly a documentation of the parser implementation and the underlying theoretical concepts and not a manual for the LoPar program. For the latter, the reader is referred to the online manual pages.
Article
The World Wide Web (WWW) can be viewed as the largest multimedia database that has ever existed. However, its support for query answering and automated inference is very limited. Metadata and domain specific ontologies were proposed by several authors to solve this problem. We developed Ontobroker which uses formal ontologies to extract, reason, and generate metadata in the WWW. The paper describes the formalisms and tools for formulating queries, defining ontologies, extracting metadata, and generating metadata in the format of the Resource Description Framework (RDF), as recently proposed by the World Wide Web Consortium (W3C). These methods provide a means for semantic based query handling even if the information is spread over several sources. Furthermore, the generation of RDF descriptions enables the exploitation of the ontological information in RDF-based applications.
Article
This paper was presented at the Third International Conference on Very Large Data Bases, Tokyo, Japan, October 1977, and appeared in ACM Transactions on Database Systems, Vol. 3, No. 2, June 1978, pages 105-147. ... and who is thoroughly familiar with its file structure, the DBMSs on which it resides, how it is distributed among various computer systems, the coded field names for the data items, the kinds of values that different fields are expected to contain, and other idiosyncrasies. The technician must understand the decision maker's question, reformulate it in terms of the data that is actually stored, plan a sequence of requests for particular items from particular files on particular computers, open connections with remote sites, build programs to query the remote systems using the primitives of the DBMSs of the remote systems, monitor the execution of those programs, recover from errors, and correlate the results. This is a demanding, time-consuming, and exacting task requiring much attention to detail. Escalated levels of sophistication are needed as the VLDB increases in size and complexity and as it is distributed over a wider range of host computers. With the goal of making large, distributed databases directly available to decision makers (while freeing technicians from increasingly tedious details), a group of researchers at SRI International has developed a prototype system that, for many classes of questions, automates the procedures usually performed by technicians. This paper presents an overview of this system, called LADDER (for language access to distributed data with error recovery).
Article
The absence of training data is a real problem for corpus-based approaches to sense disambiguation, one that is unlikely to be solved soon. Selectional preference is traditionally connected with sense ambiguity; this paper explores how a statistical model of selectional preference, requiring neither manual annotation of selection restrictions nor supervised training, can be used in sense disambiguation.
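A statistical model of selectional preference in this spirit can be estimated from co-occurrence counts alone. The sketch below uses a relative-entropy-style association score, roughly following Resnik's formulation; the toy counts and class names are invented for illustration:

```python
import math

# Toy verb / object-class co-occurrence counts (invented for illustration).
counts = {
    ("drink", "beverage"): 40, ("drink", "vehicle"): 1,
    ("park",  "beverage"): 1,  ("park",  "vehicle"): 30,
}

def association(verb, cls):
    """Selectional association A(v, c) ~ P(c|v) * log(P(c|v) / P(c)).

    Positive values mean the verb prefers the class more than chance;
    negative values mean it disprefers it.
    """
    total = sum(counts.values())
    verb_total = sum(n for (v, _), n in counts.items() if v == verb)
    cls_total = sum(n for (_, c), n in counts.items() if c == cls)
    p_c_given_v = counts[(verb, cls)] / verb_total
    p_c = cls_total / total
    return p_c_given_v * math.log(p_c_given_v / p_c)

# 'drink' prefers beverage objects; 'park' prefers vehicle objects.
print(association("drink", "beverage") > association("drink", "vehicle"))
```

Because the counts come straight from a corpus, no manual annotation of selection restrictions is needed, which is the point the abstract emphasizes.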
Article
In BBN's natural language understanding and generation system (Janus), we have used a hybrid approach to representation, employing an intensional logic for the representation of the semantics of utterances and a taxonomic language with formal semantics for specification of descriptive constants and axioms relating them. Remarkably, 99.9% of 7,000 vocabulary items in our natural language applications could be adequately axiomatized in the taxonomic language.
Article
We propose a semantic construction method for Feature-Based Tree Adjoining Grammar which is based on the derived tree, compare it with related proposals and briefly discuss some implementation possibilities.
Article
The need for Natural Language Interfaces (NLIs) to databases has become increasingly acute as more nontechnical people access information through their web browsers, PDAs and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly. We introduce the Precise NLI [2], which reduces the semantic interpretation challenge in NLIs to a graph matching problem. Precise uses the max-flow algorithm to efficiently solve this problem. Each max-flow solution corresponds to a possible semantic interpretation of the sentence. Precise collects max-flow solutions, discards the solutions that do not obey syntactic constraints and retains the rest as the basis for generating SQL queries corresponding to the question. The syntactic information is extracted from the parse tree corresponding to the given question which is computed by a statistical parser [1]. For a broad, well-defined class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query.
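The reduction to graph matching can be illustrated as a bipartite matching between question tokens and database elements: an interpretation exists only if every token can be matched to a distinct compatible element. The sketch below uses simple augmenting paths rather than max-flow, and the token/element compatibility lexicon is invented; it is not PRECISE's actual lexicon or algorithm:

```python
# Sketch of semantic interpretation as bipartite matching: each question
# token must be matched to a distinct compatible database element.
# The compatibility lexicon is invented for illustration.

COMPATIBLE = {
    "jobs":   {"job.title"},
    "salary": {"job.salary"},
    "boston": {"job.city", "company.city"},
    "city":   {"job.city", "company.city"},
}

def match_tokens(tokens):
    """Return True iff every token can claim a distinct database element."""
    matched = {}  # database element -> token currently holding it

    def try_assign(token, seen):
        # Standard augmenting-path step: take a free element, or displace
        # the current holder if it can be reassigned elsewhere.
        for elem in COMPATIBLE.get(token, ()):
            if elem in seen:
                continue
            seen.add(elem)
            if elem not in matched or try_assign(matched[elem], seen):
                matched[elem] = token
                return True
        return False

    return all(try_assign(t, set()) for t in tokens)

# "boston" and "city" can share the two city columns between them.
print(match_tokens(["jobs", "boston", "city"]))
```

A failed matching (e.g. two tokens competing for one element) corresponds to a question the system should reject as not semantically tractable, which is how PRECISE can guarantee correctness on the questions it does accept.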
Article
For most natural language processing tasks, a parser that maps sentences into a semantic representation is significantly more useful than a grammar or automaton that simply recognizes syntactically well-formed strings. This paper reviews our work on using inductive logic programming methods to learn deterministic shift-reduce parsers that translate natural language into a semantic representation. We focus on the task of mapping database queries directly into executable logical form. An overview of the system is presented, followed by recent experimental results on corpora of Spanish geography queries and English job-search queries. Language learning is frequently interpreted as acquiring a recognizer, a procedure that returns "yes" or "no" to the question: "Is this string a syntactically well-formed sentence in the language?". However, a black-box recognizer is of limited use to a natural language processing system. A simple recognizer may be useful to a limited grammar checke...
M. Kifer, G. Lausen, and J. Wu. Logical Foundations of Object-Oriented and Frame-Based Languages. Journal of the Association for Computing Machinery, May 1995.
R. Montague. On the Proper Treatment of Quantification in Ordinary English. In R. H. Thomason, editor, Formal Philosophy: Selected Papers of Richard Montague, pages 247–270. 1974.
A. Ballim and V. Pallotta. Robust parsing techniques for semantic analysis of natural language queries. In Proceedings of the VEXTAL99 conference, 1999.
P. Cimiano. Translating Wh-Questions into F-Logic Queries. In R. Bernardi and M. Moortgat, editors, Proceedings of the Workshop on Questions and Answers: Theoretical and Applied Perspectives, pages 130–137, 2003.
A. Copestake and K. Sparck Jones. Natural Language Interfaces to Databases. Knowledge Engineering Review, 1989. Special Issue on the Applications of Natural Language Processing Techniques.
E. F. Codd. Seven Steps to RENDEZVOUS with the Casual User. In J. Kimbie and K. Koffeman, editors, Data Base Management. North-Holland, 1974.
P. Cimiano and U. Reyle. Ontology-based semantic construction, underspecification and disambiguation. In Proceedings of the Prospects and Advances in the Syntax-Semantic Interface Workshop, 2003.