Conference Paper

Automatically Constructing Concept Hierarchies of Health-Related Human Goals


Abstract

To realize the vision of intelligent agents on the web, agents need to be capable of understanding people’s behavior. Such an understanding would enable them to better predict and support human activities on the web. If agents had access to knowledge about human goals, they could, for instance, recognize people’s goals from their actions or reason about people’s goals. In this work, we study to what extent it is feasible to automatically construct concept hierarchies of domain-specific human goals. This process consists of the following two steps: (1) extracting human goal instances from a search query log and (2) inferring hierarchical structures by applying clustering techniques. To compare the resulting concept hierarchies, we manually construct a gold standard and calculate taxonomic overlaps. In our experiments, we achieve taxonomic overlaps of up to ~51% for the health domain and up to ~60% for individual health subdomains. In an illustration scenario, we provide a prototypical implementation to automatically complement goal concept hierarchies with means-ends relations, i.e., relating goals to actions which potentially contribute to their accomplishment. Our findings are particularly relevant for knowledge engineers interested in (i) acquiring knowledge about human goals as well as (ii) automating the process of constructing goal concept hierarchies.
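To make the two-step procedure concrete, the following minimal Python sketch clusters a handful of hypothetical goal phrases (stand-ins for goal instances extracted from a query log) into a dendrogram using TF-IDF features and average-link agglomerative clustering over cosine distances; the paper's actual features, similarity measure and clustering parameters may differ.

from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical goal instances extracted from a search query log.
goal_phrases = [
    "lose weight fast",
    "lose belly fat",
    "lower blood pressure naturally",
    "reduce cholesterol",
    "quit smoking",
]

# Represent each goal phrase as a TF-IDF vector over its terms.
vectors = TfidfVectorizer().fit_transform(goal_phrases).toarray()

# Average-link agglomerative clustering over cosine distances yields a
# dendrogram, i.e. a candidate concept hierarchy over the goal instances.
dendrogram = linkage(vectors, method="average", metric="cosine")

# Cutting the dendrogram at a distance threshold gives one hierarchy level.
labels = fcluster(dendrogram, t=0.9, criterion="distance")
print(list(zip(goal_phrases, labels)))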


... Most of the existing work focusing on categorical data belongs to the field of knowledge engineering. There, various techniques exist to create concept hierarchies whose aim is usually to facilitate the understanding of documents and processes, or to enhance semantic interoperability [11,22]. However, their direct applicability in PPDP is limited as they do not consider the particular characteristics needed by a VGH in the context of data anonymization. ...
Conference Paper
Full-text available
Concept hierarchies are widely used in multiple fields to carry out data analysis. In data privacy, they are known as Value Generalization Hierarchies (VGHs), and are used by generalization algorithms to dictate the data anonymization. Thus, their proper specification is critical to obtain anonymized data of good quality. The creation and evaluation of VGHs require expert knowledge and a significant amount of manual effort, making these tasks highly error-prone and time-consuming. In this paper we present AIKA, a knowledge-based framework to automatically construct and evaluate VGHs for the anonymization of categorical data. AIKA integrates ontologies to objectively create and evaluate VGHs. It also implements a multi-dimensional reward function to tailor the VGH evaluation to different use cases. Our experiments show that AIKA improved the creation of VGHs by generating VGHs of good quality in less time than when manually done. Results also showed how the reward function properly captures the desired VGH properties.
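For readers unfamiliar with the term, the snippet below shows a minimal, hypothetical Value Generalization Hierarchy for a categorical attribute; AIKA constructs and evaluates such hierarchies automatically from ontologies rather than hard-coding them as done here.

# Hypothetical VGH for a "diagnosis" attribute: each level maps original
# values to increasingly general ones used during anonymization.
vgh_diagnosis = {
    0: {"flu": "flu", "asthma": "asthma", "angina": "angina"},
    1: {"flu": "respiratory", "asthma": "respiratory", "angina": "cardiac"},
    2: {"flu": "any illness", "asthma": "any illness", "angina": "any illness"},
}

def generalize(value, level):
    # Replace a concrete value by its generalization at the given VGH level.
    return vgh_diagnosis[level][value]

print(generalize("asthma", 1))   # -> "respiratory"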
Conference Paper
Problem solving knowledge is omnipresent and scattered on the Web. While extracting and gathering such knowledge has been a focus of attention, it is equally important to devise a way to organize such knowledge for both human and machine consumption with respect to task goals. As a way to provide an extensive knowledge structure for human task goals, with which human problem solving knowledge extracted from Web resources can be organized, we devised a method for automatically grouping and organizing the goal statements in a Web 2.0 site that contains over two million how-to instruction articles covering almost all task domains. In the proposed method, task goals having semantically and task-categorically similar action types and object types are grouped together by analyzing predicate-argument association patterns across all the goal statements through bipartite EM-like modeling. The result obtained with the unsupervised machine learning algorithm was evaluated by means of a human-annotated data set in a sample domain.
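As a rough illustration of the input such a model operates on (not the paper's EM-like procedure itself), the snippet below builds bipartite co-occurrence counts between action verbs and object nouns of a few hypothetical goal statements; grouping goals with similar verb/object association patterns is then the modeling step proper.

from collections import Counter

# Hypothetical (action verb, object noun) pairs parsed from goal statements.
goal_statements = [
    ("bake", "cake"), ("bake", "bread"),
    ("repair", "bicycle"), ("repair", "car"),
    ("clean", "car"),
]

# Bipartite verb/object co-occurrence counts, the starting point for
# grouping goals with similar predicate-argument association patterns.
cooccurrence = Counter(goal_statements)
verbs = sorted({v for v, _ in goal_statements})
objects = sorted({o for _, o in goal_statements})
for verb in verbs:
    print(verb, [cooccurrence[(verb, obj)] for obj in objects])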
Article
Full-text available
In our research on Commonsense reasoning, we have found that an especially important kind of knowledge is knowledge about human goals. Especially when applying Commonsense reasoning to interface agents, we need to recognize goals from user actions (plan recognition), and generate sequences of actions that implement goals (planning). We also often need to answer more general questions about the situations in which goals occur, such as when and where a particular goal might be likely, or how long it is likely to take to achieve. In past work on Commonsense knowledge acquisition, users have been directly asked for such information. Recently, however, another approach has emerged: to entice users into playing games where supplying the knowledge is the means to scoring well in the game, thus motivating the players. This approach has been pioneered by Luis von Ahn and his colleagues, who refer to it as Human Computation. Common Consensus is a fun, self-sustaining web-based game that both collects and validates Commonsense knowledge about everyday goals. It is based on the structure of the TV game show Family Feud. A small user study showed that users find the game fun, knowledge quality is very good, and the rate of knowledge collection is rapid.
Conference Paper
Full-text available
We have been developing a task-based service navigation system that offers to the user services relevant to the task the user wants to perform. The system allows the user to concretize his/her request in the task-model developed by human experts. In this study, to reduce the cost of collecting a wide variety of activities, we investigate the automatic modeling of users' real world activities from the web. To extract the widest possible variety of activities with high precision and recall, we investigate the appropriate number of contents and resources to extract. Our results show that we do not need to examine the entire web, which is too time consuming; a limited number of search results (e.g. 900 from among 21,000,000 search results) from blog contents are needed. In addition, to estimate the hierarchical relationships present in the activity model with the lowest possible error rate, we propose a method that divides the representation of activities into a noun part and a verb part, and calculates the mutual information between them. The result shows that almost 80% of the hierarchical relationships can be captured by the proposed method.
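A hedged sketch of the proposed noun/verb split plus mutual information is given below; the counts are hypothetical, and the paper's estimation from web search results and its exact hierarchy-building rules are not reproduced.

import math

# Hypothetical occurrence counts for verb parts, noun parts and their pairs.
count_verb = {"play": 400, "buy": 300}
count_noun = {"piano": 120, "guitar": 90}
count_pair = {("play", "piano"): 80, ("buy", "piano"): 10,
              ("play", "guitar"): 60, ("buy", "guitar"): 25}
total = 10_000   # total number of sampled activity expressions

def pmi(verb, noun):
    # Pointwise mutual information between the verb part and the noun part.
    p_pair = count_pair[(verb, noun)] / total
    return math.log2(p_pair / ((count_verb[verb] / total) * (count_noun[noun] / total)))

# A strongly associated pair ("play piano") scores higher than a looser one
# ("buy piano"), which can serve as evidence when arranging activities.
print(pmi("play", "piano"), pmi("buy", "piano"))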
Conference Paper
Full-text available
Ontologies now play an important role for many knowledge-intensive applications for which they provide a source of precisely defined terms. However, their widespread usage brings problems concerning their proliferation. Ontology engineers or users frequently have a core ontology that they use, e.g., for browsing or querying data, but they need to extend it with, adapt it to, or compare it with the large set of other ontologies. For the task of detecting and retrieving relevant ontologies, one needs means for measuring the similarity between ontologies. We present a set of ontology similarity measures and a multiple-phase empirical evaluation.
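One family of such measures compares the taxonomic structure of two ontologies. The sketch below computes a simple taxonomic overlap based on semantic cotopies (a concept together with its super- and sub-concepts); it simplifies matters such as lexical matching of concept labels and is only an approximation of the measures evaluated in the paper.

def cotopy(parent_of, children_of, concept):
    # Semantic cotopy: the concept plus all its super- and sub-concepts.
    up, node = set(), concept
    while node in parent_of:
        node = parent_of[node]
        up.add(node)
    down, stack = set(), [concept]
    while stack:
        for child in children_of.get(stack.pop(), []):
            down.add(child)
            stack.append(child)
    return {concept} | up | down

def taxonomic_overlap(h1, h2, shared_concepts):
    # Average Jaccard overlap of cotopies over the shared concept set.
    scores = [len(cotopy(*h1, c) & cotopy(*h2, c)) / len(cotopy(*h1, c) | cotopy(*h2, c))
              for c in shared_concepts]
    return sum(scores) / len(scores)

# Hypothetical gold and learned hierarchies as (parent_of, children_of) pairs.
gold = ({"lose weight": "health", "quit smoking": "health"},
        {"health": ["lose weight", "quit smoking"]})
learned = ({"lose weight": "health"}, {"health": ["lose weight"]})
print(taxonomic_overlap(gold, learned, ["health", "lose weight"]))   # -> ~0.83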
Conference Paper
Full-text available
Service robots will have to accomplish more and more complex, open-ended tasks and regularly acquire new skills. In this work, we propose a new approach to the problem of generating plans for such household robots. Instead of composing them from atomic actions - the common approach in robot planning - we propose to transform task descriptions on web sites like ehow.com into executable robot plans. We present methods for automatically converting the instructions from natural language into a formal, logic-based representation, for resolving the word senses using the WordNet database and the Cyc ontology, and for exporting the generated plans into the mobile robot's plan language RPL. We discuss the problem of inferring information that is missing in these descriptions and the problem of grounding the abstract task descriptions in the perception and action system, and we propose techniques for solving them. The whole system works autonomously without human interaction. It has successfully been tested with a set of about 150 natural language directives, of which up to 80% could be correctly transformed.
Conference Paper
Full-text available
People interact with interfaces to accomplish goals, and knowledge about human goals can be useful for building intelligent user interfaces. We suggest that modeling high-level human goals like "repair my credit score" is especially useful for coordinating workflows between interfaces, automated planning, and building introspective applications. We analyzed data from 43Things.com, a website where users share and discuss goals and plans in natural language, and constructed a goal network that relates what goals people have with how people solve them. We then label goals with specific details, such as where the goal typically is met and how long it takes to achieve, facilitating plan and goal recognition. Lastly, we demonstrate a simple application of goal networks, deploying it in a mobile, location-aware to-do list application, ToDoGo, which uses goal networks to help users plan where and when to accomplish their desired goals.
Conference Paper
Full-text available
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
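The core scoring idea can be sketched in a few lines. The hit counts below are hypothetical; the actual algorithm obtains them by issuing queries (e.g. with a NEAR operator) to a web search engine.

# Hypothetical search-engine hit counts for a TOEFL-style question:
# which choice is the best synonym for "levied"?
hits = {
    "imposed": 2_000_000,
    "believed": 5_000_000,
    "levied NEAR imposed": 80_000,
    "levied NEAR believed": 10_000,
}

def pmi_ir_score(problem, choice):
    # Proportional to p(problem & choice) / p(choice); shared constants
    # cancel when comparing choices, which is all the ranking needs here.
    return hits[f"{problem} NEAR {choice}"] / hits[choice]

choices = ["imposed", "believed"]
print(max(choices, key=lambda c: pmi_ir_score("levied", c)))   # -> "imposed"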
Conference Paper
Full-text available
A significant portion of web search queries are name entity queries. The major search engines have been exploring various ways to provide better user experiences for name entity queries, such as showing "search tasks" (Bing search) and showing direct answers (Yahoo!, Kosmix). In order to provide the search tasks or direct answers that can satisfy most popular user intents, we need to capture these intents, together with relationships between them. In this paper we propose an approach for building a hierarchical taxonomy of the generic search intents for a class of name entities (e.g., musicians or cities). The proposed approach can find phrases representing generic intents from user queries, and organize these phrases into a tree, so that phrases indicating equivalent or similar meanings are on the same node, and the parent-child relationships of tree nodes represent the relationships between search intents and their sub-intents. Three different methods are proposed for tree building, which are based on directed maximum spanning tree, hierarchical agglomerative clustering, and pachinko allocation model. Our approaches are purely based on search logs, and do not utilize any existing taxonomies such as Wikipedia. With the evaluation by human judges (via Mechanical Turk), it is shown that our approaches can build trees of phrases that capture the relationships between important search intents.
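As an illustration of the spanning-tree variant (with hypothetical weights, and using networkx, which the paper does not necessarily use), one can place intent phrases as nodes, weight directed edges by how strongly one phrase subsumes another, and take the maximum spanning arborescence as the taxonomy:

import networkx as nx

g = nx.DiGraph()
# Hypothetical subsumption weights between intent phrases for musicians.
g.add_weighted_edges_from([
    ("ROOT", "tickets", 1.0),
    ("ROOT", "albums", 1.0),
    ("tickets", "cheap tickets", 0.8),
    ("tickets", "concert tickets", 0.7),
    ("albums", "new album", 0.6),
    ("albums", "cheap tickets", 0.1),   # weaker, competing parent
])

# The maximum spanning arborescence keeps exactly one parent per intent,
# preferring the strongest subsumption edges.
taxonomy = nx.maximum_spanning_arborescence(g)
print(sorted(taxonomy.edges()))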
Article
Full-text available
Classic IR (information retrieval) is inherently predicated on users searching for information, the so-called "information need". But the need behind a web search is often not informational -- it might be navigational (give me the url of the site I want to reach) or transactional (show me sites where I can perform a certain transaction, e.g. shop, download a file, or find a map). We explore this taxonomy of web searches and discuss how global search engines evolved to deal with web-specific needs.
Conference Paper
Full-text available
A novice search engine user may find searching the web for information difficult and frustrating because she may naturally express search goals rather than the topic keywords search engines need. In this paper, we present GOOSE (goal-oriented search engine), an adaptive search engine interface that uses natural language processing to parse a user's search goal, and uses "common sense" reasoning to translate this goal into an effective query. For a source of common sense knowledge, we use Open Mind, a knowledge base of approximately 400,000 simple facts such as "If a pet is sick, take it to the veterinarian" garnered from a Web-wide network of contributors. While we cannot be assured of the robustness of the common sense inference, in a substantial number of cases, GOOSE is more likely to satisfy the user's original search goals than simple keywords or conventional query expansion.
Article
Full-text available
Over the past decade, a person-century of effort has gone into building CYC, a universal schema of roughly 10^5 general concepts spanning human reality. Most of the time has been spent codifying knowledge about these concepts; approximately 10^6 commonsense axioms have been handcrafted for and entered into CYC's knowledge base, and millions more have been inferred and cached by CYC. This paper studies the fundamental assumptions of doing such a large-scale project, reviews the technical lessons learned by the developers, and surveys the range of applications that are enabled by the technology.
Article
Building a machine that truly learns by itself will require commonsense knowledge representing the kinds of things even a small child already knows.
Article
The Open Mind Common Sense project has been collecting common-sense knowledge from volunteers on the Internet since 2000. This knowledge is represented in a machine-interpretable semantic network called ConceptNet. We present ConceptNet 3, which improves the acquisition of new knowledge in ConceptNet and facilitates turning edges of the network back into natural language. We show how its modular design helps it adapt to different data sets and languages. Finally, we evaluate the content of ConceptNet 3, showing that the information it contains is comparable with WordNet and the Brandeis Semantic Ontology.
Conference Paper
We survey many of the measures used to describe and evaluate the efficiency and effectiveness of large-scale search services. These measures, herein visualized versus verbalized, reveal a domain rich in complexity and scale. We cover six principal facets of search: the query space, users' query sessions, user behavior, operational requirements, the content space, and user demographics. While this paper focuses on measures, the measurements themselves raise questions and suggest avenues of further investigation.
Conference Paper
Since mobile Internet services are rapidly proliferating, finding the most appropriate service or services from among the many offered requires profound knowledge about the services, which is becoming virtually impossible for ordinary mobile users. We propose a system that assists non-expert mobile users in finding the appropriate services that solve the real-world problems encountered by the user. Key components are a task knowledge base of tasks that a mobile user performs in daily life and a service knowledge base of services that can be used to accomplish user tasks. We present the architecture of the proposed system including a knowledge modeling framework, and a detailed description of a prototype system. We also show preliminary user test results; they indicate that the system allows a user to find appropriate services more quickly and with less effort than conventional commercial methods.
Conference Paper
Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
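As a toy illustration of the open extraction idea (not TextRunner's trained extractor or probability model), the snippet below pulls (argument, relation, argument) tuples out of simple sentences with a crude pattern:

import re

sentences = [
    "Edison invented the phonograph.",
    "Berlin is the capital of Germany.",
]

# Crude subject-verb-object pattern; a real OIE system learns its extractor
# and assigns a probability to every tuple it emits.
pattern = re.compile(r"^(\w+) (\w+) (?:the |a |an )?([\w ]+)\.$")

tuples = [pattern.match(s).groups() for s in sentences if pattern.match(s)]
print(tuples)
# [('Edison', 'invented', 'phonograph'), ('Berlin', 'is', 'capital of Germany')]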
Article
The article focuses on cognitive modeling for games and animation, and deals with developing commonsense-based user interfaces: a machine that truly learns by itself demands commonsense knowledge representing activities that are commonly known. In order for computers to understand humans, it is essential to equip them with adequate knowledge. Commonsense thoughts refer to things that most people can do without even knowing they are doing them. Machines do not possess these thoughts, as they are unable to comprehend meaning, which is an intuitive thing. Acquiring commonsense thought invokes the aspect of knowledge about how to think. The most usual way to represent knowledge is to select a representation; commonsense knowledge, however, requires multiple representations, and there is no single best way to represent knowledge. A proficient architecture theory based on multiple representations and multi-modal reasoning can assist in designing systems that allow studying and understanding commonsense knowledge.
Article
A better understanding of what motivates humans to perform certain actions is relevant for a range of research challenges including generating action sequences that implement goals (planning). A first step in this direction is the task of acquiring knowledge about human goals. In this work, we investigate whether Search Query Logs are a viable source for extracting expressions of human goals. For this purpose, we devise an algorithm that automatically identifies queries containing explicit goals such as "find home to rent in Florida". Evaluation results of our algorithm achieve useful precision/recall values. We apply the classification algorithm to two large Search Query Logs, recorded by AOL and Microsoft Research in 2006, and obtain a set of ∼110,000 queries containing explicit goals. To study the nature of human goals in Search Query Logs, we conduct qualitative, quantitative and comparative analyses. Our findings suggest that Search Query Logs (i) represent a viable source for extracting human goals, (ii) contain a great variety of human goals and (iii) contain human goals that can be employed to complement existing commonsense knowledge bases. Finally, we illustrate the potential of goal knowledge for addressing the following application scenario: to refine and extend commonsense knowledge with human goals from Search Query Logs. This work is relevant for (i) knowledge engineers interested in acquiring human goals from textual corpora and constructing knowledge bases of human goals and (ii) researchers interested in studying characteristics of human goals in Search Query Logs.
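A hedged, rule-based approximation of such a goal-query classifier is sketched below (toy verb lexicon and patterns; the paper's actual algorithm and lexical resources are not reproduced):

import re

ACTION_VERBS = {"find", "buy", "lose", "learn", "get", "quit", "rent"}   # toy lexicon

def has_explicit_goal(query):
    # Flag queries that start with an action verb or contain "how to <verb>".
    tokens = query.lower().split()
    return bool(tokens) and (tokens[0] in ACTION_VERBS
                             or re.search(r"\bhow to \w+", query.lower()) is not None)

queries = ["find home to rent in florida", "britney spears",
           "how to lower blood pressure"]
print([q for q in queries if has_explicit_goal(q)])
# -> ['find home to rent in florida', 'how to lower blood pressure']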
Article
Knowing a user's plans and goals can significantly improve the effectiveness of an interactive system. However, recognizing such goals and the user's intended plan for achieving them is not an easy task. Although much research has dealt with representing the knowledge necessary for plan inference and developing strategies that hypothesize the user's evolving plans, a number of serious problems still impede the use of plan recognition in large-scale, real-world applications. This paper describes the various approaches that have been taken to plan inference, along with techniques for dealing with ambiguity, robustness, and representation of requisite domain knowledge, and discusses areas for further research.
Book
Standard formalisms for knowledge representation such as RDFS or OWL have been recently developed by the semantic web community and are now in place. However, the crucial question still remains: how will we acquire all the knowledge available in people's heads to feed our machines? Natural language is THE means of communication for humans, and consequently texts are massively available on the Web. Terabytes and terabytes of texts containing opinions, ideas, facts and information of all sorts are waiting to be mined for interesting patterns and relationships, or used to annotate documents to facilitate their retrieval. A semantic web which ignores the massive amount of information encoded in text might actually be a semantic, but not a very useful, web. Knowledge acquisition, and in particular ontology learning from text, actually has to be regarded as a crucial step within the vision of a semantic web. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications presents approaches for ontology learning from text and will be relevant for researchers working on text mining, natural language processing, information retrieval, semantic web and ontologies. Containing introductory material and a quantity of related work on the one hand, but also detailed descriptions of algorithms, evaluation procedures etc. on the other, this book is suitable for novices and experts in the field, as well as lecturers. Datasets, algorithms and course material can be downloaded at http://www.cimiano.de/olp. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications is designed for practitioners in industry, as well as researchers and graduate-level students in computer science. © 2006 Springer Science+Business Media, LLC. All rights reserved.
Article
We describe ConceptNet, a freely available semantic network presently consisting of over 250,000 elements of commonsense knowledge. Inspired by Cyc, ConceptNet includes a wide range of commonsense concepts and relations, and inspired by WordNet, it is structured as a simple, easy-to-use semantic network. ConceptNet supports many of the same applications as WordNet, such as query expansion and determining semantic similarity, but it also allows simple temporal, spatial, affective, and several other types of inferences. This paper is structured as follows. We first discuss how ConceptNet was built and the nature and structure of its contents. We then present the ConceptNet toolkit, a reasoning system designed to support textual reasoning tasks by providing facilities for spreading activation, analogy, and path-finding between concepts. Third, we provide some quantitative and qualitative analyses of ConceptNet. We conclude by describing some ways we are currently exploring to improve ConceptNet.
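The spreading-activation facility can be illustrated with a toy graph (hypothetical concepts, weights and decay factor; ConceptNet's toolkit provides richer relation types and scoring):

from collections import defaultdict

# Hypothetical weighted concept graph in the spirit of ConceptNet edges.
edges = {
    "exercise": [("lose weight", 0.9), ("gym", 0.7)],
    "lose weight": [("diet", 0.8)],
    "gym": [("treadmill", 0.6)],
}

def spread(seed, decay=0.5, steps=2):
    # Propagate activation outward from the seed, attenuating at each hop.
    activation = defaultdict(float)
    activation[seed] = 1.0
    frontier = {seed}
    for _ in range(steps):
        nxt = set()
        for node in frontier:
            for neighbor, weight in edges.get(node, []):
                gain = activation[node] * weight * decay
                if gain > activation[neighbor]:
                    activation[neighbor] = gain
                    nxt.add(neighbor)
        frontier = nxt
    return dict(activation)

print(spread("exercise"))   # concepts closer to "exercise" get higher activation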
Learning concept hierarchies from text corpora using formal concept analysis
  • P. Cimiano
  • A. Hotho
  • S. Staab