Article

Acquiring knowledge about human goals from Search Query Logs

Abstract

A better understanding of what motivates humans to perform certain actions is relevant for a range of research challenges, including generating action sequences that implement goals (planning). A first step in this direction is the task of acquiring knowledge about human goals. In this work, we investigate whether Search Query Logs are a viable source for extracting expressions of human goals. For this purpose, we devise an algorithm that automatically identifies queries containing explicit goals such as "find home to rent in Florida". In our evaluation, the algorithm achieves useful precision/recall values. We apply the classification algorithm to two large Search Query Logs, recorded by AOL and Microsoft Research in 2006, and obtain a set of ∼110,000 queries containing explicit goals. To study the nature of human goals in Search Query Logs, we conduct qualitative, quantitative and comparative analyses. Our findings suggest that Search Query Logs (i) represent a viable source for extracting human goals, (ii) contain a great variety of human goals and (iii) contain human goals that can be employed to complement existing commonsense knowledge bases. Finally, we illustrate the potential of goal knowledge for addressing the following application scenario: to refine and extend commonsense knowledge with human goals from Search Query Logs. This work is relevant for (i) knowledge engineers interested in acquiring human goals from textual corpora and constructing knowledge bases of human goals and (ii) researchers interested in studying characteristics of human goals in Search Query Logs.
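
To make the notion of an "explicit goal" query concrete, the following is a minimal sketch of a rule-based filter. It is not the classification algorithm from the paper, whose features are not described in this abstract; the verb list `GOAL_VERBS`, the "how to" prefix rule and the function name `looks_like_explicit_goal` are all assumptions chosen for illustration.

```python
import re

# Hypothetical, hand-picked verb list (an assumption for this sketch);
# the paper's actual classifier and features are not given in the abstract.
GOAL_VERBS = {"find", "buy", "get", "make", "learn", "lose", "rent", "sell", "build"}


def looks_like_explicit_goal(query: str) -> bool:
    """Return True if the query starts with a goal verb followed by more words."""
    tokens = re.findall(r"[a-z']+", query.lower())
    # Allow an optional "how to" prefix, as in "how to lose weight".
    if tokens[:2] == ["how", "to"]:
        tokens = tokens[2:]
    # Require a verb plus at least one further token describing the goal.
    return len(tokens) >= 2 and tokens[0] in GOAL_VERBS


queries = [
    "find home to rent in Florida",
    "florida weather",
    "how to lose weight",
    "buy",
]
goal_queries = [q for q in queries if looks_like_explicit_goal(q)]
print(goal_queries)
```

A learned classifier would replace the fixed verb list with features estimated from labeled queries; the sketch only shows the kind of input/output behavior such a filter has.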

... Transform the data into the required form to apply the proposed data mining approaches in a supervised or unsupervised format. Now the dataset is ready to prepare the intention map or pseudo map, or to be divided into intention categories [17,18]. ...
... [17] showed that search query logs represented a viable, yet largely untapped, source for acquiring knowledge about human goals. Four types of wish detectors were proposed in [18] to gain insight into the world's wants and desires. They analyzed 80,000 English New Year wish sentences. ...
... Dataset | Research paper | Quantity
Tweets | [9], [14], [22], [25], [28], [29], [23] | 7
Facebook comments | [10] | 1
Product reviews | [11], [15], [18], [19], [30] | 5
Questionnaire Survey | [16], [27], [24] | 3
Web Search Log | [21], [31], [32], [17], [19], [20] | 4
Political reviews | [18] | 1
Chat logs | [12] | 1
Figure 5 reflects the classification of articles by quantity of used dataset as follows: ...
Article
Full-text available
Text mining is a framework for retrieving valuable knowledge from unstructured textual documents. The extracted knowledge is presented as facts in a form users can understand. Text mining is further classified into information retrieval, NLP, statistics, web mining and intention mining. An intention is a human mental state that represents a current or future action. Intention mining is an up-and-coming research area of text mining that describes what a customer actually wants and what actions he or she may take in the future. It explicitly finds what people want to happen, not just what they like or dislike. This paper aims to give a deep insight into intention mining by categorizing user intentions such as sale/purchase, wish, emotional, search and real-time intentions. It will help upcoming researchers to review the research done on intention mining and to compare the adopted approaches in order to find the optimal method. A user may express an intention implicitly or explicitly. An explicit intention is the direct expression of the user's wishes, which can easily be detected in text documents. Implicit intentions are communicated indirectly by the user through other features of the related object. Multiple classification, clustering, keyword-based and machine learning techniques are used on different datasets to extract user intentions. The analysis finds that the most frequently used dataset for intention mining nowadays is microblog tweets, and the most frequently used techniques are support vector machines and Naïve Bayes, with the maximum accuracy rate.
... The mainstream research on intention mining lies in the domain of information retrieval (Jathava et al., 2011), (Baeza Yates et al., 2006) (González-Caro & Baeza-Yates, 2011), (Hashemi et al., 2008), (Sadikov et al., 2010), (Strohmaier & Kröll, 2012), (Zheng et al., 2002). Other applications have also been published, e.g. ...
... A quick search in the literature reveals that (a) many intention mining techniques have already been proposed, and (b) this research area is extremely dynamic with new contributions continuously published. Rather than aiming at a systematic literature review, this section first introduces the area by describing three particular approaches that were selected because of their impact or originality: (Strohmaier & Kröll, 2012), (Baeza et al., 2006) and (Outmazgin & Soffer, 2013). ...
... Strohmaier & Kröll, 2012 Strohmaier and Kröll's method is one of the many approaches to acquire knowledge about human intentions (the word used here is "goal") by investigating web engine query logs. The idea is that better understanding the rationale behind the actions of web engine users can be useful to deal with a range of issues such as recognizing users' intentions, reasoning about them, or generating plans to help them achieve their intentions. ...
... In the information retrieval context, the key idea is to better understand the rationale behind users' activities on Web engines. This can be useful for dealing with a range of issues, such as recognizing users' intentions, reasoning about them, or generating plans to help users achieve their intentions [Strohmaier 2012, Hashemi 2008, Baeza-Yates 2006, Park 2010, Jethava 2011, González-Caro 2011]. Most intention mining techniques focus on mining individual intentions out of Web engine queries. ...
... The salient feature of Strohmaier and Kröll's approach is that it differentiates between implicit and explicit intentions [Strohmaier 2012]. Implicit intentions underlie what is expressed by people or can be observed from them. ...
... • Discovery: intention mining techniques mainly deal with the intention discovery problem [Strohmaier 2012, Hashemi 2008, Baeza-Yates 2006, Park 2010, Jethava 2011, González-Caro 2011]. Discovering intentions helps in understanding how humans think and how human brains work, and in identifying users' intents behind their activities. ...
Article
So far, process mining techniques have modeled processes in terms of the tasks that occur during the enactment of a process. However, research on process modeling has illustrated that many issues, such as lack of flexibility or adaptation, are solved more effectively when intentions are explicitly specified. This thesis presents a novel approach to process mining, called the Map Miner Method (MMM). This method is designed to automate the construction of intentional process models from traces. MMM uses Hidden Markov Models to model the relationship between users' activities and the strategies (i.e., the different ways to fulfill the intentions). The method also includes two specific algorithms developed to infer users' intentions and to construct the intentional process model (Map), respectively. MMM can construct Map process models with different levels of granularity (pseudo-Map and Map process models) with respect to the Map metamodel formalism. The entire proposed method was applied and validated on practical traces in a large-scale experiment, on event logs of developers from the Eclipse UDC (Usage Data Collector). The resulting Map process models provide a valuable understanding of the processes followed by the developers, provide feedback on the effectiveness of MMM, and demonstrate its scalability in terms of traces. The Map Miner tool has been developed to put the proposed approach into practice. It permits users to obtain the pseudo-Map and Map process models from traces.
Article
Full-text available
Understanding people's goals is a challenging issue that is met in many different areas such as security, sales, information retrieval, etc. Intention Mining aims at uncovering intentions from observations of actual activities. While most Intention Mining techniques proposed so far focus on mining individual intentions to analyze web engine queries, this paper proposes a generic technique to mine intentions from activity traces. The proposed technique relies on supervised learning and generates intentional models specified with the Map formalism. The originality of the contribution lies in the demonstration that it is actually possible to reverse engineer the underlying intentional plans built by people when in action, and specify them in models e.g. with intentions at different levels, dependencies, links with other concepts, etc. After an introduction on intention mining, the paper presents the Supervised Map Miner Method and reports two controlled experiments that were undertaken to evaluate precision, recall and F-Score. The results are promising since the authors were able to find the intentions underlying the activities as well as the corresponding map process model with satisfying accuracy, efficiency and performance.
... The identification of online users' commercial intents has been quite an important research problem in the past. Most research focuses on capturing commercial intention from search queries (Dai et al., 2006; Strohmaier and Kröll, 2012), clickthrough behaviors (Ashkan and Clarke, 2009), users' mouse movements or scrolling behaviors (Guo and Agichtein, 2010) and search logs (Strohmaier and Kröll, 2012). The most related to our work is the work in (Hollerit et al., 2013), which attempts to detect commercial intent on Twitter. ...
... Guo et al. [5] attempted to differentiate between search intents by using interaction features such as mouse movements or scrolling behavior. In previous work Strohmaier et al. [13] showed that search query logs represented a viable, yet largely untapped, source for acquiring knowledge about human goals. ...
... By "trivial to identify" Kirsh means the ability to make a decision in constant time. This definition was adapted from previous work ([13]) to serve the specific needs of our research. ...
Conference Paper
Full-text available
Since more and more people use the micro-blogging platform Twitter to convey their needs and desires, it has become a particularly interesting medium for the task of identifying commercial activities. Potential buyers and sellers can be contacted directly, thereby opening up novel perspectives and economic possibilities. By detecting commercial intent in tweets, this work is considered a first step toward bringing together buyers and sellers. In this work, we present an automatic method for detecting commercial intent in tweets, with which we achieve reasonable precision (57%) and recall (77%). In addition, we provide insights into the nature and characteristics of tweets exhibiting commercial intent, thereby contributing to our understanding of how people express commercial activities on Twitter.
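
As a reading aid for the two figures above, the short sketch below combines precision and recall into an F1 score (their harmonic mean). The abstract itself does not report F1; the value computed here is derived only from the stated 57% and 77%, and the helper name `f1_score` is an assumption for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract above.
f1 = f1_score(0.57, 0.77)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.655"
```

Because the harmonic mean is pulled toward the smaller value, the combined score sits closer to the precision than to the recall.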
... feature request, opinion asking, problem discovery, solution proposal, information giving, etc.). Baeza-Yates et al. (2006), Strohmaier and Kröll (2012) have proposed the development of specific approaches that work for the automatic identification of the user's interest. Baum and Eagon (1967) developed a new strategy to model and mine the captured intentions of camcorder users using digital video recorders and home video data. ...
Article
Full-text available
Process Mining focused only on the activity-oriented process and neglected the users’ behaviors behind the activities, which led to overlooking the reality that they proposed to create. Recognizing the users’ underlying intentions can improve the guidance and offer better recommendations. As a result, an area of study known as Intention Mining has emerged. It aims at discovering the users’ behaviors using an event log. Intention is frequently used in different computer science research fields, including requirements definition, business process, and method engineering for context adaptation. This paper reviews Intention-Oriented Process Mining based on event logs in the information systems engineering field. The objective is to identify the different models, methodologies, and algorithms proposed, the tools used, and the different challenges in these fields, based on the four review steps of the selection process, which start with identification, followed by screening, eligibility, and inclusion. For the first time, we focus on Process Mining and Intention Mining based on log files and on their relationship, to get an idea of the area of Intention Mining. This paper reviews academic papers published in peer-reviewed venues from 2013 to 2022. These papers were examined through six main research questions and a systematic review. We also detail the existing approaches in the Intention Mining area and present our comparative study. The results of the existing approaches indicate that Intention Mining shows a meaningful trace of research and creates existing opportunities for real technical applications.
... Dai et al. (2006) first proposed to identify search queries that contain online commercial intention. Since queries do not carry much information, much research extends the queries by including information extracted from search logs (Strohmaier and Kröll 2012), click through behavior data (Ashkan and Clarke 2009b), and users' mouse movements behaviors data (Guo and Agichtein 2010). ...
Article
Social media platforms are often used by people to express their needs and desires. Such data offer great opportunities to identify users’ consumption intentions from user-generated content, so that better tailored products or services can be recommended. However, there have been few efforts on mining commercial intents from social media content. In this paper, we investigate the use of social media data to identify consumption intentions for individuals. We develop a Consumption Intention Mining Model (CIMM) based on a convolutional neural network (CNN) for identifying whether the user has a consumption intention. The task is domain-dependent, and training a CNN requires a large number of annotated instances, which may be available only in some domains. Hence, we investigate the possibility of transferring the CNN mid-level sentence representation learned from one domain to another by adding an adaptation layer. To demonstrate the effectiveness of CIMM, we conduct experiments on two domains. Our results show that CIMM offers a powerful paradigm for effectively identifying users’ consumption intentions based on their social media data. Moreover, our results also confirm that the CNN learned in one domain can be effectively transferred to another domain. This suggests great potential for our model to significantly increase the effectiveness of product recommendations and targeted advertising.
... Online commercial intention identification This task is to identify online commercial intention from queries, documents or tweets. Most studies focus on capturing commercial intent by analyzing search queries (Dai et al. 2006; Strohmaier and Kröll 2012) or click-through (Ashkan and Clarke 2009). Chen et al. (2013) aim at identifying intents expressed in forum posts. ...
Article
In this paper, we propose to study the problem of identifying and classifying tweets into intent categories. For example, a tweet “I wanna buy a new car” indicates the user’s intent for buying a car. Identifying such intent tweets will have great commercial value, among others. In particular, it is important that we can distinguish different types of intent tweets. We propose to classify intent tweets into six categories, namely Food & Drink, Travel, Career & Education, Goods & Services, Event and Activities, and Trifle. We propose a semi-supervised learning approach to categorizing intent tweets into the six categories. We construct a test collection by using a bootstrap method. Our experimental results show that our approach is effective in inferring intent categories for tweets.
... The smart systems are able to gain people's knowledge about new ways of executing tasks or simply people's expectations for their behaviour (Strohmaier, 2012). Knowledge acquired directly from people, even if they are affording to attain much more knowledge by using machine learning, is often an essential skill for smart systems. ...
Chapter
Knowledge representation is of immense importance in the field of artificial intelligence and natural language processing. The representation of knowledge goes hand in hand with automated reasoning as one of the key goals of representing knowledge effectively is being able to reason about it. Researchers of knowledge representation and reasoning have built techniques and methods that are the main source of development in computer science and have made tremendous progress in a wide variety of real-life applications, ranging from natural language processing to robotics and software engineering. Further research is required in order to allow a more active role in guiding the reasoning process through the knowledge representation framework. This article has discussed knowledge representation and reasoning and analyzed the major challenges and new opportunities where novel knowledge representation and reasoning research have had a major impact.
... Weak Supervision Data Collection. Inspired by past work [57] that demonstrated search engine queries can contain explicit user objectives (e.g., get rid of belly fat), we first reify every goal in the goal taxonomy with a set of seed queries. For example, we manually generate queries such as "how to be charismatic" and "how to meet new friends" to elicit the high-ordered goal of being likeable, making friends, drawing others near. ...
Preprint
Full-text available
Motives or goals are recognized in psychology literature as the most fundamental drive that explains and predicts why people do what they do, including when they browse the web. Although providing enormous value, these higher-ordered goals are often unobserved, and little is known about how to leverage such goals to assist people's browsing activities. This paper proposes to take a new approach to address this problem, which is fulfilled through a novel neural framework, Goal-directed Web Browsing (GoWeB). We adopt a psychologically-sound taxonomy of higher-ordered goals and learn to build their representations in a structure-preserving manner. Then we incorporate the resulting representations for enhancing the experiences of common activities people perform on the web. Experiments on large-scale data from Microsoft Edge web browser show that GoWeB significantly outperforms competitive baselines for in-session web page recommendation, re-visitation classification, and goal-based web page grouping. A follow-up analysis further characterizes how the variety of human motives can affect the difference observed in human behavioral patterns.
... This is an important task widely applicable in goal-oriented dialog systems, conversation analysis and online advertisement, and supervised learning methods [5][6][7][8][9][10][11][12] are typically adopted to learn classifiers from labeled intent datasets. According to the different application scenarios, intent recognition can be categorized into (1) query intent classification (e.g., a search engine [13][14][15]); (2) intent identification from social media (e.g. Twitter messages) [16,17]; (3) user intent understanding in a dialog system [6,18,19]. ...
Preprint
Full-text available
Intent understanding plays an important role in dialog systems, and is typically formulated as a supervised classification problem. However, it is challenging and time-consuming to design the intent labels manually to support a new domain. This paper proposes an unsupervised two-stage approach to discover intents and generate meaningful intent labels automatically from a collection of unlabeled utterances. In the first stage, we aim to generate a set of semantically coherent clusters where the utterances within each cluster convey the same intent. We obtain the utterance representation from various pre-trained sentence embeddings and present a metric of balanced score to determine the optimal number of clusters in K-means clustering. In the second stage, the objective is to generate an intent label automatically for each cluster. We extract the ACTION-OBJECT pair from each utterance using a dependency parser and take the most frequent pair within each cluster, e.g., book-restaurant, as the generated cluster label. We empirically show that the proposed unsupervised approach can generate meaningful intent labels automatically and achieves high precision and recall in utterance clustering and intent discovery.
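
The second stage described above (taking the most frequent ACTION-OBJECT pair in each cluster as its label) can be sketched as follows. This is an illustrative sketch only: it assumes the pairs have already been extracted by a dependency parser and grouped by cluster, and the function name `label_clusters` and the toy data are hypothetical.

```python
from collections import Counter


def label_clusters(pairs_by_cluster):
    """Map each cluster id to its most frequent ACTION-OBJECT pair, as 'action-object'."""
    labels = {}
    for cluster_id, pairs in pairs_by_cluster.items():
        if not pairs:
            continue  # no pair was extracted for this cluster, so it gets no label
        (action, obj), _count = Counter(pairs).most_common(1)[0]
        labels[cluster_id] = f"{action}-{obj}"
    return labels


# Toy input (assumed): (action, object) pairs already extracted per cluster.
example = {
    0: [("book", "restaurant"), ("book", "restaurant"), ("reserve", "table")],
    1: [("play", "music"), ("play", "song"), ("play", "music")],
}
labels = label_clusters(example)
print(labels)
```

Ties between equally frequent pairs are resolved here by first occurrence, which is how `Counter.most_common` orders equal counts; the abstract does not say how the paper handles ties.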
... According to the Review of Information Extraction [21], the traditional approach to the extraction of information is based on constructing lexical databases, such as WordNet [9]. The second idea is based on the concept of constructing a comprehensive reflection of the factual content of the text [3,1]. ...
Article
This article presents the approach to the acquisition of expert knowledge and includes the following stages: (1) observation-verbal reporting, (2) obtainment of a record of the observation, (3) protocol analysis, (4) formal record of expert knowledge. The first part of the article discusses each element of our approach based on a real-time case study. Next, the structure and exemplar functionalities of the proposed information system for the acquisition of expert knowledge are presented. Our information system supporting the acquisition of expert knowledge, as presented here, is based on the example of service department workers but may be used in other areas where the procedures for performing any given activity are definable.
... Other uses for query logs include personalization of search results, search terms autocompletion, and correction of spelling errors in user queries, among others. For example, query logs have been used for acquiring knowledge about search user higher-level goals (Strohmaier and All, 2012); sampling for improved query probing in noncooperative distributed information retrieval environments (Shokouhi et al., 2007); capturing the hidden semantics of search queries (Bing et al., 2018); and discovering children's web search behavior (Duarte Torres et al., 2010). ...
Chapter
This chapter presents a tutorial introduction to modern information retrieval concepts, models, and systems. It begins with a reference architecture for the current Information Retrieval (IR) systems, which provides a backdrop for rest of the chapter. Text preprocessing is discussed using a mini Gutenberg corpus. Next, a categorization of IR models is presented followed by Boolean IR model description. Positional index is introduced, and execution of phrase and proximity queries is discussed. Various term weighting schemes are discussed next followed by descriptions of three IR models—Vector Space, Probabilistic, and Language models. Approaches to evaluating IR systems are presented. Relevance feedback techniques as a means to improving retrieval effectiveness are described. Various IR libraries, frameworks, and test collections are indicated. The chapter concludes by outlining facets of IR research and indicating additional reading.
... As future work, we plan to evaluate the utility of the results provided by our method with other user-centered data uses of query logs, such as behavioral analysis [6] or goal extraction [42]. ...
Article
Full-text available
Query logs are of great interest for data analysis. They allow characterizing user profiles, user behaviors and search habits. However, since query logs usually contain personal information, data controllers should implement appropriate data protection mechanisms before releasing them for secondary use. In the past, the anonymization of query logs was tackled from the perspective of statistical disclosure control and by relying on privacy models such as k-anonymity, which do not scale well with the high dimensionality and dynamicity of query logs. To offer better privacy protection, some authors have recently embraced the robust privacy guarantees of ɛ-differential privacy. However, this comes at the cost of limiting the number and types of analyses that can be made on the protected queries. To tackle this issue, in this paper we propose a privacy protection method for query logs that joins the flexibility and convenience of privacy-preserving data releases with the strong privacy guarantees of ɛ-differential privacy. Moreover, to retain the analytical utility of the protected query, we have put special care in capturing, managing and preserving the semantics of the queries during the protection process. The empirical experiments we report show that our method produces differentially private query logs that are more useful for analysis than related works.
... "Can we teach computers to reliably and accurately understand human intentions?" is of course one of the great challenges of science, and language-related technology is one of the great opportunities of information technology due to the need to automatically analyze large amounts of information stored within arbitrary text sources on the internet [1]. Yet the acquisition of knowledge about common human goals represents a major challenge [3]. We attempt to make use of the 43things [6] online social network, which contains a great wealth of information about humans' goals and how to achieve them. ...
Technical Report
Full-text available
Intention detection is one of the main components of human language understanding, which allows user goals to be identified. A challenging sub-task of intention detection is building a knowledge base of human intentions. We propose a technique that builds a human intentions knowledge base extracted from the 43things online social network. In addition, we present results from a study that focused on evaluating intent profiles generated from transcripts of Egyptian presidential candidate speeches in
... Most of them try to categorize the queries as informational, navigational and transactional, as proposed by Jansen et al. [32]. Given a query suggestion, efforts have been made to understand the user intention using different means like web search logs [26], [33][34][35][36], previous users' search logs for the same query [37], clicked pages [38], the user's search session history [39], Wikipedia [40], WordNet and Google n-gram [41]. Using search query logs of existing users to identify intention cannot guarantee the correctness of search results [37]. ...
Article
Full-text available
Finding the required URL among the first few result pages of a search engine is still a challenging task. This may require a number of reformulations of the search string, thus adversely affecting the user's search time. Query ambiguity and polysemy are major reasons for not obtaining relevant results in the top few result pages. Efficient query composition and data organization are necessary for getting effective results. The context of the information need and the user intent may improve the autocomplete feature of existing search engines. This research proposes a Funnel Mesh-5 algorithm (FM5) to construct a search string taking into account the context of the information need and the user intention, with three main steps: 1) predict user intention with user profiles and past searches via a weighted mesh structure; 2) resolve ambiguity and polysemy of search strings with context and user intention; 3) generate a personalized, disambiguated search string by query expansion encompassing user intention and the predicted query. Experimental results for the proposed approach and a comparison with direct use of a search engine are presented. A comparison of the FM5 algorithm with the K Nearest Neighbor algorithm for user intention identification is also presented. The proposed system provides better precision of search results for ambiguous search strings, with improved identification of the user intention. Results are presented for an English language dataset as well as a Marathi (an Indian language) dataset of ambiguous search strings.
... For semantic features we explore user posts on a number of words denoting sentimental (positive and negative) attitude and cognitive work with the help of LIWC dictionary [7]. Moreover, we detect phrases of users that indicate their intentions from a linguistic point of view [5,8]. Furthermore, we added modularity [6] of a snapshot where a user appears. ...
... In [7], Gupta et al. follow an approach that defines domain-specific features, such as purchase action words, using the dependency structure of sentences. In web search, Strohmaier and Kröll [15] develop a method that learns a classifier from syntactic structure of (explicit) intent phrases and constructs a knowledge base for those intents with the search results obtained from the intent phrases as queries. The knowledge base is used to mark intents in a given document according to similarity measurements. ...
Conference Paper
Full-text available
Intent classification refers to the process of identifying a set of intents of interest that appear in a given document. This work considers the task of annotating travel-related reviews with travel intents that best represent the reviewer's reason for visiting the place of interest (POI). A domain-tailored word embedding model is learned to construct intent-specific feature vectors, thereby improving classification accuracy. The feasibility of multiclass intent classification is explored using an intent corpus, consisting of 6,560 labelled reviews.
... Both are representational states, expressing a possible attitude towards a current state of affairs, occupying however different places in the path that leads towards action: An intention is the result of a reflection process, pondering through different desires and perspectives, being a step closer to real action than a desire [17]. Analyzing intent is orthogonal to sentiment analysis as well as opinion mining [14] and provides a different perspective about human goals and desires. Strohmaier and Kröll [25][27] propose a novel NLP application called Intent Analysis, focusing on the extraction of goals and intentions present in textual context. Intent Analysis is similar to Sentiment Analysis; the main difference is that while Sentiment Analysis labels texts as "positive" or "negative", Intent Analysis aims to classify text by the presence or absence of an intent in its contents. ...
Conference Paper
Full-text available
Traditional approaches for process modeling usually comprise the control flow of well-structured activities that an organization performs in order to achieve its objectives. However, many processes involving decision-making and creativity do not follow a well-structured flow of activities, having rather a more ad-hoc nature at each instance. Knowledge Intensive Processes (KIP) is an example of this kind of process. It is difficult to gather information about a KIP and create a representative model, since it might vary from instance to instance due to decisions made by its participants. The contextual information of each activity - as well as the desires and intentions of the participants - are vital to the complete understanding of the process itself. In this paper, we propose a method to extract intentions and desires from KIP participants using NLP Techniques and social media content, as well as exploring its possibilities on a real case study using Twitter.
... Moreover, one further feature was extracted while mining user posts. We detect phrases of users that indicate their intentions from a linguistic point of view [8,13]. ...
Article
Full-text available
Understanding the fluctuation of users helps stakeholders to provide better support to communities. Below we present an experiment where we detect communities and their evolution and, based on the data, characterize users that stay in, leave or join a community. Using the resulting feature set and logistic regression, we operate with models of users that are joining and users that are staying in a community. In the related work we emphasize a number of features we will include in our future experiments to enhance training accuracy. This work represents the first in a series of experiments devoted to user fluctuation in communities.
... Over the past years, there has been an increasing awareness of the user's goals and intentions and such information has been proven important in a variety of applications which support information search, retrieval (Rose and Levinson, 2004; Strohmaier, 2008; Strohmaier and Kröll, 2012) and social networking (43Things Website mentioned earlier). ...
Chapter
Intention Mining aims to manipulate large volumes of data, integrate information from different sources and formats, and extract useful insights as facts from this data in order to discover users’ intentions. It is used in different fields: Robotics, Network forensics, Security, Bioinformatics, Learning, Map Visualization, Game, etc. There is a large variety of intention mining techniques applied to different domains such as information retrieval, security, robotics, etc. However, no systematic review has been conducted on this recent research domain. There is a need to understand what Intention Mining is, what its purpose is, and what the existing techniques and tools to mine intentions are. In this paper, we propose a comparison framework to structure and describe the domain of Intention Mining for a further complete systematic literature review of this field. We validate our comparison framework by applying it to five relevant approaches in the domain.
Conference Paper
Full-text available
In medical processes such as surgical procedures and trauma resuscitations, medical teams perform treatment activities according to underlying invisible goals or intentions. In this study, we present an approach to uncover these intentions from observed treatment activities. Developed on top of a hierarchical hidden Markov model (H-HMM), our approach can identify multi-level intentions. To accurately infer the H-HMM, we used a state-splitting method with maximum a posteriori probability (MAP) as the scoring function. We evaluated our approach both qualitatively and quantitatively, using a case study of the trauma resuscitation process. This dataset includes 123 trauma resuscitation cases collected at a level 1 trauma center. Our results show that our intention mining achieved an accuracy of 86.6% in classifying medical teams' intentions. This work is an exploration of unsupervised intention mining of complex real-world medical processes.
Article
Full-text available
This study analyzes how users carry out information search and retrieval tasks by querying the Biblioteca Digital Hispánica, distinguishing groups of users according to their different information behavior. To this end, the log files collected by the server over one year are used and several clustering algorithms are compared. The k-means algorithm is found to be a suitable clustering procedure for analyzing large query log files in digital libraries. In the case of the Biblioteca Digital Hispánica, three groups of users are distinguished, and their distinctive information behavior is described.
Conference Paper
Nowadays, mobile devices are the first choice for seeking information and consuming content on the Web. However, the overwhelming amount of available web resources significantly affects the quality of the results returned by search systems. Traditionally, web resource retrieval is performed by using syntactic and/or semantic matches between the user query and the content of the resources, leaving aside aspects such as the goals and intentions that an end-user has when performing a query. This paper introduces a novel approach that improves the mobile search user experience by delivering results according to the user's goals and intentions. This proposal is based on three main processes: (i) inferring goals and intentions of end-users from their search queries through a probabilistic generative model called LDA (Latent Dirichlet Allocation); (ii) discovering resources based on the inferred goals and intentions; and (iii) generating a dynamic mashup with the retrieved resources. We argue that the concept of mashup can contribute to improving the user experience in mobile search. Experiments show promising results in the user search experience in contrast to traditional approaches.
Article
Today's multimedia search engines are expected to respond to queries reflecting a wide variety of information needs from users with different goals. The topical dimension ("what" the user is searching for) of these information needs is well studied; however, the intent dimension ("why" the user is searching) has received relatively less attention. Specifically, intent is the "immediate reason, purpose, or goal" that motivates a user to query a search engine. We present a thorough survey of multimedia information retrieval research directed at the problem of enabling search engines to respond to user intent. The survey begins by defining intent, including a differentiation from related, often-confused concepts. It then presents the key conceptual models of search intent. The core is an overview of intent-aware approaches that operate at each stage of the multimedia search engine pipeline (i.e., indexing, query processing, ranking). We discuss intent in conventional text-based search wherever it provides insight into multimedia search intent or intent-aware approaches. Finally, we identify and discuss the most important future challenges for intent-aware multimedia search engines. Facing these challenges will allow multimedia information retrieval to recognize and respond to user intent and, as a result, fully satisfy the information needs of users.
Article
Task-models concretize general requests to support users in real-world scenarios. In this paper, we present an IR based algorithm (IRTML) to automate the construction of hierarchically structured task-models. In contrast to other approaches, our algorithm is capable of assigning general tasks closer to the top and specific tasks closer to the bottom. Connections between tasks are established by extending Turney's PMI-IR measure. To evaluate our algorithm, we manually created a ground truth in the health-care domain consisting of 14 domains. We compared the IRTML algorithm to three state-of-the-art algorithms to generate hierarchical structures, i.e. BiSection K-means, Formal Concept Analysis and Bottom-Up Clustering. Our results show that IRTML achieves a 25.9% taxonomic overlap with the ground truth, a 32.0% improvement over the compared algorithms.
Article
Hot trends are likely to bring new business opportunities. For example, "Air Pollution" might lead to a significant increase in the sales of related products, e.g., mouth masks. For e-commerce companies, it is very important to respond rapidly and correctly to these hot trends in order to improve product sales. In this paper, we take the initiative to study the task of identifying trend-related products. The major novelty of our work is that we automatically learn commercial intents revealed in microblogs. We carefully construct a data collection for this task and present several insightful findings. In order to solve this problem, we further propose a graph-based method, which jointly models relevance and associativity. We perform extensive experiments and the results show that our methods are very effective.
Article
In rapidly changing business environments, enterprises are encountering increasingly complicated and multidimensional challenges related to R&D and manufacturing processes. To address these challenges, knowledge requesters working for these enterprises must effectively gain knowledge from enterprise knowledge bases, other enterprises or knowledge markets. However, knowledge requesters cannot obtain a desired and distinctive solution from a single knowledge source, including their own enterprise knowledge base. If knowledge can be customised by combining knowledge from various sources to create personalised complementary knowledge combinations that are more suited to their knowledge requirements, then knowledge acquisition and searches become more efficient and accurate. Therefore, an ontology-based complementary knowledge combination mechanism, which can be employed to enhance online digitised knowledge recommendations or enterprise knowledge management systems, was developed in this study. First, a knowledge requirement model and a knowledge-product ontology model were constructed to describe and structure knowledge content, and then an ontology similarity calculation method was developed to enable precise comparisons of the requirements and knowledge structured by the knowledge requirement and product models. Finally, according to the four indicators of similarity, duplication, amount of knowledge and cost, a genetic algorithm (GA)-based knowledge-product ontology combination method was developed to identify optimal knowledge combinations and subsequently provide a reference for knowledge requesters.
Conference Paper
Problem-solving knowledge is omnipresent and scattered across the Web. While extracting and gathering such knowledge has been a focus of attention, it is equally important to devise a way to organize such knowledge for both human and machine consumption with respect to task goals. As a way to provide an extensive knowledge structure for human task goals, with which human problem-solving knowledge extracted from Web resources can be organized, we devised a method for automatically grouping and organizing the goal statements in a Web 2.0 site that contains over two million how-to instruction articles covering almost all task domains. In the proposed method, task goals having semantically and task-categorically similar action types and object types are grouped together by analyzing predicate-argument association patterns across all the goal statements through bipartite EM-like modeling. The result obtained with the unsupervised machine learning algorithm was evaluated by means of a human-annotated data set in a sample domain.
Conference Paper
To realize the vision of intelligent agents on the web, agents need to be capable of understanding people’s behavior. Such an understanding would enable them to better predict and support human activities on the web. If agents had access to knowledge about human goals, they could, for instance, recognize people’s goals from their actions or reason about people’s goals. In this work, we study to what extent it is feasible to automatically construct concept hierarchies of domain-specific human goals. This process consists of the following two steps: (1) extracting human goal instances from a search query log and (2) inferring hierarchical structures by applying clustering techniques. To compare resulting concept hierarchies, we manually construct a golden standard and calculate taxonomic overlaps. In our experiments, we achieve taxonomic overlaps of up to ~51% for the health domain and up to ~60% for individual health subdomains. In an illustration scenario, we provide a prototypical implementation to automatically complement goal concept hierarchies by means-ends relations, i.e. relating goals to actions which potentially contribute to their accomplishment. Our findings are particularly relevant for knowledge engineers interested in (i) acquiring knowledge about human goals as well as (ii) automating the process of constructing goal concept hierarchies.
Article
Full-text available
Computation is a process of making explicit information that was implicit. In computing 5 as the solution to ∛125, for example, we move from a description that is not explicitly about 5 to one that is. We are drawing out numerical consequences to the description ∛125. We are extracting information implicit in the problem statement. Can we precisely state the difference between information that is implicit in a state, structure or process and information that is explicit?
Article
Full-text available
In our research on Commonsense reasoning, we have found that an especially important kind of knowledge is knowledge about human goals. Especially when applying Commonsense reasoning to interface agents, we need to recognize goals from user actions (plan recognition), and generate sequences of actions that implement goals (planning). We also often need to answer more general questions about the situations in which goals occur, such as when and where a particular goal might be likely, or how long it is likely to take to achieve. In past work on Commonsense knowledge acquisition, users have been directly asked for such information. Recently, however, another approach has emerged: to entice users into playing games where supplying the knowledge is the means to scoring well in the game, thus motivating the players. This approach has been pioneered by Luis von Ahn and his colleagues, who refer to it as Human Computation. Common Consensus is a fun, self-sustaining web-based game that both collects and validates Commonsense knowledge about everyday goals. It is based on the structure of the TV game show Family Feud. A small user study showed that users find the game fun, knowledge quality is very good, and the rate of knowledge collection is rapid.
Article
Full-text available
The degree to which users make their search intent explicit can be assumed to represent an upper bound on the level of service that search engines can provide. In a departure from traditional query expansion mechanisms, we introduce Intentional Query Suggestion as a novel idea that attempts to make users' intent more explicit during search. In this paper, we present a prototypical algorithm for Intentional Query Suggestion and we discuss corresponding data from comparative experiments with traditional query suggestion mechanisms. Our preliminary results indicate that intentional query suggestions 1) diversify search result sets (i.e. they reduce result set overlap) and 2) have the potential to yield higher click-through rates than traditional query suggestions.
Conference Paper
Full-text available
Many activities on the web are driven by high-level goals of users, such as “plan a trip” or “buy some product”. In this paper, we are interested in exploring the role and structure of users’ goals in web search. We want to gain insights into how users express goals, and how their goals can be represented in a semi-formal way. This paper presents results from an exploratory study that focused on analyzing selected search sessions from a search engine log. In a detailed example, we demonstrate how goal-oriented search can be represented and understood as a traversal of goal graphs. Finally, we provide some ideas on how to construct large-scale goal graphs in a semi-algorithmic, collaborative way. We conclude with a description of a series of challenges that we consider to be important for future research.
Article
Full-text available
This paper presents a hierarchical taxonomy of human goals, based on similarity judgments of 135 goals gleaned from the literature. Women and men in 3 age groups—17–30, 25–62, and 65 and older—sorted the goals into conceptually similar groups. These were cluster analyzed and a taxonomy of 30 goal clusters was developed for each age group separately and for the total sample. The clusters were conceptually meaningful and consistent across the 3 samples. The broadest distinction in each sample was between interpersonal or social goals and intrapersonal or individual goals, with interpersonal goals divided into family-related and more general social goals. Further, the 30 clusters were organized into meaningful higher order clusters. The role of such a taxonomy in promoting theory development and research is discussed, as is its relationship to other organizations of human goals and to the Big Five structure of personality.
Conference Paper
Full-text available
Service robots will have to accomplish more and more complex, open-ended tasks and regularly acquire new skills. In this work, we propose a new approach to the problem of generating plans for such household robots. Instead of composing them from atomic actions - the common approach in robot planning - we propose to transform task descriptions on web sites like ehow.com into executable robot plans. We present methods for automatically converting the instructions from natural language into a formal, logic-based representation, for resolving the word senses using the WordNet database and the Cyc ontology, and for exporting the generated plans into the mobile robot's plan language RPL. We discuss the problem of inferring information that is missing in these descriptions and the problem of grounding the abstract task descriptions in the perception and action system, and we propose techniques for solving them. The whole system works autonomously without human interaction. It has successfully been tested with a set of about 150 natural language directives, of which up to 80% could be correctly transformed.
Article
Full-text available
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
Article
Full-text available
We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions — changes in queries during a session, number of pages viewed, and use of relevance feedback; (ii) queries — the number of search terms, and the use of logic and modifiers; and (iii) terms — their rank/frequency distribution and the most highly used search terms. We then shift the focus of analysis from the query to the user to gain insight to the characteristics of the Web user. With these characteristics as a basis, we then conducted a failure analysis, identifying trends among user mistakes. We conclude with a summary of findings and a discussion of the implications of these findings.
Conference Paper
Full-text available
We describe results from Web search log studies aimed at elucidating user behaviors associated with queries and destination URLs that appear with different frequencies. We note the diversity of information goals that searchers have and the differing ways that goals are specified. We examine rare and common information goals that are specified using rare or common queries. We identify several significant differences in user behavior depending on the rarity of the query and the destination URL. We find that searchers are more likely to be successful when the frequencies of the query and destination URL are similar. We also establish that the behavioral differences observed for queries and goals of varying rarity persist even after accounting for potential confounding variables, including query length, search engine ranking, session duration, and task difficulty. Finally, using an information-theoretic measure of search difficulty, we show that the benefits obtained by search and navigation actions depend on the frequency of the information goal.
Conference Paper
Full-text available
Text categorization - the assignment of natural language texts to one or more predefined categories based on their content - is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size, and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.
Conference Paper
Full-text available
People interact with interfaces to accomplish goals, and knowledge about human goals can be useful for building intelligent user interfaces. We suggest that modeling high, human-level goals like "repair my credit score", is especially useful for coordinating workflows between interfaces, automated planning, and building introspective applications. We analyzed data from 43Things.com, a website where users share and discuss goals and plans in natural language, and constructed a goal network that relates what goals people have with how people solve them. We then label goals with specific details, such as where the goal typically is met and how long it takes to achieve, facilitating plan and goal recognition. Lastly, we demonstrate a simple application of goal networks, deploying it in a mobile, location-aware to-do list application, ToDoGo, which uses goal networks to help users plan where and when to accomplish their desired goals.
Conference Paper
Full-text available
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Conference Paper
Full-text available
The identification of the user’s intention or interest through queries that they submit to a search engine can be very useful to offer them more adequate results. In this work we present a framework for the identification of user’s interest in an automatic way, based on the analysis of query logs. This identification is made from two perspectives, the objectives or goals of a user and the categories in which these aims are situated. A manual classification of the queries was made in order to have a reference point and then we applied supervised and unsupervised learning techniques. The results obtained show that for a considerable amount of cases supervised learning is a good option, however through unsupervised learning we found relationships between users and behaviors that are not easy to detect just taking the query words. Also, through unsupervised learning we established that there are categories that we are not able to determine in contrast with other classes that were not considered but naturally appear after the clustering process. This allowed us to establish that the combination of supervised and unsupervised learning is a good alternative to find user’s goals. From supervised learning we can identify the user interest given certain established goals and categories; on the other hand, with unsupervised learning we can validate the goals and categories used, refine them and select the most appropriate to the user’s needs.
Conference Paper
Full-text available
In this research, we investigate a methodology to automatically classify Web queries by topic and user intent. Taking a 20,000 plus Web query data set sectioned by topic, we manually classified each query using a three-level hierarchy of user intent. We note significant differences in user intent across topics. Results show that user intent (informational, navigational, and transactional) varies by topic (15 to 24 percent depending on the category). We then use this manually classified data set to classify searches in a Web search engine query stream automatically, using an exact match followed by an n-gram approach. These approaches have the advantage of being implementable in real time for query classification of Web searches. The implications are that a search engine can improve retrieval performance by more effectively identifying the intent underlying user queries.
Conference Paper
Full-text available
An improved understanding of the relationship between search intent, result quality, and searcher behavior is crucial for improving the effectiveness of web search. While recent progress in user behavior mining has been largely focused on aggregate server-side click logs, we present a new class of search behavior models that also exploit fine-grained user interactions with the search results. We show that mining these interactions, such as mouse movements and scrolling, can enable more effective detection of the user's search goals. Potential applications include automatic search evaluation, improving search ranking, result presentation, and search advertising. We describe extensive experimental evaluation over both controlled user studies, and logs of interaction data collected from hundreds of real users. The results show that our method is more effective than the current state-of-the-art techniques, both for detection of searcher goals, and for an important practical application of predicting ad clicks for a given search session.
Conference Paper
Full-text available
Annotations represent an increasingly popular means for organizing, categorizing and finding resources on the "social" web. Yet, only a small portion of the total resources available on the web are annotated. In this paper, we describe a prototype - iTAG - for automatically annotating textual resources with human intent, a novel dimension of tagging. We investigate the extent to which the automatic analysis of human intentions in textual resources is feasible. To address this question, we present selected evidence from a study aiming to automatically annotate intent in a simplified setting, that is transcripts of speeches given by US presidential candidates in 2008.
Conference Paper
Full-text available
It is often too expensive to compute and materialize a complete high-dimensional data cube. Computing an iceberg cube, which contains only aggregates above certain thresholds, is an effective way to derive nontrivial multi-dimensional aggregations for ...
Article
Open Mind Common Sense is a knowledge acquisition system designed to acquire commonsense knowledge from the general public over the web. We describe and evaluate our first fielded system, which enabled the construction of a 450,000 assertion commonsense knowledge base. We then discuss how our second-generation system addresses weaknesses discovered in the first. The new system acquires facts, descriptions, and stories by allowing participants to construct and fill in natural language templates. It employs word-sense disambiguation and methods of clarifying entered knowledge, analogical inference to provide feedback, and allows participants to validate knowledge and in turn each other.
Article
The KDD-Cup 2005 Competition was held in conjunction with the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. The task of the KDD-Cup 2005 competition was to classify 800,000 internet user search queries into 67 predefined categories. This task is easy to understand, but the lack of a straightforward training set, subjective user intents of queries, poor information in short queries, and high noise level make the task very challenging. In this paper, we summarize the competition task, the evaluation method, and the results of the competition. Here we only highlight some key techniques used in submitted solutions. The technical details of the solutions from the three award-winning teams are available in their papers separately in this issue of SIGKDD Explorations. At the end, we also share the results of a survey conducted with this year's Cup participants. To facilitate research in this area, the task description, data, answer set, and related information of this KDD-Cup are published at the KDD-Cup 2005 web site: http://www.acm.org/sigs/sigkdd/kdd2005/kddcup.html.
Article
For both people and machines, each in their own way, there is a serious problem in common of making sense out of what they hear, see, or are told about the world. The conceptual apparatus necessary to perform even a partial feat of understanding is formidable and fascinating. Our analysis of this apparatus is what this book is about. —Roger C. Schank and Robert P. Abelson from the Introduction (http://www.psypress.com/scripts-plans-goals-and-understanding-9780898591385)
Article
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
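The abstract above says that each suffix-removal step depends on the form of the remaining stem, usually through a measure of its syllable length. The following is a minimal Python sketch of that idea, not the original BCPL program and not the full algorithm. The function names `is_consonant`, `measure` and `replace_if_measure` are hypothetical and chosen for illustration. It assumes the commonly cited definition of the measure m as the number of vowel-to-consonant transitions in the stem, with "y" counted as a vowel when it follows a consonant.

```python
VOWELS = set("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Return True if word[i] acts as a consonant.

    Assumption: 'y' is a vowel when preceded by a consonant,
    and a consonant otherwise (including at the start of the word).
    """
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    stem = stem.lower()
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m


def replace_if_measure(word: str, suffix: str, replacement: str, min_m: int) -> str:
    """Rewrite `suffix` as `replacement` only if the remaining stem has
    a measure greater than `min_m`; otherwise return the word unchanged."""
    if not word.endswith(suffix):
        return word
    stem = word[: len(word) - len(suffix)]
    if measure(stem) > min_m:
        return stem + replacement
    return word


if __name__ == "__main__":
    for w in ["tree", "trouble", "troubles", "private"]:
        print(w, measure(w))
    # A single conditional rule of the form (m > 0) EED -> EE
    print(replace_if_measure("agreed", "eed", "ee", 0))
    print(replace_if_measure("feed", "eed", "ee", 0))
```

Conditioning on the stem keeps short words such as "feed" intact while longer words such as "agreed" lose the suffix. A complete stemmer would chain many such rules over several steps.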
Article
With the advent of the Web and the explosion of available textual data, it is key for modern natural language processing systems to access, represent and reason over large amounts of knowledge in semantic repositories. Separately, the knowledge representation and natural language processing communities have been developing representations/engines for reasoning over knowledge and algorithms for automatically harvesting knowledge from textual data, respectively. There is a pressing need for collaboration between the two communities to provide large-scale robust reasoning capabilities for knowledge rich applications like question answering. In this chapter, we propose one small step by presenting algorithms for harvesting semantic relations from text and then automatically linking the knowledge into existing semantic repositories. Experimental results show better than state of the art performance on both relation harvesting and ontologizing tasks.
Article
A method of determining the similarity of nouns on the basis of a metric derived from the distribution of subject, verb and object in a large text corpus is described. The resulting quasi-semantic classification of nouns demonstrates the plausibility of the distributional hypothesis, and has potential application to a variety of tasks, including automatic indexing, resolving nominal compounds, and determining the scope of modification.
Article
The Open Mind Common Sense project has been collecting common-sense knowledge from volunteers on the Internet since 2000. This knowledge is represented in a machine-interpretable semantic network called ConceptNet. We present ConceptNet 3, which improves the acquisition of new knowledge in ConceptNet and facilitates turning edges of the network back into natural language. We show how its modular design helps it adapt to different data sets and languages. Finally, we evaluate the content of ConceptNet 3, showing that the information it contains is comparable with WordNet and the Brandeis Semantic Ontology.
Article
In this paper, we report ongoing efforts in a large-scale research project to develop methods for profiling individual Web search engine users by leveraging data recorded in the transaction logs of search engines. Our research aim is to investigate how completely one can profile a Web searcher using log data. Taking a broad brush approach, we present an array of profiling attributes to illustrate the spectrum of user characteristics possible from log data. Specifically, we present ongoing research for determining a user's location, geographical interest, topic of interest, level of interest, the degree of commercial intent, whether the user plans to make a purchase, and whether the user will click a link. We present the state of our ongoing research in user profiling along with that of other researchers. Our findings show that one can develop a fairly robust profile of a Web searcher using log data. We also discuss issues of determining the specific identity of the user. We conclude with a discussion of the implications for the areas of system development, online advertising, privacy, and policies concerning the use of such profiling.
Conference Paper
Conceptual modeling has been fundamental to the management of structured data. However, its value is increasingly being recognized for knowledge management in general. In trying to develop suitable conceptual models for unstructured information, issues such as the level of representation and complexity of processing techniques arise. Here, we investigate the use of a conceptual model that is simple enough to allow efficient automatic extraction from documents. Our model focused on the problem-solution relationship that is central to the analysis of scientific papers. It also consists of supporting relationships such as benefits and drawbacks, assumptions, methods, extensions, and claims. Our study considered two kinds of documents - scientific research papers and patents. We evaluated the utility of the approach by building a prototype system and our user evaluation shows promising results.
Article
The KnowItAll system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KnowItAll's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KnowItAll extracted over 50,000 class instances, but suggested a challenge: How can we improve KnowItAll's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., “chemist” and “biologist” are identified as sub-classes of “scientist”). List Extraction locates lists of class instances, learns a “wrapper” for each list, and extracts elements of each list. Since each method bootstraps from KnowItAll's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KnowItAll a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Article
In this research, we investigated whether a learning process has unique information searching characteristics. The results of this research show that information searching is a learning process with unique searching characteristics specific to particular learning levels. In a laboratory experiment, we studied the searching characteristics of 72 participants engaged in 426 searching tasks. We classified the searching tasks according to Anderson and Krathwohl’s taxonomy of the cognitive learning domain. Research results indicate that applying and analyzing, the middle two of the six categories, generally take the most searching effort in terms of queries per session, topics searched per session, and total time searching. Interestingly, the lowest two learning categories, remembering and understanding, exhibit searching characteristics similar to the highest order learning categories of evaluating and creating. Our results suggest the view of Web searchers having simple information needs may be incorrect. Instead, we discovered that users applied simple searching expressions to support their higher-level information needs. It appears that searchers rely primarily on their internal knowledge for evaluating and creating information needs, using search primarily for fact checking and verification. Overall, results indicate that a learning theory may better describe the information searching process than more commonly used paradigms of decision making or problem solving. The learning style of the searcher does have some moderating effect on exhibited searching characteristics. The implication of this research is that rather than solely addressing a searcher’s expressed information need, searching systems can also address the underlying learning need of the user.
Conference Paper
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources of data in textual information extraction. The differences are quantified as part of a large-scale study on extracting prominent attributes or quantifiable properties of classes (e.g., top speed, price and fuel consumption for CarModel) from unstructured text. In a head-to-head qualitative comparison, a lightweight extraction method produces class attributes that are 45% more accurate on average, when acquired from query logs rather than Web documents.
Conference Paper
We survey many of the measures used to describe and evaluate the efficiency and effectiveness of large-scale search services. These measures, herein visualized versus verbalized, reveal a domain rich in complexity and scale. We cover six principal facets of search: the query space, users' query sessions, user behavior, operational requirements, the content space, and user demographics. While this paper focuses on measures, the measurements themselves raise questions and suggest avenues of further investigation.
Conference Paper
Many users are familiar with the interesting but limited functionality of Data Detector interfaces like Microsoft's Smart Tags and Google's AutoLink. In this paper we significantly expand the breadth and functionality of this type of user interface through the use of large-scale knowledge bases of semantic information. The result is a Web browser that is able to generate personalized semantic hypertext, providing a goal-oriented browsing experience. We present (1) Creo, a Programming by Example system for the Web that allows users to create a general-purpose procedure with a single example, and (2) Miro, a Data Detector that matches the content of a page to high-level user goals.
Conference Paper
Knowing the intent of a search query allows for more intelligent ways of retrieving relevant search results. Most of the recent work on automatic detection of query intent uses supervised learning methods that require a substantial amount of labeled data; manually collecting such data is often time-consuming and costly. Human computation is an active research area that includes studies of how to build online games that people enjoy playing, while in the process providing the system with useful data. In this work, we present the design principles behind a new game called Intentions, which aims to collect data about the intent behind search queries.
Conference Paper
In most previous work on personalized search algorithms, the results for all queries are personalized in the same manner. However, as we show in this paper, there is a lot of variation across queries in the benefits that can be achieved through personalization. For some queries, everyone who issues the query is looking for the same thing. For other queries, different people want very different results even though they express their need in the same way. We examine variability in user intent using both explicit relevance judgments and large-scale log analysis of user behavior patterns. While variation in user behavior is correlated with variation in explicit relevance judgments for the same query, there are many other factors, such as result entropy, result quality, and task, that can also affect the variation in behavior. We characterize queries using a variety of features of the query, the results returned for the query, and people's interaction history with the query. Using these features we build predictive models to identify queries that can benefit from personalization.