Chapter

Approximate Sub-graph Matching over Knowledge Graph


Abstract

With the rapid development of the mobile internet, the volume of data has grown exponentially and its content has become more complex. It is hard for people to select useful information from such a large amount of data. In this paper, we study the problem of approximate sub-graph matching over a knowledge graph. We first propose two algorithms to reduce the scale of the knowledge graph. Next, we use an efficient algorithm to find similar sub-graphs. Finally, we use the skyline technique to further select high-quality sub-graphs from the matching results. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.
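The abstract's final step, skyline selection, can be illustrated with a minimal sketch. The assumption here (not stated in the abstract) is that each candidate sub-graph carries two quality measures, a similarity score to maximize and a cost to minimize; the skyline keeps exactly the Pareto-optimal candidates, i.e., those not dominated on both measures.

```python
# Hypothetical sketch of the skyline filtering step: each candidate
# sub-graph is (name, similarity, cost). A match is dominated if some
# other match has similarity >= and cost <= with at least one strict
# inequality; the skyline keeps the non-dominated matches.

def skyline(matches):
    """Return the Pareto-optimal matches (maximize similarity, minimize cost)."""
    pareto = []
    for name, sim, cost in matches:
        dominated = any(
            s >= sim and c <= cost and (s > sim or c < cost)
            for _, s, c in matches
        )
        if not dominated:
            pareto.append((name, sim, cost))
    return pareto

candidates = [("g1", 0.9, 5), ("g2", 0.8, 3), ("g3", 0.6, 1), ("g4", 0.7, 4)]
pareto = skyline(candidates)
print(pareto)
```

Here "g4" is dominated by "g2" (lower similarity and higher cost), so only the three candidates that trade similarity against cost survive.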


Conference Paper
To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose RippleNet, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the water, RippleNet stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that RippleNet achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.
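The "ripple" metaphor above can be made concrete with a heavily simplified, non-neural sketch: seed entities from a user's history propagate over KG triples, their weight decaying each hop, and a candidate item is scored by the weight mass that reaches its entity. The triples, entity names, and decay factor below are illustrative assumptions, not RippleNet's actual learned model.

```python
# Simplified sketch of preference propagation over a knowledge graph.
# All triples and the decay factor are illustrative assumptions.
from collections import defaultdict

kg = [  # (head, relation, tail) triples
    ("ForrestGump", "directed_by", "Zemeckis"),
    ("Zemeckis", "directed", "BackToTheFuture"),
    ("ForrestGump", "starring", "TomHanks"),
    ("TomHanks", "starred_in", "CastAway"),
]

def ripple_scores(seeds, hops=2, decay=0.5):
    """Spread unit weight from seed entities outward, decaying per hop."""
    weights = defaultdict(float)
    frontier = {e: 1.0 for e in seeds}
    for _ in range(hops):
        nxt = defaultdict(float)
        for h, _, t in kg:
            if h in frontier:
                nxt[t] += frontier[h] * decay
        for e, w in nxt.items():
            weights[e] += w
        frontier = nxt
    return dict(weights)

scores = ripple_scores({"ForrestGump"})
print(scores)
```

Candidate items whose entities collect more propagated weight (here, the one-hop neighbors) would be ranked higher; RippleNet replaces this hand-set decay with learned, attention-weighted propagation.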
Conference Paper
Recently, with the emergence of event-based online social services (e.g., Meetup), there have been increasing online activities to create, distribute, and organize social events. In this paper, we take the first systematic step to discover influential event organizers from online social networks who are essential to the overall success of social events. Informally, such event organizers comprise a small group of people who not only have the relevant skills or expertise required for an event (e.g., a conference) but are also able to influence the largest number of people to actively contribute to it. We formulate it as the problem of mining an influential cover set (ICS), where we wish to find k users in a social network G that together have the required skills or expertise (modeled as attributes of nodes in G) to organize an event such that they can influence the greatest number of individuals to participate in the event. The problem is, however, NP-hard. Hence, we propose three algorithms to find approximate solutions to the problem. The first two algorithms are greedy; they run faster but offer no guarantees. The third algorithm is 2-approximate and is guaranteed to find a feasible solution if one exists. Our empirical study over several real-world networks demonstrates the superiority of our proposed solutions.
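A toy version of the greedy idea can be sketched as follows. The skills, follower sets, and scoring rule (prefer the user adding the most uncovered skills, breaking ties by newly influenced individuals) are assumptions for illustration; the paper's actual greedy variants and the 2-approximation differ in detail.

```python
# Illustrative greedy heuristic for the influential-cover-set idea:
# pick users until the required skills are covered, each time choosing
# the user who adds the most new skills, then the most new followers.
# Skills, follower sets, and the tie-breaking rule are assumptions.

users = {
    "alice": ({"ml", "nlp"}, {1, 2, 3}),
    "bob":   ({"systems"},   {3, 4}),
    "carol": ({"nlp"},       {5, 6, 7, 8}),
}
required = {"ml", "nlp", "systems"}

def greedy_organizers(users, required, k=3):
    chosen, covered, influenced = [], set(), set()
    while covered < required and len(chosen) < k:
        def gain(u):
            skills, fans = users[u]
            return (len(skills - covered), len(fans - influenced))
        best = max((u for u in users if u not in chosen), key=gain)
        skills, fans = users[best]
        if not skills - covered:
            break  # nobody left can cover a missing skill
        chosen.append(best)
        covered |= skills
        influenced |= fans
    return chosen, influenced

chosen, influenced = greedy_organizers(users, required)
print(chosen, influenced)
```

Note that carol influences the most people but adds no skill once alice is picked, so the greedy rule selects alice and bob; this kind of trade-off is exactly why the unguaranteed greedy variants can miss better solutions.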
Article
Planning an itinerary before travelling to a city is one of the most important travel preparation activities. In this paper, we propose a novel framework called TRIPPLANNER, leveraging a combination of location-based social network (LBSN) data and taxi GPS digital footprints to achieve personalized, interactive, and traffic-aware trip planning. First, we construct a dynamic POI network model by extracting relevant information from crowdsourced LBSN and taxi GPS traces. Then, we propose a two-phase approach for personalized trip planning. In the route search phase, TRIPPLANNER works interactively with users to generate candidate routes with specified venues; in the route augmentation phase, TRIPPLANNER applies heuristic algorithms to add the user's preferred venues iteratively to the candidate routes, with the objective of maximizing the route score while satisfying both the venue visiting time and total travel time constraints. To validate the efficiency and effectiveness of the proposed approach, extensive empirical studies were performed on two real-world data sets from the city of San Francisco, which contain more than 391,900 passenger delivery trips generated by 536 taxis in a month, and 110,214 check-ins left by 15,680 Foursquare users in six months.
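The route augmentation phase described above can be sketched with a toy greedy insertion loop. The venue scores, visit durations, and the flat travel-time model are all illustrative assumptions; TRIPPLANNER's actual heuristics use traffic-aware travel times from the taxi traces.

```python
# Toy sketch of route augmentation: repeatedly insert the
# highest-scoring unused venue as long as total time (visits plus a
# crude fixed travel time between consecutive venues) stays in budget.
# Venue data and the travel-time model are assumptions for the sketch.

venues = {"museum": (5.0, 60), "park": (3.0, 30), "tower": (4.0, 50)}
TRAVEL = 15  # assumed fixed travel time (minutes) between venues

def augment(route, budget):
    def total_time(r):
        visit = sum(venues[v][1] for v in r)
        return visit + TRAVEL * max(len(r) - 1, 0)
    for v, (score, _) in sorted(venues.items(), key=lambda kv: -kv[1][0]):
        if v not in route and total_time(route + [v]) <= budget:
            route = route + [v]
    return route

route = augment([], 120)
print(route)
```

With a 120-minute budget, the tower (score 4.0, 50 minutes) no longer fits after the museum is added, so the lower-scoring but cheaper park is inserted instead, which is the kind of score-versus-time trade-off the augmentation phase resolves.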
Article
We study the problem of group recommendation. Recommendation is an important information exploration paradigm that retrieves interesting items for users based on their profiles and past activities. Single user recommendation has received significant attention in the past due to its extensive use in Amazon and Netflix. How to recommend to a group of users who may or may not share similar tastes, however, is still an open problem. The need for group recommendation arises in many scenarios: a movie for friends to watch together, a travel destination for a family to spend a holiday break, and a good restaurant for colleagues to have a working lunch. Intuitively, items that are ideal for recommendation to a group may be quite different from those for individual members. In this paper, we analyze the desiderata of group recommendation and propose a formal semantics that accounts for both item relevance to a group and disagreements among group members. We design and implement algorithms for efficiently computing group recommendations. We evaluate our group recommendation method through a comprehensive user study conducted on Amazon Mechanical Turk and demonstrate that incorporating disagreements is critical to the effectiveness of group recommendation. We further evaluate the efficiency and scalability of our algorithms on the MovieLens data set with 10M ratings.
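The paper's consensus semantics (item relevance to the group combined with member disagreement) can be sketched minimally as follows. The particular aggregation (mean relevance), disagreement measure (rating range), and weight are illustrative assumptions; the paper studies several such choices.

```python
# Sketch of a group consensus score under illustrative assumptions:
# average predicted relevance minus a weighted disagreement term
# (here the max pairwise difference in members' predicted ratings).

def group_score(ratings, w_dis=0.5):
    """ratings: predicted relevance of one item for each group member."""
    relevance = sum(ratings) / len(ratings)
    disagreement = max(ratings) - min(ratings)
    return relevance - w_dis * disagreement

items = {"movie_a": [4.0, 4.0, 3.5], "movie_b": [5.0, 5.0, 1.0]}
best = max(items, key=lambda i: group_score(items[i]))
print(best)
```

Here movie_b has the higher average relevance, but one member strongly dislikes it, so the disagreement penalty makes movie_a the group recommendation, illustrating why ignoring disagreement can mislead.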
Article
The local ratio technique is a methodology for the design and analysis of algorithms for a broad range of optimization problems. The technique is remarkably simple and elegant, and yet can be applied to several classical and fundamental problems (including covering problems, packing problems, and scheduling problems). The local ratio technique uses elementary math and requires combinatorial insight into the structure and properties of the problem at hand. Typically, when using the technique, one has to invent a weight function for a problem instance under which every "reasonable" solution is "good." The local ratio technique is closely related to the primal-dual schema, though it is not based on weak LP duality (which is the basis of the primal-dual approach) since it is not based on linear programming. In this survey, we introduce the local ratio technique and demonstrate its use in the design and analysis of algorithms for various problems. We trace the evolution path of the technique since its inception in the 1980s, culminating with the most recent development, namely, fractional local ratio, which can be viewed as a new LP rounding technique.
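A classic instance of the technique mentioned above is the local ratio 2-approximation for weighted vertex cover, which can be sketched in a few lines. The graph and weights below are made up for illustration.

```python
# Local ratio 2-approximation for weighted vertex cover: for each edge
# whose endpoints both still have positive weight, subtract
# eps = min(w[u], w[v]) from both endpoints; the vertices whose weight
# drops to zero form a cover of total weight at most twice the optimum.

def local_ratio_vertex_cover(edges, weights):
    w = dict(weights)
    for u, v in edges:
        if w[u] > 0 and w[v] > 0:
            eps = min(w[u], w[v])
            w[u] -= eps
            w[v] -= eps
    return {v for v, wt in w.items() if wt == 0}

edges = [("a", "b"), ("b", "c"), ("c", "d")]
weights = {"a": 2, "b": 1, "c": 3, "d": 1}
cover = local_ratio_vertex_cover(edges, weights)
print(cover)
```

Each subtraction is the "local" weight decomposition: any cover must pay at least eps for that edge, while the algorithm pays at most 2*eps, which yields the factor-2 guarantee.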
Article
Presently, there are numerous bioinformatics databases available on different websites. Although RDF was proposed as a standard format for the web, these databases are still available in various formats. With the increasing popularity of the semantic web technologies and the ever-growing number of databases in bioinformatics, there is a pressing need to develop mashup systems to help the process of bioinformatics knowledge integration. Bio2RDF is such a system, built from rdfizer programs written in JSP, the Sesame open source triplestore technology and an OWL ontology. With Bio2RDF, documents from public bioinformatics databases such as Kegg, PDB, MGI, HGNC and several of NCBI's databases can now be made available in RDF format through a unique URL in the form of http://bio2rdf.org/namespace:id. The Bio2RDF project has successfully applied the semantic web technology to publicly available databases by creating a knowledge space of RDF documents linked together with normalized URIs and sharing a common ontology. Bio2RDF is based on a three-step approach to build mashups of bioinformatics data. The present article details this new approach and illustrates the building of a mashup used to explore the implication of four transcription factor genes in Parkinson's disease. The Bio2RDF repository can be queried at http://bio2rdf.org.
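The normalized-URI convention described above (http://bio2rdf.org/namespace:id) can be illustrated with a tiny helper. The pdb identifier used below is just an example; this is not a resolver for the actual Bio2RDF service.

```python
# Build a Bio2RDF-style normalized URI from a namespace and an
# identifier, following the http://bio2rdf.org/namespace:id pattern
# stated in the abstract. The example identifier is illustrative.

def bio2rdf_uri(namespace, identifier):
    return f"http://bio2rdf.org/{namespace}:{identifier}"

print(bio2rdf_uri("pdb", "1TIM"))  # http://bio2rdf.org/pdb:1TIM
```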
Conference Paper
Question answering over knowledge graph (QA-KG) aims to use facts in the knowledge graph (KG) to answer natural language questions. It helps end users more efficiently and more easily access the substantial and valuable knowledge in the KG, without knowing its data structures. QA-KG is a nontrivial problem since capturing the semantic meaning of natural language is difficult for a machine. Meanwhile, many knowledge graph embedding methods have been proposed. The key idea is to represent each predicate/entity as a low-dimensional vector, such that the relation information in the KG could be preserved. The learned vectors could benefit various applications such as KG completion and recommender systems. In this paper, we explore using them to handle the QA-KG problem. However, this remains a challenging task since a predicate could be expressed in different ways in natural language questions. Also, the ambiguity of entity names and partial names makes the number of possible answers large. To bridge the gap, we propose an effective Knowledge Embedding based Question Answering (KEQA) framework. We focus on answering the most common types of questions, i.e., simple questions, in which each question could be answered by the machine straightforwardly if its single head entity and single predicate are correctly identified. To answer a simple question, instead of inferring its head entity and predicate directly, KEQA aims to jointly recover the question's head entity, predicate, and tail entity representations in the KG embedding spaces. Based on a carefully-designed joint distance metric, the three learned vectors' closest fact in the KG is returned as the answer. Experiments on a widely-adopted benchmark demonstrate that the proposed KEQA outperforms the state-of-the-art QA-KG methods.
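The joint-distance idea above can be sketched minimally: given the question's predicted head, predicate, and tail vectors, return the KG fact whose three embeddings are jointly closest. The two-dimensional embeddings, the toy facts, and the plain Euclidean metric are illustrative assumptions; KEQA's actual metric is carefully designed over learned vectors.

```python
# Minimal sketch of joint-distance fact retrieval: score each KG fact
# by the summed distance of its head, predicate, and tail embeddings
# to the question's predicted vectors, and return the closest fact.
# Embeddings and facts below are made up for illustration.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

entity = {"Paris": [1.0, 0.0], "France": [1.0, 1.0], "Berlin": [0.0, 0.0]}
pred = {"capital_of": [0.0, 1.0]}
facts = [("Paris", "capital_of", "France"), ("Berlin", "capital_of", "France")]

def closest_fact(q_head, q_pred, q_tail):
    def joint(f):
        h, p, t = f
        return (dist(entity[h], q_head) + dist(pred[p], q_pred)
                + dist(entity[t], q_tail))
    return min(facts, key=joint)

# Question vectors "predicted" near Paris / capital_of / France:
answer = closest_fact([0.9, 0.1], [0.1, 0.9], [1.0, 0.9])
print(answer)
```

Retrieving the nearest fact jointly, rather than matching head and predicate independently, is what lets noisy per-component predictions still land on the right fact.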
Article
The newly emerging event-based social networks (EBSNs) connect online and offline social interactions, offering a great opportunity to understand behaviors in the cyber-physical space. While existing efforts have mainly focused on investigating user behaviors in traditional social network services (SNS), this paper aims to exploit individual behaviors in EBSNs, which remains an unsolved problem. In particular, our method predicts activity attendance by discovering a set of factors that connect the physical and cyber spaces and influence individuals' attendance of activities in EBSNs. These factors, including content preference, context (spatial and temporal) and social influence, are extracted using different models and techniques. We further propose a novel Singular Value Decomposition with Multi-Factor Neighborhood (SVD-MFN) algorithm to predict activity attendance by integrating the discovered heterogeneous factors into a single framework, in which these factors are fused through a neighborhood set. Experiments based on real-world data from Douban Events demonstrate that the proposed SVD-MFN algorithm outperforms the state-of-the-art prediction methods.
Conference Paper
Freebase is a practical, scalable tuple database used to structure general human knowledge. The data in Freebase is collaboratively created, structured, and maintained. Freebase currently contains more than 125,000,000 tuples, more than 4000 types, and more than 7000 properties. Public read/write access to Freebase is allowed through an HTTP-based graph-query API using the Metaweb Query Language (MQL) as a data query and manipulation language. MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
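MQL's query-by-example style can be sketched with a toy evaluator: properties bound to concrete values act as constraints, while properties set to null (None in Python) ask the service to fill them in. The mini evaluator and sample data below are illustrative, not the Metaweb API.

```python
# Illustrative MQL-style query-by-example over in-memory sample data:
# concrete values constrain, None values are filled in by the match.
# The evaluator and the data are toys, not the real Freebase service.

data = [
    {"type": "/music/artist", "name": "Bob Dylan", "origin": "Duluth"},
    {"type": "/music/artist", "name": "The Beatles", "origin": "Liverpool"},
    {"type": "/film/director", "name": "Kubrick", "origin": "New York"},
]

# "Find music artists from Liverpool and return their names."
query = {"type": "/music/artist", "name": None, "origin": "Liverpool"}

def run_mql(query, data):
    results = []
    for obj in data:
        if all(v is None or obj.get(k) == v for k, v in query.items()):
            results.append({k: obj.get(k) for k in query})
    return results

matches = run_mql(query, data)
print(matches)
```

This "fill in the nulls" pattern is what makes MQL feel object-oriented: the query is shaped like the answer.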
Conference Paper
We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains roughly 900,000 entities and 5,000,000 facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize). The facts have been automatically extracted from the unification of Wikipedia and WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality, by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships, and in quantity, by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.