Conference Paper

Freebase: A collaboratively created graph database for structuring human knowledge

Authors: Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor

Abstract

Freebase is a practical, scalable tuple database used to structure general human knowledge. The data in Freebase is collaboratively created, structured, and maintained. Freebase currently contains more than 125,000,000 tuples, more than 4000 types, and more than 7000 properties. Public read/write access to Freebase is allowed through an HTTP-based graph-query API using the Metaweb Query Language (MQL) as a data query and manipulation language. MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
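As a concrete (historical) illustration of the MQL interface described above, the sketch below issues an mqlread query; the query shape follows MQL's convention that null and [] mark slots for the server to fill in. The endpoint shown is the later Google-hosted one, and the service has been retired, so this is illustrative only.

import json
import urllib.parse
import urllib.request

# MQL query: a JSON template whose null / [] slots the server fills in.
# Here: ask for all albums by the band named "The Police".
query = [{
    "type": "/music/artist",   # Freebase type of the matching topics
    "name": "The Police",      # constraint: match by name
    "album": []                # empty list = "fill in every album name"
}]

# Historical mqlread endpoint (the Freebase API was retired in 2016,
# so this request can no longer succeed; shown for illustration).
url = ("https://www.googleapis.com/freebase/v1/mqlread?query="
       + urllib.parse.quote(json.dumps(query)))

try:
    with urllib.request.urlopen(url) as resp:
        print(json.load(resp)["result"])
except Exception as exc:
    print("Freebase is no longer online:", exc)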


... We use the metrics of Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and Mean Recall (MR) to evaluate their performance. Results show that TOKTER achieved 0.743 in MAP, 0.824 in MRR, and 0.804 in MR for the top-10 recommendation results and significantly outperformed the baselines by more than 36.6% in MAP, 19.6% in MRR, and 1.9% in MR. ...
... There are two kinds of knowledge graphs: cross-domain (or generic) ones, e.g., Freebase [19], DBpedia [20], and YAGO [21], and domain-specific ones, e.g., MovieLens [22]. The test function recommendation needs a domain-specific knowledge graph to capture the links between domain knowledge and a confined specific test project instead of these public cross-domain knowledge graphs. ...
Article
Full-text available
Application Programming Interfaces (APIs) have become common in contemporary software development. Many automated API recommendation methods have been proposed. However, these methods suffer from a deficit of using domain knowledge, giving rise to challenges like the “cold start” and “semantic gap” problems. Consequently, they are unsuitable for test function recommendation, which recommends test functions for test engineers to implement test cases formed with various test steps. This paper introduces an approach named TOKTER, which recommends test functions leveraging test-oriented knowledge graphs. Such a graph contains domain concepts and their relationships related to the system under test and the test harness, which is constructed from the corpus data of the concerned test project. TOKTER harnesses the semantic associations between test steps (or queries) and test functions by considering literal descriptions, test function parameters, and historical data. We evaluated TOKTER with an industrial dataset and compared it with three state-of-the-art approaches. Results show that TOKTER significantly outperformed the baseline by margins of at least 36.6% in mean average precision (MAP), 19.6% in mean reciprocal rank (MRR), and 1.9% in mean recall (MR) for the top-10 recommendations.
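Since several entries on this page report MAP, MRR, and MR, a minimal sketch of how these standard top-k ranking metrics are computed may be useful. This is illustrative Python with hypothetical data, not TOKTER's evaluation code; the corpus-level scores are the means of these per-query values.

# Minimal sketch of the standard top-k ranking metrics cited above.

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant item, or 0 if none is retrieved."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@i taken at each rank i holding a relevant item."""
    hits, precisions = 0, []
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def recall_at_k(ranked, relevant):
    """Fraction of the relevant items appearing in the top-k list."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0

# One query: a top-5 recommendation list vs. the ground-truth set.
ranked = ["f3", "f1", "f7", "f2", "f9"]
relevant = {"f1", "f2"}
print(reciprocal_rank(ranked, relevant))    # 0.5  (first hit at rank 2)
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 2 = 0.5
print(recall_at_k(ranked, relevant))        # 1.0  (both hits retrieved)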
... In addition to the industry uptake (Noy et al., 2019), large-scale KGs and Knowledge Bases (KBs) such as DBpedia (Lehmann et al., 2015), Freebase (Bollacker et al., 2008), YAGO (Tanon et al., 2020), and Wikidata (Vrandečić & Krötzsch, 2014) have been created in various ways. For example, Wikidata is community-driven, while DBpedia and YAGO are mapped from the semi-structured information present in Wikipedia. ...
Article
Full-text available
The exponential growth of textual data in the digital era underlines the pivotal role of Knowledge Graphs (KGs) in effectively storing, managing, and utilizing this vast reservoir of information. Despite the copious amounts of text available on the web, a significant portion remains unstructured, presenting a substantial barrier to the automatic construction and enrichment of KGs. To address this issue, we introduce an enhanced Doc‐KG model, a sophisticated approach designed to transform unstructured documents into structured knowledge by generating local KGs and mapping these to a target KG, such as Wikidata. Our model innovatively leverages syntactic information to extract entities and predicates efficiently, integrating them into triples with improved accuracy. Furthermore, the Doc‐KG model's performance surpasses existing methodologies by utilizing advanced algorithms for both the extraction of triples and their subsequent identification within Wikidata, employing Wikidata's Unified Resource Identifiers for precise mapping. This dual capability not only facilitates the construction of KGs directly from unstructured texts but also enhances the process of identifying triple mentions within Wikidata, marking a significant advancement in the domain. Our comprehensive evaluation, conducted using the renowned WebNLG benchmark dataset, reveals the Doc‐KG model's superior performance in triple extraction tasks, achieving an unprecedented accuracy rate of 86.64%. In the domain of triple identification, the model demonstrated exceptional efficacy by mapping 61.35% of the local KG to Wikidata, thereby contributing 38.65% of novel information for KG enrichment. A qualitative analysis based on a manually annotated dataset further confirms the model's excellence, outshining baseline methods in extracting high‐fidelity triples. This research embodies a novel contribution to the field of knowledge extraction and management, offering a robust framework for the semantic structuring of unstructured data and paving the way for the next generation of KGs.
... The problem size of WebQuestionSP (WebQSP) [16] is smaller, but the knowledge graph is larger. It contains thousands of natural language questions based on Freebase [26], which has millions of entities and triples. Its questions require either one or two hops. ...
Preprint
Full-text available
Multi-hop knowledge graph question answering aims to find answer entities from the knowledge graph based on natural language questions. This is a challenging task as it requires precise reasoning about entity relationships at each step. When humans perform multi-hop reasoning, they usually focus on specific relations between different hops and determine the next entity. However, most algorithms often choose the wrong specific relations, causing the system to deviate from the correct reasoning path. In multi-hop question answering, the specific relation between each hop is crucial. The existing TransferNet model mainly relies on question representation for relational reasoning, but cannot accurately calculate the specific relation distribution, which profoundly affects question answering performance. On this basis, this paper proposes an interpretable assistance framework, which makes full use of relation embeddings and question semantics, and uses the attention mechanism to cross-fuse their information to assist in calculating the relation distribution of each hop. Extensive experiments are conducted on two English datasets, WebQSP and CWQ, demonstrating that the proposed model outperforms state-of-the-art models by a large margin.
... The distribution of relation patterns in FB15k-237 is more complex than in NELL995. Most of the data in FC17 is sourced from Freebase [4] and aligned with ClueWeb [7]. In our experiment, we selected the 46 relations with the highest frequencies. ...
Article
Full-text available
Many knowledge representation models extract local patterns or semantic features using fact embeddings but often overlook path semantics. There is room for improvement in path-based approaches that rely solely on single paths. A customized convolutional neural network (CNN) architecture is proposed to encode multiple paths generated by random walks into vector sequences. For each path, the feature sequence is then merged into a single vector using bidirectional long short-term memory (LSTM) by concatenating both forward and backward hidden states. Semantic relevance between different paths and candidate relations is computed using the attention mechanism. The state vectors of the relations are calculated using weighted paths. These paths help determine the probabilities of the candidate relations, which are then used to assess the validity of the triples. Link prediction experiments on two benchmark datasets, NELL995 and FB15k-237, demonstrate the advantages of our solution. Our model shows a 7.19% improvement at Hits@3 on FB15k-237 compared to Att-Model + Type, another advanced model. The model is further applied to a large complex dataset, FC17, as well as a sparse dataset, NELL-One, for few-shot reasoning.
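A minimal sketch of the pipeline this abstract describes: a CNN encodes each random-walk path, a BiLSTM merges the feature sequence by concatenating the final forward and backward states, and attention weighs paths against a candidate relation. All layer sizes and names here are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PathEncoder(nn.Module):
    """Illustrative sketch: encode random-walk paths with a CNN, merge
    each feature sequence with a BiLSTM, then attend over paths."""

    def __init__(self, vocab, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.rel = nn.Embedding(vocab, 2 * dim)   # candidate relations

    def forward(self, paths, rel_id):
        # paths: (n_paths, path_len) token ids from random walks
        x = self.embed(paths).transpose(1, 2)           # (n, dim, len)
        x = F.relu(self.conv(x)).transpose(1, 2)        # (n, len, dim)
        _, (h, _) = self.lstm(x)                        # h: (2, n, dim)
        paths_vec = torch.cat([h[0], h[1]], dim=-1)     # fwd+bwd states
        r = self.rel(rel_id)                            # (2*dim,)
        att = torch.softmax(paths_vec @ r, dim=0)       # relevance per path
        state = (att.unsqueeze(-1) * paths_vec).sum(0)  # weighted path state
        return torch.sigmoid(state @ r)                 # triple plausibility

enc = PathEncoder(vocab=100)
paths = torch.randint(0, 100, (4, 7))     # 4 paths of length 7
print(enc(paths, torch.tensor(5)).item())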
... Its primary application scenarios include search engines [33], voice assistants [9], intelligent Q&A [35], and so on. Various large-scale knowledge graphs, such as Freebase [2], DBpedia [13], YAGO [21], and WordNet [15], have been developed to efficiently store and use structured knowledge. Knowledge graph embedding projects semantic information of entities (or relations) into a low-dimensional vector space. ...
Article
Full-text available
Long-tail distribution is a difficult challenge for knowledge graph embedding. We expect to solve the problem by complementing the information through the neighbor aggregation mechanism of GCN. However, the GCN method and its derivatives are unable to learn the representation of edges. To address this problem, we propose RCGCN-TE, a Relation Correlations-aware Graph Convolutional Network with Text Enhancement for knowledge graph embedding, which is the first effort to enable GCN to learn the representation of relations directly. First, a pre-trained language model is used to extract semantic information. Then, the relation correlation graph is constructed by defining a relation relevance function based on the co-occurrence pattern and semantic similarity of relations. Finally, two GCNs are designed to learn entities and relations, respectively. Experimental results on tasks such as triple classification and link prediction are better than the baselines. For example, Hits@10, Hits@3, and Hits@1 improved by 8.23%, 37.49%, and 46.94%, respectively, on the entity prediction task.
... An example is the Google Knowledge Graph Singhal [2012], which is part of Google's search engine and is also responsible for answering questions from conversational agents like Google Assistant. The Google Knowledge Graph contains information gathered from other well-known KGs such as Freebase Bollacker et al. [2008], which was later incorporated into Wikidata Google [2014, 2019]. Other examples of large knowledge bases are DBpedia Lehmann et al. [2014] and NELL Mitchell et al. [2018]. ...
Article
Full-text available
Conversational systems like chatbots have emerged as powerful tools for automating interactive tasks traditionally confined to human involvement. Fundamental to chatbot functionality is their knowledge base, the foundation of their reasoning processes. A pivotal challenge resides in chatbots' innate incapacity to seamlessly integrate changes within their knowledge base, thereby hindering their ability to provide real-time responses. The increasing literature attention dedicated to effective knowledge base updates, which we term content update, underscores the significance of this topic. This work provides an overview of content update methodologies in the context of conversational agents. We delve into the state-of-the-art approaches for natural language understanding, such as language models and the like, which are essential for turning data into knowledge. Additionally, we discuss turning-point strategies and primary resources, such as deep learning, which are crucial for supporting language models. As our principal contribution, we review and discuss the core techniques underpinning information extraction as well as knowledge base representation and update in the context of conversational agents.
... Knowledge graphs are structured representations of knowledge in a graph format, with nodes denoting entities (such as people, places, and things) and edges denoting the relationships between those entities. In recent years, there has been a proliferation of large-scale knowledge graphs, such as Freebase [1], WordNet [2], and NELL [3], that cover various domains. These knowledge graphs have become a popular tool for many applications, such as question-answering [4,5], recommendation systems [6,7], and information retrieval [8], enabling the development of more intelligent systems that can understand and reason about the world in a more human-like manner. ...
Article
Full-text available
Knowledge graph embedding is a widely used technique that represents entities and relations in a low-dimensional space to predict missing links in knowledge graphs. However, most existing knowledge graph embedding methods focus solely on modeling multiple relation patterns, such as symmetry/antisymmetry, inversion, and composition, while ignoring the semantic hierarchies present in real-world scenarios. This limitation leads to inaccurate embeddings of entities and relations, which, in turn, negatively affects downstream tasks. To address this issue, we present a novel model, named Dual Hierarchical scaling Knowledge graph Embedding (DHKE), which maps entities and relations into complex space to model semantic hierarchies and relation patterns simultaneously. DHKE treats the embeddings of entities as their semantic hierarchies, and allocates two scaling vectors to each relation to enable transformations between hierarchies. Furthermore, DHKE assigns a rotation vector to each relation to distinguish between entities at the same semantic hierarchy and to model multiple relation patterns. Our experimental results and analysis indicate that DHKE outperforms existing methods and is capable of modeling both semantic hierarchies and multiple relation patterns simultaneously. Notably, DHKE can capture the semantic hierarchies of entities without extra information about the entities, which makes it more suitable for real-world data.
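Reading the abstract, the scoring idea can be sketched in polar form: entity moduli carry the hierarchy, two relation scaling vectors transform moduli, and a relation phase vector rotates same-level entities. The distance formula below is an assumption inferred from the abstract, not DHKE's published equations.

import numpy as np

dim = 8
rng = np.random.default_rng(0)

# Entities as complex vectors: modulus ~ semantic hierarchy level,
# phase ~ position among entities at the same level.
h_mod, h_phase = rng.uniform(0.5, 2, dim), rng.uniform(0, 2*np.pi, dim)
t_mod, t_phase = rng.uniform(0.5, 2, dim), rng.uniform(0, 2*np.pi, dim)

# Relation: two scaling vectors (head-side and tail-side hierarchy
# transforms) plus one rotation vector for same-level distinctions.
scale_h = rng.uniform(0.5, 2, dim)
scale_t = rng.uniform(0.5, 2, dim)
rotation = rng.uniform(0, 2*np.pi, dim)

# Hierarchy part: scaled head modulus should match scaled tail modulus.
mod_dist = np.linalg.norm(h_mod * scale_h - t_mod * scale_t, ord=1)
# Rotation part: rotated head phase should match tail phase (circular).
phase_dist = np.linalg.norm(np.sin((h_phase + rotation - t_phase) / 2), ord=1)

score = -(mod_dist + phase_dist)   # higher = more plausible triple
print(score)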
... To showcase the effectiveness of our model, we conduct evaluations on two widely recognized benchmarks, Freebase [2] and WordNet [11]. Table 2 shows statistics of the two KGs. ...
Preprint
Full-text available
Knowledge graphs have been widely used to represent facts in a structured format. Due to their large-scale applications, knowledge graphs suffer from being incomplete. The relation prediction task obtains knowledge graph completion by assigning one or more possible relations to each pair of nodes. In this work, we make use of the knowledge graph node names to fine-tune a large language model for the relation prediction task. By utilizing only the node names, we enable our model to operate in inductive settings. Our experiments show that we achieve new scores on a widely used knowledge graph benchmark.
... The applications supported by the knowledge graph are increasingly diverse and have been successfully applied to tasks such as intelligent question answering [1][2][3], personalized recommendations [4][5][6], and interpretable tools [7][8][9]. With the increase in knowledge graph applications, people have built large-scale domain-independent knowledge graphs, such as DBpedia [10] and Freebase [11], as well as various knowledge graphs around domain applications [12][13][14]. The contents of these knowledge graphs are complementary or duplicated. ...
Article
Full-text available
Entity alignment is an important task in knowledge fusion, which aims to link entities that have the same real-world identity in two knowledge graphs. However, in the process of constructing a knowledge graph, some noise may inevitably be introduced, which affects the results of entity alignment tasks. The triple confidence calculation can quantify the correctness of the triples to reduce the impact of the noise on entity alignment. Therefore, we designed a method to calculate the confidence of the triples and applied it to the knowledge representation learning phase of entity alignment. The method calculates the triple confidence based on the pairing rates of the three angles between the entities and relations. Specifically, the method uses the pairing rates of the three angles as features, which are then fed into a feedforward neural network for training to obtain the triple confidence. Moreover, we introduced the triple confidence into knowledge representation learning methods to improve their performance in entity alignment. For the graph neural network-based method GCN, we considered entity confidence when calculating the adjacency matrix, and for the translation-based method TransE, we proposed a strategy to dynamically adjust the margin value in the loss function based on confidence. These two methods were then applied to entity alignment, and the experimental results demonstrate that, compared with knowledge representation learning methods without integrated confidence, the confidence-based methods achieved excellent performance in the entity alignment task.
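The dynamic-margin idea for TransE described above can be sketched as follows; the linear schedule margin = base_margin * conf is an illustrative assumption, since the paper's exact adjustment strategy is not given here.

import numpy as np

# Sketch of a confidence-aware TransE hinge loss: the margin grows with
# the triple's confidence, so trusted triples are pushed further from
# their negative samples, while noisy triples get a softer margin.

def transe_loss(h, r, t, h_neg, t_neg, conf, base_margin=1.0):
    pos = np.linalg.norm(h + r - t)           # distance of the true triple
    neg = np.linalg.norm(h_neg + r - t_neg)   # distance of the corrupted one
    margin = base_margin * conf               # confidence-scaled margin
    return max(0.0, margin + pos - neg)

rng = np.random.default_rng(1)
h, r, t = (rng.normal(size=16) for _ in range(3))
h_neg, t_neg = rng.normal(size=16), rng.normal(size=16)
print(transe_loss(h, r, t, h_neg, t_neg, conf=0.9))  # high-confidence triple
print(transe_loss(h, r, t, h_neg, t_neg, conf=0.2))  # noisy triple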
... FB15K-237 [46] is a subset of the widely used Freebase [47]. Its entities are those mentioned more than 100 times in Freebase. ...
Article
Full-text available
A knowledge graph is a structured semantic network designed to describe physical entities and relations in the world. A comprehensive and accurate knowledge graph is essential for tasks such as knowledge inference and recommendation systems, making link prediction a popular problem for knowledge graph completion. However, existing approaches struggle to model complex relations among entities, which severely hampers their ability to complete knowledge graphs effectively. To address this challenge, we propose a novel hierarchical multi-head attention network embedding framework, called RiQ-KGC, which integrates contextual information of knowledge graph triples at different granularities and models quaternion rotation relations between entities. Furthermore, we propose a relation instantiation method to alleviate the difficulty of expressing complex relations between entities. To enhance the expressiveness of relation representation, relations are integrated by a Transformer to obtain multi-hop neighbor information, so that one relation can receive different embeddings depending on the entities involved. Experimental results on four datasets demonstrate that RiQ-KGC exhibits strong competitiveness compared to state-of-the-art models in link prediction, while the ablation experiments reveal that the proposed relation instantiation method achieves great performance.
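The quaternion rotation that the abstract attributes to RiQ-KGC can be illustrated with the Hamilton product; the QuatE-style inner-product score below is only a sketch of the algebra, not RiQ-KGC's full architecture.

import numpy as np

def hamilton(q, p):
    """Hamilton product of quaternion vectors, components stacked as
    (a, b, c, d) along the first axis."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return np.stack([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

rng = np.random.default_rng(2)
head = rng.normal(size=(4, 8))       # 8-dim quaternion entity embedding
tail = rng.normal(size=(4, 8))
rel = rng.normal(size=(4, 8))
rel /= np.linalg.norm(rel, axis=0)   # unit quaternions = pure rotations

rotated = hamilton(head, rel)        # rotate head by the relation
score = np.sum(rotated * tail)       # plausibility of (head, rel, tail)
print(score)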
... Yet real-world KGs are typically large-scale and incomplete, such as Freebase [9], Yago [10], and NELL [11]. Thus, reasoning over such KGs is a challenging task [1,12]. ...
Article
Full-text available
Answering complex queries with first-order logical operators over knowledge graphs, such as conjunction (\(\wedge \)), disjunction (\(\vee \)), and negation (\(\lnot \)), is immensely useful for identifying missing knowledge. Recently, neural symbolic reasoning methods have been proposed to map entities and relations into a continuous real vector space and model logical operators as differentiable neural networks. However, traditional methods employ negative sampling, which corrupts complex queries to train embeddings. Consequently, these embeddings are susceptible to divergence in the open manifold of \(\mathbb {R}^n\). Appropriate regularization is crucial for addressing the divergence of embeddings. In this paper, we introduce a Lie group as a compact embedding space for complex query embedding, enhancing the foundation model's ability to handle the intricacies of knowledge graphs. Our method aims to solve disjunctive and conjunctive queries. Entities and queries are represented as regions of a high-dimensional torus, where projection, intersection, union, and negation on the torus naturally simulate entities and queries. After simulating these operations on the torus regions we defined, we found that the resulting geometry remains unchanged. Extensive experiments on FB15K, FB15K-237, and NELL995 demonstrate that our approach achieves significant improvements, leveraging the strengths of the knowledge graph foundation model and complex query processing.
... Despite knowledge bases containing millions of triples, KGs remain incomplete. For example, in Freebase [5], approximately 71% of people lack a birthplace, and about 75% lack nationality, leading to suboptimal performance in downstream applications. Thus, knowledge graph completion (KGC) techniques have emerged with the aim of predicting whether triples are factually valid and further enhancing existing knowledge bases. ...
Article
Full-text available
Knowledge graph completion (KGC) infers missing knowledge triples based on the facts in the knowledge base. In recent years, many representation learning models for knowledge reasoning have achieved promising link prediction results, especially those based on graph attention networks and their derivatives. Such models usually utilize the local neighborhood information of each node to learn representation vectors of target entities. However, most existing work focuses on modeling symmetry/asymmetry/composition/inversion relations, with less emphasis on hierarchical relations. Thus, a new hierarchical hyperbolic attention network for knowledge graph completion (HHAN-KGC) based on hyperbolic geometry is proposed in this paper. In HHAN-KGC, entities at different hierarchies are distinguished by computing the distances between embedded feature vectors and the origin in the hyperbolic space, while neighboring information is integrated in the tangent space through a semantic attention mechanism, effectively addressing the limitations of existing hyperbolic models in reasoning about complex relations. The methodology analysis and experimental results demonstrate that HHAN-KGC can effectively model semantic hierarchies in the hyperbolic space, further enhancing the semantic representations of entities and relations. The results on multiple knowledge graph datasets indicate that HHAN-KGC outperforms state-of-the-art methods in knowledge graph link prediction.
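For intuition about why hyperbolic space suits hierarchies: in the standard Poincaré-ball model, the distance to the origin can encode an entity's depth in the hierarchy. The formula below is the standard geodesic distance on the ball; HHAN-KGC's attention layers and exact parameterization are not reproduced here.

import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points inside the unit ball:
    arcosh(1 + 2*|x-y|^2 / ((1-|x|^2)(1-|y|^2)))."""
    sq = np.sum((x - y) ** 2)
    nx, ny = np.sum(x ** 2), np.sum(y ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - nx) * (1 - ny) + eps))

root = np.array([0.01, 0.0])   # near the origin: top of the hierarchy
leaf = np.array([0.85, 0.3])   # near the boundary: deep in the hierarchy
print(poincare_distance(root, leaf))
print(np.linalg.norm(root), np.linalg.norm(leaf))  # depth ~ norm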
... Technically, KGs are directed graphs with multiple relations, where the nodes and the edges stand for the various entities and the relations between the entities, respectively. The relations between entities are used to represent KGs such as NELL Carlson et al. (2010), YAGO Suchanek et al. (2007), Freebase Bollacker et al. (2008), and Wikidata Vrandečić and Krötzsch (2014). Knowledge information in the KGs is stored and represented by triples. ...
Article
Full-text available
Although Knowledge Graphs (KGs) provide great value in many applications, they are often incomplete with many missing facts. KG Completion (KGC) is a popular technique for knowledge supplement. However, there are two fundamental challenges for KGC. One challenge is that few entity pairs are often available for most relations, and the other is that there exist complex relations, including one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N). In this paper, we propose a new model to accomplish Few-shot KG Completion (FKGC) under complex relations, which is called Relation representation based on Private and Shared features for Adaptive few-shot link prediction (RPSA). In this model, we utilize the hierarchical attention mechanism for extracting the essential and crucial hidden information regarding the entity's neighborhood so as to improve its representation. To enhance the representation of few-shot relations, we extract the private features (i.e., the unique feature of each entity pair that represents the few-shot relation) and shared features (i.e., one or more commonalities among a few entity pairs that represent the few-shot relation). Specifically, a private feature extractor is used to extract the private semantic feature of the few-shot relation in the entity pair. After that, we design a shared feature extractor to extract the shared semantic features among a few reference entity pairs in the few-shot relation. Moreover, an adaptive aggregator aggregates several representations of the few-shot relation about the query. We conduct experiments on three datasets: NELL-One, CoDEx-S-One, and CoDEx-M-One. According to the experimental results, RPSA's performance is better than that of existing FKGC models. In addition, the RPSA model can also handle complex relations well, even in the few-shot scenario.
... Nonetheless, a KB can be considered as a rich repository storing complex structured and unstructured information in the form of entities, related attributes and mutual relationships [201]. Among the notable examples of KBs, one can cite Wikidata [253], DBpedia [14], YAGO (Yet Another Great Ontology) [239], ReadTheWeb [35], Freebase [26], Probase [264], and KnowItAll [66]. It is possible that an entity mention does not map to a particular entity name in a given KB, and that is why knowledge base population and enrichment is an active research area [229]. ...
Thesis
Full-text available
Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result, retrieval performance is restricted by semantic discrepancies and vocabulary gaps. On the other hand, transformer-based dense retrievers introduce significant improvements in information retrieval tasks by exploiting low-dimensional contextualized representations of the corpus. While dense retrievers are known for their relative effectiveness, they suffer from lower efficiency and lack of generalization issues, when compared to sparse retrievers. For a lightweight retrieval task, high computational resources and time consumption are major barriers encouraging the renunciation of dense models despite potential gains. In this work, I propose boosting the performance of sparse retrievers by expanding both the queries and the documents with linked entities in two formats for the entity names: 1) explicit and 2) hashed. A zero-shot end-to-end dense entity linking system is employed for entity recognition and disambiguation to augment the corpus. By leveraging the advanced entity linking methods, I believe that the effectiveness gap between sparse and dense retrievers can be narrowed. Experiments are conducted on the MS MARCO passage dataset using the original qrel set, the re-ranked qrels favoured by MonoT5 and the latter set further re-ranked by DuoT5. Since I am concerned with the early stage retrieval in cascaded ranking architectures of large information retrieval systems, the results are evaluated using recall@1000. The suggested approach is also capable of retrieving documents for query subsets judged to be particularly difficult in prior work. In addition, it is demonstrated that the non-expanded and the expanded runs with both explicit and hashed entities retrieve complementary results. Consequently, run combination methods such as run fusion and classifier selection are experimented to maximize the benefits of entity linking. Due to the success of entity methods for sparse retrieval, the proposed approach is also tested on dense retrievers. The corresponding results are reported in MRR@10.
... A knowledge graph (KG) [1] is a network of concepts where the fundamental element is a triple in the form of (entity, relationship, entity). Many knowledge graphs, including WordNet [2], NELL [3], and Freebase [4], have been developed and successfully applied in intelligent service areas like information retrieval, recommendation systems, and question answering systems. Since these large-scale knowledge graphs are often incomplete and require constant supplementation, knowledge reasoning [5,6] involves deducing new entities or relationships from existing data, thus continually enhancing the knowledge graph. ...
Article
Full-text available
In recent years, the emergence of large-scale language models, such as ChatGPT, has presented significant challenges to research on knowledge graphs and knowledge-based reasoning. As a result, the direction of research on knowledge reasoning has shifted. Two critical issues in knowledge reasoning research are the algorithm of the model itself and the selection of paths. Most studies utilize LSTM as the path encoder and memory module. However, when processing long sequence data, LSTM models may encounter the problem of long-term dependencies, where memory units of the model may decay gradually with an increase in time steps, leading to forgetting earlier input information. This can result in a decline in the performance of the LSTM model in long sequence data. Additionally, as the data volume and network depth increase, there is a risk of gradient disappearance. This study improved and optimized the LSTM model to effectively address the problems of gradient explosion and gradient disappearance. An attention layer was employed to alleviate the issue of long-term dependencies, and ConvR embedding was used to guide path selection and action pruning in the reinforcement learning inference model. The overall model achieved excellent reasoning results.
... Large-scale knowledge graphs (KGs) like Freebase [1], YAGO [2], and WordNet [3], primarily consisting of facts in the form of triplets (head entity, relation, tail entity), have been used in various applications ranging from semantic search to recommendation systems [4] and question answering [5]. However, due to inevitable insufficiency during knowledge graph construction, knowledge graphs suffer from intrinsic incompleteness. ...
Article
Full-text available
This paper presents Integrated Semantics-Structure Analysis in Knowledge Graph Completion (ISA-KGC), a new framework for Knowledge Graph Completion (KGC) aimed at addressing the incompleteness of knowledge graphs (KGs). ISA-KGC integrates Graph Neural Networks (GNN) with Transformer-based models, effectively blending structural and semantic information within Knowledge Graphs. This fusion enhances comprehension of KGs beyond what traditional methods offer. The framework utilizes Knowledge Graph Embedding (KGE) models, with GNN employed to augment these models, thus enhancing the overall analysis and interpretation of Knowledge Graphs. The effectiveness of ISA-KGC is validated through benchmark datasets FB15K-237 and WN18RR, showing notable improvements in performance metrics like hit@10 compared to existing methods.
... KGs can be classified into two categories, general-domain KGs and specific-domain knowledge graphs, depending on the fields they cover. Typical examples of general KGs are Freebase [1], WordNet [2], Yago [3], etc., which are mainly used to describe commonsense knowledge and universal laws. A large number of high-quality domain-specific KGs have been released in recent years [4][5][6]. ...
Article
Full-text available
Most existing knowledge graphs (KGs) in specific domains suffer from insufficient structural knowledge mining, superficial constraint of rules, an incomplete system of rule patterns, and a higher error rate in the process of automated rule generation. In this paper, we present an adversarial generative approach for rule mining based on generative adversarial networks (GANs). The method first extracts a rule set according to a manually defined rule pattern; the rule set is then used as the adversarial training dataset for the GAN. That is, the discriminator determines whether a rule is true by learning the pattern of the rule set, while the generator tricks the discriminator by forging rules and improves according to the discriminator's feedback. Finally, a generator is obtained to generate new rules that conform to the rule pattern, and a discriminator is obtained to determine the confidence of automatically constructed triples.
Article
This paper studies association rule discovery in a graph G1 by referencing an external graph G2 with overlapping information. The objective is to enrich G1 with relevant properties and links from G2. As a testbed, we consider Graph Association Rules (GARs). We propose a notion of graph joins to enrich G1 by aligning entities across G1 and G2. We also introduce a graph filtering method to support graph joins, by fetching only the data of G2 that pertains to the entities of G1, to reduce noise and the size of the fused data. Based on these we develop a parallel algorithm to discover GARs across G1 and G2. Moreover, we provide an incremental GAR discovery algorithm in response to updates to G1 and G2. We show that both algorithms guarantee to reduce parallel runtime when given more processors. Better yet, the incremental algorithm is bounded relative to the batch one. Using real-life and synthetic data, we empirically verify that the methods improve the accuracy of association analyses by 30.4% on average, and scale well with large graphs.
Article
In this technical survey, the latest advancements in the field of recommender systems are comprehensively summarized. The objective of this study is to provide an overview of the current state-of-the-art in the field and highlight the latest trends in the development of recommender systems. It starts with a comprehensive summary of the main taxonomy of recommender systems, including personalized and group recommender systems. In addition, the survey analyzes the robustness, data bias, and fairness issues in recommender systems, summarizing the evaluation metrics used to assess the performance of these systems. Finally, it provides insights into the latest trends in the development of recommender systems and highlights the new directions for future research in the field.
Article
Purpose: This paper covers the development of a novel defect model for concrete highway bridges. The proposed defect model is intended to facilitate the identification of a bridge's condition information (i.e. defects), improve the efficiency and accuracy of bridge inspections by supporting practitioners and even machines with digitalised expert knowledge, and ultimately automate the process.

Design/methodology/approach: The research design consists of three major phases so as to (1) categorise common defects with regard to physical entities (i.e. bridge elements), (2) establish internal relationships among those defects and (3) relate defects to their properties and potential causes. A mixed-method research approach, which includes a comprehensive literature review, focus groups and case studies, was employed to develop and validate the proposed defect model.

Findings: The data collected through the literature and focus groups were analysed and knowledge was extracted to form the novel defect model. The defect model was then validated and further calibrated through case study. Inspection reports of nearly 300 bridges in China were collected and analysed. The study uncovered the relationships between defects and a variety of inspection-related elements and represented them in the form of an accessible, digitalised and user-friendly knowledge model.

Originality/value: The contribution of this paper is the development of a defect model that can assist inexperienced practitioners and even machines in the near future to conduct inspection tasks. For one, the proposed defect model can standardise the data collection process of bridge inspection, including the identification of defects and documentation of their vital properties, paving the path for automation in subsequent stages (e.g. condition evaluation). For another, by retrieving rich experience and expert knowledge which have long been reserved and inherited in the industrial sector, the inspection efficiency and accuracy can be considerably improved.
Article
Full-text available
Question-answering systems are recognized as popular and frequently effective means of information seeking on the web. In such systems, information seekers can receive a concise response to their queries by presenting their questions in natural language. Interactive question answering is a recently proposed and increasingly popular solution that resides at the intersection of question answering and dialogue systems. On the one hand, the user can ask questions in normal language and locate the actual response to her inquiry; on the other hand, the system can prolong the question-answering session into a dialogue if there are multiple probable replies, very few, or ambiguities in the initial request. By permitting the user to ask more questions, interactive question answering enables users to interact with the system and receive more precise results dynamically. This survey offers a detailed overview of the interactive question-answering methods that are prevalent in current literature. It begins by explaining the foundational principles of question-answering systems, defining new notations and taxonomies to combine all identified works inside a unified framework. The reviewed published work on interactive question-answering systems is then presented and examined in terms of its proposed methodology, evaluation approaches, and dataset/application domain. We also describe trends surrounding specific tasks and issues raised by the community, thus shedding light on the future interests of scholars. Our work is further supported by a GitHub page synthesising all the major topics covered in this literature study. https://sisinflab.github.io/interactive-question-answering-systems-survey/
Article
In recent years, Knowledge Graphs (KGs) have played a crucial role in the development of advanced knowledge-intensive applications, such as recommender systems and semantic search. However, the human sensory system is inherently multi-modal, as objects around us are often represented by a combination of multiple signals, such as visual and textual. Consequently, Multi-modal Knowledge Graphs (MMKGs), which combine structured knowledge representation with multiple modalities, represent a powerful extension of KGs. Although MMKGs can handle certain types of tasks (e.g., visual query answering) or queries that standard KGs cannot process, and they can effectively tackle some standard problems (e.g., entity alignment), we lack a widely accepted definition of MMKG. In this survey, we provide a rigorous definition of MMKGs along with a classification scheme based on how existing approaches address four fundamental challenges: representation, fusion, alignment, and translation, which are crucial to improving an MMKG. Our classification scheme is flexible and allows for easy incorporation of new approaches, as well as a comparison of two approaches in terms of how they address one of the fundamental challenges mentioned above. As the first comprehensive survey of MMKG, this article aims to inspire and provide a reference for relevant researchers in the field of Artificial Intelligence.
Article
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining three key principles of modality heterogeneity, connections, and interactions that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification, covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.
Article
Full-text available
Extracting named entities from text forms the basis for many crucial tasks such as information retrieval and extraction, machine translation, opinion mining, sentiment analysis and question answering. This paper presents a survey of the research literature on named entity linking, including named entity recognition and disambiguation. We present 200 works by focusing on 43 papers (5 surveys and 38 research works). We also describe and classify 56 resources, including 25 tools and 31 corpora. We focus on the most recent papers, where more than 95% of the described research works are from after 2015. To show the efficiency of our construction methodology and the importance of this state of the art, we compare it to other surveys presented in the research literature, which were based on different criteria (such as the domain, novelty and presented models and resources). We also present a set of open issues (including the dominance of the English language in the proposed studies and the frequent use of NER rather than end-to-end systems proposing NED and EL) related to entity linking, based on the research questions that this survey aims to answer.
Article
Freebase is a collaboratively created and edited database of general, structured information intended for broad public use. It is designed to scale to a large number and diversity of users and data. Both an AJAX/Web based user interface and an HTTP/JSON based API are provided to allow use by and collaboration between both humans and software. In particular, these interfaces allow for an "emergent workflow" of collaborative, distributed data structuring and entity reconciliation, even if individual users are not intentionally cooperating in such information integration tasks.