Conference Paper

Abstract

The Open Knowledge Extraction (OKE) challenge, now at its second edition, aims to provide a reference framework for research on Knowledge Extraction from text for the Semantic Web by re-defining a number of tasks (typically drawn from information and knowledge extraction) while taking into account specific Semantic Web requirements. The OKE challenge defines two tasks: (1) Entity Recognition, Linking and Typing for Knowledge Base population; (2) Class Induction and entity typing for Vocabulary and Knowledge Base enrichment. Task 1 consists of identifying entities in a sentence, creating an OWL individual representing each of them, linking it to a reference KB (DBpedia) when possible, and assigning a type to the individual. Task 2 consists of producing rdf:type statements given definition texts: participants are given a dataset of sentences, each defining an entity known a priori. The following systems participated in the challenge: WestLab in both Task 1 and Task 2, ADEL and Mannheim in Task 2 only. In this paper we describe the OKE challenge, the tasks, the datasets used for training and evaluating the systems, the evaluation method, and the results obtained.
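To make the expected output of Task 2 concrete, the sketch below shows how the requested rdf:type statements could be serialized with Python and rdflib. It is only an illustration of the output style, not the official challenge format; the example namespace, the entity and the induced class are hypothetical, and the definition sentence merely follows the flavour of the challenge data.

    # Minimal sketch (not the official OKE submission format): emitting the
    # rdf:type statements requested by Task 2 with rdflib. All ex: URIs are
    # illustrative placeholders.
    from rdflib import Graph, Namespace, RDF, RDFS, Literal

    EX = Namespace("http://example.org/oke/")    # hypothetical namespace
    DUL = Namespace("http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#")

    g = Graph()
    g.bind("ex", EX)
    g.bind("dul", DUL)

    # Definition text (entity known a priori): "Brian Banner is a fictional
    # villain from the Marvel Comics Universe ..."
    entity = EX["Brian_Banner"]
    induced_class = EX["Villain"]                # class induced from the text

    g.add((entity, RDF.type, induced_class))     # entity typing statement
    g.add((induced_class, RDFS.label, Literal("villain", lang="en")))
    g.add((induced_class, RDFS.subClassOf, DUL["Person"]))  # alignment to DOLCE+DUL

    print(g.serialize(format="turtle"))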


... In this section, we present a thorough evaluation of ADEL over different benchmark datasets, namely OKE2015 [36], OKE2016 [37], NEEL2014 [2], NEEL2015 [42], NEEL2016 [46] and AIDA [21]. Each of these datasets has its own characteristics, detailed in Table 4. ...
... • OKE datasets were used in the OKE challenges [59][60][61]. ...
Article
Entity Linking (EL) consists of determining the entities that best represent the mentions in a document. Mentions can be very ambiguous and can refer to different entities in different contexts. In this paper, we present ABACO, a semantic annotation system for Entity Linking (EL) which addresses name ambiguity by assuming that the entity that annotates a mention should be coherent with the main topics of the document. ABACO extracts a sub-graph from a knowledge base which interconnects all the candidate entities to annotate each mention in the document. Candidate entities are scored according to their degree of centrality in the knowledge graph and their textual similarity with the topics of the document, and the worst candidates are pruned from the sub-graph. The approach has been validated on 13 datasets and compared with 11 other annotation systems using the GERBIL platform. Results show that ABACO outperforms the other systems for medium/large documents.
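The scoring step described above (centrality of a candidate in the extracted sub-graph combined with textual similarity to the document topics, followed by pruning of the worst candidates) can be pictured with a toy reconstruction. The graph, descriptions and weighting below are invented for illustration and are not ABACO's actual implementation.

    # Toy sketch of ABACO-style candidate scoring: graph centrality combined
    # with textual similarity to the document topics (invented data, equal
    # weights; not the authors' implementation).
    import networkx as nx

    def jaccard(a: str, b: str) -> float:
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    # Hypothetical sub-graph interconnecting the candidate entities of all mentions.
    G = nx.Graph()
    G.add_edges_from([
        ("dbr:Paris", "dbr:France"),
        ("dbr:Paris", "dbr:Eiffel_Tower"),
        ("dbr:Paris_Hilton", "dbr:Hilton_Hotels"),
    ])

    doc_topics = "france travel eiffel tower capital"
    descriptions = {
        "dbr:Paris": "capital city of france",
        "dbr:Paris_Hilton": "american media personality",
    }

    centrality = nx.degree_centrality(G)
    scores = {
        cand: 0.5 * centrality.get(cand, 0.0) + 0.5 * jaccard(desc, doc_topics)
        for cand, desc in descriptions.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    print(ranked)          # worst candidates would be pruned from the sub-graph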
... In 2014, the Entity Recognition and Disambiguation (ERD) challenge took place (Carmel et al., 2014). The Open Knowledge Extraction challenge series started in 2015 (Nuzzolese et al., 2015, 2016; Speck et al., 2017). ...
Preprint
Full-text available
With the growing interest in making use of Knowledge Graphs for developing explainable artificial intelligence, there is an increasing need for a comparable and repeatable comparison of the performance of Knowledge Graph-based systems. History in computer science has shown that a main driver of scientific advances, and in fact a core element of the scientific method as a whole, is the provision of benchmarks that make progress measurable. This paper gives an overview of benchmarks used to evaluate systems that process Knowledge Graphs.
... Lifting approaches are usually assessed using tasks that do not focus on specific semantic web and KG aims. Of course, general tasks such as NER, NED, NEL, and relation extraction are important, but they are usually designed without evaluating the output as knowledge graphs, Linked Data or OWL ontologies [151]. ...
Article
Full-text available
An enormous amount of digital information is expressed as natural-language (NL) text that is not easily processable by computers. Knowledge Graphs (KG) offer a widely used format for representing information in computer-processable form. Natural Language Processing (NLP) is therefore needed for mining (or lifting) knowledge graphs from NL texts. A central part of the problem is to extract the named entities in the text. The paper presents an overview of recent advances in this area, covering: Named Entity Recognition (NER), Named Entity Disambiguation (NED), and Named Entity Linking (NEL). We comment that many approaches to NED and NEL are based on older approaches to NER and need to leverage the outputs of state-of-the-art NER systems. There is also a need for standard methods to evaluate and compare named-entity extraction approaches. We observe that NEL has recently moved from being stepwise and isolated into an integrated process along two dimensions: the first is that previously sequential steps are now being integrated into end-to-end processes, and the second is that entities that were previously analysed in isolation are now being lifted in each other's context. The current culmination of these trends is the deep-learning approaches that have recently reported promising results.
... The work available on knowledge extraction (KE) has historically been fostered by many contests which facilitated the evaluation of Information Extraction systems. These started with the task of extracting "Named Entities" in the Message Understanding Conference (MUC) series [26] and evolved into many other Knowledge Population tasks, such as CoNLL [27,28], the Automatic Content Extraction (ACE) program [29], the Knowledge Base Population (KBP) task at the Text Analysis Conference (TAC), the Knowledge Base Acceleration (KBA) track at the TREC conference and the Open Knowledge Extraction Challenge [30,31]. Finally, the Semantic Web Challenge has been organized in conjunction with the International Semantic Web Conference since 2003 and is the longest-running competition in the semantic web area. ...
Article
The Semantic Web movement has produced a wealth of curated collections of entities and facts, often referred to as Knowledge Graphs. Creating and maintaining such Knowledge Graphs is far from being a solved problem: it is crucial to constantly extract new information from the vast amount of heterogeneous sources of data on the Web. In this work we address the task of Knowledge Graph population. Specifically, given any target relation between two entities, we propose an approach to extract positive instances of the relation from various Web sources. Our relation extraction approach introduces a human-in-the-loop component into the extraction pipeline, which delivers a significant advantage with respect to other, solely automatic approaches. We test our solution on the ISWC 2018 Semantic Web Challenge, with the objective of identifying supply-chain relations among organizations in the Thomson Reuters Knowledge Graph. Our human-in-the-loop extraction pipeline achieves top performance among all competing systems.
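A simple way to picture the human-in-the-loop component is a confidence gate: relation instances extracted automatically with high confidence are accepted, the rest are queued for manual validation before entering the knowledge graph. The sketch below illustrates only that control flow; the candidate relations, confidence scores and threshold are invented, and it is not the authors' pipeline.

    # Illustrative human-in-the-loop gate for relation extraction (invented
    # data and threshold; not the system described in the paper).
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        subject: str
        relation: str
        obj: str
        confidence: float        # assumed to come from an automatic extractor

    def ask_human(c: Candidate) -> bool:
        answer = input(f"Accept ({c.subject}, {c.relation}, {c.obj})? [y/n] ")
        return answer.strip().lower() == "y"

    def populate(candidates, threshold=0.8):
        accepted = []
        for c in candidates:
            # High-confidence extractions pass automatically; the rest are
            # routed to a human validator.
            if c.confidence >= threshold or ask_human(c):
                accepted.append((c.subject, c.relation, c.obj))
        return accepted

    candidates = [
        Candidate("ex:AcmeCorp", "ex:suppliesTo", "ex:GlobexInc", 0.93),
        Candidate("ex:AcmeCorp", "ex:suppliesTo", "ex:InitechLtd", 0.41),
    ]
    print(populate(candidates))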
... OKE2015 [15] and OKE2016 [16] are two datasets used during the OKE challenges at the ESWC conferences. They contain short sentences (fewer than 200 sentences per dataset), each of which briefly describes one subject (e.g., short DBpedia abstracts). ...
Conference Paper
Full-text available
Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources that seriously challenge their usefulness such as completeness, timeliness and semantic correctness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.
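One concrete instance of mining knowledge from Linked Data to tackle name variance is collecting alternative surface forms of an entity from a public SPARQL endpoint. The snippet below queries DBpedia with SPARQLWrapper for the labels of a resource and of the pages redirecting to it; it is a minimal illustration of the idea, not Recognyze's lexicon-building code, and assumes network access to the public endpoint.

    # Illustrative harvesting of entity surface forms from DBpedia to handle
    # name variance (minimal example; not the Recognyze implementation).
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        SELECT DISTINCT ?label WHERE {
          { <http://dbpedia.org/resource/IBM> rdfs:label ?label }
          UNION
          { ?alias dbo:wikiPageRedirects <http://dbpedia.org/resource/IBM> ;
                   rdfs:label ?label }
          FILTER (lang(?label) = "en")
        } LIMIT 50
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    surface_forms = {b["label"]["value"] for b in bindings}
    print(surface_forms)   # alternative names usable by a linking lexicon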
Article
Full-text available
The Semantic Web is an emerging technology that helps different users connect and create content, and facilitates representing information in a manner that computers can understand. As the world heads towards the fourth industrial revolution, artificial-intelligence-enabled Semantic Web technologies pave the way for many real-time application developments. Ontologies are the fundamental building blocks of Semantic Web technologies: they allow concepts to be shared and reused in a standardized way, so that data gathered from heterogeneous sources receive a common nomenclature and duplicates can be disambiguated easily. In this context, the right use of ontology capabilities would further strengthen the Semantic Web's presence in many web-based applications such as e-learning, virtual communities, social media sites, healthcare, agriculture, etc. In this paper, we give a comprehensive review of the use of the Semantic Web in healthcare, virtual communities, and other information retrieval projects. As the role of the Semantic Web becomes pervasive across domains, demand for it in healthcare, virtual communities, and information retrieval has gained considerable momentum in recent years. To obtain the correct sense of the words or terms in textual content, it is necessary to apply the right ontology to resolve ambiguity and avoid deviations in the concepts. In this review paper, we highlight the information needed for a good understanding of the Semantic Web and its ontological frameworks.
Preprint
Full-text available
Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification. In this paper, we present and evaluate WikiGUM, a fully wikified dataset, covering all mentions of named entities, including their non-named and pronominal mentions, as well as mentions nested within other mentions. The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date, leading to poor performance by a pretrained SOTA system in our evaluation. The availability of a variety of other annotations for the same data also enables further research on entities in context.
Chapter
The fourth edition of the Open Knowledge Extraction Challenge took place at the 15th Extended Semantic Web Conference in 2018. The aim of the challenge was to bring together researchers and practitioners from academia as well as industry to compete in pushing the state of the art in knowledge extraction from text for the Semantic Web further. This year, the challenge reused two tasks from the former challenge and defined two new tasks. Thus, the challenge consisted of tasks such as Named Entity Identification, Named Entity Disambiguation and Linking as well as Relation Extraction. To ensure an objective evaluation of the performance of participating systems, the challenge ran on a version of the FAIR benchmarking platform GERBIL integrated in the HOBBIT platform. Performance was measured on manually curated gold standard datasets with Precision, Recall, F1-measure and the runtime of participating systems.
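As a reminder of how the reported measures relate to the gold standard, the snippet below computes micro precision, recall and F1 over sets of (mention, link) pairs. The data are invented and the challenge's actual matching rules (e.g. how partially overlapping spans are counted) are more involved, so this shows only the basic computation.

    # Basic micro precision / recall / F1 over annotation pairs (invented
    # data; the challenge's matching rules are more detailed).
    gold = {("Florence May Harding", "dbr:Florence_May_Harding"),
            ("Sydney", "dbr:Sydney")}
    pred = {("Florence May Harding", "dbr:Florence_May_Harding"),
            ("national school", "dbr:National_school")}

    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")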
Conference Paper
Full-text available
The Open Knowledge Extraction (OKE) challenge is aimed at promoting research in the automatic extraction of structured content from textual data and its representation and publication as Linked Data. We designed two extraction tasks: (1) Entity Recognition, Linking and Typing and (2) Class Induction and entity typing. The challenge saw the participation of four systems: CETUS-FOX and FRED participating in both tasks, Adel participating in Task 1 and OAK@Sheffield participating in Task 2. In this paper we describe the OKE challenge, the tasks, the datasets used for training and evaluating the systems, the evaluation method, and the results obtained.
Conference Paper
Full-text available
The need to bridge between the unstructured data on the Document Web and the structured data on the Web of Data has led to the development of a considerable number of annotation tools. However, these tools are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
Article
Full-text available
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions-the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Conference Paper
Full-text available
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the result of a developer study.
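To make the NIF idea concrete, the snippet below builds a tiny NIF 2.0 annotation with rdflib: a context resource holding the text, one substring mention identified by an offset-based fragment URI, and a link to a DBpedia entity via itsrdf:taIdentRef. It is a hand-made minimal example (the document URI is hypothetical), not the output of any particular NIF tool.

    # Minimal NIF 2.0 annotation built with rdflib (hand-made illustration;
    # the document base URI is hypothetical).
    from rdflib import Graph, Namespace, URIRef, Literal, RDF
    from rdflib.namespace import XSD

    NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
    ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

    text = "Berlin is the capital of Germany."
    base = "http://example.org/doc1#char="      # offset-based fragment URIs

    g = Graph()
    g.bind("nif", NIF)
    g.bind("itsrdf", ITSRDF)

    context = URIRef(base + f"0,{len(text)}")
    g.add((context, RDF.type, NIF.Context))
    g.add((context, NIF.isString, Literal(text)))

    mention = URIRef(base + "0,6")              # the substring "Berlin"
    g.add((mention, RDF.type, NIF.Phrase))
    g.add((mention, NIF.referenceContext, context))
    g.add((mention, NIF.anchorOf, Literal("Berlin")))
    g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
    g.add((mention, NIF.endIndex, Literal(6, datatype=XSD.nonNegativeInteger)))
    g.add((mention, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Berlin")))

    print(g.serialize(format="turtle"))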
Article
Full-text available
The term "Linked Data" refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions-the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Conference Paper
Full-text available
In this paper we introduce the DOLCE upper level ontology, the first module of a Foundational Ontologies Library being developed within the WonderWeb project. DOLCE is presented here in an intuitive way; the reader should refer to the project deliverable for a detailed axiomatization. A comparison with WordNet's top-level taxonomy of nouns is also provided, which shows how DOLCE, used in addition to the OntoClean methodology, helps isolate and understand some of WordNet's major semantic limitations. We suggest that such analysis could hopefully lead to an "ontologically sweetened" WordNet, meant to be conceptually more rigorous, cognitively transparent, and efficiently exploitable in several applications.
Conference Paper
Full-text available
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent.
Conference Paper
Full-text available
Ontology learning is the process of acquiring (constructing or integrating) an ontology (semi-) automatically. Being a knowledge acquisition task, it is a complex activity, which becomes even more complex in the context of the BOEMIE project, due to the management of multimedia resources and the multi-modal semantic interpretation that they require. The purpose of this chapter is to present a survey of the most relevant methods, techniques and tools used for the task of ontology learning. Adopting a practical perspective, an overview of the main activities involved in ontology learning is presented. This breakdown of the learning process is used as a basis for the comparative analysis of existing tools and approaches. The comparison is done along dimensions that emphasize the particular interests of the BOEMIE project. In this context, ontology learning in BOEMIE is treated and compared to the state of the art, explaining how BOEMIE addresses problems observed in existing systems and contributes to issues that are not frequently considered by existing approaches.
Chapter
Complexity and emergence are introduced here in relation to the self-organization of systems into levels of reality. Evolvability, defined as the ability to evolve, is the projected way to confront and surpass the successive levels of complexity. Polystochastic models allow refocusing from adaptable to evolvable, from a low-dimensional to a higher-dimensional insight. Significant concepts for evolvability, such as level of reality, circularity, semantic closure, functional circle, circular schema and integrative closure, are presented. The correlation with the organic computing and autonomic computing research areas is highlighted.
Conference Paper
In this paper we present a system for the 2016 edition of the Open Knowledge Extraction (OKE) Challenge. The OKE challenge promotes research in the automatic extraction of structured content from textual data and its representation and publication as Linked Data. The proposed system addresses the second task of the challenge, namely "Class Induction and entity typing for Vocabulary and Knowledge Base enrichment", and combines state-of-the-art lexically-based Natural Language Processing (NLP) techniques with lexical and semantic knowledge bases to first extract hypernyms from definitional sentences and then select, from those available in the DOLCE foundational ontology, the most suitable class for the extracted hypernyms.
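The two steps described above (extracting a hypernym from a definitional sentence, then choosing a suitable class) can be illustrated with a deliberately crude sketch. The pattern matching and the tiny hypernym-to-DOLCE lookup table below are invented stand-ins for the lexically-based NLP techniques and knowledge bases the authors actually use.

    # Crude illustration of the two steps: (1) pull a hypernym out of an
    # "X is a/an ... Y" definition, (2) map it to a DOLCE class. The pattern
    # and the lookup table are invented stand-ins, not the authors' pipeline.
    import re

    STOP = {"from", "of", "in", "that", "who", "which", "and", "created"}

    HYPERNYM_TO_DOLCE = {        # hypothetical, hand-made mapping
        "villain": "dul:Person",
        "city": "dul:Place",
        "band": "dul:Organization",
    }

    def extract_hypernym(definition: str):
        # Take the phrase after "is a/an" and return its last token before a
        # stop word; real systems use proper parsing instead of this heuristic.
        m = re.search(r"\bis an? (.+)", definition.lower().rstrip("."))
        if not m:
            return None
        head = []
        for token in m.group(1).split():
            if token in STOP:
                break
            head.append(token)
        return head[-1] if head else None

    sentence = "Brian Banner is a fictional villain from the Marvel Comics Universe."
    hypernym = extract_hypernym(sentence)
    print(hypernym, "->", HYPERNYM_TO_DOLCE.get(hypernym, "dul:Entity"))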
Conference Paper
Numerous entity linking systems address the entity recognition problem by using off-the-shelf NER systems. It is, however, a difficult task to select which specific model to use for these systems, since it requires judging the level of similarity between the datasets that have been used to train the models and the dataset at hand in which we aim to properly recognize entities. In this paper, we present the newest version of ADEL, our adaptive entity recognition and linking framework, where we experiment with a hybrid approach that mixes a model combination method to improve the recognition level with a filter over the types to increase the efficiency of the linking step. We obtain promising results when performing a 4-fold cross validation experiment on the OKE 2016 challenge training dataset. We also demonstrate that we achieve better results than in our previous participation on the OKE 2015 test set. We finally report the results of ADEL on the OKE 2016 test set and present an error analysis highlighting the main difficulties of this challenge.
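The model-combination-plus-type-filter idea can be pictured as taking the union of the spans proposed by several recognizers and discarding mentions whose predicted type is outside the types admitted by the task. The sketch below is a schematic reconstruction with invented model outputs and an invented type set; it is not ADEL's code.

    # Schematic sketch of combining two NER models' outputs and filtering by
    # the entity types admitted by the task (invented outputs; not ADEL).
    ALLOWED_TYPES = {"Person", "Place", "Organization", "Role"}

    model_a = [("Barack Obama", 0, 12, "Person"), ("ACL", 30, 33, "Organization")]
    model_b = [("Barack Obama", 0, 12, "Person"), ("yesterday", 40, 49, "Date")]

    def combine(*outputs):
        seen, merged = set(), []
        for surface, start, end, etype in (m for out in outputs for m in out):
            if (start, end) in seen or etype not in ALLOWED_TYPES:
                continue                 # drop duplicate spans and filtered types
            seen.add((start, end))
            merged.append((surface, start, end, etype))
        return merged

    print(combine(model_a, model_b))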
Conference Paper
The automatic extraction of entities and their types from text, coupled with entity linking to LOD datasets, are fundamental challenges for the evolution of the Semantic Web. In this paper, we describe an approach to automatically process natural language definitions to (a) extract entity types and (b) align those types to the DOLCE+DUL ontology. We propose SPARQL patterns based on recurring dependency representations between entities and their candidate types. For the alignment subtask, we essentially rely on a pipeline of strategies that exploit the DBpedia knowledge base and we discuss some limitations of DBpedia in this context.
Conference Paper
In this paper we present the WESTLAB system, the winner of the 2016 OKE challenge Task 1. Our approach combines the output of a semantic annotator with the output of a named entity recognizer, and applies some heuristics for merging and filtering the detected mentions. The approach also applies a collective disambiguation method that relies on all the previously linked entities to choose between multiple candidate entities for a given mention. Using this approach, we greatly improve the performance of all the semantic annotators that are used as baselines in our experiments and also outperform the best system of the OKE Challenge 2015.
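The collective disambiguation step (reusing all previously linked entities to choose between candidates for a new mention) can be pictured as a coherence vote: prefer the candidate most related to what has already been linked in the document. The relatedness table and candidates below are invented purely to show the control flow; this is not WESTLAB's method.

    # Toy illustration of collective disambiguation: prefer the candidate most
    # related to the entities already linked in the document (invented data).
    already_linked = {"dbr:Apple_Inc.", "dbr:IPhone"}

    # Hypothetical relatedness scores between context entities and candidates.
    relatedness = {
        ("dbr:Apple_Inc.", "dbr:Steve_Jobs"): 0.9,
        ("dbr:IPhone", "dbr:Steve_Jobs"): 0.8,
        ("dbr:Apple_Inc.", "dbr:Steve_Jobs_(film)"): 0.3,
        ("dbr:IPhone", "dbr:Steve_Jobs_(film)"): 0.1,
    }

    def coherence(candidate: str) -> float:
        return sum(relatedness.get((ctx, candidate), 0.0) for ctx in already_linked)

    candidates_for_mention = ["dbr:Steve_Jobs", "dbr:Steve_Jobs_(film)"]
    print(max(candidates_for_mention, key=coherence))   # dbr:Steve_Jobs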
Conference Paper
The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, the extraction of knowledge on real-world entities is indispensable to populate knowledge bases on the Web of Data. Here, we focus on the recognition of types for entities to populate knowledge bases and enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven type pattern extraction from natural-language corpora based on grammar rules, (ii) an analysis of input text to extract types and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches for the third step, using the YAGO ontology as well as the FOX entity recognition tool.
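Step (iii), mapping the extracted type evidence onto DOLCE+DnS Ultra Lite classes, can be pictured as matching the extracted type string against the labels of a small set of target classes. The snippet below uses naive string similarity over an invented class subset as a stand-in; CETUS's actual mapping goes through the YAGO ontology or the FOX tool.

    # Naive stand-in for mapping extracted type evidence to DOLCE+DnS Ultra
    # Lite classes by label similarity (not CETUS's YAGO/FOX-based mapping).
    import difflib

    DUL_CLASSES = {                     # small illustrative subset
        "Person": "dul:Person",
        "Organization": "dul:Organization",
        "Place": "dul:Place",
        "Event": "dul:Event",
        "Role": "dul:Role",
    }

    def map_to_dul(type_evidence: str, cutoff: float = 0.6) -> str:
        match = difflib.get_close_matches(type_evidence.title(),
                                          DUL_CLASSES.keys(), n=1, cutoff=cutoff)
        return DUL_CLASSES[match[0]] if match else "dul:Entity"   # fallback

    for evidence in ["organisation", "place", "politician"]:
        print(evidence, "->", map_to_dul(evidence))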
Article
The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate. Data sources include audio and image data in addition to pure text, and Arabic and Chinese in addition to English. The effort involves defining the research tasks in detail, collecting and annotating data needed for training, development, and evaluation, and supporting the research with evaluation tools and research workshops. This program began with a pilot study in 1999. The next evaluation is scheduled for September 2004.
Article
The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of Data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia.
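Dereferencing a DBpedia identifier into RDF is easy to try out. The snippet below fetches and parses the Turtle description of one resource with rdflib and prints a few of its statements; the chosen resource is just an example, and working network access plus DBpedia's usual content negotiation are assumed.

    # Dereferencing a DBpedia entity into RDF with rdflib (assumes network
    # access; DBpedia serves a Turtle description of each resource).
    from rdflib import Graph, URIRef

    resource = URIRef("http://dbpedia.org/resource/Berlin")
    g = Graph()
    g.parse("http://dbpedia.org/data/Berlin.ttl", format="turtle")

    # Print a few (predicate, object) pairs describing the entity.
    for predicate, obj in list(g.predicate_objects(subject=resource))[:10]:
        print(predicate, obj)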
Conference Paper
We have recently completed the sixth in a series of "Message Understanding Conferences" which are designed to promote and evaluate research in information extraction. MUC-6 introduced several innovations over prior MUCs, most notably in the range of different tasks for which evaluations were conducted. We describe some of the motivations for the new format and briefly discuss some of the results of the evaluations.
Article
We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
Article
We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
The Semantic Web: ESWC Challenges
  • H Sack
  • S Dietze
  • A Tordai
  • C Lange
Text2onto: A framework for ontology learning and data-driven change discovery
  • P Cimiano
  • J Völker
P. Cimiano and J. Völker. Text2onto: A framework for ontology learning and data-driven change discovery. In Proceedings of the 10th International Conference on Natural Language Processing and Information Systems, NLDB'05, pages 227-238, Berlin, Heidelberg, 2005. Springer-Verlag.
Open information extraction: The second generation
  • O Etzioni
  • A Fader
  • J Christensen
  • S Soderland
O. Etzioni, A. Fader, J. Christensen, S. Soderland, and Mausam. Open information extraction: The second generation. In IJCAI, pages 3-10. IJCAI/AAAI, 2011.
Patty: A taxonomy of relational patterns with semantic types
  • N Nakashole
  • G Weikum
  • F Suchanek
N. Nakashole, G. Weikum, and F. Suchanek. Patty: A taxonomy of relational patterns with semantic types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12, pages 1135-1145, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
Open knowledge extraction challenge
  • A Nuzzolese
  • A Gentile
  • V Presutti
  • A Gangemi
  • D Garigliotti
  • R Navigli
A. Nuzzolese, A. Gentile, V. Presutti, A. Gangemi, D. Garigliotti, and R. Navigli. Open knowledge extraction challenge. In F. Gandon, E. Cabrio, M. Stankovic, and A. Zimmermann, editors, Semantic Web Evaluation Challenges, volume 548 of Communications in Computer and Information Science, pages 3-15. Springer International Publishing, 2015.
Ontology population and enrichment: State of the art
  • G Petasis
  • V Karkaletsis
  • G Paliouras
  • A Krithara
  • E Zavitsanos
G. Petasis, V. Karkaletsis, G. Paliouras, A. Krithara, and E. Zavitsanos. Ontology population and enrichment: State of the art. In Knowledge-Driven Multimedia Information Extraction and Ontology Evolution, volume 6050 of Lecture Notes in Computer Science, pages 134-166. Springer, 2011.