Semantic Web Challenges: 4th SemWebEval Challenge at ESWC 2017, Portoroz, Slovenia, May 28 - June 1, 2017, Revised Selected Papers
Abstract
This book constitutes the thoroughly refereed post-conference proceedings of the 4th edition of the Semantic Web Evaluation Challenge, SemWebEval 2017, co-located with the 14th European Semantic Web Conference, held in Portoroz, Slovenia, in May/June 2017.
This book includes descriptions of all methods and tools that competed at SemWebEval 2017, together with a detailed description of the tasks, evaluation procedures, and datasets. The 11 revised full papers presented in this volume were carefully reviewed and selected from 21 submissions. The contributions are grouped into the following areas: the Mighty Storage Challenge; the Open Knowledge Extraction Challenge; the Question Answering over Linked Data Challenge; and Semantic Sentiment Analysis.
Chapters (15)
The aim of the Mighty Storage Challenge (MOCHA) at ESWC 2017 was to test the performance of solutions for SPARQL processing in aspects that are relevant for modern applications. These include ingesting data, answering queries on large datasets, and serving as a backend for applications driven by Linked Data. The challenge tested the systems against data derived from real applications and with realistic loads, with an emphasis on dealing with data in the form of streams or updates.
Native RDF (http://www.w3.org/RDF/) stores have made enormous progress in closing the performance gap with relational database management systems (RDBMS). This remaining gap, however, still prevents the adoption of RDF stores in large-scale enterprise applications. We address this problem with our native RDF store QUAD and its fundamental design principles. QUAD is based on a vector database schema for quadruples and is realized with various index data structures. It also comprises approaches to optimize the SPARQL query execution plan using heuristic transformations. In this short paper, we briefly introduce QUAD and sketch the tasks of the Mighty Storage Challenge in which we will participate to benchmark its current performance capabilities.
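The index-based design behind such quad stores can be illustrated with a minimal sketch. This is a hypothetical toy, not QUAD's actual schema: each quad is kept in several hash indexes so that different access patterns can be answered by direct lookup rather than a full scan.

```python
from collections import defaultdict

class MiniQuadStore:
    """Toy quad store: every quad (s, p, o, g) is stored in several
    permutation indexes so common triple patterns avoid full scans."""

    def __init__(self):
        self.all_quads = set()
        self.by_s = defaultdict(set)   # subject index
        self.by_p = defaultdict(set)   # predicate index
        self.by_o = defaultdict(set)   # object index

    def add(self, s, p, o, g):
        quad = (s, p, o, g)
        self.all_quads.add(quad)
        self.by_s[s].add(quad)
        self.by_p[p].add(quad)
        self.by_o[o].add(quad)

    def match(self, s=None, p=None, o=None, g=None):
        # Pick an available index as the candidate set, then filter
        # the remaining bound positions.
        if s is not None:
            candidates = self.by_s[s]
        elif o is not None:
            candidates = self.by_o[o]
        elif p is not None:
            candidates = self.by_p[p]
        else:
            candidates = self.all_quads
        return [q for q in candidates
                if (p is None or q[1] == p)
                and (o is None or q[2] == o)
                and (g is None or q[3] == g)]

store = MiniQuadStore()
store.add("ex:alice", "foaf:knows", "ex:bob", "ex:g1")
store.add("ex:alice", "foaf:name", '"Alice"', "ex:g1")
print(len(store.match(s="ex:alice")))  # 2
```

Production stores typically maintain many more index permutations (e.g. SPOG, POSG, OSPG) and choose among them with a cost-based or heuristic query planner, as the abstract describes for QUAD.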
The Mighty Storage Challenge (MOCHA) aims to test the performance of solutions for SPARQL processing, in several aspects relevant for modern Linked Data applications. Virtuoso, by OpenLink Software, is a modern enterprise-grade solution for data access, integration, and relational database management, which provides a scalable RDF Quad Store. In this paper, we present a short overview of Virtuoso with a focus on RDF triple storage and SPARQL query execution. Furthermore, we showcase the final results of the MOCHA 2017 challenge and its tasks, along with a comparison between the performance of our system and the other participating systems.
The Open Knowledge Extraction Challenge invites researchers and practitioners from academia as well as industry to compete with the aim of pushing forward the state of the art in knowledge extraction from text for the Semantic Web. The challenge has the ambition to provide a reference framework for research in this field by redefining a number of tasks typical of information and knowledge extraction in light of Semantic Web requirements, and it has the goal of testing the performance of knowledge extraction systems. This year, the challenge is in its third round and consists of three tasks, which include named entity identification, typing, and disambiguation by linking to a knowledge base, depending on the task. The challenge makes use of small gold standard datasets consisting of manually curated documents and large silver standard datasets consisting of automatically generated synthetic documents. The performance measure of a participating system is twofold, based on (1) precision, recall, and F1-measure, and (2) precision, recall, and F1-measure with respect to the runtime of the system.
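The precision/recall/F1 evaluation described above can be illustrated with a short, generic sketch. This is not the challenge's official scoring code; the example annotations are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Micro scores over sets of extracted annotations
    (here, (mention, type) pairs)."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1(
    gold={("Rome", "Place"), ("Caesar", "Person")},
    predicted={("Rome", "Place"), ("Senate", "Organization")})
print(p, r, f)  # 0.5 0.5 0.5
```

The challenge's second measure additionally relates these scores to system runtime, rewarding systems that are both accurate and fast.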
In this paper we report on the participation of ADEL in the OKE 2017 challenge. ADEL is an adaptive entity recognition and linking framework that combines various extraction methods to improve recognition and implements an efficient knowledge base indexing process to increase the performance of the linking step. We detail how we deal with fine-grained entity types, either generic (e.g. Activity, Competition, Animal for Task 2) or domain specific (e.g. MusicArtist, SignalGroup, MusicalWork for Task 3). We also show how ADEL can flexibly link entities to different knowledge bases (DBpedia and MusicBrainz). We obtain promising results on the OKE 2017 challenge test dataset for the first three tasks.
The past years have seen a growing amount of research on question answering (QA) over Semantic Web data, shaping an interaction paradigm that allows end users to profit from the expressive power of Semantic Web standards while, at the same time, hiding their complexity behind an intuitive and easy-to-use interface. On the other hand, the growing amount of data has led to a heterogeneous data landscape where QA systems struggle to keep up with the volume, variety and veracity of the underlying knowledge.
In this paper we present a knowledge base question answering system for participation in Task 4 of the QALD-7 shared task. Our system is an end-to-end neural architecture for constructing a structural semantic representation of a natural language question. We define semantic representations as graphs that are generated step-wise and can be translated into knowledge base queries to retrieve answers. We use a convolutional neural network (CNN) model to learn vector encodings for the questions and the semantic graphs and use it to select the best matching graph for the input question. We show on two different datasets that our system is able to successfully generalize to new data.
We describe and present a new Question Answering (QA) component that can be easily used by the QA research community. It can be used to answer questions over DBpedia and Wikidata. The language support over DBpedia is restricted to English, while over Wikidata it can answer questions in four different languages, namely English, French, German, and Italian. Moreover, it supports both full natural language queries and keyword queries. We describe the interfaces to access and reuse it and the services it can be combined with. Finally, we show the evaluation results we achieved on the QALD-7 benchmark.
While SPARQL is a powerful way of accessing linked data, using natural language is more intuitive for most users. A few question answering systems already exist for English, but none focus specifically on French. Our system allows a user to query the DBpedia knowledge base by asking questions in French, separated into specific types, which are automatically translated into SPARQL queries. To our knowledge, this is the first French-based question answering system in the QALD competition.
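The general idea of mapping typed natural language questions to SPARQL can be sketched as follows. This is a hypothetical, pattern-based toy in the spirit of the abstract, not the system's actual implementation; the two French templates and DBpedia properties are assumptions for illustration.

```python
import re

# Hypothetical question-type table: each French template maps to a
# SPARQL template over DBpedia. Real systems use far richer grammars.
PATTERNS = [
    (re.compile(r"Qui a écrit (.+) \?"),
     "SELECT ?auteur WHERE {{ <{0}> dbo:author ?auteur }}"),
    (re.compile(r"Où est né (.+) \?"),
     "SELECT ?lieu WHERE {{ <{0}> dbo:birthPlace ?lieu }}"),
]

def to_dbpedia_uri(label):
    """Naive label-to-resource mapping (no disambiguation)."""
    return "http://dbpedia.org/resource/" + label.replace(" ", "_")

def translate(question):
    """Return a SPARQL query for a recognized question type, else None."""
    for pattern, template in PATTERNS:
        m = pattern.match(question)
        if m:
            return template.format(to_dbpedia_uri(m.group(1)))
    return None

print(translate("Où est né Victor Hugo ?"))
```

The question type selects the query shape; entity recognition and URI mapping then fill in the placeholders before the query is sent to the DBpedia endpoint.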
Sentiment analysis is a widely studied field in both academia and industry, and there are different approaches for addressing sentiment analysis related tasks. Sentiment analysis engines implement approaches spanning from lexicon-based techniques to machine learning and syntactical rule analysis. Such systems are already evaluated in international research challenges. However, semantic sentiment analysis approaches, which take into account or rely on large semantic knowledge bases and implement Semantic Web best practices, are not specifically evaluated and compared by other international challenges. Such approaches may potentially deliver higher performance, since they are also able to analyze the implicit semantic features associated with natural language concepts. In this paper, we present the fourth edition of the Semantic Sentiment Analysis Challenge, in which systems implementing or relying on semantic features are evaluated in a competition involving large test sets and different sentiment tasks. Systems merely based on syntax/word-count or purely lexicon-based approaches have been excluded from the evaluation. We then present the results of the evaluation for each task and announce the winner of the most innovative approach award, which combines several knowledge bases for addressing the sentiment analysis task.
Sentiment analysis in the financial domain is quickly becoming a prominent research topic, as it provides a powerful method to predict market dynamics. In this work, we leverage advances in the Semantic Web area to develop a fine-grained approach to predicting real-valued sentiment scores. We compare several classifiers trained on two different datasets. The first dataset consists of microblog messages focusing on stock market events, while the second consists of financially relevant news headlines crawled from different sources on the Internet. We test our approach using several feature sets, including lexical features, semantic features, and a combination of the two. Experimental results show that the proposed approach achieves an accuracy of more than \(72\%\).
This paper describes the TheShukran Sentiment Analysis system. TheShukran is a social network micro-blogging service that allows users to post photos or videos along with descriptions of their daily life activities. This social network rapidly gained a large number of users. It gives people from different cultures and countries the possibility to share their stories, ideas, opinions, and news from their real life in different languages, and makes cultural diversity the center of the relationships between its users. Sentiment analysis aims to extract the opinion of the public about some topic by processing text data. One of its tasks, polarity detection, aims at categorizing the elements in a dataset (sentences, posts, etc.) into classes such as positive, negative, and neutral. In the system we propose, which represents the sentiment analysis core engine of the TheShukran social network, we detect the original language of users' posts, translate them into English, and evaluate their sentiment (whether positive, negative, or neutral). We propose the use of a Naive Bayes classifier together with SentiWordNet and SenticNet for the sentiment evaluation. The language detection and translation are performed using TextBlob, a Python library for processing textual data.
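A Naive Bayes polarity classifier of the kind mentioned above can be sketched in a few lines. This is a generic multinomial Naive Bayes toy with invented training posts, not the paper's system, which additionally combines such a classifier with SentiWordNet/SenticNet lexicon scores and TextBlob-based translation.

```python
import math
from collections import Counter

class NaiveBayesPolarity:
    """Tiny multinomial Naive Bayes with Laplace smoothing for
    positive/negative polarity classification of short posts."""

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, text):
        def log_prob(label):
            c = self.counts[label]
            total = sum(c.values()) + len(self.vocab)
            score = math.log(self.priors[label])
            for w in text.lower().split():
                score += math.log((c[w] + 1) / total)  # add-one smoothing
            return score
        return max(self.counts, key=log_prob)

clf = NaiveBayesPolarity().fit(
    ["great photo love it", "awful boring video",
     "love this story", "boring and awful post"],
    ["pos", "neg", "pos", "neg"])
print(clf.predict("love this photo"))  # pos
```

In the described pipeline, a non-English post would first be language-detected and translated to English before being scored by the classifier and the sentiment lexicons.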
In the last decade, the focus of the opinion mining field has moved to the detection of "aspect-polarity" pairs rather than limiting approaches to the computation of the overall polarity of a text. In this work, we propose an aspect-based opinion mining system based on the use of semantic resources for the extraction of aspects from a text and for the computation of their polarities. The proposed system participated in the third edition of the Semantic Sentiment Analysis (SSA) challenge, which took place during ESWC 2017, achieving the runner-up place in Task #2 on aspect-based sentiment analysis. Moreover, a further evaluation performed on the SemEval 2015 benchmarks demonstrated the feasibility of the proposed approach.
Multi-domain opinion mining consists in estimating the polarity of a document by exploiting domain-specific information. One of the main issues of the approaches discussed in the literature is their poor capability of being applied to domains that have not been used for building the opinion model. In this paper, we present an approach that exploits the linguistic overlap between domains to build models enabling the estimation of polarities for documents belonging to any other domain. The system implementing this approach was presented at the third edition of the Semantic Sentiment Analysis Challenge, co-located with ESWC 2017. A fuzzy representation of feature polarity supports the modeling of the information uncertainty learned from the training set, integrated with knowledge extracted from two well-known resources used in the opinion mining field, namely Sentic.Net and the General Inquirer. The proposed technique has been validated on a multi-domain dataset, and the results demonstrate the effectiveness of the approach, setting a plausible starting point for future work.
On various social media and commercial platforms, users express their opinions about products in textual form. Automatically extracting a user's polarity (i.e. whether the opinion is positive or negative) can be useful for both actors: the online platform, which can incorporate the feedback to improve its product, as well as the client, who might receive recommendations according to his or her preferences. Different approaches for tackling the problem have been suggested, mainly using syntactic features. The "Challenge on Semantic Sentiment Analysis" aims to go beyond word-level analysis by using semantic information. In this paper we propose a novel approach employing the semantic information of a grammatical unit, the preposition. We try to derive the target of the review from the summary information, which serves as an input to identify the proposition in it. Our implementation relies on the hypothesis that the proposition expressing the target of the summary usually contains the main polarity information.
The role of question answering is central to the fulfillment of the Semantic Web. Recently, several approaches relying on artificial neural networks have been proposed to tackle the problem of question answering over knowledge graphs. Such techniques are however known to be data-hungry, and the creation of training sets requires substantial manual effort. We thus introduce Dbnqa, a comprehensive dataset of 894,499 pairs of questions and SPARQL queries based on templates specifically designed for the DBpedia knowledge base. We show how the method used to generate our dataset can be easily reused for other purposes. We report the successful adoption of Dbnqa in an experimental phase and present how it compares with existing question-answering corpora.
The approach described in this paper explores the use of a semantic structured representation of sentences extracted from texts for multi-domain sentiment analysis purposes. The presented algorithm is built upon a domain-based supervised approach using index-like structures for representing the information extracted from text. The algorithm extracts dependency parse relationships from the sentences contained in a training set. Such relationships are then aggregated in a semantic structure together with both polarity and domain information. This information is exploited to obtain a more fine-grained representation of the learned sentiment information. When the polarity of a new text has to be computed, the text is converted into the same semantic representation, which is used (i) to detect the domain to which the text belongs and then (ii), once the domain is assigned, to extract the polarity from the index-like structure. First experiments performed using the Blitzer dataset for training the system demonstrated the feasibility of the proposed approach.
Multi-domain sentiment analysis consists in estimating the polarity of a given text by exploiting domain-specific information. One of the main issues common to the approaches discussed in the literature is their poor capability of being applied to domains different from those used for building the opinion model. In this paper, we present an approach that exploits the linguistic overlap between domains to build sentiment models supporting polarity inference for documents belonging to any domain. Word embeddings together with a deep learning architecture have been implemented to enable the building of multi-domain sentiment models. The proposed technique is validated following the Dranziera protocol, in order to ease the repeatability of the experiments and the comparison of the results. The outcomes demonstrate the effectiveness of the proposed approach and set a plausible starting point for future work.