Article

The Semantic Web Challenge, 2009

Abstract

Navigational features have been largely recognized as fundamental for graph database query languages. This fact has motivated several authors to propose RDF query languages with navigational capabilities. In this paper, we propose the query language ...

... Using reasoners, semantic web applications are able to do reasoning on semantic web data. However, the amount of data in semantic web supporting notations has already grown to vast amounts [3], and the reasoning of these large data sets is one of the challenges facing semantic web reasoners [4]. ...
... Tests show us that even a computer with 8 GB memory cannot process data sets with tens of millions of triples. Given that there are already data sets with billions of triples [3], it is clear that there is a scalability problem to be solved with the reasoning of semantic web data. ...
Article
Reasoning is a vital ability for semantic web applications since they aim to understand and interpret the data on the World Wide Web. However, reasoning of large data sets is one of the challenges facing semantic web applications. In this paper, we present new approaches for scalable Resource Description Framework Schema (RDFS) reasoning. Our RDFS specific term-based partitioning algorithm determines required schema elements for each data partition while eliminating the data partitions that will not produce any inferences. With the two-level partitioning approach, we are able to carry out reasoning with limited resources. In our hybrid approach, we integrate two previously mentioned methods to benefit from the advantages of both. In the experimental tests we achieve linear speedups for reasoning times with the proposed hybrid approach. These algorithms and methods presented in the paper enable RDFS-level reasoning of large data sets with limited resources, and they together build up a scalable distributed reasoning approach.
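To make "RDFS-level reasoning" concrete, here is a minimal sketch of what a reasoner derives from a small set of triples. It is not the partitioning or hybrid method described in the abstract; it is a naive single-machine baseline that applies two standard RDFS entailment rules (rdfs9 and rdfs11) until no new triple appears. The function name `rdfs_closure` and the `ex:` terms are hypothetical, chosen only for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Naive fix-point over two RDFS rules (illustrative baseline only)."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                # rdfs11: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s2 type s), (s subClassOf o) => (s2 type o)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= closure:  # nothing new was derived: closure reached
            return closure
        closure |= new


example = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
}
inferred = rdfs_closure(example) - example
```

Every inference joins two triples on a shared term, and the nested loop is quadratic in the number of triples, which hints at why data sets with billions of triples need the kind of partitioning the abstract proposes.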
... Therefore, we needed a collection of documents that would be a realistically large approximation to the amount of RDF data available 'live' on the Web and that contained relevant information for the queries, while simultaneously being of a size manageable with the resources of a research group. We chose the 'Billion Triples Challenge' 2009 data set, created for the Semantic Web Challenge [3] in 2009. The data set was created by crawling data from the web as well as combining the indexes from several semantic web search engines. ...
Article
The search for entities is the most common search type on the web besides navigational searches. Whereas most common search techniques are based on the textual descriptions of web pages, semantic search approaches exploit the increasing amount of structured data on the Web in the form of annotations to web pages and Linked Data. In many technologies, this structured data can consist of factual assertions about entities in which URIs are used to identify entities and their properties. The hypothesis is that this kind of structured data can improve entity search on the web. In order to test this hypothesis and to consistently progress in this field, a standardized evaluation is necessary. In this work, we discuss an evaluation campaign that specifically targets entity search over Linked Data by means of keyword queries, including both queries that directly mention the entity as well as those that only describe the entities. We also discuss how crowd-sourcing was used to obtain relevance assessments from non-expert web users, the participating systems and the factors that contributed to positive results, and how the competition generalizes results from a previous crowd-sourced entity search evaluation.
... and the Semantic Mediawiki [23] was used to power a number of portal sites, such as the Institute of Applied Informatics and Formal Description Methods (AIFB, aifb.kit.edu) and Tetherless World Constellation (tw.rpi.edu). Meanwhile, many domain-specific semantic web portals have come from the winners of the "challenges of Semantic Web" [24], including the CS AKTive Space [25], Museum Finland [26],[17], Multimedia E-Culture demonstrator [27], HealthFinland [28] and TrialX [29]. Although Semantic Web portal sites are well developed, most of them are too difficult to be imitated by those who are not specialists. ...
Article
One way to overcome the weakness of the Semantic Web and make it more user-friendly is to support displaying, browsing, and semantically querying data. In this research, we propose the Semantic Web Research Community Portal at the Faculty of Information Science and Technology – Universiti Kebangsaan Malaysia (FTSM RC) as a lightweight Semantic Web platform. This platform assists users in managing content and visualizing relevant semantic data by applying meaningful periodically research. In this way it will strengthen the information related to research, publications, departments, organizations, events, and groups of researchers. Moreover, it will streamline the publication process, making it easier for academic staff, support staff, and the faculty itself to publish information about the faculty and its research. In the end, this will provide end users with a better view of the structure of research at the university, allowing them to communicate across faculty and study groups by using the search information.
Chapter
Six datasets have been published under the title of Billion Triple Challenge (BTC) since 2008. Each such dataset contains billions of triples extracted from millions of documents crawled from hundreds of domains. While these datasets were originally motivated by the annual ISWC competition from which they take their name, they would become widely used in other contexts, forming a key resource for a variety of research works concerned with managing and/or analysing diverse, real-world RDF data as found natively on the Web. Given that the last BTC dataset was published in 2014, we prepare and publish a new version – BTC-2019 – containing 2.2 billion quads parsed from 2.6 million documents on 394 pay-level-domains. This paper first motivates the BTC datasets with a survey of research works using these datasets. Next we provide details of how the BTC-2019 crawl was configured. We then present and discuss a variety of statistics that aim to gain insights into the content of BTC-2019. We discuss the hosting of the dataset and the ways in which it can be accessed, remixed and used.
Chapter
This chapter introduces the different types of data sources, from unstructured to structured, that will be used in the remainder of the book. Specifically, we discuss the web, Wikipedia, and knowledge bases. We further introduce standard datasets and provide pointers to tools and resources.
Article
Ranking - the algorithmic decision on how relevant an information artifact is for a given information need and the sorting of artifacts by their concluded relevancy - is an integral part of every search engine. In this book we investigate how structured Web data can be leveraged for ranking with the goal to improve the effectiveness of search. We propose new solutions for ranking using on-the-fly data integration and experimentally analyze and evaluate them against the latest baselines. © 2014 Karlsruher Institut fur Technologie (KIT). All rights reserved.
Article
The paper explores web reasoning based on Resolution with Partial Intersection and Truncation (PT-resolution). Instead of the traditional reasoning mechanism which is based on back-tracking and pattern matching, PT-resolution reasons based on set calculations. It prevents a derivation on a finite logic program from infinite looping and therefore, is ideal for web reasoning.
Article
RDFS reasoning is carried out via joint terms of triples; accordingly, a distributed reasoning approach should bring together triples that have terms in common. To achieve this, term-based partitioning distributes triples to partitions based on the terms they include. However, the skewed distribution of Semantic Web data results in unbalanced load distribution. A single peer should be able to handle even the largest partition, and this requirement limits scalability. This approach also suffers from data replication, since a triple is sent to multiple partitions. In this paper, we propose a two-step method to overcome the above limitations. Our RDFS-specific term-based partitioning algorithm applies a selective distribution policy and distributes triples with minimum replication. Our schema-sensitive processing approach eliminates non-productive partitions and enables processing of a partition regardless of its size. The resulting partitions reach full closure without repeating the global schema and without the fix-point iteration suggested by previous studies.
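The plain term-based partitioning that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' selective algorithm: each triple is sent to the partition of its subject and the partition of its object, partitions are chosen with a CRC32 hash, and the names `partition_of`, `term_partition` and `replication_factor` are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_of(term, n_partitions):
    # Stable hash: Python's built-in hash() is salted per process.
    return zlib.crc32(term.encode("utf-8")) % n_partitions


def term_partition(triples, n_partitions):
    """Send each triple to the partitions of its subject and object terms."""
    parts = defaultdict(set)
    for s, p, o in triples:
        for term in (s, o):
            parts[partition_of(term, n_partitions)].add((s, p, o))
    return parts


def replication_factor(parts, n_triples):
    """Average number of copies stored per input triple (assumes n_triples > 0)."""
    return sum(len(p) for p in parts.values()) / n_triples


triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
]
parts = term_partition(triples, 4)
rf = replication_factor(parts, len(triples))
```

Any two triples that share a subject or object term meet in that term's partition, which is what allows joins to run locally. The sketch also exposes both drawbacks named in the abstract: a frequent term such as `ex:Student` pulls every triple mentioning it into one partition (skew), and each triple may be stored up to twice (replication).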
Article
An increasing amount of structured data on the Web has attracted industry attention and renewed research interest in what is collectively referred to as semantic search. These solutions exploit the explicit semantics captured in structured data such as RDF for enhancing document representation and retrieval, or for finding answers by directly searching over the data. These data have been used for different tasks and a wide range of corresponding semantic search solutions have been proposed in the past. However, it has been widely recognized that a standardized setting to evaluate and analyze the current state-of-the-art in semantic search is needed to monitor and stimulate further progress in the field. In this paper, we present an evaluation framework for semantic search, analyze the framework with regard to repeatability and reliability, and report on our experiences on applying it in the Semantic Search Challenge 2010 and 2011.
Article
Electronic government (e-government) procurement is one of the most important activities in China. The paper consists of three parts. First, the paper introduces the current situation of China's e-government procurement system, which includes the overall technical level, the application level, and the existing problems. Based on the problems raised in the first part, the paper argues that a better solution is to apply business component theory and a business component framework in the construction of e-government procurement, as this can solve the problems that block the development of e-government procurement in a more convenient way. The paper constructs the Business Component (BC) framework for e-government procurement, analyzes the advantages of the BC framework and describes a methodology for the application of BCs in e-government procurement. The paper builds a semantic model for workflow using the ontology modeling tool Protégé, uses an ontology model database to store and manage the workflow model, and builds a permission-based and user-involved workflow. Finally, the paper takes public bidding, a main e-procurement method in China, as an example and uses Appfuse and Osworkflow to demonstrate the validity of the framework and methodology.
Conference Paper
One of the main shortcomings of Semantic Web technologies is that there are few user-friendly ways for displaying, browsing and querying semantic data. In fact, the lack of effective interfaces for end users significantly hinders further adoption of the Semantic Web. In this paper, we propose the Semantic Web Portal (SWP) as a light-weight platform that unifies off-the-shelf Semantic Web tools helping domain users organize, browse and visualize relevant semantic data in a meaningful manner. The proposed SWP has been demonstrated, tested and evaluated in several different use cases, such as a middle-sized research group portal, a government dataset catalog portal, a patient health center portal and a Linked Open Data portal for bio-chemical data. SWP can be easily deployed into any middle-sized domain and is also useful to display and visualize Linked Open Data bubbles. Keywords: Semantic Web data, browsing, visualization
Article
The Web of Data (WoD) is an Internet-based network of data resources and their relations. It has recently taken flight and combines over a hundred interlinked data sources with more than 15 billion edges. A consequence of this recent success is that a paradigm shift has taken place: up to now the Web of Data could be studied, searched and maintained like a classical database; nowadays it has turned into a Complex System and needs to be studied as such. In this paper, we introduce the Web of Data as a challenging object of study and provide initial results on two network scales: the pure data layer, and the global connection between groups of data items. In this analysis, we show that the "official" abstract representation of the WoD does not fit the real distribution we derive from the lower scale. As interesting as these results are, bigger challenges for analysis await in the form of the highly dynamic character of the WoD, and the typed and implicit character of the edges, which is, to the best of our knowledge, hitherto unstudied.