Journal of Web Semantics

Published by Elsevier
Online ISSN: 1570-8268
Article
Traditionally, evaluation methods in the field of semantic technologies have focused on the end result of ontology engineering efforts, mainly on evaluating ontologies and their corresponding qualities and characteristics. This focus has led to the development of a whole arsenal of ontology-evaluation techniques that investigate the quality of ontologies as a product. In this paper, we aim to shed light on the process of ontology construction by introducing and applying a set of measures to analyze hidden social dynamics. We argue that, especially for ontologies that are constructed collaboratively, understanding the social processes that led to their construction is critical not only for understanding but, consequently, also for evaluating the ontology. With the work presented in this paper, we aim to expose the texture of collaborative ontology engineering processes that otherwise remains invisible. Using historical change-log data, we unveil qualitative differences and commonalities between different collaborative ontology engineering projects. Explaining and understanding these differences will help us better comprehend the role and importance of social factors in collaborative ontology engineering projects. We hope that our analysis will spur a new line of evaluation techniques that view ontologies not as the static result of deliberations among domain experts, but as the product of a dynamic, collaborative and iterative process that needs to be understood, evaluated and managed in itself. We believe that advances in this direction would help our community expand the existing arsenal of ontology-evaluation techniques towards more holistic approaches.
 
Article
Effective debugging of ontologies is an important prerequisite for their broad application, especially in areas that rely on everyday users to create and maintain knowledge bases, such as the Semantic Web. In such systems, ontologies capture formalized vocabularies of terms shared by their users. However, in many cases users have different local views of the domain, i.e. of the context in which a given term is used. Inappropriate usage of terms, together with the natural complications of formulating and understanding logical descriptions, may result in faulty ontologies. Recent ontology debugging approaches use diagnosis methods to identify the causes of faults. In most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. To identify the best query we propose two query selection strategies: a simple "split-in-half" strategy and an entropy-based strategy. The latter allows knowledge about typical user errors to be exploited to minimize the number of queries. Our evaluation showed that the entropy-based method significantly reduces the number of required queries compared to the "split-in-half" approach. We experimented with different probability distributions of user errors and different qualities of the a priori probabilities. Our measurements demonstrated the superiority of entropy-based query selection even in cases where all fault probabilities are equal, i.e. where no information about typical user errors is available.
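A minimal sketch of the entropy-based idea (not the authors' implementation; the diagnoses, probabilities and queries below are invented): each candidate diagnosis carries a fault probability, each query partitions the diagnoses by the answer they predict, and the query with the lowest expected remaining entropy is put to the oracle first.

    import math

    def entropy(probs):
        """Shannon entropy of a normalized probability distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def expected_entropy(diagnoses, query):
        """Expected entropy over the diagnoses after observing the oracle's
        answer; query[d] is the answer diagnosis d predicts."""
        total = sum(diagnoses.values())
        exp = 0.0
        for answer in (True, False):
            group = [p for d, p in diagnoses.items() if query[d] == answer]
            mass = sum(group)
            if mass > 0:
                exp += (mass / total) * entropy([p / mass for p in group])
        return exp

    # Invented diagnoses with a priori fault probabilities ...
    diagnoses = {"D1": 0.5, "D2": 0.3, "D3": 0.2}
    # ... and candidate queries mapping each diagnosis to its predicted answer.
    queries = {
        "q1": {"D1": True, "D2": False, "D3": False},   # splits mass 0.5/0.5
        "q2": {"D1": True, "D2": True, "D3": False},    # splits mass 0.8/0.2
    }
    best = min(queries, key=lambda q: expected_entropy(diagnoses, queries[q]))
    print("ask", best)   # q1, the most informative query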
 
Article
While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question. (1) What distinctions of user motivations are identified by previous research, and in what ways are the motivations of users amenable to quantitative analysis? (2) To what extent does tagging motivation vary across different social tagging systems? (3) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply these measures to datasets from seven different tagging systems. Our results show that (a) users' motivation for tagging varies not only across, but also within tagging systems, and that (b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for (1) the development of tag-based user interfaces, (2) the analysis of tag semantics and (3) the design of search algorithms for social tagging systems.
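As an illustration of how tagging motivation can be made measurable, the toy function below computes one plausible indicator, the ratio of distinct tags to total tag assignments in a user's posting history; the measures actually used in the paper are defined there, so treat this as a hedged stand-in with invented data.

    def tag_vocabulary_ratio(posts):
        """Ratio of distinct tags to total tag assignments for one user.
        Values near 1 suggest an open, descriptive vocabulary; values
        near 0 suggest a small, reused set, as expected of categorizers.
        `posts` is a list of tag lists, one per tagged resource."""
        assignments = [t for post in posts for t in post]
        return len(set(assignments)) / len(assignments) if assignments else 0.0

    categorizer = [["work", "todo"], ["work"], ["private", "todo"], ["work"]]
    describer = [["python", "rdf", "tutorial"], ["jazz", "live", "berlin"]]
    print(tag_vocabulary_ratio(categorizer))  # 0.5 -> leans categorizer
    print(tag_vocabulary_ratio(describer))    # 1.0 -> leans describer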
 
Article
The power of the Web is enhanced through the network effect produced as resources link to each other with the value determined by Metcalfe's law. In Web 2.0 applications, much of that effect is delivered through social linkages realized via social networks online. Unfortunately, the associated semantics for Web 2.0 applications, delivered through tagging, is generally minimally hierarchical and sparsely linked. The Semantic Web suffers from the opposite problem. Semantic information, delivered through ontologies of varying amounts of expressivity, is linked to other terms (within or between resources) creating a link space in the semantic realm. However, the use of the Semantic Web has yet to fully realize the social schemes that provide the network of users. In this article, we discuss putting these together, with linked semantics coupled to linked social networks, to deliver a much greater effect.
 
Article
Semantic Web technologies must integrate with Web 2.0 services for both to leverage each other's strengths. We argue that the REST-based design methodologies [R.T. Fielding, R.N. Taylor, Principled design of the modern web architecture, ACM Trans. Internet Technol. (TOIT) 2 (2) (2002) 115–150] of the web present the ideal mechanism through which to align the publication of semantic data with the existing web architecture. We present the design and implementation of two solutions that combine REST-based design and RDF [D. Beckett (Ed.), RDF/XML Syntax Specification (Revised), W3C Recommendation, February 10, 2004] data access: one solution for integrating existing web services and one server-side solution for creating RDF REST services. Both of these solutions enable SPARQL [E. Prud’hommeaux, A. Seaborne (Eds.), SPARQL Query Language for RDF, W3C Working Draft, March 26, 2007] to be a unifying data access layer for aligning the Semantic Web and Web 2.0.
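The integration pattern the paper builds on can be sketched with nothing more than an HTTP GET carrying a SPARQL query, in the style of the SPARQL protocol; the endpoint URL, resource and property below are placeholders, not the paper's services.

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical endpoint and resource; the request shape follows the
    # SPARQL protocol: the query travels as a URL parameter of a plain GET.
    ENDPOINT = "http://example.org/sparql"
    QUERY = """SELECT ?title WHERE {
      <http://example.org/resource/42>
          <http://purl.org/dc/elements/1.1/title> ?title .
    }"""

    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        for binding in json.load(resp)["results"]["bindings"]:
            print(binding["title"]["value"])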
 
Article
As web users disseminate more of their personal information on the web, the possibility of these users becoming victims of lateral surveillance and identity theft increases. Therefore, web resources containing this personal information, which we refer to as identity web references, must be found and disambiguated to produce a unary set of web resources which refer to a given person. Such is the scale of the web that forcing web users to monitor their identity web references is not feasible; therefore, automated approaches are required. However, automated approaches require background knowledge about the person whose identity web references are to be disambiguated. In this paper we present a detailed approach to monitoring the web presence of a given individual by obtaining background knowledge from Web 2.0 platforms to support automated disambiguation processes. We present a methodology for generating this background knowledge by exporting data from multiple Web 2.0 platforms as RDF data models and combining these models together for use as seed data. We present two disambiguation techniques: the first uses a semi-supervised machine learning technique known as Self-training and the second uses a graph-based technique known as Random Walks; we explain how the semantics of data supports the intrinsic functionalities of these techniques. We compare the performance of our disambiguation techniques against several baseline measures, including human processing of the same data. We achieve an average precision of 0.935 for Self-training and an average f-measure of 0.705 for Random Walks, in both cases outperforming several baseline measures.
 
Article
Trust is an integral component in many kinds of human interaction, allowing people to act under uncertainty and with the risk of negative consequences. For example, exchanging money for a service, giving access to your property, and choosing between conflicting sources of information all may utilize some form of trust. In computer science, trust is a widely used term whose definition differs among researchers and application areas. Trust is an essential component of the vision for the Semantic Web, where both new problems and new applications of trust are being studied. This paper gives an overview of existing trust research in computer science and the Semantic Web.
 
Article
Web annotation is crucial for providing machine-understandable descriptions of Web resources, and has a number of applications such as discovery, qualification, and adaptation of Web documents. While annotations are often embedded into a Web document, annotations can be associated externally by means of addressing expressions represented with the XPath language. However, creation of external annotation solely with a conventional editor is not easy because annotation authoring involves the maintenance and elaboration of addressing expressions as well as annotation contents. In addition, there has been little empirical study of robust pointing by XPath expressions, in spite of the increasing prevalence of the XPath language for use in emerging content adaptation systems. This paper proposes a classification of annotation tool design, taking account of differences in authoring methods and roles of annotation. On the basis of the classification, tools for generating external annotations are briefly explained along with applications of Web document adaptation for small-screen devices and portal site development. Robustness of the addressing expressions is then investigated, and practical implications to the reliable use of external annotation are drawn from empirical evaluation with evolving real-life Web documents.
 
Article
We describe the design of a policy-based spectrum access control system for the Defense Advanced Research Projects Agency (DARPA) NeXt Generation (XG) communications program to overcome harmful interference caused by a malfunctioning device or a malicious user. In tandem with signal-detection-based interference-avoidance algorithms employed by cognitive software-defined radios (SDR), we design a set of policy-based components, tightly integrated with the accredited kernel on the radio device. The policy conformance and enforcement components ensure that a radio does not violate machine understandable policies, which are encoded in a declarative language and which define stakeholders’ goals and requirements. We report on our framework experimentation, illustrating the capability offered to radios for enforcing policies and the capability for managing radios and securing access control to interfaces changing the radios’ policies.
 
Article
This paper takes as its premise that the web is a place of action, not just information, and that the purpose of global data is to serve human needs. The paper presents several component technologies, which together work towards a vision where many small micro-applications can be threaded together using automated assistance to enable a unified and rich interaction. These technologies include data detector technology to enable any text to become a start point of semantic interaction; annotations for web-based services so that they can link data to potential actions; spreading activation over personal ontologies, to allow modelling of context; algorithms for automatically inferring ‘typing’ of web-form input data based on previous user inputs; and early work on inferring task structures from action traces. Some of these have already been integrated within an experimental web-based (extended) bookmarking tool, Snip!t, and a prototype desktop application On Time, and the paper discusses how the components could be more fully, yet more openly, linked in terms of both architecture and interaction. As well as contributing to the goal of an action and activity-focused web, the work also exposes a number of broader issues, theoretical, practical, social and economic, for the Semantic Web.
 
Article
Ontology mapping seeks to find semantic correspondences between similar elements of different ontologies. It is a key challenge in achieving semantic interoperability for the Semantic Web. This paper proposes a new generic and adaptive ontology mapping approach, called PRIOR+, based on propagation theory, information retrieval techniques and artificial intelligence. The approach consists of three major modules: the IR-based similarity generator, the adaptive similarity filter and weighted similarity aggregator, and the neural-network-based constraint satisfaction solver. The approach first measures both the linguistic and structural similarity of ontologies in a vector space model, and then aggregates them using an adaptive method based on their harmonies, where harmony is defined as an estimator of the performance of a similarity measure. Finally, to improve mapping accuracy, an interactive activation and competition neural network is activated, if necessary, to search for a solution that satisfies ontology constraints. The experimental results show that harmony is a good estimator of f-measure; the harmony-based adaptive aggregation outperforms other aggregation methods; and the neural network approach significantly boosts performance in most cases. Our approach is competitive with top-ranked systems on benchmark tests of the OAEI campaign 2007, and performs best on real cases in the OAEI benchmark tests.
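To make the harmony idea concrete, here is a small sketch under the assumption that harmony counts the similarity-matrix cells that are maximal in both their row and their column (the paper gives the exact definition); the matrices and weights are illustrative only.

    def harmony(sim):
        """Fraction of cells that are maximal in both their row and their
        column -- one plausible reading of 'harmony' (the paper gives the
        exact definition)."""
        rows, cols = len(sim), len(sim[0])
        row_max = [max(r) for r in sim]
        col_max = [max(sim[i][j] for i in range(rows)) for j in range(cols)]
        hits = sum(1 for i in range(rows) for j in range(cols)
                   if sim[i][j] == row_max[i] == col_max[j])
        return hits / min(rows, cols)

    # Invented similarity matrices from two matchers over 2x2 concepts.
    linguistic = [[0.9, 0.1], [0.2, 0.8]]   # crisp: harmony = 1.0
    structural = [[0.6, 0.5], [0.4, 0.3]]   # ambiguous: harmony = 0.5

    # Adaptive aggregation: weight each similarity source by its harmony.
    w_l, w_s = harmony(linguistic), harmony(structural)
    aggregated = [[(w_l * linguistic[i][j] + w_s * structural[i][j]) / (w_l + w_s)
                   for j in range(2)] for i in range(2)]
    print(aggregated)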
 
Article
Although OWL is rather expressive, it has a very serious limitation on datatypes; i.e., it does not support customised datatypes. It has been pointed out that many potential users will not adopt OWL unless this limitation is overcome, and the W3C Semantic Web Best Practices and Development Working Group has set up a task force to address this issue. This paper makes the following two contributions: (i) it provides a brief summary of OWL-related datatype formalisms, and (ii) it provides a decidable extension of OWL DL, called OWL-Eu, that supports customised datatypes. A detailed proof of the decidability of OWL-Eu is presented.
 
Article
The Web Services world consists of loosely coupled distributed systems which adapt to changes by the use of service descriptions that enable ad hoc, opportunistic service discovery and reuse. At present, these service descriptions are semantically impoverished, being concerned with describing the functional signature of the services rather than characterising their meaning. In the Semantic Web community, the DAML Services effort attempts to rectify this by providing a more expressive way of describing Web Services using ontologies. However, this approach does not separate the domain-neutral communicative intent of a message (considered in terms of speech acts) from its domain-specific content, unlike similar developments from the multi-agent systems community. We describe our experiences of designing and building an ontologically motivated Web Services system for situational awareness and information triage in a simulated humanitarian aid scenario. In particular, we discuss the merits of using techniques from the multi-agent systems community for separating the intentional force of messages from their content, and the implementation of these techniques within the DAML Services model.
 
Article
The current infrastructure for Web services supports service discovery based on a common repository. However, the key challenge is not discovery but selection: ultimately, the service user must select one good provider. Whereas service descriptions are given from the perspective of providers, service selection must take the perspective of users. In this way, service selection involves pragmatics, which builds on but is deeper than semantics. Current approaches provide no support for this challenge. Importantly, service selection differs significantly from product selection, which is the problem addressed by traditional product recommender approaches. The assumptions underlying product recommender approaches do not hold for services. For example, a vendor site knows of all product purchases made at it, whereas a service registry does not know of the service episodes that may involve services discovered from it. Also, traditional approaches assume that users are willing to reveal their evaluations to each vendor site. This paper formulates the problem of service selection. It reformulates two traditional recommender approaches for service selection and proposes a new agent-based approach in which agents cooperate to evaluate service providers. In this approach, the agents rate each other and autonomously decide how much weight to give each other's recommendations. The underlying algorithm with which the agents reason is developed in the context of a concept lattice, which enables finding relevant agents. Since large service selection datasets do not yet exist, for the purposes of evaluation we reformulate the well-known product evaluations dataset MovieLens as a services dataset. We use it to compare the various approaches. Despite limiting the flow of information, the proposed approach compares well with the existing approaches in terms of the accuracy metrics defined within.
 
Article
This paper describes a novel approach to the description and discovery of Semantic Web services. We propose SPARQL as a formal language to describe the preconditions and postconditions of services, as well as the goals of agents. In addition, we show that SPARQL query evaluation can be used to check the truth of preconditions in a given context, construct the postconditions that will result from the execution of a service in a context, and determine whether a service execution with those results will satisfy the goal of an agent. We also show how certain optimizations of these tasks can be implemented in our framework.
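A toy illustration of the precondition check using rdflib rather than the paper's own framework: the agent's context is an RDF graph, and a service precondition is a SPARQL ASK query evaluated against it. All URIs and the balance condition are invented.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")

    # The agent's context: a toy RDF graph of what currently holds.
    context = Graph()
    context.add((EX.user, EX.hasAccount, EX.acct1))
    context.add((EX.acct1, EX.balance, Literal(120)))

    # A service precondition as a SPARQL ASK query: an account must exist
    # and hold at least the (invented) purchase price of 100.
    precondition = """
    PREFIX ex: <http://example.org/>
    ASK { ex:user ex:hasAccount ?a .
          ?a ex:balance ?b .
          FILTER (?b >= 100) }
    """
    print(context.query(precondition).askAnswer)   # True: service applicable

A CONSTRUCT query evaluated against the same context could likewise build the graph of postconditions that executing the service would bring about.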
 
Article
The semantic grid is the result of semantic web and grid researchers building bridges in recognition of the shared vision and research agenda of both fields. This paper builds on prior experiences with both agents and grids to illustrate the benefits of bringing agents into the mix. Because semantic grids represent and reason about knowledge declaratively, additional capabilities typical of agents are then possible including learning, planning, self-repair, memory organization, meta-reasoning, and task-level coordination. These capabilities would turn semantic grids into cognitive grids. Only a convergence of these technologies will provide the ingredients to create the fabric for a new generation of distributed intelligent systems.
 
Tables: Linkage Points between the Selected Bioinformatics Data Sets; Bioinformatics Data Sources Utilized in the Case Study.
Article
The integration of disparate biomedical data continues to be a challenge for drug discovery efforts. Semantic Web technologies provide the capability to more easily aggregate data and thus can be utilized to improve the efficiency of drug discovery. We describe an implementation of a Semantic Web infrastructure that utilizes the scalable Oracle Resource Description Framework (RDF) Data Model as the repository and Seamark Navigator for browsing and searching the data. The paper presents a use case that identifies gene biomarkers of interest and uses the Semantic Web infrastructure to annotate the data.
 
Article
This paper describes a novel approach for obtaining semantic interoperability in a bottom–up, semi-automatic manner without relying on pre-existing, global semantic models. We assume that large amounts of data exist that have been organized and annotated according to local schemas. Seeing semantics as a form of agreement, our approach enables the participating data sources to incrementally develop global agreements in an evolutionary and completely decentralized process that solely relies on pair-wise, local interactions.
 
Article
We describe CS AKTive Space, an integrated semantic web application and winner of the 2003 Semantic Web Challenge [http://www.challenge.semanticweb.org/]. A demonstration of the application is available at http://cs.aktivespace.org/. CS AKTive Space represents and integrates a wide range of heterogeneous resources covering the computer science domain in the UK; it supports the exploration of patterns and implications inherent in the content and exploits a variety of services, visualisations and multidimensional representations to support questions such as who is working with whom, where there are geographical concentrations in funding or research areas, and who the most significant researchers in an area are. We briefly show how this demonstration illustrates a number of substantial challenges for the Semantic Web, including problems of referential integrity, tractable inference and interaction support. We review our approaches to these issues and discuss relevant related work.
 
Article
The IPAP Schizophrenia Algorithm was originally designed in the form of a flow chart to help physicians optimise the treatment of schizophrenic patients in the spirit of guideline-based medicine. We take this algorithm as our starting point in investigating how artifacts of this sort can benefit from the facilities of high-quality ontologies. The IPAP algorithm exists thus far only in a form suitable for use by human beings. We draw on the resources of Basic Formal Ontology (BFO) to show how such an algorithm can be enhanced so that it can be used in Semantic Web and related applications. We found that BFO provides a framework that is able to capture, in a rigorous way, all the types of entities represented in the IPAP Schizophrenia Algorithm, in a way which yields a computational tool that can be used by software agents to perform monitoring and control of schizophrenic patients. We discuss the issues involved in building an application ontology for this purpose, issues which are important for any Semantic Web application in the life science and healthcare domains.
 
Article
Preprint available at http://www.ida.liu.se/~patla00/publications.shtml. Due to the recent explosion in the amount of online-accessible biomedical data and tools, finding and retrieving the relevant information is not an easy task. The vision of a Semantic Web for the life sciences alleviates these difficulties. A key technology for the Semantic Web is ontologies. In recent years many biomedical ontologies have been developed, and many of these ontologies contain overlapping information. To be able to use multiple ontologies, they have to be aligned or merged. In this paper we propose a framework for aligning and merging ontologies. Further, we developed a system for aligning and merging biomedical ontologies (SAMBO) based on this framework. The framework is also a first step towards a general framework that can be used for comparative evaluations of alignment strategies and their combinations. In this paper we evaluated different strategies and their combinations in terms of quality and processing time, and compared SAMBO with two other systems.
 
Article
For the effective alignment of ontologies, the subsumption mappings between the elements of the source and target ontologies play as crucial a role as equivalence mappings do. This paper presents the "Classification-Based Learning of Subsumption Relations" (CSR) method for the alignment of ontologies. Given a pair of ontologies, the objective of CSR is to learn patterns of features that provide evidence for the subsumption relation among concepts, and thus to decide whether a pair of concepts from these ontologies is related via a subsumption relation. This is achieved by means of a classification task, using state-of-the-art supervised machine learning methods. The paper describes the method thoroughly, provides experimental results over an extended version of a benchmarking series of both artificially created and real-world cases, and discusses the potential of the method.
 
Article
It has been a formidable task to achieve efficiency and scalability in the alignment of two massive, conceptually similar ontologies. Here we assume an ontology is typically given in RDF (Resource Description Framework) or OWL (Web Ontology Language) and can be represented by a directed graph. A straightforward approach to the alignment of two ontologies entails an O(N²) computation, comparing every pair of nodes from the two ontologies, where N denotes the average number of nodes in each ontology. Our proposed algorithm, called the Anchor-Flood algorithm, avoids this cost on average: it starts off with an anchor, a pair of “look-alike” concepts from each ontology, and gradually explores concepts by collecting neighboring concepts, thereby taking advantage of locality of reference in the graph data structure. It outputs a set of alignments between concepts and properties within semantically connected subsets of the two entire graphs, which we call segments. Similarity comparisons are repeated between pairs of nodes within the neighborhood of the anchor, iterating until either all the collected concepts are explored or no new aligned pair is found. In this way, we can significantly reduce the computational time of alignment. Moreover, since we only perform segment-to-segment comparison, regardless of the overall size of the ontologies, our algorithm not only achieves high performance but also resolves the scalability problem in aligning ontologies. It also reduces the number of seemingly aligned but actually misaligned pairs. Through several examples with large ontologies, we demonstrate the features of our Anchor-Flood algorithm.
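The core loop of the approach might look like the following sketch (a simplification, not the published algorithm): alignment starts from the anchor pair and floods outward, comparing only the neighbourhoods of pairs that are already aligned.

    from collections import deque

    def anchor_flood(anchor, neighbors1, neighbors2, similar, threshold=0.8):
        """Sketch of the anchor-flooding idea: starting from one aligned
        pair (the anchor), compare only the neighbourhoods of pairs that
        are already aligned, avoiding whole-ontology O(N^2) comparison."""
        aligned = {anchor}
        queue = deque([anchor])
        while queue:
            c1, c2 = queue.popleft()
            for n1 in neighbors1.get(c1, ()):
                for n2 in neighbors2.get(c2, ()):
                    pair = (n1, n2)
                    if pair not in aligned and similar(n1, n2) >= threshold:
                        aligned.add(pair)
                        queue.append(pair)   # flood outward from the new pair
        return aligned

    # Toy ontologies as adjacency maps, plus a trivial string similarity.
    g1 = {"Person": ["Student", "Staff"], "Student": ["PhDStudent"]}
    g2 = {"person": ["student", "employee"], "student": ["phd_student"]}
    sim = lambda a, b: float(a.lower().replace("_", "") == b.lower().replace("_", ""))

    print(anchor_flood(("Person", "person"), g1, g2, sim))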
 
Article
SWAN – a Semantic Web Application in Neuromedicine – is a project to develop an effective, integrated scientific knowledge infrastructure for Alzheimer Disease (AD) researchers, enabled by Semantic Web technology and deployed on Alzforum (www.alzforum.org), a scientific web community for AD research. This infrastructure may later be deployed for research communities in other neuromedical disorders. SWAN incorporates the full biomedical research knowledge lifecycle in its ontological model, including support for personal data organization, hypothesis generation, experimentation, lab data organization, and digital pre-publication collaboration. Community, laboratory, and personal digital resources may all be organized and interconnected using SWAN's common semantic framework.
 
Article
We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task becoming more important with the recent increase in the amount of inconsistent and unreliable metadata on the web. We formalise the annotated language and the corresponding deductive system, and address the query answering problem. Previous contributions on specific RDF annotation domains are encompassed by our unified reasoning formalism, as we show by instantiating it on (i) temporal, (ii) fuzzy, and (iii) provenance annotations. Moreover, we provide a generic method for combining multiple annotation domains, allowing us to represent, e.g., temporally annotated fuzzy RDF. Furthermore, we address the development of a query language, AnQL, that is inspired by SPARQL and includes several features of SPARQL 1.1 (subqueries, aggregates, assignment, solution modifiers), along with formal definitions of their semantics.
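A tiny sketch of what a temporal annotation domain supplies, under the assumption that annotations are year intervals and conjunction is interval intersection; the facts are invented, and a fuzzy domain would analogously combine degrees with min.

    def intersect(i1, i2):
        """Meet of two temporal annotations (year intervals): the overlap,
        or None when the validity periods are disjoint. This is the
        conjunction operation a temporal annotation domain supplies."""
        lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
        return (lo, hi) if lo <= hi else None

    # Invented annotated triples: (subject, predicate, object) -> interval.
    facts = {
        ("alice", "worksFor", "acme"): (2000, 2006),
        ("acme", "locatedIn", "dublin"): (2004, 2010),
    }

    # Joining the facts ("alice worked in dublin") is supported only where
    # both annotations hold, i.e. on their intersection.
    print(intersect(facts[("alice", "worksFor", "acme")],
                    facts[("acme", "locatedIn", "dublin")]))   # (2004, 2006)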
 
Article
Emergent knowledge does not come from a particular document or a particular knowledge source, but comes from a collection of documents or knowledge sources. This paper proposes a system which combines social web content and semantic web technology to process the emergent knowledge from the blogosphere. The proposed system regards blog postings as experiences of people on particular topics. By annotating postings in the selected domains with ontology vocabularies, the system collects experiences from various people into an ontology about people and experiences. The system processes this ontology with semantic rules to find the emergent knowledge. Users can access previously unavailable facts, concepts and trends which are emerging from social web content by using the proposed system.
 
Article
This paper describes CONFOTO, a browsing and annotation service for conference photos which combines current web trends such as data sharing and collaborative tagging with Resource Description Framework (RDF) system advantages like unrestricted aggregation and ontology re-use. Interactive user interface components ease information exploration and editing, while W3C-recommended import and export interfaces and formats facilitate data integration and re-purposing.
 
Article
In this article we describe a Semantic Web application for semantic annotation and search in large virtual collections of cultural-heritage objects, indexed with multiple vocabularies. During the annotation phase we harvest, enrich and align collection metadata and vocabularies. The semantic-search facilities support keyword-based queries of the graph (currently 20 million triples), resulting in semantically grouped result clusters, all representing potential semantic matches of the original query. We show two sample search scenarios. The annotation and search software is open source and is already being used by third parties. All software is based on established Web standards, in particular HTML/XML, CSS, RDF/OWL, SPARQL and JavaScript.
 
Article
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest-scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the Semantic Web.
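In the spirit of taxonomy-based disambiguation (a hedged sketch, not SemTag's published algorithm): score each candidate taxonomy node by the overlap between the mention's context window and a word profile of the node, and abstain when no candidate is clearly supported. The labels and profiles are invented.

    def disambiguate(context_words, candidates, threshold=0.2):
        """Score each candidate taxonomy node by overlap between the
        mention's context window and the node's word profile; abstain
        (return None) when no candidate is clearly supported."""
        ctx = set(context_words)
        scores = {node: len(ctx & profile) / len(profile)
                  for node, profile in candidates.items()}
        node, score = max(scores.items(), key=lambda kv: kv[1])
        return node if score >= threshold else None

    candidates = {
        "Jaguar/Cat": {"cat", "wildlife", "prey", "jungle", "species"},
        "Jaguar/Car": {"car", "engine", "dealer", "luxury", "sedan"},
    }
    print(disambiguate("the jaguar dealer quoted a luxury sedan".split(),
                       candidates))   # Jaguar/Car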
 
Article
The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, associated with the respective formal knowledge about the world. We claim that the Semantic Web, at its current stage of development, is in critical need of metadata generation and usage schemata that are specific, well defined and easy to understand. This paper introduces our vision for a holistic architecture for semantic annotation, indexing, and retrieval of documents with regard to extensive semantic repositories. A system (called KIM) implementing this concept is presented in brief and is used for the purposes of evaluation and demonstration. A particular schema for semantic annotation with respect to real-world entities is proposed. The underlying philosophy is that practical semantic annotation is impossible without particular knowledge-modelling commitments. Our understanding is that a system for such semantic annotation should be based upon a simple model of real-world entity classes, complemented with extensive instance knowledge. To ensure the efficiency, ease of sharing, and reusability of the metadata…
 
Article
The way that web services are currently being developed places them beside rather than within the existing World Wide Web. In this paper, we present an approach that combines the strength of the World Wide Web, viz. interlinked HTML pages for presentation and human consumption, with the strength of semantic web services, viz. support for semi-automatic composition and invocation of web services that have semantically heterogeneous descriptions. Our eventual objective is that a human user, e.g. a consultant or an administrator, can seamlessly browse the existing World Wide Web and the emerging web services, and can easily compose and invoke Web services on the fly. This paper presents our framework, OntoMat-Service, which trades off between a reasonably easy-to-use interface for web services and the complexity of web service workflows. It is not our objective that everybody can produce arbitrarily complex workflows of web services with our tool, the OntoMat-Service-Browser. Rather, OntoMat-Service aims at a service web where simple service flows are easily possible, even for people without much technical background, while still allowing difficult flows for the expert engineer.
 
Article
The success of the Semantic Web crucially depends on the easy creation, integration, and use of semantic data. For this purpose, we consider an integration scenario that defies core assumptions of current metadata construction methods. We describe a framework of metadata creation where Web pages are generated from a database and the database owner is cooperatively participating in the Semantic Web. This leads us to the deep annotation of the database—directly by annotation of the logical database schema or indirectly by annotation of the Web presentation generated from the database contents. From this annotation, one may execute data mapping and/or migration steps, and thus prepare the data for use in the Semantic Web. We consider deep annotation as particularly valid because: (i) dynamic Web pages generated from databases outnumber static Web pages, (ii) deep annotation may be a very intuitive way to create semantic data from a database, and (iii) data from databases should remain where it can be handled most efficiently—in its databases. Interested users can then query this data directly or choose to materialize the data as RDF files.
 
Article
This paper discusses the issues involved in designing a query language for the Semantic Web and presents the OWL Query Language (OWL-QL) as a candidate standard language and protocol for query-answering dialogues among Semantic Web computational agents using knowledge represented in the W3C's Web Ontology Language (OWL). OWL-QL is a formal language and precisely specifies the semantic relationships among a query, a query answer, and the knowledge base(s) used to produce the answer. Unlike standard database and Web query languages, OWL-QL supports query-answering dialogues in which the answering agent may use automated reasoning methods to derive answers to queries, as well as dialogues in which the knowledge to be used in answering a query may be spread over multiple knowledge bases on the Semantic Web, and/or where those knowledge bases are not specified by the querying agent. In this setting, the set of answers to a query may be of unpredictable size and may require an unpredictable amount of time to compute.
 
Article
The Semantic Web vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system which takes queries expressed in natural language, together with an ontology, as input, and returns answers drawn from one or more knowledge bases (KBs). We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog presents an elegant solution in which different strategies are combined in a novel way. It makes use of the GATE NLP platform, string metric algorithms, WordNet and a novel ontology-based relation similarity service to make sense of user queries with respect to the target KB. Moreover, it also includes a learning component, which ensures that the performance of the system improves over time in response to the particular community jargon used by end users.
 
Conference Paper
Both OWL-DL and function-free Horn rules are decidable logics with interesting, yet orthogonal expressive power: from the rules perspective, OWL-DL is restricted to tree-like rules, but provides both existentially and universally quantified variables and full, monotonic negation. From the description logic perspective, rules are restricted to universal quantification, but allow for the interaction of variables in arbitrary ways. Clearly, a combination of OWL-DL and rules is desirable for building Semantic Web ontologies, and several such combinations have already been discussed. However, such a combination can easily lead to the undecidability of interesting reasoning problems. Here, we present a decidable combination which is, to the best of our knowledge, more general than similar decidable combinations proposed so far. Decidability is obtained by restricting rules to so-called DL-safe ones, requiring each variable in a rule to occur in a non-DL-atom in the rule body. We show that query answering in such a combined logic is decidable, and we discuss its expressive power by means of a non-trivial example. Finally, we present an algorithm for query answering in SHIQ(D) extended with DL-safe rules, based on a reduction to disjunctive datalog.
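The DL-safety restriction itself is easy to operationalize; the sketch below checks it syntactically for rules given as (predicate, arguments) atoms. The predicate and rule names are invented for illustration.

    def is_dl_safe(rule, dl_predicates):
        """Check DL-safety: every variable of the rule must occur in at
        least one body atom whose predicate is NOT a DL atom (i.e. not a
        concept or role of the ontology). Atoms are (predicate, args)
        tuples; variables are strings starting with '?'."""
        head, body = rule
        variables = {a for _, args in [head] + body
                     for a in args if a.startswith("?")}
        grounded = {a for pred, args in body if pred not in dl_predicates
                    for a in args if a.startswith("?")}
        return variables <= grounded

    dl_predicates = {"Person", "worksAt"}            # from the ontology
    rule_safe = (("colleague", ("?x", "?y")),        # head
                 [("worksAt", ("?x", "?z")), ("worksAt", ("?y", "?z")),
                  ("registered", ("?x",)), ("registered", ("?y",)),
                  ("registered", ("?z",))])
    rule_unsafe = (("colleague", ("?x", "?y")),
                   [("worksAt", ("?x", "?z")), ("worksAt", ("?y", "?z"))])

    print(is_dl_safe(rule_safe, dl_predicates))      # True
    print(is_dl_safe(rule_unsafe, dl_predicates))    # False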
 
Article
The Semantic Web lacks support for explaining answers from web applications. When applications return answers, many users do not know what information sources were used, when they were updated, how reliable the source was, or what information was looked up versus derived. Many users also do not know how implicit answers were derived. The Inference Web (IW) aims to take opaque query answers and make the answers more transparent by providing infrastructure for presenting and managing explanations. The explanations include information concerning where answers came from (knowledge provenance) and how they were derived (or retrieved). In this article we describe an infrastructure for IW explanations. The infrastructure includes: IWBase — an extensible web-based registry containing details about information sources, reasoners, languages, and rewrite rules; PML — the Proof Markup Language specification and API used for encoding portable proofs; IW browser — a tool supporting navigation and presentations of proofs and their explanations; and a new explanation dialogue component. Source information in the IWBase is used to convey knowledge provenance. Representation and reasoning language axioms and rewrite rules in the IWBase are used to support proofs, proof combination, and Semantic Web agent interoperability. The Inference Web is in use by four Semantic Web agents, three of them using embedded reasoning engines fully registered in the IW. Inference Web also provides explanation infrastructure for a number of DARPA and ARDA projects.
 
Article
In this paper, we present a general overview of Falcon-AO: a practical ontology matching system with acceptable-to-good performance and a number of remarkable features. Falcon-AO has been among the best-performing systems in the tests of the last three years' OAEI campaigns. Falcon-AO is written in Java and is open source.
 
Article
The Grid's vision, of sharing diverse resources in a flexible, coordinated and secure manner through the dynamic formation and disbanding of virtual communities, strongly depends on metadata. Currently, Grid metadata is generated and used in an ad hoc fashion, much of it buried in the Grid middleware's code libraries and database schemas. This ad hoc expression and use of metadata causes chronic dependency on human intervention during the operation of Grid machinery, leading to systems which are brittle when faced with frequent syntactic changes in resource coordination and sharing protocols. The Semantic Grid is an extension of the Grid in which rich resource metadata is exposed and handled explicitly, and shared and managed via Grid protocols. The layering of an explicit semantic infrastructure over the Grid infrastructure potentially leads to increased interoperability and greater flexibility. In recent years, several projects have embraced the Semantic Grid vision. However, the Semantic Grid lacks a Reference Architecture or any kind of systematic framework for designing Semantic Grid components or applications. The Open Grid Services Architecture (OGSA) aims to define a core set of capabilities and behaviours for Grid systems. We propose a Reference Architecture that extends OGSA to support the explicit handling of semantics, and defines the associated knowledge services to support a spectrum of service capabilities. Guided by a set of design principles, Semantic-OGSA (S-OGSA) defines a model, the capabilities and the mechanisms for the Semantic Grid. We conclude by highlighting the commonalities and differences that the proposed architecture has with respect to other Grid frameworks.
 
Article
Latest research efforts on the semi-automatic coordination of ontologies “touch” on the mapping/merging of ontologies using the whole breadth of available knowledge. Addressing this issue, this paper presents the HCONE-merge approach, which is further extended towards automating the merging process. HCONE-merge makes use of the intended informal meaning of concepts by mapping them to WordNet senses using the Latent Semantic Indexing (LSI) method. Based on these mappings and using the reasoning services of description logics, HCONE-merge automatically aligns and then merges ontologies. Since the mapping of concepts to their intended meaning is an essential step of the HCONE-merge approach, this paper explores the level of human involvement required for mapping concepts of the source ontologies to their intended meanings. We propose a series of methods for ontology mapping (towards merging) with varying degrees of human involvement and evaluate them experimentally. We conclude that, although an effective fully automated process is not attainable, we can reach a point where ontology merging can be carried out efficiently with minimum human involvement.
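The LSI step can be sketched in a few lines of numpy (illustrative only; the glosses, the concept and the dimensionality are toy choices): build a term-by-sense matrix from WordNet-style glosses, factor it, and fold the concept's textual context into the latent space to pick the closest sense.

    import numpy as np

    # Bags of words standing in for WordNet sense glosses of "bank".
    senses = [
        "financial institution that accepts deposits and lends money".split(),
        "sloping land alongside a body of water such as a river".split(),
    ]

    vocab = sorted({w for d in senses for w in d})
    A = np.array([[d.count(w) for d in senses] for w in vocab], dtype=float)

    # LSI: factor the term-by-sense matrix and keep k latent dimensions.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    U, S, Vt = U[:, :k], S[:k], Vt[:k]

    # Fold the concept's textual context into the latent space and pick
    # the nearest sense by cosine similarity.
    concept = "institution that accepts money deposits".split()
    q = np.array([concept.count(w) for w in vocab], dtype=float)
    q_hat = (U.T @ q) / S

    for j, doc in enumerate(Vt.T):
        cos = q_hat @ doc / (np.linalg.norm(q_hat) * np.linalg.norm(doc) + 1e-12)
        print("sense", j, "similarity", round(float(cos), 3))
    # sense 0 (the financial gloss) scores highest for this concept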
 
Article
With the growing interest in semantic web services and context-aware computing, the importance of ontologies, which enable context-aware reasoning, has been widely accepted. While domain-specific and general-purpose ontologies have been developed, few attempts have been made to build a situation ontology that can be employed directly to support activity-oriented context-aware services. In this paper, we propose an approach to automatically constructing a large-scale situation ontology by mining large-scale web resources, eHow and wikiHow, which contain an enormous amount of how-to instructions (e.g., “How to install a car amplifier”). The construction process is guided by a situation model derived from the procedural knowledge available in the web resources. The two major steps involved are: (1) action mining, which extracts pairs of a verb and its ingredient (i.e., objects, location, and time) from individual instructional steps (e.g., <disconnect, ground cable>) and forms goal-oriented situation cases from the results, and (2) normalization and integration of situation cases to form the situation ontology. For validation, we measure the accuracy of the action mining method and show how our situation ontology compares, in terms of coverage, with existing large-scale ontology-like resources constructed manually. Furthermore, we show how it can be utilized for two applications: service recommendation and service composition.
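A deliberately naive stand-in for the action-mining step (the paper's extractor is more sophisticated): pull the leading imperative verb and its object phrase out of an instructional step.

    import re

    def mine_action(step):
        """Naive verb-ingredient extraction from one instructional step:
        the leading imperative verb plus its object phrase. A toy
        stand-in for the paper's action-mining component."""
        m = re.match(r"^(\w+)\s+(?:(?:the|a|an|your)\s+)?(.+?)\s*\.?$",
                     step.strip().lower())
        return (m.group(1), m.group(2)) if m else None

    for step in ["Disconnect the ground cable.",
                 "Mount the amplifier under a seat",
                 "Run the power wire to the battery"]:
        print(mine_action(step))
    # ('disconnect', 'ground cable'), ('mount', 'amplifier under a seat'), ...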
 
Figures: myCampus architecture from a user's perspective (the smiley faces represent agents); main steps involved in processing a query submitted to an e-Wallet (asserting the query's context, asserting elementary information needs and the authorization goal, and pre-checking whether the query is allowable under the user's privacy preferences); three-layer architecture; high-level flows and processes in the e-Wallet.
Article
Increasingly, application developers are looking for ways to provide users with higher levels of personalization that capture different elements of a user's operating context, such as her location, the task that she is currently engaged in, who her colleagues are, etc. While there are many sources of contextual information, they tend to vary from one user to another and also over time. Different users may rely on different location tracking functionality provided by different cell phone operators; they may use different calendar systems, etc. In this article, we describe work on a Semantic e-Wallet aimed at supporting automated identification and access of personal resources, each represented as a Semantic Web Service. A key objective is to provide a Semantic Web environment for open access to a user's contextual resources, thereby reducing the costs associated with the development and maintenance of context-aware applications. A second objective is, through Semantic Web technologies, to empower users to selectively control who has access to their contextual information and under which conditions. This work has been carried out in the context of myCampus, a context-aware environment aimed at enhancing everyday campus life. Empirical results obtained on Carnegie Mellon's campus are discussed.
 
Article
The Semantic Web Initiative envisions a Web wherein information is offered free of presentation, allowing more effective exchange and mixing across web sites and across web pages. But without substantial Semantic Web content, few tools will be written to consume it; without many such tools, there is little appeal to publish Semantic Web content. To break this chicken-and-egg problem, thus enabling more flexible information access, we have created a web browser extension called Piggy Bank that lets users make use of Semantic Web content within Web content as they browse the Web. Wherever Semantic Web content is not available, Piggy Bank can invoke screen scrapers to restructure information within web pages into Semantic Web format. Through the use of Semantic Web technologies, Piggy Bank provides direct, immediate benefits to users in their use of the existing Web. Thus, the existence of even just a few Semantic Web-enabled sites or a few scrapers already benefits users. Piggy Bank thereby offers an easy, incremental upgrade path to users without requiring a wholesale adoption of the Semantic Web's vision. To further improve this Semantic Web experience, we have created Semantic Bank, a web server application that lets Piggy Bank users share the Semantic Web information they have collected, enabling collaborative efforts to build sophisticated Semantic Web information repositories through simple, everyday use of Piggy Bank.
 
Article
We describe our method for benchmarking Semantic Web knowledge base systems with respect to use in large OWL applications. We present the Lehigh University Benchmark (LUBM) as an example of how to design such benchmarks. The LUBM features an ontology for the university domain, synthetic OWL data scalable to an arbitrary size, 14 extensional queries representing a variety of properties, and several performance metrics. The LUBM can be used to evaluate systems with different reasoning capabilities and storage mechanisms. We demonstrate this with an evaluation of two memory-based systems and two systems with persistent storage.
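The benchmark's recipe, synthetic data scalable by a size parameter plus timed queries, can be miniaturized as follows; the schema and numbers are invented and bear no relation to LUBM's actual ontology or queries.

    import time

    def generate(universities):
        """Tiny LUBM-flavoured generator (illustrative only): emits
        (student, takesCourse, course) triples, scalable by the number
        of synthetic universities requested."""
        triples = []
        for u in range(universities):
            for s in range(200):                  # students per university
                for c in range(3):                # courses per student
                    triples.append((f"u{u}s{s}", "takesCourse",
                                    f"u{u}c{(s + c) % 20}"))
        return triples

    data = generate(50)                           # scale knob: 30,000 triples

    # One extensional query ("who takes course u0c0?") with a crude
    # wall-clock metric, echoing the benchmark's query/metric pairing.
    start = time.perf_counter()
    answers = [s for s, p, o in data if p == "takesCourse" and o == "u0c0"]
    print(len(answers), "answers in", time.perf_counter() - start, "s")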
 
Article
The need to make the contents of the Semantic Web accessible to end-users becomes increasingly pressing as the amount of information stored in ontology-based knowledge bases steadily increases. Natural language interfaces (NLIs) provide a familiar and convenient means of query access to Semantic Web data for casual end-users. While several studies have shown that NLIs can achieve high retrieval performance as well as domain independence, this paper focuses on usability and investigates whether NLIs and natural language query languages are useful from an end-user's point of view. To that end, we introduce four interfaces, each allowing a different query language, and present a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full natural language query sentences with a limited set of sentence beginnings over keywords or formal query languages. NLIs to ontology-based knowledge bases can, therefore, be considered useful for casual or occasional end-users. As such, the overarching contribution is one step towards the theoretical vision of the Semantic Web becoming reality.
 
Article
Ontology mapping is the key to achieving interoperability across ontologies. In the Semantic Web environment, ontologies are usually distributed and heterogeneous, so it is necessary to find the mapping between them before processing across them. Much effort has been devoted to automating the discovery of ontology mappings. However, some problems are still evident. In this paper, ontology mapping is formalized as a problem of decision making, so that discovery of the optimal mapping is cast as finding the decision with minimal risk. An approach called Risk Minimization based Ontology Mapping (RiMOM) is proposed, which automates the process of discovering 1:1, n:1, 1:null and null:1 mappings. Based on normalization and NLP techniques, the problem of instance heterogeneity in ontology mapping is resolved to a certain extent. To deal with the problem of name conflicts in the mapping process, we use a thesaurus and statistical techniques. Experimental results indicate that the proposed method significantly outperforms the baseline methods and also improves on existing methods.
 
Article
Search on PCs has become less efficient than searching the Web due to the increasing amount of stored data. In this paper we present an innovative Desktop search solution which relies on extracted metadata, context information and additional background information to improve Desktop search results. We also present a practical application of this approach, the extensible Beagle++ toolbox. To prove the validity of our approach, we conducted a series of experiments. By comparing our results against those of a regular Desktop search solution, Beagle, we show improved search quality and overall performance.
 
Article
In this paper, we describe TAP, an experimental system for identifying and researching many of the technical issues that lie on the path to achieving the vision of the Semantic Web. In particular, we address the issues of scalable query languages, sharing vocabularies, bootstrap knowledge bases, automated extraction of RDF from text, and applications of the Semantic Web.
 
Article
The SLIF project combines text mining and image processing to extract structured information from biomedical literature. SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., micrograph or gel). Fluorescence microscopy images are further processed and classified according to the depicted subcellular localization. The results of this process can be queried online using either a user-friendly web interface or an XML-based web service. As an alternative to the targeted query paradigm, SLIF also supports browsing the collection based on latent topic models which are derived from both the annotated text and the image data. The SLIF web application, as well as the labeled datasets used for training system components, is publicly available at http://slif.cbi.cmu.edu.
 
Article
The FungalWeb Ontology seeks to support the various data integration needs of enzyme biotechnology from inception to product roll-out. Serving as a knowledgebase for decision support, the conceptualization links fungal species with enzymes, enzyme substrates, enzyme classifications, enzyme modifications, enzyme-related intellectual property, and enzyme retail and applications. The ontology, developed in the OWL language, is the result of the integration of numerous biological database schemas, web-accessible text resources and components of existing ontologies. We assess the quantity of implicit knowledge in the FungalWeb Ontology by analyzing the range of tags in the OWL files and, along with other description logic (DL) computable metrics of the ontology, contrast it with other publicly available bio-ontologies. Thereafter, we demonstrate how the FungalWeb Ontology supports its broad remit in fungal biotechnology by (i) presenting application scenarios, (ii) presenting the conceptualizations of the ontological frame able to support these scenarios and (iii) suggesting semantic queries typical of a fungal enzymologist involved in product development. Recognizing the complexity of the ontology query process for the non-technical manager, we introduce a simplified query tool, Ontoligent Interactive Query (OntoIQ), that allows the user to browse and build queries from a selection of query patterns and ontology content. The OntoIQ interface supports users not familiar with writing DL syntax, allowing them access to the ontology with expressive description logic reasoning tools. Finally, we discuss the challenges encountered during the development of semantic infrastructure for fungal enzyme biotechnologists.
 
Article
In this paper we give an overview of the Foafing the Music system. The system uses the Friend of a Friend (FOAF) and RDF Site Summary (RSS) vocabularies for recommending music to a user, depending on the user’s musical tastes and listening habits. Music information (new album releases and reviews, podcast sessions, audio from MP3 blogs, related artists’ news, and upcoming gigs) is gathered from thousands of RSS feeds. The presented system provides music discovery by means of: user profiling (defined in the user’s FOAF description), context-based information (extracted from music related RSS feeds) and content-based descriptions (extracted from the audio itself), based on a common ontology (OWL DL) that describes the music recommendation domain. The system is available at: http://foafing-the-music.iua.upf.edu.
 
Top-cited authors
Bijan Parsia
  • The University of Manchester
Aditya Kalyanpur
Christian Bizer
  • Universität Mannheim
Frank Van Harmelen
  • Vrije Universiteit Amsterdam
Jens Lehmann
  • University of Bonn