Article

Abstract

Collaborative vocabulary development in the context of data integration is the process of finding consensus between experts with different backgrounds, system understanding and domain knowledge. The complexity of this process increases with the number of people involved, the variety of the systems to be integrated and the dynamics of their domain. In this paper, we argue that the use of a powerful version control system is at the heart of the problem. Driven by this idea and the success of the version control system Git in the context of software development, we investigate the applicability of Git for collaborative vocabulary development. Even though vocabulary development and software development have many more similarities than differences, there are still important obstacles. These need to be considered in the development of a successful versioning and collaboration system for vocabulary development. Therefore, this paper starts by presenting the challenges we face during the collaborative creation of vocabularies and discusses how it differs from software development. Drawing from these findings, we present Git4Voc, which comprises guidelines on how Git can be adapted to vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond the plain functionality of Git by realizing vocabulary-specific features like syntactic validation and semantic diffs.
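To make the idea of a validating Git hook concrete, the following is a minimal sketch of a pre-commit check, not the implementation described in the paper. It assumes that vocabulary files can be recognized by their extension and that an external validator such as Rapper is installed and exits with a non-zero status on a parse error; the names `VOCAB_EXTENSIONS`, `vocabulary_files`, `staged_files`, `is_valid` and `run_hook` are hypothetical. For simplicity it validates the working-tree copy of each staged file rather than the staged content itself.

```python
import subprocess

# Assumed file extensions for vocabulary files; adjust to the repository's conventions.
VOCAB_EXTENSIONS = (".ttl", ".rdf", ".owl", ".nt")


def vocabulary_files(paths):
    """Keep only the paths that look like vocabulary files."""
    return [p for p in paths if p.lower().endswith(VOCAB_EXTENSIONS)]


def staged_files():
    """Return the paths that are added, copied, or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def is_valid(path, validator=("rapper", "--count", "--guess")):
    """Run the external validator (an assumption) on the working-tree file.

    An exit status of 0 is taken to mean that the file parsed.
    """
    return subprocess.run([*validator, path], capture_output=True).returncode == 0


def run_hook(paths=None, check=is_valid):
    """Return 0 if every vocabulary file passes the check, 1 otherwise."""
    paths = staged_files() if paths is None else paths
    failed = [p for p in vocabulary_files(paths) if not check(p)]
    for path in failed:
        print(f"syntax error in {path}; commit rejected")
    return 1 if failed else 0
```

Installed as `.git/hooks/pre-commit`, a script like this would end with `sys.exit(run_hook())`, since Git aborts the commit when the hook exits with a non-zero status. The `check` parameter is there so that a different validator can be swapped in.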

... On the other hand, Version Control Systems (VCS), such as Subversion (SVN) or Git, are becoming increasingly popular for vocabulary development. In our previous work, we proposed Git4Voc [6], a set of best practices which transfer concepts of VCSs to vocabulary development, on the example of Git. We discovered that several aspects of vocabulary development—in particular with regard to revision management, access control, and some governance issues—are already well covered by Git-based version control, especially if developers follow the proposed best practices. ...
... In order to develop an integrated environment that supports the described round-trip development of vocabularies, corresponding requirements have to be identified and addressed accordingly. In our previous work on Git4Voc [6], we identified eleven requirements that are crucial for the successful adaptation of Git to vocabulary development. We gathered these requirements by aggregating insights from the state of the art and our own experiences with developing the vocabularies MobiVoc and SCORVoc on GitHub. ...
... In the following, we briefly summarize the categories and requirements. For a more detailed description, please refer to the Git4Voc paper [6] and referenced works (i.e. [4, 8, 11, 13, 14, 17, 20]). ...
Conference Paper
Vocabularies are increasingly being developed on platforms for hosting version-controlled repositories, such as GitHub. However, these platforms lack important features that have proven useful in vocabulary development. We present VoCol, an integrated environment that supports the development of vocabularies using Version Control Systems. VoCol is based on a fundamental model of vocabulary development, consisting of the three core activities modeling, population, and testing. We implemented VoCol using a loose coupling of validation, querying, analytics, visualization, and documentation generation components on top of a standard Git repository. All components, including the version-controlled repository, can be configured and replaced with little effort to cater for various use cases. We demonstrate the applicability of VoCol with a real-world example and report on a user study that confirms its usability and usefulness.
... Besides providing sophisticated version control (e.g., based on Git), these platforms already integrate a number of features that are useful for distributed ontology development, such as user management, access control, change and issue tracking, comments, notifications, wiki pages for documentation, as well as branching support for parallel development processes. Many aspects of ontology development are therefore already well covered by these platforms, especially when following guidelines and best practices as, for instance, those we have proposed with Git4Voc [6]. ...
... While most of these approaches support history tracking, their implementations only cover a subset of the functionalities that are offered by sophisticated version control systems like Git and that have proved useful in distributed ontology development [6]. Instead of building an isolated solution, TurtleEditor aims at enhancing the existing features of repository hosting platforms. ...
Article
Ontologies are increasingly being developed on web-based repository hosting platforms such as GitHub. Accordingly, there is a demand for ontology editors which can be easily connected to the hosted repositories. TurtleEditor is a web-based RDF editor that provides this capability and supports the distributed development of ontologies on repository hosting platforms. It offers features such as syntax checking, syntax highlighting, and auto completion, along with a SPARQL endpoint to query the ontology. Furthermore, TurtleEditor integrates a visual editing view that allows for the graphical manipulation of the RDF graph and includes some basic clustering functionality. The text and graph views are constantly synchronized so that all changes to the ontology are immediately propagated and the views are updated accordingly. The results of a user study and performance tests show that TurtleEditor can indeed be effectively used to support the distributed development of ontologies on repository hosting platforms.
... The BCO is developed in an iterative manner by involving technical experts on phases for requirements collection and domain conceptualization. Best practices for collaborative development [21] and guidelines [22] such as quality assurance, role definition, version labeling and naming conventions are utilized. Currently, BCO contains 29 classes, 15 object, 8 datatype, and 6 annotation properties, respectively. ...
Conference Paper
Current search engines are heavily optimized and excel at retrieving information based on a given set of keywords. The more sophisticated ones are extended to support searches based on sentences or full text by calculating the similarity of the given query with the already stored entries. However, accessing domain information using natural language queries is a challenging task. This is due to the myriad variety of "technical" terminology used to describe things. In this paper, we present a framework to automatically construct a knowledge graph based on the given domain-specific information. It comprises various mechanisms to extract the relevant entities and their relations from the business cases. Graph- and learning-based methods are incorporated with the aim of improving the information retrieval. This allows users to quickly access the desired result even when using non-explicit terms or synonyms. Moreover, they are able to discover new links between business cases which may not have been directly encoded.
... Additionally, tagged versions are materialised as well. Halilaj et al. (2016) observe that some community-driven datasets have already adopted GIT for their version management needs. The perspective of these authors is on vocabulary development, thus they propose a set of best practices to extend (when necessary) GIT to meet the requirements of collaborative vocabulary development. ...
... The development process usually requires significant efforts and knowledge, and the participation of different stakeholders who are geographically distributed [21]. One of the main challenges for the involved ontology engineers is to work collaboratively on a shared objective in a harmonic and efficient way, while avoiding misunderstandings, uncertainty, and ambiguity [14]. Tracking and propagating ontology changes to all contributors is crucial in this process, and users should be able to synchronize changes with their work. ...
Chapter
The development of domain-specific ontologies requires joint efforts among different groups of stakeholders, such as knowledge engineers and domain experts. During the development processes, ontology changes need to be tracked and propagated across developers. Version Control Systems (VCSs) collect metadata describing changes and allow for the synchronization of different versions of the same ontology. Commonly, VCSs follow optimistic approaches to enable the concurrent modification of ontology artifacts, as well as conflict detection and resolution. For conflict detection, VCSs usually apply techniques where files are compared line by line. However, ontology changes can be serialized in different ways during the development process. As a consequence, existing VCSs may detect a large number of false-positive conflicts, i.e., conflicts that do not result from ontology changes but from the fact that two ontology versions are serialized differently. We developed SerVCS in order to enhance VCSs to cope with different serializations of the same ontology, following the principle that prevention is better than cure. SerVCS relies on unique ontology serializations and minimizes the number of false-positive conflicts. It is implemented on top of Git, utilizing tools such as Rapper and RDF-toolkit for syntax validation and unique serialization, respectively. We conducted an empirical evaluation to determine the conflict detection accuracy of SerVCS whenever simultaneous changes to an ontology are performed using different ontology editors. Experimental results suggest that SerVCS allows VCSs to conduct more effective synchronization processes by preventing false-positive conflicts.
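The effect of a unique serialization on line-by-line comparison can be illustrated with a small sketch. This is not the SerVCS implementation (which, per the abstract, uses Rapper and RDF-toolkit); it assumes the ontology is available as N-Triples, one statement per line, and it does not handle blank-node relabeling or whitespace differences inside a statement. The function name `canonical_ntriples` and the `example.org` IRIs are hypothetical.

```python
def canonical_ntriples(text):
    """Return one statement per line, without comments or duplicates, in sorted order."""
    statements = {line.strip() for line in text.splitlines()}
    statements = {s for s in statements if s and not s.startswith("#")}
    return "".join(s + "\n" for s in sorted(statements))


# The same two statements as two different editors might write them out.
editor_a = (
    '<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/Vehicle> .\n'
    '<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#label> "Car"@en .\n'
)
editor_b = (
    '# exported by another editor\n'
    '<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#label> "Car"@en .\n'
    '\n'
    '  <http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/Vehicle> .\n'
)

# The raw files differ line by line, but their canonical forms are identical.
print(editor_a == editor_b)                                          # prints False
print(canonical_ntriples(editor_a) == canonical_ntriples(editor_b))  # prints True
```

If every contributor's file is brought into such a canonical form before it is committed, a line-based merge only sees the lines that correspond to statements that were actually added or removed.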
... Additionally, tagged versions are materialized as well. In [18], Halilaj et al. observe that some community-driven datasets have already adopted GIT for their version management needs. The perspective of these authors is on vocabulary development, thus they propose a set of best practices to extend (when necessary) GIT to meet the requirements of collaborative vocabulary development. ...
Conference Paper
The dynamic and distributed nature of the Semantic Web demands methodologies and systems that foster collective participation in the evolution of datasets. In collaborative and iterative processes for dataset development, it is important to keep track of individual changes for provenance. Different scenarios may require mechanisms to foster consensus, resolve conflicts between competing changes, reverse or ignore changes, etc. In this paper, we perform a landscape analysis of version control for RDF datasets, emphasizing the importance of change reversion to support validation. Firstly, we discuss different representations of changes in RDF datasets and introduce higher-level perspectives on change. Secondly, we analyze diverse approaches to version control. We conclude by focusing on validation, characterizing it as a separate need from the mere preservation of different versions of a dataset.
... On the other hand, Version Control Systems (VCS), such as Subversion (SVN) or Git, are becoming increasingly popular for vocabulary development. In our previous work, we proposed Git4Voc [2], a set of best practices which transfer concepts of VCSs to vocabulary development, on the example of Git. Several aspects of vocabulary development, in particular with regard to revision management, access control, and some governance issues, are already well covered by Git-based version control, especially if developers follow the proposed best practices. ...
Conference Paper
Vocabularies are increasingly being developed on platforms for hosting version-controlled repositories, such as GitHub. However, these platforms lack important features that have proven useful in vocabulary development. We present VoCol, an integrated environment that supports the development of vocabularies using Version Control Systems. VoCol is based on a fundamental model of vocabulary development, consisting of the three core activities modeling, population, and testing. It uses a loose coupling of validation, querying, analytics, visualization, and documentation generation components on top of a standard Git repository. All components, including the version-controlled repository, can be configured and replaced with little effort to cater for various use cases.
... This process, which requires significant efforts and knowledge, is often a collaborative one, involving many people, or even different teams, who are geographically distributed (Palma et al., 2011). The main challenge for the involved ontology engineers is to work collaboratively on a shared objective in a harmonic and efficient way, while avoiding misunderstandings, uncertainty, and ambiguity (Halilaj et al., 2016a). Tracking and propagating the changes made to the ontology to all contributors and thus allowing them to be synchronized with the work of each other is crucial in this process. ...
Conference Paper
A Version Control System (VCS) is usually required for successful ontology development in distributed settings. VCSs enable tracking and propagation of ontology changes, as well as collecting metadata to describe changes, e.g., who made a change at which point in time. Modern VCSs implement an optimistic approach that allows for simultaneous changes of the same artifact and provides mechanisms for automatic as well as manual conflict resolution. However, various ontology development tools may serialize the ontology artifacts differently. As a consequence, existing VCSs may identify a huge number of false-positive conflicts during the merging process, i.e., conflicts that do not result from code changes but from the fact that two ontology versions are serialized differently. Following the principle that prevention is better than cure, we designed SerVCS, an approach that enhances VCSs to cope with different serializations of the same ontology. SerVCS is based on a unique serialization of ontologies to reduce the number of false-positive conflicts produced whenever different serializations of the same ontology are compared. We implemented SerVCS on top of Git, utilizing tools such as Rapper and RDF-Toolkit for syntax validation and unique serialization, respectively. We have conducted an empirical evaluation to determine the conflict detection accuracy of SerVCS whenever simultaneous changes to an ontology are performed using different ontology editors. The evaluation results suggest that SerVCS empowers VCSs by preventing them from wrongly identifying serialization-related conflicts.
... The factory ontology is not supposed to be a fixed, monolithic schema, but rather a flexible, evolving and interlinked knowledge fabric. For this purpose, we have developed the collaborative vocabulary development methodology and support environment VoCol [6]. ...
Conference Paper
By connecting devices, people, vehicles and infrastructures everywhere in a city, governments and their partners can improve community wellbeing and other economic and financial aspects (e.g., cost and energy savings). Nonetheless, smart cities are complex ecosystems that comprise many different stakeholders (network operators, managed service providers, logistic centers...) who must work together to provide the best services and unlock the commercial potential of the IoT. This is one of the major challenges that faces today's smart city movement, and more generally the IoT as a whole. Indeed, while new smart connected objects hit the market every day, they mostly feed "vertical silos" (e.g., vertical apps, siloed apps...) that are closed to the rest of the IoT, thus hampering developers to produce new added value across multiple platforms. Within this context, the contribution of this paper is twofold: (i) present the EU vision and ongoing activities to overcome the problem of vertical silos; (ii) introduce recent IoT standards used as part of a recent Horizon 2020 IoT project to address this problem. The implementation of those standards for enhanced sporting event management in a smart city/government context (FIFA World Cup 2022) is developed, presented, and evaluated as a proof-of-concept.
Conference Paper
We introduce VocBench, an open source web application for editing thesauri complying with the SKOS and SKOS-XL standards. VocBench has a strong focus on collaboration, supported by workflow management for content validation and publication. Dedicated user roles provide a clean separation of competences, addressing different specificities ranging from management aspects to vertical competences on content editing, such as conceptualization versus terminology editing. Extensive support for scheme management allows editors to fully exploit the possibilities of the SKOS model, as well as to fulfill its integrity constraints. We discuss thoroughly the main features of VocBench, detail its architecture, and evaluate it on both functional and user-appreciation grounds, through a comparison with the state of the art and an analysis of user questionnaires, respectively. Finally, we provide insights on future developments.
Conference Paper
A major bottleneck for a wider deployment and use of ontologies and knowledge engineering techniques is the lack of established conventions along with cumbersome and inefficient support for vocabulary and ontology authoring. We argue that the pragmatic development by convention paradigm, well accepted within software engineering, can be successfully applied to ontology engineering, too. However, the definition of a valid set of conventions requires broadly accepted best practices. In this regard, we empirically analyzed a number of popular vocabularies and ontology development efforts with respect to their use of guidelines and common practices. Based on this analysis, we identified the following main aspects of common practices: documentation, internationalization, naming, structure, reuse, validation and authoring. In this paper, these aspects are presented and discussed in detail. We propose a set of practices for each aspect and evaluate their relevance in a study with vocabulary developers. The overall goal is to pave the way for a new paradigm of vocabulary development similar to Software Development by Convention, which we name Vocabulary Development by Convention.
Conference Paper
The detection and presentation of changes between OWL ontologies (in the form of a diff) is an important service for ontology engineering, being an active research topic. In this paper, we present a diff tool that incorporates structural and semantic techniques in order to, firstly, distinguish effectual and ineffectual changes between ontologies and, secondly, align and categorise those changes according to their impact. Such a categorisation of changes is shown to facilitate the navigation through, and analysis of change sets. The tool is made available as a web-based application, as well as a standalone command-line tool. Both of these output an XML change set file and a transformation into HTML, which allows users to browse through and focus on those changes of utmost interest using a web-browser.
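As a rough illustration of why a diff over statements is more informative than a diff over lines, the sketch below compares two versions as sets of statements. It is far simpler than the structural and semantic techniques the tool above incorporates: it assumes N-Triples input with one statement per line, ignores blank-node relabeling, and cannot tell whether an added statement changes what the ontology entails. The names `statements` and `triple_diff` and the `example.org` IRIs are hypothetical.

```python
def statements(text):
    """Parse N-Triples text into a set of statement lines, skipping blanks and comments."""
    stripped = (line.strip() for line in text.splitlines())
    return {s for s in stripped if s and not s.startswith("#")}


def triple_diff(old_text, new_text):
    """Return (added, removed) statements, each as a sorted list."""
    old, new = statements(old_text), statements(new_text)
    return sorted(new - old), sorted(old - new)


SUBCLASS = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"
car = f"<http://example.org/Car> {SUBCLASS} <http://example.org/Vehicle> ."
bike = f"<http://example.org/Bike> {SUBCLASS} <http://example.org/Vehicle> ."
bus = f"<http://example.org/Bus> {SUBCLASS} <http://example.org/Vehicle> ."

old_version = f"{car}\n{bike}\n"
# The new version reorders the existing statements and adds one.
new_version = f"{bike}\n{bus}\n{car}\n"

added, removed = triple_diff(old_version, new_version)
print(added)    # only the statement about Bus
print(removed)  # empty: the reordering is not reported as a change
```

A line-based diff of the same two versions would report the moved lines as changes, whereas the set-based comparison reports only the one new statement.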
Article
Building ontologies in a collaborative and increasingly community-driven fashion has become a central paradigm of modern ontology engineering. This understanding of ontologies and ontology engineering processes is the result of intensive theoretical and empirical research within the Semantic Web community, supported by technology developments such as Web 2.0. Over 6 years after the publication of the first methodology for collaborative ontology engineering, it is generally acknowledged that, in order to be useful, but also economically feasible, ontologies should be developed and maintained in a community-driven manner, with the help of fully-fledged environments providing dedicated support for collaboration and user participation. Wikis, and similar communication and collaboration platforms enabling ontology stakeholders to exchange ideas and discuss modeling decisions are probably the most important technological components of such environments. In addition, process-driven methodologies assist the ontology engineering team throughout the ontology life cycle, and provide empirically grounded best practices and guidelines for optimizing ontology development results in real-world projects. The goal of this article is to analyze the state of the art in the field of collaborative ontology engineering. We will survey several of the most outstanding methodologies, methods and techniques that have emerged in the last years, and present the most popular development environments, which can be utilized to carry out, or facilitate specific activities within the methodologies. A discussion of the open issues identified concludes the survey and provides a roadmap for future research and development in this lively and promising field.
Conference Paper
With the wider use of ontologies in the Semantic Web and as part of production systems, multiple scenarios for ontology maintenance and evolution are emerging. For example, successive ontology versions can be posted on the (Semantic) Web, with users discovering the new versions serendipitously; ontology development in a collaborative environment can be synchronous or asynchronous; managers of projects may exercise quality control, examining changes from previous baseline versions and accepting or rejecting them before a new baseline is published, and so on. In this paper, we present different scenarios for ontology maintenance and evolution that we have encountered in our own projects and in those of our collaborators. We define several features that categorize these scenarios. For each scenario, we discuss the high-level tasks that an editing environment must support. We then present a unified comprehensive set of tools to support different scenarios in a single framework, allowing users to switch between different modes easily.
Conference Paper
Recently, the benefits of modular representations of ontologies have been recognized by the semantic web community. Existing methods for splitting up models into modules either optimize for completeness of local or for the efficiency of distributed reasoning. In our work on semantics-based P2P systems, we are also concerned with the additional criteria of robustness of reasoning in cases where peers are unavailable and with ease of maintenance. We define a number of structural criteria for modularized ontologies and argue why these criteria are suitable for estimating efficiency, robustness and maintainability. We apply the criteria to a number of modularization approaches and discuss the trade-offs made. Based on the discussion we propose a general quality measure for modular representations in the context of our use case.
Conference Paper
Problems with large monolithical ontologies in terms of reusability, scalability and maintenance have led to an increasing interest in modularization techniques for ontologies. Currently, existing work suffers from the fact that the notion of modularization is not as well understood in the context of ontologies as it is in software engineering. In this paper, we experiment on applying state-of-the-art tools for ontology modularization in the context of a concrete application: the automatic selection of knowledge components to be used for Web page annotation and semantic browsing. We conclude that, in a broader context, an evaluation framework is required to guide the choice of a modularization tool, in accordance with the requirements of the considered application.
Conference Paper
We present ContentCVS, a system that implements a novel approach to facilitate the collaborative development of ontologies. Our approach adapts Concurrent Versioning, a successful paradigm in collaborative software development, to allow several developers to make changes concurrently to an ontology. Conflict detection and resolution are based on novel techniques that take into account the structure and semantics of the ontology versions to be reconciled by using precisely-defined notions of structural and semantic differences between ontologies and by extending existing ontology debugging and repair techniques.
Article
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding (i) linguistic information for data and vocabularies in different languages, (ii) mappings between data with labels in different languages, and (iii) services to dynamically access and traverse Linked Data across different languages. In this article, we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
Conference Paper
Branching plays a major role in the development process of large software. Branches provide isolation so that multiple pieces of the software system can be modified in parallel without affecting each other during times of instability. However, branching has its own issues. The need to move code across branches introduces additional overhead and branch use can lead to integration failures due to conflicts or unseen dependencies. Although branches are used extensively in commercial and open source development projects, the effects that different branch strategies have on software quality are not yet well understood. In this paper, we present the first empirical study that evaluates and quantifies the relationship between software quality and various aspects of the branch structure used in a software project. We examine Windows Vista and Windows 7 and compare components that have different branch characteristics to quantify differences in quality. We also examine the effectiveness of two branching strategies – branching according to the software architecture versus branching according to organizational structure. We find that, indeed, branching does have an effect on software quality and that misalignment of branching structure and organizational structure is associated with higher post-release failure rates.
Conference Paper
Scientists and researchers often use ontologies to describe their data, to share and integrate this data from heterogeneous sources. Ontologies are formal computer models that describe the main concepts and their relationships in a particular domain. Ontologies are usually authored by a community of users with different roles and levels of expertise. To support collaboration among distributed teams and to provision for distinct authoring requirements of each of the user roles and of individual users, we designed a configurable Web-based ontology editor, WebProtege. WebProtege extends Protege, a widely popular ontology editor with more than 150,000 registered users. The user interface layout and configuration for WebProtege is model-based and declarative: we represent it in a knowledge base, with an ontology defining its structure, and linking the interface configuration to the users, their roles, and access policies. We will discuss how the knowledge base driven configuration of the user interface supports the reuse and modularization of layout configurations. Such configuration is also highly flexible and extensible, and is easier to manage than many traditional approaches.
Conference Paper
Collaborative ontology engineering enables the creation of large complex ontologies. However, few projects successfully perform such multi-user ontology modeling. This paper gives an overview of ten different ontology engineering projects' infrastructure, architecture and workflows. It especially focuses on issues regarding collaborative ontology modeling. The survey leads on to a discussion of the relative advantages and disadvantages of asynchronous and synchronous modalities of multi-user editing. This discussion highlights issues, trends and problems in the field of multi-user ontology development.
Article
This paper describes our methodological and technological approach for collaborative ontology development in inter-organizational settings. It is based on the formalization of the collaborative ontology development process by means of an explicit editorial workflow, which coordinates proposals for changes among ontology editors in a flexible manner. This approach is supported by new models, methods and strategies for ontology change management in distributed environments: we propose a new form of ontology change representation, organized in layers so as to provide as much independence as possible from the underlying ontology languages, together with methods and strategies for their manipulation, version management, capture, storage and maintenance, some of which are based on existing proposals in the state of the art. Moreover, we propose a set of change propagation strategies that allow keeping distributed copies of the same ontology synchronized. Finally, we illustrate and evaluate our approach with a test case in the fishery domain from the United Nations Food and Agriculture Organisation (FAO). The preliminary results obtained from our evaluation suggest positive indication on the practical value and usability of the work here presented.
N. F. Noy and T. Tudorache, Collaborative ontology development on the (semantic) web, in Symbiotic Relationships between Semantic Web and Knowledge Engineering, Papers from the 2008 AAAI Spring Symposium, Technical report SS-08-07, March 26-28, 2008, Stanford, California, USA, 2008, pp. 63-68.
M. C. Suárez-Figueroa, A. Gómez-Pérez and M. Fernández-López, The NeOn methodology for ontology engineering, in Ontology Engineering in a Networked World, 2012, pp. 9-34.
S. Phillips, J. Sillito and R. Walker, Branching and merging: An investigation into current version control practices, in Proc. 4th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2011), May 21, 2011, Waikiki, Honolulu, HI, USA, 2011, pp. 9-15.
S. B. Abbès, A. Scheuermann, T. Meilender and M. d'Aquin, Characterizing modular ontologies, in Proceedings of the 6th International Workshop on Modular Ontologies, July 24, 2012, Graz, Austria, 2012, pp. 13-25.
P. Bedi and S. Marwaha, Versioning OWL ontologies using temporal tags, International Journal of Computer, Control, Quantum and Information Engineering, 2007.
D. I. Hillmann, G. Dunsire and J. Phipps, Versioning vocabularies in a linked data world, IFLA, 2014.
I. Zaikin and A. Tuzovsky, Owl2vcs: Tools for distributed ontology development, in Proceedings of the 10th International Workshop on OWL: Experiences and Directions (OWLED 2013) co-located with 10th Extended Semantic Web Conference (ESWC 2013), May 26-27, 2013, Montpellier, France, 2013.