Article

How Ontologies are Made: Studying the Hidden Social Dynamics Behind Collaborative Ontology Engineering Projects


Abstract

Traditionally, evaluation methods in the field of semantic technologies have focused on the end result of ontology engineering efforts, mainly on evaluating ontologies and their corresponding qualities and characteristics. This focus has led to the development of a whole arsenal of ontology-evaluation techniques that investigate the quality of ontologies as a product. In this paper, we aim to shed light on the process of ontology construction by introducing and applying a set of measures to analyze hidden social dynamics. We argue that especially for collaboratively constructed ontologies, understanding the social processes behind their construction is critical not only for understanding but consequently also for evaluating them. With the work presented in this paper, we aim to expose the texture of collaborative ontology engineering processes that is otherwise left invisible. Using historical change-log data, we unveil qualitative differences and commonalities between different collaborative ontology engineering projects. Explaining and understanding these differences will help us to better comprehend the role and importance of social factors in collaborative ontology engineering projects. We hope that our analysis will spur a new line of evaluation techniques that view ontologies not as the static result of deliberations among domain experts, but as a dynamic, collaborative and iterative process that needs to be understood, evaluated and managed in itself. We believe that advances in this direction would help our community to expand the existing arsenal of ontology evaluation techniques towards more holistic approaches.


... In particular, one of the gaps in the literature regards the processes leading to the creation of knowledge. In their work about collaborative ontology development efforts, Strohmaier et al. (2013) distinguish the approaches evaluating the quality of ontologies as a product from those investigating the processes leading to their creation. An analysis of the latter, especially if combined with an evaluation of an ontology as product, is beneficial to the understanding of its quality and helps single out the areas that are likely to be contentious or problematic. ...
... In the remainder of this chapter, we provide an overview of the main areas of study concerning open knowledge collaboration and online communities. Following the categorisations proposed by Benkler et al. (2015) and Malinen (2015) and the aspects of online collaboration highlighted by Strohmaier et al. (2013), we identified the following areas: ...
... We include in the social and organisational characteristics of online communities all aspects that concern participation, governance, collaboration, and distribution of work within a system (Strohmaier et al., 2013). An extensive body of literature has been dedicated to the different facets of the subject. ...
Thesis
Wikidata is a collaborative knowledge graph by the Wikimedia Foundation which has undergone an impressive growth since its launch in 2012: it has gathered a user pool of almost two hundred thousand editors, who have contributed data about more than 50 million entities. In the fashion of other Wikimedia projects, it is completely bottom-up, i.e. everything within the knowledge graph is created and maintained by its users. These features have drawn the attention of a growing number of researchers and practitioners from several fields. Nevertheless, research about collaboration processes in Wikidata is still scarce. This thesis addresses this gap by analysing the socio-technical fabric of Wikidata and how that affects the quality of its data. In particular, it makes a threefold contribution: (i.) it evaluates two previously uncovered aspects of the quality of Wikidata, i.e. provenance and its ontology; (ii.) it is the first to investigate the effects of algorithmic contributions, i.e. bots, on Wikidata quality; (iii.) it looks at emerging editor activity patterns in Wikidata and their effects on outcome quality. Our findings show that bots are important for the quality of the knowledge graph, although their work needs to be continuously controlled since they are potentially able to introduce different sorts of errors at a large scale. Regarding human editors, a more diverse user pool—in terms of tenure and focus of activity—seems to be associated with higher quality. Finally, two roles emerge from the editing patterns of Wikidata users, leaders and contributors. Leaders perform more edits and have a more prominent role within the community. They are also more involved in the maintenance of the Wikidata schema, their activity being positively related to the growth of its taxonomy. This thesis contributes to the understanding of collaborative processes and data quality in Wikidata. Further studies should be carried out in order to confirm whether and to what extent its insights are generalisable to other collaborative knowledge engineering platforms.
... These studies have used the data provided by logs of user activity in collaborative ontology development tools. Strohmaier et al. [12] conducted an empirical investigation using user activity logs to measure the impact of collaboration on ontology-engineering projects. The authors developed several new metrics to quantify different aspects of the hidden social dynamics that take place in these collaborative ontology-engineering projects from the biomedical domain. ...
... In the PolygOnto visualization (Figure 9), studies that are annotated with 2 or more EFO classes are represented using polygons, whereas studies that are annotated with a single EFO class are summarized as the adjacent (orange) circles. We observe that no study has been annotated with classes in the lower levels (≈1,000 classes from levels 12-16) or the upper levels (which may originate from the Basic Formal Ontology) of the EFO hierarchy. This observation also reflects the results in Section 5.3, where we found that the lower levels of the ontological hierarchy in some of the highly-accessed ontologies are rarely (if at all) explored by users. ...
Article
Full-text available
Biomedical ontologies are large: Several ontologies in the BioPortal repository contain thousands or even hundreds of thousands of entities. The development and maintenance of such large ontologies is difficult. To support ontology authors and repository developers in their work, it is crucial to improve our understanding of how these ontologies are explored, queried, reused, and used in downstream applications by biomedical researchers. We present an exploratory empirical analysis of user activities in the BioPortal ontology repository by analyzing BioPortal interaction logs across different access modes over several years. We investigate how users of BioPortal query and search for ontologies and their classes, how they explore the ontologies, and how they reuse classes from different ontologies. Additionally, through three real-world scenarios, we not only analyze the usage of ontologies for annotation tasks but also compare it to the browsing and querying behaviors of BioPortal users. For our investigation, we use several different visualization techniques. To inspect large amounts of interaction, reuse, and real-world usage data at a glance, we make use of and extend PolygOnto, a visualization method that has been successfully used to analyze reuse of ontologies in previous work. Our results show that exploration, query, reuse, and actual usage behaviors rarely align, suggesting that different users tend to explore, query and use different parts of an ontology. Finally, we highlight and discuss differences and commonalities among users of BioPortal.
... For the present study, I collected the material properties of 119 currently available and commonly used thermal insulation materials (Figure 1), taken from the technical data sheets published on the manufacturers' official websites [17,18,19,20,21,22,23] ...

... The knowledge-representation software, built on a Java environment and equipped with an easy-to-use graphical interface, is based on the OWL (Web Ontology Language) standard. The application's main attributes and more widely used ontologies are discussed in [19][20][21]. The knowledge-representation and knowledge-sharing capabilities of various social networks have also come to the fore in building-science research [22,23]. ...
Conference Paper
Full-text available
It is known from the literature that in nano-ceramic thermal insulation coatings the process of heat propagation does not take place as it does in conventional materials, thanks to the heat-reflecting capability of the inner surfaces of the nano-sized ceramic spheres they contain. Manufacturers' product literature and the scientific literature report various technical parameters for these materials, among which contradictory data are found primarily for the thermal characteristics. I carried out experiments in the Building Material Testing and Building Physics Laboratory of the Department of Architecture and Building Construction at the Faculty of Architecture, Civil and Transport Engineering of Széchenyi István University in Győr to verify these parameters, with particular regard to bulk density, thermal insulation capability and water absorption. The results of the bulk-density tests (average bulk density 533.01 kg/m3 in wet condition, 370.28 kg/m3 in air-dry condition) fell within the specified value ranges. Testing the thermal insulation capability, however, ran into several difficulties (e.g. the measuring range of the instrument), so I applied two methods to demonstrate the good thermal insulation and heat-reflecting capability. Using the first method, based on testing samples with nano-ceramic insulation coating applied to the surface of various conventional insulation materials, no evidence of this property of the material could be found. Based on MSZ EN 12667:2001, however, the thermal conductivity of the material was measured directly; this value (0.0690 W/mK) differed entirely from the data published by the manufacturers and the literature. The good thermal insulation performance of the material is therefore not caused by a low thermal conductivity but by some other phenomenon (the surface heat-transfer coefficient). Investigations into this are already under way, but further experiments are needed. Examining the relationship between thermal insulation capability and moisture content, it was established that there is a moisture-content limit of 12 m/m%: below this value the thermal insulation capability of the material does not change, while above it the moisture content and the thermal insulation capability increase in direct proportion. Based on long-term, full-immersion water-absorption tests according to the MSZ EN 12087:2013 standard, the 28-day water absorption of the material is 28.81 m/m%. Considering the course of water absorption over time, the material's water uptake becomes completely uniform after 24 hours. After 28 days it does not saturate but keeps increasing at the same rate, and even after 105 days it does not reach the constant-weight saturated state.
... A knowledge domain is represented formally by a conceptualization: the objects, concepts and other entities that are presumed to exist in some area of interest and the relationships between them. In the Semantic Web, ontologies are a key component for knowledge modeling, enabling interoperability between different systems and reuse of existing knowledge in new systems [18]. On the other hand, in the area of Artificial Intelligence, a repository of information is used with a structure that allows the data to be interpreted in order to make inferences within a learning process, allowing knowledge to be acquired automatically. ...
... The proposed methodology integrates the guidelines provided by Noy and McGuinness [18], who state that one must first determine the scope and extent of the ontology, considering whether a similar ontology exists that can be reused. Subsequently, the important terms are listed in order to define the classes, their properties and their hierarchy. ...
Article
Full-text available
The process of identifying the attributes and relationships considered in an ontology is a complex task because there are many factors involved in the deterioration of environmental quality, the diversity of sources and data dispersion. This work presents an ontology that integrates the data required by an Environmental Quality Synoptic System (EQSS), data which to date are scattered across different Internet sites and concentrated by different agencies, for example INEGI, CONABIO, SEMARNAT and CNA, among others. The methodology consists of collecting environmental information in Mexico through the application of computational techniques, resulting in an ontology with environmental knowledge that will be processed by the EQSS system. Among the main advantages is that the selection and structuring of the information allow the automated generation of results in an environmental statement. The proposed ontology is based on the knowledge of the EQSS system, which follows the architecture of expert systems; through it, important information for decision-making regarding environmental quality and interaction with a Geographic Information System (GIS) is obtained.
... Research often focuses on projects in which people with a dedicated education and organisational involvement are allowed to make changes to knowledge bases. Although these knowledge bases are large (e.g. the ICD-11 ontology consists of over 33,000 classes), the number of people involved in the development is small (ranging from 5 to 76 contributors [27]). In addition, the process that leads to the creation of these knowledge bases is admittedly collaborative, but not everybody can participate. ...
... The best way to understand the semantic contribution of edits is to match them with an RDF representation from Wikidata (as created by Erxleben et al. [8]). Strohmaier et al. [27] show that existing semantic relations in a structured knowledge base influence the way in which this knowledge base is edited. Context information such as this can also be used to reconstruct contributor sessions in order to determine semantic editing paths. ...
Conference Paper
Full-text available
Wikidata promises to reduce factual inconsistencies across all Wikipedia language versions. It will enable dynamic data reuse and complex fact queries within the world’s largest knowledge database. Studies of the existing participation patterns that emerge in Wikidata are only just beginning. What delineates most of the contributions in the system has not yet been investigated. Is it an inheritance from the Wikipedia peer-production system or the proximity of tasks in Wikidata that have been studied in collaborative ontology engineering? As a first step to answering this question, we performed a cluster analysis of participants’ content editing activities. This allowed us to blend our results with typical roles found in peer-production and collaborative ontology engineering projects. Our results suggest very specialised contributions from a majority of users. Only a minority, which is the most active group, participate all over the project. These users are particularly responsible for developing the conceptual knowledge of Wikidata. We show the alignment of existing algorithmic participation patterns with these human patterns of participation. In summary, our results suggest that Wikidata rather supports peer-production activities caused by its current focus on data collection. We hope that our study informs future analyses and developments and, as a result, allows us to build better tools to support contributors in peer-production-based ontology engineering.
... Ontology change logs provide an extremely rich source of information. We and other investigators have used change data from ontologies to measure the level of community activities in biomedical ontologies [14], to migrate data from an old version of an ontology to a new one [12], and to analyze user roles in the process of collaboration [6,7,23,26]. ...
... For example, we have demonstrated that we can use the change data to assess the level of stabilization in ontology content [26], to find implicit user roles [7], and to describe the collaboration qualitatively [23]. For example, we found that changes to ICD-11 tend to propagate along the class hierarchy: A user who alters a property value for a class is significantly more likely to make a change to a property value for a subclass of that class than to make an edit anywhere else in the ontology [19]. ...
Article
Full-text available
The development of real-world ontologies is a complex undertaking, commonly involving a group of domain experts with different expertise who work together in a collaborative setting. These ontologies are usually large scale and have complex structures. To assist in the authoring process, ontology tools are key to making the editing process as streamlined as possible. Being able to predict confidently what users are likely to do next as they edit an ontology will enable us to focus and structure the user interface accordingly and to facilitate more efficient interaction and information discovery. In this paper, we use data mining, specifically association rule mining, to investigate whether we are able to predict the next editing operation that a user will make based on the change history. We simulated and evaluated continuous prediction across time using a sliding-window model. We used association rule mining to generate patterns from the ontology change logs in the training window and tested these patterns on logs in the adjacent testing window. We also evaluated the impact of different training and testing window sizes on the prediction accuracies. Finally, we evaluated our prediction accuracies across different user groups and different ontologies. Our results indicate that we can indeed predict the next editing operation a user is likely to make. We will use the discovered editing patterns to develop a recommendation module for our editing tools, and to design user interface components that better fit user editing behaviors.
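To make the mining step concrete, the sketch below is a minimal, self-contained Python illustration of association-rule-style prediction over change logs. The session data, operation names, and the simplified co-occurrence counting (standing in for a full Apriori implementation and for the paper's sliding windows) are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical change-log: one sequence of editing operations per session.
sessions = [
    ["create_class", "add_label", "add_parent", "add_property"],
    ["create_class", "add_label", "add_property"],
    ["add_parent", "add_property", "add_label"],
    ["create_class", "add_label", "add_parent"],
]

def mine_rules(sessions, min_support=2, min_confidence=0.6):
    """Mine one-to-one rules 'a -> b' from operations co-occurring in the
    same session: a toy stand-in for a full Apriori implementation."""
    pair_counts, item_counts = Counter(), Counter()
    for ops in sessions:
        seen = set(ops)
        for a in seen:
            item_counts[a] += 1
            for b in seen - {a}:
                pair_counts[(a, b)] += 1
    rules = []
    for (a, b), n in pair_counts.items():
        confidence = n / item_counts[a]
        if n >= min_support and confidence >= min_confidence:
            rules.append((a, b, n, confidence))
    return sorted(rules, key=lambda r: -r[3])  # best rules first

def predict_next(current_op, rules):
    """Predict the operation most strongly associated with current_op."""
    for a, b, _, _ in rules:
        if a == current_op:
            return b
    return None

rules = mine_rules(sessions)
print(predict_next("create_class", rules))  # -> "add_label" on this toy data
```

In the paper's setup, rules would be mined only from logs inside the current training window and evaluated on the adjacent testing window, with the windows sliding forward over time.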
... Pöschko et al. [16], and Walk et al. [17] have created PragmatiX, a tool to browse an ontology and aspects of its history visually, which provides quantitative insights into the creation process, and applied it to the ICD-11 project. Strohmaier et al. [18] investigated the hidden social dynamics that take place in collaborative ontology-engineering projects from the biomedical domain and provide new metrics to quantify various aspects of the collaborative engineering processes. ...
... Pesquita and Couto [20] investigated if the location and specific structural features can be used to determine if and where the next change is going to occur in the Gene Ontology. Strohmaier et al. [18] investigated the hidden social dynamics that take place in collaborative ontology-engineering projects from the biomedical domain and provide new metrics to quantify various aspects of the collaborative engineering processes. Wang et al. [21] have used association-rule mining to analyze user editing patterns in collaborative ontology-engineering projects. ...
Article
Full-text available
With the growing popularity of large-scale biomedical collaborative ontology-engineering projects, such as the creation of the 11th revision of the International Classification of Diseases, new methods and insights are needed to help project and community managers to cope with the constantly growing complexity of such projects. In this paper we present a novel application of Markov chains on the change-logs of collaborative ontology-engineering projects to extract and analyze sequential patterns. This method also allows us to investigate memory and structure in human activity patterns when collaboratively creating an ontology by leveraging Markov chain models of varying orders. We describe all necessary steps for applying the methodology to collaborative ontology-engineering projects and provide first results for the International Classification of Diseases in its 11th revision. Furthermore, we show that the collected sequential patterns provide actionable information for community and project managers to monitor, coordinate and dynamically adapt to the natural development processes that occur when collaboratively engineering an ontology. We hope that the adoption of the presented methodology will spur a new line of ontology-development tools and evaluation techniques, which concentrate on the interactive nature of the collaborative ontology-engineering process.
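As a rough illustration of the Markov chain methodology (not the authors' code), the following sketch estimates a first-order model from hypothetical change-log sequences and ranks likely next actions; the action names and sequences are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical action sequences reconstructed from an ontology change log.
sequences = [
    ["create_class", "edit_title", "edit_definition", "create_class"],
    ["edit_title", "edit_definition", "edit_synonym"],
    ["create_class", "edit_title", "edit_synonym", "edit_definition"],
]

def fit_first_order(sequences):
    """Maximum-likelihood estimate of P(next action | current action)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

model = fit_first_order(sequences)
# Rank the most probable actions following "edit_title".
ranked = sorted(model["edit_title"].items(), key=lambda kv: -kv[1])
print(ranked)  # [('edit_definition', 0.666...), ('edit_synonym', 0.333...)]
```

Higher-order variants, as studied in the paper, condition on the last k actions; in this sketch that amounts to keying the counts on tuples of the k previous actions instead of a single action.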
... One of the first implementations of this type of approach was PromptDiff [16], where changes are computed by inputting the two versions of the ontology and applying a set of rules. The outcomes are presented as instances of the Changes and Annotation Ontology (ChAO) [20], but this ontology was designed to facilitate collaboration between humans and to generate manual annotations of users' actions (e.g. whether a user accepted a change, with a way of commenting on why); much of the information was expressed in the comments, and the types of changes were limited to just a few classes. ...
Conference Paper
Full-text available
Ontologies have been widely adopted to represent domain knowledge. The dynamic nature of knowledge requires frequent changes in the ontologies to keep them up-to-date. Understanding and managing these changes and their impact on other artefacts has become important for the semantic web community, due to the growing volume of data annotated with ontologies and the limited documentation describing their changes. In this paper, we present a method to automatically detect and classify the changes between different versions of ontologies. We also built an ontology of changes (DynDiffOnto) that we use to classify the changes and provide a context that makes them more comprehensible for machines and humans. We evaluate the algorithm with different ontologies from the biomedical domain (i.e. ICD9-CM, MeSH, NCIt, SNOMED-CT, GO, IOBC, and CIDO) and compare the results with COnto-Diff. We observed that for small ontologies COnto-Diff computes the diff faster, but the opposite is observed for larger ontologies. The higher granularity of DynDiffOnto requires more rules to compute the diff, which partially explains the lower performance on small ontologies, but DynDiff provides richer documentation for end users.
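To illustrate the general shape of such a diff computation, here is a deliberately simplified sketch that classifies changes between two ontology versions represented as class-to-parent maps. DynDiff itself operates on full axiom-level representations and a much richer change taxonomy (DynDiffOnto), so treat this only as a conceptual sketch with invented class names.

```python
def diff_versions(old, new):
    """Classify changes between two ontology versions given as
    {class_name: parent_class} dictionaries, a deliberately
    simplified stand-in for axiom-level diffing."""
    changes = []
    for cls in new.keys() - old.keys():
        changes.append(("AddClass", cls))
    for cls in old.keys() - new.keys():
        changes.append(("DeleteClass", cls))
    for cls in old.keys() & new.keys():
        if old[cls] != new[cls]:
            changes.append(("MoveClass", cls, old[cls], new[cls]))
    return changes

v1 = {"Infection": "Disease", "Flu": "Infection"}
v2 = {"Infection": "Disease", "Flu": "ViralDisease", "ViralDisease": "Infection"}
print(diff_versions(v1, v2))
# e.g. [('AddClass', 'ViralDisease'), ('MoveClass', 'Flu', 'Infection', 'ViralDisease')]
```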
... As more ontology projects have been set up, researchers have begun to investigate collaborative ontology engineering empirically [23]. Initially, most work involved small user or case studies that aimed to validate or collect feedback on a specific methodology or tool [18]. ...
Chapter
Full-text available
We present a systematic analysis of participation and interactions within the community behind schema.org, one of the largest and most relevant ontology engineering projects in recent times. Previous work conducted in this space has focused on ontology collaboration tools, and the roles that different contributors play within these projects. This paper takes a broader view and looks at the entire life cycle of the collaborative process to gain insights into how new functionality is proposed and accepted, and how contributors engage with one another based on real-world data. The analysis resulted in several findings. First, the collaborative ontology engineering roles identified in previous studies with a much stronger link to ontology editors apply to community interaction contexts as well. At the same time, the participation inequality is less pronounced than the 90-9-1 rule for Internet communities. In addition, schema.org seems to facilitate a form of collaboration that is friendly towards newcomers, whose concerns receive as much attention from the community as those of their longer-serving peers.
... While identifying PHI for removal or anonymization remains an open challenge, simply redacting texts overlooks one of the more fundamental aspects of recent biomedical informatics, which has incorporated a focus on ontology-driven development (Mortensen et al., 2012; Ye et al., 2009; Tao et al., 2013; Sari et al., 2013; Omran et al., 2009; Lumsden et al., 2011; Pathak et al., 2009). In a domain like healthcare, where information is dense, diverse, and specialized, an ontology allows knowledge to be represented in a usable manner, because it describes a framework for clearly defining known terms and their relationships (Hakimpour and Geppert, 2005; Lee et al., 2006; Pieterse and Kourie, 2014; Strohmaier et al., 2013; Kapoor and Sharma, 2010). Once the data has been formally described via an ontology, new applications become apparent. ...
... To learn more about the impact and quality of collaboration in collaborative ontology-engineering projects, Strohmaier et al. [26] investigated the hidden social dynamics that take place in such projects from the biomedical domain and provided new metrics to quantify various aspects of the collaborative engineering processes. User roles have also been studied from many different angles. ...
Article
Full-text available
Ontologies in the biomedical domain are numerous, highly specialized and very expensive to develop. Thus, a crucial prerequisite for ontology adoption and reuse is effective support for exploring and finding existing ontologies. Towards that goal, the National Center for Biomedical Ontology (NCBO) has developed BioPortal, an online repository containing more than 500 biomedical ontologies. In 2016, BioPortal represents one of the largest portals for exploration of semantic biomedical vocabularies and terminologies, which is used by many researchers and practitioners. While usage of this portal is high, we know very little about how exactly users search and explore ontologies and what kind of usage patterns or user groups exist in the first place. Deeper insights into user behavior on such portals can provide valuable information to devise strategies for a better support of users in exploring and finding existing ontologies, and thereby enable better ontology reuse. To that end, we study and group users according to their browsing behavior on BioPortal and use data mining techniques to characterize and compare exploration strategies across ontologies. In particular, we were able to identify seven distinct browsing types, all relying on different functionality provided by BioPortal. For example, Search Explorers extensively use the search functionality while Ontology Tree Explorers mainly rely on the class hierarchy for exploring ontologies. Further, we show that specific characteristics of ontologies influence the way users explore and interact with the website. Our results may guide the development of more user-oriented systems for ontology exploration on the Web.
... Over the last decade, special attention has been focused on ontologies and their use in applications in the fields of education, knowledge management and data integration. Ontology engineering is the set of activities directed toward the development of ontologies (Strohmaier et al. 2013). On that basis, ontology engineering is a key aspect of improving existing tutoring systems. ...
Chapter
Regardless of the methodology used, the central problem in creating web-based educational systems and reaping the benefits of their wide use is that current approaches are rather inflexible and inefficient. The design of such systems must allow the reuse and sharing of content, knowledge, and functional components of those systems. According to the techniques and methodologies presented in previous chapters, it is possible to develop a modern personalized educational system and take full advantage of what Semantic Web technologies offer. In this chapter, a general tutoring model is presented that allows building personalized courses for various domains. The chapter presents the architecture of a general tutoring system whose components are modelled and implemented using Semantic Web technologies. The presented tutoring-system framework offers options to build, organize and update specific learning resources (educational materials, learner profiles, learning paths through materials, and so on).
... Walk et al. (2014) observed the edit sequences in such processes. Strohmaier et al. (2013) investigated the way ontologies are collaboratively created, analyzing dynamics, social aspects, lexis and behaviour. They found weak forms of collaboration among users and identified that the users with the highest degree of contribution were the most central users. ...
Article
Full-text available
Our goal with this research manifesto is to define a roadmap to guide the evolution of the new research field that is emerging at the intersection between crowdsourcing and the Semantic Web. We analyze the confluence of these two disciplines by exploring their relationship. First, we focus on how the application of crowdsourcing techniques can enhance the machine-driven execution of Semantic Web tasks. Second, we look at the ways in which machine-processable semantics can benefit the design and management of crowdsourcing projects. As a result, we are able to describe a list of successful or promising scenarios for both perspectives, identify scientific and technological challenges, and compile a set of recommendations to realize these scenarios effectively. This research manifesto is an outcome of the Dagstuhl Seminar 14282: Crowdsourcing and the Semantic Web.
... Strohmaier et al. [40] investigated the hidden social dynamics that take place in collaborative ontology-engineering projects from the biomedical domain and provided new metrics to quantify various aspects of the collaborative engineering processes. Falconer et al. [41] investigated the change-logs of collaborative ontology-engineering projects, showing that contributors exhibit specific roles, which can be used to group and classify these users, when contributing to the ontology. ...
Article
With the growing popularity of large-scale collaborative ontology-engineering projects, such as the creation of the 11th revision of the International Classification of Diseases, we need new methods and insights to help project and community managers to cope with the constantly growing complexity of such projects. In this paper, we present a novel application of Markov chains to model sequential usage patterns that can be found in the change-logs of collaborative ontology-engineering projects. We provide a detailed presentation of the analysis process, describing all the required steps that are necessary to apply and determine the best fitting Markov chain model. Among other things, the model and results allow us to identify structural properties and regularities as well as predict future actions based on usage sequences. We are specifically interested in determining the appropriate Markov chain orders, which postulate on how many previous actions future ones depend. To demonstrate the practical usefulness of the extracted Markov chains we conduct sequential pattern analyses on a large-scale collaborative ontology-engineering dataset, the International Classification of Diseases in its 11th revision. To further expand on the usefulness of the presented analysis, we show that the collected sequential patterns provide potentially actionable information for user-interface designers, ontology-engineering tool developers and project managers to monitor, coordinate and dynamically adapt to the natural development processes that occur when collaboratively engineering an ontology. We hope that the presented work will spur a new line of ontology-development tools, evaluation techniques and new insights, further taking the interactive nature of the collaborative ontology-engineering process into consideration.
... Protégé at the beginning was viewed merely as a means to an end. More recent grants, however, have supported our original research to study collaborative ontology development (Tudorache et al., 2011; Strohmaier et al., 2013), methods for ontology alignment (Ghazvinian et al., 2009), ontology evaluation through crowdsourcing (Mortensen et al., 2014), ontology visualization (Storey et al., 2001), and methods for the very early stages of ontology conceptualization (Zhang et al., 2015). These academic pursuits are what make our laboratory an exciting place for students and post-docs, and they are what keep the research funding flowing. ...
... Strohmaier et al. [14] conducted an empirical analysis to investigate the hidden social dynamics that take place when editors develop an ontology, and provided new metrics to quantify various aspects of the engineering processes. Falconer et al. [5] did a change-log analysis of different ontology-engineering projects, showing that contributors exhibit specific roles, which can be used to group and classify these users. ...
Conference Paper
Full-text available
Ontologies are complex intellectual artifacts and creating them requires significant expertise and effort. While existing ontology-editing tools and methodologies propose ways of building ontologies in a normative way, empirical investigations of how experts actually construct ontologies "in the wild" are rare. Yet, understanding actual user behavior can play an important role in the design of effective tool support. Although previous empirical investigations have produced a series of interesting insights, they were exploratory in nature and aimed at gauging the problem space only. In this work, we aim to advance the state of knowledge in this domain by systematically defining and comparing a set of hypotheses about how users edit ontologies. Towards that end, we study the user editing trails of four real-world ontology-engineering projects. Using a coherent research framework, called HypTrails, we derive formal definitions of hypotheses from the literature, and systematically compare them with each other. Our findings suggest that the hierarchical structure of an ontology exercises the strongest influence on user editing behavior, followed by entity similarity and the semantic distance of classes in the ontology. Moreover, these findings are strikingly consistent across all ontology-engineering projects in our study, with only minor exceptions for one of the smaller datasets. We believe that our results are important for ontology tool builders and for project managers, who can potentially leverage this information to create user interfaces and processes that better support the observed editing patterns of users.
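HypTrails, the framework referenced above, compares hypotheses about sequential behavior by expressing each as a Dirichlet prior over the rows of a Markov transition matrix and ranking the hypotheses by Bayesian evidence. The sketch below is our own simplified illustration with made-up counts and a simplified prior elicitation, not the study's code.

```python
import numpy as np
from scipy.special import gammaln

def as_prior(belief, kappa=5.0):
    """Turn a hypothesis (relative belief in each transition) into a
    Dirichlet prior with a common concentration, so all hypotheses are
    compared at equal prior strength (simplified HypTrails elicitation)."""
    belief = np.asarray(belief, dtype=float)
    return belief / belief.sum(axis=1, keepdims=True) * kappa + 1.0

def log_evidence(counts, prior):
    """Dirichlet-multinomial log evidence of observed transition counts.
    The count-permutation constant is omitted: it is identical for all
    hypotheses and cancels when comparing them."""
    lml = 0.0
    for n, a in zip(counts, prior):  # one row per source state
        lml += gammaln(a.sum()) - gammaln((a + n).sum())
        lml += (gammaln(a + n) - gammaln(a)).sum()
    return lml

# Made-up transition counts between three regions of an ontology.
counts = np.array([[8, 1, 1], [2, 6, 2], [1, 1, 8]])

hypotheses = {
    "hierarchy (users stay nearby)": np.eye(3),   # diagonal-heavy belief
    "uniform (no structure)": np.ones((3, 3)),
}
for name, belief in hypotheses.items():
    print(name, log_evidence(counts, as_prior(belief)))
# The hypothesis with the higher evidence is better supported by the trails.
```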
... The ontology engineering process is examined in Strohmaier et al. [29]. The researchers describe four aspects of ontology development: dynamic, social, lexical and behavioral. ...
Article
Full-text available
Benefits Management provides an established approach for decision making and value extraction for IT/IS investments and can be used to examine cloud computing investments. The motivation for developing an upper ontology for Benefits Management is that the current Benefits Management approaches do not provide a framework for capturing and representing semantic information. There is also a need to capture benefits for cloud computing developments to provide existing and future users of cloud computing with better investment information for decision making. This paper describes the development of an upper ontology to capture greater levels of knowledge from stakeholders and IS professionals in cloud computing procurement and implementation. Complex relationships are established between cloud computing enablers, enabling changes, business changes, benefits and investment objectives.
... Strohmaier et al. [23] investigated the hidden social dynamics that take place in collaborative ontology-engineering projects from the biomedical domain and provide new metrics to quantify various aspects of the collaborative engineering processes. Wang et al. [24] have used association-rule mining to analyze user editing patterns in collaborative ontology-engineering projects. ...
Article
Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization, contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.
Conference Paper
The last several years have seen a growing shift towards the collaborative development of ontologies. Collaborative ontology development has become particularly important for large-scale projects involving multilingual contributors from different countries. Collaborators propose, discuss, create and modify ontologies, and this whole process must be understood. In this article, Wikidata is taken as an example to understand how a community-driven approach is used to develop a multilingual ontology and subsequently to build a knowledge base.
Chapter
Social content collection sites allow regular netizens to create communities of interest and share information at unprecedented scale. As a point of reference, MediaWiki (the wiki that powers Wikipedia) has millions of installations that allow non-programmers to contribute content. Because the content has very little structure, the information cannot be easily aggregated to answer simple questions. In recent years, several approaches have emerged for social knowledge collection, allowing a community of contributors to structure content so that information can be aggregated to answer reasonably interesting albeit simple factual queries. This chapter gives an overview of existing social knowledge collection research, ranging from intelligent interfaces for collection of semi-structured repositories of common knowledge, semantic wikis for organizing community resources, and collaborative ontology editors to create consensus taxonomies with classes and properties. The chapter ends with a reflection on open research problems in this area.
Conference Paper
Understanding the processes and dynamics behind the collaborative construction of ontologies will enable the development of quality ontologies in distributed settings. In this paper, we investigate the collaborative processes behind ontology development with two Web-based modeling tools, WebProtégé and MoKi. We performed a quantitative analysis of user activity logs from both tools. This analysis sheds light on (i) the way people edit an ontology in collaborative settings, and (ii) the role of discussion activities in collaborative ontology development. To explore whether the ontology tool influences the collaboration processes, we conducted five investigations using the collaborative data from both tools and we found that users tend to collaborate in similar ways, even if the tools and their collaboration support differ. We believe these findings are valuable because they advance our understanding of collaboration processes in ontology development, and they can serve as a guide for developers of collaborative tools.
Article
A knowledge map is widely used to represent knowledge in many domains. This paper presents a method of integrating national R&D data and of helping users navigate the integrated data via a knowledge map service. The knowledge map service is built using a lightweight ontology modeling method. The national R&D data are integrated with the research project at their center, i.e., other R&D data such as research papers, patents, and project reports are connected to the research project as its outputs. The lightweight ontology is used to represent the simple relationships between the integrated data, such as project-output relationships, document-author relationships, and document-topic relationships. The knowledge map enables us to infer further relationships such as co-author and co-topic relationships. To extract the relationships between the integrated data, an RDB-to-Triples transformer is implemented. Lastly, we show an experiment on R&D data integration using the lightweight ontology, triple generation, and visualization and navigation of the knowledge map.
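The paper's RDB-to-Triples transformer is not described in detail here, but the general pattern of mapping relational rows to RDF can be sketched with rdflib as follows; the namespace, row schema, and property names are hypothetical.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/rnd/")  # hypothetical namespace

# Hypothetical relational rows: (project_id, output_type, output_id, author).
rows = [
    ("P001", "Paper", "DOC42", "Kim"),
    ("P001", "Patent", "PAT07", "Lee"),
]

g = Graph()
for project_id, output_type, output_id, author in rows:
    project, output = EX[project_id], EX[output_id]
    g.add((project, RDF.type, EX.Project))
    g.add((output, RDF.type, EX[output_type]))
    g.add((project, EX.hasOutput, output))       # project-output relationship
    g.add((output, EX.author, Literal(author)))  # document-author relationship

print(g.serialize(format="turtle"))
```

Co-author and co-topic relationships can then be inferred by querying the resulting graph, e.g. with SPARQL, for outputs that share an author or a topic.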
Conference Paper
Contributors in hundreds of semantic wiki sites are creating structured information in RDF every day, thus growing the semantic content of the Web in spades. Although wikis have been analyzed extensively, there has been little analysis of the use of semantic wikis. The Provenance Bee Wiki was created to gather and aggregate data from these sites, to show how this content is growing over time, and to make all this detailed data readily available to the research community. We also present a high-level analysis of the almost 600 wikis indexed in Provenance Bee Wiki that have fewer than 5,000 pages.
Article
Semantic MediaWikis represent shared and discretionary databases that allow a community of contributors to capture knowledge and to specify semantic features, such as properties for articles, relationships between articles, or concepts that filter articles for certain property values. Today, Semantic MediaWikis have received a lot of attention from a range of different groups that aim to organize an array of different subjects and domain knowledge. However, while some Semantic MediaWiki projects have been thriving, others have failed to reach critical mass. We have collected and analyzed a total of 79 publicly available Semantic MediaWiki instances to learn more about these projects and how they differ from each other. Further, we conducted an empirical analysis using critical mass theory on Semantic MediaWiki communities to investigate whether activity or the number of registered users (or a mixture of both) are important for achieving critical mass. In addition, we conduct experiments aiming to predict user activity and the number of registered users at certain points in time. Our work provides new insights into Semantic MediaWiki communities and how they evolve, and first insights into how they can be studied using critical mass theory.
Article
Within the last few years the importance of collaborative ontology-engineering projects, especially in the biomedical domain, has drastically increased. This recent trend is a direct consequence of the growing complexity of these structured data representations, which no single individual is able to handle anymore. For example, the World Health Organization is currently actively developing the next revision of the International Classification of Diseases (ICD), using an OWL-based core for data representation and Web 2.0 technologies to augment collaboration. This new revision of ICD consists of roughly 50,000 diseases and causes of death and is used in many countries around the world to encode patient history and to compile health-related statistics and spending. Hence, it is crucial for practitioners to better understand and steer the underlying processes of how users collaboratively edit an ontology. Particularly, generating predictive models is a pressing issue as these models may be leveraged for generating recommendations in collaborative ontology-engineering projects and to determine the implications of potential actions on the ontology and community. In this paper we approach this task by (i) exploring whether regularities and common patterns in user action sequences, derived from change-logs of five different collaborative ontology-engineering projects from the biomedical domain, exist. Based on this information we (ii) model the data using Markov chains of varying order, which are then used to (iii) predict user actions in the sequences at hand.
Article
Full-text available
The exchange of information among organizational employees is a vital component of the knowledge-management process. Modern information and telecommunication technology is available to support such exchanges across time and distance barriers. However, organizations investing in this type of technology often face difficulties in encouraging their employees to use the system to share their ideas. This paper elaborates on previous research, suggesting that sharing personal insights with one's co-workers may carry a cost for some individuals which may yield, at the aggregate level, a co-operation dilemma, similar to a public-good dilemma. A review of the research on different types of public-good dilemmas provides some indications of the specific interventions that may help organizations encourage the kind of social dynamics that will increase overall knowledge sharing. These interventions can be classified into three categories: interventions aimed at restructuring the pay-offs for contributing, those that try to increase efficacy perceptions, and those that make employees' sense of group identity and personal responsibility more salient.
Article
Full-text available
With the emergence of tools for collaborative ontology engineering, more and more data about the creation process behind collaborative construction of ontologies is becoming available. Today, collaborative ontology engineering tools such as Collaborative Protégé offer rich and structured logs of changes, thereby opening up new challenges and opportunities to study and analyze the creation of collaboratively constructed ontologies. While there exists a plethora of visualization tools for ontologies, they have primarily been built to visualize aspects of the final product (the ontology) and not the collaborative processes behind construction (e.g. the changes made by contributors over time). To the best of our knowledge, there exists no ontology visualization tool today that focuses primarily on visualizing the history behind collaboratively constructed ontologies. Since the ontology engineering processes can influence the quality of the final ontology, we believe that visualizing process data represents an important stepping-stone towards a better understanding of how to manage the collaborative construction of ontologies in the future. In this application paper, we present a tool, PragmatiX, which taps into structured change logs provided by tools such as Collaborative Protégé to visualize various pragmatic aspects of collaborative ontology engineering. The tool is aimed at managers and leaders of collaborative ontology engineering projects to help them in monitoring progress, in exploring issues and problems, and in tracking quality-related issues such as overrides and coordination among contributors. The paper makes the following contributions: (i) we present PragmatiX, a tool for visualizing the creation process behind collaboratively constructed ontologies; (ii) we illustrate the functionality and generality of the tool by applying it to structured logs of changes of two large collaborative ontology-engineering projects; and (iii) we conduct a heuristic evaluation of the tool with domain experts to uncover early design challenges and opportunities for improvement. Finally, we hope that this work sparks a new line of research on visualization tools for collaborative ontology engineering projects.
Article
Full-text available
The need for the establishment of evaluation methods that can measure respective improvements or degradations of ontological models, e.g. yielded by a precursory ontology population stage, is undisputed. We propose an evaluation scheme that allows us to employ a number of different ontologies and to measure their performance on specific tasks. In this paper we present the resulting task-based approach for quantitative ontology evaluation, which also allows for a bootstrapping approach to ontology population. Benchmark tasks commonly feature a so-called gold standard defining perfect performance. By selecting ontology-based approaches for the respective tasks, the ontology-dependent part of the performance can be measured. Following this scheme, we present the results of an experiment for testing and incrementally augmenting ontologies using a well-defined benchmark problem based on an evaluation gold standard.
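As a minimal illustration of such task-based scoring (our own sketch with made-up data, not the paper's benchmark), each ontology-based run can be scored against the gold standard with precision, recall, and F1:

```python
def precision_recall_f1(predicted, gold):
    """Score one task run against the gold standard (sets of answers)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical: the same annotation task solved with two ontology versions.
gold = {"flu:viral", "asthma:chronic", "gout:metabolic"}
runs = {
    "ontology_v1": {"flu:viral", "gout:metabolic"},
    "ontology_v2": {"flu:viral", "asthma:chronic", "gout:metabolic", "cold:viral"},
}
for name, predicted in runs.items():
    print(name, precision_recall_f1(predicted, gold))
# v1: perfect precision but misses a case; v2: full recall, one spurious answer.
```

Because the gold standard is fixed, differences in these scores across runs isolate the ontology-dependent part of the task performance, which is the core of the proposed scheme.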
Article
Full-text available
In this paper we provide an overview and analysis of approaches for dealing with inconsistencies in DL-based ontologies. We propose criteria for the comparison of the different approaches. These criteria help users choose an appropriate approach to dealing with inconsistencies for their purposes.
Article
Full-text available
In this paper, we present WebProtégé, a lightweight ontology editor and knowledge acquisition tool for the Web. With the wide adoption of Web 2.0 platforms and the gradual adoption of ontologies and Semantic Web technologies in the real world, we need ontology-development tools that are better suited for the novel ways of interacting, constructing and consuming knowledge. Users today take Web-based content creation and online collaboration for granted. WebProtégé integrates these features as part of the ontology development process itself. We tried to lower the entry barrier to ontology development by providing a tool that is accessible from any Web browser, has extensive support for collaboration, and a highly customizable and pluggable user interface that can be adapted to any level of user expertise. The declarative user interface enabled us to create custom knowledge-acquisition forms tailored for domain experts. We built WebProtégé using the existing Protégé infrastructure, which supports collaboration on the back end side, and the Google Web Toolkit for the front end. The generic and extensible infrastructure allowed us to easily deploy WebProtégé in production settings for several projects. We present the main features of WebProtégé and its architecture and describe briefly some of its uses for real-world projects. WebProtégé is free and open source. An online demo is available at http://webprotege.stanford.edu.
Article
Full-text available
"The best thinking in the agile development community brought to street-level in the form of implementable strategy and tactics. Essential reading for anyone who shares the passion for creating quality software."-Eric Olafson, CEO Tomax"Crystal Clear is beyond agile. This book leads you from software process hell to successful software development by practical examples and useful samples."-Basaki Satoshi, Schlumberger"A very powerful message, delivered in a variety of ways to touch the motivation and understanding of many points of view."-Laurie Williams, Assistant Professor, North Carolina State University"A broad, rich understanding of small-team software development based on observations of what actually works."-John Rusk"A superb synthesis of underlying principles and a clear description of strategies and techniques."-Géry Derbier, Project Manager, Solistic"Alistair Cockburn shows how small teams can be highly effective at developing fit-for-purpose software by following a few basic software development practices and by creating proper team dynamics. These small teams can be much more effective and predictable than much larger teams that follow overly bureaucratic and prescriptive development processes."-Todd Little, Sr. Development Manager, Landmark Graphics"I find Cockburn's writings on agile methods enlightening: He describes 'how to do,' of course, but also how to tell whether you're doing it right, to reach into the feeling of the project. This particular book's value is that actual project experiences leading to and confirming the principles and practices are so...well...clearly presented."-Scott Duncan, ASQ Software Division Standards Chair and representative to the US SC7 TAG and IEEE S2ESC Executive Committee and Management Board and Chair of IEEE Working Group 1648 on agile methods"Crystal Clear identifies principles that work not only for software development, but also for any results-centric activities. Dr. Cockburn follows these principles with concrete, practical examples of how to apply the principles to real situations and roles and to resolve real issues."-Niel Nickolaisen, COO, Deseret Book"All the successful projects I've been involved with or have observed over the past 19 or so years have had many of the same characteristics as described in Crystal Clear (even the big projects). And many of the failed projects failed because they missed something-such as expert end-user involvement or accessibility throughout the project. The final story was a great read. Here was a project that in my opinion was an overwhelming success-high productivity, high quality, delivery, happy customer, and the fact that the team would do it again. The differing styles in each chapter kept it interesting. I started reading it and couldn't put it down, and by the end, I just had to say 'Wow!'"-Ron Holliday, Director, Fidelity Management ResearchCarefully researched over ten years and eagerly anticipated by the agile community, Crystal Clear: A Human-Powered Methodology for Small Teams is a lucid and practical introduction to running a successful agile project in your organization. 
Each chapter illuminates a different important aspect of orchestrating agile projects.Highlights include Attention to the essential human and communication aspects of successful projects Case studies, examples, principles, strategies, techniques, and guiding properties Samples of work products from real-world projects instead of blank templates and toy problems Top strategies used by software teams that excel in delivering quality code in a timely fashion Detailed introduction to emerging best-practice techniques, such as Blitz Planning, Project 360ᄎ, and the essential Reflection Workshop Question-and-answer with the author about how he arrived at these recommendations, including where they fit with CMMI, ISO, RUP, XP, and other methodologies A detailed case study, including an ISO auditor's analysis of the projectPerhaps the most important contribution this book offers is the Seven Properties of Successful Projects. The author has studied successful agile projects and identified common traits they share. These properties lead your project to success; conversely, their absence endangers your project.© Copyright Pearson Education. All rights reserved.
Conference Paper
Biomedical ontologies such as the 11th revision of the International Classification of Diseases and others are increasingly produced with the help of collaborative ontology engineering platforms that facilitate cooperation and coordination among a large number of users and contributors. While collaborative approaches to engineering biomedical ontologies can be expected to yield a number of advantages, such as increased participation and coverage, they come with a number of novel challenges and risks. For example, they might suffer from low participation, lack of coordination, lack of control or other related problems that are neither well understood nor addressed by the current state of research. In this paper, we aim to tackle some of these problems by exploring techniques for recommending concepts to experts on collaborative ontology engineering platforms. In detail, this paper will (i) discuss different recommendation techniques from the literature, (ii) map and apply these categories to the domain of collaboratively engineered biomedical ontologies, and (iii) present prototypical implementations of selected recommendation techniques as a proof of concept.
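The survey character of this entry invites a concrete illustration. As a purely hypothetical sketch (the toy change log, all names, and the choice of item-based collaborative filtering are our assumptions, not the authors' prototypes), recommending concepts from co-editing patterns could look like this:

```python
# Hypothetical sketch: recommend ontology concepts to an editor from
# co-editing patterns (item-based collaborative filtering over a change log).
# All data and names are illustrative, not the paper's implementation.
from collections import defaultdict
from math import sqrt

# (editor, concept) pairs extracted from a change log -- toy data.
edits = [
    ("alice", "Influenza"), ("alice", "Pneumonia"), ("alice", "Bronchitis"),
    ("bob", "Influenza"), ("bob", "Pneumonia"),
    ("carol", "Pneumonia"), ("carol", "Asthma"),
]

editors_of = defaultdict(set)  # concept -> editors who touched it
for editor, concept in edits:
    editors_of[concept].add(editor)

def cosine(a, b):
    """Cosine similarity between two editor sets."""
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0

def recommend(editor, k=3):
    """Rank untouched concepts by similarity to the editor's own concepts."""
    touched = {c for e, c in edits if e == editor}
    scores = defaultdict(float)
    for seen in touched:
        for other, eds in editors_of.items():
            if other not in touched:
                scores[other] += cosine(editors_of[seen], eds)
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bob"))  # ['Bronchitis', 'Asthma']
```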
Article
While in the past taxonomic and ontological knowledge was traditionally produced by small groups of co-located experts, today the production of such knowledge has a radically different shape and form. For example, potentially thousands of health professionals, scientists, and ontology experts will collaboratively construct, evaluate and maintain the most recent version of the International Classification of Diseases (ICD-11), a large ontology of diseases and causes of deaths managed by the World Health Organization. In this work, we present a novel web-based tool, iCAT Analytics, that supports the systematic investigation of crowd-based processes in knowledge-production systems. To enable such investigation, the tool supports interactive exploration of pragmatic aspects of ontology engineering, such as how a given ontology evolved and the nature of changes, discussions and interactions that took place during its production process. While iCAT Analytics was motivated by ICD-11, it could potentially be applied to any crowd-based ontology-engineering project. We give an introduction to the features of iCAT Analytics and present some insights specifically for ICD-11.
Article
An ontology is an explicit formal conceptualization of some domain of interest. Ontologies are increasingly used in various fields such as knowledge management, information extraction, and the semantic web. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion of application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper presents a survey of the state of the art in ontology evaluation.
Conference Paper
Ontologies now play an important role for many knowledge-intensive applications for which they provide a source of precisely defined terms. However, their widespread usage also brings problems of proliferation. Ontology engineers or users frequently have a core ontology that they use, e.g., for browsing or querying data, but they need to extend it with, adapt it to, or compare it with the large set of other ontologies. For the task of detecting and retrieving relevant ontologies, one needs means for measuring the similarity between ontologies. We present a set of ontology similarity measures and a multiple-phase empirical evaluation.
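The abstract does not reproduce the measures themselves; a minimal sketch of the lexical layer alone, assuming Jaccard overlap over normalized concept labels (one of the simplest conceivable measures, with made-up labels), would be:

```python
# Minimal sketch of a lexical-level ontology similarity measure: Jaccard
# overlap of normalized concept labels. The paper's measures also cover
# structural and semantic levels; this shows only the simplest layer.
def normalize(label: str) -> str:
    return label.replace("_", " ").replace("-", " ").strip().lower()

def lexical_jaccard(labels_a, labels_b) -> float:
    a = {normalize(l) for l in labels_a}
    b = {normalize(l) for l in labels_b}
    return len(a & b) / len(a | b) if a | b else 1.0

onto_a = ["Person", "PhD_Student", "Professor", "University"]
onto_b = ["person", "professor", "Faculty", "University", "Course"]
print(f"{lexical_jaccard(onto_a, onto_b):.2f}")  # 0.50
```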
Chapter
Recent years have seen rapid progress in the development of ontologies as semantic models intended to capture and represent aspects of the real world. There is, however, great variation in the quality of ontologies. If ontologies are to become progressively better in the future, more rigorously developed, and more appropriately compared, then a systematic discipline of ontology evaluation must be created to ensure quality of content and methodology. Systematic methods for ontology evaluation will take into account representation of individual ontologies, performance (in terms of accuracy, domain coverage and the efficiency and quality of automated reasoning using the ontologies) on tasks for which the ontology is designed and used, degree of alignment with other ontologies and their compatibility with automated reasoning. A sound and systematic approach to ontology evaluation is required to transform ontology engineering into a true scientific and engineering discipline. This chapter discusses issues and problems in ontology evaluation, describes some current strategies, and suggests some approaches that might be useful in the future.
Conference Paper
Enterprise modelling focuses on the construction of a structured description, the so-called enterprise model, which represents aspects relevant to the activity of an enterprise. Although it has recently become clearer that enterprise modelling is a collaborative activity involving a large number of people, most enterprise modelling tools still support only very limited degrees of collaboration. In this contribution we describe a tool for enterprise modelling, called MoKi (MOdelling wiKI), which supports agile collaboration between all the different actors involved in enterprise modelling activities. MoKi is based on a Semantic Wiki and enables actors with different expertise to develop an enterprise model using not only structural (formal) descriptions but also more informal and semi-formal descriptions of knowledge.
Conference Paper
Semantic MediaWiki is an extension of MediaWiki – a widely used wiki-engine that also powers Wikipedia. Its aim is to make semantic technologies available to a broad community by smoothly integrating them with the established usage of MediaWiki. The software is already used on a number of productive installations world-wide, but the main target remains to establish “Semantic Wikipedia” as an early adopter of semantic technologies on the web. Thus usability and scalability are as important as powerful semantic features.
Conference Paper
With the wider use of ontologies in the Semantic Web and as part of production systems, multiple scenarios for ontology maintenance and evolution are emerging. For example, successive ontology versions can be posted on the (Semantic) Web, with users discovering the new versions serendipitously; ontology development in a collaborative environment can be synchronous or asynchronous; managers of projects may exercise quality control, examining changes from previous baseline versions and accepting or rejecting them before a new baseline is published, and so on. In this paper, we present different scenarios for ontology maintenance and evolution that we have encountered in our own projects and in those of our collaborators. We define several features that categorize these scenarios. For each scenario, we discuss the high-level tasks that an editing environment must support. We then present a unified comprehensive set of tools to support different scenarios in a single framework, allowing users to switch between different modes easily.
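For the quality-control scenario described above, the core operation is a diff between a baseline and a candidate version. A minimal sketch, under the simplifying assumption that ontology versions can be flattened to sets of simple statements (real editing environments track much richer change objects):

```python
# Illustrative only: diff two ontology versions, flattened to sets of
# (subject, predicate, object) statements, for baseline-style review.
baseline = {
    ("Pneumonia", "subClassOf", "LungDisease"),
    ("Influenza", "subClassOf", "ViralDisease"),
}
candidate = {
    ("Pneumonia", "subClassOf", "LungDisease"),
    ("Influenza", "subClassOf", "InfectiousDisease"),  # changed superclass
    ("COVID-19", "subClassOf", "ViralDisease"),        # newly added class
}

for stmt in sorted(candidate - baseline):
    print("review: +", stmt)  # additions to accept or reject
for stmt in sorted(baseline - candidate):
    print("review: -", stmt)  # removals to accept or reject
```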
Conference Paper
The World Health Organization is beginning to use Semantic Web technologies in the development of the 11th revision of the International Classification of Diseases (ICD-11). Health officials use ICD in all United Nations member countries to compile basic health statistics, to monitor health-related spending, and to inform policy makers. While previous revisions of ICD encoded minimal information about a disease, and were mainly published as books and tabulation lists, the creators of ICD-11 envision that it will become a multi-purpose and coherent classification ready for electronic health records. Most important, they plan to have ICD-11 applied for a much broader variety of uses than previous revisions. The new requirements entail significant changes in the way we represent disease information, as well as in the technologies and processes that we use to acquire the new content. In this paper, we describe the previous processes and technologies used for developing ICD. We then describe the requirements for the new development process and present the Semantic Web technologies that we use for ICD-11. We outline the experiences of the domain experts using the software system that we implemented using Semantic Web technologies. We then discuss the benefits and challenges in following this approach and conclude with lessons learned from this experience.
Conference Paper
Ontologies are becoming so large in their coverage that no single person or small group of people can develop them effectively, and ontology development becomes a community-based enterprise. In this paper, we discuss requirements for supporting collaborative ontology development and present Collaborative Protégé, a tool that supports many of these requirements, such as discussions integrated with the ontology-editing process, chats, and annotations of changes and ontology components. We have evaluated Collaborative Protégé in the context of ontology development in an ongoing large-scale biomedical project that actively uses ontologies at the VA Palo Alto Healthcare System. Users have found the new tool effective as an environment for carrying out discussions and for recording references for the information sources and design rationale.
Conference Paper
Wikipedia editors are uniquely motivated to collaborate around current and breaking news events. However, the speed, urgency, and intensity with which these collaborations unfold also impose a substantial burden on editors' abilities to effectively coordinate tasks and process information. We analyze the patterns of activity on Wikipedia following the 2011 Tōhoku earthquake and tsunami to understand the dynamics of editor attention and participation, novel practices employed to collaborate on these articles, and the resulting coauthorship structures which emerge between editors and articles. Our findings have implications for supporting future coverage of breaking news articles, theorizing about motivations to participate in online community, and illuminating Wikipedia's potential role in storing cultural memories of catastrophe.
Conference Paper
Wikipedia, "the free encyclopedia", now contains over two million English articles, and is widely regarded as a high-quality, authoritative encyclopedia. Some Wikipedia arti-cles, however, are of questionable quality, and it is not al-ways apparent to the visitor which articles are good and which are bad. We propose a simple metric – word count – for measuring article quality. In spite of its striking simplic-ity, we show that this metric significantly outperforms the more complex methods described in related work.
Conference Paper
We present SOBOLEO, a system for the web-based collaborative engineering of SKOS ontologies and annotation of web resources. SOBOLEO enables the simple creation, extension and maintenance of taxonomies. At the same time, it supports the annotation of web resources with concepts from this taxonomy.
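SOBOLEO's own code is not shown here; as a sketch of the kind of SKOS structure such a tool maintains, built with rdflib (our choice of library and hypothetical concepts, not SOBOLEO's implementation):

```python
# Sketch of the SKOS taxonomy data a tool like SOBOLEO manages, built with
# rdflib. Library choice and concepts are ours, not SOBOLEO's internals.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/taxonomy/")
g = Graph()
g.bind("skos", SKOS)

for name in ("Disease", "LungDisease"):
    g.add((EX[name], RDF.type, SKOS.Concept))
    g.add((EX[name], SKOS.prefLabel, Literal(name, lang="en")))
g.add((EX["LungDisease"], SKOS.broader, EX["Disease"]))  # taxonomy edge

print(g.serialize(format="turtle"))
```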
Conference Paper
We present OntoWiki, a tool providing support for agile, distributed knowledge engineering scenarios. OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. It fosters social collaboration by keeping track of changes, allowing users to comment on and discuss every single part of a knowledge base, enabling them to rate and measure the popularity of content, and honoring the activity of users. OntoWiki enhances browsing and retrieval by offering semantically enhanced search strategies. All these techniques are applied with the ultimate goal of decreasing the entrance barrier for projects and domain experts to collaborate using semantic technologies. In the spirit of the Web 2.0, OntoWiki implements an "architecture of participation" that allows users to add value to the application as they use it. It is available as open-source software and a demonstration platform can be accessed at http://3ba.se.
Conference Paper
Wikipedia's success is often attributed to the large numbers of contributors who improve the accuracy, completeness and clarity of articles while reducing bias. However, because of the coordination needed to write an article collaboratively, adding contributors is costly. We examined how the number of editors in Wikipedia and the coordination methods they use affect article quality. We distinguish between explicit coordination, in which editors plan the article through communication, and implicit coordination, in which a subset of editors structure the work by doing the majority of it. Adding more editors to an article improved article quality only when they used appropriate coordination techniques and was harmful when they did not. Implicit coordination through concentrating the work was more helpful when many editors contributed, but explicit coordination through communication was not. Both types of coordination improved quality more when an article was in a formative stage. These results demonstrate the critical importance of coordination in effectively harnessing the "wisdom of the crowd" in online production environments.
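Implicit coordination is operationalized here as concentration of work. One standard statistic for such concentration (our choice for illustration, not necessarily the paper's exact measure) is a Gini coefficient over per-editor edit counts:

```python
# Sketch: how concentrated is an article's work among its editors?
# Gini coefficient over per-editor edit counts; our illustrative choice
# of statistic, not necessarily the measure used in the paper.
def gini(counts):
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with 1-based ranks
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(round(gini([50, 3, 2, 1]), 2))     # 0.66 -- one editor dominates
print(round(gini([14, 14, 14, 14]), 2))  # 0.0  -- work spread evenly
```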
Conference Paper
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
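The heart of WOE's self-supervision is pairing infobox attribute values with sentences that mention them. A deliberately naive sketch of that heuristic (the toy article and matching rule are ours; the actual matcher uses far richer cues):

```python
# Toy sketch of WOE-style training-data generation: emit a (subject,
# attribute, value, sentence) example when an infobox value appears in a
# sentence. Real heuristics also check the subject and use parsing cues.
import re

subject = "Ada Lovelace"
infobox = {"born": "1815", "field": "mathematics"}  # hypothetical infobox
article = ("Ada Lovelace was born in 1815 in London. "
           "She is celebrated for her work in mathematics. "
           "Her notes describe the first algorithm.")

examples = []
for sentence in re.split(r"(?<=\.)\s+", article):
    for attribute, value in infobox.items():
        if value in sentence:
            examples.append((subject, attribute, value, sentence))

for ex in examples:
    print(ex)
```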
Conference Paper
Creating and designing an ontology is a complex task requiring discussions between domain and ontology engineering experts as well as the users of an ontology. We present the Cicero tool, which facilitates efficient discussions and accelerates the convergence to decisions. Furthermore, by integrating it with an ontology editor, it helps to improve the documentation of an ontology.
Conference Paper
Building and maintaining thesauri are complex and laborious tasks. PoolParty is a Thesaurus Management Tool (TMT) for the Semantic Web, which aims to support the creation and maintenance of thesauri by utilizing Linked Open Data (LOD), text analysis and easy-to-use GUIs, so that thesauri can be managed and utilized by domain experts without requiring knowledge of the Semantic Web. Some aspects of thesaurus management, like the editing of labels, can be done via a wiki-style interface, allowing for the lowest possible barriers to contribution. PoolParty can analyse documents in order to glean new concepts for a thesaurus. Additionally, a thesaurus can be enriched by retrieving relevant information from Linked Data sources; thesauri can be imported and updated via LOD URIs from external systems and can also be published as new Linked Data sources on the Semantic Web.
Conference Paper
Ontology engineering processes in truly distributed settings like the Semantic Web or global peer-to-peer systems may not be adequately supported by conventional, centralized ontology engineering methodologies. In this paper, we present our work towards the DILIGENT methodology, which is intended to support domain experts in a distributed setting to engineer and evolve ontologies with the help of a fine-grained methodological approach based on Rhetorical Structure Theory, viz. the DILIGENT model of ontology engineering by argumentation.
Conference Paper
Wikipedia is a wiki-based encyclopedia that has become one of the most popular collaborative on-line knowledge systems. As in any large collaborative system, as Wikipedia has grown, conflicts and coordination costs have increased dramatically. Visual analytic tools provide a mechanism for addressing these issues by enabling users to more quickly and effectively make sense of the status of a collaborative environment. In this paper we describe a model for identifying patterns of conflicts in Wikipedia articles. The model relies on users' editing history and the relationships between user edits, especially revisions that void previous edits, known as "reverts". Based on this model, we constructed Revert Graph, a tool that visualizes the overall conflict patterns between groups of users. It enables visual analysis of opinion groups and rapid interactive exploration of those relationships via detail drill-downs. We present user patterns and case studies that show the effectiveness of these techniques, and discuss how they could generalize to other systems.
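Reverts in such models are commonly detected by matching a revision's full text against earlier revisions. A minimal sketch of that heuristic (our simplified reading of the model, with toy revisions):

```python
# Sketch of revert detection: a revision whose text hash equals an earlier
# revision's hash reverts everything in between. Simplified reading of the
# revert-based model; toy data only.
import hashlib

revisions = [  # (editor, full revision text), in chronological order
    ("alice", "Ontologies are shared conceptualizations."),
    ("troll", "Ontologies are boring!!!"),
    ("bob",   "Ontologies are shared conceptualizations."),  # restores alice
]

seen = {}  # text hash -> index of first revision with that content
for i, (editor, text) in enumerate(revisions):
    h = hashlib.sha1(text.encode()).hexdigest()
    if h in seen:
        reverted = [e for e, _ in revisions[seen[h] + 1 : i]]
        print(f"revision {i} by {editor} reverts {reverted}")
    else:
        seen[h] = i
```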
Article
Research on trolls is scarce, but their activities challenge online communities; one of the main challenges of the Wikipedia community is to fight against vandalism and trolls. This study identifies Wikipedia trolls’ behaviours and motivations, and compares and contrasts hackers with trolls; it extends our knowledge about this type of vandalism and concludes that Wikipedia trolls are one type of hacker. This study reports that boredom, attention seeking, and revenge motivate trolls; they regard Wikipedia as an entertainment venue, and find pleasure from causing damage to the community and other people. Findings also suggest that trolls’ behaviours are characterized as repetitive, intentional, and harmful actions that are undertaken in isolation and under hidden virtual identities, involving violations of Wikipedia policies, and consisting of destructive participation in the community.
Conference Paper
Prior research on Wikipedia has characterized the growth in content and editors as being fundamentally exponential in nature, extrapolating current trends into the future. We show that recent editing activity suggests that Wikipedia growth has slowed, and perhaps plateaued, indicating that it may have come up against its limits to growth. We measure growth, population shifts, and patterns of editor and administrator activities, contrasting these against past results where possible. Both the rate of page growth and the rate of editor growth have declined. As growth has declined, there are indicators of increased coordination and overhead costs, exclusion of newcomers, and resistance to new edits. We discuss some possible explanations for these new developments in Wikipedia, including decreased opportunities for sharing existing knowledge and increased bureaucratic stress on the socio-technical system itself. The existing trends of exponential growth in digital technologies were the basis for Kurzweil's (17) argument that biological evolution and technological evolution follow a law of accelerating returns (i.e., exponential or even super-exponential growth). This led to the notion of the "Singularity": a point in the near future when technological change becomes "so rapid and profound that it represents a rupture in the fabric of human history." We argue that Wikipedia, one of the world's largest knowledge aggregators, does indeed mirror the growth of natural populations, but, following Darwin (7), we suggest that this growth becomes increasingly constrained and limited, and under those conditions there will be increased evidence of competition and dominance. In this paper, we present data that challenges the notion that Wikipedia exhibits unconstrained exponential growth in editor participation and contribution. We will show that growth has decreased substantially over the last two years, perhaps indicating some fundamental limiting constraints to growth. In ecological systems, when unfettered population growth approaches natural limits (e.g., in available resources), one generally observes increased competition. For Wikipedia, we will examine the data for indicators of increased competition that would be expected as a growing population system comes up against limits to growth. We present data from Wikipedia addressing three different aspects over time: the global activity level, a detailed analysis of the edit rates of various editor classes, and the population shifts in editor classes.
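The core claim, growth that flattens rather than compounds, can be illustrated by fitting competing growth models to an activity series. A sketch on synthetic data (the series and both model forms are our illustrative assumptions, not the paper's analysis):

```python
# Sketch: compare exponential vs. logistic fits on a plateauing activity
# series. All numbers are synthetic illustrations, not Wikipedia data.
import numpy as np
from scipy.optimize import curve_fit

months = np.arange(24, dtype=float)
rng = np.random.default_rng(0)
activity = 1000 / (1 + np.exp(-(months - 10) / 3)) + rng.normal(0, 10, months.size)

def exponential(t, a, b):
    return a * np.exp(b * t)

def logistic(t, k, t0, s):
    return k / (1 + np.exp(-(t - t0) / s))

p_exp, _ = curve_fit(exponential, months, activity, p0=(100, 0.1),
                     bounds=([0, 0], [1e6, 1.0]))
p_log, _ = curve_fit(logistic, months, activity, p0=(1000, 10, 3))
for name, f, p in [("exponential", exponential, p_exp),
                   ("logistic", logistic, p_log)]:
    rss = float(np.sum((f(months, *p) - activity) ** 2))
    print(f"{name:12s} residual sum of squares: {rss:,.0f}")
# The logistic model fits the plateauing series far better, mirroring the
# paper's argument that growth runs up against limits.
```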
Article
Wikipedia has been a resounding success story as a collaborative system with a low cost of online participation. However, it is an open question whether the success of Wikipedia results from a "wisdom of crowds" type of effect, in which a large number of people each make a small number of edits, or whether it is driven by a core group of "elite" users who do the lion's share of the work. In this study we examined how the influence of "elite" vs. "common" users changed over time in Wikipedia. The results suggest that although Wikipedia was driven by the influence of "elite" users early on, more recently there has been a dramatic shift in workload to the "common" user. We also show the same shift in del.icio.us, a very different type of social collaborative knowledge system. We discuss how these results mirror the dynamics found in more traditional social collectives, and how they can influence the design of new collaborative knowledge systems.
Article
The rise of the Internet has enabled collaboration and cooperation on an unprecedentedly large scale. The online encyclopedia Wikipedia, which presently comprises 7.2 million articles created by 7.04 million distinct editors, provides a consummate example. We examined all 50 million edits made to the 1.5 million English-language Wikipedia articles and found that the high-quality articles are distinguished by a marked increase in number of edits, number of editors, and intensity of cooperative behavior, as compared to other articles of similar visibility and age. This is significant because in other domains, fruitful cooperation has proven to be difficult to sustain as the size of the collaboration increases. Furthermore, in spite of the vagaries of human behavior, we show that Wikipedia articles accrete edits according to a simple stochastic mechanism in which edits beget edits. Topics of high interest or relevance are thus naturally brought to the forefront of quality.
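The "edits beget edits" mechanism is a rich-get-richer process that is easy to simulate; a sketch (scale and parameters are illustrative only):

```python
# Sketch of the "edits beget edits" mechanism: each new edit lands on an
# article with probability proportional to its current edit count.
# Scale and seed values are illustrative, not fitted to Wikipedia.
import random

random.seed(42)
edits = [1] * 20         # 20 articles, one seed edit each
for _ in range(5000):    # 5,000 further edits arrive one at a time
    i = random.choices(range(len(edits)), weights=edits)[0]
    edits[i] += 1

print(sorted(edits, reverse=True)[:5])  # a few articles accrete most edits
```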
Conference Paper
The increased availability of online knowledge has led to the design of several algorithms that solve a variety of tasks by harvesting the Semantic Web, i.e. by dynamically selecting and exploring a multitude of online ontologies. Our hypothesis is that the performance of such novel algorithms implicitly provides an insight into the quality of the used ontologies and thus opens the way to a task-based evaluation of the Semantic Web. We have investigated this hypothesis by studying the lessons learnt about online ontologies when used to solve three tasks: ontology matching, folksonomy enrichment, and word sense disambiguation. Our analysis leads to a suite of conclusions about the status of the Semantic Web, which highlight a number of strengths and weaknesses of the semantic information available online and complement the findings of other analyses of the Semantic Web landscape.
Article
Wikipedia, an international project that uses wiki software to collaboratively create an encyclopaedia, is becoming more and more popular. Everyone can directly edit articles, and every edit is recorded. The version history of all articles is freely available and allows a multitude of examinations. This paper gives an overview of Wikipedia research. Wikipedia's fundamental components, i.e. articles, authors, edits, and links, as well as content and quality, are analysed. Possibilities of research are explored, including examples and first results. Several characteristics found in Wikipedia, such as exponential growth and scale-free networks, are already known from other contexts. However, the wiki architecture also possesses some intrinsic specialities. General trends are measured that are typical for all Wikipedias but vary between languages in detail.
Conference Paper
The evaluation of ontologies is vital for the growth of the Semantic Web. We consider a number of problems in evaluating a knowledge artifact like an ontology. We propose in this paper that one approach to ontology evaluation should be corpus- or data-driven. A corpus is the most accessible form of knowledge, and its use allows a measure to be derived of the 'fit' between an ontology and a domain of knowledge. We consider a number of methods for measuring this 'fit' and propose a measure to evaluate structural fit, and a probabilistic approach to identifying the best ontology.
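One simple instance of such a corpus-driven 'fit' measure (our minimal reading of the idea, with toy data) checks how much of the ontology's vocabulary actually occurs in the domain corpus:

```python
# Sketch of a corpus-driven 'fit' measure: the fraction of ontology concept
# labels that occur in a domain corpus. Toy data; the paper's structural
# and probabilistic measures go well beyond this lexical check.
concepts = {"ontology", "taxonomy", "concept", "axiom", "reasoner"}
corpus = ("An ontology defines each concept and axiom of a domain. "
          "A taxonomy orders concepts; editors revise the ontology.").lower()
corpus_terms = set(corpus.replace(".", " ").replace(";", " ").split())

covered = {c for c in concepts if c in corpus_terms}
print(f"lexical fit: {len(covered) / len(concepts):.2f}")  # 0.80 ('reasoner' absent)
```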
Article
In this book the reader is provided with a tour of the principal results and ideas in the theories of completely positive maps, completely bounded maps, dilation theory, operator spaces, and operator algebras, together with some of their main applications. The author assumes only that the reader has a basic background in functional analysis and C*-algebras, and the presentation is self-contained and paced appropriately for graduate students new to the subject. The book could be used as a text for a course or for independent reading; with this in mind, many exercises are included. Experts will also want this book for their library, since the author presents new and simpler proofs of some of the major results in the area, and many applications are also included. This will be an indispensable introduction to the theory of operator spaces for all who want to know more.
Conference Paper
In our work we extend the traditional bipartite model of ontologies with the social dimension, leading to a tripartite model of actors, concepts and instances. We demonstrate the application of this representation by showing how community-based semantics emerges from this model through a process of graph transformation. We illustrate ontology emergence by two case studies, an analysis of a large scale folksonomy system and a novel method for the extraction of community-based ontologies from Web pages.
Article
In our work the traditional bipartite model of ontologies is extended with the social dimension, leading to a tripartite model of actors, concepts and instances. We demonstrate the application of this representation by showing how community-based semantics emerges from this model through a process of graph transformation. We illustrate ontology emergence by two case studies, an analysis of a large scale folksonomy system and a novel method for the extraction of community-based ontologies from Web pages.
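The graph transformation at the heart of both versions of this work can be illustrated as a projection: fold (actor, concept, instance) triples into a weighted concept-concept network. A sketch on toy tagging data (our reading of the transformation, not the authors' exact operator):

```python
# Sketch: project tripartite (actor, concept, instance) data onto a weighted
# concept-concept network; two concepts connect when an actor used both.
# Toy data and our simplified reading of the transformation.
from collections import defaultdict
from itertools import combinations

triples = [  # (actor, concept, instance), e.g. from a tagging system
    ("alice", "jazz", "track1"), ("alice", "blues", "track1"),
    ("bob", "jazz", "track2"), ("bob", "blues", "track2"),
    ("carol", "rock", "track3"),
]

concepts_by_actor = defaultdict(set)
for actor, concept, _ in triples:
    concepts_by_actor[actor].add(concept)

edge_weight = defaultdict(int)
for concepts in concepts_by_actor.values():
    for pair in combinations(sorted(concepts), 2):
        edge_weight[pair] += 1

print(dict(edge_weight))  # {('blues', 'jazz'): 2} -- emergent association
```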
Conference Paper
We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications of individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results.
Conference Paper
Reverts are important to maintaining the quality of Wikipedia. They fix mistakes, repair vandalism, and help enforce policy. However, reverts can also be damaging, especially to the aspiring editor whose work they destroy. In this research we analyze 400,000 Wikipedia revisions to understand the effect that reverts had on editors. We seek to understand the extent to which they demotivate users, reducing the workforce of contributors, versus the extent to which they help users improve as encyclopedia editors. Overall we find that reverts are powerfully demotivating, but that their net influence is that more quality work is done in Wikipedia as a result of reverts than is lost by chasing editors away. However, we identify key conditions, most specifically new editors being reverted by much more experienced editors, under which reverts are particularly damaging. We propose that reducing the damage from reverts might be one effective path for Wikipedia to solve the newcomer retention problem.
Article
The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health.
Article
Proponents of open source style software development claim that better software is produced using this model compared with the traditional closed model. However, there is little empirical evidence in support of these claims. In this paper, we present the results of a pilot case study aiming: (a) to understand the implications of structural quality; and (b) to figure out the benefits of structural quality analysis of the code delivered by open source style development. To this end, we have measured quality characteristics of 100 applications written for Linux, using a software measurement tool, and compared the results with the industrial standard that is proposed by the tool. Another target of this case study was to investigate the issue of modularity in open source as this characteristic is being considered crucial by the proponents of open source for this type of software development. We have empirically assessed the relationship between the size of the application components and the delivered quality measured through user satisfaction. We have determined that, up to a certain extent, the average component size of an application is negatively related to the user satisfaction for this application.
Article
Over the last 8 years, the National Cancer Institute (NCI) has launched a major effort to integrate molecular and clinical cancer-related information within a unified biomedical informatics framework, with controlled terminology as its foundational layer. The NCI Thesaurus is the reference terminology underpinning these efforts. It is designed to meet the growing need for accurate, comprehensive, and shared terminology, covering topics including: cancers, findings, drugs, therapies, anatomy, genes, pathways, cellular and subcellular processes, proteins, and experimental organisms. The NCI Thesaurus provides a partial model of how these things relate to each other, responding to actual user needs and implemented in a deductive logic framework that can help maintain the integrity and extend the informational power of what is provided. This paper presents the semantic model for cancer diseases and its uses in integrating clinical and molecular knowledge, more briefly examines the models and uses for drug, biochemical pathway, and mouse terminology, and discusses limits of the current approach and directions for future work.