About
196
Publications
64,777
Reads
2,358
Citations
Introduction
Lars Vogt currently works at the Leibniz Information Centre for Science and Technology University Library (TIB) in the Data Science & Digital Libraries Department. Lars does research in Ontologies and Knowledge Management, Biodiversity Research, Theory of Phylogenetics, and Philosophy of Science. Their current projects are 'eScience-Compliant Standards for Morphology', 'SOCCOMAS: A Web Content Management System based on a Semantic Programming Ontology', and 'Semantic Programming Ontology (SPrO)'.
Additional affiliations
October 2019 - present
October 2008 - September 2019
July 2005 - September 2008
Publications (196)
The present article discusses the need for standardization in morphology in order to increase comparability and communicability of morphological data. We analyse why only morphological descriptions and not character matrices represent morphological data and why morphological terminology must be free of homology assumptions. We discuss why images on...
The problem of homology has been a consistent source of controversy at the heart of systematic biology, as has the step of morphological character analysis in phylogenetics. Based on a clear epistemic framework and a characterization of “characters” as diagnostic evidence units for the recognition of not directly identifiable entities, I discuss th...
Background: With the emergence of high-throughput technologies, Big Data and eScience, the use of online data repositories and the establishment of new data standards that require data to be computer-parsable become increasingly important. As a consequence, there is an increasing need for an integrated system of hierarchies of levels of different t...
Background: Currently, almost all morphological data are published as unstructured free text descriptions. This not only brings about terminological problems regarding semantic transparency, which hampers their re-use by non-experts, but the data cannot be parsed by computers either, which in turn hampers their integration across many fields in th...
Background: In today’s landscape of data management, knowledge graphs and ontologies are becoming increasingly important as critical mechanisms aligned with the FAIR Guiding Principles, which ensure that data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full poten...
Building on the Open Research Knowledge Graph as an infrastructure for the production, curation, and publication of FAIR scientific knowledge, we present a concept that models original articles and the corresponding expression in the ORKG as independent and interlinked FDOs by organizing the content describing an article into semantic units.
This article presents the implementation of a neuro-symbolic system within the Open Research Knowledge Graph (ORKG), a platform designed to collect and organize scientific knowledge in a structured, machine-readable format. Our approach leverages the strengths of symbolic knowledge representation to encode complex relationships and domain-specific...
This article describes advancements in the ongoing digital transformation in materials science and engineering. It is driven by domain‐specific successes and the development of specialized digital data spaces. There is an evident and increasing need for standardization across various subdomains to support science data exchange across entities. The...
After a brief motivation, I clarify some key concepts and terms, including 'RDF', 'triple', 'semantic graph', 'ontology', 'knowledge graph', and 'reasoning', before giving a short analysis of what I believe are currently the main challenges (i.e., limited cognitive interoperability) and barriers (i.e., semantic parsing burden) that prevent semantic...
After a brief motivation, I clarify some key concepts and terms, before giving a short analysis of what I believe are currently the main challenges (i.e., limited cognitive interoperability) and barriers (i.e., semantic parsing burden) that prevent semantic (=OWL) knowledge graphs and ontologies from wider use. In the second part of the presentatio...
Starting with human communication, I analyze the requirements for interoperable communication of textual information between humans and the structural characteristics of natural language that support interoperability. Based on that analysis, I discuss the parallels between the structure of natural language statements and data stru...
Food information engineering relies on statistical and AI techniques (e.g., symbolic, connectionist, and neurosymbolic AI) for collecting, storing, processing, diffusing, and putting food information in a form exploitable by humans and machines. Food information is collected manually and automatically. Once collected, food information is organized...
Machines need data and metadata to be machine-actionable and FAIR (findable, accessible, interoperable, reusable) to manage increasing data volumes. Knowledge graphs and ontologies are key to this, but their use is hampered by high access barriers due to required prior knowledge in semantics and data modelling. The Rosetta Statement approach propos...
Knowledge graphs and ontologies are becoming increasingly vital as they align with the FAIR Guiding Principles (Findable, Accessible, Interoperable, Reusable). We address eleven challenges that may impede the full realization of the potential of FAIR knowledge graphs, as conventional solutions are perceived to be overly complex and lacking in cogni...
Long-term records and comparisons with historical data, so-called legacy data, are crucial for the precise analysis of the responses of plant and animal species to global change, including climate change and biological invasions. The most convincing evidence for impacts on biodiversity comes from long-term stu...
Looking at the EOSC Interoperability Framework, we suggest adding cognitive interoperability as another layer and provide a characterization of it. Then we outline how the Open Research Knowledge Graph could provide FAIR Digital Objects of scholarly publications that contain structured data about the contents of the publication in addition to its b...
The ORKG has opened a new era in the way scholarly knowledge is curated, managed, and disseminated. By transforming vast arrays of unstructured narrative text into structured, machine-processable knowledge, the ORKG has emerged as an essential service with sophisticated functionalities. Over the past five years, our team has developed the ORKG into...
When looking at the structures of natural language and human verbal communication that support semantic interoperability between humans, we can identify parallels to structures of data schemata. Based on this analysis, we can distinguish terminological and propositional interoperability as two aspects of semantic interoperability. Within the intero...
FAIR data presupposes their successful communication between machines and humans while preserving their meaning and reference, requiring all parties involved to share the same background knowledge. Inspired by English as a natural language, we investigate the linguistic structure that ensures reliable communication of information and draw parallels...
We take a look at how FAIR Digital Objects (FDOs) could be used for organizing the contents of scholarly publications to make them accessible to machines. Such scholarly FDOs should contain machine-actionable information about the actual contents of the publication, and not only its bibliographic metadata. We discuss how this machine-actionable inf...
The Open Research Knowledge Graph (ORKG) is a digital library for machine-actionable scholarly knowledge, with a focus on structured research comparisons obtained through expert crowdsourcing. While the ORKG has attracted a community of more than 1,000 users, the curated data has not been subject to an in-depth quality assessment so far. Here, prop...
With the exponential increase in scientific publications, new conceptual and technological tools are needed to help scientists, students, managers and policy-makers to navigate and digest current scientific knowledge. Hi Knowledge is an initiative to synthesise and visualise scientific knowledge, with an initial focus on invasion biology that is cu...
This is the video recording of the presentation "In Need of a Rosetta Stone for (Meta)Data: Learning from Natural Language to improve Semantic and Cognitive Interoperability".
Here, I present the basic ideas about a future Rosetta Stone Framework and a Rosetta Interoperability Service. I give an introduction to the terms "machine-actionability", "interoperability", "semantic interoperability", and "cognitive interoperability" and try to stay as close as possible to (English) natural language statements. I show that we wi...
I give a very basic introduction to the field of ontologies and knowledge graphs. But instead of giving a technical introduction, I start by looking at what is needed for successfully communicating terms and statements using English and discuss what readability, interpretability, and actionability of terms and statements mean for humans. From...
In order to effectively manage the overwhelming influx of data, it is crucial to ensure that data is findable, accessible, interoperable, and reusable (FAIR). While ontologies and knowledge graphs have been employed to enhance FAIRness, challenges remain regarding semantic and cognitive interoperability. We explore how English facilitates reliable...
This is a VERY brief introduction to some of the conceptual prerequisites of FAIR, interoperable, and machine-actionable (meta)data, how this relates to human communication, and how machine-actionability can be achieved in the context of ontologies and knowledge graphs.
Semantic interoperability (SI) is at the heart of the FAIR principles and of the design of large scale cross disciplinary infrastructures. The European Open Science Cloud (EOSC) is a European-wide effort towards such an infrastructure, aiming to deepen the regional research collaboration and realising a shared data space for science, research and i...
The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits computer-assisted...
Knowledge graphs and ontologies provide promising technical solutions for implementing the FAIR Principles for Findable, Accessible, Interoperable, and Reusable data and metadata. However, they also come with their own challenges. Nine such challenges are discussed and associated with the criterion of cognitive interoperability and specific FAIREr...
I give a very basic introduction to the field of ontologies and knowledge graphs. But instead of giving a technical introduction, I start by introducing some essential concepts relating to communication between humans in general, discussing what readability, interpretability, and actionability of terms and statements mean for humans. From this f...
The Open Research Knowledge Graph is an infrastructure for the production, curation, publication and use of FAIR scientific information. Its mission is to shape a future scholarly publishing and communication where the contents of scholarly articles are FAIR research data.
Making data and metadata FAIR (Findable, Accessible, Interoperable, Reusable) has become an important objective in research and industry, and knowledge graphs and ontologies have been cornerstones in many going-FAIR strategies. In this process, however, human-actionability of data and metadata has been lost sight of. Here, in the first part, I disc...
Knowledge graphs and ontologies are becoming increasingly important as technical solutions for Findable, Accessible, Interoperable, and Reusable data and metadata (FAIR Guiding Principles). We discuss four challenges that impede the use of FAIR knowledge graphs and propose semantic units as their potential solution. Semantic units structure a knowl...
Background: In times of exponential data growth in the life sciences, machine-supported approaches are becoming increasingly important and with them the need for FAIR (Findable, Accessible, Interoperable, Reusable) and eScience-compliant data and metadata standards. Ontologies, with their queryable knowledge resources, play an essential role in pro...
Knowledge graphs and ontologies are becoming increasingly important in the context of making data and metadata findable, accessible, interoperable, and reusable (FAIR). We introduce the concept of Semantic Units for organizing Knowledge Graphs into identifiable and semantically meaningful subgraphs. Each Semantic Unit is represented in the graph by...
In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information - an essential resource for modern economies - primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF...
The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the last 250 years, research on insect systematics has generated hundreds of terms for naming and comparing those phenotypes. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits comput...
A new and uniquely structured matrix of mammalian phenotypes, MaTrics (Mammalian Traits for Comparative Genomics) in a digital form is presented. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics.
MaTrics was developed within a project aimed to...
Background: The size, velocity, and heterogeneity of Big Data outclass conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic gr...
I will present the concept of bona fide and fiat boundaries and review the problems involved with this approach for demarcating natural units, with examples from the life sciences, focussing on issues revolving around granularity and frames of reference. I will then introduce an alternative approach that focusses on the concept of causal unity.
Video recording of the presentation
My presentation will address some of the conceptual and technical challenges currently faced by developers and users of knowledge graph applications. These include hurdles that need to be overcome in order for users without prior knowledge of semantics to fully exploit the potential of knowledge graphs for themselves. I introduce the ideas of Knowl...
I introduce the concept of semantic units and explain how they can be implemented in a knowledge graph and how they could improve the overall explorability of FAIR knowledge graphs (FAIR+E). I also introduce the concept of Knowledge Graph Building Blocks (KGBBs) and show screenshots of a Python proof-of-concept prototype web application that uses...
This manual has been developed in the framework of the project ‘Identifying genomic loci underlying mammalian phenotypic variability using Forward Genomics’ which was funded by the Leibniz Gemeinschaft (SAW-2016-SGN-2). The project brought together an interdisciplinary scientific network covering the fields of morphology (Work Modules M1/M2, ‘Pheno...
This document is an edited version of the original funding proposal entitled 'ORKG: Facilitating the Transfer of Research Results with the Open Research Knowledge Graph' that was submitted to the European Research Council (ERC) Proof of Concept (PoC) Grant in September 2020 (https://erc.europa.eu/funding/proof-concept). The proposal was evaluated b...
A new and uniquely structured matrix of mammalian phenotypes, MaTrics (Mammalian Traits for Comparative Genomics) is presented in a digital form. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics.
MaTrics was developed as part of a project to li...
The transfer of knowledge has not changed fundamentally for many hundreds of years: It is usually document-based - formerly printed on paper as a classic essay and nowadays as PDF. With around 2.5 million new research contributions every year, researchers drown in a flood of pseudo-digitized PDF publications. As a result research is seriously weakene...
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heter...
This is a video recording of the presentation.
With the volume of publications growing exponentially each year, it is becoming increasingly important to provide machine-based assistance other than the typical search engines for finding scholarly contents relevant to researchers. The Open Research Knowledge Graph (ORKG) is developing tools for translating unstructured text into machine-actionabl...
The transfer of knowledge has not changed fundamentally for many hundreds of years: It is usually document-based - formerly printed on paper as a classic essay and nowadays as PDF. With around 2.5 million new research contributions every year, researchers drown in a flood of pseudo-digitized PDF publications. As a result research is seriously weake...
With the volume of publications growing exponentially each year, it is becoming increasingly important to provide machine-based tools for finding contents relevant to researchers. This requires translating unstructured text into machine-actionable data, for instance in the form of graphs that contain information from ontologies and that comply with...
This is a kind of introductory talk for my new job, during which I summarize what I have mainly been doing over the last 15 years.
After briefly talking about how Big Data led to a new driving force for empirical research, resulting in eScience, I discuss why morphology cannot participate in eScience. I suggest solutions that solve most of morphology's problems that prevent it from participating in eScience. These solutions involve ontologies and semantic technology. I introd...
The landscape of currently existing repositories of specimen data consists of isolated islands, with each applying its own underlying data model. Using standardized protocols such as DarwinCore or ABCD, specimen data and metadata are exchanged and published on web portals such as GBIF. However, data models differ across repositories. This can lead...
Currently, morphological data and metadata are still mostly published as unstructured free texts, which lack semantic transparency, cannot be parsed by computers, and do not comply with the FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016) data principles, thus hampering their reuse by non-experts and their integration ac...
We would like to present FAIR Research Data: Semantic Knowledge Graph Infrastructure for the Life Sciences (in short, FAIR.ReD), a project initiative that is currently being evaluated for funding. FAIR.ReD is a software environment for developing data management solutions according to the FAIR (Findable, Accessible, Interoperable, Reusable; Wilkin...
Most morphological data are still published as unstructured texts. This has far-reaching consequences for the Findability, Accessibility, Interoperability and Reusability of morphological data and thus for their FAIRness (Wilkinson et al. 2016). The lack of FAIR morphological data significantly affects their general usability within the life scienc...
Empirical research has changed considerably in recent decades. Morphology must face several challenges in order to be competitive in the research landscape of the 21st century. With the advent of high-throughput technologies, the amount of data generated daily has grown beyond what a single researcher can explore without the help of machines. As a...
I will start with comparing different approaches of how to model metadata for a given set of data statements and how these approaches can be applied to instance-based (i.e., Instance Anatomy Knowledge Graphs) and class-based (i.e., Semantic Phenotypes) representations of phenotypic data. I will discuss limitations of standard reification and procee...
We demonstrate that ontologies are not restricted to modeling a specific domain, but can be used for programming as well. We introduce the Semantic Programming Ontology (SPrO) and its accompanying Java-based middleware, which we use as a semantic programming language. SPrO provides ontology instances as well as annotation, object, and data properti...
We introduce SOCCOMAS, a development framework for FAIR Semantic Web Content Management Systems (S-WCMS). Each S-WCMS run by SOCCOMAS has its contents managed through a corresponding knowledge base that stores all data and metadata in the form of semantic knowledge graphs in a Jena tuple store. Automated procedures track provenance, user contributi...
We introduce Semantic Ontology-Controlled application for web Content Management Systems (SOCCOMAS), a development framework for FAIR (‘findable’, ‘accessible’, ‘interoperable’, ‘reusable’) Semantic Web Content Management Systems (S-WCMSs). Each S-WCMS run by SOCCOMAS has its contents managed through a corresponding knowledge base that stores all d...
Background: Currently, almost all morphological data are published as unstructured free text descriptions. This not only brings about terminological problems regarding semantic transparency, which hampers their re-use by non-experts, but the data cannot be parsed by computers either, which in turn hampers their integration across many fields in the...
Currently, morphological data are still mostly published as unstructured free text descriptions, which lack semantic transparency and cannot be parsed by computers, thus hampering their re-use by non-experts and their integration across many fields in the life sciences. With an ever-increasing amount of available ontologies and the development of a...
Computer-parsability of data is becoming increasingly important in the age of eScience. Ontologies play an essential role in providing eScience-compliant data and metadata. Here, we start by taking a closer look at the relationship between a morphologist’s personal perceptions of the anatomy of a given specimen and any textual representation o...
Almost all morphological data are still being published as unstructured textual morphological descriptions. This has far-reaching consequences regarding the findability, accessibility, intelligibility, and comparability of morphological data, substantially impeding their overall usability within the life sciences. However, with an increasing amount...
After a brief introduction to the distinction between bona fide and fiat boundaries and their role in identifying natural units in the material realm, I discuss specific problems relating to their operational criteria that considerably limit their applicability in practical empirical research. I continue with discussing a new approach of characteri...
We present a prototype of a semantic version of Morph·D·Base that is currently in development. It is based on SOCCOMAS, a semantic web content management system that is controlled by a set of source code ontologies together with a Java-based middleware and our Semantic Programming Ontology (SPrO). The middleware interprets the descriptions containe...
About a decade ago we discussed the Linguistic Problem of Morphology, i.e., the lack of a standardized morphological terminology and of a standardized and formalized method of description. Here we report on the ongoing development of a description module for Morph∙D∙Base (https://proto.morphdbase.de) that is based on semantic programming technology. The module a...
A significant amount of morphological data is still provided as unstructured text. This is unfortunate in times of eScience in which data are increasingly available in a semantically structured format. Here we report on ongoing development of a new description module for Morph∙D∙Base (https://proto.morphdbase.de) that is based on semantic programmi...
SOCCOMAS is a ready-to-use Semantic Ontology-Controlled Content Management System (http://escience.biowikifarm.net/wiki/SOCCOMAS). Each web content management system (WCMS) run by SOCCOMAS is controlled by a set of ontologies and an accompanying Java-based middleware with the data housed in a Jena tuple store. The ontologies describe the behavior of...
SOCCOMAS is a ready-to-use Semantic Ontology-Controlled Content Management System (http://escience.biowikifarm.net/wiki/SOCCOMAS). Each web content management system (WCMS) run by SOCCOMAS is controlled by a set of ontologies and an accompanying Java-based middleware with the data housed in a Jena tuple store. The ontologies describe the behavior o...
Providing data in a semantically structured format has become the gold standard in data science. However, a significant amount of data is still provided as unstructured text - either because it is legacy data or because adequate tools for storing and disseminating data in a semantically structured format are still missing. We have developed a descr...
Conventional approaches to phylogeny reconstruction require a character analysis step prior to and methodologically separated from a numerical tree inference step. The former results in a character matrix that contains the empirical data analysed in the latter. This separation of steps involves various methodological and conceptual problems (e.g. h...
The talk is divided into three parts. The first part introduces the notion of Semantic Instance Anatomies as an ontology-based method of morphological description (that can be expanded to cover also trait descriptions) that has the potential to establish a new theoretical and methodological framework for comparative biology by enabling the quantifi...
Morphology is one of the oldest academic disciplines within the life sciences, but has not received much attention over the last decades in terms of public interest and research funding. This may be in part due to i) the problems non-experts face when attempting to use morphological data in their research and ii) the widespread image of...
After a brief introduction in which I distinguish two types of real entities (i.e. instances vs. kinds) and their textual representations (i.e. assertional statements vs. universal statements) as well as two types of Anatomy (i.e. instance anatomy vs. canonical anatomy) and their representations in semantic graphs (i.e. ’factual’ morphological desc...
Taxonomists produce a myriad of phenotypic descriptions. Traditionally, these are provided in terse (telegraphic) natural language. As in other fields of biology, researchers are exploring ways to formalize parts of the taxonomic process so that aspects of it are more computational in nature. The currently used data formalization...
The presentation is in German! This is an introduction to SOCCOMAS, a semantic ontology-controlled content management system application with which you can describe within a set of application ontologies all functions of a Web-CMS application, including the structure and composition of its interface, all input forms and data life-cycle, and the app...
The presentation is in German! In this talk I provide the background for why we are developing a new module for Morph•D•Base with which users will be able to generate highly formalized and computer-readable semantic morphological descriptions through a sophisticated interface. I start with a brief characterization of what morphology is and talk abo...
Ontologies are usually utilized for representing knowledge. Here, we extend this use and demonstrate that ontologies also can be used for describing and controlling semantic Web-Content-Management-Systems (WCMS). We call the resulting application SOCCOMAS: a self-describing and content-independent application for semantic ontology-controlled Web-Co...
Creating an application for recording and documenting morphological data in a semantically transparent and reproducible way used to be a challenging task due to the heterogeneous nature of data within this domain. To provide a system for morphologists and taxonomists to work with their research data, collaborate and publish it, we built a Web-based...
The graphical user interface is the common primary interface between humans and computer systems. In order to make the complex content of a particular-purpose-system in biodiversity science accessible to the user in an intuitive way, it is often necessary to develop new types of interface modules that meet the specific requirements of the domain. T...
High-throughput technologies enabled us to produce more data than we could manage. Because in many cases adequately analyzing and interpreting these data require the collection of relevant metadata, the amount of information that has to be recorded and subsequently managed is even larger. As a consequence, new technologies and applications for know...
We demonstrate the early prototype of a new module for Morph·D·Base that allows the generation of highly formalized semantic morphological descriptions (http://escience.biowikifarm.net/wiki/EScience-Compliant_Standards_for_Morphology). The resulting morphological descriptions follow the individuals-based Instance Anatomy data scheme (as opposed to...
The coding of dependent morphological characters represents a major methodological problem in phylogenetics. Based on a distinction of semantic and ontological logical character dependency, I suggest how inapplicables can be treated properly and introduce rules of mutually dependent character states, which specify how character states of one charac...
Questions
Question (1)
We are currently developing a Semantic Programming Ontology (SPrO) together with an accompanying Java middleware structure that functions as a compiler (https://www.researchgate.net/project/Semantic-Programming-Ontology-SPrO). The terms of SPrO can be used for describing a semantic web-based content management system within a set of application ontologies. The Java middleware understands these descriptions as declarative specifications and dynamically executes programming code based on them (https://www.researchgate.net/project/SOCCOMAS-A-Web-Content-Management-System-based-on-a-Semantic-Programming-Ontology).
In other words, we are developing a kind of ontology-based programming language. I searched the Web for similar approaches and projects, but could not find anything comparable.
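To make the general idea concrete, here is a minimal sketch of a middleware that reads a declarative description and executes code based on it. It is only an illustration under strong assumptions: it is written in Python rather than Java, the description is a plain list of subject-predicate-object triples rather than an OWL ontology, and every term in it (`ex:onClick`, `ex:command`, `ex:setValue`, and so on) is invented for this example and not taken from SPrO.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical application description as subject-predicate-object triples.
# None of these terms come from SPrO; they are invented for illustration.
APP_DESCRIPTION: List[Triple] = [
    ("ex:SaveButton", "ex:onClick", "ex:step1"),
    ("ex:step1", "ex:command", "ex:setValue"),
    ("ex:step1", "ex:target", "entry_status"),
    ("ex:step1", "ex:value", "saved"),
    ("ex:step1", "ex:next", "ex:step2"),
    ("ex:step2", "ex:command", "ex:appendLog"),
    ("ex:step2", "ex:value", "entry saved"),
]


def properties_of(subject: str, triples: List[Triple]) -> Dict[str, str]:
    """Collect all predicate/object pairs stated for one subject."""
    return {p: o for s, p, o in triples if s == subject}


def set_value(state: dict, props: Dict[str, str]) -> None:
    """Write the described value into the described target field."""
    state[props["ex:target"]] = props["ex:value"]


def append_log(state: dict, props: Dict[str, str]) -> None:
    """Append the described value to a log list in the state."""
    state.setdefault("log", []).append(props["ex:value"])


# The only imperative part: a fixed table from command terms to functions.
COMMANDS: Dict[str, Callable[[dict, Dict[str, str]], None]] = {
    "ex:setValue": set_value,
    "ex:appendLog": append_log,
}


def trigger(widget: str, event: str, triples: List[Triple], state: dict) -> dict:
    """Run the chain of steps that the description attaches to an event."""
    step = properties_of(widget, triples).get(event)
    seen = set()  # guards against cyclic ex:next chains
    while step is not None and step not in seen:
        seen.add(step)
        props = properties_of(step, triples)
        COMMANDS[props["ex:command"]](state, props)
        step = props.get("ex:next")
    return state


if __name__ == "__main__":
    print(trigger("ex:SaveButton", "ex:onClick", APP_DESCRIPTION, {}))
```

In this sketch the behavior of the application is changed by editing the triples, not the Python code; the interpreter only knows a small, fixed set of commands.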
I would like to know whether any of you has come across an ontology-based programming language.