Sebastian Köhler’s research while affiliated with American Dental Association and other places


Publications (104)


The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics
  • Article

March 2025 · 16 Reads · 3 Citations · Genetics

Ray Stefancsik · [...]
Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.


Comprehensive reanalysis for CNVs in ES data from unsolved rare disease cases results in new diagnoses
  • Article
  • Full-text available

October 2024 · 412 Reads · 3 Citations · npj Genomic Medicine

We report the results of a comprehensive copy number variant (CNV) reanalysis of 9171 exome sequencing datasets from 5757 families affected by a rare disease (RD). The data reanalysed was extremely heterogeneous, having been generated using 28 different enrichment kits by 42 different research groups across Europe partnering in the Solve-RD project. Each research group had previously undertaken their own analysis of the data but failed to identify disease-causing variants. We applied three CNV calling algorithms to maximise sensitivity, and rare CNVs overlapping genes of interest, provided by four partner European Reference Networks, were taken forward for interpretation by clinical experts. This reanalysis has resulted in a molecular diagnosis being provided to 51 families in this sample, with ClinCNV performing the best of the three algorithms. We also identified partially explanatory pathogenic CNVs in a further 34 individuals. This work illustrates the value of reanalysing ES cold cases for CNVs.


The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics

September 2024 · 33 Reads · 2 Citations



The Human Phenotype Ontology in 2024: phenotypes around the world

November 2023 · 288 Reads · 144 Citations · Nucleic Acids Research

The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.


Fig. 1 Data filtering workflow for cases and Orphanet data. A Filtering of cases from the data freeze 1 cohort extraction; the initial population was defined by redefining solved and unsolved cases, and the final study population was obtained after excluding cases with no HPO annotation. B Filtering of the Orphanet database for active clinical entities to prepare the final reference data (ORPHAcodes).
Fig. 2 Whole analytic process: cases are submitted to the RD-Connect Genome-Phenome Analysis Platform (GPAP) and then processed for phenotypic similarity calculations. Variant candidates detected by GPAP after reanalysis and filtration, drawn from the selected genes, are added to the phenotypic/genotypic results. From these data, Cytoscape JS computes networks that give clinicians a visual interpretation of case clusters.
Fig. 3 Schema of the three complementary approaches A, B and C: A, ORPHAcodes around the triggering case; B, cases around the triggering case; C, cases around ORPHAcodes similar to the triggering case.
Phenotypic similarity-based approach for variant prioritization for unsolved rare disease: a preliminary methodological report

November 2023 · 78 Reads · 6 Citations · European Journal of Human Genetics

Rare diseases (RD) have a prevalence of not more than 1/2000 persons in the European population, and are characterised by the difficulty experienced in obtaining a correct and timely diagnosis. According to Orphanet, 72.5% of RD have a genetic origin although 35% of them do not yet have an identified causative gene. A significant proportion of patients suspected to have a genetic RD receive an inconclusive exome/genome sequencing. Working towards the International Rare Diseases Research Consortium (IRDiRC)’s goal for 2027 to ensure that all people living with a RD receive a diagnosis within one year of coming to medical attention, the Solve-RD project aims to identify the molecular causes underlying undiagnosed RD. As part of this strategy, we developed a phenotypic similarity-based variant prioritization methodology comparing submitted cases with other submitted cases and with known RD in Orphanet. Three complementary approaches based on phenotypic similarity calculations using the Human Phenotype Ontology (HPO), the Orphanet Rare Diseases Ontology (ORDO) and the HPO-ORDO Ontological Module (HOOM) were developed; genomic data reanalysis was performed by the RD-Connect Genome-Phenome Analysis Platform (GPAP). The methodology was tested in 4 exemplary cases discussed with experts from European Reference Networks. Variants of interest (pathogenic or likely pathogenic) were detected in 8.8% of the 725 cases clustered by similarity calculations. Diagnostic hypotheses were validated in 42.1% of them and needed further exploration in another 10.9%. Based on the promising results, we are devising an automated standardized phenotypic-based re-analysis pipeline to be applied to the entire unsolved cases cohort.
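To make the idea of phenotypic similarity calculation concrete, here is a minimal Jaccard-style sketch over ancestor-closed sets of HPO terms. This is a deliberately simplified illustration, not the method used in the paper (which combines HPO, ORDO and HOOM with more sophisticated semantic similarity measures); the toy parent table below covers only a handful of terms:

```python
# Illustrative sketch: phenotypic similarity between two cases as the
# Jaccard index over their ancestor-closed sets of HPO terms.
# The tiny parent table below is a toy fragment, not the full HPO.

TOY_HPO_PARENTS = {
    "HP:0001250": ["HP:0012638"],   # Seizure
    "HP:0001252": ["HP:0012638"],   # Hypotonia
    "HP:0012638": ["HP:0000707"],   # Abnormal nervous system physiology
    "HP:0000707": ["HP:0000118"],   # Abnormality of the nervous system
    "HP:0000118": [],               # Phenotypic abnormality (root-ish)
}

def ancestors(term, parents=TOY_HPO_PARENTS):
    """Return the term plus all of its ancestors (reflexive closure)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def jaccard_similarity(case_a, case_b):
    """Jaccard index over the ancestor closures of each case's HPO terms."""
    closure_a = set().union(*(ancestors(t) for t in case_a))
    closure_b = set().union(*(ancestors(t) for t in case_b))
    return len(closure_a & closure_b) / len(closure_a | closure_b)

# Two cases annotated with a single (different) phenotype each still
# share their common ancestors, so their similarity is non-zero.
sim = jaccard_similarity({"HP:0001250"}, {"HP:0001252"})
```

Expanding terms to their ancestors before comparing is what lets two cases annotated with sibling phenotypes cluster together, which is the intuition behind grouping cases around a triggering case or around similar ORPHAcodes.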


The Entity-Quality model enables composing biological attributes in a way that is compatible with the logical definitions of widely used ontologies such as the MP and HPO which are used to document phenotypes associated with diseases or genes. On the right is a specific example of a human phenotype term, “Hypolysinemia” (HP:0500142), which means a lower than normal amount of lysine in the blood. The EQ (phenotypic effect) on the left is not only used to logically define Hypolysinemia, but also the mouse phenotype “decreased circulating lysine level” (MP:0030719). This ensures that an automated reasoner can compute the appropriate relationship between the two (in this case equivalence), as well as to the specific biological attribute they are concrete manifestations of (“blood lysine amount”). Representing phenotype and phenotypic attributes this way enables the grouping of quantitative variant data (e.g. GWAS) and qualitative variant data (e.g. MGI)
Overview of the OBA Workflow. The OBA matching pipeline searches existing trait ontologies for new terms and proposes suitable EQ fillers. The OBA editors curate EQ fillers (new ones and the ones proposed by the matching pipeline). The ODK then compiles the curated terms into OWL and imports all the referenced terms (EQ fillers) from their respective external ontologies, e.g. Uberon, into a special import module
DOS-DP template example. The fillers declared in the template above (attribute, entity) are mapped to the respective column names in the TSV file below. A specialised tool reads both files and generates the axioms specified by the template file
Distribution of OBA attributes across categories and qualities
The Ontology of Biological Attributes (OBA)—computational traits for the life sciences

April 2023 · 201 Reads · 18 Citations · Mammalian Genome

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
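The pattern-based term generation behind OBA's modular design (a DOS-DP template whose named fillers are drawn from a TSV table, as described in the workflow figure captions above) can be sketched roughly as follows. The template string, labels and rows here are hypothetical stand-ins, and real DOS-DP tooling emits OWL axioms rather than strings:

```python
# Rough sketch of DOS-DP-style template filling: a template with named
# slots ({quality}, {entity}) is combined with rows of a table to
# produce one EQ-style class expression per trait. All values below
# are hypothetical illustrations, not actual OBA content.

EQ_TEMPLATE = "'{quality}' and ('inheres-in' some '{entity}')"

ROWS = [
    {"label": "blood lysine amount", "quality": "amount", "entity": "lysine in blood"},
    {"label": "heart size", "quality": "size", "entity": "heart"},
]

def expand(template, rows):
    """Fill the template's named slots from each row of the table."""
    return {row["label"]: template.format(**row) for row in rows}

definitions = expand(EQ_TEMPLATE, ROWS)
# definitions["heart size"] -> "'size' and ('inheres-in' some 'heart')"
```

Because every generated term follows the same pattern, an automated reasoner can classify the resulting hierarchy from the logical definitions alone, which is the "automated and meaningful classification" the abstract refers to.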


The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences

January 2023 · 205 Reads

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.



Figure 1. Example of mappings between different identifiers representing statements about similarity or identity of concepts across resources and vocabularies. Even with this simplified example, it is possible to see a range of mapping types, and that providing information about each mapping is crucial to understanding the bigger picture. This information helps avoid errors such as mistakenly conflating two variants of a disease.
Figure 2. Example of basic SSSOM mapping model with some illustrative mapping metadata elements.
A Simple Standard for Sharing Ontological Mappings (SSSOM)

May 2022 · 385 Reads · 77 Citations · Database

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec
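To make the table-based format concrete, here is a minimal sketch of reading an SSSOM-style TSV in Python. The file content is invented for illustration, and the metadata handling is deliberately simplified: in real SSSOM files the `#`-prefixed header lines carry YAML mapping-set metadata, which would normally be parsed with a YAML library rather than kept as raw lines:

```python
# Illustrative sketch: splitting a minimal SSSOM TSV into its
# '#'-prefixed metadata header and its tab-separated mapping rows.
# The embedded file content is a made-up example mapping set.
import csv
import io

SSSOM_TSV = """\
# mapping_set_id: https://example.org/mappings/demo.sssom.tsv
# license: https://creativecommons.org/publicdomain/zero/1.0/
subject_id\tpredicate_id\tobject_id\tmapping_justification
HP:0001250\tskos:exactMatch\tMP:0002064\tsemapv:ManualMappingCuration
"""

def read_sssom(text):
    """Split an SSSOM TSV into (raw metadata lines, list of mapping dicts)."""
    meta, rows = [], []
    for line in text.splitlines():
        (meta if line.startswith("#") else rows).append(line)
    mappings = list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t"))
    return meta, mappings

meta, mappings = read_sssom(SSSOM_TSV)
```

Because the mappings themselves are a plain TSV table, they drop straight into ordinary data science tooling (spreadsheets, pandas, SQL loaders) without any ontology parsing, which is the point the abstract makes about integrating with existing pipelines.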


Figure 3. An example SSSOM TSV table, with a table header (lines that start with #, shown in purple) that contains the mapping set metadata, followed by the mappings. This example is from the sssom.tsv file for ECTO, the environmental exposure ontology (19).
A Simple Standard for Sharing Ontological Mappings (SSSOM)

December 2021 · 222 Reads

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy to use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.


Citations (75)


... Applying open, community-driven ontologies within the CCR will allow for integration of species-specific terms with precise mappings between terms, allowing for cross-species comparisons [25]. ...

Reference:

The missing link: Electronic health record linkage across species offers opportunities for improving One Health
The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics
  • Citing Article
  • March 2025

Genetics

... While this approach may allow to identify more VUS and result in unclear diagnosis, it may allow to identify causative variants in genes beyond restricted gene panel testing. Furthermore, this approach may allow for re-analysis of data to yield future diagnosis when novel genes are identified or tools for variant detection in short read-sequencing data become available [32][33][34]. Second, the cases analyzed in this study were referred for genetic counseling and thus the cohort may be enriched for patients with a suspected family history and/or specific cancer type. For example, a relatively high number of cases in the studied cohort(s) developed breast cancer. ...

Comprehensive reanalysis for CNVs in ES data from unsolved rare disease cases results in new diagnoses

npj Genomic Medicine

... The most widely used DL is the Web Ontology Language (OWL) [15], which also serves as the main exchange language for ontologies such as ChEBI. DL reasoning has been widely employed by a number of other ontologies such as the Gene Ontology [16], the Uberon anatomy [17], the Cell Ontology [18], and several phenotype and disease ontologies [19]. It is used as part of many ontology release pipelines to automatically classify portions of the ontology hierarchy [20]. ...

The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics
  • Citing Preprint
  • September 2024

... Additionally, using Human Phenotype Ontology (HPO) annotations can help identify rare syndromes where neutropenia is not a primary diagnostic feature and involves genes not typically linked to hematological or immunological disorders. 22 Finally, using exome or genome sequencing for patients with suspected CN removes the need for clinicians to have exhaustive knowledge of all immunological disorders or rare syndromes that may include hematological features in their clinical spectrum. This broader approach significantly enhances diagnostic efficiency and accuracy. ...

The Human Phenotype Ontology in 2024: phenotypes around the world
  • Citing Article
  • November 2023

Nucleic Acids Research

... Comparing phenotypic and clinical ("pheno-clinical") data is essential in both research and clinical settings, facilitating informed decisions across a wide range of diseases [1,2]. For instance, the use of similarity matching contributes to the accurate diagnosis of diseases by aligning patient profiles with comparable cases [3][4][5]. Similarity matching also has been used in the development of human disease networks, grouping diseases based on common traits to deepen our understanding of their origins [6]. These analytical methods are key in propelling medical research forward and improving patient care, offering essential insights into a variety of medical conditions. ...

Phenotypic similarity-based approach for variant prioritization for unsolved rare disease: a preliminary methodological report

European Journal of Human Genetics

... Structuring breed (e.g., with VBO), phenotype (e.g., with Ontology for Biological Attributes [49]), and disease data (e.g., with the Mondo Disease Ontology [39]) using ontologies makes clinical and research data computable and better informs clinical decision support tools that help veterinarians prioritize differential diagnoses and diagnostics even if they have not seen a similar case previously. While not sufficient to make treatment decisions, when they use quality data the support tool predictions inform veterinarians and researchers on avenues and directions for additional investigations. ...

The Ontology of Biological Attributes (OBA)—computational traits for the life sciences

Mammalian Genome

... The increasing use of genomic tests in the clinical practice has revealed another attractive outcome for unresolved cases, as periodic reanalysis of available genomic data can identify new disease-causing genes, even after years, and finally reveal the definitive diagnosis [4][5][6][7][8]. In recent years, great efforts have been made in the field of pediatric rare disease diagnostics, clearly suggesting WES as a first-line approach in the routine clinical practice [3,[9][10][11][12]. ...

Solving patients with rare diseases through programmatic reanalysis of genome-phenome data

... Due to the inherent fragmentation of rare disease individuals across institutions and health systems, interoperability is essential for enabling consistent data interpretation and exchange 6 . Adoption of medical ontologies, Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR), and the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema designed by the Monarch Initiative enables reliable data exchange, supporting registries, secondary data use, and precision medicine [6][7][8][9] . However, many hospital information systems either lack rare disease-specific data or do not support the integration of these standards natively, posing barriers to clinical and translational use 10 . ...

The GA4GH Phenopacket schema defines a computable representation of clinical data
  • Citing Article
  • June 2022

Nature Biotechnology

... The resulting dataset includes many ChEBI classes that are rarely used in the biochemical literature, so we then created a biologist "slim" of C3POv237, intended to capture the most biologically relevant subset. To do this, we used mappings [48] as a proxy, for biological relevance, on the assumption that classes that are mapped to metabolomics identifiers or identifiers in pathway databases such as KEGG [49] are more likely to be relevant. ChEBI also includes mappings to PubMed and Wikipedia, and we assume these correspond more to relevant classes. ...

A Simple Standard for Sharing Ontological Mappings (SSSOM)

Database