Stefan Schulz

Medical University of Graz · Institute of Medical Computer Sciences, Statistics and Documentation

Doctor of Medicine

397
Publications
62,636
Reads
4,228
Citations
Introduction
Stefan Schulz holds a full professorship at the Institute of Medical Computer Sciences, Statistics and Documentation, Medical University of Graz, Austria. He does research in artificial intelligence and computing applied to health care and the life sciences. In addition, he works at Averbis GmbH, Freiburg, Germany, as head of medical research projects. Averbis is a leading provider of NLP solutions tailored to medical language in Germany.
Additional affiliations
August 2018 - present
Averbis GmbH
Position
  • Head of Department
January 2011 - December 2012
January 2011 - present
Federal University of Pernambuco

Publications (397)
Chapter
Full-text available
Transfer learning has demonstrated its potential in natural language processing tasks, where models have been pre-trained on large corpora and then tuned to specific tasks. We applied pre-trained transfer models to a Spanish biomedical document classification task. The main goal is to analyze the performance of text classification by clinical speci...
Article
Full-text available
Background Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain such as the UMLS metathesaurus or SNOMED CT are widely used for this purpose, but with limitations such as lexical ambiguity of clinical t...
Article
Full-text available
Early identification of patients with life-threatening risks such as delirium is crucial in order to initiate preventive actions as quickly as possible. Despite intense research on machine learning for the prediction of clinical outcomes, the acceptance of the integration of such complex models in clinical routine remains unclear. The aim of this s...
Article
Objective: Machine learning models trained on electronic health records have achieved high prognostic accuracy in test datasets, but little is known about their embedding into clinical workflows. We implemented a random forest-based algorithm to identify hospitalized patients at high risk for delirium, and evaluated its performance in a clinical s...
Article
Full-text available
Word embeddings have become the predominant representation scheme at the token level for various clinical natural language processing (NLP) tasks. More recently, character-level neural language models, exploiting recurrent neural networks, have again received attention, because they achieved similar performance against various NLP benchmarks. We inve...
Article
Processes like the care of type 2 diabetes mellitus patients require support by information systems considering the heterogeneity of the actors from different domains involved, enabling harmonization and integration of their specific methodologies and knowledge representation approaches towards interdisciplinary cooperation. Currently, the developm...
Article
Full-text available
Acronyms frequently occur in clinical text, which makes their identification, disambiguation and resolution an important task in clinical natural language processing. This paper contributes to acronym resolution in Spanish through the creation of a set of sense inventories organized by clinical specialty containing acronyms, their expansions, and c...
Article
Frequent utilization of the Intensive Care Unit (ICU) is associated with higher costs and decreased availability for patients who urgently need it. Common risk assessment tools, like the ASA score, lack objectivity and account for only some influencing parameters. The aim of our study was (1) to develop a reliable machine learning model predictin...
Article
Full-text available
Background The amount of patient-related information within clinical information systems accumulates over time, especially in cases where patients suffer from chronic diseases with many hospitalizations and consultations. The diagnosis or problem list is an important feature of the electronic health record, which provides a dynamic account of a pat...
Chapter
Full-text available
Clinical data interoperability requires shared specifications of meaning. This is the rationale for clinical data standards. Up until now, the adoption of such standards has been varied, although they are increasingly advocated in an area where proprietary specifications prevail, and semantic resources are geared to specific purposes and limited by...
Article
Clinical information systems contain free-text entries in different contexts to be used in a variety of application scenarios. In this study we investigate to what extent diagnosis codes using the disease classification system ICD-10 can be automatically post-assigned to patient-based short problem list entries (50 characters maximum). Classifiers...
Article
Full-text available
SNOMED CT provides about 300,000 codes with fine-grained concept definitions to support interoperability of health data. Coding clinical texts with medical terminologies is not a trivial task and is prone to disagreements between coders. We conducted a qualitative analysis to identify sources of disagreements on an annotation experiment which us...
Conference Paper
Full-text available
Ontology engineering is error-prone, and many published ontologies suffer from quality problems. This paper initiates a discussion about how axiomatically rich foundational ontologies can help prevent and detect bad ontology design. Example T-boxes are presented, and it is demonstrated how typical design errors can be detected by upper...
Article
Background Semantic interoperability of eHealth services within and across countries has been the main topic in several research projects. It is a key consideration for the European Commission to overcome the complexity of making different health information systems work together. This paper describes a study within the EU-funded project...
Article
Terminologies facilitate data exchange and enable laboratories to assist in patient care even if complex treatment pathways involve multiple stakeholders. This paper examines the three common terminologies Nomenclature for Properties and Units (NPU), Logical Observation Identifiers Names and Codes (LOINC), and SNOMED Clinical Terms (SNOMED CT). The...
Article
Full-text available
Background Terminologies facilitate data exchange and enable laboratories to assist in patient care even if complex treatment pathways involve multiple stakeholders. This paper examines the three common terminologies Nomenclature for Properties and Units (NPU), Logical Observation Identifiers Names and Codes (LOINC), and SNOMED Clinical Terms (SNOM...
Article
Full-text available
Patients with multiple disorders usually have long diagnosis lists, constituted by ICD-10 codes together with individual free-text descriptions. These text snippets are produced by physicians overwriting standardized ICD code topics at the point of care. They provide highly compact expert descriptions within a 50-character long text field fre...
Article
The terminological content of SNOMED CT, the world's largest clinical terminology, is linked to description logics expressions, which supports considering SNOMED CT a formal ontology. The Terminology Quality Assurance (TQA) of such a terminology resource is hampered by errors in modeling, which act as a barrier for the successful use of electron...
Poster
Full-text available
In 2017, the TREC conference held the Precision Medicine Track, with the challenge of finding relevant documents from two collections, namely biomedical abstracts and clinical trials, given a set of 30 input topics representing cancer patients. We proposed a free and open-source (FOSS) Java framework for design, testing, and validation of ranking str...
Article
An important SNOMED CT use case is to support semantic interoperability between electronic health records and aggregation terminologies such as ICD. From the ongoing alignment exercise between SNOMED CT and the new version of ICD, now in its pre-final form, we studied whether the ambiguity of clinical language as displayed by SNOMED CT synonyms ham...
Article
Organised repositories of published scientific literature represent a rich source for research in knowledge representation. MEDLINE, one of the largest and most popular biomedical literature databases, provides metadata for over 24 million articles each of which is indexed using the MeSH controlled vocabulary. In order to reuse MeSH annotations for...
Article
The use of electronic health records for risk prediction models requires a sufficient quality of input data to ensure patient safety. The aim of our study was to evaluate the influence of incorrect administrative diabetes coding on the performance of a risk prediction model for delirium, as diabetes is known to be one of the most relevant variables...
Article
Clinical models are artefacts that specify how information is structured in electronic health records (EHRs). However, the makeup of clinical models is not guided by any formal constraint beyond a semantically vague information model. We address this gap by advocating ontology design patterns as a mechanism that makes the semantics of clinical mode...
Conference Paper
Full-text available
In this paper we report on our participation in the TREC 2017 Precision Medicine track (team name: imi_mug). We submitted 5 fully automatic runs to both the biomedical articles and clinical trials subtasks, focusing strongly on the former. Our system was based on Elasticsearch, whose queries were generated modularly via our own open source framewo...
Conference Paper
Full-text available
Historically, numerous indirect references to real world phenomena have been conserved in literature. High-quality libraries of digitized books and their derivatives (like the Google NGram Viewer) have proliferated. These tools simplify the visualization of trends in phrase usage within the collective memory of language groups. A straightfo...
Article
Full-text available
Clinical narratives are typically produced under time pressure, which incites the use of abbreviations and acronyms. To expand such short forms in a correct way eases text comprehension and further semantic processing. We propose a completely unsupervised and data-driven algorithm for the resolution of non-lexicalised and potentially ambiguous abbr...
Article
Full-text available
Background: Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontolo...
Conference Paper
The emergence of electronic health records has highlighted the need for semantic standards for representation of observations in laboratory medicine. Two such standards are LOINC, with a focus on detailed encoding of lab tests, and SNOMED CT, which is more general, including the representation of qualitative and ordinal test results. In this paper...
Article
Full-text available
SNOMED CT supports post-coordination, a technique to combine clinical concepts to ontologically define more complex concepts. This technique follows the validity restrictions defined in the SNOMED CT Concept Model. Pre-coordinated expressions are compositional expressions already in SNOMED CT, whereas post-coordinated expressions extend its content...
Article
BioTop is a domain upper level ontology for the life sciences, based on OWL DL, introduced ten years ago. This paper provides an update of the current state of this resource, with a special focus on BioTop's top level, BioTopLite, which currently contains 55 classes, 37 object properties and 247 description logics axioms. A bridging file allows har...
Article
'A solid ontology-based analysis with a rigorous formal mapping for correctness' is one of the ten reasons why the HL7 standard Fast Healthcare Interoperability Resources (FHIR) is advertised to be better than other standards for EHR interoperability. In this paper, we aim at contributing to this formal analysis by proposing an RDF representation o...
Article
Full-text available
Historically, numerous indirect references to real world phenomena have been conserved in literature. High-quality libraries of digitized books and their derivatives (like the Google NGram Viewer) have proliferated. These tools simplify the visualization of trends in phrase usage within the collective memory of language groups. A straightforward in...
Article
The time has come to end unproductive competitions among different types of biomedical terminology artefacts. Tools and strategies to create the foundation of a seamless environment covering clinical jargon, clinical terminologies, and classifications are necessary. Whereas language processing relies on human interface terminologies, which represen...
Article
Routine patient data in electronic patient records are only partly structured, and an even smaller segment is coded, mainly for administrative purposes. Large parts are only available as free text. Transforming this content into a structured and semantically explicit form is a prerequisite for querying and information extraction. The core of the sy...
Conference Paper
Full-text available
Clinical narratives in electronic health record systems are a rich resource of patient-based information. They constitute an ongoing challenge for natural language processing, due to their high compactness and abundance of short forms. German medical texts exhibit numerous ad-hoc abbreviations that terminate with a period character. The disambigu...
Article
Full-text available
Background Objectives of this work are to (1) present an ontological framework for the TNM classification system, (2) exemplify this framework by an ontology for colon and rectum tumours, and (3) evaluate this ontology by assigning TNM classes to real world pathology data. Methods The TNM ontology uses the Foundational Model of Anatomy for anatomic...
Article
Full-text available
Unprincipled modeling decisions in large-domain ontologies, such as SNOMED CT, are problematic and might act as a barrier for their quality assurance and successful use in electronic health records. Most previous work has focused on clustering problematic concepts, which is helpful for quality control but faces difficulties in pinpointing the origi...
Conference Paper
Full-text available
Motivation: In general, the meaning of biological database records is not sufficiently specified from an ontological point of view. We explore the options for an ontology-based integration and interpretation of database content of individuals, defined classes, dispositions and a combination of these. Results: Four interpretation models are created,...
Article
Full-text available
Background In biomedical applications where the size and complexity of SNOMED CT become problematic, using a smaller subset that can act as a reasonable substitute is usually preferred. In a special class of use cases—like ontology-based quality assurance, or when performing scaling experiments for real-time performance—it is essential that modules...
Conference Paper
Full-text available
ASSESS CT is an EU funded project that aims at contributing to better semantic interoperability of eHealth services in Europe. Its main goal is the investigation of the fitness of the international clinical terminology SNOMED CT as a potential standard for EU-wide eHealth deployments. This panel will report on the current use of SNOMED CT's in Euro...
Conference Paper
Medical natural language statements uttered by physicians are usually graded, i.e., are associated with a degree of uncertainty about the validity of a medical assessment. This uncertainty is often expressed through specific verbs, adverbs, or adjectives in natural language. In this paper, we look into a representation of such graded statements by...
Article
The vast amount of clinical data in electronic health records constitutes a great potential for secondary use. However, most of this content consists of unstructured or semi-structured texts, which is difficult to process. Several challenges are still pending: medical language idiosyncrasies in different natural languages, and the large variety of...
Article
The construction and publication of predications from scientific literature databases like MEDLINE is necessary due to the large amount of resources available. The main goal is to infer meaningful predicates between relevant co-occurring MeSH concepts manually annotated from MEDLINE records. The resulting predications are formed as subject-predicat...
Article
Current systems that target Patient Safety (PS) like mandatory reporting systems and specific vigilance reporting systems share the same information types but are not interoperable. Ten years ago, WHO embarked on an international project to standardize quality management information systems for PS. The goal is to support interoperability between di...
Article
Big data resources are difficult to process without a scaled hardware environment that is specifically adapted to the problem. The emergence of flexible cloud-based virtualization techniques promises solutions to this problem. This paper demonstrates how a billion lines can be processed in a reasonable amount of time in a cloud-based environment...
Article
It is investigated whether the content of the Joint Linearization for Mortality and Morbidity Statistics of the 11th ICD revision can be semantically represented by formalisms acting on the clinical terminology SNOMED CT, viz. the IHTSDO Compositional Grammar (CG) and the Expression Constraint Language (ECL). Whereas CG provides a composition synta...
Article
Full-text available
The Internet and social media are becoming ubiquitous technologies that are transforming the health sector. Social media has become an avenue for accessing, creating and sharing health information among patients and healthcare professionals. Furthermore, social media has become a key feature in many eHealth solutions, including wearable technologie...
Article
The goal of this work is to contribute to a smooth and semantically sound inter-operability between the ICD-11 (International Classification of Diseases-11th revision Joint Linearization for Mortality, Morbidity and Statistics) and SNOMED CT (SCT). To guarantee such inter-operation between a classification, characterized by a single hierarchy of mu...
Article
Full-text available
The identification of relevant predicates between co-occurring concepts in scientific literature databases like MEDLINE is crucial for using these sources for knowledge extraction, in order to obtain meaningful biomedical predications as subject-predicate-object triples. We consider the manually assigned MeSH indexing terms (main headings and subhe...
Article
We here describe JuFiT, an easily adjustable rule engine which filters non-natural terms (i.e., ones usually not occurring in running citation texts) from the UMLS Metathesaurus and even adds new terms to the UMLS (by rewriting non-natural terms). Unlike previous attempts (with MetaMap or Casper), JuFiT serves multilingual purposes in that...
Conference Paper
The objectives of this work are (1) to develop a classifier application for tumor staging based on a formal representation of the Tumor-Node-Metastasis classification system (TNM), and (2) to show the feasibility of this approach on real data. This paper presents a classifier application for colorectal tumors based on the TNM-O ontology. It was dev...
Article
Full-text available
Due to fundamental differences in design and editorial policies, semantic interoperability between two de facto standard terminologies in the healthcare domain – the International Classification of Diseases (ICD) and SNOMED CT (SCT), requires combining two different approaches: (i) axiom-based, which states logically what is universally true, using...
Article
This study introduces ontological aspects concerning the Telehealth Ontology (TEON), an ontology that represents formal-ontological content concerning the delivery of telehealth services. TEON formally represents the main services, actors and other entity types relevant to telehealth service delivery. TEON uses the upper level ontology BioTopLite2...
Article
The massive accumulation of biomedical knowledge is reflected by the growth of the literature database MEDLINE with over 23 million bibliographic records. All records are manually indexed by MeSH descriptors, many of them refined by MeSH subheadings. We use subheading information to cluster types of MeSH descriptor co-occurrences in MEDLINE by proc...
Article
Full-text available
The integration of heterogeneous ontologies is often hampered by different upper level categories and relations. We report on an on-going effort to align clinical terminology/ontology SNOMED CT with the formal upper-level ontology BioTopLite. This alignment introduces several constraints at the OWL-DL level. The mapping was done manually by analysi...
Article
Full-text available
In Western languages the period character is highly ambiguous, due to its double role as sentence delimiter and abbreviation marker. This is particularly relevant in clinical free-texts characterized by numerous anomalies in spelling, punctuation, vocabulary and with a high frequency of short forms. The problem is addressed by two binary classifier...
Article
Full-text available
A variety of rich terminology systems, such as thesauri, classifications, nomenclatures and ontologies support information and knowledge processing in health care and biomedical research. Nevertheless, human language, manifested as individually written texts, persists as the primary carrier of information, in the description of disease courses or t...
Article
Patients with chronic diseases undergo numerous in- and outpatient treatment periods, and therefore many documents accumulate in their electronic records. We report on an on-going project focussing on the semantic enrichment of medical texts, in order to support recall-oriented navigation across a patient's complete documentation. A document pool o...
Article
Full-text available
Quality management information systems for safety as a whole or for specific vigilances share the same information types but are not interoperable. An international initiative tries to develop an integrated information model for patient safety and vigilance reporting to support a global approach to health care quality.
Article
Translating huge medical terminologies like SNOMED CT is costly and time consuming. We present a methodology that acquires substring substitution rules for single words, based on the known similarity between medical words and their translations, due to their common Latin / Greek origin. Character translation rules are automatically acquired from pa...
Article
Chronic diseases such as Type 2 Diabetes Mellitus (T2DM) constitute a big burden to the global health economy. T2DM Care Management requires a multi-disciplinary and multi-organizational approach. Because of different languages and terminologies, education, experiences, skills, etc., such an approach establishes a special interoperability challenge...
Article
Full-text available
Semantic interoperability, i.e., preserving meaning across health-related data, is one of the crucial topics of health informatics. The International Classification of Diseases, by WHO, and SNOMED CT, by IHTSDO, are the most prominent systems currently available for coding health data. In 2010 a collaboration agreement between the maintainers of I...
Article
To improve semantic interoperability of electronic health records (EHRs) by ontology-based mediation across syntactically heterogeneous representations of the same or similar clinical information. Our approach is based on a semantic layer that consists of: (1) a set of ontologies supported by (2) a set of semantic patterns. The first aspect of the...