© Schattauer 2018 Methods Inf Med Open 01/2018
Smart Medical Information
Technology for Healthcare (SMITH)*
Data Integration based on Interoperability Standards
Alfred Winter1; Sebastian Stäubert1; Danny Ammon2; Stephan Aiche3; Oya Beyan4;
Verena Bischoff5; Philipp Daumke6; Stefan Decker4; Gert Funkat7; Jan E. Gewehr8;
Armin de Greiff9; Silke Haferkamp10; Udo Hahn11; Andreas Henkel2; Toralf Kirsten12;
Thomas Klöss13; Jörg Lippert14; Matthias Löbe1; Volker Lowitsch10; Oliver Maassen15;
Jens Maschmann16; Sven Meister17; Rafael Mikolajczyk18; Matthias Nüchter12; Mathias
W. Pletz19; Erhard Rahm20; Morris Riedel21; Kutaiba Saleh2; Andreas Schuppert22; Stefan
Smers7; André Stollenwerk23; Stefan Uhlig24; Thomas Wendt25; Sven Zenker26; Wolfgang
Fleig27,**; Gernot Marx15,**; André Scherag28, 29,**; Markus Löffler1,**
1Leipzig University, Institute of Medical Informatics, Statistics and Epidemiology, Leipzig, Germany;
2University Medical Center Jena, Central Service Provider For Information Technology, Jena, Germany;
3SAP SE, Potsdam, Germany;
4RWTH Aachen University, Chair of Computer Science 5, Aachen, Germany;
5University of Leipzig Medical Center, Division Staff and Justice, Leipzig, Germany;
6Averbis GmbH, Freiburg, Germany;
7University of Leipzig Medical Center, Division Information Management, Leipzig, Germany;
8University Medical Center Hamburg-Eppendorf, Business Division for Information Technology, Hamburg, Germany;
9Essen University Hospital, Central Information Technology, Essen, Germany;
10RWTH Aachen University Hospital, Division Information Technology, Aachen, Germany;
11Friedrich-Schiller-Universität Jena, Language & Information Engineering Lab (JULIE Lab), Jena, Germany;
12Leipzig University, LIFE Research Centre for Civilization Diseases, Leipzig, Germany;
13Martin-Luther-Universität Halle-Wittenberg Medical Center, Medical Director, Halle, Germany;
14Bayer AG, Wuppertal, Germany;
15RWTH Aachen University Hospital, Department of Intensive Care and Intermediate Care, Aachen, Germany;
16University Medical Center Jena, Medical Director, Jena, Germany;
17Fraunhofer Institute for Software and Systems Engineering, Dortmund, Germany;
18Martin-Luther-Universität Halle-Wittenberg, Institute of Medical Epidemiology, Biometry and Informatics, Halle, Germany;
19University Medical Center Jena, Institute of Infectious Diseases and Infection Control, Jena, Germany;
20Leipzig University, Department of Computer Science – Database Group, Leipzig, Germany;
21Forschungszentrum Jülich, Jülich Supercomputing Centre, Jülich, Germany;
22RWTH Aachen University, Institute for Computational Biomedicine II, Aachen, Germany;
23RWTH Aachen University, Informatik 11 – Embedded Software, Aachen, Germany;
24RWTH Aachen University, Medical Faculty, Dean, Aachen, Germany;
25University of Leipzig Medical Center, Data Integration Center, Leipzig, Germany;
26University of Bonn Medical Center, Department of Anesthesiology and Intensive Care Medicine, Bonn, Germany;
27University of Leipzig Medical Center, Medical Director, Leipzig, Germany;
28University Medical Center Jena, Center for Sepsis Control and Care, Jena, Germany;
29University Medical Center Jena, Institute of Medical Statistics, Computer and Data Sciences (IMSID), Jena, Germany
Keywords
Health information interoperability, integrated advanced information management systems, systems integration, phenotyping, intensive care, antimicrobial stewardship

Correspondence to:
Prof. Alfred Winter
Leipzig University
Institute of Medical Informatics, Statistics and Epidemiology
Haertelstr. 16-18
04107 Leipzig

Methods Inf Med 2018; 57(Open 1): e92–e105
received: March 5, 2018
accepted: May 7, 2018

This work has been supported by the German Federal Ministry of Education and Research (Grant Nos. 01ZZ1609A, 01ZZ1609B, 01ZZ1609C, 01ZZ1803A, 01ZZ1803B, 01ZZ1803C, 01ZZ1803D, 01ZZ1803E, 01ZZ1803F, 01ZZ1803G, 01ZZ1803H, 01ZZ1803I, 01ZZ1803J, 01ZZ1803K, 01ZZ1803L, 01ZZ1803M,
* Supplementary material published on our
** Shared senior authorship

Focus Theme – Original Articles
German Med. Informatics Initiative
© Georg Thieme Verlag KG 2018, License terms: CC-BY-NC-ND

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. “Smart Medical Information Technology for Healthcare (SMITH)” is one
of four consortia funded by the German
Medical Informatics Initiative (MI-I) to create
an alliance of universities, university hospit-
als, research institutions and IT companies.
SMITH’s goals are to establish Data Inte-
gration Centers (DICs) at each SMITH partner
hospital and to implement use cases which
demonstrate the usefulness of the approach.
Objectives: To give insight into architectural design issues underlying SMITH data integration and to introduce the use cases to be implemented.
Governance and Policies: SMITH implements a federated approach both for its governance structure and for its information system architecture. SMITH has designed a
generic concept for its data integration
centers. They share identical services and
functionalities to take best advantage of the
interoperability architectures and of the data
use and access process planned. The DICs
provide access to the local hospitals’ Elec-
tronic Medical Records (EMR). This is based
on data trustee and privacy management ser-
vices. DIC staff will curate and amend EMR
data in the Health Data Storage.
Methodology and Architectural Frame-
work: To share medical and research data,
SMITH’s information system is based on com-
munication and storage standards. We use
the Reference Model of the Open Archival In-
formation System and will consistently imple-
ment profiles of Integrating the Health Care
Enterprise (IHE) and Health Level Seven (HL7)
standards. Standard terminologies will be ap-
plied. The SMITH Market Place will be used
for devising agreements on data access and
distribution. 3LGM² for enterprise architec-
ture modeling supports a consistent develop-
ment process.
The DIC reference architecture determines the
services, applications and the standards-
based communication links needed for effi-
ciently supporting the ingesting, data nour-
ishing, trustee, privacy management and data
transfer tasks of the SMITH DICs. The refer-
ence architecture is adopted at the local sites.
Data sharing services and the market place
enable interoperability.
Use Cases: The methodological use case
“Phenotype Pipeline“ (PheP) constructs algo-
rithms for annotations and analyses of pa-
tient-related phenotypes according to clas-
sification rules or statistical models based on
structured data. Unstructured textual data
will be subject to natural language process-
ing to permit integration into the phenotyp-
ing algorithms. The clinical use case “Algorith-
mic Surveillance of ICU Patients” (ASIC) fo-
cusses on patients in Intensive Care Units
(ICU) with the acute respiratory distress syn-
drome (ARDS). A model-based decision-sup-
port system will give advice for mechanical
ventilation. The clinical use case HELP devel-
ops a “hospital-wide electronic medical rec-
ord-based computerized decision support sys-
tem to improve outcomes of patients with
blood-stream infections” (HELP). ASIC and
HELP use the PheP. The clinical benefit of the
use cases ASIC and HELP will be demonstrated in a change-of-care clinical trial based on a stepped-wedge design.
Discussion: SMITH’s strength is the modu-
lar, reusable IT architecture based on interop-
erability standards, the integration of the hos-
pitals’ information management departments
and the public-private partnership. The pro-
ject aims at sustainability beyond the first
4-year funding period.
1. Introduction and Objectives
“Using the wealth of data for improved
patient care” [1] is the motto motivating
the German Federal Ministry of Education
and Research for funding the projects in
the German Medical Informatics Initiative
(MI-I) and introduced in this special issue.
Better usage of the wealth of data can con-
tribute to validating the relevance of novel
diagnostic and therapeutic methods as well
as to directly improving care. Patient care
would benefit from the availability of
actionable, data-driven decision support
standardized across care delivering organ-
izations, e.g. to optimize diagnosis and
therapy in intensive care patients with lung
failure or antibiotic therapy of patients with
blood-stream infections.
Taking up the initiative of the German
Federal Ministry of Education and Re-
search, Leipzig University and University
Hospital Leipzig, Jena University Hospital
and Friedrich-Schiller-University Jena,
University Hospital RWTH Aachen and
RWTH Aachen University founded the
“Smart Medical Information Technology
for Healthcare (SMITH)” consortium in
order to better use the wealth of data for
the sake of patients. In addition, the follow-
ing research organizations and companies
joined the initial SMITH set-up: SAP SE,
Fraunhofer Institute for Software and Sys-
tems Engineering, Bayer AG, März Inter-
network Services AG, Averbis GmbH, ID
GmbH & Co. KGaA, Forschungszentrum
Jülich – Jülich Supercomputing Centre. In
a second consolidation phase, University
Hospitals Halle, Bonn, Hamburg and Essen
completed the SMITH consortium.
The SMITH consortium has set the following goals to be reached by 2022:
• to implement an overarching concept of data sharing and data ownership;
• to establish synchronized Data Integration Centers (DIC) at each SMITH partner hospital, with two major responsibilities: (1) implementing the interoperability processes and the architecture of the SMITH transinstitutional information system SMItHIS; (2) providing data curation, data sharing and trustee functions;
• to implement three use cases in order to demonstrate the usefulness of the DICs.
This paper’s objective is to give insight
into fundamental design issues underlying
SMITH. It will especially introduce the
governance of SMITH and its basic pol-
icies, explain the overall SMItHIS architec-
ture to be implemented and the methods
used, and will give more details of the use
cases to be implemented in order to show
the feasibility of the architecture.
2. Governance and Policies
The SMITH consortium’s governance structure and the SMItHIS architecture result from a strictly federated approach. Instead of one central DIC, local
DICs are coordinated by appropriate com-
mittees. These committees aim at integrat-
ing the interests and activities of patient
care, i.e. the member hospitals, of research,
i.e. the member medical faculties and non-
university research institutions, and of the
industrial partners. Therefore, leadership
as well as means for participation are care-
fully balanced.
In this governance system, the Super -
visory Board bears responsibility for the
SMITH consortium and the overall project.
This board nominates the delegates for the
National Steering Committee of the MI-I,
which coordinates the activities of all fund -
ed consortia in the MI-I. The Supervisory
Board appoints the Executive Board.
The Executive Board bears the scientific
responsibility for SMITH and steers the
project. It is supported by the External Ad-
visory Board whose members are selected
international experts in the field.
The Coordination Board represents
delegates from all sub-projects and from all
partners and coordinates their work.
Besides local management units at
each partner’s site, there is a consortium
management unit for SMITH located in
Leipzig. The consortium management unit
takes care of operations of SMITH as a
whole and is supported by the local man-
agement units. Operations include the
entire project management, controlling,
contracting, public relations and communication.
Based on the governance structure,
functionally identical DICs will be set up in
Leipzig, Jena, Aachen and, subsequently, in
Halle, Bonn, Hamburg and Essen. DIC
staff will have the same job descriptions
and will use common standard operating
procedures. The DICs will adhere to the
following policies:
Local access to the HIS: The DICs have
to provide access to the data of the Elec-
tronic Medical Records (EMR) in the
local hospital information systems
(HIS). This access is subject to German
legislation on the use of patient data and
established data protection and privacy
regulations. Each DIC will analyze and
annotate patient data on an individual
basis within the hospital. Authorized
local DIC staff will have access to the
EMR in the locally operational HIS sys-
tem. This requires the DIC to be organ-
izationally linked to the hospital and
technically interoperable with its in-
formation system and with the clinical
documentation procedures (cf. Data
and Metadata Transfer Management
(DMT) in section 3.2.1).
Trustee and Consent for Research: Ger-
many has strict regulations concerning
data protection and privacy. Research
on patient data (with local amendments,
annotations etc.) is only possible if pa-
tient consent is provided. We anticipate
that multilevel consent will be provided.
This will range from consenting to data
being shared regarding anonymous or
pseudonymous healthcare data to ad -
ditional consent for specific research
projects. The DIC provides the relevant
services for the trustee center (cf. Data
Trustee and Privacy Management
(DTP) in section 3.2.1).
Health Data Storage (HDS) to provide
electronic health records (EHR): The
local HISs typically exhibit different IT
architectures and consist of numerous
specialized application components,
which are also currently being refined.
However, functionally identical Health
Data Storages will be established in the
participating institutions to maintain
standardized and interoperable EMRs
curated and amended by the DIC staff
(cf. SMItHIS architecture in section 3.2).
Organization: Overall management pol-
icies will be identical across the DICs.
However, we will have local sub-policies
on organizational embedding.
In order to further strengthen Medical In-
formatics, education in Health and Bio-
medical Informatics at different levels will
be driven by systematic, evidence-based
and outcome-oriented curriculum devel-
opment. This process is concerted by the
Joint Expertise Center for Teaching
(SMITH-JET) as part of the SMITH gov-
ernance structure.
3. Architectural Framework
and Methodology
3.1 Methodology
3.1.1 Open Archival Information
System (OAIS)
Our approach to data integration centers
and their organizational structure as well as
their tasks is inspired by the Reference
Model for an Open Archival Information
System (OAIS) [2]. This model provides a
framework, including terminology and
concepts, for describing and comparing
architectures and operations of archives
and thus for sharing their content. OAIS is
the most common standard for archival
organizations (ISO Standard 14721:2012).
OAIS conformant archives have to take
care of proper ingest of Submission In-
formation Packages (“raw data”) from
producers and their transformation to
Archival Information Packages (“metada-
ta-enriched data”), which are processed by
Data Management and Archival Storage.
Consumers order or query Dissemination
Information Packages (“exported data”)
which are processed by the archive. OAIS is
helpful in structuring tasks within the DIC
without prescribing implementation de-
tails. It is already used as guideline for data
and model sharing at the LIFE Research
Center for Civilization Diseases [3, 4, 5]
and the Leipzig Health Atlas project [6].
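The OAIS package flow just described can be sketched in a few lines of Python. The three package names follow the standard; the field names and the metadata-enrichment step are illustrative assumptions, not part of the SMITH design.

```python
from dataclasses import dataclass, field

@dataclass
class SubmissionInformationPackage:      # SIP: "raw data" from a producer
    producer: str
    content: dict

@dataclass
class ArchivalInformationPackage:        # AIP: "metadata-enriched data"
    content: dict
    metadata: dict = field(default_factory=dict)

@dataclass
class DisseminationInformationPackage:   # DIP: "exported data" for a consumer
    consumer: str
    content: dict

def ingest(sip: SubmissionInformationPackage) -> ArchivalInformationPackage:
    """Transform a SIP into an AIP by attaching provenance metadata (illustrative)."""
    return ArchivalInformationPackage(
        content=sip.content,
        metadata={"producer": sip.producer, "schema": "example-v1"},
    )

def disseminate(aip: ArchivalInformationPackage, consumer: str) -> DisseminationInformationPackage:
    """Answer a consumer's order or query with a DIP (illustrative)."""
    return DisseminationInformationPackage(consumer=consumer, content=aip.content)

# Example round trip: producer -> archive -> consumer
aip = ingest(SubmissionInformationPackage(producer="UH lab system",
                                          content={"loinc": "718-7", "value": 13.2}))
dip = disseminate(aip, consumer="PheP researcher")
```

The point of the sketch is the separation of concerns: ingest enriches, the archive stores, dissemination exports; OAIS prescribes these roles without prescribing their implementation.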
3.1.2 Three Layer Graph Based Meta
Model (3LGM²) for Enterprise
Architecture Modeling
In order to provide an integrated and con-
sistent view of the design of the entire sys-
tem of data processing and data exchange
within SMITH and beyond, we decided to
use an enterprise architecture modeling
approach. Using the three-layer graph-based meta model 3LGM² for modeling health information systems [7], especially transinstitutional information systems [8], the entire information system of SMITH with its local institutional components can be described by concepts on three layers [9]:
The domain layer describes an in-
formation system independent of its im-
plementation by the tasks it supports
(e.g. “Ingesting Data and Knowledge in
DIC Health Data Storage”). Tasks need
certain data and provide data for other
processes. In 3LGM models, these data
are represented as entity types.
The logical tool layer focusses on appli-
cation components supporting tasks (e.g.
“Health Data Storage (HDS)”, “Data In-
tegration Engine”). Application compo-
nents are responsible for the processing,
storage, and transfer of data. Computer-
based application components are in-
stalled software. Interfaces ensure the
communication among application
components and, therefore, enable their interoperability.
The physical tool layer consists of physi-
cal data processing systems (e.g. person-
al computers, servers, switches, routers,
etc.), which are connected in a network.
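The three layers and their inter-layer links can be illustrated with a minimal sketch. The example task and component names are taken from the text above; the dictionary-based representation is only an illustrative stand-in for a real 3LGM² model.

```python
# Domain layer: tasks and the entity types (data) they use or update
domain = {
    "Ingesting Data and Knowledge in DIC Health Data Storage":
        {"uses": ["EMR data"], "updates": ["EHR data"]},
}

# Logical tool layer: application components and the tasks they support
logical = {
    "Health Data Storage (HDS)":
        ["Ingesting Data and Knowledge in DIC Health Data Storage"],
    "Data Integration Engine":
        ["Ingesting Data and Knowledge in DIC Health Data Storage"],
}

# Physical tool layer: data processing systems hosting the components
physical = {"dic-server-01": ["Health Data Storage (HDS)", "Data Integration Engine"]}

def components_supporting(task: str) -> list:
    """Cross-layer query: which application components support a given task?"""
    return [c for c, tasks in logical.items() if task in tasks]
```

Such cross-layer queries (task to component, component to hardware) are exactly what an enterprise architecture model is used for during design and quality assessment.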
This approach for structuring and describ-
ing information systems turned out to be
an appropriate basis for not only describing
health information systems at the interface
between patient care and research [10] but
also for assessing their quality [11].
3LGM model data for both the domain
and the logical tool layer have been col-
lected at several workshops and interviews
from all partners of the SMITH consor-
tium. In the first step, we collected tasks
and entity types (i.e. the data) needed or
produced by the tasks in workshops with
experts in the domain of data sharing in
and between health care and medical re-
search. In the second step, the workshops
were held with the CIOs and their experts
of health information systems architectures
and standards-based communication. The
application components found and their
interfaces and the communication links be-
tween them were connected to the tasks
and to the entity types determined by the
domain experts. Thus, we developed a
blueprint for the transinstitutional SMITH
information system, which integrates the
domain layer view of tasks and entity types
used with a tool layer view of applications,
services and communication. This was the
basis for the SMITH DIC reference archi-
tecture design explained in more depth in section 3.2.2. The continuously updated 3LGM² model will be used during the entire project as an evolving blueprint for the development process.
3.1.3 Integrating the Healthcare
Enterprise (IHE)
Our goal is to share medical and research
data not only inside the SMITH consor-
tium but also with a stepwise growing
number of partners in the near future.
Therefore, we decided to design the logical
tool layer of SMITH’s information system
strictly based on communication and stor-
age standards as far as possible. In particular, we decided to implement IHE integration profiles, which describe “the precise and coordinated implementation of communication standards, such as DICOM, HL7, W3C and security standards” [12]
(DICOM: Digital Imaging and Communi-
cations in Medicine; W3C: World Wide
Web Consortium). IHE profiles ensure pro-
cessual interoperability, since they define
how application systems in a certain role
(described as actor, e.g. “document con-
sumer”, “document provider”, “document
repository”) can communicate with other
application systems through certain transactions.
In particular, the following IHE inte-
gration profiles have been used for design-
ing the SMITH DIC reference architecture:
Patient Identifier Cross-referencing (PIX)
profile for pseudonym management
across all SMITH sites.
Patient Demographics Query (PDQ)
profile for querying demographic data
from any patient management system of
a SMITH partner hospital.
Cross-Enterprise Document Sharing
(XDS) profile for sharing documents
within an affinity domain, which us -
ually covers (parts of) a SMITH partner
Cross-Community Access (XCA) profile
for sharing documents across affinity
domains, e.g. between SMITH partner hospitals.
Basic Patient Privacy Consents (BPPC)
and Advanced Patient Privacy Consents
(APPC) profiles for recording patients’
privacy consents in a way that they can
be used for controlling access to docu-
ments via XDS and XCA.
The Cross-Enterprise User Assertion
(XUA) profile for checking and con-
firming the identity of persons or sys-
tems trying to access data.
The Audit Trail and Node Authenti-
cation (ATNA) profile for providing
data integrity and confidentiality even
in case local interim copies of confiden-
tial data have to be allowed for safety reasons.
The Cross-Enterprise Document Work-
flow (XDW) profile to define and use
workflows upon document sharing pro-
files like XDS.
The Cross-Community Patient Discovery
(XCPD) profile to find patient data
across the SMITH partner hospitals.
The Cross-Community Fetch (XCF) pro-
file: like XCA, this profile accounts for
sharing documents, but in a simplified
way. In SMITH, we also use this profile
in a modified way to transport FHIR
queries and resources.
The Personnel White Pages (PWP) inte-
gration profile for managing user con-
tact information.
The Healthcare Provider Directory
(HPD) profile for managing healthcare
provider information, including roles
and access rights.
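The role of the PIX profile listed above, resolving a patient's identifier in one domain to the corresponding pseudonym in another, can be illustrated with a toy cross-reference table. The identifier values and domain names are invented for the example and do not reflect the actual SMITH pseudonym scheme.

```python
from typing import Optional

# PIX-style cross-reference: one internal link ID per patient, mapped to
# (assigning-authority domain -> local identifier) pairs (invented values)
xref = {
    "link-0001": {"leipzig-uh": "PAT-4711", "jena-uh": "PSN-A93F"},
}

def pix_query(domain: str, local_id: str, target_domain: str) -> Optional[str]:
    """Return the patient's identifier in target_domain, if cross-referenced."""
    for ids in xref.values():
        if ids.get(domain) == local_id:
            return ids.get(target_domain)
    return None
```

In a real deployment this lookup would be a PIX query transaction against the trustee's cross-reference manager rather than an in-memory dictionary.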
These profiles have been adapted to the
planned architecture at the logical tool
layer of the SMITH information system.
Using 3LGM, we mapped IHE profile
actors to the planned application compo-
nents and used the transactions as tem-
plates for defining the applications’ com-
munication interfaces. This endeavor was
supported by predefined templates and
corresponding workflows which have been
described in more detail in [13].
3.1.4 Communication Standards HL7
IHE profiles, especially XDS and XCA,
support the shared use of medical docu-
ments. For this purpose, the HL7 Clinical
Document Architecture (CDA) provides
specifications for various types of docu-
ments typically used in clinical care. In this
XML-based format, the CDA level de-
scribes a degree of structuring in the XML
body, ranging from level 1 (containing only
textual information) via level 2 (coded sec-
tions) to level 3 (structured entries).
Since we do not only want to share
documents but discrete individual data as
well, we took advantage of recent develop-
ments of IHE profiles using Fast Healthcare
Interoperability Resources (FHIR) [14].
FHIR-defined “resources” can be used to arrange patient data, e.g. about allergies and intolerances, family member histories or medication requests, and to provide them for access by remote application systems. Modern web-based application programming interface (API) technology, especially RESTful HTTP, can be used to access
this data using certain operations, which
are defined as “interactions” by FHIR.
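As a sketch of the RESTful "read" interaction, the snippet below builds the request URL for an AllergyIntolerance resource and extracts a coded value from a minimal FHIR JSON payload. The base URL and the payload are invented for illustration; a real client would issue an HTTP GET against a FHIR server and receive such a document in the response body.

```python
import json

FHIR_BASE = "https://fhir.example.org/R4"  # hypothetical endpoint

def read_url(resource_type: str, resource_id: str) -> str:
    """FHIR 'read' interaction: GET [base]/[type]/[id]."""
    return f"{FHIR_BASE}/{resource_type}/{resource_id}"

# Minimal (invented) AllergyIntolerance payload, as a server might return it
payload = json.loads("""{
  "resourceType": "AllergyIntolerance",
  "id": "ai-1",
  "code": {"coding": [{"system": "http://snomed.info/sct",
                        "code": "91936005",
                        "display": "Allergy to penicillin"}]},
  "patient": {"reference": "Patient/p-1"}
}""")

# Navigate the resource to the human-readable display of the coded allergy
display = payload["code"]["coding"][0]["display"]
```

The same resource-plus-interaction pattern covers the other examples from the text (family member histories, medication requests) by swapping the resource type in the URL.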
Since entries contained in CDA docu-
ments can be linked to FHIR resources and
several FHIR resources together can be
combined in a document, both formats are
suitable to facilitate syntactic interoperabil-
ity for retrieval and exchange of clinical
care data.
3.1.5 Medical Terminologies
For a transinstitutional use of medical and
research data, these data must include or
refer to machine-processable descriptions
of their content. International medical ter-
minologies like SNOMED CT offer code
systems and define relations between sub-
ject entities for an unambiguous specifi-
cation of certain medical information. We
will set up terminology services to provide
detailed representations of conceptual en-
tities. This includes browsing terminol-
ogies, displaying concepts, facetted search-
ing in terminologies and exporting. We will
import standard terminologies for coded
medical data like the International Classifi-
cation of Diseases (ICD) and the Logical
Observation Identifiers Names and Codes
(LOINC); terminologies and ontologies
used for text mining and phenotyping, like
Medical Subject Headings (MeSH), ICD,
LOINC or Human Phenotype Ontology
(HPO); furthermore, local vocabularies
will be used in a DIC (e.g. for medication).
In addition, new vocabularies and concepts
can be created and mapped to standardized
terminologies to facilitate the use of core
data sets. These terminology services will
be based on the Common Terminology
Services 2 (CTS2) data model.
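A minimal sketch of the kind of lookup such a terminology service answers is shown below. The two entries and the function are illustrative only; a production service would implement the CTS2 read and query operations over full terminology content instead of an in-memory table.

```python
# Tiny in-memory stand-in for a terminology service (illustrative entries)
code_systems = {
    "LOINC": {"718-7": "Hemoglobin [Mass/volume] in Blood"},
    "ICD-10": {"J80": "Acute respiratory distress syndrome"},
}

def lookup(system: str, code: str) -> str:
    """Resolve a code to its display name; raises KeyError for unknown codes."""
    return code_systems[system][code]
```

The value of centralizing this in a service is that every DIC resolves the same code to the same concept, which is what makes shared phenotyping algorithms portable across sites.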
Both HL7 CDA and HL7 FHIR refer to
the use of medical terminologies for the en-
coding of clinical concepts, thus enabling
semantic interoperability of shared data.
Process-related, technical and semantic interoperability are thus linked: IHE and HL7 provide the procedure descriptions and definitions; HL7 CDA, FHIR, Clinical Quality Language (CQL), etc. define the protocols and file formats; and LOINC, SNOMED CT, etc. supply the information models.
3.1.6 Industrial Data Space (IDS)
SMITH wants to go beyond mere data ag-
gregation and therefore will develop the
SMITH Market Place (SMP) for devising
agreements on data access and distribution
(see 4.2.2). The architecture of the SMP is
based on the Industrial Data Space (IDS)
project, led by Fraunhofer ISST. IDS is a
virtual data space using standards and
common governance models to facilitate
the secure exchange and easy linkage of
data in business ecosystems. It thereby pro-
vides the basis for creating and using smart
services and innovative business processes,
while at the same time ensuring digital
sovereignty of data owners, decentralized data
management, data economy, value cre-
ation, easy linkage of data, trust and secure
data supply chains and, finally, data gov-
ernance [15].
To facilitate the above-mentioned char-
acteristics, three key architectural compo-
nents were defined within the IDS. The
connector allows the secure interconnec-
tion with data sources and execution of
apps. To mediate between different con-
nectors, the broker allows the semantical
description of data sources as well as apps.
The last component is the app store, which
manages certified apps. With respect to the
SMP, the already defined concepts should
be used to enable an all-encompassing data
market place for medical informatics, pay-
ing attention to specific requirements like
regulatory affairs (e.g. consenting, data
protection and privacy) and existing data
exchange and integration standards (see
3.1.3 and 3.1.4).
The concepts of the connector as well as
the broker will be reflected in the concep-
tualization process of the SMP. The con-
nector will be a secure and trusted soft-
ware-based endpoint for external users to
enable secure handshaking and contracting
between external participants and the SMP.
Furthermore, the connector has to be used
to initiate the onboarding process. The
broker will serve as an interface to the
metadata repository for data provisioning
and services. Like a directory, the broker
will allow external as well as internal users
to query a metadata repository in order to
fulfill a customer-specific demand.
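The broker's directory role can be sketched as a queryable metadata registry: data sources publish semantic descriptions of what they offer, and a customer-specific demand is matched against them. The source names, offered categories and matching logic are invented for the example.

```python
# Broker-style metadata registry: semantic descriptions of data sources (illustrative)
registry = [
    {"source": "DIC Leipzig", "offers": {"ventilation", "lab"}},
    {"source": "DIC Jena", "offers": {"microbiology", "lab"}},
]

def find_sources(demand: set) -> list:
    """Return sources whose offered data cover the customer-specific demand."""
    return [entry["source"] for entry in registry if demand <= entry["offers"]]
```

Contracting and the actual data transfer would then run through the connectors of the matched sources, not through the broker itself.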
3.2 Architectural Framework of SMItHIS
SMITH intends to build a medical network
of its partner hospitals, medical faculties
and research institutions. This network is
based on SMItHIS, the network’s trans -
institutional Health Information System
(tHIS) [8, 16]. SMItHIS connects the in-
formation systems of the partner insti -
tutions, including their Data Integration
Centers (DIC) and local clinical and re-
search application components. SMItHIS
integrates them by added components. In
this section, we introduce the architectural
framework for SMItHIS, which will guide
the entire development process.
In order to provide a holistic and con-
sistent view of the design of the entire sys-
tem of data processing and data exchange
within SMITH and beyond, we decided to
use the Enterprise Architecture Modeling
approach 3LGM. Using 3LGM, the en-
tire transinstitutional information system
SMItHIS with its local institutional com -
ponents can be described by concepts on
three layers. Since there are no special sol-
utions on the physical tool layer, we here
focus on the domain and logical tool layers.
SMITH has designed a generic concept
for its data integration centers. They share
identical services and functionalities to
take best advantage of the interoperability
architectures and of the data use and access
process planned. In the following section
3.2.1, we describe the main tasks of a DIC
in SMITH; they are common for the DICs
of all partners. In section 3.2.2 we describe
the application components and their in-
terfaces and communication links. We will
do this first by a generic reference model
A. Winter et al.: SMITH
German Med. Informatics Initiative
© Georg Thieme Verlag KG 2018. License terms: CC-BY-NC-ND.
Methods Inf Med Open 01/2018 © Schattauer 2018
Figure 1: High-level tasks (rectangles) and entity types (ovals). Arrows pointing from a task T to an entity type ET: ET is updated by T; arrows in the other direction: ET is used by T. Dotted lines indicate the availability of more detailed refinements. Rectangles inside other rectangles indicate part-of relations [1].
and show later how we will assemble and
integrate the local instances of the refer-
ence model.
3.2.1 SMItHIS Domain Layer:
Tasks of DIC
The domain layer describes which tasks
have to be performed in SMITH and what
data are used for these tasks.
Figure 1
gives a high-level overview of the tasks and
entity types (data) relevant for SMItHIS
from the DIC perspective (dotted lines
indicate the availability of more detailed refinements, which can be found in the figure in the Online Appendix).
Each DIC operates a Health Data Storage
(HDS) containing a local EHR (electronic
health record) [17, 18] covering both data
in the local EMR (electronic medical rec-
ord) [18] ingested from a local partner
University Hospital (UH) and data in-
gested from other sources. In addition, the
HDS contains knowledge, i.e. rules and
methods on how to nourish the ingested
data. Health Data Storage Operations in
DIC cover ingesting data and knowledge,
as well as data nourishing.
Ingesting Data and Knowledge: Con-
siderable parts of local EMR data are
still unstructured data in reports, find-
ings or discharge letters. However,
structured, i.e. discrete individual data
is needed. Taking unstructured patient
data under DIC supervision calls for in-
corporating natural language processing
(NLP) tools. Text-mining algorithms
extract information from narrative texts
and render it as structured data [19].
The needed algorithms will be devel-
oped within the so-called Research &
Development Factory. By this term, we
summarize the research and develop-
ment projects using the shared data and
deriving rules and methods, i.e. knowl-
edge, out of the data. Within the factory,
the methodological use case “Phenotype Pipeline” (PheP) will e.g. develop phe-
notyping rules (see 4.1). A DIC will as
well be able to ingest data from insur-
ance companies (record linkage) and
from patients themselves (e.g. patient
reported outcomes) by providing speci-
fic services in a Patient Portal. Ingested
patient data from EMR in a certain uni-
versity hospital and from shared re-
sources are stored as a local EHR in the
HDS. Knowledge will be ingested to the
local HDS as well. Note that ingesting data into the HDS does not necessarily mean that data has to be copied. Rather, links to the data in their sources may be stored.
Data Nourishing deals with adding value to the ingested patient data. Nourishing comprises metadata management, data curation and phenotyping.
Metadata management provides and
maintains a local metadata repository,
which describes data elements and their
semantics. It will conform to the ISO/IEC 11179 standard, which is widely used in the health care domain. As part of data and
metadata transfer management, subsets
of this catalog may be shared. Addition-
ally, semantic information from the local metadata catalog enriches patient data in the local EHR.
Curation and applying the phenotyping
rules to data in the local EHR is sup-
ported by a Rules-Engine executing
rules, which are taken from the pre-
viously ingested knowledge. The DIC is responsible for the proper operation of the Rules-Engine and thus for automatically executing the rules in daily routine.
Based on the automatic execution of
rules for data curation, data of patients
are checked for plausibility, consistency,
redundancy, etc. Curated and semanti-
cally annotated EHR data is amended
with computed phenotype tags in the
course of the phenotyping pipeline (see
Figure 4). The computed phenotype
tags are used, e.g. by computerized deci-
sion support systems during patient
care in the hospitals (cf. 4.2 and 4.3).
Nourished EHR-local data will be used
for patient care in the university hospi-
tal. Thus, the innovation cycle from
bedside to bench and back from bench
to bedside [20] is closed.
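The curation step described above can be sketched as a small set of rule checks over ingested records. The field names and plausibility limits below are illustrative assumptions; in SMITH, the Rules-Engine would execute rules taken from the ingested knowledge rather than hard-coded ones.

```python
# Sketch of automated data curation with hypothetical plausibility rules.
PLAUSIBILITY_RULES = {
    # field: (lower bound, upper bound, unit)
    "body_weight_kg": (0.3, 400.0, "kg"),
    "heart_rate_bpm": (20, 300, "bpm"),
}

def curate(record):
    """Return a list of curation findings for one EHR record."""
    findings = []
    for field, (lo, hi, unit) in PLAUSIBILITY_RULES.items():
        value = record.get(field)
        if value is None:
            findings.append(f"missing: {field}")
        elif not lo <= value <= hi:
            findings.append(f"implausible: {field}={value} {unit}")
    return findings

# A weight of 7200 (likely grams entered as kg) fails the plausibility check.
issues = curate({"body_weight_kg": 7200, "heart_rate_bpm": 72})
```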
Tagging rules derived from phenotyping
are developed and provided especially by
the use cases ASIC and HELP in the Re-
search & Development Factory (cf. 4.2 and
4.3). They use shared EHR data as input,
which is provided by the Data and Metadata Transfer Management units (DMT) in
the DICs of the partner university hospit-
als. The DMT acts under control of the
Data Use and Access committee (DUA).
Besides distribution of assets like data and
knowledge, consulting for research groups
is an important task of DMT. Depending
on privacy regulations and patients’ con-
sents, the Data Trustee and Privacy Man-
agement unit (DTP) will ensure pseudo-
nymization prior to data transfer.
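One common way to implement such pseudonymization is a keyed one-way mapping, sketched below. The article does not specify the DTP's mechanism; the HMAC approach, the key, and the identifier format are illustrative assumptions, and key handling and re-identification procedures are out of scope here.

```python
# Minimal pseudonymization sketch: a keyed one-way mapping from a patient
# identifier to a pseudonym, as a data trustee might apply before transfer.
import hashlib
import hmac

SECRET_KEY = b"held-by-the-data-trustee-only"  # hypothetical key

def pseudonymize(patient_id):
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256)
    return "PSN-" + digest.hexdigest()[:16]

# The same identifier always maps to the same pseudonym (enables linkage),
# but the mapping cannot be inverted without the trustee's key.
p1 = pseudonymize("UKA-0012345")
p2 = pseudonymize("UKA-0012345")
```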
3.2.2 SMItHIS Logical Tool Layer:
Application Components and
Services Supporting the DIC Tasks
The SMItHIS logical tool layer consists of
application components and their inter-
faces and communication links.
At each partner site, application compo-
nents are needed to support the local DIC,
i.e. supporting the execution of tasks as
well as storing and communicating data
(entity types) as discussed in the previous
section. Data and knowledge sharing be-
tween sites and between patient care and
Research & Development Factory have to
be enabled by appropriate communication
links and dedicated components. Com-
munication for data and knowledge shar-
ing at the SMItHIS logical tool layer uses
the IDS and is standards-based, if possible,
using especially IHE profiles, CDA, FHIR
and medical terminologies to ensure pro-
cessual, syntactic and semantic interoper-
ability, respectively (cf. section 3).
The architecture of the local system of
application components and their com-
munication links at each site, i.e. each of
the local sub-information systems of
SMItHIS, follows the DIC reference archi-
tecture. Using IHE profiles and the IDS
based SMITH Market Place, the local sub-
information systems are integrated in order
to build SMItHIS as a whole. Still, SMItHIS allows for local peculiarities by applying the DIC reference architecture locally.
DIC Reference Architecture
As explained before, a DIC has to ingest
data from various Data Sources, i.e. differ-
ent application components of a local HIS.
We classify communications between ap-
plication components into three categories,
A, B, and C, according to their interface type “if-type” (see legend in Figure 2).
Sources of type A are already designed ac-
cording to common coding schemes and
nomenclatures and they transfer data ac-
cording to processual standards, using e.g.
IHE profiles. Type B sources export data in standard formats, such as HL7, DICOM, etc., but lack an overarching standardization that prevents variations within a technical standard, or semantically coded metadata for a unified data exchange; these have to be added in a transformation step.
Finally, type C sources are proprietary, such
as data provided by comma-separated
(CSV) files. While data transformations
usually are not necessary for type A
sources, they need to be specified for type
B and C sources. Such data transformations consist of data type conversions,
transformations/calculations into a stan-
dard unit of measurement (e.g. weight in
kg), and additions or replacements of the
codes and labels of categorical data in ac-
cordance with a standard terminology or
value set (coding scheme).
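Such a transformation for a type C source could be sketched as follows. The field names, the gram-to-kilogram rule and the local code are illustrative assumptions; the target code 29463-7 is the LOINC code for body weight, used here only as an example of mapping to a standard terminology.

```python
# Sketch of a type-C source transformation: converting a CSV-style row
# to a standard unit and a standard terminology code.
LOCAL_TO_STANDARD_CODE = {"GEW": "29463-7"}  # hypothetical local code -> LOINC

def transform_row(row):
    value = float(row["value"])          # data type conversion (str -> float)
    unit = row["unit"]
    if unit == "g":                      # standardize weight to kg
        value, unit = value / 1000.0, "kg"
    return {
        "code": LOCAL_TO_STANDARD_CODE[row["local_code"]],  # code replacement
        "value": value,
        "unit": unit,
    }

out = transform_row({"local_code": "GEW", "value": "72500", "unit": "g"})
```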
The Data Integration Engine will execute
all kinds of data transformation and load
processes from sources into the Health
Data Storage (HDS). The HDS contains
both a component for storing HL7 FHIR
resources (Health Data Repository), provid-
ing data by RESTful interfaces [21], and an
IHE XDS Document Repository, comprising
clinical data in HL7 CDA documents [22].
Thus, the HDS is the central and harmon-
ized base for all user-specified queries, data
exchanges, reports, etc. Using the interface-
type scheme (A, B, and C), we integrate
data beyond department borders within a
single hospital, i.e., laboratory data and
pre-analytic data are stored in the same
way as treatment data from medical docu-
mentation systems. Note that, at this generic stage, we do not specify whether data of certain sources are virtually referenced by or materially copied into the HDS.
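A query against the Health Data Repository's RESTful FHIR interface can be sketched as below. The base URL is a hypothetical DIC endpoint; `patient` and `code` are standard FHIR search parameters for the Observation resource, and 2160-0 is the LOINC code for serum creatinine, used as an example.

```python
# Illustrative FHIR RESTful search URL against the Health Data Repository.
from urllib.parse import urlencode

FHIR_BASE = "https://dic.example-hospital.de/fhir"  # hypothetical endpoint

def observation_query(patient_id, loinc_code):
    """Build a FHIR search URL for a patient's observations of one code."""
    params = urlencode({"patient": patient_id, "code": loinc_code})
    return f"{FHIR_BASE}/Observation?{params}"

url = observation_query("12345", "2160-0")  # 2160-0: serum creatinine (LOINC)
```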
The data catalogue contained in the
HDS is a superset of the National Core
Data Set for Health Care [23], a commu-
nity-developed set of data from different
health care domains (“modules”). It con-
sists of basic modules for representing
demographic data, encounters, diagnoses,
procedures and medications in a standard-
ized way and extended modules, e.g. for
diagnostics, imaging, biobanking, Omics
and ICU data. It will be utilized in inter-
consortia use cases for evaluating the level
of interoperability between the consortia
described in this issue.
Metadata curation and harmonization
services will define a process to harmonize
common metadata elements from each site
by creating descriptive metadata at both
document and data element level, includ-
ing semantic, technical and provenance
metadata. Varying coding schemes and vo-
cabularies will be semantically annotated,
mapped and harmonized by alignment. A
quality management process will be set up
to ensure better metadata quality at the
metadata entry stage by applying terminol-
ogy management and semantic data valid-
ity checks. Metadata management will
strictly follow the FAIR principles [24].
Having captured data that are semantically
enriched by metadata (cf. “nourishing” in
previous section) is a prerequisite for later
analyses. Such metadata are typically pro-
vided by type A sources. Type B and C
sources necessitate the capture of meaning-
ful names and descriptions on the data el-
ement level, i.e., there is a name and a
description for each column of a data table
or class of data files, if they contain data of
lowest granularity, such as images. Such
metadata are centrally managed at each site
by a Metadata Repository as one part of the
Figure 2: SMITH-DIC Reference Architecture. Rounded rectangles represent application systems and services; small rounded rectangles represent interfaces. Lines between interfaces represent communication links. Arrows indicate the initiation of communication.
Metadata Services. Conversely, the XDS
Document Registry comprises another type
of metadata. It describes each CDA docu-
ment of the XDS Document Repository ac-
cording to the Dublin Core standard [25],
i.e., when and by whom the document has
been imported (provenance metadata).
Furthermore, just like in the Metadata Re-
pository, subsets of metadata stored in the
XDS Document Registry need to be shared
or unified, e.g. metadata on how docu-
ments are categorized into classes and
types. To support an overarching semantic
interoperability, unified metadata at the
conceptual level will be linked to inter-
nationally shared terminologies. National
and international initiatives, such as Clini-
cal Information Modeling Initiative
(CIMI), the National Metadata Repository
and the German value sets for XDS are
carefully observed.
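An ISO/IEC 11179-style entry in the local Metadata Repository could be sketched as follows. The attribute names follow the spirit of the standard but are simplified assumptions, and the example element with its SNOMED CT mapping is purely illustrative.

```python
# Sketch of an ISO/IEC 11179-style data element description.
from dataclasses import dataclass, field

@dataclass
class DataElement:
    identifier: str                 # unique within the repository
    definition: str                 # human-readable semantics
    datatype: str
    permissible_values: dict = field(default_factory=dict)
    concept_codes: list = field(default_factory=list)  # terminology links

smoking_status = DataElement(
    identifier="DE-0001",
    definition="Current smoking status of the patient",
    datatype="coded",
    permissible_values={"Y": "smoker", "N": "non-smoker", "U": "unknown"},
    concept_codes=["SNOMED-CT:365980008"],  # hypothetical mapping
)
```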
There are different Data Analytic Ser-
vices within each DIC. Various analysis
routines will be made available for data in
the HDS. These routines are used, for
example, for filtering patients (or cases)
according to specific disease entities, such
as pre-diabetes, taking data of different
sources into account (examination data,
laboratory data, and genetic data, if avail-
able). In this way, essential tasks for clinical
research (hypothesis generation, patient re-
cruitment for clinical trials) as well as for
health care (quality indicators, hospital
controlling) are seamlessly supported at the
architectural level. Similarly, data will be
prepared for sharing by the Data & Meta-
data Transfer Management Service. All consortia have agreed to develop a common data model on which data transfer between the DICs will be based. The model
will be based on published information
models and best practice examples [26, 27,
28]. The national core data set will be
mapped to an exchange model based on
the reference model. In this way, metadata
and data can be analyzed as to whether rel-
evant data and sufficient cases/patients are
available to set up an analysis project for an
intended medical hypothesis. Analysis rou-
tines running on patient data will be
executed in a privacy-preserving computing
environment [29, 30] without interaction
Figure 3: SMItHIS high-level architecture. Lines and rectangles have the same meaning as in Figure 2. Each grey-shaded application system “DIC …” is an adapted instance of the generic DIC reference model in Figure 2.
beyond the DIC; the results will then be
shared with specific applicants of analysis
projects. Large scale analysis routines util-
izing clinical data from multiple sites will
be executed in a distributed manner by
bringing algorithms to the data, e.g. by pro-
viding docker containers [31].
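The bring-algorithms-to-the-data pattern can be sketched as follows: each site executes the analysis locally and returns only an aggregate, which is then combined across sites. The site names, record structure and the small-cell suppression threshold of 5 are illustrative assumptions, not SMITH specifications.

```python
# Sketch of a privacy-preserving federated count across DICs.
def local_count(site_records, predicate):
    """Runs inside one site's privacy-preserving environment."""
    n = sum(1 for r in site_records if predicate(r))
    return n if n >= 5 else 0   # suppress small cells before sharing

def federated_count(sites, predicate):
    """Only per-site aggregates leave the DICs; row-level data never does."""
    return sum(local_count(records, predicate) for records in sites.values())

sites = {
    "DIC-Site-1": [{"ards": True}] * 12,
    "DIC-Site-2": [{"ards": True}] * 3 + [{"ards": False}] * 4,
}
total = federated_count(sites, lambda r: r["ards"])
```

Note that Site 2's count of 3 is suppressed before sharing, so the combined total reflects only cells large enough to be disclosed.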
In addition to the Data Analytic Ser-
vices, there are also other application sys-
tems that interact with the HDS, such as
the Patient Portal to support patient em-
powerment, which is shown in
Figure 2.
In addition to care-related services, pa-
tients shall be able to get detailed in-
formation on conducted clinical research
and data sharing and to “donate” their own
data to selected projects. The first step is to
provide a module showing which types of
consent a patient has given. This Consent
Creator Service also provides patients (or by
proxy medical personnel) with the oppor-
tunity to consent electronically. A second
step will provide information on the spe-
cific usage of their data in data analysis
projects. A third step will be to provide
a link to the EMR documents available.
Further web portals based on DIC services
are planned to provide additional support
for Data Trustee and Privacy Management
and Data and Metadata Transfer Manage-
ment functions.
Among others, the reference architec-
ture contains a Patient Identification and
Pseudonym Management Service, which
relies on IHE PIX and PDQ profiles to
manage patient-related identifiers inde-
pendent from clinical care or personal data.
All components are connected by the
Data Integration Engine. In this way, analy-
sis services can transparently access data of
source systems via the Health Data Storage.
The same holds for authenticated users
who have been granted data access. Both –
named users and automatically running
analysis routines – extensively use available
metadata, which are managed by the Metadata Repository and the XDS Document Registry.
All cross-community, i.e. transinstitu-
tional communication requires authenti-
cation, authorization and auditing. Access
control interoperability is crucial for a suc-
cessful and sustainable health information exchange. The Access Control Services
(ACS) use several international standards
and frameworks (e.g. IHE profiles
A/BPPC, XUA and ATNA, c.f. section
3.1.3) to fulfill access control workflows
and to ensure proper auditing. The ACS
Facade (Data Sharing Services Interface)
handles all transinstitutional communi-
cation and encapsulates the associated se-
curity aspects based on the facade pattern
of software design. It provides all IHE-
based interfaces to access structured data
and unstructured documents and a FHIR-
based interface to access discrete data ob-
jects, and enforces the appropriate consents and authorizations.
SMItHIS High-Level Architecture
Based on the DIC reference architecture,
the SMItHIS High-Level Architecture (see
Figure 3) describes the services for a
connection of all DICs, other Medical In-
formatics Initiative consortia and external
partners. The SMITH consortium intro-
duces two additional architectural con-
cepts: Data Sharing Services and the
SMITH Market Place (SMP) (see Figure 3). The high-level architecture clearly dis-
tinguishes between access and contracting
services provided for researchers and DICs
through the SMP and Data Sharing Ser-
vices. While the SMP provides services for
contracting and granting access to data by
incorporating concepts from the Fraun-
hofer Industrial Data Space (IDS), the Data Sharing Services provide functions for started projects and connect all DIC ACS Facades for access control and standardized data sharing based on IHE profiles.
The SMP will be a central contracting
endpoint for internal as well as external
data consumers to provide data and knowl-
edge to all interested researchers. As such,
it will enable researchers to identify rel-
evant data sets by providing a GUI to cre-
ate and execute feasibility queries based on
the HL7 CQL standard for queries to clinical knowledge. Once a suitable dataset has
been identified, the SMP will support re-
searchers in creating data use and access
proposals. Furthermore, the SMP will sup-
port the Data Use and Access Committee
in the review and approval process. Finally,
when the proposal is approved, the SMP
will activate the required access policies,
enabling data transfer to the applicant. A
central aspect is to enforce a contract be-
tween the data user and the data provider.
Foreseeing a potential extension of the
SMP, the contracting mechanisms will be
implemented in a generic way, enabling the
reuse of the same principles when sharing
not only data, but also analytical services,
for example. Similarly, the architecture of
the SMP will be designed to enable the in-
tegration of third-party products. The SMP
does not store any data on its own, i.e. it has no data storage functionality beyond what is required for processing the actual contracts and access requests. The SMP
will define and provide interfaces and
facades to mediate between data use and
access requests and consortia-specific data
storage and access policies (interconnec-
tion to the Data Sharing Services).
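The data use and access workflow just described can be sketched as a simple state machine. The state and action names below paraphrase the steps in the text (proposal creation, DUA review, policy activation) and are illustrative, not the SMP's actual interface.

```python
# Sketch of the SMP data use and access workflow as a state machine.
TRANSITIONS = {
    ("drafted", "submit"): "under_review",          # researcher submits proposal
    ("under_review", "approve"): "approved",        # DUA committee approves
    ("under_review", "reject"): "rejected",
    ("approved", "activate_policies"): "access_granted",  # SMP enables transfer
}

def advance(state, action):
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} not allowed in state {state!r}")

state = "drafted"
for action in ("submit", "approve", "activate_policies"):
    state = advance(state, action)
```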
The Data Sharing Services include an
overarching identity provider (IHE HPD)
for managing all participating identities
and their roles for clinical studies and
analysis projects. An Enterprise Master Pa-
tient Index (EMPI, IHE PIX/PDQ) man-
ages the linkage of patient pseudonyms
across the DICs whereby no personal pa-
tient data is stored. Access to the EMPI is
protected by the Access Control System
(SAML / XACML). It utilizes security
tokens and policies (IHE XUA) to secure
communication with the central compo-
nents. The consent registry (IHE XDS.b,
APPC) for providing information about
consent policy documents is also included.
This registry is connected to the Consent
Creator Services, i.e. local Patient Portals
for the creation of such consents based on patients’ informed permissions to use their
data. It can be queried by a Consent Con-
sumer, e.g. the Data and Metadata Transfer
Management Service, via IHE XCA Query
and then further processed to check, for
example, whether certain data can be
shared and used for a specified purpose.
Terminology services manage and provide
codes and concepts on a consortium level.
These can also be used to connect to ex -
ternal terminology services, e.g. at the
national level, or to connect to other third-
party services, such as existing public key
infrastructures or external identity pro-
viders. Thus, the Data Sharing Services
comprise functions for a federation of stan-
dardized data sharing and access control
which can be cascaded from the DICs up to
a national level.
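The consent check performed before sharing can be sketched as below. The registry structure and purpose labels are illustrative assumptions; the actual consent policy documents follow IHE APPC, which this sketch does not model.

```python
# Sketch of a consent check against the consent registry, as the Data and
# Metadata Transfer Management Service might perform it before sharing.
CONSENT_REGISTRY = {
    # pseudonym -> set of purposes the patient has consented to
    "PSN-a1b2": {"care", "research"},
    "PSN-c3d4": {"care"},
}

def may_share(pseudonym, purpose):
    """True if the patient behind the pseudonym consented to this purpose."""
    return purpose in CONSENT_REGISTRY.get(pseudonym, set())

ok = may_share("PSN-a1b2", "research")
denied = may_share("PSN-c3d4", "research")
```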
In summary, the SMITH Market Place
and the Data Sharing Services connect the
individual sites and provide the functional-
ities and services to initiate projects, to
share medical knowledge, data and algo-
rithms and to support existing and new use
cases for all possible partners (further uni-
versity hospitals during the development
and networking phase as well as additional
healthcare institutions and network part -
ners in an elaborated roll-out process) in
the Medical Informatics Initiative.
4. Use Cases
In SMITH, we will implement one metho-
dological use case (PheP) and two clinical
use cases (ASIC and HELP). The use cases
shall gather evidence for the usefulness
and benefit of the DICs and their services
implemented on top of the SMItHIS archi-
tecture. The clinical and patient-relevant
impact of both clinical use cases will be
evaluated by stepped wedge study designs.
4.1 Methodological Use Case
PheP: Phenotype Pipeline
The SMITH consortium plans to develop
and implement a set of tools and algo-
rithms to systematically annotate and ana-
lyze patient-related phenotypes according
to classification rules and statistical or dy-
namic models. Using DIC services (c.f. sec-
tion 3.2.1,
Figure 1) EHR data in the
HDS is annotated with computed pheno-
type tags. The annotations and derivatives
will be made available for triggering alerts
and actions, e.g. by study nurses in order to
acquire additional findings from certain
patients for data completion. The tags are
subject to sharing and will be used for ana-
lyses of patient care and outcomes. This set
of tools and algorithms constitutes the
“Phenotype pipeline” (PheP). PheP will
utilize data from the participating hospitals’ information systems. Besides structured
data, unstructured textual data from clini-
cal reports and the EMR will be incorpor-
ated and will be subject to natural language
processing (NLP). The technology will be
implemented in the DICs and first used to
support the clinical use cases HELP and
ASIC and in other research driven specific
data use projects.
Figure 4 illustrates the basic tasks of and entity types used in PheP. This figure refines the generic DIC tasks as illustrated in Figure 1:
Within the Research & Development
Factory, research groups may be estab-
lished to develop knowledge to be used
in the DIC. Especially phenotyping
rules for phenotype classifications and
NLP algorithms for information extrac-
tion from unstructured documents are
in focus. Following a strict division
between research and patient care, the
research groups in the Research & De-
velopment Factory will use pseudonym-
ized patient data provided for sharing
by the DMT unit of one or more DICs.
We use the term “phenotype” in a very
general sense, referring to a set of
attributes that can be attached to an
individual. We distinguish between ob-
servable and computable phenotypes,
defined as a clinical condition, charac-
teristic, or set of clinical features that
can be determined solely from the data
in EMRs and ancillary data sources and
does not require chart review or inter-
pretation by a clinician” [32]. Phenotype
classification rules are based on classifi-
cation trees, statistical models or simu-
lation models. This will result in anno-
tations (called tags) of a broad spectrum
of attributes linked to a patient and his/
her pattern of care and outcome.
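A classification rule of this kind can be sketched as follows, tagging a record from structured laboratory data. The pre-diabetes example picks up the filtering example mentioned later in this article; the HbA1c cut-offs (5.7-6.4 %) follow common clinical practice but are an assumption here, not a SMITH phenotyping rule.

```python
# Illustrative phenotype classification rule producing computed tags.
def phenotype_tags(record):
    """Return phenotype tags derived from structured EHR data."""
    tags = []
    hba1c = record.get("hba1c_percent")
    if hba1c is not None:
        if 5.7 <= hba1c <= 6.4:          # assumed pre-diabetes range
            tags.append("pre-diabetes")
        elif hba1c > 6.4:
            tags.append("diabetes-range-hba1c")
    return tags

tags = phenotype_tags({"hba1c_percent": 6.0})
```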
The main prerequisite for performing
phenotyping is the availability of struc-
tured data. Phenotype information will
be automatically extracted from un-
structured EMR entries and clinical
documents using NLP. For this, we plan
two building blocks: a clinical document
corpus (ClinDoC; for a preliminary ver-
sion, cf. [33]) and a collection of NLP
Figure 4: Detailed model of tasks and entity types of a DIC focusing on the phenotyping pipeline PheP. Here, entity types are depicted as rounded rectangles. Rectangles and arrows have the same meaning as in Figure 1.
components for processing German-
language clinical documents, the
SMITH Clinical Text Analytics Proces-
sor (ClinTAP; according to software en-
gineering standards outlined in [34],
cf. [19] for a preliminary version).
ClinDoC will serve as the backbone for
system performance evaluation and as a
training and development environment
for single NLP modules, which will be integrated, after quality control, into the ClinTAP information extraction (IE) pipeline.
The Phenotyping Rules Engine (see
Figure 2) automatically applies the
tagging rules for phenotyping and com-
putes the phenotype tags. As tools ma-
ture, rules and tags will be handed over
for routine use to the DICs.
Any DIC interested in using this knowl-
edge, i.e. the rules for NLP and pheno-
typing, will ingest it into its Health Data
Storage. Whereas structured data may
easily be ingested in the HDS, unstruc-
tured documents from a hospital’s EMR
have to be processed by information
extraction and text mining tools using
the NLP algorithms.
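A single information-extraction step in this spirit can be sketched with a regular expression pulling a structured value out of a German report sentence. Real clinical NLP as planned for ClinTAP involves far more (tokenization, negation detection, context handling); the pattern and sentence here are toy assumptions.

```python
# Toy information-extraction step: structured value from German report text.
import re

PATTERN = re.compile(r"Kreatinin\s+(\d+(?:,\d+)?)\s*mg/dl", re.IGNORECASE)

def extract_creatinine(text):
    """Return the creatinine value in mg/dl, or None if not found."""
    match = PATTERN.search(text)
    if match:
        return float(match.group(1).replace(",", "."))  # German decimal comma
    return None

value = extract_creatinine("Labor: Kreatinin 1,3 mg/dl, Harnstoff normal.")
```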
The Metadata Repository (MDR) will
provide access to medical terminologies
and code systems and thus supports
semantic interoperability. In the PheP
use case, we will enhance the MDR with
metadata from a large variety of health-
care datasets, epidemiological cohorts,
clinical trials and use cases. The MDR
will also provide a directory of data items available for projects in the Research & Development Factory (catalog of data items).
4.2 Clinical Use Case ASIC:
Algorithmic Surveillance of
ICU Patients
Due to the epidemiological challenges in
Germany, demand for intensive care medi-
cine will increase over the next 10-15 years.
At the same time, there will be a shortage
of staff resources and so there is an urgent
need for interoperable and intelligent sol-
utions for using data to improve outcomes
and processes. The need for outcome im-
provement is especially urgent in patients
suffering from the acute respiratory dis-
tress syndrome (ARDS). Incidence of
ARDS worldwide remains high, with 10.4%
of total ICU admissions and 23.4% of all
patients requiring mechanical ventilation
[35]. The use case Algorithmic Surveillance
of ICU Patients (ASIC) will therefore in-
itially focus on ICU patients with ARDS.
By means of continuous analyses of data
from the Patient Data Management Sys-
tem (PDMS), a model-based “algorithmic
surveillance” of the state of critically ill
patients will be established. To predict indi-
vidual disease progression, ASIC will util-
ize pattern recognition technologies as well
as established mechanistic systems medi-
cine models, complemented by machine
learning, both integrated in a hybrid virtual
patient model [36]. Ultimately, the virtual
patient model will enable individual prog-
noses to support therapy decisions, clinical
trials and training of future clinicians.
Training the virtual patient model requires
high-performance computing.
The resulting ASIC system will be an
on-line rule-based computerized decision-
support system (CDSS). As such, it will ex-
tensively use the DIC services and espe -
cially the Phenotype Rules-Engine, which
applies the phenotyping rules developed in
PheP. Its aim is to accelerate correct and
sound diagnosis and increase guideline
compliance regarding ventilation. The rules may be realized by explicit decision trees, by complex models (mechanistic or machine learning-based), or by combinations thereof. Its main component is the Diagnostic
Expert Advisor (DEA). The development
utilizes the PheP efforts in the following
way (cf. Figure 4 and Online Appendix):
Usually, PDMS data is well structured
and will be ingested to the DIC Health
Data Storage without NLP. Specific data mined via NLP from other sources (e.g. radiology and microbiology reports) will complement the data.
The Data and Metadata Transfer Man-
agement (DMT) will provide pseudo-
nymized patient data taken from the different partner HDS for training the models.
The ASIC team contributes to the Re-
search & Development factory. It will
provide phenotyping rules for comput-
ing phenotype alert tags. Established
machine learning approaches (with preference given to hierarchical clustering,
random forest, Support Vector Ma-
chines, neural networks and Hidden
Markov Models) will be applied to the
training data collections to identify new
patient classification schemes. We will
also use the large data sets from all of the consortium’s DICs to evaluate and utilize
the predictivity of deep learning for
time series.
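The classification task can be illustrated with a deliberately minimal stand-in for the models named above (random forests, SVMs, etc.): a dependency-free nearest-centroid classifier over hypothetical ICU features. The features and training values are invented for illustration; they are not ASIC's actual feature set or thresholds.

```python
# Minimal stand-in for patient classification: a nearest-centroid rule.
import math

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(training):
    """training: label -> list of feature vectors; returns label -> centroid."""
    return {label: centroid(rows) for label, rows in training.items()}

def predict(model, x):
    """Assign x to the label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], x))

# features: [PaO2/FiO2 ratio, respiratory rate] -- illustrative only
training = {
    "ards":     [[120, 32], [150, 30], [180, 28]],
    "non_ards": [[380, 16], [420, 14], [350, 18]],
}
model = fit(training)
label = predict(model, [140, 31])
```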
The identified data patterns and models
will be expressed in a system of rules in
the Phenotype Rules-Engine, which will
be provided for ingesting into the
knowledge bases of all partnering DICs.
In the initial phase, this will be used for
research purposes; in the later phase, the ASIC apps will be certified for care according to the medical products regulations.
The rule system will be used for pheno-
typing of new patients using the EHR
data in the HDS. The computed tags
will then be used as input for the ASIC
decision support system.
The utility of the ASIC approach will be
demonstrated in the clinical care setting
using a step wedge design. In this setting,
the ASIC app will provide the interface
between the DIC, the surveillance algo-
rithms and the medical professionals at the
bedside. Physicians will be automatically
informed if a priori specified limits are
exceeded. This alert function will be enhanced by smart, latest-action-based algorithms that shield medical professionals from moot alerts (and, thus, alert fatigue).
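The limit-based alerting with repeat suppression can be sketched as follows. The 30-minute suppression window, parameter name and limit value are illustrative assumptions, not ASIC's actual alerting policy.

```python
# Sketch of a limit-based alert with simple repeat suppression.
SUPPRESSION_WINDOW_MIN = 30
_last_alert = {}   # (patient, parameter) -> time of last alert (minutes)

def check_alert(patient, parameter, value, limit, now_min):
    """Return True if clinicians should be alerted now."""
    if value <= limit:
        return False
    last = _last_alert.get((patient, parameter))
    if last is not None and now_min - last < SUPPRESSION_WINDOW_MIN:
        return False               # recently alerted: avoid alert fatigue
    _last_alert[(patient, parameter)] = now_min
    return True

first = check_alert("p1", "peak_pressure", 38, limit=35, now_min=0)
repeat = check_alert("p1", "peak_pressure", 39, limit=35, now_min=10)
later = check_alert("p1", "peak_pressure", 39, limit=35, now_min=45)
```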
4.3 Clinical Use Case HELP:
A Hospital-wide Electronic Medical
Record-based Computerized
Decision Support System to
Improve Outcomes of Patients
with Blood-stream Infections
Antimicrobial stewardship programs (ABS
programs) use a variety of methods to im-
prove patient care and outcomes. One of
the core strategies of ABS programs is
prospective audit and feedback, which is
labor-intensive and costly. When an infec-
tious diseases specialist is involved in a
patient’s care and the physician in charge
follows his/her recommendations, patients
are more often correctly diagnosed, have
shorter lengths of stay, receive more appro-
priate therapies, have fewer complications, and may use fewer antibiotics overall.
However, physicians with infectious dis-
ease expertise are rare in Germany and
usually limited to a few university hospitals.
As an alternative, CDSS have been rec-
ommended by a recent German S3 Guide-
line on ABS programs [37]. CDSS may help
to establish the correct diagnosis, to choose
appropriate antimicrobial treatment, and
to balance optimal patient care with unde-
sirable aspects such as the development of
antibiotic resistance, adverse events, and
costs. However, the availability of CDSS for ABS program implementation is currently limited. HELP aims to develop and implement such a CDSS. We will first
focus on Staphylococcal bacteremia, i.e.
staphylococcal bloodstream infections,
since Staphylococci are the most frequent
pathogens detected in blood cultures
(BCs). Furthermore, a recent meta-analysis [38]
has shown that the particularly high mortality
rate of Staphylococcus aureus bacteremia
can be reduced by 47% when an
infectious diseases specialist is involved.
This reduction is achieved by increased adherence
to an evidence-based bundle of
care, which will be implemented into the CDSS.
The development and use of the HELP-CDSS
will draw on SMITH's Phenotype Pipeline
(PheP) in a similar way as ASIC (see Figure 4 and the
Online Appendix):
The HELP-CDSS requires the inte-
gration of structured clinical data and
unstructured clinical reports from the
partnering hospitals’ EHRs. Previous
work (e.g. [39]) has indicated a potential
benefit of such an integration in research
fields related to our HELP objective.
Again, structured data will be ingested
based on technical standards,
whereas NLP algorithms will be used to
ingest the reports into the HDS and the
structured information derived therefrom.
The DMT will provide and share pseudonymized
patient data for developing the CDSS.
As above, the HELP team is part of
the Research & Development Factory
and will develop phenotyping rules. In
HELP, these rules are to classify patients
according to their need of antibiotic
stewardship. The proposed CDSS algorithm
consists of several decision levels
for which i) no human support is required,
ii) all required information is
or will become digitally available, and iii)
an automated directive/management
proposal can be reported immediately
to the treating physician.
These rules will be used for phenotyping
new patients based on the EHR data
in the HDS. The Rules Engine therefore
monitors patient data in real time and
phenotypes the patients by applying the
rules, which represent the generalized
and derived guidelines and models. The
HELP-CDSS uses the computed phenotype
tags and issues an alert if a potential
candidate for action is identified.
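As a minimal sketch of such rule-based phenotyping and alerting (a simplified assumption, not the SMITH Rules Engine; the function names, tag names and data layout are hypothetical):

```python
# A phenotype rule tags a patient whose blood culture grew staphylococci,
# flagging a candidate for the antibiotic-stewardship bundle of care.

def phenotype(patient: dict) -> set:
    """Apply simple classification rules to structured EHR facts."""
    tags = set()
    cultures = patient.get("blood_cultures", [])
    if any("staphylococcus" in bc.lower() for bc in cultures):
        tags.add("staphylococcal_bacteremia")
    return tags

def needs_alert(tags: set, already_alerted: set) -> bool:
    """Issue an alert only for newly computed actionable tags."""
    return bool(tags - already_alerted)

patient = {"blood_cultures": ["Staphylococcus aureus"]}
tags = phenotype(patient)
assert tags == {"staphylococcal_bacteremia"}
assert needs_alert(tags, set()) is True     # new candidate: alert
assert needs_alert(tags, tags) is False     # already handled: silent
```

In a real-time setting, `phenotype` would be re-evaluated whenever new EHR data arrives in the HDS, with the engine keeping track of which tags have already triggered an alert.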
Feedback to and support of the health care
professionals will be based on the HELP
App which, like the ASIC app, will be built
on a generic SMITH app. The goal
of the HELP App is to provide immediate
alerts/directives to the treating physician
following the algorithms outlined above.
A. Winter et al.: SMITH
German Med. Informatics Initiative
License terms: CC-BY-NC-ND ( © Georg Thieme Verlag KG 2018
© Schattauer 2018 Methods Inf Med Open 01/2018
5. Discussion
This paper outlines an ambitious program.
It is ambitious not only with regard to the
standards-based SMITHIS architecture, but
with regard to the use cases as well. We are
confident of being successful in the end because:
SMITH is not an academic consortium
only: the CIOs of the partnering
hospitals and their information
management departments are actively
involved as well. Hence, we will be able to
connect bench and bedside in both directions
[20].
A unique public-private partnership of
complementary partners, incorporating
companies experienced in the field of
interoperable transinstitutional
information systems and in efficiently
analyzing medical data (structured
as well as unstructured),
will effectively help implement the
standards-based architecture.
However, we are aware of the challenges
and risks of such an endeavor, e.g.:
The DICs in general and the intended
use cases require high quality clinical
documentation. Although we will im-
plement NLP methods to analyze un-
structured documents, more and more
structured documentation will be
needed. Therefore, the sustainability of
the DICs depends on the ongoing support
by the hospitals' management and, most
importantly, by their health care professionals.
This support can only be obtained
if healthcare in general, and patients in
particular, benefit from SMITHIS
and thus promote the use of clinical
data for its services.
The rights of citizens and patients for
informational self-determination and
the legal regulations for data protection
and privacy may come into conflict with
important medical research goals in
SMITH. The German Ethics Council
recognized this potential for conflict
and recommends legal regulations
based on the principle of data sovereignty
[40]. In SMITH, we will adhere
closely to this recommendation, especially
by integrating a patient portal.
The patient portal shall give individuals
information on the usage of their data
and shall provide the opportunity for
data donation.
References
1. Federal Ministry of Education and Research, Germany.
Medical Informatics Funding Scheme: Networking
data – improving health care. Berlin, Germany;
2015. Available from:
2. CCSDS Secretariat. Reference Model for an Open
Archival Information System (OAIS): Magenta
Book: NASA Headquarters; 2012. Available from:
3. Kirsten T, Kiel A, Wagner J, Rühle M, Löffler M.
Selecting, Packaging, and Granting Access for
Sharing Study Data. In: Eibl M, Gaedke M, editors.
Informatik 2017 – Bände I-III: Tagung vom 25.-29.
September 2017 in Chemnitz. Bonn: Gesellschaft
für Informatik; 2017. p. 1381–1392 (GI-Edition
Proceedings; vol. 275).
4. Kirsten T, Kiel A, Rühle M, Wagner J. Metadata
Management for Data Integration in Medical
Sciences – Experiences from the LIFE Study. In:
Mitschang B, Ritter N, Schwarz H, Klettke M,
Thor A, Kopp O, et al., editors. BTW 2017: Daten-
banksysteme für Business, Technologie und Web
(Workshopband); Tagung vom 6. – 7.März 2017 in
Stuttgart. Bonn: Gesellschaft für Informatik; 2017.
p. 175–194 (GI-Edition lecture notes in in-
formatics (LNI) Proceedings; volume P-266).
5. Loeffler M, Engel C, Ahnert P, Alfermann D, Are-
lin K, Baber R, et al. The LIFE-Adult-Study: Ob-
jectives and design of a population-based cohort
study with 10,000 deeply phenotyped adults in
Germany. BMC Public Health 2015; 15: 691.
6. Löffler M, Binder H, Kirsten T. Leipzig Health
Atlas; 2018 [cited 2018 Jan 30]. Available from:
7. Winter A, Brigl B, Wendt T. Modeling Hospital In-
formation Systems (Part 1): The Revised Three-
Layer Graph-Based Meta Model 3LGM2. Methods
Inf Med 2003; 42(5): 544–551.
8. Winter A, Haux R, Ammenwerth E, Brigl B, Hell-
rung N, Jahn F. Health Information Systems –
Architectures and Strategies. London: Springer;
9. Staemmler M. Towards sustainable e-health net-
works: does modeling support efficient manage-
ment and operation? In: Kuhn KA, Warren JR,
Leong T-Y, editors. Proceedings of Medinfo 2007
(Part 1). Amsterdam: IOS Press; 2007. p. 53–57
(Stud Health Technol Inform; vol. 129).
10. Stäubert S, Winter A, Speer R, Loffler M. Design-
ing a Concept for an IT-Infrastructure for an Inte-
grated Research and Treatment Center. In: Safran
C, Marin H, Reti S, editors. MEDINFO 2010 Part-
nerships for Effective eHealth Solutions. Amster-
dam: IOS Press; 2010. p. 1319–1323 (Stud Health
Technol Inform; vol. 160).
11. Winter A, Takabayashi K, Jahn F, Kimura E, Engel-
brecht R, Haux R, et al. Quality Requirements for
Electronic Health Record Systems: A Japanese-
German Information Management Perspective.
Methods Inf Med 2017; 56(Open): e92–e104.
Available from:
12. IHE International Inc. IHE Profiles; 2017 [cited
2018 Jan 4]. Available from:
13. Stäubert S, Schaaf M, Jahn F, Brandner R, Winter
A. Modeling Interoperable Information Systems
with 3LGM and IHE. Methods Inf Med 2015;
54(5): 398–405.
14. Introducing HL7 FHIR; 2014 Jan 1 [cited
2018 Jan 4]. Available from:
15. Otto B, Jürjens J, Schon J, Auer S, Menz N, Wenzel
S et al. INDUSTRIAL DATA SPACE: Digitale Sou-
veränität über Daten; 2016. Available from: www.
16. Juhr M, Haux R, Suzuki T, Takabayashi K. Over-
view of recent trans-institutional health network
projects in Japan and Germany. J Med Syst 2015;
39(5): 234.
17. International Organization for Standardization
(ISO). ISO 18308: 2011 Requirements for an elec-
tronic health record architecture; 2011 [cited 2014
Jan 23].
18. Garets D, Davis M. Electronic Medical Records vs.
Electronic Health Records: Yes, There Is a Differ-
ence. Chicago, IL: HIMSS Analytics; 2006. Avail-
able from:
19. Hellrich J, Matthies F, Faessler E, Hahn U. Sharing
models and tools for processing German clinical
texts. Stud Health Technol Inform 2015; 210:
20. Marincola FM. Translational Medicine: A two-way
road. J Transl Med 2003; 1(1): 1.
21. FHIR Developer Introduction; 2015 [cited 2016
Apr 13]. Available from:
22. Health Level Seven International. CDA® Release 2:
Health Level Seven International; 2016 [cited 2016
Feb 12]. Available from:
23. Redaktionsgruppe Kerndatensatz der AG Interop-
erabilität des Nationalen Steuerungsgremiums der
Medizininformatik-Initiative; 2017 Mar 10. Avail-
able from: http://www.medizininformatik-initi
24. Wilkinson MD, Dumontier M, Aalbersberg IJJ,
Appleton G, Axton M, Baak A et al. The FAIR
Guiding Principles for scientific data management
and stewardship. Sci Data 2016; 3: 160018.
25. Darmoni SJ, Thirion B, Leroy JP, Douyère M. The
use of Dublin Core metadata in a structured health
resource guide on the internet. Bull Med Libr
Assoc 2001; 89(3): 297–301.
26. Meineke F, Stäubert S, Löbe M, Winter A. A
Comprehensive Clinical Research Database based
on CDISC ODM and i2b2. Stud Health Technol
Inform 2014; 205: 1115–1119.
27. Murphy SN, Mendis M, Hackett K, Kuttan R, Pan
W, Phillips LC, et al. Architecture of the open-
source clinical research chart from Informatics for
Integrating Biology and the Bedside. AMIA Annu
Symp Proc 2007; 2007: 548–552.
28. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser
V, Schuemie MJ, et al. Observational Health Data
Sciences and Informatics (OHDSI): Opportunities
for Observational Researchers. Stud Health Tech-
nol Inform 2015; 216: 574–578.
29. Kho AN, Cashy JP, Jackson KL, Pah AR, Goel S,
Boehnke J, et al. Design and implementation of a
privacy preserving electronic health record linkage
tool in Chicago. J Am Med Inform Assoc 2015;
22(5): 1072–1080.
30. Vatsalan D, Sehili Z, Christen P, Rahm E. Privacy-
Preserving Record Linkage for Big Data: Current
Approaches and Research Challenges. In: Zomaya
A, Sakr S, editors. Privacy-Preserving Record
Linkage for Big Data: Handbook of Big Data Tech-
nologies. Springer; 2017.
31. Löbe M, Ganslandt T, Lotzmann L, Mate S, Chris-
toph J, Baum B et al. Simplified Deployment of
Health Informatics Applications by Providing
Docker Images. Stud Health Technol Inform 2016;
228: 643–647.
32. Richesson RL, Smerek MM, Blake Cameron C. A
Framework to Support the Sharing and Reuse of
Computable Phenotype Definitions Across Health
Care Delivery and Clinical Research Applications.
EGEMS (Wash DC) 2016; 4(3): 1232.
33. Hahn U, Matthies F, Lohr C, Löffler M.
3000PA—Backbone for a national clinical refer-
ence corpus of German. MIE 2018. In: Ugon A,
Karlsson D, Klein GO, Moen A, editors. Building
Continents of Knowledge in Oceans of Data: The
future of co-created ehealth. Amsterdam: IOS
Press; 2018. p. 26–30.
34. Hahn U, Matthies F, Faessler E, Hellrich J. UIMA-
based JCoRe 2.0 goes GitHub and Maven Central:
State-of-the-art software resource engineering and
distribution of NLP pipelines. In: Calzolari N, edi-
tor. LREC 2016: [proceedings]. [S. l.: s. n.]; 2016. p.
35. Henry KE, Hager DN, Pronovost PJ, Saria S. A tar-
geted real-time early warning score (TREWScore)
for septic shock. Sci Transl Med 2015; 7(299):
36. Wolkenhauer O, Auffray C, Brass O, Clairambault
J, Deutsch A, Drasdo D, et al. Enabling multiscale
modeling in systems medicine. Genome Med
2014; 6(3): 21.
37. With K de, Allerberger F, Amann S, Apfalter P,
Brodt H-R, Eckmanns T, et al. Strategies to en-
hance rational use of antibiotics in hospital: A
guideline by the German Society for Infectious
Diseases. Infection 2016; 44(3): 395–439.
38. Vogel M, Schmitz RPH, Hagel S, Pletz MW, Gagel-
mann N, Scherag A, et al. Infectious disease con-
sultation for Staphylococcus aureus bacteremia –
A systematic review and meta-analysis. J Infect
2016; 72(1): 19–28.
39. DeLisle S, South B, Anthony JA, Kalp E, Gundla-
pallli A, Curriero FC, et al. Combining free text
and structured electronic medical record entries to
detect acute respiratory infections. PLoS One
2010; 5(10): e13377.
40. Deutscher Ethikrat. Big Data und Gesundheit
Datensouveränität als informationelle Freiheits-
gestaltung: Stellungnahme. Berlin; 2017 Nov 30.
Available from:
... Technical training for medical researchers and governing entities as well as ethical and legal training for technical experts can increase confidence in project-related decision making 1,18,23,24,27,28 . The same effect can be achieved by developing MDS guidelines and actionable data protection concepts (DPC) [13][14][15][16] . A good example is the DPC of the MI-I that was developed in collaboration with the German working group of medical ethics committees (AK-EK) 12 . ...
... data ownership, individual autonomy, confidentiality, necessity of data processing, non-maleficence and beneficence 1,33 . Considered jointly, they result in a trade-off to be made between the preservation of ethical rights of treated patients and the beneficence of the scientific project 15,18,26 . Criticism often arises concerning the prevailing trade-off in favor of patients' privacy, where ethics committees tend to overprotect patient data 23,27 . ...
... This is because the governing entities require all planned processing steps to be documented in a study plan, serving as the foundation for their decision-making process. This results in long project planning phases due to uncertainties in a complex multi-player environment [13][14][15][16]21 . Additionally, creating a strict study plan usually works for clinical trials, but in data science, meaningful results often require more flexibility. ...
Full-text available
Medical real-world data stored in clinical systems represents a valuable knowledge source for medical research, but its usage is still challenged by various technical and cultural aspects. Analyzing these challenges and suggesting measures for future improvement are crucial to improve the situation. This comment paper represents such an analysis from the perspective of research.
... In the case of microbiology, we used the thematically related laboratory CDS module as starting point as well as the information model already developed for the use case Infection Control (IC) of the HiGHmed 19,20 consortium within the MII. Additional impulses incorporated into the modeling process came from the HELP 21 use case of another MII consortium, SMITH 22 , dedicated to the use of antibiotics in infection medicine and from the guidelines of the Austrian electronic health record (ELGA) 23 . In general, we built on previous experiences but expanded the scope to comprise all possible microbiology examinations including microbial genetics. ...
Full-text available
The COVID-19 pandemic has made it clear: sharing and exchanging data among research institutions is crucial in order to efficiently respond to global health threats. This can be facilitated by defining health data models based on interoperability standards. In Germany, a national effort is in progress to create common data models using international healthcare IT standards. In this context, collaborative work on a data set module for microbiology is of particular importance as the WHO has declared antimicrobial resistance one of the top global public health threats that humanity is facing. In this article, we describe how we developed a common model for microbiology data in an interdisciplinary collaborative effort and how we make use of the standard HL7 FHIR and terminologies such as SNOMED CT or LOINC to ensure syntactic and semantic interoperability. The use of international healthcare standards qualifies our data model to be adopted beyond the environment where it was first developed and used at an international level.
... 26 The project runs under the Smart Medical Information Technology for Healthcare (SMITH) consortium, within the German Medical Informatics Initiative funded by the German Federal Ministry of Education and Research (BMBF). 27 It was approved by the independent ethics committee of the medical faculty of RWTH Aachen University (local EC reference number: EC 102/19, date of approval: 26/03/2019) and registered in the German clinical trials registry (registration number: DRKS00014330). The ethics committee waived the need to obtain informed consent for the collection and retrospective analysis of de-identified data and publication of the results of the analysis. ...
Full-text available
Pandemic preparedness has gained high relevance throughout the SARS-CoV-2 pandemic, necessitating optimized therapeutic strategies for early phase disease management. Predicting patient-specific outcomes, particularly for novel infectious diseases with complex multimorbidity patterns, presents significant challenges due to limited availability of disease-specific patient datasets and an incomplete understanding of disease mechanisms early in the disease wave. We propose a concept for rapid predictive modeling of disease outcomes based on the hypothesis of common disease-promoting mechanisms and leveraging transfer learning across different disease types. Introducing the Medical Reference Space (MRS) as a unifying framework to integrate heterogeneous clinical data structures, allows us to monitor therapy response dynamics and predict individual outcomes across various diseases. We performed a retrospective analysis on mechanically ventilated patients from intensive care units, comprising 495 COVID and 6212 non-COVID patients. The results demonstrate that MRS-based models, trained on non-COVID patients, effectively predict individual survival in COVID-19 patients, comparable to models trained exclusively on COVID data. Moreover, the MRS approach provides insights into the impact of rapidly evolving events, such as sepsis, on the predictivity of machine learning-based prognostic systems indicating conceptual limitations for patient-specific decision support systems. Our findings highlight the potential of using retrospective patient data repositories for transfer learning as a rapid strategy for predicting disease outcomes of new evolving diseases, opening a route towards future pandemic preparedness.
... In this sense, the MII junior research group 'Terminology and Ontology-based Phenotyping (TOP)' (part of the MII SMITH consortium [11]) aims to develop an easily applicable ontology-based framework (TOP Framework) for modelling and executing PAs. The most important advantage of our approach is a clear separation between the modelling of the domain knowledge (by medical staff, biometricians, etc., i.e., non-IT experts) and the implementation. ...
Full-text available
The detection and prevention of medication-related health risks, such as medication-associated adverse events (AEs), is a major challenge in patient care. A systematic review on the incidence and nature of in-hospital AEs found that 9.2% of hospitalised patients suffer an AE, and approximately 43% of these AEs are considered to be preventable. Adverse events can be identified using algorithms that operate on electronic medical records (EMRs) and research databases. Such algorithms normally consist of structured filter criteria and rules to identify individuals with certain phenotypic traits, thus are referred to as phenotype algorithms. Many attempts have been made to create tools that support the development of algorithms and their application to EMRs. However, there are still gaps in terms of functionalities of such tools, such as standardised representation of algorithms and complex Boolean and temporal logic. In this work, we focus on the AE delirium, an acute brain disorder affecting mental status and attention, thus not trivial to operationalise in EMR data. We use this AE as an example to demonstrate the modelling process in our ontology-based framework (TOP Framework) for modelling and executing phenotype algorithms. The resulting semantically modelled delirium phenotype algorithm is independent of data structure, query languages and other technical aspects, and can be run on a variety of source systems in different institutions.
... However, these are a basic prerequisite for the follow-up development of algorithms and models capable of analyzing clinical speech. In Germany, some progress has been made in the last five years, not least in the context of the MII 2 -a large-scale national funding initiative -in particular in the SMITH consortium [1], and a few corpora are more or less freely available [2][3][4][5]. But the issue of data privacy protection is still a big roadblock for making these and similar resources available to the NLP community and, by extension, to researchers and research programs that want to gather information from unstructured medical data, e.g. ...
Full-text available
The task of automatically analyzing the textual content of documents faces a number of challenges in general but even more so when dealing with the medical domain. Here, we can't normally rely on specifically pre-trained NLP models or even, due to data privacy reasons, (massive) amounts of training material to generate said models. We, therefore, propose a method that utilizes general-purpose basic text analysis components and state-of-the-art transformer models to represent a corpus of documents as multiple graphs, wherein important conceptually related phrases from documents constitute the nodes and their semantic relation form the edges. This method could serve as a basis for several explorative procedures and is able to draw on a plethora of publicly available resources. We test it by comparing the effectiveness of these so-called Concept Graphs with another recently suggested approach for a common use case in information retrieval, document clustering.
... Data from five German hospitals (hereinafter referred to as Derivation Hospital and Validation Hospitals 1-4) were retrospectively sourced and thoroughly depersonalized from ICU patients involved in the project titled "Algorithmic surveillance of ICU patients with acute respiratory distress syndrome" (ASIC) 43 . This project is an integral part of the SMITH consortium 44 , a body within the German Medical Informatics Initiative. ...
Full-text available
The development of reliable mortality risk stratification models is an active research area in computational healthcare. Mortality risk stratification provides a standard to assist physicians in evaluating a patient's condition or prognosis objectively. Particular interest lies in methods that are transparent to clinical interpretation and that retain predictive power once validated across diverse datasets they were not trained on. We've developed a hybrid model integrating mechanistic, clinical knowledge with mathematical and machine learning models to predict ICU mortality using ICD codes. A tree-structured network connecting independent modules that carry clinical meaning is implemented for interpretability. The trained model is then validated on external datasets from different hospitals, demonstrating successful generalization capabilities.
... The smart medical service platform is based on cloud technology and realizes the integration of various medical services through data docking with hospitals, health institutions and scientific research institutions [12,13]. Centered on the intelligent medical cloud platform, through the integration and reorganization of various medical services and data, build an intelligent medical cloud service ecosystem. ...
Full-text available
Due to the rapid changes in current technology, machine learning and high-performance computing in medical applications also usher in new development opportunities. They are widely used in medical data analysis, diagnostic decision-making, disease prediction, disease assisted diagnosis, disease prognosis evaluation, new drug research and development, health management, and other fields. The impact of medical application on daily life is also increasing, which makes the use of intelligent medical service decision-making more extensive. However, with the continuous improvement and development of the population’s physical fitness, the physical fitness of university students is deteriorating. Physical decline has become a common concern. Therefore, it is of great significance to investigate the physical condition of college students and find a more suitable method to promote the physical health of college students. It helps college students better engage in learning and life, enabling them to adapt to work faster and better meet the current social development needs for college students’ physical fitness. For this reason, this paper proposes the idea of building a smart supervision platform for college students’ physical health through smart medical service decision-making. Through empirical research on this platform, it is found that the method of building the platform proposed in this paper is more conducive to the improvement of college students’ physical health. The excellent grade of freshmen in this platform is 5.4% higher than that of the traditional platform, and the excellent grade of sophomores in the test is 6.31% higher than that of the traditional platform, the excellent grade of college students’ physical health test on this platform accounts for a higher proportion. 
The platform provides corresponding personalized sports programs through real-time monitoring of students’ physical health, so as to realize teaching students in accordance with their aptitude, scientifically guide students’ physical exercise, and accurately improve students’ physical health. Meanwhile, research on the use of big data in sports has also led to advances in machine learning and high performance computing for medical applications, which improves their shortcomings.
... This work is conducted as part of the use case Algorithmic Surveillance of Intensive Care Unit patients with ARDS (ASIC) which is part of the Smart Medical Information Technology for Healthcare (SMITH) project under the guidance of the German Federal Ministry of Education and Research (BMBF) [33,34]. Furthermore, the work described here paves the way for the future development of surrogate models from pre-established mechanistic disease representations, thus providing valuable tools to accelerate diagnosis in critical situations. ...
Full-text available
Acute Respiratory Distress Syndrome (ARDS) is a condition that endangers the lives of many Intensive Care Unit patients through gradual reduction of lung function. Due to its heterogeneity, this condition has been difficult to diagnose and treat, although it has been the subject of continuous research, leading to the development of several tools for modeling disease progression on the one hand, and guidelines for diagnosis on the other, mainly the “Berlin Definition”. This paper describes the development of a deep learning-based surrogate model of one such tool for modeling ARDS onset in a virtual patient: the Nottingham Physiology Simulator. The model-development process takes advantage of current machine learning and data-analysis techniques, as well as efficient hyperparameter-tuning methods, within a high-performance computing-enabled data science platform. The lightweight models developed through this process present comparable accuracy to the original simulator (per-parameter R2 > 0.90). The experimental process described herein serves as a proof of concept for the rapid development and dissemination of specialised diagnosis support systems based on pre-existing generalised mechanistic models, making use of supercomputing infrastructure for the development and testing processes and supported by open-source software for streamlined implementation in clinical routines.
With the growth of the social economy and technology, innovation networks have emerged as one of the most significant methods for analyzing the evolution of industrial innovation. Yet, there is a shortage of studies analyzing the components that influence network creation. By highly integrating digital technology with the traditional medical industry chain, the smart medical industry has become one of the important sectors of the digital economy. With the advent of internet-based diagnosis and treatment technologies, innovation inside the smart medical industry has taken the form of a network. This study aims to construct an innovation network by organizing and analyzing patent data from China's smart medical industry cooperation, covering the period from 2005 to 2022. The data is sourced from the IncoPat database. The analysis utilizes the Exponential Random Graph Model (ERGM) approach to conduct regression analysis on various factors. These factors include endogenous structural characteristics, node feature variables such as node emergence time and institutional attributes, as well as the distance network and IPC attribute network. By examining the driving mechanism and influence mechanism that influence the innovation network, this study contributes to the smart medical industry research by gaining a better understanding of the current status of innovation network, which can be advantageous for businesses in this field to accurately recognize and actively promote their innovation practices.
Full-text available
Background: For more than 30 years, there has been close cooperation between Japanese and German scientists with regard to information systems in health care. Collaboration has been formalized by an agreement between the respective scientific associations. Following this agreement, two joint workshops took place to explore the similarities and differences of electronic health record systems (EHRS) against the background of the two national healthcare systems that share many commonalities. Objectives: To establish a framework and requirements for the quality of EHRS that may also serve as a basis for comparing different EHRS. Methods: Donabedian's three dimensions of quality of medical care were adapted to the outcome, process, and structural quality of EHRS and their management. These quality dimensions were proposed before the first workshop of EHRS experts and enriched during the discussions. Results: The Quality Requirements Framework of EHRS (QRF-EHRS) was defined and complemented by requirements for high quality EHRS. The framework integrates three quality dimensions (outcome, process, and structural quality), three layers of information systems (processes and data, applications, and physical tools) and three dimensions of information management (strategic, tactical, and operational information management). Conclusions: Describing and comparing the quality of EHRS is in fact a multidimensional problem as given by the QRF-EHRS framework. This framework will be utilized to compare Japanese and German EHRS, notably those that were presented at the second workshop.
Full-text available
The growth of Big Data, especially personal data dispersed in multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede sharing or exchanging data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties without revealing any sensitive information about these entities. PPRL is increasingly being required in many real-world application areas. Examples range from public health surveillance to crime and fraud detection, and national security. PPRL for Big Data poses several challenges, with the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high quality results of the linkage in the presence of variety and veracity of Big Data, and (3) preserving privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research.
Introduction: The ability to reproducibly identify clinically equivalent patient populations is critical to the vision of learning health care systems that implement and evaluate evidence-based treatments. The use of common or semantically equivalent phenotype definitions across research and health care use cases will support this aim. Currently, there is no single consolidated repository for computable phenotype definitions, making it difficult to find all definitions that already exist, and also hindering the sharing of definitions between user groups. Method: Drawing from our experience in an academic medical center that supports a number of multisite research projects and quality improvement studies, we articulate a framework that will support the sharing of phenotype definitions across research and health care use cases, and highlight gaps and areas that need attention and collaborative solutions. Framework: An infrastructure for re-using computable phenotype definitions and sharing experience across health care delivery and clinical research applications includes: access to a collection of existing phenotype definitions, information to evaluate their appropriateness for particular applications, a knowledge base of implementation guidance, supporting tools that are user-friendly and intuitive, and a willingness to use them. Next Steps: We encourage prospective researchers and health administrators to re-use existing EHR-based condition definitions where appropriate and share their results with others to support a national culture of learning health care. There are a number of federally funded resources to support these activities, and research sponsors should encourage their use.
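To make "computable phenotype definition" concrete, a simple rule-based definition combines diagnosis codes, lab thresholds, and medication codes over structured EHR data. The following sketch is ours, not drawn from the article; ICD-10 E11* (type 2 diabetes) and ATC A10BA02 (metformin) are standard codes, but the rule itself is deliberately simplified.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    """Minimal stand-in for a patient's structured EHR data."""
    icd_codes: set
    lab_values: dict = field(default_factory=dict)  # lab name -> latest value
    med_codes: set = field(default_factory=set)     # ATC codes

def t2dm_phenotype(p: Patient) -> bool:
    """Illustrative rule: ICD-10 E11* diagnosis, OR elevated HbA1c plus metformin."""
    has_dx = any(code.startswith("E11") for code in p.icd_codes)
    high_hba1c = p.lab_values.get("hba1c_percent", 0.0) >= 6.5
    on_metformin = "A10BA02" in p.med_codes  # ATC code for metformin
    return has_dx or (high_hba1c and on_metformin)
```

The point of a shared repository is that a definition like this, once validated, can be reused verbatim across sites instead of being re-invented with subtly different thresholds or code lists.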
We introduce JCoRe 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCoRe on firm software engineering ground, we uploaded it to GitHub, a social coding platform with an underlying source code versioning system and various means to support collaboration for software development and code modification management. In order to automate the builds of complex NLP pipelines and properly represent and track dependencies of the underlying Java code, we incorporated Maven as part of our software configuration management efforts. In the meantime, we have deployed our artifacts on Maven Central as well. JCoRe 2.0 offers a broad range of text analytics functionality (mostly) for English-language scientific abstracts and full-text articles, especially from the life sciences domain.
Introduction: At a time of increasing resistance and a paucity of new drug development, there is a growing need for strategies to enhance the rational use of antibiotics in German and Austrian hospitals. An evidence-based guideline with recommendations for the implementation of antibiotic stewardship (ABS) programmes was developed by the German Society for Infectious Diseases in association with the following societies, associations and institutions: German Society of Hospital Pharmacists, German Society for Hygiene and Microbiology, Paul Ehrlich Society for Chemotherapy, The Austrian Association of Hospital Pharmacists, Austrian Society for Infectious Diseases and Tropical Medicine, Austrian Society for Antimicrobial Chemotherapy, and the Robert Koch Institute. Materials and methods: A structured literature search was performed in the databases EMBASE, BIOSIS, MEDLINE and The Cochrane Library from January 2006 to November 2010, with an update to April 2012 (MEDLINE and The Cochrane Library). The grading of recommendations in relation to their evidence follows the AWMF Guidance Manual and Rules for Guideline Development. Conclusion: The guideline provides the grounds for the rational use of antibiotics in hospitals to counteract antimicrobial resistance and to improve the quality of care of patients with infections by maximising clinical outcomes while minimising toxicity. Requirements for a successful implementation of ABS programmes as well as core and supplemental ABS strategies are outlined. The German version of the guideline was published by the German Association of the Scientific Medical Societies (AWMF) in December 2013.
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
Objective: The mortality and morbidity of Staphylococcus aureus bacteremia (SAB) still remain considerably high. We aimed to evaluate the impact of infectious disease consultation (IDC) on the management and outcomes of patients with SAB. Methods: We systematically searched 3 publication databases from inception to 31st May 2015, as well as the reference lists of identified primary studies. Results: Our search returned 2874 reports, of which 18 fulfilled the inclusion criteria, accounting for 5337 patients. Overall 30-day mortality was 19.95% [95% CI 14.37-27.02], with a significant difference in favour of the IDC group (12.39% vs 26.07%), corresponding to a relative risk (RR) of 0.53 [95% CI 0.43-0.65]. 90-day mortality and the relapse risk for SAB were also reduced significantly, with RRs of 0.77 [95% CI 0.64-0.92] and 0.62 [95% CI 0.39-0.99], respectively. Both the appropriateness of the antistaphylococcal agent and the treatment duration were improved by IDC (RR 1.14 [95% CI 1.08-1.20] and 1.85 [95% CI 1.39-2.46], respectively). Follow-up blood cultures and echocardiography were performed more frequently following IDC (RR 1.35 [95% CI 1.25-1.46] and 1.98 [95% CI 1.66-2.37], respectively). Conclusions: Evidence-based clinical management enforced by IDC may improve the outcomes of patients with SAB. Well-designed cluster-randomized controlled trials are needed to confirm this finding from observational studies.
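The risk ratios reported above follow the standard 2×2-table computation with a log-scale confidence interval. A minimal sketch, using hypothetical single-study counts rather than the meta-analysis data:

```python
import math

def relative_risk(events_exp, n_exp, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a 95% CI computed on the log scale (standard log method)."""
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts for illustration only: 12/100 deaths with IDC vs 26/100 without
rr, lo, hi = relative_risk(12, 100, 26, 100)
```

An RR below 1 with an interval that excludes 1 indicates a significant reduction in risk for the consulted group, which is the pattern the pooled results above show for 30-day mortality.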
The vision of creating accessible, reliable clinical evidence by accessing the clinical experience of hundreds of millions of patients across the globe is a reality. Observational Health Data Sciences and Informatics (OHDSI) has built on lessons learned from the Observational Medical Outcomes Partnership to turn methods research and insights into a suite of applications and exploration tools that move the field closer to the ultimate goal of generating evidence about all aspects of healthcare to serve the needs of patients, clinicians and all other decision-makers around the world.
Due to the specific needs of biomedical researchers, in-house development of software is widespread. A common problem is maintaining and enhancing software after the funded project has ended. Even if many tools are made open source, only a few projects manage to attract a user base large enough to ensure sustainability. Reasons for this include the complex installation and configuration of biomedical software as well as ambiguous terminology for the features provided, all of which make the evaluation of software laborious. Docker is a para-virtualization technology based on Linux containers that eases the deployment of applications and facilitates evaluation. We investigated a suite of software developments funded by a large umbrella organization for networked medical research within the last 10 years and created Docker containers for a number of applications to support utilization and dissemination.
Background: Strategic planning of information systems (IS) in healthcare requires descriptions of the current and the future IS state. Enterprise architecture planning (EAP) tools like the 3LGM² tool help to build up and to analyze IS models. A model of the planned architecture can be derived from an analysis of current-state IS models. Building an interoperable IS, i.e. an IS consisting of interoperable components, can be considered a relevant strategic information management goal for many IS in healthcare. Integrating the Healthcare Enterprise (IHE) is an initiative which targets interoperability by using established standards. Objectives: To link IHE concepts to 3LGM² concepts within the 3LGM² tool. To describe how an information manager can be supported in handling the complex IHE world and planning interoperable IS using 3LGM² models. To describe how developers or maintainers of IHE profiles can be supported by the representation of IHE concepts in 3LGM². Methods: Conceptualization and concept mapping methods are used to assign IHE concepts such as domains, integration profiles, actors and transactions to the concepts of the three-layer graph-based meta-model (3LGM²). Results: IHE concepts were successfully linked to 3LGM² concepts. An IHE master model, i.e. an abstract model for IHE concepts, was modeled with the help of the 3LGM² tool. Two IHE domains were modeled in detail (ITI, QRPH). We describe two use cases for the representation of IHE concepts and IHE domains as 3LGM² models. Information managers can use the IHE master model as a reference model for modeling interoperable IS based on IHE profiles during EAP activities. IHE developers are supported in analyzing the consistency of IHE concepts with the help of the IHE master model and the functions of the 3LGM² tool. Conclusion: The complex relations between IHE concepts can be modeled by using the EAP method 3LGM².
The 3LGM² tool offers visualization and analysis features which are now available for the IHE master model. Thus, information managers and IHE developers can use or develop IHE profiles systematically. In order to improve the usability and handling of the IHE master model and its usage as a reference model, further refinements are needed. Evaluating the use of the IHE master model by information managers and IHE developers is the subject of further research.
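The three-layer structure underlying 3LGM² (enterprise functions in the domain layer, application components in the logical tool layer, physical data-processing systems in the physical tool layer) can be pictured as linked records whose cross-layer links support dependency analyses. This is an illustrative sketch of the layering only, not the 3LGM² tool's actual data model; all names are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EnterpriseFunction:    # domain layer
    name: str

@dataclass
class ApplicationComponent:  # logical tool layer
    name: str
    supports: List[EnterpriseFunction]  # links down to the domain layer

@dataclass
class PhysicalTool:          # physical tool layer
    name: str
    hosts: List[ApplicationComponent]   # links down to the logical layer

def functions_on(tool: PhysicalTool) -> List[str]:
    """Trace which enterprise functions ultimately depend on a physical tool."""
    return [f.name for app in tool.hosts for f in app.supports]

# Invented example model spanning all three layers
admission = EnterpriseFunction("Patient admission")
ehr = ApplicationComponent("EHR system", supports=[admission])
server = PhysicalTool("Application server", hosts=[ehr])
```

An analysis like `functions_on` is the kind of cross-layer query an information manager would run to see which clinical functions are affected when a physical component fails or is replaced.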