Michael Christoph Thrun

Philipps University of Marburg | PUM · Faculty of Mathematics and Computer Science

Dr. habil., Dipl.-Phys.

About

89
Publications
40,026
Reads
926
Citations
Additional affiliations
November 2019 - September 2020
Philipps University of Marburg
Position
  • PostDoc Position
April 2014 - March 2017
Philipps University of Marburg
Position
  • PhD
Description
  • Databionics means the transfer of data processing techniques from nature to computers by using emergence and self-organization.
April 2014 - April 2017
Philipps University of Marburg
Position
  • Research Assistant
Description
  • Databionics means the transfer of data processing techniques from nature to computers by using emergence and self-organization.

Publications (89)
Article
Full-text available
Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and...
Article
Full-text available
The understanding of water quality and its underlying processes is important for the protection of aquatic environments. With the rare opportunity of access to a domain expert, an explainable AI (XAI) framework is proposed that is applicable to multivariate time series. The XAI provides explanations that are interpretable by domain experts. In thre...
Article
Full-text available
The forecasting of univariate time series poses challenges in industrial applications if the seasonality varies. Typically, a non-varying seasonality of a time series is treated with a model based on Fourier theory or the aggregation of forecasts from multiple resolution levels. If the seasonality changes with time, various wavelet approaches for u...
Article
Full-text available
Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures...
Article
Full-text available
Explainable AIs (XAIs) often do not provide relevant or understandable explanations for a domain-specific human-in-the-loop (HIL). In addition, internally used metrics have biases that might not match existing structures in the data. The habilitation thesis presents an alternative solution approach by deriving explanations from high dimensional str...
Book
This Element highlights the employment within archaeology of classification methods developed in the field of chemometrics, artificial intelligence, and Bayesian statistics. These run in both high- and low-dimensional environments and often have better results than traditional methods. Instead of a theoretical approach, it provides examples of how...
Book
Full-text available
This Element highlights the employment within archaeology of classification methods developed in the field of chemometrics, artificial intelligence, and Bayesian statistics. These operate in both high- and low-dimensional environments and often have better results than traditional methods. The basic principles and main methods are introduced with r...
Preprint
Full-text available
Diagnostic immunophenotyping of malignant non-Hodgkin-lymphoma (NHL) by multiparameter flow cytometry (MFC) relies on highly trained physicians. Artificial intelligence (AI) systems have been proposed for this diagnostic task, often requiring more learning examples than are usually available. In contrast, Flow XAI has reduced the number of needed l...
Article
Full-text available
Introduction: Inflammatory conditions in patients have various causes and require different treatments. Bacterial infections are treated with antibiotics, while these medications are ineffective against viral infections. Autoimmune diseases and graft-versus-host disease (GVHD) after allogeneic stem cell transplantation require immunosuppressive the...
Article
Full-text available
State-of-the-art flow cytometry data samples typically consist of measurements of 10 to 30 features for more than 100,000 cell “events”. Artificial intelligence (AI) systems are able to diagnose such data with almost the same accuracy as human experts. However, such systems face one central challenge: their decisions have far-reaching consequenc...
Article
Full-text available
Dimensionality reduction methods can be used to project high-dimensional data into low-dimensional space. If the output space is restricted to two dimensions, the result is a scatter plot whose goal is to present insightful visualizations of distance- and density-based structures. The topological invariance of dimension indicates that the two-dimen...
Article
Full-text available
The Gene Ontology (GO) knowledge base provides a standardized vocabulary of GO terms for describing gene functions and attributes. It consists of three directed acyclic graphs which represent the hierarchical structure of relationships between GO terms. GO terms enable the organization of genes based on their functional attributes by annotating gen...
Chapter
Full-text available
Research data obtained during economics or human studies experiments often displays a complex distribution. Even in the two-dimensional case, the statistical identification of subgroups in research data poses an analytical challenge. Here we introduce an interactive R-based tool called “AdaptGauss2D”. It enables a valid identification of a meaningf...
Article
Full-text available
Background: The International Prognostic Index (IPI) is applied to predict the outcome of chronic lymphocytic leukemia (CLL) with five prognostic factors, including genetic analysis. We investigated whether multiparameter flow cytometry (MPFC) data of CLL samples could predict the outcome by methods of explainable artificial intelligence (XAI). Fu...
Article
Full-text available
“Big omics data” pose the challenge of extracting meaningful information with clinical benefit. Here, we propose a two-step approach, an initial unsupervised inspection of the structure of the high-dimensional data followed by a supervised analysis of gene expression levels, to reconstruct the surface patterns on different subtypes of acute myeloi...
Article
Full-text available
Minimal residual disease (MRD) detection is a strong predictor for survival and relapse in acute myeloid leukemia (AML). MRD can be either determined by molecular assessment strategies or via multiparameter flow cytometry. The degree of bone marrow (BM) dilution with peripheral blood (PB) increases with aspiration volume causing consecutive underes...
Article
Full-text available
Three different Flow Cytometry datasets consisting of diagnostic samples of either peripheral blood (pB) or bone marrow (BM) from patients without any sign of bone marrow disease at two different health care centers are provided. In Flow Cytometry, each cell rapidly passes through a laser beam one by one, and two light scatter, and eight surface pa...
Chapter
Full-text available
The analysis of gene expression data plays a crucial role in disease diagnosis. However, such analysis is rather complex due to the high-dimensional gene space. This work shows that semantically related genes can be grouped using biological knowledge in the Gene Ontology (GO) database. The GO defines knowledge phrases called GO terms describing bi...
Article
Full-text available
In principle, the fundamental data of companies may be used to select stocks with a high probability of either increasing or decreasing price. Many of the commonly known rules or used explanations for such a stock-picking process are too vague to be applied in concrete cases, and at the same time, it is challenging to analyze high-dimensional data...
Article
Full-text available
Clustering is an important task in knowledge discovery with the goal to identify structures of similar data points in a dataset. Here, the focus lies on methods that use a human-in-the-loop, i.e., incorporate user decisions into the clustering process through 2D and 3D displays of the structures in the data. Some of these interactive approaches fal...
Preprint
Full-text available
Although distance measures are used in many machine learning algorithms, the literature on the context-independent selection and evaluation of distance measures is limited in the sense that prior knowledge is used. In cluster analysis, current studies evaluate the choice of distance measure after applying unsupervised methods based on error probabi...
Article
Full-text available
Although distance measures are used in many machine learning algorithms, the literature on the context-independent selection and evaluation of distance measures is limited in the sense that prior knowledge is used. In cluster analysis, current studies evaluate the choice of distance measure after applying unsupervised methods based on error probabi...
Preprint
Full-text available
Typical state-of-the-art flow cytometry data samples consist of measurements of more than 100,000 cells in 10 or more features. AI systems are able to diagnose such data with almost the same accuracy as human experts. However, there is one central challenge in such systems: their decisions have far-reaching consequences for the health and life of peop...
Preprint
Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and...
Preprint
Full-text available
Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures...
Article
Full-text available
The article provides immediate access to over fifty fundamental clustering algorithms. Additionally, access to clustering benchmark datasets previously published as the “Fundamental Clustering Problems Suite” (FCPS) is provided. The software library is named “FCPS”, available in R on CRAN and accessible within Python. The input and output of clustering al...
Preprint
The understanding of water quality and its underlying processes is important for the protection of aquatic environments. Given the rare opportunity of access to a domain expert, an explainable AI (XAI) framework is proposed that is applicable to multivariate time series and results in explanations that are interpretable by a domain expert. T...
Article
Full-text available
One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or cont...
Article
Full-text available
Projections are conventional methods of dimensionality reduction for information visualization used to transform high-dimensional data into low-dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to visualize the relative relationships between high-...
Chapter
Full-text available
The Databionic swarm (DBS) is a flexible and robust clustering framework that consists of three independent modules: swarm-based projection, high-dimensional data visualization, and representation guided clustering. The first module is the parameter-free projection method Pswarm, which exploits concepts of self-organization and emergence, game theo...
Article
Full-text available
For high-dimensional datasets in which clusters are formed by both distance and density structures (DDS), many clustering algorithms fail to identify these clusters correctly. This is demonstrated for 32 clustering algorithms using a suite of datasets which deliberately pose complex DDS challenges for clustering. In order to improve the structure f...
Article
Full-text available
In provenance analysis, identifying the origin of the archaeological artifacts plays a significant role. Usually, this problem is addressed by discovering natural groups in data measured with spectroscopic techniques. Then, principal component and classical partitioning cluster analysis are employed to reveal the groups that supposedly define the o...
Article
Full-text available
In provenance analysis, identifying the origin of the archaeological artifacts plays a significant role. Usually, this problem is addressed by discovering natural groups in data measured with spectroscopic techniques. Then, principal component and classical partitioning cluster analysis are employed to reveal the groups that supposedly define the o...
Conference Paper
Full-text available
The Databionic swarm (DBS) is a flexible and robust clustering framework that consists of three independent modules: swarm-based projection, high-dimensional data visualization, and representation-guided clustering. The first module is the parameter-free projection method Pswarm, which exploits concepts of self-organization and emergence, game theor...
Article
Background: The Matutes score (MS) was proposed to differentiate chronic lymphocytic leukemia (CLL) from other B‐cell non‐Hodgkin lymphomas (B‐NHLs). However, ambiguous immunophenotypes are common and remain a diagnostic challenge. Therefore, we evaluated the diagnostic benefit of measuring CD200 and CD43 expression together with the standard MS ant...
Conference Paper
Full-text available
For many applications, it is crucial to decide whether a dataset possesses cluster structures. This property is called clusterability and is usually investigated using statistical testing. Here, it is proposed to extend statistical testing with the Mirrored-Density plot (MDplot). The MDplot allows investigating the distributions of many vari...
Preprint
Full-text available
The understanding of water quality and its underlying processes is important for the protection of aquatic environments. Here, an explainable AI (XAI)-based multivariate time series analytical framework is applied to high-frequency water quality measurements, including nitrate, electrical conductivity, and twelve other environmental param...
Article
Full-text available
The Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges that any algorithm should be able to handle given real-world data. The FCPS consists of datasets with known a priori classifications that are to be reproduced by the algorithm. The datasets are intentionally created to be visualized in two or three dimensions...
Preprint
Full-text available
One aim of data mining is the identification of interesting structures in data. The basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to differe...
Conference Paper
Full-text available
Companies listed in the German Prime Standard have to publish financial reports every three months, which have so far not been fully used for fundamental analysis. Through web scraping, an up-to-date high-dimensional dataset of 45 features of 269 companies was extracted, but finding meaningful cluster structures in a high-dimensional dataset with a low number of...
Method
Full-text available
An interactive tool to optimize the parameters of a Gaussian mixture model (GMM), called “AdaptGauss”, was realized using the freely available R software package (version 3.2.0 for Windows / version 3.2.1 for Linux; http://CRAN.R-project.org/). The newly developed R library “AdaptGauss” is freely available at https://cran.r-project.org/web/packages/AdaptGauss/index.html. Fo...
Article
Full-text available
Objective: The purpose of this article is to show the value of exploratory data analysis performed on the multivariate time series dataset of gross domestic product (GDP) per capita of 160 countries for the years 1970-2010. New knowledge can be derived by applying cluster analysis to the time series of GDP to show how patterns in GDP can be explai...
Presentation
Full-text available
Human activities modify the global nitrogen cycle, mainly through farming. These practices have unintended consequences; for example, nitrate lost from terrestrial runoff to streams and estuaries can impact aquatic life [Aubert et al., 2016]. A greater understanding of water quality variations can improve the evaluation of the state of water bodies...
Presentation
Full-text available
Projections are conventional methods of dimensionality reduction for information visualization used to transform high-dimensional data into low-dimensional space [1]. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is a visualization of distance and density-ba...
Conference Paper
Full-text available
The methods and possibilities of data mining for knowledge discovery in economic data are demonstrated on data of the German system of allocating tax revenues to municipalities. This system is complex and not easily understandable due to the involvement of several layers of administration and legislation. The general aim of the system is that a sha...
Article
Full-text available
Heat pain and its modulation by capsaicin vary among subjects in experimental and clinical settings. A plausible cause is a genetic component, of which TRPV1 ion channels, by their response to both heat and capsaicin, are primary candidates. However, TRPA1 channels can heterodimerize with TRPV1 channels and carry genetic variants reported to modu...
Chapter
Full-text available
This chapter describes all the data sets used in the results chapter and the parameter settings for the various methods. In the final section, brief overviews of the Gene Ontology (GO) database and overrepresentation analysis (ORA) are provided. For general distribution analyses, the CRAN R package AdaptGauss [Thrun/Ultsch, 2015; Ultsch et al., 201...
Chapter
Full-text available
A new and data-driven approach for cluster analysis and visualization is introduced in this work. The projection based clustering combines structures preserved in two dimensions with underlying high-dimensional structures (see also [Thrun et al., 2017, Thrun/Ultsch, 2017a]). It is a flexible and robust approach for cluster analysis that consists of...
Chapter
Full-text available
Dimensionality reduction techniques reduce the dimensions of the input space to facilitate the exploration of structures in high-dimensional data. Two general dimensionality reduction approaches exist: manifold learning and projection. Manifold learning methods attempt to find sub-spaces in which the high-dimensional distances are preserved.
Chapter
Full-text available
In contrast to chapter 11, in which Databionic swarm (DBS) clustering was applied to recognize more or less obvious knowledge, this chapter shows that DBS is also able to discover new knowledge. A hydrological data set of multivariate time series [Aubert et al., 2016] and a data set consisting of pain genes [Ultsch et al., 2016b] are used for this...
Chapter
Full-text available
This chapter introduces a new concept for the use of swarm intelligence. It makes use of insights from the previous chapter and proposes a projection method based on a swarm of intelligent agents called DataBots [Ultsch, 2000c]. This new swarm is called a polar swarm (Pswarm) because its agents move in polar coordinates based on symmetry considerat...
Chapter
Full-text available
Many data mining methods rely on some concept of the similarity between pieces of information encoded in the data of interest. Various names have been applied to these clustering methods, depending largely on the field of application in data science. For example, in biology the term “numerical taxonomy” is used [Thorel et al., 1990], in psychology...
Chapter
Full-text available
Projection methods are a common approach to dimensionality reduction with the aim of transforming high-dimensional data into a low-dimensional space. For data visualization purposes, projections into two dimensions are considered here. However, when the output space is limited to two dimensions, the low-dimensional similarities cannot completely re...
Chapter
Full-text available
This chapter has three sections. In the first section, the results of the Databionic swarm (DBS) clustering framework are compared with the given prior classifications for data sets from the Fundamental Clustering Problems Suite (FCPS) [Ultsch, 2005a]. The results for nine data sets analyzed using common clustering algorithms are compared in the fi...
Chapter
Full-text available
The first section of this chapter familiarizes the reader with the definitions of the basic notation and terminology used in this thesis. Concepts of graph theory are introduced in the next section. They give rise to a new concept of neighborhoods, which is utilized in several chapters.
Chapter
Full-text available
Several real-world data sets are used in this chapter to show that Databionic swarm (DBS) is able to find clusters in a variety of cases. The leukemia data set is based on luminance measurements of 7747 different active or non-active genes in 554 human subjects. The World GDP data set is a multivariate time series that consists of monetary values f...
Chapter
Full-text available
Many technological advances have been achieved with the help of bionics, which is defined as the application of biological methods and systems found in nature. A related, rarely discussed subfield of information technology is called databionics. Databionics refers to the attempt to adopt information processing techniques from nature.
Chapter
Full-text available
Dimensionality reduction techniques reduce the dimensions of the input space to facilitate the exploration of structures in high-dimensional data. Two general dimensionality reduction approaches exist: manifold learning and projection. Manifold-learning methods attempt to find a sub-space in which the high-dimensional distances can be preserved.
Book
Full-text available
This book is published open access under a CC BY 4.0 license (for free download, see http://www.springer.com/us/book/9783658205393). It covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape vi...
Presentation
Full-text available
Many data mining methods rely on some concept of the dissimilarity between pieces of information encoded in the data of interest. These methods can be used for cluster analysis. However, no generally accepted definition of clusters exists in the literature [Hennig et al., 2015]. Here, consistent with Bouveyron et al., it is assumed that a cluster i...
Article
Full-text available
Lipid metabolism has been suggested to be a major pathophysiological mechanism of multiple sclerosis (MS). With the increasing knowledge about lipid signaling, acquired data become increasingly complex, making bioinformatics necessary in lipid research. We used unsupervised machine learning to analyze lipid marker serum concentrations, pursuing the...
Conference Paper
Full-text available
Planar projections, i.e., projections from a high-dimensional data space onto a two-dimensional plane, are still in use to detect structures, such as clusters, in multivariate data. It can be shown that only the subclass of focusing projections, such as CCA, NeRV, and the ESOM, is able to disentangle linearly non-separable data. However, even these pro...
Article
Full-text available
Background: Pain in response to noxious cold has a complex molecular background probably involving several types of sensors. A recent observation has been the multimodal distribution of human cold pain thresholds. This study aimed at analysing reproducibility and stability of this observation and further exploration of data patterns supporting a c...
Article
Full-text available
High-frequency, in-situ monitoring provides large environmental datasets. These datasets will likely bring new insights into landscape functioning and process-scale understanding. However, tailoring data analysis methods is necessary. Here, we detach our analysis from the usual temporal analysis performed in hydrology to determine if it is possible t...