Michael Christoph Thrun

Philipps University of Marburg | PUM · Faculty of Mathematics and Computer Science

Dr. habil., Dipl.-Phys.

Publications: 72
Reads: 27,962
Citations: 478
Additional affiliations
May 2017 - July 2019
Philipps University of Marburg
Position
  • PostDoc Position
April 2017 - present
Philipps University of Marburg
Position
  • Analyst
Description
  • Databionics means the transfer of data processing techniques from nature to computers by using emergence and self-organization.
April 2014 - April 2017
Philipps University of Marburg
Position
  • Research Assistant
Description
  • Databionics means the transfer of data processing techniques from nature to computers by using emergence and self-organization.

Publications (72)
Article
Full-text available
Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and...
Article
Full-text available
In provenance analysis, identifying the origin of archaeological artifacts plays a significant role. Usually, this problem is addressed by discovering natural groups in data measured with spectroscopic techniques. Then, principal component and classical partitioning cluster analysis are employed to reveal the groups that supposedly define the o...
Article
Full-text available
The understanding of water quality and its underlying processes is important for the protection of aquatic environments. With the rare opportunity of access to a domain expert, an explainable AI (XAI) framework is proposed that is applicable to multivariate time series. The XAI provides explanations that are interpretable by domain experts. In thre...
Article
Full-text available
The forecasting of univariate time series poses challenges in industrial applications if the seasonality varies. Typically, a non-varying seasonality of a time series is treated with a model based on Fourier theory or the aggregation of forecasts from multiple resolution levels. If the seasonality changes with time, various wavelet approaches for u...
Article
Full-text available
Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures...
Article
Full-text available
Three different Flow Cytometry datasets consisting of diagnostic samples of either peripheral blood (pB) or bone marrow (BM) from patients without any sign of bone marrow disease at two different health care centers are provided. In Flow Cytometry, each cell rapidly passes through a laser beam one by one, and two light scatter, and eight surface pa...
Article
Full-text available
In principle, the fundamental data of companies may be used to select stocks with a high probability of either increasing or decreasing price. Many of the commonly known rules or used explanations for such a stock-picking process are too vague to be applied in concrete cases, and at the same time, it is challenging to analyze high-dimensional data...
Article
Full-text available
Clustering is an important task in knowledge discovery with the goal of identifying structures of similar data points in a dataset. Here, the focus lies on methods that use a human-in-the-loop, i.e., incorporate user decisions into the clustering process through 2D and 3D displays of the structures in the data. Some of these interactive approaches fal...
Preprint
Full-text available
Although distance measures are used in many machine learning algorithms, the literature on the context-independent selection and evaluation of distance measures is limited in the sense that prior knowledge is used. In cluster analysis, current studies evaluate the choice of distance measure after applying unsupervised methods based on error probabi...
Article
Full-text available
Although distance measures are used in many machine learning algorithms, the literature on the context-independent selection and evaluation of distance measures is limited in the sense that prior knowledge is used. In cluster analysis, current studies evaluate the choice of distance measure after applying unsupervised methods based on error probabi...
Preprint
Full-text available
Typical state-of-the-art flow cytometry data samples consist of measurements of more than 100,000 cells in 10 or more features. AI systems are able to diagnose such data with almost the same accuracy as human experts. However, there is one central challenge in such systems: their decisions have far-reaching consequences for the health and life of peop...
Preprint
Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and...
Preprint
Full-text available
Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures...
Article
Full-text available
The article presents immediate access to over fifty fundamental clustering algorithms. Additionally, access to clustering benchmark datasets previously published as the “Fundamental Clustering Problems Suite” (FCPS) is provided. The software library is named “FCPS”, available in R on CRAN and accessible within Python. The input and output of clustering al...
Article
Full-text available
One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or cont...
Article
Full-text available
Projections are conventional methods of dimensionality reduction for information visualization used to transform high-dimensional data into a low-dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to visualize the relative relationships between high-...
Chapter
Full-text available
The Databionic swarm (DBS) is a flexible and robust clustering framework that consists of three independent modules: swarm-based projection, high-dimensional data visualization, and representation guided clustering. The first module is the parameter-free projection method Pswarm, which exploits concepts of self-organization and emergence, game theo...
Article
Full-text available
For high-dimensional datasets in which clusters are formed by both distance and density structures (DDS), many clustering algorithms fail to identify these clusters correctly. This is demonstrated for 32 clustering algorithms using a suite of datasets which deliberately pose complex DDS challenges for clustering. In order to improve the structure f...
Conference Paper
Full-text available
The Databionic swarm (DBS) is a flexible and robust clustering framework that consists of three independent modules: swarm-based projection, high-dimensional data visualization, and representation guided clustering. The first module is the parameter-free projection method Pswarm, which exploits concepts of self-organization and emergence, game theor...
Article
Background: The Matutes score (MS) was proposed to differentiate chronic lymphocytic leukemia (CLL) from other B‐cell non‐Hodgkin lymphomas (B‐NHLs). However, ambiguous immunophenotypes are common and remain a diagnostic challenge. Therefore, we evaluated the diagnostic benefit of measuring CD200 and CD43 expression together with the standard MS ant...
Conference Paper
Full-text available
For many applications, it is crucial to decide whether a dataset possesses cluster structures. This property is called clusterability and is usually investigated using statistical testing. Here, it is proposed to extend statistical testing with the Mirrored-Density plot (MDplot). The MDplot allows investigating the distributions of many vari...
Article
Full-text available
The Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges that any algorithm should be able to handle given real-world data. The FCPS consists of datasets with known a priori classifications that are to be reproduced by the algorithm. The datasets are intentionally created to be visualized in two or three dimensions...
Preprint
Full-text available
One aim of data mining is the identification of interesting structures in data. The basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to differe...
Conference Paper
Full-text available
Companies listed in the German Prime Standard have to publish financial reports every three months, which have not been fully used for fundamental analysis so far. Through web scraping, an up-to-date high-dimensional dataset of 45 features of 269 companies was extracted, but finding meaningful cluster structures in a high-dimensional dataset with a low number of...
Method
Full-text available
An interactive tool to optimize the parameters of a GMM, called “AdaptGauss”, was realized using the freely available R software package (version 3.2.0 for Windows / version 3.2.1 for Linux; http://CRAN.R-project.org/). The newly developed R library “AdaptGauss” is freely available at https://cran.r-project.org/web/packages/AdaptGauss/index.html. Fo...
Article
Full-text available
Objective: The purpose of this article is to show the value of exploratory data analysis performed on the multivariate time series dataset of gross domestic products per capita (GDP) of 160 countries for the years 1970-2010. New knowledge can be derived by applying cluster analysis to the time series of GDP to show how patterns in GDP can be explai...
Presentation
Full-text available
Human activities modify the global nitrogen cycle, mainly through farming. These practices have unintended consequences; for example, nitrate lost from terrestrial runoff to streams and estuaries can impact aquatic life [Aubert et al., 2016]. A greater understanding of water quality variations can improve the evaluation of the state of water bodies...
Presentation
Full-text available
Projections are conventional methods of dimensionality reduction for information visualization used to transform high-dimensional data into a low-dimensional space [1]. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is a visualization of distance and density-ba...
Conference Paper
Full-text available
The methods and possibilities of data mining for knowledge discovery in economic data are demonstrated on data of the German system of allocating tax revenues to municipalities. This system is complex and not easily understandable due to the involvement of several layers of administration and legislation. The general aim of the system is that a sha...
Article
Full-text available
Heat pain and its modulation by capsaicin vary among subjects in experimental and clinical settings. A plausible cause is a genetic component, of which TRPV1 ion channels, by their response to both heat and capsaicin, are primary candidates. However, TRPA1 channels can heterodimerize with TRPV1 channels and carry genetic variants reported to modu...
Chapter
Full-text available
This chapter describes all the data sets used in the results chapter and the parameter settings for the various methods. In the final section, brief overviews of the Gene Ontology (GO) database and overrepresentation analysis (ORA) are provided. For general distribution analyses, the CRAN R package AdaptGauss [Thrun/Ultsch, 2015; Ultsch et al., 201...
Chapter
Full-text available
A new and data-driven approach for cluster analysis and visualization is introduced in this work. The projection based clustering combines structures preserved in two dimensions with underlying high-dimensional structures (see also [Thrun et al., 2017, Thrun/Ultsch, 2017a]). It is a flexible and robust approach for cluster analysis that consists of...
Chapter
Full-text available
Dimensionality reduction techniques reduce the dimensions of the input space to facilitate the exploration of structures in high-dimensional data. Two general dimensionality reduction approaches exist: manifold learning and projection. Manifold learning methods attempt to find sub-spaces in which the high-dimensional distances are preserved.
Chapter
Full-text available
In contrast to chapter 11, in which Databionic swarm (DBS) clustering was applied to recognize more or less obvious knowledge, this chapter shows that DBS is also able to discover new knowledge. A hydrological data set of multivariate time series [Aubert et al., 2016] and a data set consisting of pain genes [Ultsch et al., 2016b] are used for this...
Chapter
Full-text available
This chapter introduces a new concept for the use of swarm intelligence. It makes use of insights from the previous chapter and proposes a projection method based on a swarm of intelligent agents called DataBots [Ultsch, 2000c]. This new swarm is called a polar swarm (Pswarm) because its agents move in polar coordinates based on symmetry considerat...
Chapter
Full-text available
Many data mining methods rely on some concept of the similarity between pieces of information encoded in the data of interest. Various names have been applied to these clustering methods, depending largely on the field of application in data science. For example, in biology the term “numerical taxonomy” is used [Thorel et al., 1990], in psychology...
Chapter
Full-text available
Projection methods are a common approach to dimensionality reduction with the aim of transforming high-dimensional data into a low-dimensional space. For data visualization purposes, projections into two dimensions are considered here. However, when the output space is limited to two dimensions, the low-dimensional similarities cannot completely re...
Chapter
Full-text available
This chapter has three sections. In the first section, the results of the Databionic swarm (DBS) clustering framework are compared with the given prior classifications for data sets from the Fundamental Clustering Problems Suite (FCPS) [Ultsch, 2005a]. The results for nine data sets analyzed using common clustering algorithms are compared in the fi...
Chapter
Full-text available
The first section of this chapter familiarizes the reader with the definitions of the basic notation and terminology used in this thesis. Concepts of graph theory are introduced in the next section. They give rise to a new concept of neighborhoods, which is utilized in several chapters.
Chapter
Full-text available
Several real-world data sets are used in this chapter to show that Databionic swarm (DBS) is able to find clusters in a variety of cases. The leukemia data set is based on luminance measurements of 7747 different active or non-active genes in 554 human subjects. The World GDP data set is a multivariate time series that consists of monetary values f...
Chapter
Full-text available
Many technological advances have been achieved with the help of bionics, which is defined as the application of biological methods and systems found in nature. A related, rarely discussed subfield of information technology is called databionics. Databionics refers to the attempt to adopt information processing techniques from nature.
Chapter
Full-text available
Dimensionality reduction techniques reduce the dimensions of the input space to facilitate the exploration of structures in high-dimensional data. Two general dimensionality reduction approaches exist: manifold learning and projection. Manifold-learning methods attempt to find a sub-space in which the high-dimensional distances can be preserved.
Book
Full-text available
This book is published open access under a CC BY 4.0 license (for free download see http://www.springer.com/us/book/9783658205393). It covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape vi...
Presentation
Full-text available
Many data mining methods rely on some concept of the dissimilarity between pieces of information encoded in the data of interest. These methods can be used for cluster analysis. However, no generally accepted definition of clusters exists in the literature [Hennig et al., 2015]. Here, consistent with Bouveyron et al., it is assumed that a cluster i...
Article
Full-text available
Lipid metabolism has been suggested to be a major pathophysiological mechanism of multiple sclerosis (MS). With the increasing knowledge about lipid signaling, acquired data become increasingly complex making bioinformatics necessary in lipid research. We used unsupervised machine-learning to analyze lipid marker serum concentrations, pursuing the...
Conference Paper
Full-text available
Planar projections, i.e., projections from a high-dimensional data space onto a two-dimensional plane, are still in use to detect structures, such as clusters, in multivariate data. It can be shown that only the subclass of focusing projections such as CCA, NeRV and the ESOM are able to disentangle linearly non-separable data. However, even these pro...
Article
Full-text available
Background: Pain in response to noxious cold has a complex molecular background probably involving several types of sensors. A recent observation has been the multimodal distribution of human cold pain thresholds. This study aimed at analysing reproducibility and stability of this observation and further exploration of data patterns supporting a c...
Article
Full-text available
High-frequency, in-situ monitoring provides large environmental datasets. These datasets will likely bring new insights into landscape functioning and process-scale understanding. However, tailoring data analysis methods is necessary. Here, we detach our analysis from the usual temporal analysis performed in hydrology to determine if it is possible t...
Conference Paper
Full-text available
Dimensionality reduction by feature extraction is commonly used to project high-dimensional data into a low-dimensional space. With the aim to create a visualization of data, only projections onto two dimensions are considered here. Self-organizing maps were chosen as the projection method, which enabled the use of the U*-Matrix as an established m...
Article
Full-text available
Biomedical data obtained during cell experiments, laboratory animal research, or human studies often display a complex distribution. Statistical identification of subgroups in research data poses an analytical challenge. Here we introduce an interactive R-based bioinformatics tool, called “AdaptGauss”. It enables a valid identification of a biolo...
Conference Paper
Full-text available
Descriptions of income distributions using a single distribution, like Lognormal or Gamma, are often quite poor in describing the tails of the distribution [1]. This led to separate models for the upper vs. lower parts of income distributions [2]. For example, [3-5] describe the high-income region with Pareto power laws. Other authors model the l...
Article
Full-text available
Whether overt attention in natural scenes is guided by object content or by low-level stimulus features has become a matter of intense debate. Experimental evidence seemed to indicate that once object locations in a scene are known, salience models provide little extra explanatory power. This approach has recently been criticised for using inadequa...
Article
Full-text available
The exact function of color vision for natural-scene perception has remained puzzling. In rapid serial visual presentation (RSVP) tasks, categorically defined targets (e.g., animals) are typically detected slightly better for color than for grayscale stimuli. Here we test the effect of color on animal detection, recognition, and the attentional bli...