Bernhard C. Geiger

Know-Center · Knowledge Discovery

PhD

About

136
Publications
20,118
Reads
1,253
Citations
Additional affiliations
June 2018 - present
Know-Center
Position
  • Senior Researcher
September 2017 - May 2018
Graz University of Technology
Position
  • PostDoc Position
November 2014 - July 2017
Technical University of Munich
Position
  • Senior Researcher
Education
March 2010 - June 2014
Graz University of Technology
Field of study
  • Electrical Engineering
September 2004 - November 2009
Graz University of Technology
Field of study
  • Electrical Engineering

Publications (136)
Article
Finite precision approximations of discrete probability distributions are considered, applicable for distribution synthesis, e.g., probabilistic shaping. Two algorithms are presented that find the optimal $M$-type approximation $Q$ of a distribution $P$ in terms of the variational distance $\|Q-P\|_1$ and the informational divergence $\mathbb{D}(Q|...
Article
Full-text available
In this paper, we present a method for reducing a regular, discrete-time Markov chain (DTMC) to another DTMC with a given, typically much smaller number of states. The cost of reduction is defined as the Kullback-Leibler divergence rate between a projection of the original process through a partition function and a DTMC on the correspondingly p...
Article
Full-text available
A lumping of a Markov chain is a coordinatewise projection of the chain. We characterise the entropy rate preservation of a lumping of an aperiodic and irreducible Markov chain on a finite state space by the random growth rate of the cardinality of the realisable preimage of a finite-length trajectory of the lumped chain and by the information need...
Preprint
Full-text available
We propose the novel concept of anomaly-free regions (AFR) to improve anomaly detection. An AFR is a region in the data space for which it is known that there are no anomalies inside it, e.g., via domain knowledge. This region can contain any number of normal data points and can be anywhere in the data space. AFRs have the key advantage that they c...
Article
Full-text available
Many-Objective Simulation Optimization for Camp Location Problems in Humanitarian Logistics Yani Xue 1,*, Miqing Li 2, Hamid Arabnejad 1, Diana Suleimenova 1, Alireza Jahani 1, Bernhard C. Geiger 3, Freek Boesjes 4, Anastasia Anagnostou 1, Simon J.E. Taylor 1, Xiaohui Liu 1, and Derek Groen 1,* 1 Department of Computer Science, Brunel Unive...
Preprint
Full-text available
A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: The difference between prediction and ground truth becomes arbitrarily large, regardless of the train...
Conference Paper
Virtual sensing, i.e., the method of estimating quantities of interest indirectly via measurements of other quantities, has received a lot of attention in various fields: Virtual sensors have successfully been deployed in intelligent building systems, the process industry, water quality control, and combustion process...
Preprint
Full-text available
The turbulent jet ignition concept using prechambers is a promising solution to achieve stable combustion at lean conditions in large gas engines, leading to high efficiency at low emission levels. Due to the wide range of design and operating parameters for large gas engine prechambers, the preferred method for evaluating different designs is comp...
Preprint
Full-text available
Flamelet models are widely used in computational fluid dynamics to simulate thermochemical processes in turbulent combustion. These models typically employ memory-expensive lookup tables that are predetermined and represent the combustion process to be simulated. Artificial neural networks (ANNs) offer a deep learning approach that can store this t...
Conference Paper
Full-text available
Digital product passports (DPPs) are an emerging technology and are considered as enablers of sustainable and circular value chains as they support sustainable product management (SPM) by gathering and containing product life cycle data. However, some life cycle data are considered sensitive by stakeholders, resulting in a reluctance to share such...
Conference Paper
Precise prediction of combustion parameters such as peak firing pressure (PFP) or crank angle of 50% burned mass fraction (MFB50) is essential for optimal engine control. These quantities are commonly determined from in-cylinder pressure sensor signals and are crucial to reach high efficiencies and low emissions. Highl...
Preprint
Full-text available
The information-theoretic framework promises to explain the predictive power of neural networks. In particular, the information plane analysis, which measures mutual information (MI) between input and representation as well as representation and output, should give rich insights into the training process. This approach, however, was shown to strong...
Preprint
Full-text available
Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that...
Article
Full-text available
Superficially, read and spontaneous speech—the two main kinds of training data for automatic speech recognition—appear as complementary, but are equal: pairs of texts and acoustic signals. Yet, spontaneous speech is typically harder for recognition. This is usually explained by different kinds of variation and noise, but there is a more fundamental...
Preprint
Full-text available
For a long time, machine learning (ML) has been seen as the abstract problem of learning relationships from data independent of the surrounding settings. This has recently been challenged, and methods have been proposed to include external constraints in the machine learning models. These methods usually come from application-specific fields, such...
Article
We introduce a novel approach to reconstructing the in-cylinder pressure trace from vibration signals recorded with common knock sensors. The proposed methodology is purely data-driven and employs a convolutional neural network that has two distinct branches. Each branch is allowed to learn individual aspects of the mapping process, with boundary c...
Preprint
Full-text available
We consider the problem of finding an input to a stochastic black box function such that the scalar output of the black box function is as close as possible to a target value in the sense of the expected squared error. While the optimization of stochastic black boxes is classic in (robust) Bayesian optimization, the current approaches based on Gaus...
Article
Full-text available
Physics-informed neural networks (PINNs) have emerged as a promising deep learning method, capable of solving forward and inverse problems governed by differential equations. Despite their recent advance, it is widely acknowledged that PINNs are difficult to train and often require a careful tuning of loss weights when data and physics loss functio...
Preprint
Learning invariant representations that remain useful for a downstream task is still a key challenge in machine learning. We investigate a set of related information funnels and bottleneck problems that claim to learn invariant representations from the data. We also propose a new element to this family of information-theoretic objectives: The Condi...
Article
This article introduces a method for the detection of knock occurrences in an internal combustion engine (ICE) using a 1-D convolutional neural network trained on in-cylinder pressure data. The model architecture is based on expected frequency characteristics of knocking combustion. All cycles were reduced to $60^{\circ }$ CA long windows with no...
Chapter
Full-text available
In the context of humanitarian support for forcibly displaced persons, camps play an important role in protecting people and ensuring their survival and health. A challenge in this regard is to find optimal locations for establishing a new asylum-seeker/unrecognized refugee or IDPs (internally displaced persons) camp. In this paper we formulate thi...
Article
Full-text available
In this paper, the authors investigated changes in mass concentrations of particulate matter (PM) during the Coronavirus Disease of 2019 (COVID-19) lockdown. Daily samples of PM1, PM2.5 and PM10 fractions were measured at an urban background sampling site in Zagreb, Croatia from 2009 to late 2020. For the purpose of meteorological normalization, th...
Article
Full-text available
An optimal control of the combustion process of an engine ensures lower emissions and fuel consumption plus high efficiencies. Combustion parameters such as the peak firing pressure (PFP) and the crank angle (CA) corresponding to 50% of mass fraction burned (MFB50) are essential for a closed-loop control strategy. These parameters are based on the...
Preprint
In this paper, we frame homogeneous-feature multi-task learning (MTL) as a hierarchical representation learning problem, with one task-agnostic and multiple task-specific latent representations. Drawing inspiration from the information bottleneck principle and assuming an additive independent noise model between the task-agnostic and task-specific...
Preprint
Online social networks are a dominant medium in everyday life to stay in contact with friends and to share information. In Twitter, users can connect with other users by following them, who in turn can follow back. In recent years, researchers studied several properties of social networks and designed random graph models to describe them. Many of t...
Preprint
We survey information-theoretic approaches to the reduction of Markov chains. Our survey is structured in two parts: The first part considers Markov chain coarse graining, which focuses on projecting the Markov chain to a process on a smaller state space that is informative about certain quantities of interest. The second part considers Markov chai...
Article
Full-text available
Most engineering domains abound with models derived from first principles that have been proven to be effective for decades. These models are not only a valuable source of knowledge, but they also form the basis of simulations. The recent trend of digitization has complemented these models with data in all forms and variants, such as process monitor...
Preprint
Physics-informed neural networks (PINNs) seamlessly integrate data and physical constraints into the solving of problems governed by differential equations. In settings with little labeled training data, their optimization relies on the complexity of the embedded physics loss function. Two fundamental questions arise in any discussion of frequently...
Article
Full-text available
Complex systems, abstractly represented as networks, are ubiquitous in everyday life. Analyzing and understanding these systems requires, among others, tools for community detection. As no single best community detection algorithm can exist, robustness across a wide variety of problem settings is desirable. In this work, we present Synwalk, a rando...
Article
Full-text available
The avoidance of scrap and the adherence to tolerances is an important goal in manufacturing. This requires a good engineering understanding of the underlying process. To achieve this, real physical experiments can be conducted. However, they are expensive in time and resources, and can slow down production. A promising way to overcome these drawba...
Preprint
This paper introduces a method for the detection of knock occurrences in an internal combustion engine (ICE) using a 1D convolutional neural network trained on in-cylinder pressure data. The model architecture was based on considerations regarding the expected frequency characteristics of knocking combustion. To aid the feature extraction, all cycl...
Preprint
We connect the problem of semi-supervised clustering to constrained Markov aggregation, i.e., the task of partitioning the state space of a Markov chain. We achieve this connection by considering every data point in the dataset as an element of the Markov chain's state space, by defining the transition probabilities between states via similarities...
Experiment Findings
Full-text available
This document is the final report of WP6 and concludes its developments. The report presents (i) the final set of requirements and their KPIs, (ii) the Artificial Intelligence (AI) enabled use case workflows, and highlights (iii) the final outcomes of the integration process. It should be noted that all respective objectives have been succe...
Article
Full-text available
The technical world of today fundamentally relies on structural analysis in the form of design and structural mechanic simulations. A traditional and robust simulation method is the physics-based finite element method (FEM) simulation. FEM simulations in structural mechanics are known to be very accurate; however, the higher the desired resolution,...
Article
Full-text available
Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that...
Article
In this work, we investigate the use of three information-theoretic quantities--entropy, mutual information with the class variable, and a class selectivity measure based on Kullback-Leibler (KL) divergence--to understand and study the behavior of already trained fully connected feedforward neural networks (NNs). We analyze the connection between t...
Article
We review the current literature concerned with information plane (IP) analyses of neural network (NN) classifiers. While the underlying information bottleneck theory and the claim that information-theoretic compression is causally linked to generalization are plausible, empirical evidence was found to be both supporting and conflicting. We review...
Article
Full-text available
Information plane analysis, describing the mutual information between the input and a hidden layer and between a hidden layer and the target over time, has recently been proposed to analyze the training of neural networks. Since the activations of a hidden layer are typically continuous-valued, this mutual information cannot be computed analyticall...
Article
Full-text available
Automated construction of location graphs is instrumental but challenging, particularly in logistics optimisation problems and agent-based movement simulations. Hence, we propose an algorithm for automated construction of location graphs, in which vertices correspond to geographic locations of interest and edges to direct travelling routes between...
Preprint
Full-text available
Recently a new type of deep learning method has emerged, called physics-informed neural networks. Despite their success in solving problems that are governed by partial differential equations, physics-informed neural networks are often difficult to train. Frequently reported convergence issues are still poorly understood and complicate the inferenc...
Preprint
Full-text available
Drive towards improved performance of machine learning models has led to the creation of complex features representing a database of condensed matter systems. The complex features, however, do not offer an intuitive explanation on which physical attributes do improve the performance. The effect of the database on the performance of the trained mode...
Preprint
Complex systems, abstractly represented as networks, are ubiquitous in everyday life. Analyzing and understanding these systems requires, among others, tools for community detection. As no single best community detection algorithm can exist, robustness across a wide variety of problem settings is desirable. In this work, we present Synwalk, a rando...
Preprint
Full-text available
Automated construction of location graphs is instrumental but challenging, particularly in logistics optimisation problems and agent-based movement simulations. Hence, we propose an algorithm for automated construction of location graphs, in which vertices correspond to geographic locations of interest and edges to direct travelling routes between...
Article
Full-text available
The phenomenon of knock is an abnormal combustion occurring in spark-ignition (SI) engines and forms a barrier that prevents an increase in thermal efficiency while simultaneously reducing CO2 emissions. Since knocking combustion is highly stochastic, a cyclic analysis of in-cylinder pressure is necessary. In this study we propose an approach for e...
Article
Full-text available
The information bottleneck (IB) framework, proposed in [...]
Article
Full-text available
In this short note, we relate the variational bounds proposed in Alemi et al. (2017) and Fischer (2020) for the information bottleneck (IB) and the conditional entropy bottleneck (CEB) functional, respectively. Although the two functionals were shown to be equivalent, it was empirically observed that optimizing bounds on the CEB functional achieves...
Article
We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and is implemented in a typical Wasserstein autoencoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian comp...
Preprint
Distance-based classification is among the most competitive classification methods for time series data. The most critical component of distance-based classification is the selected distance function. Past research has proposed various different distance metrics or measures dedicated to particular aspects of real-world time series data, yet there i...
Article
Full-text available
Spectator periodicals contributed to spreading the ideas of the Age of Enlightenment, a turning point in human history and the foundation of our modern societies. In this work, we study the spirit and atmosphere captured in the spectator periodicals about important social issues from the 18th century by analyzing text sentiment of those pe...
Chapter
Full-text available
Accurate digital twinning of the global challenges (GC) leads to computationally expensive coupled simulations. These simulations bring together not only different models, but also various sources of massive static and streaming data sets. In this paper, we explore ways to bridge the gap between traditional high performance computing (HPC) and data...
Conference Paper
Full-text available
Accurate digital twinning of the global challenges (GC) leads to computationally expensive coupled simulations. These simulations bring together not only different models, but also various sources of massive static and streaming data sets. In this paper, we explore ways to bridge the gap between traditional high performance computing (HPC) and data...
Preprint
We derive two sufficient conditions for a function of a Markov random field (MRF) on a given graph to be a MRF on the same graph. The first condition is information-theoretic and parallels a recent information-theoretic characterization of lumpability of Markov chains. The second condition, which is easier to check, is based on the potential functi...
Preprint
We review the current literature concerned with information plane analyses of neural network classifiers. While the underlying information bottleneck theory and the claim that information-theoretic compression is causally linked to generalization are plausible, empirical evidence was found to be both supporting and conflicting. We review this evide...
Technical Report
This paper presents a hybrid model for the prediction of magnetostriction in power transformers by leveraging the strengths of a data-driven approach and a physics-based model. Specifically, a non-linear physics-based model for magnetostriction as a function of the magnetic field is employed, the parameters of which are estimated as linear combinat...
Preprint
We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and which is implemented in a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussi...
Preprint
In this draft, which reports on work in progress, we 1) adapt the information bottleneck functional by replacing the compression term by class-conditional compression, 2) relax this functional using a variational bound related to class-conditional disentanglement, 3) consider this functional as a training objective for stochastic neural networks, a...
Article
Semiconductor manufacturing is a highly innovative branch of industry, where a high degree of automation has already been achieved. For example, devices tested to be outside of their specifications in electrical wafer test are automatically scrapped. In this work, we go one step further and analyse test data of devices still within the limits of th...
Preprint
This short note presents results about the symmetric Jensen-Shannon divergence between two discrete mixture distributions $p_1$ and $p_2$. Specifically, for $i=1,2$, $p_i$ is the mixture of a common distribution $q$ and a distribution $\tilde{p}_i$ with mixture proportion $\lambda_i$. In general, $\tilde{p}_1\neq \tilde{p}_2$ and $\lambda_1\neq\lam...
Chapter
The examples in the previous chapter showed that relative information loss yields counter-intuitive results for many practical systems such as quantizers, center clippers, etc. Also PCA was shown not to be useful in information-theoretic terms, at least if we do not know anything about the input data. All these results can be traced back to the fac...
Chapter
This chapter contains the last generalization of information loss in this work: The generalization of relevant information loss from RVs to stationary stochastic processes. While it is the scenario with the greatest practical importance, it is hard to obtain results for large classes of systems. Therefore, in what follows, only the application of a...
Chapter
In the preceding chapters we presented several quantities for the information that is lost in signal processing systems.
Chapter
In this section, we treat the class of systems that can be described by piecewise bijective functions. We call a function piecewise bijective if every output value originates from at most countably many input values, i.e., if the preimage of every output value is an at most countable set.
Chapter
In Sect. 1.3 we analyzed the information loss in a deterministic system with a finite-entropy signal at its input. Here, we extend this analysis to discrete-time, discrete-valued, stationary stochastic processes.
Chapter
Extending the notion of an information loss rate to general processes is not trivial. It is even more difficult than generalizing the concept of information loss from discrete to continuous RVs. In this section we propose one possible generalization, making similar restrictions as in Chap. 3. Specifically, we focus on piecewise bijective functions...
Chapter
As it was already shown in Sect. 4.2, not all systems can be described by piecewise bijective functions satisfying Definition 2.1. We therefore extend also our measures of relative information loss to stationary stochastic processes. This endeavour is by no means easy, as there are several possible ways for such an extension, some of which we hin...
Chapter
Let X be an N-dimensional RV taking values from \(\mathcal {X}\subseteq \mathbb {R}^N\).
Chapter
So far we have looked at systems described by piecewise bijective functions.
Article
Full-text available
In this work, we characterize the outputs of individual neurons in a trained feed-forward neural network by entropy, mutual information with the class variable, and a class selectivity measure based on Kullback-Leibler divergence. By cumulatively ablating neurons in the network, we connect these information-theoretic measures to the impact their re...
Article
Full-text available
In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, makin...
Article
Full-text available
We present an information-theoretic cost function for co-clustering, i.e., for simultaneous clustering of two sets based on similarities between their elements. By constructing a simple random walk on the corresponding bipartite graph, our cost function is derived from a recently proposed generalized framework for information-theoretic Markov chain...
Book
This book introduces readers to essential tools for the measurement and analysis of information loss in signal processing systems. Employing a new information-theoretic systems theory, the book analyzes various systems in the signal processing engineer’s toolbox: polynomials, quantizers, rectifiers, linear filters with and without quantization effe...
Article
The authors have recently defined the Rényi information dimension rate $d(\{X_t\})$ of a stationary stochastic process $\{X_t,\,t\in\mathbb{Z}\}$ as the entropy rate of the uniformly-quantized process divided by minus the logarithm of the quantizer step size $1/m$ in the limit as $m\to\infty$ (B. Geiger and T. Koch, "On the information dimension...
Article
Full-text available
This paper proposes an information-theoretic cost function for aggregating a Markov chain via a (possibly stochastic) mapping. The cost function is motivated by two objectives: 1) The process obtained by observing the Markov chain through the mapping should be close to a Markov chain, and 2) the aggregated Markov chain should retain as much of the...