Christian Lorenz Müller

Helmholtz Zentrum Munich; Ludwig-Maximilians-University of Munich; Flatiron Institute New York

PhD
Statistical method development, microbiome data analysis, convex optimization

About

124
Publications
24,267
Reads
2,335
Citations
Citations since 2016
89 Research Items
2181 Citations
(Chart: citations per year, 2016 to 2022)
Introduction
I am interested in developing novel computational statistics tools and their applications in (microbial) systems biology.
Additional affiliations
November 2019 - present
Ludwig-Maximilians-University of Munich
Position
  • Professor
September 2014 - June 2016
Simons Center for Data Analysis
Position
  • Researcher
October 2012 - September 2014
New York University
Position
  • Postdoctoral Researcher

Publications

Publications (124)
Preprint
Full-text available
Global biogeochemical ocean models are invaluable tools to examine how physical, chemical, and biological processes interact in the ocean. Satellite-derived ocean-color properties, on the other hand, provide observations of the surface ocean with unprecedented coverage and resolution. Advances in our understanding of marine ecosystems and biogeoche...
Preprint
In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and migh...
Article
We present a statistical learning framework for robust identification of differential equations from noisy spatio-temporal data. We address two issues that have so far limited the application of such methods, namely their robustness against noise and the need for manual parameter tuning, by proposing stability-based model selection to determine the...
Article
Full-text available
Remote sensing observations from satellites and global biogeochemical models have combined to revolutionize the study of ocean biogeochemical cycling, but comparing the two data streams to each other and across time remains challenging due to the strong spatial-temporal structuring of the ocean. Here, we show that the Wasserstein distance provides...
Preprint
Full-text available
Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects an...
Article
Full-text available
Statistical analysis of microbial genomic data within epidemiological cohort studies holds the promise to assess the influence of environmental exposures on both the host and the host-associated microbiome. However, the observational character of prospective cohort data and the intricate characteristics of microbiome data make it challenging to dis...
Article
Full-text available
The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host‐microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dyn...
Article
Full-text available
Extensive microdiversity within Prochlorococcus, the most abundant marine cyanobacterium, occurs at scales from a single droplet of seawater to ocean basins. To interpret the structuring role of variations in genetic potential, as well as metabolic and physiological acclimation, we developed a mechanistic constraint-based modeling framework that i...
Preprint
Full-text available
Hearing loss is a major health problem and psychological burden in humans. Mouse models offer a possibility to elucidate genes involved in the underlying developmental and pathophysiological mechanisms of hearing impairment. To this end, large-scale mouse phenotyping programs include auditory phenotyping of single-gene knockout mouse lines. Using t...
Article
Full-text available
Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing....
Preprint
Full-text available
The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dyn...
Article
Full-text available
Compositional changes of cell types are major drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model addressing these issues, enabling the study of complex cell type effects...
Preprint
Full-text available
Modern ocean datasets are large, multi-dimensional, and inherently spatiotemporal. A common oceanographic analysis task is the comparison of such datasets along one or several dimensions of latitude, longitude, depth, and time, as well as across different data modalities. Here, we show that the Wasserstein distance, also known as earth mover's distance,...
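As a minimal illustration of the Wasserstein (earth mover's) distance, the sketch below computes the 1-Wasserstein distance between two one-dimensional empirical samples of equal size with uniform weights, where it reduces to the mean absolute difference between the sorted samples. The function name `wasserstein_1d` is hypothetical, and the multi-dimensional, spatiotemporal comparisons described in the abstract are not covered by this toy case.

```python
def wasserstein_1d(u, v):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sample sizes, the optimal transport plan
    matches the i-th smallest value of u with the i-th smallest value of v.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


# Shifting a sample by a constant moves every unit of mass by that amount.
print(wasserstein_1d([0.0, 1.0, 2.0], [1.5, 2.5, 3.5]))  # 1.5
```

Sorting is what makes the one-dimensional case cheap; in higher dimensions the transport plan has to be found by solving an optimization problem.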
Article
Full-text available
We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving...
Preprint
Full-text available
We introduce GGLasso, a Python package for solving General Graphical Lasso problems. The Graphical Lasso scheme, introduced by Friedman (2007) (see also Yuan 2007; Banerjee 2008), estimates a sparse inverse covariance matrix $\Theta = \Sigma^{-1}$ from multivariate Gaussian data $\mathcal{X} \in \mathbb{R}^p$, $\mathcal{X} \sim \mathcal{N}(\mu, \Sigma)$. Originally proposed...
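For orientation, the single-problem Graphical Lasso is commonly written as the penalized Gaussian negative log-likelihood below, where $S$ is the empirical covariance matrix of the data and $\lambda > 0$ a regularization parameter. This is a standard textbook form, not necessarily the exact parametrization used in GGLasso; some formulations also penalize the diagonal entries of $\Theta$.

```latex
\hat{\Theta} = \operatorname*{arg\,min}_{\Theta \succ 0} \; -\log\det\Theta + \operatorname{tr}(S\Theta) + \lambda \sum_{i \neq j} |\Theta_{ij}|
```

Zeros in $\hat{\Theta}$ correspond to conditional independencies between variables, which is why the sparse precision matrix is read as a graph.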
Preprint
Full-text available
Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing....
Preprint
Full-text available
We present `latentcor`, an R package for correlation estimation from data with mixed variable types. Mixed variable types, including continuous, binary, ordinal, zero-inflated, or truncated data, are routinely collected in many areas of science. Accurate estimation of correlations among such variables is often the first critical step in statistical...
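In the simplest case of two continuous variables under a latent Gaussian copula, the latent correlation can be recovered from Kendall's $\tau$ through the bridge function $r = \sin(\pi\tau/2)$. The sketch below implements only that case in pure Python, with a quadratic-time $\tau$ computation; the function names are hypothetical, it is not the `latentcor` interface, and the binary, ordinal, zero-inflated, and truncated cases require different bridge functions.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all n(n-1)/2 pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def latent_corr_continuous(x, y):
    """Latent Gaussian correlation for two continuous variables via sin(pi*tau/2)."""
    return math.sin(math.pi * kendall_tau_a(x, y) / 2)


# A monotone, non-linear transform leaves tau, and hence the estimate, unchanged.
x = [0.1, 0.5, 0.9, 1.4, 2.0]
print(latent_corr_continuous(x, [math.exp(v) for v in x]))
```

Because $\tau$ depends only on ranks, the estimate is invariant to monotone transformations of each margin, which is the property the copula model exploits.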
Article
Full-text available
Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this c...
Preprint
Full-text available
We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving...
Article
Many biological high-throughput datasets, such as targeted amplicon-based and metagenomic sequencing data, are compositional. A common exploratory data analysis task is to infer robust statistical associations between high-dimensional microbial compositions and habitat- or host-related covariates. To address this, a general robust statistical regre...
Preprint
Full-text available
Many scientific datasets are compositional in nature. Important examples include species abundances in ecology, rock compositions in geology, topic compositions in large-scale text corpora, and sequencing count data in molecular biology. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition a...
Article
Full-text available
We propose a statistical learning framework based on group-sparse regression that can be used to (i) enforce conservation laws, (ii) ensure model equivalence, and (iii) guarantee symmetries when learning or inferring differential-equation models from data. Directly learning interpretable mathematical models from data has emerged as a valuable model...
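In group-sparse regression, coefficients belonging to the same group are kept or set to zero together. A generic building block that enforces this in proximal-gradient solvers is block soft-thresholding, the proximal operator of $\lambda \|v\|_2$ applied group by group. The sketch below is a plain illustration of that operator under assumed, hypothetical group indices; it is not the authors' implementation.

```python
import math


def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)  # the whole group is switched off
    scale = 1.0 - lam / norm   # otherwise shrink the group toward zero
    return [scale * x for x in v]


def prox_group_lasso(beta, groups, lam):
    """Apply block soft-thresholding to each (disjoint) group of coefficient indices."""
    out = list(beta)
    for g in groups:
        shrunk = group_soft_threshold([beta[i] for i in g], lam)
        for i, val in zip(g, shrunk):
            out[i] = val
    return out


# Hypothetical grouping: coefficients 0-1 form one group, 2-3 another.
print(prox_group_lasso([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], 1.0))
```

The first group (norm 5) is shrunk but kept, while the second (norm 0.5) is removed as a whole, which is how group structure can encode constraints such as shared terms across equations.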
Preprint
Full-text available
This paper describes the implementation of semi-structured deep distributional regression, a flexible framework to learn distributions based on a combination of additive regression models and deep neural networks. deepregression is implemented in both R and Python, using the deep learning libraries TensorFlow and PyTorch, respectively. The implemen...
Preprint
Full-text available
Statistical analysis of microbial genomic data within epidemiological cohort studies holds the promise to assess the influence of environmental exposures on both the host and the host-associated microbiome. The observational character of prospective cohort data and the intricate characteristics of microbiome data make it, however, challenging to di...
Article
Latent Gaussian copula models provide a powerful means to perform multi-view data integration since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent Gaussian correlations. The estimation of these latent correlations, however, comes at considerable computational cost, having...
Preprint
Full-text available
Numerical methods for approximately solving partial differential equations (PDE) are at the core of scientific computing. Often, this requires high-resolution or adaptive discretization grids to capture relevant spatio-temporal features in the PDE solution, e.g., in applications like turbulence, combustion, and shock propagation. Numerical approxim...
Conference Paper
Full-text available
Numerical methods for approximately solving partial differential equations (PDE) are at the core of scientific computing. Often, this requires high-resolution or adaptive discretization grids to capture relevant spatio-temporal features in the PDE solution, e.g., in applications like turbulence, combustion, and shock propagation. Numerical approxim...
Article
Full-text available
Estimation of statistical associations in microbial genomic survey count data is fundamental to microbiome research. Experimental limitations, including count compositionality, low sample sizes and technical variability, obstruct standard application of association measures and require data normalization prior to statistical estimation. Here, we in...
Preprint
Full-text available
Compositional changes of cell types are major drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model addressing these issues, enabling the study of complex cell type effects in...
Preprint
Full-text available
We propose a statistical learning framework based on group-sparse regression that can be used to 1) enforce conservation laws, 2) ensure model equivalence, and 3) guarantee symmetries when learning or inferring differential-equation models from measurement data. Directly learning $\textit{interpretable}$ mathematical models from data has emerged as...
Article
Full-text available
Motivation: Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data...
Preprint
Full-text available
We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form: \[ y = X \beta + \sigma \epsilon \qquad \textrm{subject to} \qquad C\beta=0 \] Here, $X \in \mathbb{R}^{n\times d}$ is a given de...
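To make the role of the constraint $C\beta = 0$ concrete, the sketch below solves the unpenalized, noise-free special case: ordinary least squares subject to $C\beta = 0$, via the KKT linear system. It uses NumPy and is not the c-lasso package interface; the sparsity penalty and the robust and classification variants are omitted. The zero-sum constraint $C = \mathbf{1}^\top$ in the example is the one typically imposed on log-ratio (compositional) covariates.

```python
import numpy as np


def constrained_least_squares(X, y, C):
    """Minimize ||y - X b||^2 subject to C b = 0 by solving the KKT system

        [2 X^T X   C^T] [b ]   [2 X^T y]
        [  C        0 ] [nu] = [   0   ]
    """
    d = X.shape[1]
    k = C.shape[0]
    kkt = np.block([[2 * X.T @ X, C.T], [C, np.zeros((k, k))]])
    rhs = np.concatenate([2 * X.T @ y, np.zeros(k)])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:d]  # drop the Lagrange multipliers nu


rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5, 0.5])  # entries sum to zero
C = np.ones((1, d))                           # zero-sum constraint: sum(beta) = 0
y = X @ beta_true                             # noise-free responses (sigma = 0)
beta_hat = constrained_least_squares(X, y, C)
print(np.round(beta_hat, 6))                  # should recover beta_true
```

Solving the KKT system directly is convenient for a small dense example; solvers for the penalized problem typically handle the constraint inside an iterative scheme instead.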
Preprint
Full-text available
Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this c...
Preprint
Full-text available
Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalizati...
Preprint
Full-text available
Latent Gaussian copula models provide a powerful means to perform multi-view data integration since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent Gaussian correlations. The estimation of these latent correlations, however, comes at considerable computational cost, having...
Article
Full-text available
Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize...
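A common first step with compositional data is a log-ratio transform. The sketch below implements the centered log-ratio (clr) transform, $\mathrm{clr}(x)_i = \log x_i - \frac{1}{D}\sum_{j=1}^{D} \log x_j$, in pure Python for a single composition with strictly positive parts. It is a generic illustration rather than the method of this paper, and the handling of zero counts (for example via a pseudocount) is an assumption left outside the sketch.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one composition with strictly positive parts."""
    if any(p <= 0 for p in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


counts = [10, 20, 70]
proportions = [c / sum(counts) for c in counts]
# clr is scale invariant: raw counts and proportions give the same result,
# and the transformed parts always sum to zero.
print(clr(counts))
print(clr(proportions))
```

Scale invariance is what makes log-ratios attractive here: sequencing depth changes the counts but not their ratios.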
Preprint
Full-text available
Detecting community-wide statistical relationships from targeted amplicon-based and metagenomic profiling of microbes in their natural environment is an important step toward understanding the organization and function of these communities. We present a robust and computationally tractable latent graphical model inference scheme that allows simulta...
Preprint
Many high-throughput sequencing data sets in biology are compositional in nature. A prominent example is microbiome profiling data, including targeted amplicon-based and metagenomic sequencing data. These profiling data comprise surveys of microbial communities in their natural habitat and sparse proportional (or compositional) read counts that re...
Preprint
Full-text available
We present a statistical learning framework for robust identification of partial differential equations from noisy spatiotemporal data. Extending previous sparse regression approaches for inferring PDE models from simulated data, we address key issues that have thus far limited the application of these methods to noisy experimental data, namely the...
Article
Full-text available
The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more gene...
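For orientation, the TREX estimator as we recall it from the original proposal replaces the lasso's tuning parameter with a data-dependent scaling of the squared residual; the constant $c$ is fixed in advance (commonly $c = 1/2$) rather than tuned. This is given as a sketch for reference and should be checked against the paper's exact definition.

```latex
\hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^d} \left\{ \frac{\|y - X\beta\|_2^2}{c\,\|X^\top (y - X\beta)\|_\infty} + \|\beta\|_1 \right\}
```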
Article
Full-text available
High-throughput microbial sequencing techniques, such as targeted amplicon-based and metagenomic profiling, provide low-cost genomic survey data of microbial communities in their natural environment, ranging from marine ecosystems to host-associated habitats. While standard microbiome profiling data can provide sparse relative abundances of operati...
Preprint
Full-text available
Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize...
Preprint
Full-text available
High-throughput microbial sequencing techniques, such as targeted amplicon-based and metagenomic profiling, provide low-cost genomic survey data of microbial communities in their natural environment, ranging from marine ecosystems to host-associated habitats. While standard microbiome profiling data can provide sparse relative abundances of operati...
Preprint
Full-text available
Consistent normalization of microbial genomic survey count data is fundamental to modern microbiome research. Technical artifacts in these data often obstruct standard comparison of microbial composition across samples and experiments. To correct for sampling bias, library size, and technical variability, a number of different normalization methods...
Preprint
Full-text available
We introduce an optimization model for maximum likelihood-type estimation (M-estimation) that generalizes a large class of existing statistical models, including Huber's concomitant M-estimation model, Owen's Huber/Berhu concomitant model, the scaled lasso, support vector machine regression, and penalized estimation with structured sparsity. The mo...
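As one concrete member of this class, the scaled lasso mentioned above jointly estimates the coefficients and the noise scale $\sigma$. In one common parametrization its objective is shown below; setting the derivative with respect to $\sigma$ to zero gives $\hat{\sigma}(\beta) = \|y - X\beta\|_2/\sqrt{n}$, and substituting back yields the square-root lasso objective. This worked step illustrates the general idea only and is not the paper's full model.

```latex
\min_{\beta \in \mathbb{R}^d,\ \sigma > 0} \; \frac{\|y - X\beta\|_2^2}{2n\sigma} + \frac{\sigma}{2} + \lambda \|\beta\|_1,
\qquad
\hat{\sigma}(\beta) = \frac{\|y - X\beta\|_2}{\sqrt{n}}
\;\Longrightarrow\;
\min_{\beta} \; \frac{\|y - X\beta\|_2}{\sqrt{n}} + \lambda \|\beta\|_1
```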
Article
Full-text available
Background: No microbe exists in isolation, and few live in environments with only members of their own kingdom or domain. As microbiome studies become more interested in the interactions between microbes than in cataloging which microbes are present, the variety of microbes in the community should be considered. However, the majority o...
Article
Full-text available
The design of systems or models that work robustly under uncertainty and environmental fluctuations is a key challenge in both engineering and science. This is formalized in the design-centering problem, which is defined as finding a design that fulfills given specifications and has a high probability of still doing so if the system parameters or t...
Article
Full-text available
Broad-spectrum antibiotics are frequently prescribed to children. Early childhood represents a dynamic period for the intestinal microbial ecosystem, which is readily shaped by environmental cues; antibiotic-induced disruption of this sensitive community may have long-lasting host consequences. Here we demonstrate that a single pulsed macrolide ant...