Martin Mächler
ETH Zurich | ETH Zürich · Department of Mathematics
Ph.D. math
About
65
Publications
99,476
Reads
124,240
Citations
Introduction
Senior Scientist and Lecturer in Computational Statistics, R Programming, Clustering, Data Mining, Robust Statistics.
R Core member, author of > 20 R packages; Secretary General of the R Foundation.
Additional affiliations
September 1991 - present
January 1990 - present
Publications (65)
This chapter presents graphical diagnostics and statistical tests, and discusses model selection for copulas.
This chapter introduces the main copula classes and the corresponding sampling procedures, along with some copula transformations that are important for practical purposes.
This chapter is concerned with more advanced topics in copula modeling such as the handling of ties, time series, and covariates (in a regression-like setting).
This chapter offers a basic introduction to copulas and presents their main properties along with the most important theoretical results such as the Fréchet-Hoeffding bounds, Sklar’s Theorem, and the invariance principle.
This chapter addresses the estimation of copulas from a parametric, semi-parametric, and nonparametric perspective.
Count data can be analyzed using generalized linear mixed models when observations are correlated in ways that require random effects. However, count data are often zero-inflated, containing more zeros than would be expected from the typical error distributions. We present a new package, glmmTMB, and compare it to other R packages that fit zero-inf...
Ecological phenomena are often measured in the form of count data. These data can be analyzed using generalized linear mixed models (GLMMs) when observations are correlated in ways that require random effects. However, count data are often zero-inflated, containing more zeros than would be expected from the standard error distributions used in GLM...
Methods for Cluster analysis. Much extended the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".
https://cran.r-project.org/package=glmmTMB
In this note, we explain how f(a) = log(1 − e^{−a}) = log(1 − exp(−a)) can be computed accurately, in a simple and optimal manner, building on the two related auxiliary functions log1p(x) (= log(1 + x)) and expm1(x) (= exp(x) − 1 = e^x − 1). The cutoff, a_0, in use in R since 2004, is shown to be optimal both theoretically and empirically, using Rm...
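The idea in this abstract can be illustrated with a minimal Python sketch. It is not the note's R code: the function name `log1mexp` and the cutoff value log 2 are assumptions here, since the abstract mentions a cutoff but does not state its value. Below the cutoff the sketch goes through `expm1`, above it through `log1p`.

```python
import math

LOG2 = math.log(2.0)  # assumed cutoff a_0; the abstract does not state its value


def log1mexp(a: float) -> float:
    """Return log(1 - exp(-a)) for a > 0, avoiding cancellation in 1 - exp(-a)."""
    if a <= 0.0:
        raise ValueError("log1mexp requires a > 0")
    if a <= LOG2:
        # exp(-a) is close to 1 here, so compute 1 - exp(-a) as -expm1(-a)
        return math.log(-math.expm1(-a))
    # exp(-a) is small here, so log1p preserves its low-order digits
    return math.log1p(-math.exp(-a))
```

For a tiny argument such as `a = 1e-20`, the naive expression `math.log(1 - math.exp(-a))` fails because `1 - math.exp(-a)` rounds to zero, whereas the sketch returns a value close to `math.log(1e-20)`.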
Maximum likelihood or restricted maximum likelihood (REML) estimates of the
parameters in linear mixed-effects models can be determined using the lmer
function in the lme4 package for R. As for most model-fitting functions in R,
the model is described in an lmer call by a formula, in this case including
both fixed- and random-effects terms. The for...
This article introduces a graphical goodness-of-fit test for copulas in more than two dimensions. The test is based on pairs of variables and can thus be interpreted as a first-order approximation of the underlying dependence structure. The idea is to first transform pairs of data columns with the Rosenblatt transform to bivariate standard uniform...
https://cran.r-project.org/package=DescTools
Description: Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
It is shown how to set up, conduct, and analyze large simulation studies with
the new R package simsalapar = simulations simplified and launched parallel. A
simulation study typically starts with determining a collection of input
variables and their values on which the study depends, such as sample sizes,
dimensions, types and degrees of dependence...
The performance of known and new parametric estimators for Archimedean
copulas is investigated, with special focus on large dimensions and numerical
difficulties. In particular, method-of-moments-like estimators based on
pairwise Kendall's tau, a multivariate extension of Blomqvist's beta, minimum
distance estimators, the maximum-likelihood estimat...
The pcalg package for R (R Development Core Team (2010)) is a tool for estimating intervention effects when the true underlying causal structure is unknown. To this end, pcalg contains functions for estimating the causal structure using graphical models (functions skeleton, pc and fci) and functions for estimating intervention effects given an esti...
We present a tutorial and new publicly available computational tools for variable length Markov chains (vlmc). vlmc's are Markov chains with the additional attractive structure that their memories depend on a variable number of lagged values, depending on what the actual past (the lagged values) looks like. They build a very flexible class of tree...
Computer programming is an important component of statistical research and data analysis. It is a necessary skill for using sophisticated statistical packages and for writing custom scripts and software to perform data analysis using modern statistical methods. Emacs Speaks Statistics (ESS) provides an intelligent and consistent interface between t...
R package, available on CRAN
Explicit functional forms for the generator derivatives of well-known one-parameter Archimedean copulas are derived. These derivatives are essential for likelihood inference as they appear in the copula density, conditional distribution functions, and the Kendall distribution function. They are also required for several asymmetric extensions of Arc...
The management of time and holidays can prove crucial in applications that rely on historical data. A typical example is the aggregation of a data set recorded in different time zones and under different daylight saving time rules. Besides the time zone conversion function, which is well supported by default classes in R, one might need functions to...
The package copula (formerly nacopula) has provided functionality for Archimedean copulas, one of them the "Frank copula". Recently, explicit formulas for the density of those copulas have allowed for maximum likelihood estimation in high dimensions (e.g., d = 150) (Hofert, Mächler, and McNeil (2012)). However, for non-small dimensions, the evalua...
The package nacopula provides procedures for constructing nested Archimedean copulas in any dimensions and with any kind of nesting structure, generating vectors of random variates from the constructed objects, computing function values and probabilities of falling into hypercubes, as well as evaluation of characteristics such as Kendall's tau...
This is an R package (a piece of Software) to fit and do inference on mixed-effects models.
The package is Free Software (hence open-source) and the package and much documentation about it is freely available from CRAN at
https://cran.r-project.org/package=lme4
R package, now with formula interface, a nice plot() method, further predict(), residuals(), fitted() the same as other regression functions in R.
Linear algebra is at the core of many areas of statistical computing and from its inception the S language has supported numerical linear algebra via a matrix data type and several functions and operators, such as %*%, qr, chol, and solve. However, these data types and functions do not provide direct access to all of the facilities for efficient man...
Many proposals have been made to estimate a multivariate scatter matrix (sometimes “Covariance matrix”) robustly, i.e., from X_1, X_2, ..., X_n, where X_j ∈ R^p. Some authors, including ourselves, have emphasized the importance of affine equivariance and high-breakdown point. Others have emphasized speed for largish p and have often relaxed or dropped th...
We implement a fast and efficient algorithm to compute qualitatively constrained smoothing and regression splines for quantile regression, exploiting the sparse structure of the design matrices involved in the method. In a previous implementation, the linear program involved was solved using a simplex-like algorithm for quantile smoothing splines...
https://cran.r-project.org/package=gplots
r which the likelihood function is to be maximized as a function of d. h size of finite difference interval for numerical derivatives. By default (or if negative), h = min(0.1, eps.5 * (1 + abs(cllf))), where cllf := log. max.likelihood (as returned) and eps.5 := sqrt(.Machine$double.neg.eps) (typically 1.05e-8). This only influences the cov, cor, an...
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achiev...
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible, software solutions in...
Scatterplot3d is an R package for the visualization of multivariate data in a three dimensional space. R is a "language for data analysis and graphics".
In this paper we discuss the features of the package. It is designed by exclusively making use of already existing functions of R and its graphics system and thus shows the extensibility of the R...
Scatterplot3d is an R package for the visualization of multivariate data in a three dimensional space. R is a “language for data analysis and graphics”. In this paper we discuss the features of the package. It is designed by exclusively making use of already existing functions of R and its graphics system and thus shows the extensibility of the R g...
Introduction to ESS The S and Splus packages provide sophisticated statistical and graphical routines for manipulating data. S-mode, the package on which ESS was based, provided a programming environment for data analysis and statistical programming, as well as an intelligent interface to the S process. The ESS (:= Emacs Speaks Statistics) package...
ing with a global input bandwidth. The default value is deriv=0. n.out number of output design points where the function has to be estimated; default is n.out=300. x.out vector of output design points where the function has to be estimated. The default is an equidistant grid of n.out points from min(x) to max(x). korder nonnegative integer giving t...
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex, and computer intensive. With a modification of the Gaussian paradigm, taking into consideration outliers and leverage points, we propose an iterativel...
Apart from kernel estimators, there have been quite a few different approaches of "generalized splines" for density estimation. In the present paper, Maximum Penalized Likelihood (mpl) approaches are reviewed. In conclusion, penalizing the log density seems most promising. In my "wp" approach for semi-parametric density estimation, a novel roughnes...
A new approach for non- or semiparametric density estimation allows modes and antimodes to be specified. The new smoother is a Maximum Penalized Likelihood (MPL) estimate with a novel roughness penalty. It penalizes a relative change of curvature which allows considering modes and inflection points. For a given number of modes, the score function, l ' =...
A general class of maximum penalized likelihood (MPL) problems for curve estimation is considered. For the case of regression, robust k-th order smoothing splines are one example. For a very general form of the roughness penalty, there is a convenient (boundary value) differential equation characterizing the solution of the corresponding maximum pe...
Usual nonparametric regression estimators often show many little wiggles which do not appear to be necessary for a good description of the data. The new "Wp" smoother is a maximum penalized likelihood (MPL) estimate with a novel roughness penalty. It penalizes a relative change of curvature. This leads to disjoint classes of functions, each with gi...
We study and compare two types of connectionist learning methods
for model-free regression problems: 1) the backpropagation learning
(BPL); and 2) the projection pursuit learning (PPL) emerged in recent
years in the statistical estimation literature. Both the BPL and the PPL
are based on projections of the data in directions determined from
interco...
We studied and compared two types of connectionist learning methods for model-free regression problems in this paper. One is the popular backpropagation learning (BPL) well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL) emerged in recent years in the statistical estimation literature. Both the...
Usual non-parametric regression estimators such as smoothing splines or kernel estimators are good tools for optimal mean squared error approximation of many smooth functions. However, they often show many little wiggles which do not appear to be necessary for a good description of the data. The new "Wp" smoother is a Maximum Penalized Likelihood e...
Two types of learning networks for nonparametric regression
problems are studied and compared: one is the parametric two-layer
perceptron type neural network, which is well known in artificial neural
network (ANN) literature; the other is the semiparametric projection
pursuit network (PPN), which has emerged in recent years in the
statistical estim...