Martin Mächler

ETH Zurich | ETH Zürich · Department of Mathematics

Ph.D. math

About

65
Publications
99,476
Reads
124,240
Citations
Introduction
Senior Scientist and Lecturer in Computational Statistics, R Programming, Clustering, Data Mining, Robust Statistics. R Core member, author of > 20 R packages; Secretary General of the R Foundation.
Additional affiliations
September 1991 - present
ETH Zurich
Position
  • Senior Scientist; Lecturer
January 1990 - present
University of Washington

Publications (65)
Chapter
This chapter presents graphical diagnostics and statistical tests, and discusses model selection for copulas.
Chapter
This chapter introduces the main copula classes and the corresponding sampling procedures, along with some copula transformations that are important for practical purposes.
Chapter
This chapter is concerned with more advanced topics in copula modeling such as the handling of ties, time series, and covariates (in a regression-like setting).
Chapter
This chapter offers a basic introduction to copulas and presents their main properties along with the most important theoretical results such as the Fréchet-Hoeffding bounds, Sklar’s Theorem, and the invariance principle.
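As a small illustration of the Fréchet-Hoeffding bounds mentioned in this summary, the following sketch checks numerically that the independence copula lies between the lower bound W(u) = max(u_1 + ... + u_d − d + 1, 0) and the upper bound M(u) = min(u_1, ..., u_d). The function names and the grid check are illustrative assumptions, not taken from the chapter.

```python
from itertools import product
from math import prod


def lower_bound(u):
    """Frechet-Hoeffding lower bound W(u) = max(sum(u) - d + 1, 0)."""
    return max(sum(u) - len(u) + 1.0, 0.0)


def upper_bound(u):
    """Frechet-Hoeffding upper bound M(u) = min(u)."""
    return min(u)


def independence_copula(u):
    """Independence copula: the product of the margins."""
    return prod(u)


def bounds_hold(copula, d=3, steps=11, tol=1e-12):
    """Check W(u) <= C(u) <= M(u) on an equidistant grid in [0, 1]^d.

    A grid check is only a sanity test, not a proof.
    """
    grid = [i / (steps - 1) for i in range(steps)]
    return all(
        lower_bound(u) - tol <= copula(u) <= upper_bound(u) + tol
        for u in product(grid, repeat=d)
    )
```

For example, `bounds_hold(independence_copula)` evaluates the inequality at every grid point of the unit cube.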
Chapter
This chapter addresses the estimation of copulas from a parametric, semi-parametric, and nonparametric perspective.
Article
Full-text available
Count data can be analyzed using generalized linear mixed models when observations are correlated in ways that require random effects. However, count data are often zero-inflated, containing more zeros than would be expected from the typical error distributions. We present a new package, glmmTMB, and compare it to other R packages that fit zero-inf...
Preprint
Full-text available
Ecological phenomena are often measured in the form of count data. These data can be analyzed using generalized linear mixed models (GLMMs) when observations are correlated in ways that require random effects. However, count data are often zero-inflated, containing more zeros than would be expected from the standard error distributions used in GLM...
Article
Methods for cluster analysis. A much-extended version of the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".
Technical Report
Full-text available
In this note, we explain how f(a) = log(1 − e^(−a)) = log(1 − exp(−a)) can be computed accurately, in a simple and optimal manner, building on the two related auxiliary functions log1p(x) (= log(1 + x)) and expm1(x) (= exp(x) − 1 = e^x − 1). The cutoff, a_0, in use in R since 2004, is shown to be optimal both theoretically and empirically, using Rm...
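The idea in this summary can be sketched in a few lines. The version below is a minimal Python illustration, not the note's R code. It switches between the expm1-based and the log1p-based formula at a cutoff. The cutoff value log 2 is an assumption here, since the excerpt does not state it, and the function name `log1mexp` is likewise chosen for illustration.

```python
import math

# Assumed cutoff a_0 = log(2); the excerpt above does not give the value.
CUTOFF = math.log(2.0)


def log1mexp(a: float) -> float:
    """Compute log(1 - exp(-a)) for a > 0 without catastrophic cancellation."""
    if a <= 0.0:
        raise ValueError("log1mexp requires a > 0")
    if a <= CUTOFF:
        # Small a: exp(-a) is close to 1, so 1 - exp(-a) would lose digits;
        # -expm1(-a) computes the same quantity accurately.
        return math.log(-math.expm1(-a))
    # Large a: exp(-a) is close to 0, so log1p keeps the tiny term.
    return math.log1p(-math.exp(-a))
```

The naive expression `math.log(1 - math.exp(-a))` fails at both ends: for a = 1e-20 the argument rounds to 0 and Python raises a domain error, and for a = 50 it returns exactly 0.0, whereas the sketch returns roughly log(1e-20) and −exp(−50) respectively.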
Article
Full-text available
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The for...
Article
This article introduces a graphical goodness-of-fit test for copulas in more than two dimensions. The test is based on pairs of variables and can thus be interpreted as a first-order approximation of the underlying dependence structure. The idea is to first transform pairs of data columns with the Rosenblatt transform to bivariate standard uniform...
Code
https://cran.r-project.org/package=DescTools
Technical Report
Full-text available
Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
Article
Full-text available
It is shown how to set up, conduct, and analyze large simulation studies with the new R package simsalapar = simulations simplified and launched parallel. A simulation study typically starts with determining a collection of input variables and their values on which the study depends, such as sample sizes, dimensions, types and degrees of dependence...
Article
Full-text available
The performance of known and new parametric estimators for Archimedean copulas is investigated, with special focus on large dimensions and numerical difficulties. In particular, method-of-moments-like estimators based on pairwise Kendall's tau, a multivariate extension of Blomqvist's beta, minimum distance estimators, the maximum-likelihood estimat...
Article
Full-text available
The pcalg package for R (R Development Core Team (2010)) is a tool for estimating intervention effects when the true underlying causal structure is unknown. To this end, pcalg contains functions for estimating the causal structure using graphical models (functions skeleton, pc and fci) and functions for estimating intervention effects given an esti...
Article
Full-text available
We present a tutorial and new publicly available computational tools for variable length Markov chains (vlmc). vlmc's are Markov chains with the additional attractive structure that their memories depend on a variable number of lagged values, depending on what the actual past (the lagged values) looks like. They build a very flexible class of tree...
Article
Full-text available
Computer programming is an important component of statistical research and data analysis. It is a necessary skill for using sophisticated statistical packages and for writing custom scripts and software to perform data analysis using modern statistical methods. Emacs Speaks Statistics (ESS) provides an intelligent and consistent interface between t...
Article
Explicit functional forms for the generator derivatives of well-known one-parameter Archimedean copulas are derived. These derivatives are essential for likelihood inference as they appear in the copula density, conditional distribution functions, and the Kendall distribution function. They are also required for several asymmetric extensions of Arc...
Article
The management of time and holidays can prove crucial in applications that rely on historical data. A typical example is the aggregation of a data set recorded in different time zones and under different daylight saving time rules. Besides the time zone conversion function, which is well supported by default classes in R, one might need functions to...
Technical Report
Full-text available
The package copula (formerly nacopula) has provided functionality for Archimedean copulas, one of them the "Frank copula". Recently, explicit formulas for the density of those copulas have allowed for maximum likelihood estimation in high (e.g., d = 150) dimensions (Hofert, Mächler, and McNeil (2012)). However, for non-small dimensions, the evalua...
Article
Full-text available
The package nacopula provides procedures for constructing nested Archimedean copulas in any dimensions and with any kind of nesting structure, generating vectors of random variates from the constructed objects, computing function values and probabilities of falling into hypercubes, as well as evaluation of characteristics such as Kendall's tau...
Book
This is an R package (a piece of software) to fit and do inference on mixed-effects models. The package is Free Software (hence open-source), and the package and much of its documentation are freely available from CRAN at https://cran.r-project.org/package=lme4
Book
Full-text available
R package, now with a formula interface, a nice plot() method, and predict(), residuals(), and fitted() methods that work the same as for other regression functions in R.
Article
Full-text available
Linear algebra is at the core of many areas of statistical computing and from its inception the S language has supported numerical linear algebra via a matrix data type and several functions and operators, such as %*%, qr, chol, and solve. However, these data types and functions do not provide direct access to all of the facilities for efficient man...
Article
Full-text available
Many proposals have been made to estimate a multivariate scatter matrix (sometimes “Covariance matrix”) robustly, i.e., from X1, X2,..., Xn, where Xj ∈ R^p. Some authors, including ourselves, have emphasized the importance of affine equivariance and high-breakdown point. Others have emphasized speed for largish p and have often relaxed or dropped th...
Article
Full-text available
We implement a fast and efficient algorithm to compute qualitatively constrained smoothing and regression splines for quantile regression, exploiting the sparse structure of the design matrices involved in the method. In a previous implementation, the linear program involved was solved using a simplex-like algorithm for quantile smoothing splines...
Code
Full-text available
https://cran.r-project.org/package=gplots
Article
r which the likelihood function is to be maximized as a function of d. h: size of finite difference interval for numerical derivatives. By default (or if negative), h = min(0.1, eps.5 * (1+ abs(cllf))), where cllf := log. max.likelihood (as returned) and eps.5 := sqrt(.Machine$double.neg.eps) (typically 1.05e-8). This only influences the cov, cor, an...
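The default step-size rule quoted in this excerpt can be written out as a short sketch. This is a Python transcription for illustration only, not the package's code. It assumes that R's .Machine$double.neg.eps corresponds to half of Python's sys.float_info.epsilon (that is, 2^-53), and the function name `default_h` is hypothetical.

```python
import math
import sys

# Assumed analogue of R's .Machine$double.neg.eps (2**-53 for IEEE doubles).
NEG_EPS = sys.float_info.epsilon / 2.0

# eps.5 := sqrt(.Machine$double.neg.eps), about 1.05e-8.
EPS_5 = math.sqrt(NEG_EPS)


def default_h(cllf: float) -> float:
    """Default finite-difference interval h = min(0.1, eps.5 * (1 + |cllf|)).

    cllf is the maximized log-likelihood, as named in the excerpt.
    """
    return min(0.1, EPS_5 * (1.0 + abs(cllf)))
```

The rule scales the step with the magnitude of the log-likelihood and caps it at 0.1, so a very large |cllf| does not produce an unreasonably coarse interval.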
Article
Full-text available
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achiev...
Article
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible, software solutions in...
Article
Full-text available
Scatterplot3d is an R package for the visualization of multivariate data in a three dimensional space. R is a "language for data analysis and graphics". In this paper we discuss the features of the package. It is designed by exclusively making use of already existing functions of R and its graphics system and thus shows the extensibility of the R...
Article
Full-text available
Scatterplot3d is an R package for the visualization of multivariate data in a three dimensional space. R is a “language for data analysis and graphics”. In this paper we discuss the features of the package. It is designed by exclusively making use of already existing functions of R and its graphics system and thus shows the extensibility of the R g...
Article
Introduction to ESS. The S and Splus packages provide sophisticated statistical and graphical routines for manipulating data. S-mode, the package on which ESS was based, provided a programming environment for data analysis and statistical programming, as well as an intelligent interface to the S process. The ESS (:= Emacs Speaks Statistics) package...
Article
Full-text available
ing with a global input bandwidth. The default value is deriv=0. n.out number of output design points where the function has to be estimated; default is n.out=300. x.out vector of output design points where the function has to be estimated. The default is an equidistant grid of n.out points from min(x) to max(x). korder nonnegative integer giving t...
Article
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex, and computer intensive. With a modification of the Gaussian paradigm, taking into consideration outliers and leverage points, we propose an iterativel...
Article
Full-text available
Apart from kernel estimators, there have been quite a few different approaches of "generalized splines" for density estimation. In the present paper, Maximum Penalized Likelihood (mpl) approaches are reviewed. In conclusion, penalizing the log density seems most promising. In my "wp" approach for semi-parametric density estimation, a novel roughnes...
Article
A new approach for non- or semiparametric density estimation allows modes and antimodes to be specified. The new smoother is a Maximum Penalized Likelihood (MPL) estimate with a novel roughness penalty. It penalizes a relative change of curvature which allows considering modes and inflection points. For a given number of modes, the score function, l' =...
Article
A general class of maximum penalized likelihood (MPL) problems for curve estimation is considered. For the case of regression, robust k-th order smoothing splines are one example. For a very general form of the roughness penalty, there is a convenient (boundary value) differential equation characterizing the solution of the corresponding maximum pe...
Article
Full-text available
Usual nonparametric regression estimators often show many little wiggles which do not appear to be necessary for a good description of the data. The new "Wp" smoother is a maximum penalized likelihood (MPL) estimate with a novel roughness penalty. It penalizes a relative change of curvature. This leads to disjoint classes of functions, each with gi...
Article
Full-text available
We study and compare two types of connectionist learning methods for model-free regression problems: 1) the backpropagation learning (BPL); and 2) the projection pursuit learning (PPL), which has emerged in recent years in the statistical estimation literature. Both the BPL and the PPL are based on projections of the data in directions determined from interco...
Article
We studied and compared two types of connectionist learning methods for model-free regression problems in this paper. One is the popular backpropagation learning (BPL), well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL), which has emerged in recent years in the statistical estimation literature. Both the...
Article
Full-text available
Usual non-parametric regression estimators such as smoothing splines or kernel estimators are good tools for optimal mean squared error approximation of many smooth functions. However, they often show many little wiggles which do not appear to be necessary for a good description of the data. The new "Wp" smoother is a Maximum Penalized Likelihood e...
Conference Paper
Two types of learning networks for nonparametric regression problems are studied and compared: one is the parametric two-layer perceptron type neural network, which is well known in artificial neural network (ANN) literature; the other is the semiparametric projection pursuit network (PPN), which has emerged in recent years in the statistical estim...
