Book

# Modern Applied Statistics with S

Authors:

## Abstract

A guide to using S environments to perform statistical analyses, providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets.

## Chapters (6)

Statistics is fundamentally about understanding data. We start by looking at how data are represented in S, then move on to importing, exporting and manipulating data.
In linear regression the mean surface is a plane in sample space; in non-linear regression it may be an arbitrary curved surface, but in all other respects the models are the same. Fortunately the mean surface in most non-linear regression models met in practice will be approximately planar in the region of highest likelihood, allowing some good approximations based on linear regression to be used, but non-linear regression models can still present tricky computational and inferential problems.
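
The idea that a curved mean surface is locally close to a plane can be illustrated with a Gauss-Newton iteration, which replaces the non-linear fit by a sequence of linear least-squares problems. The sketch below is not taken from the book and is written in Python rather than S. The model `y = a * exp(-b * x)`, the function name `gauss_newton`, the synthetic noise-free data and the starting values are all assumptions chosen for illustration.

```python
import math


def gauss_newton(xs, ys, a, b, iterations=20):
    """Fit y = a * exp(-b * x) by repeated linearization (illustrative only)."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations J'J step = J'r, where J holds
        # the derivatives of the mean function with respect to a and b.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            r = y - a * e          # residual at the current estimate
            da = e                 # derivative of the mean with respect to a
            db = -a * x * e        # derivative of the mean with respect to b
            s_aa += da * da
            s_ab += da * db
            s_bb += db * db
            g_a += da * r
            g_b += db * r
        # Solve the 2x2 linear system by Cramer's rule.
        det = s_aa * s_bb - s_ab * s_ab
        a += (s_bb * g_a - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b


# Hypothetical noise-free data generated with a = 2.0 and b = 0.7.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
ys = [2.0 * math.exp(-0.7 * x) for x in xs]
a_hat, b_hat = gauss_newton(xs, ys, a=1.5, b=0.5)
```

With poor starting values or noisy data the plain iteration can diverge, which is one form of the computational difficulty the chapter abstract mentions; practical implementations usually add step-size control.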
We collect together several ways to handle linear and non-linear models with random effects, possibly as well as fixed effects.
Multivariate analysis is concerned with datasets that have more than one response variable for each observational or experimental unit. The datasets can be summarized by data matrices X with n rows and p columns, the rows representing the observations or cases, and the columns the variables. The matrix can be viewed either way, depending on whether the main interest is in the relationships between the cases or between the variables. Note that for consistency we represent the variables of a case by the row vector x.
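
As a small illustration of the two views of a data matrix, the following Python sketch (not from the book, which uses S) stores a matrix with n rows and p columns as a list of rows, reads one case as a row vector, and transposes to obtain the variables. The numeric values are made up for the example.

```python
# Hypothetical data matrix X: n = 4 cases (rows), p = 3 variables (columns).
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]
n, p = len(X), len(X[0])

# Case view: the variables of the first case as a row vector x.
x = X[0]

# Variable view: transpose so that each tuple holds one variable across cases.
columns = list(zip(*X))
column_means = [sum(col) / n for col in columns]
```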
Classification is an increasingly important application of modern methods in statistics. In the statistical literature the word is used in two distinct senses. The entry (Hartigan, 1982) in the original Encyclopedia of Statistical Sciences uses the sense of cluster analysis discussed in Section 11.2. Modern usage is leaning to the other meaning (Ripley, 1997) of allocating future cases to one of g prespecified classes. Medical diagnosis is an archetypal classification problem in the modern sense. (The older statistical literature sometimes refers to this as allocation.)
Statisticians often under-estimate the usefulness of general optimization methods in maximizing likelihoods and in other model-fitting problems. Not only are the general-purpose methods available in the S environments quick to use, they also often outperform the specialized methods that are available. A lot of the software we have illustrated in earlier chapters is based on the functions described in this chapter. Code that seemed slow when the first edition was being prepared in 1993 now seems almost instant.
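
To make the point about general-purpose optimizers concrete, here is a minimal Python sketch (not from the book, and not using the S functions the chapter describes) that maximizes an exponential log-likelihood with a generic golden-section search and compares the result with the closed-form estimate. The data values, the search interval and the function names `neg_log_lik` and `golden_section_min` are assumptions made for illustration.

```python
import math


def neg_log_lik(rate, data):
    """Negative log-likelihood of an exponential model with the given rate."""
    return -len(data) * math.log(rate) + rate * sum(data)


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d   # the minimum lies in [lo, d]
        else:
            lo = c   # the minimum lies in [c, hi]
    return (lo + hi) / 2.0


# Hypothetical waiting times.
data = [0.8, 1.3, 0.4, 2.1, 0.9, 1.5]

# Generic numerical maximization of the likelihood over the rate.
rate_hat = golden_section_min(lambda r: neg_log_lik(r, data), 0.01, 10.0)

# Closed-form maximum-likelihood estimate for comparison: n / sum(data).
closed_form = len(data) / sum(data)
```

The generic search needs only the ability to evaluate the likelihood, so the same routine would apply unchanged to a one-parameter model that has no closed-form estimate.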