Abstract

This paper reviews our work in the development of visualization methods (implemented in R) for understanding and interpreting the effects of predictors in multivariate linear models (MLMs) of the form Y = XB + U, and some of their recent extensions. We begin with a description of and examples from the Hypothesis-error (HE) plots framework (utilizing the heplots package), wherein multivariate tests can be visualized via ellipsoids in 2D, 3D or all pairwise views for the Hypothesis and Error Sum of Squares and Products (SSP) matrices used in hypothesis tests. Such HE plots provide visual tests of significance: a term is significant by Roy’s test if and only if its H ellipsoid projects somewhere outside the E ellipsoid. These ideas extend naturally to repeated measures designs in the multivariate context. When the rank of the hypothesis matrix for a term exceeds 2, these effects can also be visualized in a reduced-rank canonical space via the candisc package, which also provides new data plots for canonical correlation problems. Finally, we discuss some recent work-in-progress: the extension of these methods to robust MLMs, development of generalizations of influence measures and diagnostic plots for MLMs (in the mvinfluence package).
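The H and E matrices behind an HE plot can be illustrated with a minimal sketch for a one-way design with two responses. This is not the heplots implementation: it is plain Python, the toy data and the function names (`he_matrices`, `roy_largest_root`) are hypothetical, and it only computes the two SSP matrices and the largest eigenvalue of E^{-1}H, the quantity on which Roy's test is based. It does not compute critical values or draw ellipses.

```python
import math

def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def outer_sum(rows, center):
    """Sum over rows of (r - center)(r - center)^T, i.e. an SSP matrix."""
    p = len(center)
    ssp = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                ssp[a][b] += d[a] * d[b]
    return ssp

def he_matrices(groups):
    """Hypothesis (between-group) and Error (within-group) SSP matrices."""
    all_rows = [r for rows in groups.values() for r in rows]
    grand = mean_vector(all_rows)
    p = len(grand)
    H = [[0.0] * p for _ in range(p)]
    E = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        m = mean_vector(rows)
        Hg = outer_sum([m], grand)   # (group mean - grand mean) outer product
        Eg = outer_sum(rows, m)      # deviations from the group mean
        for a in range(p):
            for b in range(p):
                H[a][b] += len(rows) * Hg[a][b]
                E[a][b] += Eg[a][b]
    return H, E

def roy_largest_root(H, E):
    """Largest eigenvalue of E^{-1} H, for the two-response case only."""
    det_e = E[0][0] * E[1][1] - E[0][1] * E[1][0]
    inv = [[E[1][1] / det_e, -E[0][1] / det_e],
           [-E[1][0] / det_e, E[0][0] / det_e]]
    M = [[sum(inv[i][k] * H[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)  # guard against tiny negative rounding
    return (tr + math.sqrt(disc)) / 2.0

# Hypothetical toy data: three groups, two responses per observation.
groups = {
    "a": [(1, 2), (2, 3), (3, 3)],
    "b": [(4, 6), (5, 7), (6, 9)],
    "c": [(7, 5), (8, 6), (9, 8)],
}
H, E = he_matrices(groups)
theta = roy_largest_root(H, E)
print("H =", H)
print("E =", E)
print("largest root of E^-1 H =", theta)
```

A useful check on any such computation is that H and E add up to the total SSP matrix about the grand mean.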
... Clusters and data ellipse. To represent the clusters of events in biplot space we used the data or concentration ellipse [42]. The data ellipse represents a visual summary of a scatter plot indicating the means, standard deviations, correlation, and the slope of the regression line for two variables [43]. ...
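As a hedged illustration of the data ellipse mentioned in this excerpt: for bivariate data with mean vector m and sample covariance matrix S, the ellipse of radius c is the set of points y with (y − m)^T S^{-1} (y − m) = c^2. The sketch below traces that boundary using a Cholesky factor of S. The data, the radius, and the function names are hypothetical, and the choice of c (for example, from a chi-square quantile) is left to the reader.

```python
import math

def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def data_ellipse(points, radius=1.0, n_points=60):
    """Boundary points of the data ellipse with the given radius."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov(points)
    # Cholesky factor L of S (S = L L^T), written out for the 2x2 case.
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    boundary = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        u, v = radius * math.cos(t), radius * math.sin(t)
        # Map a point on the circle of the given radius through L, then shift.
        boundary.append((mx + l11 * u, my + l21 * u + l22 * v))
    return boundary

def mahalanobis_sq(pt, center, cov):
    """Squared Mahalanobis distance of pt from center under cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical data, for illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7), (6, 8)]
center, cov = mean_and_cov(points)
ellipse = data_ellipse(points, radius=1.5)
print(center, cov)
print(ellipse[:3])
```

Every point returned by `data_ellipse` lies at squared Mahalanobis distance `radius ** 2` from the mean, which is what makes the ellipse a summary of the means, standard deviations, and correlation.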
Article
In this study, we reexamine the effect of two types of El Niño Southern Oscillation (ENSO) modes on Madden–Julian Oscillation (MJO) activity in terms of the frequency of MJO phases. Evaluating all-season data, we identify two dominant zonal patterns of MJO frequency exhibiting prominent interannual variability. These patterns are structurally similar to the Wheeler and Hendon (Mon. Weather Rev. 132:1917–1932, 2004) RMM1 and RMM2 spatial patterns. The first pattern explains a higher frequency of MJO activity over the Maritime Continent and a lower frequency over the central Pacific Ocean and the western Indian Ocean, or vice versa. The second pattern is associated with a higher frequency of MJO active days over the eastern Indian Ocean and a lower frequency over the western Pacific, or vice versa. We find that these two types of MJO frequency patterns are related to the central Pacific and eastern Pacific ENSO modes. From the positive to the negative ENSO (central Pacific or eastern Pacific) phases, the respective MJO frequency patterns change their sign. The MJO frequency patterns are a lagged response to the underlying ocean state. The coupling between ocean and atmosphere is exceedingly complex. The first MJO frequency pattern is most prominent during the negative central-Pacific (CP-type) ENSO phases (specifically during the September–November and December–February seasons). The second MJO frequency pattern is most evident during the positive eastern-Pacific (EP-type) ENSO phases (specifically during March–May, June–August and September–November). Different zonal circulation patterns during CP-type and EP-type ENSO phases alter the mean moisture distribution throughout the tropics. The horizontal convergence of mean background moisture through intraseasonal winds is responsible for the MJO frequency anomalies during the two types of ENSO phases.
The results here show how MJO activity is modulated on a regional scale in the presence of two types of ENSO events and can be useful in anticipating the seasonal MJO conditions from a predicted ENSO state.
... The function candiscList performs generalized CDA for all terms in a multivariate linear model, computing canonical scores and vectors [9]. In the past, CDA was restricted to one-way MANOVA [10]. The candisc package performs generalized CDA for one term in a multivariate linear model, computing canonical scores and vectors [1]. ...
... To analyze the response of each variable to the environments studied, the data were submitted to an analysis of variance (ANOVA) and, when the F test was significant (p < 0.05), the mean values were compared using the Tukey test (p < 0.05). Subsequently, in order to obtain an integrated assessment of cyclic water stress in eucalyptus, the data were subjected to a multivariate analysis of variance (MANOVA) and to canonical discriminant analysis (CDA) using the candisc package (Friendly and Sigal 2014). From the CDA data, a biplot was generated (Fig. 6) to evaluate the multivariate differences between the treatments and to rank the contribution (weight) of each variable to the physiological responses after subjection to cyclic water stress. ...
Article
Drought is considered the main environmental factor limiting productivity in eucalyptus plantations in Brazil. However, recent studies have reported that exposure to water deficit conditions enables plants to respond to subsequent stresses. Thus, this study investigates the ecophysiological acclimatization of eucalyptus clones submitted to recurrent water deficit cycles. Eucalyptus seedlings were submitted to three recurrent water deficit cycles, and anatomical, morphological and physiological changes were analyzed. The results were: (1) Eucalyptus seedlings responded to water deficits by directing carbohydrates to root and stem growth; (2) Size and number of stomata were reduced; (3) Stomatal conductance decreased, which allowed the plants to reduce water losses through transpiration, increasing instantaneous water use efficiency; (4) The relationship between gas exchanges and available water contents allowed the seedlings to take up the retained soil water at higher tensions; and (5) Physiological recovery from subsequent water deficits became faster. As a result of these changes, the eucalyptus seedlings recovered from the same degree of water stress more rapidly.
... In the MANOVA setting, a collection of such univariate plots for each of the responses can be useful, but they do not show how the responses vary jointly. In this section we introduce some multivariate graphical methods (HE plots and canonical views) for the comparison of means (for further details, see Fox, Friendly, & Monette, 2009; Friendly, 2007; and Friendly & Sigal, 2014) that can also be usefully applied to the comparison of covariance matrices. ...
Article
This is the supplemental appendix to “Visualizing Tests for Equality of Covariance Matrices,” in press, The American Statistician. It covers topics of interest that were considered too long or not sufficiently essential to include in the paper.
... In this section, we describe three simple, yet fundamental ideas behind our approach to visualizing data in relation to MLMs. These methods are explained in more detail in Friendly and Sigal (2014) and Fox, Friendly, and Monette (2009). (a) For any multivariate normal data, the graphical analog of the minimally sufficient statistics (µ, Σ) (mean vector and covariance matrix) is a data ellipsoid centered at µ whose size and shape are determined by Σ, and which can be viewed in 2D, 3D, and by other means. ...
Article
The development of more drought-tolerant cultivars is essential for the maintenance of global agricultural production. This study aimed to perform an early selection of drought-tolerant Coffea arabica genotypes by evaluating their functional divergence using morphological, anatomical and physiological analyses. Seedlings of 14 genotypes were subjected to the drought stress imposed by irrigation for 18 days. Growth and anatomical parameters, leaf water potential and gas exchanges were measured. Under irrigated conditions and prolonged drought (18 days), the divergence among the genotypes was determined mainly by morphological traits, such as leaf area, stem diameter and, consequently, shoot dry mass. Under moderate drought (14 days), parameters such as water potential, cuticle thickness, stomatal density, number of xylem vessels and water-use efficiency were important for the divergence of the group with the highest ability to maintain its water status. Genotypes 1, 2, 4, 11 and 12 have characteristics that contributed to the maintenance of water status, such as greater cuticle thickness, stomatal density, smaller number of xylem vessels and phloem thickness, greater root length and greater water-use efficiency. The functional divergence combining morphological, anatomical and physiological analyses in response to the moderate drought indicated the early selection of genotypes 1, 2, 4, 11 and 12 as more drought tolerant during the seedling stage. KEYWORDS: Coffee; water-use efficiency; leaf water potential
Article
Statisticians recommend graphical displays but often use tables to present their own research results. Could graphs do better? We study the question by going through the tables in a recent issue of the Journal of the American Statistical Association. We show how it is possible to improve the presentations using graphs that actually take up less space than the original tables. We find a particularly effective tool to be multiple repeated line plots, with comparisons of interest connected by lines and separate comparisons isolated on different plots.
Article
The multivariate linear model is $\mathbf{Y}_{(n \times m)} = \mathbf{X}_{(n \times p)} \mathbf{B}_{(p \times m)} + \mathbf{E}_{(n \times m)}$. The multivariate linear model can be fit with the lm function in R, where the left-hand side of the model comprises a matrix of response variables, and the right-hand side is specified exactly as for a univariate linear model (i.e., with a single response variable). This paper explains how to use the Anova and linearHypothesis functions in the car package to perform convenient hypothesis tests for parameters in multivariate linear models, including models for repeated-measures data.
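For orientation, the standard least-squares quantities behind such a fit can be written out as follows. This is a textbook sketch, not a quotation from the paper: it assumes $\mathbf{X}$ has full column rank, and the hypothesis matrix $\mathbf{L}$ (of size $q \times p$, full row rank, testing $\mathbf{L}\mathbf{B} = \mathbf{0}$) and the labels $\mathbf{SSP}_E$ and $\mathbf{SSP}_H$ are notation introduced here for illustration.

$$
\widehat{\mathbf{B}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y},
\qquad
\widehat{\mathbf{E}} = \mathbf{Y} - \mathbf{X}\widehat{\mathbf{B}},
\qquad
\mathbf{SSP}_E = \widehat{\mathbf{E}}^{\top}\widehat{\mathbf{E}}
$$

$$
\mathbf{SSP}_H = (\mathbf{L}\widehat{\mathbf{B}})^{\top}
\left[\mathbf{L}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{L}^{\top}\right]^{-1}
(\mathbf{L}\widehat{\mathbf{B}})
$$

Each column of $\widehat{\mathbf{B}}$ equals the coefficient vector from the univariate regression of the corresponding response on $\mathbf{X}$; the multivariate tests differ from the univariate ones only in how they combine $\mathbf{SSP}_H$ and $\mathbf{SSP}_E$.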
Article
Many of the existing measures for influential subsets in univariate ordinary least squares (OLS) regression analysis have natural extensions to the multivariate regression setting. Such measures may be characterized by functions of the submatrices $H_I$ of the hat matrix $H$, where $I$ is an index set of deleted cases, and $Q_I$, the submatrix of $Q = E(E^{\top}E)^{-1}E^{\top}$, where $E$ is the matrix of ordinary residuals. Two classes of measures are considered: $f(\cdot)\,\mathrm{tr}[H_I Q_I (I - H_I - Q_I)^{a} (I - H_I)^{b}]$ and $f(\cdot)\,\det[(I - H_I - Q_I)^{a} (I - H_I)^{b}]$, where $f$ is a scalar function of the dimensions of matrices and $a$ and $b$ are integers. These characterizations motivate us to consider separable leverage and residual components for multiple-case influence and are shown to have advantages in computing influence measures for subsets. In the recent statistical literature on regression analysis, much attention has been given to problems of detecting observations that, individually or jointly, exert a disproportionate influence on the outcome of univariate linear regression analysis and to assessing the influence of such cases, individually or jointly. By far the most popular approach is that of measuring the change in some feature of the analysis upon deletion of one or more of the cases. Various measures have been proposed that emphasize different aspects of influence on the regression. For a review of such methods, see Cook (1977, 1979), Belsley, Kuh, and Welsch (1980), Cook and Weisberg (1982), and Chatterjee and Hadi (1986, 1988). In this article we generalize some of the univariate measures of influence to the multivariate regression setting and then show that the generalized measures are special cases of two general classes of influence measures. There are other approaches to influence measures in regression diagnostics (see, for example, Cook 1986) that are not special cases of our general classes. The majority of the existing measures, however, are.