
see: An R Package for Visualizing Statistical Models

Daniel Lüdecke¹, Indrajeet Patil², Mattan S. Ben-Shachar³, Brenton M. Wiernik⁴, Philip Waggoner⁵, and Dominique Makowski⁶

¹ University Medical Center Hamburg-Eppendorf, Germany

² Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany

³ Ben-Gurion University of the Negev, Israel

⁴ Department of Psychology, University of South Florida, USA

⁵ University of Chicago, USA

⁶ Nanyang Technological University, Singapore

DOI: 10.21105/joss.03393


Editor: Frederick Boehm

Reviewers: @MatthewSmith430, @jakobbossek

Submitted: 15 June 2021

Published: 06 August 2021

License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Summary

The see package is embedded in the easystats ecosystem, a collection of R packages that operate in synergy to provide a consistent and intuitive syntax when working with statistical models in the R programming language (R Core Team, 2021). Most easystats packages return comprehensive numeric summaries of model parameters and performance. The see package complements these numeric summaries with a host of functions and tools to produce a range of publication-ready visualizations for model parameters, predictions, and performance diagnostics. As a core pillar of easystats, the see package helps users make use of visualization for more informative, communicable, and well-rounded scientific reporting.

Statement of Need

The grammar of graphics (Wilkinson, 2012), largely due to its implementation in the ggplot2 package (Wickham, 2016), has become the dominant approach to visualization in R. However, building a model visualization with ggplot2 is somewhat disconnected from the model fitting and evaluation process. Generally, this process entails:

1. Fitting a model.

2. Extracting desired results from the model (e.g., model parameters and intervals, model predictions, diagnostic statistics) and arranging them into a dataframe.

3. Passing the results dataframe to ggplot() and specifying the graphical parameters.

For example:

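library(ggplot2)

# step 1: fit the model
model <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# step 2: extract fitted values and diagnostics into a dataframe;
# fortify() keeps the model-frame column name `factor(cyl)`,
# so it must be referenced with backticks below
results <- fortify(model)

# step 3: pass the dataframe to ggplot() and map the aesthetics
ggplot(results) +
  geom_point(aes(x = wt, y = mpg, color = `factor(cyl)`)) +
  geom_line(aes(x = wt, y = .fitted, color = `factor(cyl)`))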

Lüdecke et al., (2021). see: An R Package for Visualizing Statistical Models. Journal of Open Source Software, 6(64), 3393. https://doi.org/10.21105/joss.03393


A number of packages have been developed to extend ggplot2 and assist with model visualization.¹ Some of these packages provide functions for additional geoms, annotations, or common visualization types without linking them to a specific statistical analysis or fundamentally changing the ggplot2 workflow (e.g., ggrepel, ggalluvial, ggridges, ggdist, ggpubr, etc.). Other ggplot2 extensions provide functions to generate publication-ready visualizations for specific types of models (e.g., metaviz, tidymv, sjPlot, survminer). For example, the ggstatsplot package (Patil, 2021) offers visualizations for statistical analysis of one-way factorial designs, and the plotmm package (Waggoner, 2020) supports specific types of mixture model objects. The ggfortify package (Horikoshi & Tang, 2018), which extends ggplot2's fortify() function, does offer a unified plotting framework for a wide range of statistical models, although it is not as comprehensive as the see package because the easystats ecosystem covers a much larger collection of statistical models.

The aim of the see package is to produce visualizations for a wide variety of models and statistical analyses in a way that is tightly linked with the model fitting process and requires minimal interruption of users' workflow. see accomplishes this aim by providing a single plot() method for objects created by the other easystats packages, such as parameters tables, modelbased predictions, performance diagnostic tests, correlation matrices, and so on. The easystats packages compute numeric results for a wide range of statistical models, and the see package acts as a visual support to the entire easystats ecosystem. As such, visualizations corresponding to all stages of statistical analysis, from model fitting to diagnostics to reporting, can be easily created using see. Plots produced by see are compatible with other ggplot2 functions for further customization (e.g., labs() for a plot title). In addition, see provides several aesthetic utilities to embellish both easystats plots and other ggplot2 plots. The result is a package that minimizes the barrier to producing high-quality statistical visualizations in R.
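The single-entry-point design rests on R's S3 method dispatch: an easystats function returns an object carrying its own class, and a plot() method registered for that class turns it into a ggplot. The minimal sketch below illustrates only that mechanism. The class `toy_estimates`, its constructor, and `plot.toy_estimates()` are hypothetical names invented for this illustration, not see's actual code.

```r
library(ggplot2)

# Hypothetical constructor: returns a dataframe of estimates tagged with its own class
toy_estimates <- function(model) {
  ci <- confint(model)
  out <- data.frame(
    Parameter = names(coef(model)),
    Coefficient = unname(coef(model)),
    CI_low = unname(ci[, 1]),
    CI_high = unname(ci[, 2])
  )
  class(out) <- c("toy_estimates", class(out))
  out
}

# Hypothetical S3 method: plot() dispatches here for objects of class "toy_estimates"
plot.toy_estimates <- function(x, ...) {
  ggplot(as.data.frame(x), aes(x = Coefficient, y = Parameter)) +
    geom_pointrange(aes(xmin = CI_low, xmax = CI_high))
}

model <- lm(wt ~ am * cyl, data = mtcars)
est <- toy_estimates(model)

# The returned object is an ordinary ggplot, so further layers can be added
p <- plot(est) + labs(title = "Estimates with 95% confidence intervals")
```

Because the method returns a plain ggplot object instead of drawing directly, users keep the usual `+` layering workflow.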

The central goal of easystats is to make the task of doing statistics in R as easy as possible. This goal is realized through intuitive and consistent syntax, consistent and transparent argument names, comprehensive documentation, informative warnings and error messages, and smart functions with sensible default parameter values. The see package follows this philosophy by using a single access point (the generic plot() method) for visualization of all manner of statistical results supported by easystats.

Features

Below we present one or two plotting methods for each easystats package, but many other methods are available. Interested readers are encouraged to explore the range of examples on the package website, https://easystats.github.io/see/.

Themes and Palettes

The package includes different ggplot2 themes that can be set for each plot individually or globally, as shown below:

ggplot2::theme_set(see::theme_modern())

The package also provides color palettes, such as scale_color_material() and scale_color_flat() for material and flat design colors (https://www.materialui.co/colors), respectively.
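A short usage sketch of the theme and palettes applied to an ordinary ggplot2 scatter plot follows. It assumes that the default arguments of scale_color_material() and scale_color_flat() treat the mapped variable as discrete, which matches our understanding of their defaults. Readers should check the see reference pages for the available palette names.

```r
library(ggplot2)
library(see)

# Apply the see theme to every subsequent ggplot
theme_set(theme_modern())

# Shared base plot: color points by the number of cylinders (a discrete variable)
base <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Material-design colors
p_material <- base + scale_color_material()

# Flat-design colors
p_flat <- base + scale_color_flat()
```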

¹ For a sampling of these packages, visit https://exts.ggplot2.tidyverse.org/gallery/


Visualizing Model Parameters

The parameters package converts summaries of regression model objects into dataframes (Lüdecke et al., 2020). The see package can take this transformed object and, for example, create a dot-and-whisker plot for the extracted regression estimates simply by passing the parameters class object to plot().

library(parameters)
library(see)
library(ggplot2)

model <- lm(wt ~ am * cyl, data = mtcars)
plot(parameters(model))

[Figure: dot-and-whisker plot of the regression coefficients for am, cyl, and am * cyl; x-axis: Coefficient; y-axis: Parameter]


Because see outputs objects of class ggplot, ggplot2 functions can be added as layers to the plot, as with any other ggplot2 visualization. For example, we might add a title using labs() from ggplot2.

library(parameters)
library(see)
library(ggplot2) # needed for labs() and scale_y_discrete()

model <- lm(wt ~ am * cyl, data = mtcars)

# changing title and axis labels using ggplot2 functions
plot(parameters(model)) +
  labs(title = "A Dot-and-Whisker Plot") +
  scale_y_discrete(labels = c(
    "transmission * cylinders",
    "cylinders",
    "transmission"
  ))

[Figure: the same dot-and-whisker plot titled "A Dot-and-Whisker Plot"; x-axis: Coefficient; y-axis: Parameter]


Similarly, for Bayesian regression model objects, which are handled by the bayestestR package (Makowski et al., 2019), the see package provides special plotting methods relevant for Bayesian models (e.g., the Highest Density Interval, or HDI). Users can fit the model and pass the model results, extracted via bayestestR, to plot().

library(bayestestR)
library(rstanarm)
library(see)

model <- stan_glm(wt ~ mpg, data = mtcars, refresh = 0)
result <- hdi(model, ci = c(0.5, 0.75, 0.89, 0.95))
plot(result)

[Figure: Highest Density Interval (HDI) of the mpg coefficient, shaded by HDI level (50%, 75%, 89%, 95%, 100%); x-axis: Possible parameter values; y-axis: Parameters]
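The HDI at a given level is the narrowest interval that contains that share of the posterior distribution. One common way to estimate it from posterior draws is to sort the draws and select the narrowest window that holds the required number of them. The base-R sketch below follows that approach under our own assumptions. `hdi_sketch()` is a hypothetical helper, not bayestestR's implementation, and the draws are simulated rather than taken from the model above.

```r
# Hypothetical helper (not bayestestR's code): shortest interval that
# contains a proportion `ci` of the posterior draws
hdi_sketch <- function(draws, ci = 0.95) {
  x <- sort(draws)
  n <- length(x)
  m <- ceiling(ci * n)              # number of draws the interval must contain
  starts <- seq_len(n - m + 1)      # every possible window start
  widths <- x[starts + m - 1] - x[starts]
  i <- which.min(widths)            # index of the narrowest window
  c(CI_low = x[i], CI_high = x[i + m - 1])
}

# Simulated, right-skewed draws standing in for a posterior distribution
set.seed(123)
draws <- rgamma(4000, shape = 2, rate = 10)

# One interval per level, mirroring ci = c(0.5, 0.75, 0.89, 0.95)
sapply(c(0.5, 0.75, 0.89, 0.95), function(ci) hdi_sketch(draws, ci))
```

For skewed draws like these, the HDI typically sits closer to the mode than an equal-tailed interval from quantile() does.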


Visualizing Model Performance and Diagnostic Checks

The performance package is primarily concerned with checking regression model assumptions (Lüdecke et al., 2021). The see package offers tools to visualize these assumption checks, such as the normality of residuals. Users simply pass the fitted model object to the relevant performance function (check_normality() in the example below). This result can then be passed to plot() to produce a ggplot2 visualization of the check on normality of the residuals.

library(performance)
library(see)

model <- lm(wt ~ mpg, data = mtcars)
check <- check_normality(model)
#> Warning: Non-normality of residuals detected (p = 0.016).
plot(check, type = "qq")

[Figure: "Normality of Residuals" Q-Q plot with the subtitle "Dots should fall along the line"; x-axis: Standard Normal Distribution Quantiles; y-axis: Sample - Normal Distribution Quantiles]
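The check and its plot combine two ingredients: a normality test on the residuals and the coordinates of a Q-Q plot. To our understanding, check_normality() applies a Shapiro-Wilk test to standardized residuals for linear models. The base-R sketch below reproduces both ingredients by hand under that assumption, so its p-value may differ from the one performance reports. The `deviation` column is our own simple detrending and is not necessarily how see computes its y-axis.

```r
model <- lm(wt ~ mpg, data = mtcars)

# Standardized residuals (assumed to be what check_normality() tests)
res <- rstandard(model)

# Shapiro-Wilk test of normality on those residuals
sw <- shapiro.test(res)
sw$p.value

# Coordinates of a Q-Q plot: theoretical normal quantiles vs sample quantiles
qq <- qqnorm(res, plot.it = FALSE)
qq_df <- data.frame(
  theoretical = qq$x,
  sample = qq$y,
  deviation = qq$y - qq$x   # simple detrending: distance from the identity line
)
head(qq_df[order(qq_df$theoretical), ])
```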


Visualizing Effect Sizes

The effectsize package computes a variety of effect size metrics for fitted models to assess the practical importance of observed effects (Ben-Shachar et al., 2020). In conjunction with see, users are able to visualize the magnitude and uncertainty of effect sizes by passing the model object to the relevant effectsize function (omega_squared() in the following example), and then to plot().

library(effectsize)
library(see)

model <- aov(wt ~ am * cyl, data = mtcars)
plot(omega_squared(model))

[Figure: partial omega-squared estimates with uncertainty intervals for am, cyl, and am:cyl; x-axis: Omega2 (partial); y-axis: Parameter]
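The point estimates in this plot follow the standard partial omega-squared formula, ω²p = (SS_effect − df_effect × MS_error) / (SS_effect + (N − df_effect) × MS_error). The base-R sketch below applies that formula to the sequential ANOVA table of the same model. `omega_sq_partial()` is a hypothetical helper. Unlike effectsize, it computes no confidence intervals and applies no special handling of negative values.

```r
# Hypothetical helper: partial omega squared from a single-stratum aov() fit
omega_sq_partial <- function(model) {
  tab <- summary(model)[[1]]           # ANOVA table (sequential sums of squares)
  res_row <- nrow(tab)                 # the last row holds the residuals
  ms_error <- tab[res_row, "Mean Sq"]
  n <- sum(tab[, "Df"]) + 1            # total df + 1 = number of observations
  effects <- tab[-res_row, , drop = FALSE]
  df <- effects[, "Df"]
  ss <- effects[, "Sum Sq"]
  omega <- (ss - df * ms_error) / (ss + (n - df) * ms_error)
  setNames(omega, trimws(rownames(effects)))
}

model <- aov(wt ~ am * cyl, data = mtcars)
omega_sq_partial(model)
```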


Visualizing Model Predictions and Marginal Effects

The modelbased package computes model-based estimates and predictions from fitted models (Makowski et al., 2020a). see provides methods to quickly visualize these model predictions, such as those returned by estimate_prediction(). Similarly, estimate_means() computes marginal means, i.e., the mean at each factor level averaged over other predictors, and its output can also be passed to plot().

library(modelbased)
library(see)

model <- lm(mpg ~ wt * as.factor(cyl), data = mtcars)
means <- estimate_means(model)
plot(means)

[Figure: "Estimated Means (mpg ~ wt * as.factor(cyl))", showing the estimated mpg for each level of cyl (4, 6, 8); x-axis: cyl; y-axis: mpg]
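A marginal mean can be reproduced by hand by predicting the outcome on a reference grid. The sketch below assumes a grid in which cyl takes each of its observed levels and the numeric covariate wt is held at its sample mean, a common reference-grid convention. modelbased's exact defaults are not guaranteed to match, so the numbers may differ from the plot.

```r
model <- lm(mpg ~ wt * as.factor(cyl), data = mtcars)

# Reference grid: every level of cyl, with wt fixed at its sample mean (assumption)
grid <- data.frame(
  cyl = sort(unique(mtcars$cyl)),
  wt = mean(mtcars$wt)
)

# Predicted mpg at each grid row, with 95% confidence intervals
pred <- predict(model, newdata = grid, interval = "confidence")
cbind(grid, pred)
```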


Visualizing Correlation Matrices

The correlation package provides a unified syntax and human-readable code to carry out many types of correlation analysis (Makowski et al., 2020b). A user can run summary(correlation(data)) to construct a correlation matrix for the variables in a dataframe. With see, this matrix can be passed to plot() to visualize the coefficients as a color-coded matrix.

library(correlation)
library(see)

results <- summary(correlation(iris))
plot(results)

[Figure: correlation matrix of the numeric iris variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), with tiles colored by r on a scale from -1.0 to 1.0]
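The quantity being visualized is an ordinary correlation matrix. Assuming the default Pearson method of correlation(), the coefficients can be cross-checked in base R by restricting iris to its numeric columns, as sketched below. The p-values and the single-triangle layout produced by summary() are not reproduced here.

```r
# Keep only numeric columns (drops the Species factor)
num <- iris[vapply(iris, is.numeric, logical(1))]

# Pearson correlation matrix, rounded for display
r <- cor(num, method = "pearson")
round(r, 2)
```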

Licensing and Availability

see is licensed under the GNU General Public License (v3.0), with all source code openly developed and stored on GitHub (https://github.com/easystats/see), along with a corresponding issue tracker for bug reporting and feature enhancements. In the spirit of honest and open science, we encourage requests, tips for fixes, feature updates, as well as general questions and concerns via direct interaction with contributors and developers.

Acknowledgments

see is part of the collaborative easystats ecosystem. Thus, we thank the members of easystats as well as the users.


References

Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Horikoshi, M., & Tang, Y. (2018). ggfortify: Data visualization tools for statistical analysis results. https://CRAN.R-project.org/package=ggfortify

Lüdecke, D., Ben-Shachar, M. S., Patil, I., & Makowski, D. (2020). Extracting, computing and exploring the parameters of statistical models using R. Journal of Open Source Software, 5(53), 2445. https://doi.org/10.21105/joss.02445

Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139

Makowski, D., Ben-Shachar, M. S., & Lüdecke, D. (2019). bayestestR: Describing effects and their uncertainty, existence and significance within the Bayesian framework. Journal of Open Source Software, 4(40), 1541. https://doi.org/10.21105/joss.01541

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2020a). Estimation of model-based predictions, contrasts and means. CRAN. https://github.com/easystats/modelbased

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2020b). Methods and algorithms for correlation analysis in R. Journal of Open Source Software, 5(51), 2306. https://doi.org/10.21105/joss.02306

Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167. https://doi.org/10.21105/joss.03167

R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Waggoner, P. D. (2020). plotmm: Tidy tools for visualizing mixture models. https://CRAN.R-project.org/package=plotmm

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. ISBN: 978-3-319-24277-4

Wilkinson, L. (2012). The Grammar of Graphics. In Handbook of computational statistics (pp. 375–414). Springer. ISBN: 978-3540404644
