
see: An R Package for Visualizing Statistical Models
Daniel Lüdecke¹, Indrajeet Patil², Mattan S. Ben-Shachar³, Brenton M. Wiernik⁴, Philip Waggoner⁵, and Dominique Makowski⁶
¹ University Medical Center Hamburg-Eppendorf, Germany
² Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany
³ Ben-Gurion University of the Negev, Israel
⁴ Department of Psychology, University of South Florida, USA
⁵ University of Chicago, USA
⁶ Nanyang Technological University, Singapore
DOI: 10.21105/joss.03393
Editor: Frederick Boehm
Reviewers: @MatthewSmith430, @jakobbossek
Submitted: 15 June 2021
Published: 06 August 2021
License: Authors of papers retain copyright and release the work under a Creative Commons
Attribution 4.0 International License (CC BY 4.0).
Summary
The see package is embedded in the easystats ecosystem, a collection of R packages that
operate in synergy to provide a consistent and intuitive syntax when working with statistical
models in the R programming language (R Core Team, 2021). Most easystats packages
return comprehensive numeric summaries of model parameters and performance. The see
package complements these numeric summaries with a host of functions and tools to produce
a range of publication-ready visualizations for model parameters, predictions, and performance
diagnostics. As a core pillar of easystats, the see package helps users to utilize visualization
for more informative, communicable, and well-rounded scientific reporting.
Statement of Need
The grammar of graphics (Wilkinson, 2012), largely due to its implementation in the ggplot2
package (Wickham, 2016), has become the dominant approach to visualization in R. Building
a model visualization with ggplot2 is somewhat disconnected from the model fitting and
evaluation process. Generally, this process entails:
1. Fitting a model.
2. Extracting desired results from the model (e.g., model parameters and intervals, model
predictions, diagnostic statistics) and arranging them into a dataframe.
3. Passing the results dataframe to ggplot() and specifying the graphical parameters.
For example:
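library(ggplot2)

# step 1: fit the model
model <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# step 2: extract fitted values and model data into a dataframe
results <- fortify(model)

# step 3: plot; the fortified dataframe stores the factor in a column
# literally named `factor(cyl)`, hence the backticks
ggplot(results) +
  geom_point(aes(x = wt, y = mpg, color = `factor(cyl)`)) +
  geom_line(aes(x = wt, y = .fitted, color = `factor(cyl)`))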
Lüdecke et al., (2021). see: An R Package for Visualizing Statistical Models. Journal of Open Source Software, 6(64), 3393. https://doi.org/10.21105/joss.03393
A number of packages have been developed to extend ggplot2 and assist with model
visualization.¹ Some of these packages provide functions for additional geoms, annotations, or
common visualization types without linking them to a specific statistical analysis or
fundamentally changing the ggplot2 workflow (e.g., ggrepel, ggalluvial, ggridges, ggdist, ggpubr, etc.).
Other ggplot2 extensions provide functions to generate publication-ready visualizations for
specific types of models (e.g., metaviz, tidymv, sjPlot, survminer). For example, the
ggstatsplot package (Patil, 2021) offers visualizations for statistical analysis of one-way factorial
designs, and the plotmm package (Waggoner, 2020) supports specific types of mixture model
objects. The fortify() function from the ggfortify package (Horikoshi & Tang, 2018) does
offer a unified plotting framework for a wide range of statistical models, although it is not
as comprehensive as the see package because the easystats ecosystem covers a much larger
collection of statistical models.
The aim of the see package is to produce visualizations for a wide variety of models and
statistical analyses in a way that is tightly linked with the model fitting process and requires
minimal interruption of users’ workflow. see accomplishes this aim by providing a single
plot() method for objects created by the other easystats packages, such as parameters
tables, modelbased predictions, performance diagnostic tests, correlation matrices, and so on.
The easystats packages compute numeric results for a wide range of statistical models, and the
see package acts as a visual support to the entire easystats ecosystem. As such, visualizations
corresponding to all stages of statistical analysis, from model fitting to diagnostics to reporting,
can be easily created using see. see plots are compatible with other ggplot2 functions for
further customization (e.g., labs() for a plot title). In addition, see provides several aesthetic
utilities to embellish both easystats plots and other ggplot2 plots. The result is a package
that minimizes the barrier to producing high-quality statistical visualizations in R.
The central goal of easystats is to make the task of doing statistics in R as easy as possi-
ble. This goal is realized through intuitive and consistent syntax, consistent and transparent
argument names, comprehensive documentation, informative warnings and error messages,
and smart functions with sensible default parameter values. The see package follows this
philosophy by using a single access point—the generic plot() method—for visualization of
all manner of statistical results supported by easystats.
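To illustrate the general mechanism behind a single plot() access point, here is a minimal base R sketch of S3 method dispatch. The class name toy_estimates and both functions are hypothetical and written only for this illustration; they are not part of see or easystats, whose methods return ggplot2 objects rather than base graphics.

```r
# Hypothetical constructor: collect coefficients and confidence intervals
# from a fitted model and tag the result with a custom S3 class.
toy_estimates <- function(model) {
  ci <- confint(model)
  out <- data.frame(
    term = names(coef(model)),
    estimate = unname(coef(model)),
    ci_low = unname(ci[, 1]),
    ci_high = unname(ci[, 2])
  )
  class(out) <- c("toy_estimates", class(out))
  out
}

# Hypothetical plot() method: because plot() dispatches on the class,
# users call the same generic for every kind of result object.
plot.toy_estimates <- function(x, ...) {
  pos <- seq_len(nrow(x))
  plot(x$estimate, pos, xlim = range(x$ci_low, x$ci_high), yaxt = "n",
       xlab = "Coefficient", ylab = "", pch = 19, ...)
  segments(x$ci_low, pos, x$ci_high, pos)   # interval whiskers
  axis(2, at = pos, labels = x$term, las = 1)
  abline(v = 0, lty = 2)                    # reference line at zero
  invisible(x)
}

model <- lm(wt ~ am * cyl, data = mtcars)
est <- toy_estimates(model)
plot(est)
```

The point of the design is that the user never has to remember a model-specific plotting function: the class attached to the result decides which method runs.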
Features
Below we present one or two plotting methods for each easystats package, but many other
methods are available. Interested readers are encouraged to explore the range of examples on
the package website, https://easystats.github.io/see/.
Themes and Palettes
The package includes dierent ggplot2 themes that one can set for each plot, or generally as
shown below:
ggplot2::theme_set(see::theme_modern())
The package also provides color palettes, such as scale_color_material() or scale_color_flat(),
for material and flat design colors (https://www.materialui.co/colors), respectively.
¹ For a sampling of these packages, visit https://exts.ggplot2.tidyverse.org/gallery/
Visualizing Model Parameters
The parameters package converts summaries of regression model objects into dataframes
(Lüdecke et al., 2020). The see package can take this transformed object and, for example,
create a dot-and-whisker plot for the extracted regression estimates simply by passing the
parameters class object to plot().
library(parameters)
library(see)
library(ggplot2)
model <- lm(wt ~ am * cyl, data = mtcars)
plot(parameters(model))
[Figure: dot-and-whisker plot of the estimates for am, cyl, and am * cyl; x-axis: Coefficient; y-axis: Parameter.]
As see outputs objects of class ggplot, ggplot2 functions can be added as layers to the plot
just as with any other ggplot2 visualization. For example, we might add a title using
labs() from ggplot2.
library(parameters)
library(see)
library(ggplot2) # provides labs() and scale_y_discrete()

model <- lm(wt ~ am * cyl, data = mtcars)

# changing title and axis labels using ggplot2 functions
plot(parameters(model)) +
  labs(title = "A Dot-and-Whisker Plot") +
  scale_y_discrete(labels = c(
    "transmission * cylinders",
    "cylinders",
    "transmission"
  ))
[Figure: the same dot-and-whisker plot with the title "A Dot-and-Whisker Plot"; x-axis: Coefficient; y-axis: Parameter.]
Similarly, for Bayesian regression model objects, which are handled by the bayestestR
package (Makowski et al., 2019), the see package provides special plotting methods relevant for
Bayesian models (e.g., Highest Density Interval, or HDI). Users can fit the model and pass
the model results, extracted via bayestestR, to plot().
library(bayestestR)
library(rstanarm)
library(see)
model <- stan_glm(wt ~ mpg, data = mtcars, refresh = 0)
result <- hdi(model, ci = c(0.5, 0.75, 0.89, 0.95))
plot(result)
[Figure: Highest Density Interval (HDI) for the mpg parameter; x-axis: Possible parameter values; y-axis: Parameters; legend (HDI): 50%, 75%, 89%, 95%, 100%.]
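For readers unfamiliar with the HDI, the idea can be sketched in base R as the narrowest interval that contains a given share of the posterior draws. The function hdi_sketch() below is a hypothetical, simplified illustration and not the implementation used by bayestestR; the gamma draws are a simulated stand-in for a real posterior.

```r
# Hypothetical helper: narrowest interval containing a share `ci` of the draws
hdi_sketch <- function(draws, ci = 0.89) {
  sorted <- sort(draws)
  n <- length(sorted)
  window <- ceiling(ci * n)          # number of draws inside the interval
  starts <- seq_len(n - window + 1)  # candidate left end points
  widths <- sorted[starts + window - 1] - sorted[starts]
  best <- which.min(widths)          # left end point of the narrowest window
  c(low = sorted[best], high = sorted[best + window - 1])
}

set.seed(123)
draws <- rgamma(4000, shape = 2, rate = 4)  # skewed stand-in for a posterior

hdi_89 <- hdi_sketch(draws, ci = 0.89)
eti_89 <- quantile(draws, c(0.055, 0.945))  # equal-tailed interval, for comparison

hdi_89
eti_89
```

For a skewed distribution such as this one, the HDI is typically shifted toward the mode and is no wider than the equal-tailed interval of the same coverage.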
Visualizing Model Performance and Diagnostic Checks
The performance package is primarily concerned with checking regression model assumptions
(Lüdecke et al., 2021). The see package offers tools to visualize these assumption checks,
such as the normality of residuals. Users simply pass the fitted model object to the relevant
performance function (check_normality() in the example below). Then, this result can
be passed to plot() to produce a ggplot2 visualization of the check on normality of the
residuals.
library(performance)
library(see)
model <- lm(wt ~ mpg, data = mtcars)
check <- check_normality(model)
#> Warning: Non-normality of residuals detected (p = 0.016).
plot(check, type = "qq")
[Figure: Normality of Residuals (subtitle: Dots should fall along the line); x-axis: Standard Normal Distribution Quantiles; y-axis: Sample − Normal Distribution Quantiles.]
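For comparison, a rough base R counterpart of this check could look as follows. It assumes that the check amounts to a Shapiro-Wilk test on the standardized residuals, which is how the performance documentation describes check_normality() for linear models; the exact residual type and the reported p-value may differ between package versions.

```r
model <- lm(wt ~ mpg, data = mtcars)

# Standardized residuals of the fitted model
res <- rstandard(model)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
normality_test <- shapiro.test(res)
normality_test$p.value

# Base graphics Q-Q plot with a reference line
qqnorm(res)
qqline(res)
```

The easystats route wraps both steps, the test and the plot, behind check_normality() and plot().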
Visualizing Eect Sizes
The eectsize package computes a variety of eect size metrics for tted models to assesses
the practical importance of observed eects (Ben-Shachar et al., 2020). In conjunction with
see, users are able to visualize the magnitude and uncertainty of eect sizes by passing the
model object to the relevant eectsize function (omega_squared() in the following example),
and then to plot().
library(effectsize)
library(see)
model <- aov(wt ~ am * cyl, data = mtcars)
plot(omega_squared(model))
[Figure: partial omega squared estimates for am, cyl, and am:cyl; x-axis: Omega2 (partial); y-axis: Parameter.]
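To make the plotted quantity more concrete, partial omega squared can be computed by hand from the ANOVA table. The sketch below uses the common textbook formula, (SS_effect − df_effect × MS_error) / (SS_effect + (N − df_effect) × MS_error); this is an assumption about the definition, and effectsize may apply further conventions (for example in how negative estimates or confidence intervals are handled), so the numbers are not guaranteed to match its output exactly.

```r
model <- aov(wt ~ am * cyl, data = mtcars)

# ANOVA table as a data frame; the last row holds the residuals
tab <- summary(model)[[1]]
n_terms <- nrow(tab) - 1

ss_effect <- tab[["Sum Sq"]][seq_len(n_terms)]
df_effect <- tab[["Df"]][seq_len(n_terms)]
ms_error  <- tab[["Mean Sq"]][nrow(tab)]
n_obs     <- nrow(mtcars)

# Textbook formula for partial omega squared, one value per model term
omega2_partial <- (ss_effect - df_effect * ms_error) /
  (ss_effect + (n_obs - df_effect) * ms_error)
names(omega2_partial) <- trimws(rownames(tab)[seq_len(n_terms)])

round(omega2_partial, 2)
```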
Visualizing Model Predictions and Marginal Effects
The modelbased package computes model-based estimates and predictions from fitted models
(Makowski et al., 2020a). see provides methods to quickly visualize these model predictions
using estimate_prediction(). estimate_means() computes marginal means, i.e., the
mean at each factor level averaged over other predictors.
library(modelbased)
library(see)
model <- lm(mpg ~ wt * as.factor(cyl), data = mtcars)
means <- estimate_means(model)
plot(means)
[Figure: Estimated Means (mpg ~ wt * as.factor(cyl)); x-axis: cyl (4, 6, 8); y-axis: mpg.]
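The idea of a marginal mean can be sketched in base R by predicting the outcome for each factor level on a small reference grid. The sketch assumes one common convention, holding the numeric covariate (wt) at its sample mean; estimate_means() may use different defaults, so the values are illustrative rather than a reproduction of its output.

```r
model <- lm(mpg ~ wt * as.factor(cyl), data = mtcars)

# Reference grid: every cylinder level, with weight fixed at its sample mean
grid <- data.frame(
  cyl = sort(unique(mtcars$cyl)),
  wt = mean(mtcars$wt)
)

# Predicted mpg for each cylinder level at average weight
grid$predicted <- predict(model, newdata = grid)
grid
```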
Visualizing Correlation Matrices
The correlation package provides a unified syntax and human-readable code to carry out many
types of correlation analysis (Makowski et al., 2020b). A user can run
summary(correlation(data)) to construct a correlation matrix for the variables in a dataframe.
With see, this matrix can be passed to plot() to visualize the correlations as a matrix plot.
library(correlation)
library(see)
results <- summary(correlation(iris))
plot(results)
[Figure: correlation matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width; color scale: r, from −1.0 to 1.0.]
Licensing and Availability
see is licensed under the GNU General Public License (v3.0), with all source code openly
developed and stored at GitHub (https://github.com/easystats/see), along with a corresponding
issue tracker for bug reporting and feature enhancements. In the spirit of honest and open
science, we encourage requests, tips for fixes, feature updates, as well as general questions
and concerns via direct interaction with contributors and developers.
Acknowledgments
see is part of the collaborative easystats ecosystem. Thus, we thank the members of easystats
as well as the users.
References
Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect
size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815.
https://doi.org/10.21105/joss.02815
Horikoshi, M., & Tang, Y. (2018). ggfortify: Data visualization tools for statistical analysis
results. https://CRAN.R-project.org/package=ggfortify
Lüdecke, D., Ben-Shachar, M. S., Patil, I., & Makowski, D. (2020). Extracting, computing and
exploring the parameters of statistical models using R. Journal of Open Source Software,
5(53), 2445. https://doi.org/10.21105/joss.02445
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021).
performance: An R package for assessment, comparison and testing of statistical models.
Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139
Makowski, D., Ben-Shachar, M. S., & Lüdecke, D. (2019). bayestestR: Describing effects
and their uncertainty, existence and significance within the Bayesian framework. Journal
of Open Source Software, 4(40), 1541. https://doi.org/10.21105/joss.01541
Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2020a). Estimation of
model-based predictions, contrasts and means. CRAN.
https://github.com/easystats/modelbased
Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2020b). Methods and algorithms
for correlation analysis in R. Journal of Open Source Software, 5(51), 2306.
https://doi.org/10.21105/joss.02306
Patil, I. (2021). Visualizations with statistical details: The ’ggstatsplot’ approach. Journal of
Open Source Software, 6(61), 3167. https://doi.org/10.21105/joss.03167
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation
for Statistical Computing. https://www.R-project.org/
Waggoner, P. D. (2020). plotmm: Tidy tools for visualizing mixture models.
https://CRAN.R-project.org/package=plotmm
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
ISBN: 978-3-319-24277-4
Wilkinson, L. (2012). The Grammar of Graphics. In Handbook of computational statistics
(pp. 375–414). Springer. ISBN: 978-3540404644