Variable binned scatter plots.
ABSTRACT The scatter plot is a well-known method of visualizing pairs of continuous variables. Scatter plots are intuitive and easy to use, but often have a high degree of overlap, which may occlude a significant portion of the data. Detailed analysis of a dense, non-uniform data set requires a recursive drill-down. In this article, we propose variable binned scatter plots to allow the visualization of large amounts of data without overlap. The basic idea is to use a non-uniform (variable) binning of the x and y dimensions and to plot all data points located within each bin into the corresponding square. In the visualization, each data point is then represented by a small cell (pixel). Users can interact with individual data points to obtain record-level information. To analyze an interesting area of the scatter plot, a variable binned scatter plot with a refined scale for the subarea can be generated recursively as needed. Furthermore, we map a third attribute to color to obtain a visual clustering. We have applied variable binned scatter plots to real-world problems in the areas of credit card fraud and data center energy consumption, visualizing the data distributions and cause-effect relationships among multiple attributes. A comparison of our methods with two recent scatter plot variants is included.
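The binning and cell-placement idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the variable bin edges are chosen, so quantile-based edges are an assumption here, as are the square per-bin layout, the ordering of points by the third attribute, and all function names (`variable_bin_edges`, `assign_bins`, `layout_cells`).

```python
import numpy as np

def variable_bin_edges(values, n_bins):
    """Non-uniform bin edges taken from quantiles (an assumed binning rule)."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    edges = np.unique(np.quantile(values, qs))  # drop duplicate edges
    if edges.size < 2:  # all values identical: fall back to a single bin
        edges = np.array([edges[0], edges[0] + 1.0])
    return edges

def assign_bins(values, edges):
    """Index of the bin each value falls into, in 0 .. len(edges) - 2."""
    idx = np.searchsorted(edges, values, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)  # the maximum goes in the last bin

def layout_cells(x, y, color, n_bins=4):
    """Give every point its own cell inside its (x-bin, y-bin) square.

    Returns a dict mapping (ix, iy) to a square 2-D array of point indices,
    with -1 marking unused cells. Inside each square, points are ordered by
    `color` (the third attribute), so similar values end up adjacent.
    """
    x, y, color = (np.asarray(a, dtype=float) for a in (x, y, color))
    ix = assign_bins(x, variable_bin_edges(x, n_bins))
    iy = assign_bins(y, variable_bin_edges(y, n_bins))

    squares = {}
    for key in set(zip(ix.tolist(), iy.tolist())):
        members = np.flatnonzero((ix == key[0]) & (iy == key[1]))
        members = members[np.argsort(color[members], kind="stable")]
        side = int(np.ceil(np.sqrt(len(members))))  # smallest square that fits
        grid = np.full(side * side, -1, dtype=int)
        grid[: len(members)] = members
        squares[key] = grid.reshape(side, side)
    return squares
```

Because every point occupies exactly one cell, no point is hidden by another. Drilling down into a subarea would amount to calling `layout_cells` again on just the points of the selected bins, which recomputes the edges at a finer scale.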
- Source available from: sciencedirect.com
ABSTRACT: Smoothing techniques such as density estimation and nonparametric regression are widely used in applied work, and the basic estimation procedures can be implemented relatively easily in standard statistical computing environments. However, computationally efficient procedures quickly become necessary with large datasets, many evaluation points or more than one covariate. Further computational issues arise in the use of smoothing techniques for inferential, rather than simply descriptive, purposes. These issues are addressed in two ways: (i) by deriving efficient matrix formulations of nonparametric smoothing methods and (ii) by describing further simple modifications to these for the use of ‘binned’ data when sample sizes are large. The implications for other graphical and inferential aspects of the estimators are also discussed. These issues are dealt with in an algorithmic manner, to allow implementation in any programming environment, but particularly those which are geared towards vector and matrix representations of data. Specific examples of S-Plus code from the sm library of Bowman and Azzalini (Applied Smoothing Techniques for Data Analysis: the Kernel Approach With S-Plus Illustrations, Oxford University Press, Oxford, 1997) are given in an appendix as illustrations. Computational Statistics & Data Analysis. 01/2003
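To make the combination of a matrix formulation and binned data concrete, here is a minimal sketch of a binned kernel density estimate. It is not the S-Plus code from the sm library mentioned in the abstract. The Gaussian kernel, the simple equal-width binning, the padding of three bandwidths, and the name `binned_kde` are all assumptions made for illustration.

```python
import numpy as np

def binned_kde(data, h, n_grid=200):
    """Gaussian kernel density estimate on a grid, computed from bin counts.

    The raw sample is replaced by counts at `n_grid` bin centres, so the
    cost depends on the grid size rather than on the sample size.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min() - 3.0 * h, data.max() + 3.0 * h  # pad for the tails
    edges = np.linspace(lo, hi, n_grid + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    counts, _ = np.histogram(data, bins=edges)  # simple binning

    # Kernel matrix between bin centres: K[i, j] = phi((c_i - c_j) / h) / h
    diff = (centers[:, None] - centers[None, :]) / h
    K = np.exp(-0.5 * diff**2) / (h * np.sqrt(2.0 * np.pi))

    density = K @ counts / data.size  # a single matrix-vector product
    return centers, density
```

The kernel matrix depends only on the grid and the bandwidth `h`, so it can be reused across samples or resamples that share the same grid.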
Article: The grammar of graphics
ABSTRACT: The grammar of graphics (GoG) denotes a system with seven classes embedded in a data flow. This data flow specifies a strict order in which data are transformed from a raw dataset to a statistical graphic. Each class contains multiple methods, each of which is a function executed at the step in the data flow corresponding to that class. The classes are orthogonal, in the sense that the product set of all classes (every possible sequence of class methods) defines a space of graphics which is meaningful at every point. The meaning of a statistical graphic is thus determined by the mapping produced by the function chain linking data and graphic. Wiley Interdisciplinary Reviews: Computational Statistics. 10/2010; 2(6):673-677. DOI: 10.1002/wics.118
Article: Continuous scatterplots.
ABSTRACT: Scatterplots are a well-established means of visualizing discrete data values with two data variables as a collection of discrete points. We aim to generalize the concept of scatterplots to the visualization of spatially continuous input data by a continuous and dense plot. An example of a continuous input field is data defined on an n-D spatial grid with respective interpolation or reconstruction of in-between values. We propose a rigorous, accurate, and generic mathematical model of continuous scatterplots that considers an arbitrary density defined on an input field on an n-D domain and that maps this density to m-D scatterplots. Special cases are derived from this generic model and discussed in detail: scatterplots where the n-D spatial domain and the m-D data attribute domain have identical dimension, 1-D scatterplots as a way to define continuous histograms, and 2-D scatterplots of data on 3-D spatial grids. We show how continuous histograms are related to traditional discrete histograms and to the histograms of isosurface statistics. Based on the mathematical model of continuous scatterplots, respective visualization algorithms are derived, in particular for 2-D scatterplots of data from 3-D tetrahedral grids. We show the applicability of continuous scatterplots for several visualization tasks. Since continuous scatterplots not only sample data at grid points but also interpolate data values within cells, a dense and complete visualization of the data set is achieved that scales well with increasing data set size. Especially for irregular grids with varying cell size, improved results are obtained when compared to conventional scatterplots. Therefore, continuous scatterplots are a suitable extension of a statistics visualization technique to be applied to typical data from scientific computation. IEEE Transactions on Visualization and Computer Graphics. 01/2008; 14(6):1428-35.