Frequency tables are often constructed on intervals of irregular width. When such tables are plotted as bar charts, the underlying density may be badly distorted. Most introductory statistics texts recommend tabulating data into intervals of equal width, but seldom warn of the consequences of failing to do so. Only occasionally does an introductory text correctly emphasize that area, rather than frequency, should be plotted. Even so, the correctly scaled density plot is often less informative than one might hope, because wide bins are drawn at constant height. In many cases the rightmost interval has no well-defined end-point, which makes its depiction somewhat arbitrary. In this note, we introduce a regular histogram approximation that matches the frequencies while minimizing a roughness criterion, for visual and exploratory appeal. The resulting estimate can reveal the density structure much more clearly. We also formulate an alternative criterion that explicitly accounts for the uncertainty in the bin frequencies.
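The two ideas above, plotting area rather than frequency and replacing irregular bins with a smooth regular histogram that reproduces the bin frequencies, can be sketched in plain Python. This is not the authors' method: the roughness criterion (sum of squared first differences of neighbouring heights), the fine grid width `delta`, and the helper names `density_heights`, `solve_linear` and `smooth_regular` are assumptions made for illustration. The sketch also assumes every interval is closed (an open-ended rightmost bin must be given an end-point first), that all edges lie on the `delta` grid, and it does not enforce nonnegative heights.

```python
def density_heights(edges, counts):
    """Bar heights for irregular bins: relative frequency / bin width,
    so that bar areas (not heights) are proportional to the counts."""
    n = sum(counts)
    return [c / (n * (b - a)) for a, b, c in zip(edges, edges[1:], counts)]


def solve_linear(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [r] for row, r in zip(mat, rhs)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, size):
            f = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - s) / aug[r][r]
    return x


def smooth_regular(edges, counts, delta):
    """Heights on an equal-width grid (width delta) that reproduce each
    irregular bin's relative frequency exactly while minimising the
    ASSUMED roughness penalty sum_j (h[j+1] - h[j])**2."""
    n = sum(counts)
    m = round((edges[-1] - edges[0]) / delta)  # number of fine bins
    k = len(counts)                            # one constraint per coarse bin
    # A[i][j] = delta when fine bin j lies inside coarse bin i
    A = [[0.0] * m for _ in range(k)]
    for i, (a, b) in enumerate(zip(edges, edges[1:])):
        for j in range(round((a - edges[0]) / delta), round((b - edges[0]) / delta)):
            A[i][j] = delta
    # Q = 2 * D^T D, the Hessian of the first-difference penalty
    Q = [[0.0] * m for _ in range(m)]
    for j in range(m - 1):
        Q[j][j] += 2.0
        Q[j + 1][j + 1] += 2.0
        Q[j][j + 1] -= 2.0
        Q[j + 1][j] -= 2.0
    # KKT system [[Q, A^T], [A, 0]] [h; lam] = [0; counts / n]
    kkt = [Q[j] + [A[i][j] for i in range(k)] for j in range(m)]
    kkt += [A[i] + [0.0] * k for i in range(k)]
    rhs = [0.0] * m + [c / n for c in counts]
    return solve_linear(kkt, rhs)[:m]


edges, counts = [0, 1, 2, 4, 8], [10, 20, 30, 40]
print(density_heights(edges, counts))  # [0.1, 0.2, 0.15, 0.1]
fine = smooth_regular(edges, counts, delta=1.0)
```

Because the constraint rows have disjoint positive supports and the penalty only vanishes on constant vectors, the KKT matrix is nonsingular and the smoothed heights are unique. In the tails the result can dip below zero, which a fuller treatment would handle with an inequality-constrained solver.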
ABSTRACT: When constructing a histogram, it is common to make all bars the same width. One could instead make them all have the same area. These two options have complementary strengths and weaknesses: the equal-width histogram oversmooths in regions of high density and is poor at identifying sharp peaks, while the equal-area histogram oversmooths in regions of low density and so fails to identify outliers. We describe a compromise approach that avoids both defects. We regard the histogram as an exploratory device rather than as a density estimate. We argue that relying on the asymptotics of integrated mean squared error leads to inappropriate recommendations for choosing bin widths.
Journal of Computational and Graphical Statistics 03/2009; 18(1):21-31. DOI:10.1198/jcgs.2009.0002
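The equal-width and equal-area constructions contrasted in the abstract above can be sketched with the Python standard library. The compromise approach itself is not reconstructed here. The helper names are hypothetical, equal-area cut points are taken as empirical quantiles via `statistics.quantiles(..., method="inclusive")`, and the data are assumed continuous without ties (a tie can produce a zero-width bin).

```python
import bisect
import statistics


def equal_width_edges(data, k):
    """k bins of identical width spanning the sample range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]


def equal_area_edges(data, k):
    """k bins cut at the empirical i/k quantiles, so each bin holds
    roughly n/k points and therefore has roughly equal area."""
    cuts = statistics.quantiles(data, n=k, method="inclusive")
    return [min(data)] + cuts + [max(data)]


def histogram(data, edges):
    """Counts per bin and density heights count / (n * width)."""
    k = len(edges) - 1
    counts = [0] * k
    for x in data:
        # A point lying exactly on an interior edge goes to the bin on its
        # right; the sample maximum is folded back into the last bin.
        idx = min(bisect.bisect_right(edges, x) - 1, k - 1)
        counts[idx] += 1
    n = len(data)
    heights = [c / (n * (b - a)) for a, b, c in zip(edges, edges[1:], counts)]
    return counts, heights


data = [i ** 2 for i in range(1, 101)]  # a right-skewed toy sample
print(histogram(data, equal_width_edges(data, 4))[0])  # [50, 20, 16, 14]
print(histogram(data, equal_area_edges(data, 4))[0])   # [25, 25, 25, 25]
```

On this skewed sample the equal-width bins pile half the points into the first bar, whereas the equal-area bins hold equal counts and adapt their widths instead, which is the trade-off the abstract describes.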
ABSTRACT: The equal-area (e-a) histogram dominates, under squared-error loss, a fixed-bin-width histogram when both are assigned the same number of bins. That is, the fixed-bin-width histogram is inadmissible, and the dominating alternative is the e-a histogram.
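The risk comparison in this abstract can be explored numerically with a Monte Carlo sketch. It does not prove the dominance result; it only estimates the risk of each histogram for one density, sample size and bin count. The assumptions are loud ones: "square error loss" is taken to mean integrated squared error against the true density, the true density is a standard normal, the integral is approximated by the midpoint rule on [-6, 6], and `estimate_mise`, `n`, `k`, `reps` and `seed` are hypothetical names and arbitrary settings.

```python
import bisect
import random
import statistics

NORMAL = statistics.NormalDist()  # ASSUMED true density: standard normal


def edges_for(sample, k, equal_area):
    """Equal-area edges at empirical quantiles, else equal-width edges."""
    if equal_area:
        cuts = statistics.quantiles(sample, n=k, method="inclusive")
    else:
        step = (max(sample) - min(sample)) / k
        cuts = [min(sample) + i * step for i in range(1, k)]
    return [min(sample)] + cuts + [max(sample)]


def heights_for(sample, edges):
    """Density heights count / (n * width) for the given edges."""
    k = len(edges) - 1
    counts = [0] * k
    for x in sample:
        counts[min(bisect.bisect_right(edges, x) - 1, k - 1)] += 1
    n = len(sample)
    return [c / (n * (b - a)) for a, b, c in zip(edges, edges[1:], counts)]


def ise(edges, heights, lo=-6.0, hi=6.0, grid=2400):
    """Integrated squared error against NORMAL by the midpoint rule."""
    dx = (hi - lo) / grid
    total = 0.0
    for g in range(grid):
        x = lo + (g + 0.5) * dx
        if edges[0] <= x <= edges[-1]:
            fhat = heights[min(bisect.bisect_right(edges, x) - 1, len(heights) - 1)]
        else:
            fhat = 0.0  # the histogram is zero outside the sample range
        total += (fhat - NORMAL.pdf(x)) ** 2 * dx
    return total


def estimate_mise(n=200, k=10, reps=50, seed=1):
    """Average ISE over reps samples, both histograms using k bins."""
    rng = random.Random(seed)
    risk = {"equal_width": 0.0, "equal_area": 0.0}
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for name, equal_area in (("equal_width", False), ("equal_area", True)):
            edges = edges_for(sample, k, equal_area)
            risk[name] += ise(edges, heights_for(sample, edges)) / reps
    return risk


print(estimate_mise())
```

Holding `k` fixed for both histograms mirrors the "same number of bins" condition in the abstract; changing the assumed density or `n` is how one would probe whether the ordering persists.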