Natural image statistics for
computer graphics
Erik Reinhard, Peter Shirley and Tom Troscianko
School of Computing
University of Utah
Salt Lake City, UT 84112 USA
March 26, 2001
The class of all natural images is an infinitely small fraction of all possible images. The
structure of natural images can be statistically modeled, revealing striking regularities. The
human visual system appears to be optimized to view natural images, as opposed to any
possible image, and therefore expects to interpret images which conform to these statistics.
Research has shown that images that do not statistically behave as natural images are harder
for the human visual system to interpret. This paper reviews the statistics of natural images
and assesses their implications for computer graphics in general. We argue that these
statistics are important for graphics applications and, finally, we provide a direct application
of these findings to random subdivision terrain modeling.
TR UUCS-01-002
Natural image statistics for computer graphics
Erik Reinhard (University of Utah)
Peter Shirley (University of Utah)
Tom Troscianko (University of Sussex)
CR Categories: G.3 [Mathematics of Computing]: Probabil-
ity and Statistics; I.3.7 [Computing Methodologies]: Computer
Graphics—3D Graphics; I.4.10 [Computing Methodologies]: Im-
age Processing and Computer Vision—Image Representation
Keywords: Natural image statistics, Visual perception, Power
spectra, Fractal terrains
1 Introduction
The set of all 1000 by 1000 pixel images occupies a million-dimensional space. The set
of natural images, which are those that appear in our world, forms a sparse subset of all
possible images that can be identified using statistical approaches. The statistics of natural
images have been studied to understand how their properties influence the human visual
system (HVS). To assess invariances in natural images, a large set of calibrated images is
collected into an ensemble and statistics are computed on these ensembles. The ensemble
should be chosen such that it is representative of all natural images. This cannot generally
be achieved with a single image.
Natural image statistics can be characterized by their order.
In particular, first, second and higher order statistics are distin-
guished [33]:
First order statistics treat each pixel independently, so that for ex-
ample the distribution of intensities encountered in natural im-
ages can be estimated.
Second order statistics measure the dependencies between pairs
of pixels, which are usually expressed in terms of the power
spectrum (see section 2).
Higher order statistics are used to extract properties of natural
scenes which can not be modeled using first and second or-
der statistics. These properties include lines and edges.
First order statistics capture little structure that aids in understanding the HVS, nor do
they distinguish natural images from arbitrary images very well. We therefore choose to
largely ignore first order statistics in the remainder of this paper.
In this paper we review current knowledge of image statistics,
and argue that this knowledge is likely to be useful for some com-
puter graphics applications. We first review the basics of image
statistics (Section 2). We then discuss the mechanics of power spectra computation and
how to analyze their structure (Section 3). This
is followed by experiments on image ensembles (Section 4). Next,
we provide evidence that second order statistics are sensitive to ge-
ometric variations rather than differences between lighting simu-
lations (Sections 5 and 6). The results are then applied to fractal
terrain modeling (Section 7), followed by conclusions (Section 8).
2 Background: second order statistics
The most remarkable and salient natural image statistic that has
been discovered so far is that the slope of the power spectrum tends
to be close to 2. The power spectrum of an M by M image is
computed as:
S(u, v) = |F(u, v)|^2 / M^2, (1)
where F is the Fourier transform of the image. By represent-
ing the two-dimensional frequencies u and v in polar coordinates
(u = f cos φ and v = f sin φ) and averaging over all directions φ
and all images in the ensemble, it is found that on log-log scale am-
plitude as function of frequency f lies approximately on a straight
line [7, 11,31–33]. This means that spectral power as function of
spatial frequency behaves according to a power law function. Moreover, fitting a line
through the data points yields a slope α of approximately 2 for natural images:

S(f) = A / f^(2 - η) = A / f^α. (2)

Here, A is a constant determining the overall image contrast, α is
the spectral slope and η is its deviation from 2. This result is true for
ensembles, but may not hold for the lowest few frequencies when
applied to individual images [3,20].
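As an illustration of Equations 1 and 2, the radially averaged power spectrum and the slope fit can be sketched in a few lines of NumPy. This is a minimal sketch of our own, not the authors' code; the fitted frequency range and binning are illustrative choices:

```python
import numpy as np

def spectral_slope(img):
    """Estimate alpha in S(f) ~ A / f**alpha for a square image: compute
    the power spectrum, average over all directions phi at each integer
    frequency, and fit a straight line on log-log axes."""
    M = img.shape[0]
    F = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    S = np.abs(F) ** 2 / M ** 2                     # power spectrum (Eq. 1)
    y, x = np.indices(S.shape)
    r = np.hypot(x - M // 2, y - M // 2).astype(int)
    radial = np.bincount(r.ravel(), S.ravel()) / np.bincount(r.ravel())
    f = np.arange(1, M // 4)                        # skip DC and high frequencies
    slope, _ = np.polyfit(np.log(f), np.log(radial[f]), 1)
    return -slope                                   # alpha

# sanity check: noise shaped to a 1/f amplitude (1/f^2 power) spectrum
rng = np.random.default_rng(1)
M = 256
fx = np.fft.fftfreq(M)[:, None]
fy = np.fft.fftfreq(M)[None, :]
fmag = np.hypot(fx, fy)
fmag[0, 0] = 1.0                                    # avoid division by zero at DC
shaped = np.real(np.fft.ifft2(np.fft.fft2(rng.normal(size=(M, M))) / fmag))
print(round(spectral_slope(shaped), 1))             # close to 2
```

On unshaped white noise the same routine returns a slope near 0, which is the flat-spectrum baseline the text contrasts natural images against.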
Although this spectral slope varies subtly between different stud-
ies [7, 10,12,32,40], it appears to be extremely robust against dis-
tortions and transformations and it is therefore concluded that this
spectral behavior is a consequence of the images themselves, rather
than of particular methods of camera calibration or exact computa-
tion of the spectral slope.
However, the precise value of the spectral slope depends some-
what on the type of scenes that make up the ensemble. Most studies
in this field use images of natural objects such as trees and shrubs
because it is argued that the HVS evolved when only natural objects
were present. Some studies show that the spectral slope for scenes
containing man-made objects is slightly different [42]. Even if no
manufactured objects are present, the statistics vary depending on
what is predominantly in the images. The second order statistics
for sky are for example very different from those of trees.
One way in which this becomes apparent is when the power spec-
tra are not circularly averaged, but when the log average power is
plotted against angle. For natural image ensembles all angles show
more or less straight power spectra, although most of the power
is concentrated in horizontal and vertical angles [31,33] (see also
Figure 5). The horizon and the presence of tree-trunks are said to
be factors in this, although this behavior is also likely to occur in
man-made environments.
The power spectrum is related to the auto-correlation function
through the Wiener-Khintchine theorem, which states that the auto-
correlation function and the power spectrum form a Fourier trans-
form pair [21]. Hence, power spectral behavior can be equivalently understood in terms
of correlations between pairs of pixels.
A related image statistic is contrast, normally defined as the stan-
dard deviation of all pixel intensities divided by the mean intensity
(σ/µ). This measure can either be computed directly from the im-
age data, or through Parseval's theorem it can be derived from the
power spectrum [33]:

σ^2 / µ^2 = (1/µ^2) Σ_{u,v} S(u, v). (3)
This particular contrast computation can be modified to compute
contrast in different frequency bands. Frequency-conscious vari-
ance can then be thresholded, yielding a measure which can detect
blur [13]. This is useful as lack of contrast can also be caused by
the absence of sufficient detail in a sharp image.
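Equation 3 can be verified numerically. The sketch below (ours, not from the paper) computes σ/µ both directly from the pixels and through the power spectrum; the normalization follows the Parseval identity for the discrete Fourier transform:

```python
import numpy as np

def contrast(img):
    """Contrast sigma/mu computed two ways: directly from the pixels and
    from the power spectrum via Parseval's theorem (Eq. 3)."""
    mu = img.mean()
    direct = img.std() / mu
    F = np.fft.fft2(img - mu)
    S = np.abs(F) ** 2 / img.size       # power spectrum, Parseval normalization
    spectral = np.sqrt(S.sum() / img.size) / mu
    return direct, spectral

rng = np.random.default_rng(0)
img = rng.random((64, 64)) + 1.0        # positive "luminance" image
d, s = contrast(img)
print(abs(d - s) < 1e-9)                # True: the two routes agree
```

Restricting the sum to an annulus of frequencies gives the band-limited, "frequency-conscious" variance mentioned above.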
The above second order statistics are usually collected for lumi-
nance images only, as luminance is believed to carry the greatest
amount of information. However, chrominance channels are shown
to exhibit similar spectral behavior [24], and therefore all subsequent
qualitative arguments are expected to be true for color as well.
2.1 Interpretation
The fact that the spectral behavior of natural images yields a straight
line with a slope of around 2 is important. Recent unrelated stud-
ies have found that in image interpretation tasks, the HVS performs
best when the images conform to this second order statistic. In one
such study, images of a car and a bull were morphed into each other,
with varying distances between the images in the sequence [25].
Different sequences were generated with modified spectral slopes.
The minimum distance where participants could still distinguish
consecutive images in each morph sequence was measured. This
distance was found to be smallest when the spectral slope of the
images in the morph sequence was close to 2. Deviation of the
spectral slope in either direction reduced the ability to distinguish
between morphed images.
In a different study the effect of spectral slope on the detection
of mirror symmetry in images was assessed [29]. Here, white noise
patterns with varying degrees of vertical symmetry were created
and subsequently filtered to alter the spectral slope. This experi-
ment, in which participants had to detect if symmetry was present,
revealed that performance was optimal for images with a spectral
slope of 2.
These studies confirm the hypothesis that the HVS is tuned
to natural images. This is consistent with psychophysical measurements, which indicate
that the HVS expects to see images with a 1/f^2 power spectrum and subsequently
whitens them non-adaptively [1]. Whitening, or flattening the power spectrum, results
in a spectral slope of α = 0, which can be obtained by multiplying the spectrum by
f^2: S'(f) = f^2 S(f). Hence, it makes sense for computer graphics applications to
anticipate this type of visual processing and ensure that images have 1/f^2 power
spectra. Investigation of second order image statistics is therefore the main goal of
this paper.
2.2 Higher order statistics
One of the disadvantages of using amplitude information in the fre-
quency domain is that phase information is completely discarded,
thus ignoring the position of edges and objects. For this reason
higher order statistics have been applied to natural image ensem-
bles. The simplest global nth-order statistics that may capture phase
structure are the third and fourth moments, commonly referred to
as skew and kurtosis. They are computed as [21]:

s = E[(L - µ)^3] / σ^3,
k = E[(L - µ)^4] / σ^4 - 3.
These dimensionless measures are by definition zero for pure Gaus-
sian distributions. Skew can be interpreted as an indicator for the
difference between the mean and the median of a dataset. Kurtosis
is based on the size of a distribution’s tails relative to Gaussian. A
positive value, associated with long tails in the distribution of inten-
sities, is usually associated with natural image ensembles. This is
for example evident when plotting log-contrast histograms, which
plot the probability of a particular contrast value appearing. These
plots are typically non-Gaussian with positive kurtosis [31].
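The moment computations can be sketched as follows (our illustration; kurtosis is reported as the excess value, matching the convention in the text that a pure Gaussian scores zero):

```python
import numpy as np

def skew_kurtosis(x):
    """Standardized third and fourth moments; kurtosis is 'excess'
    (minus 3), so both measures are zero for a pure Gaussian."""
    x = np.asarray(x, dtype=float).ravel()
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(2)
s_g, k_g = skew_kurtosis(rng.normal(size=200_000))    # both near 0
s_l, k_l = skew_kurtosis(rng.laplace(size=200_000))   # long tails: k near 3
print(round(k_g, 1), round(k_l, 1))
```

The Laplace distribution stands in here for the long-tailed intensity distributions reported for natural image ensembles.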
Thomson [39] pointed out that for kurtosis to be meaningful
for natural image analysis, the data should be decorrelated, or
whitened, prior to any computation of higher order statistics. This
can be accomplished by flattening the power spectrum, which en-
sures that second order statistics do not also capture higher order
statistics. Thus, skew and kurtosis become measures of variations
in the phase spectra. Regularities are found when these measures
are applied to the pixel histogram of an image ensemble [38,39].
This appears to indicate that the HVS may exploit higher order
phase structure after whitening the image. Understanding the phase
structure is therefore an important avenue of research in the field of
natural image statistics.
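The whitening step Thomson describes, flattening the power spectrum while retaining the phase spectrum, can be sketched as follows (our illustration, not the authors' implementation):

```python
import numpy as np

def whiten(img):
    """Flatten the power spectrum (spectral slope alpha = 0) by dividing
    every Fourier coefficient by its magnitude; only phase survives, so
    skew and kurtosis of the result measure higher order (phase) structure."""
    F = np.fft.fft2(img - img.mean())
    mag = np.abs(F)
    mag[mag == 0] = 1.0          # guard against division by zero
    return np.real(np.fft.ifft2(F / mag))

rng = np.random.default_rng(3)
w = whiten(rng.random((64, 64)))
spectrum = np.abs(np.fft.fft2(w))
print(np.allclose(spectrum.ravel()[1:], 1.0))   # all non-DC magnitudes are now 1
```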
2.3 Response properties of cortical cells
One of the reasons to study the statistics of natural images is to un-
derstand how the HVS codes these images. Because natural images
are not completely random, there is a significant amount of redun-
dancy. The HVS may represent such scenes using a sparse set of
active elements. These elements will then be largely independent.
In this section it is assumed that an image can be represented as a
linear superposition of basis functions. Efficiently encoding images
now involves finding a set of basis functions that spans image space
and ensures that the coefficients across an image ensemble are sta-
tistically as independent as possible [23]. The resulting basis func-
tions (or filters) can then be compared to the response properties of
cortical cells in order to explain the early stages of human visual
processing. This has resulted in a number of different representa-
tions for basis functions, including Principal Components Analysis
(PCA) [2,15], Independent Components Analysis (ICA) [4,5, 22],
Gabor basis functions [11,12] and wavelets [12,18,23].
The PCA algorithm finds a set of basis functions which maxi-
mally decorrelates pairs of coefficients, which is achieved by com-
puting the eigenvalues of the covariance matrix (between pairs of
pixel intensities). The corresponding eigenvectors represent a set
of orthogonal coefficients, whereby the eigenvector with the largest
associated eigenvalue accounts for the greatest part of the covari-
ance. By using only the first few coefficients, it is possible to en-
code an image with a large reduction in free parameters [11]. If
the image statistics are the same in all regions of an image, i.e. if
stationarity is assumed, then PCA produces coefficients which are
similar to the Fourier transform [6]. Indeed, PCA is strictly a sec-
ond order method assuming Gaussian signals.
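A minimal sketch of this decorrelation property (ours, for illustration, on synthetic "patches"): PCA via the eigendecomposition of the covariance matrix, verifying that the projected coefficients are pairwise decorrelated:

```python
import numpy as np

def pca_basis(patches):
    """Eigendecomposition of the pixel covariance matrix; eigenvectors,
    sorted by decreasing eigenvalue, are the (orthogonal) basis functions."""
    X = patches - patches.mean(axis=0)
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(4)
patches = rng.normal(size=(2000, 16))            # 2000 hypothetical 4x4 patches
vals, vecs = pca_basis(patches)
coeffs = (patches - patches.mean(axis=0)) @ vecs
cov = coeffs.T @ coeffs / len(coeffs)
off_diag = cov - np.diag(np.diag(cov))
print(np.abs(off_diag).max() < 1e-10)            # coefficients are decorrelated
```

Decorrelated is not independent: for the non-Gaussian intensity distributions of real patches, higher order dependencies between the coefficients remain, which is exactly the gap ICA is meant to close.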
Unfortunately, decorrelation does not guarantee independence.
Also, intensities in natural images do not have a Gaussian dis-
tribution, and therefore PCA yields basis functions which do not
capture higher order structure very well [23]. In particular, the ba-
sis functions tend to be orientationally and frequency sensitive, but
have global extent. This is in contrast to cells in the human primary
cortex which are spatially localized (as well as being localized in
frequency and orientation).
In contrast to PCA, ICA constitutes a transformation resulting in
basis functions which are non-orthogonal, localized in space, fre-
quency and orientation and aims to extract higher order information
from images [4,5]. ICA finds basis functions which are as indepen-
dent as possible [19]. To prevent second order statistics from influencing
the result of ICA, the data is usually first decorrelated (also called
whitened), for example using a PCA algorithm. Filters can then
be found that produce extrema of the kurtosis [33]. A kurtotic am-
plitude distribution is produced by cortical simple cells, leading to
sparse coding. Hence, ICA is believed to be a better model than
PCA for the output of simple cortical cells.
The receptive fields of simple cells in the mammalian striate cor-
tex are localized in space, oriented and bandpass. They are there-
fore similar to the basis functions of wavelet transforms [12, 23].
For natural images, strong correlations between wavelet coeffi-
cients at neighboring spatial locations, orientations and scales have
been shown using conditional histograms of the coefficients’ log
magnitudes [35]. These results were successfully used to synthe-
size textures [27,36] and to denoise images [34].
2.4 Motivation
For statistical approaches to be useful in computer graphics appli-
cations, it is important to be able to attach a physical meaning to
the statistics that are found. For the higher order methods in the
previous sections, we would be interested in the statistics of the
coefficients, rather than the appearance of the basis functions. In
the absence of sufficient data on these statistics, we choose to ig-
nore these results for the moment and instead focus on the widely
accepted second order statistics as discussed in section 2. In this
paper we study which aspects of rendering and modeling influence
second order statistics.
3 Calculation of image statistics
To verify our approach, three different image ensembles were used.
Two of these ensembles consist of synthetic images, while the
third ensemble consists of natural images from the van Hateren
database [22]. This last ensemble consists of just over 4000 cali-
brated images of outdoor scenes, 133 of which were randomly cho-
sen to form our natural image ensemble. Below, we will refer to
this as ensemble “A”.
For the artificial image ensembles, research images from vari-
ous internet sites were collected. Images that had obvious copy-
Figure 1: Example images drawn from ensemble “A”.
Figure 2: Example images drawn from ensemble “B”.
Figure 3: Example images drawn from ensemble “C”.
right notices were rejected, as well as images that were smaller
than 512x512 pixels. Further, both the rendering and the model-
ing had to be of high subjective quality. Images which had obvious
markings of post-processing, as well as camera specific artifacts
(depth of field), were also rejected. The images thus collected were
subdivided into two groups, where the classification was based on
the subjective quality of the modeling. The high quality modeling
ensemble (ensemble “B”) consists of 30 images and the ensemble
with slightly simpler modeling has 18 images (ensemble “C”). Ex-
ample images of all three ensembles can be found in Figures 1 to 3.
The images in the natural image ensemble are available as lumi-
nance images. The synthetic images are usually given in some RGB
format. Where appropriate, these were converted to a 24 bit uncompressed
file format (the influence of lossy compression on the power
spectra is assessed in Section 5). All subsequent computations were
performed on the Y (luminance) channel, after conversion to YUV.
For these image ensembles (as well as for individual images), the
power spectra were computed and the spectral slope was estimated.
The power spectrum computation proceeds as follows (after [33]):
For images that are larger than 512x512 pixels, a window of this
size was cut out of the middle of the image upon which further
processing was applied. Then, the weighted mean intensity µ was
subtracted to avoid leakage from the DC-component of the image,
with µ defined as:
µ = Σ_{x,y} L(x, y) w(x, y) / Σ_{x,y} w(x, y). (4)
Next, the images were prefiltered to avoid boundary effects. This
is accomplished by applying a circular Kaiser-Bessel window func-
tion (with parameter α =2) to the image [16]:
w(x, y) = I_0( πα sqrt( 1.0 - (x^2 + y^2) / (N/2)^2 ) ) / I_0(πα)

Here, I_0 is the modified zero-order Bessel function of the first kind
and N is the window size (512 pixels). In addition, this weight
function was normalized by letting:

(1/N^2) Σ_{x,y} w^2(x, y) = 1. (5)
This windowing function was chosen for its near-optimal trade-off
between side-lobe level, main-lobe width and computability [16].
The resulting images were then Fourier transformed:
F(u, v) = Σ_{x,y} ( L(x, y) - µ ) w(x, y) e^{-2πi(ux + vy)/N}. (6)
Finally, the power spectrum was computed as per Equation 1 and the
resulting data points plotted. Although frequencies up to 256 cycles
per image are computed, only the 127 lowest frequencies were used
to estimate the spectral slope. Higher frequencies may suffer from
aliasing, noise and low modulation transfer [33]. The estimation of
the spectral slope was performed by fitting a straight line through
the logarithm of these data points as function of the logarithm of
1/f. This method was chosen over other slope estimation tech-
niques such as the Hill estimator [17] and the scaling method [9]
to maintain compatibility with [33]. In addition, the number of
data points (127 frequencies) is insufficient for the scaling method,
which requires at least 1,000 data points to yield reliable estimates.
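The windowing step of this pipeline can be sketched as follows. This is a minimal sketch assuming the standard Kaiser-Bessel radial profile, and the mean-square normalization reflects our reading of Equation 5; `np.i0` is NumPy's modified zero-order Bessel function of the first kind:

```python
import numpy as np

def kaiser_bessel_2d(N=512, alpha=2.0):
    """Circular Kaiser-Bessel window with parameter alpha = 2, zero outside
    radius N/2, normalized so that the mean of w**2 is 1 (our reading of
    Eq. 5).  Applied after subtracting the weighted mean (Eq. 4) to
    suppress boundary effects before the Fourier transform (Eq. 6)."""
    y, x = np.indices((N, N)) - N // 2
    r = np.hypot(x, y)
    arg = np.clip(1.0 - (2.0 * r / N) ** 2, 0.0, None)   # zero outside r = N/2
    w = np.where(r <= N / 2, np.i0(np.pi * alpha * np.sqrt(arg)), 0.0)
    w /= np.i0(np.pi * alpha)
    w /= np.sqrt((w ** 2).mean())                        # normalization (Eq. 5)
    return w

w = kaiser_bessel_2d(64)
print(w[32, 32] == w.max(), w[0, 0] == 0.0)
```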
4 Image ensembles
For the three image ensembles, second order statistics were ex-
tracted as detailed above. The 1.87 (±0.43 standard deviation)
spectral slope for the van Hateren database was confirmed: we
found 1.88 ± 0.42 s.d. for our subset of 133 images. The devia-
tions from this value for the artificial image ensembles are depicted
in Figure 4. These results show that our synthetic image ensembles
produce straight power spectra, although the total amount of con-
trast and the value of the slope vary. It is also interesting to note that
the subjective classification into two separate synthetic image en-
sembles using quality of modeling as the criterion has produced power
spectra that are markedly different.
The area under the data points in particular, which can be interpreted
as a measure of contrast, is an order of magnitude higher for
the high quality synthetic ensemble. It is possible that the subjective
selection criterion was subconsciously guided by the human visual
system’s tendency to prefer viewing areas with high contrast [30].
The angular distribution of power tends to show peaks near hor-
izontal and vertical angles. Both the natural image ensemble A
and the high quality synthetic image ensemble “B” show this (Fig-
ure 5). The lower quality synthetic ensemble “C” shows a distinct
lack of power in horizontal directions, but has peaks present in ver-
tical directions.
Figure 4: Spectral slopes (log10 power vs. spatial frequency in cycles/image) for the
three image ensembles: A, the natural image ensemble (slope 1.88); B, the high quality
synthetic ensemble (slope 1.92); and C, the lower quality synthetic ensemble (slope 1.97).
The double lines indicate ±2 standard deviations for each ensemble.
Figure 5: Log average power as function of angle.
Differences between the artificial image ensembles and the natural
ensemble may be partially explained by the fact that rendered
imagery predominantly depicts indoor scenes, whereas ensemble
A is exclusively outdoors.
While the two synthetic ensembles produce straight lines with
spectral slopes that fall well within the range of slopes observed
in natural images, individual images tend to deviate fairly strongly
from the average. In the following sections, we assess which parts
of the rendering pipeline affect second order statistics, which goes
towards establishing these statistics as a tool for graphics applications.
5 Image manipulation
In this and the following sections we empirically evaluate which
aspects of rendering and modeling affect the spectral slope. Start-
ing with issues relating to image space, possible artifacts may arise
from lossy compression for file formats, such as GIF and JPEG,
gamma correction and aliasing. Each of these is discussed in turn,
using a high quality rendering created using Radiance [41]. The
Figure 6: Test scene, modeled to a spatial resolution of 1 mm.
modeling for this image (depicted in Figure 6) was done by hand to
a resolution of 1 mm. This provides a detailed scene as encountered
in many graphics applications.
The lighting simulation included diffuse inter-reflection and soft
shadows. An image was created at 512x512 pixels with 64 super-
samples per pixel, which was subsequently converted to PPM. The
tests in the following subsections all pertain to this rendered image,
which has a measured spectral slope of α =2.36.
5.1 File formats
Lossy compression, as employed by various different file formats,
may cause the frequency content of images to change. To see
whether substantial modifications occur, the above PPM image was
converted to GIF and JPEG (the latter using different levels of quality
and smoothing) using XV. With the exception of smoothing,
which destroys high frequency content, the effect of file conversion
on the spectral slope is generally benign, with deviations of less
than 1%. For different levels of smoothing, the spectral slope varies
linearly with the amount of smoothing applied during file conversion.
Hence, moderate levels of compression do not have an appreciable
effect on the measured spectral slope. We therefore conclude
that the different file formats used in image ensembles B and C
above did not significantly affect our results.
5.2 Gamma correction
Gamma correction is a non-linear transformation to adapt the ap-
pearance of an image to a particular display device. As such, the
spectral information present in the image may be affected due to
the non-linear nature of the transformation. However, our measure-
ments using the gamma correction option in XV show that the spec-
tral slope is only weakly dependent on gamma correction value, as
can be seen in Figure 7. The largest deviations occur for extreme
gamma correction values that strongly darken the image.
It is therefore concluded that gamma correction does not constitute
a significant factor in the determination of spectral slopes and
therefore does not substantially influence our results of Section 4.
5.3 Aliasing
Aliasing is another factor which may affect the spectral slope by
projecting frequencies above the Nyquist limit to lower frequen-
Figure 7: Spectral slope vs. gamma correction value (gamma values 0.4 to 2.8).
Super-samples/pixel   α     σ
1x1                   2.23  0.15
2x2                   2.32  0.19
4x4                   2.35  0.20
8x8                   2.36  0.20
Table 1: Spectral slope α and standard deviation σ as function of
super-sampling. Slope α changes less than 1% if 16 or more super-
samples are computed per pixel.
cies. Despite careful consideration of this issue by using only
the 127 lowest frequencies in the spectral computations, as ex-
plained in Section 3, the rendering process itself may still cause
aliasing at lower frequencies. Images with different numbers of
super-samples were computed, suppressing aliasing by different
amounts. Table 1 shows how the spectral slope depends on the
number of super-samples. In order to minimize aliasing artifacts
in the Fourier domain, it appears that at least 16 super-samples per
pixel are needed. All the renderings in this and following section
use 64 super-samples, eliminating aliasing as a possible factor.
6 Modeling or rendering?
From the previous section it is clear that most of the distortions
regularly applied to synthetic images do not unduly affect our spec-
tral analysis. In this section we extend our analysis to the spatial
domain and answer the question whether second order statistics ap-
ply to lighting simulation or modeling. The scene as described in
the previous section was used for rendering images using differ-
ent lighting simulations: with and without diffuse inter-reflection
as well as with and without shadow rays.
6.1 Shadows
Depending on the nature of the scene rendered, shadows can make
an important contribution to the overall appearance of a scene.
Hence, the accuracy with which shadows are rendered may affect
the statistics of the resulting image. By varying the size of the light
sources and adjusting their emission to maintain constant light lev-
els, the effect of varying soft shadows on second order statistics
was measured. Table 2 shows that for our room scene, shadows do
not appear to be an overly important factor for computing image
statistics. However, it should be noted that special cases could be
constructed which prove the contrary.
6.2 Diffuse inter-reflection
We have also assessed the influence of diffuse inter-reflection on
second order statistics. As diffusely reflecting surfaces are unlikely
Size  Energy  α     σ
1/8   64      2.35  0.20
1/4   16      2.35  0.20
1/2   4       2.35  0.20
1     1       2.36  0.20
2     1/4     2.36  0.20
4     1/16    2.37  0.21
Table 2: Spectral slope α and its standard deviation σ as function
of light source size and energy (both multiplication factors).
to produce high spatial frequencies due to their illumination, we ex-
pect an even smaller effect than for the light source tests above. The
largest difference occurs when switching from lighting simulations
without diffuse inter-reflection to those with diffuse inter-reflection.
This changed the spectral slope by 5.5%.
As adding diffuse inter-reflection to the lighting computations
produced a marked effect, we wondered whether this is due
to an overall increase in energy in the environment. Ray tracing
allows a constant ambient term to be added to each shading result, which
increases image intensities. It was found that this produced the
same effect. Hence, we conclude that absolute energy levels are
more important than the low-frequency distribution associated with
diffuse inter-reflection.
6.3 Discussion
It appears that the particular details of the lighting simulation,
whether soft or hard shadows, diffuse inter-reflection, etc., do
not significantly influence second order image statistics. Differences
between renderers, such as ray tracing and OpenGL rendering
with Phong shading, also did not prove essential (data not shown).
We therefore conclude that these statistics are invariant to rendering
details. While second order statistics cannot be used to differentiate
between lighting simulations, this very invariance makes them ideal
tools for assessing the quality of modeling. We provide evidence for this
hypothesis through the example of fractal terrain modeling.
7 Sample application: fractal terrains
We would expect natural image statistics to be useful in any real-
istic rendering application that normally involves parameter tuning
such as solid texture creation [26], procedural plant modeling [28]
or displacement mapping [8, 37]. We have chosen procedural ter-
rain modeling as our test for whether natural image statistics can be
useful for automatic parameter selection.
We implemented the midpoint subdivision algorithm of Fournier
et al. [14]. That algorithm iteratively adds smaller and smaller ran-
dom displacements to smaller and smaller spatial scales. As the
scale decreases by a factor of 2, the magnitude of displacement de-
creases by a factor k. Smaller values of k result in rougher ter-
rain, and larger values result in smoother terrain. Figure 10 shows
twelve terrain models using k = 1.5 to k = 2.6 after applying 10
iterations. The terrains, each consisting of 524,288 triangles, were
rendered in Radiance with diffuse inter-reflection and an 11 o'clock
sky model [41]. The resulting images have an approximately linear
relationship between spatial frequency and spectral slope. Figure 8
demonstrates this behavior, along with the observation that for all
values of parameter k, the spectral slope decreases predictably with
each iteration of the algorithm. After 10 iterations, the relationship
between division parameter k and spectral slope shows a minimum
of 1.40 at k = 2.1. The value of the spectral slope in-
creases slightly for rougher terrains, which we believe to be caused
by self-shadowing. However, this effect is small and does not inter-
fere with our results.
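The subdivision rule can be sketched as follows. This is our own square-lattice variant for illustration (Fournier et al. subdivide triangles), but it has the same structure: each iteration halves the spatial scale and divides the random displacement magnitude by k:

```python
import numpy as np

def midpoint_terrain(iterations=10, k=2.2, rng=None):
    """Midpoint subdivision heightfield: refine the grid by a factor of 2
    per iteration and divide the displacement magnitude by k.  Smaller k
    gives rougher terrain, larger k smoother terrain."""
    if rng is None:
        rng = np.random.default_rng(0)
    h = np.zeros((2, 2))
    amp = 1.0
    for _ in range(iterations):
        n = h.shape[0]
        fine = np.zeros((2 * n - 1, 2 * n - 1))
        fine[::2, ::2] = h                                 # keep coarse points
        fine[1::2, ::2] = (fine[:-2:2, ::2] + fine[2::2, ::2]) / 2
        fine[:, 1::2] = (fine[:, :-2:2] + fine[:, 2::2]) / 2
        mask = np.ones(fine.shape, dtype=bool)
        mask[::2, ::2] = False                             # jitter only midpoints
        fine[mask] += rng.normal(scale=amp, size=mask.sum())
        amp /= k                                           # scale halves, amplitude / k
        h = fine
    return h

print(midpoint_terrain(iterations=4).shape)  # (17, 17)
```

Rendering such heightfields and measuring their spectral slope with the procedure of Section 3 is how the k-versus-slope curve of Figure 8 was obtained.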
Figure 8: The spectral slope for each iteration of the terrain gener-
ation process and for each of the 12 resulting terrains of Figure 10.
k = 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6
Slope =
Number of responses
Figure 9: Responses from 52 participants using prints and 31 par-
ticipants who performed the web based experiment. The histogram
shows both the spectral slope and the division parameter k associ-
ated with the images in Figure 10.
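The spectral slopes reported in Figures 8-10 come from fitting a line to the radially averaged power spectrum on log-log axes. A minimal sketch of such an estimator follows; the function name is ours, and we omit the windowing [16] a careful measurement would apply:

```python
import numpy as np

def spectral_slope(img):
    """Estimate alpha in P(f) ~ f^-alpha from the radially averaged
    power spectrum, fitted with a straight line on log-log axes."""
    img = img - img.mean()                       # remove the DC term
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)  # integer radial bins
    counts = np.bincount(r.ravel())
    radial = np.bincount(r.ravel(), weights=power.ravel()) / counts
    f = np.arange(1, min(h, w) // 2)             # skip DC, stay below Nyquist
    slope, _ = np.polyfit(np.log(f), np.log(radial[f]), 1)
    return -slope

# Synthetic check: noise shaped to a 1/f amplitude spectrum should
# come out with a spectral slope near 2 (power ~ 1/f^2).
rng = np.random.default_rng(0)
n = 128
fr = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
fr[0, 0] = 1.0                                   # avoid division by zero at DC
noise = np.real(np.fft.ifft2(rng.standard_normal((n, n)) / fr))
alpha = spectral_slope(noise)
print(alpha)
```

The same routine applied to the rendered terrain images yields the per-image slopes tabulated in Figure 10.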
If we were to set k automatically to produce an image with statistics tuned to the human visual system, we would use a k that produces images with spectral slopes near 1.88, as this was the value found for the natural image ensemble A (Section 4). This corresponds to a value of k close to 2.2. To evaluate this selection of
k, we asked 30 people to pick the most realistic image from Figure 10. The images, rendered at a resolution of 1024 × 1024 pixels, were each printed on a single sheet at 300 dpi, yielding prints roughly 8.7 cm on a side. These prints were presented to the participants in a randomly ordered pile, and each participant was asked to select the image that looked most realistic. To ensure that geographic location did not bias the results, the experiment was repeated with participants from both North America and Europe; this latter experiment used a web page to show the images, again with the instruction to select the most realistic image.
The results of both experiments are shown in Figure 9. This histogram shows the number of responses obtained for each terrain. The peaks of the histograms lie close to k = 2.1 and k = 2.2, corresponding to spectral slopes between 1.70 and 1.86. The selection that our participants made correlates with the selection based on spectral analysis of the terrain images (which would be k = 2.2). Additionally, the mean value lies well within one standard deviation of the natural image ensemble A (α = 1.88 ± 0.42 s.d.), suggesting that people would select images with natural statistics if they were given the choice.
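Given the per-image slopes in Figure 10, the automatic selection described above reduces to choosing the k whose measured slope lies closest to the natural-image target. A sketch, with the (k, slope) pairs read off Figure 10:

```python
# (k, measured spectral slope) pairs read off Figure 10
slopes = {1.5: 1.62, 1.6: 1.55, 1.7: 1.48, 1.8: 1.41,
          1.9: 1.40, 2.0: 1.54, 2.1: 1.70, 2.2: 1.86,
          2.3: 2.01, 2.4: 2.14, 2.5: 2.28, 2.6: 2.40}

target = 1.88  # mean spectral slope of natural image ensemble A

# Pick the division parameter whose slope is nearest the target.
best_k = min(slopes, key=lambda k: abs(slopes[k] - target))
print(best_k)  # → 2.2
```

The result, k = 2.2 (slope 1.86), is the same value the participants' responses peak at.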
We therefore conclude that this analysis can be successfully used for parameter selection in fractal terrains. The agreement between user and algorithmic image selection suggests that this statistical approach lends itself to wider use in graphics applications, especially those in which computer-generated imagery needs to be evaluated for realism or parameters need to be tuned.
8 Conclusions
Because computer graphics applications should produce cues appropriate for the HVS, it is important to understand what kind of images the HVS expects. We have surveyed natural image statistics and investigated the most salient of these, the 1/f power spectrum, in further detail. We established that this image statistic is sensitive to geometry, while the rest of the rendering pipeline does not seem to significantly affect it. Hence, modeling applications can benefit from applying second order statistics, as shown by an example application in the form of fractal terrain modeling. We envisage that these statistical tools can provide criteria for a wide range of graphics applications, as argued in the previous section.
While this paper has shown that second order statistics are useful, they are by no means the only relevant statistics, and we deem adherence to these particular statistics necessary but perhaps not sufficient for conveying information to the HVS. We therefore intend to continue this line of research by better understanding the mechanisms that cause these and other statistics.
References

[1] J. J. ATICK AND N. A. REDLICH, What does the retina know about natural scenes?, Neural Computation, 4 (1992), pp. 196-210.

[2] R. J. BADDELEY AND P. J. B. HANCOCK, A statistical analysis of natural images matches psychophysically derived orientation tuning curves, Proc. Roy. Soc. Lond. B, 246 (1991), pp. 219-223.

[3] R. M. BALBOA, C. W. TYLER, AND N. M. GRZYWACZ, Occlusions contribute to scaling in natural images, Vision Research, (2001). In press.

[4] A. J. BELL AND T. J. SEJNOWSKI, Edges are the 'independent components' of natural scenes, Advances in Neural Information Processing Systems, 9 (1996).

[5] ———, The independent components of natural scenes are edge filters, Vision Research, 37 (1997), pp. 3327-3338.

[6] T. BOSSOMAIER AND A. W. SNYDER, Why spatial frequency processing in the visual cortex?, Vision Research, 26 (1986), pp. 1307-1309.

[7] G. J. BURTON AND I. R. MOORHEAD, Color and spatial structure in natural scenes, Applied Optics, 26 (1987), pp. 157-170.

[8] R. L. COOK, L. CARPENTER, AND E. CATMULL, The Reyes image rendering architecture, Computer Graphics (SIGGRAPH '87 Proceedings), 21 (1987), pp. 95-102.

[9] M. E. CROVELLA AND M. S. TAQQU, Estimating the heavy tail index from scaling properties, Methodology and Computing in Applied Probability, 1 (1999), pp. 55-79.

[10] D. W. DONG AND J. J. ATICK, Statistics of natural time-varying images, Network: Computation in Neural Systems, 6 (1995), pp. 345-358.

[11] D. J. FIELD, Relations between the statistics of natural images and the response properties of cortical cells, J. Opt. Soc. Am. A, 4 (1987), pp. 2379-2394.

[12] D. J. FIELD, Scale-invariance and self-similar 'wavelet' transforms: An analysis of natural scenes and mammalian visual systems, in Wavelets, Fractals and Fourier Transforms, M. Farge, J. C. R. Hunt, and J. C. Vassilicos, eds., Clarendon Press, Oxford, 1993, pp. 151-193.

[13] D. J. FIELD AND N. BRADY, Visual sensitivity, blur and the sources of variability in the amplitude spectra of natural scenes, Vision Research, 37 (1997), pp. 3367-3383.

[14] A. FOURNIER, D. FUSSELL, AND L. CARPENTER, Computer rendering of stochastic models, Communications of the ACM, 25 (1982), pp. 371-384.

[15] P. J. B. HANCOCK, R. J. BADDELEY, AND L. S. SMITH, The principal components of natural images, Network, 3 (1992), pp. 61-70.

[16] F. J. HARRIS, On the use of windows for harmonic analysis with the discrete Fourier transform, Proc. IEEE, 66 (1978), pp. 51-84.

[17] B. M. HILL, A simple general approach to inference about the tail of a distribution, The Annals of Statistics, 3 (1975), pp. 1163-1174.

[18] J. HURRI, A. HYVÄRINEN, AND E. OJA, Wavelets and natural image statistics, in Proc. of 10th Scandinavian Conference on Image Analysis, June 1997, pp. 13-18.

[19] A. HYVÄRINEN, Survey on independent component analysis, Neural Computing Surveys, 2 (1999), pp. 94-128.

[20] M. S. LANGER, Large-scale failures of f^-α scaling in natural image spectra, J. Opt. Soc. Am. A, 17 (2000), pp. 28-33.

[21] C. L. NIKIAS AND A. P. PETROPULU, Higher-Order Spectra Analysis, Signal Processing Series, Prentice Hall, 1993.

[22] J. H. VAN HATEREN AND A. VAN DER SCHAAF, Independent component filters of natural images compared with simple cells in primary visual cortex, Proc. R. Soc. Lond. B, 265 (1998), pp. 359-366.

[23] B. A. OLSHAUSEN AND D. J. FIELD, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, 381 (1996), pp. 607-609.

[24] C. A. PÁRRAGA, G. BRELSTAFF, T. TROSCIANKO, AND I. R. MOOREHEAD, Color and luminance information in natural scenes, J. Opt. Soc. Am. A, 15 (1998), pp. 563-569.

[25] C. A. PÁRRAGA, T. TROSCIANKO, AND D. J. TOLHURST, The human visual system is optimised for processing the spatial information in natural visual images, Current Biology, 10 (2000), pp. 35-38.

[26] K. PERLIN, An image synthesizer, Computer Graphics (Proceedings of SIGGRAPH 85), 19 (1985), pp. 287-296.

[27] J. PORTILLA AND E. P. SIMONCELLI, A parametric texture model based on joint statistics of complex wavelet coefficients, Int'l Journal of Computer Vision, 40 (2000), pp. 49-71.

[28] P. PRUSINKIEWICZ AND A. LINDENMAYER, The Algorithmic Beauty of Plants, Springer-Verlag, 1990.
Figure 10: Fractal terrains. The numbers for each image are the division parameter k (left) and the spectral slope of the image (right):

  k:     1.5   1.6   1.7   1.8   1.9   2.0   2.1   2.2   2.3   2.4   2.5   2.6
  slope: 1.62  1.55  1.48  1.41  1.40  1.54  1.70  1.86  2.01  2.14  2.28  2.40
[29] S. J. M. RAINVILLE AND F. A. A. KINGDOM, Spatial-scale contribution to the detection of mirror symmetry in fractal noise, J. Opt. Soc. Am. A, 16 (1999), pp. 2112-2123.

[30] P. REINAGEL AND A. M. ZADOR, Natural scene statistics at the centre of gaze, Network: Comput. Neural Syst., 10 (1999), pp. 1-10.

[31] D. L. RUDERMAN, The statistics of natural images, Network: Computation in Neural Systems, 5 (1994), pp. 517-548.

[32] D. L. RUDERMAN AND W. BIALEK, Statistics of natural images: Scaling in the woods, Physical Review Letters, 73 (1994), pp. 814-817.

[33] A. VAN DER SCHAAF, Natural image statistics and visual processing, PhD thesis, Rijksuniversiteit Groningen, The Netherlands, March 1998.

[34] E. P. SIMONCELLI, Bayesian denoising of visual images in the wavelet domain, in Bayesian Inference in Wavelet Based Models, P. Müller and B. Vidakovic, eds., vol. 141 of Lecture Notes in Statistics, Springer-Verlag, New York, July 1999, pp. 291-308.

[35] ———, Modelling the joint statistics of images in the wavelet domain, in Proc. SPIE 44th Annual Meeting, vol. 3813, July 1999, pp. 188-195.

[36] E. P. SIMONCELLI AND J. PORTILLA, Texture characterization via joint statistics of wavelet coefficient magnitudes, in Proc. 5th Int'l Conf. on Image Processing, October 1998.

[37] B. SMITS, P. SHIRLEY, AND M. M. STARK, Direct ray tracing of displacement mapped triangles, in Proceedings Eurographics Workshop on Rendering, Brno, Czech Republic, June 2000, pp. 307-318.

[38] M. G. A. THOMSON, Higher-order structure in natural scenes, J. Opt. Soc. Am. A, 16 (1999), pp. 1549-1553.

[39] ———, Visual coding and the phase structure of natural scenes, Network: Computation in Neural Systems, 10 (1999), pp. 123-132.

[40] D. J. TOLHURST, Y. TADMOR, AND T. CHAO, Amplitude spectra of natural images, Ophthalmic and Physiological Optics, 12 (1992), pp. 229-232.

[41] G. WARD LARSON AND R. SHAKESPEARE, Rendering with Radiance, Morgan Kaufmann Publ., 1998.

[42] C. ZIEGAUS AND E. W. LANG, Statistical invariances in artificial, natural and urban images, Z. Naturforsch., 53a (1998), pp. 1009-1021.