A Parallel Implementation of 4-Dimensional Haralick Texture Analysis for
Disk-resident Image Datasets
?, Bradley Clymer
?, Joel Saltz
?, Tahsin Kurc
Dept. of Electrical Engineering
The Ohio State University
Columbus, OH, 43210
Dept. of Biomedical Informatics
The Ohio State University
Columbus, OH, 43210
Texture analysis is one possible method to detect fea-
tures in biomedical images. During texture analysis, tex-
ture related information is found by examining local vari-
ations in image brightness. 4-dimensional (4D) Haralick
texture analysis is a method that extracts local variations
along space and time dimensions and represents them as
a collection of fourteen statistical parameters. However,
the application of the 4D Haralick method on large time-
dependent 2D and 3D image datasets is hindered by com-
putation and memory requirements. This paper presents a
parallel implementation of 4D Haralick texture analysis on
PC clusters. We present a performance evaluation of our
implementation on a cluster of PCs. Our results show that
good performance can be achieved for this application via
combined use of task- and data-parallelism.
The quality and usefulness of medical imaging is con-
stantly evolving, leading to better patient care and more
reliance on advanced imaging techniques. For example, a
current method of cancer research uses dynamic contrast
enhanced magnetic resonance imaging (DCE-MRI) [25, 26,
35], which is also the main motivating application for this
work, for detection and monitoring of tumors.
?This research was supported in part by the National Science Foundation under Grants #ACI-9619020 (UC Subcontract #10152408), Grant #B517095 (UC Subcontract #10184497), NIH NIBIB BISTI #P20EB000591, Ohio Board of Regents BRTTC #BRTT02-0003. 0-7695-2153-3/04 $20.00 (c)2004 IEEE.

During a DCE-MRI scan, the patient is injected with a contrast
medium. A series of 3D MRI scans of a region of interest,
such as the breast, is taken at specific time intervals.
The uptake and elimination of the contrast agent are captured
by MRI scans over many time steps. In addition, follow-up
studies, which acquire multiple image datasets at different
dates, can be conducted to monitor the progression and re-
sponse to treatment of the tumor. Extraction and analysis of
features from these images over multiple time steps can be
used to detect tumors, by characterizing, for instance, con-
trast uptake and elimination in a region, and examine their
progression over time.
In medical imaging, the diagnostic problem in the region
of interest can often be associated with a variation in image
brightness . Texture analysis is one possible method to
detect and examine such variations. During texture analysis,
texture related information is found by examining local
variations in image brightness. Haralick texture analysis is a
form of statistical texture analysis that represents local variations
as a collection of up to fourteen statistical parameters,
such as contrast and entropy .
Using texture to analyze DCE-MRI datasets has shown
great potential in tumor detection. Images that have been
analyzed by radiologists can be used along with the results
of texture analysis to train a neural network. Once trained,
the neural network becomes a convenient tool for discover-
ing cancerous tissue given the texture analysis results. The
effectiveness of using 4D Haralick-based texture analysis
for cancer detection will be discussed in a future paper.
As advances in imaging technologies allow a researcher
to capture higher quality images and acquire more images
in a shorter period of time, the amount of data that must
be stored and processed increases as well. Obtaining addi-
tional data by acquiring images over many time steps pro-
vides a more complete view of the patient’s physiology, but
it can also create a quantity of data that may be impossible
to process on a single workstation. With increasing resolu-
tion of medical imaging devices, a dataset, which consists
of many time steps, may not fit in memory. For example,
a digitizing microscope can scan pathology slides at 40x
magnification, resulting in images of multiple gigabytes in
size. Similarly, high-resolution MRI scanners are capable
of acquiring large 3D volumes over many time steps. In
addition, texture analysis is a computationally intensive
process. These issues may be addressed
using distributed computing. Current methods of analyzing
DCE-MRI datasets can be tedious and time consuming for
radiologists. These methods often involve cinematic view-
ing of the contrast agent flow, observation of a color-coded
representation of the vascular permeability characteristics,
and examination of the time versus intensity plots of indi-
vidual pixels . Automating the DCE-MRI data analy-
sis process using distributed computing may also allow the
radiologist to have the results of the DCE-MRI procedure
before the patient exits the MRI facility. The ability to eval-
uate the patient on the same day as the DCE-MRI procedure is
a major motivation for using distributed computing.
This paper presents a parallel implementation of 4D
Haralick texture analysis for disk-resident datasets. Our approach
involves combined use of task- and data-parallelism
to leverage distributed computing power and storage space
on PC clusters. We perform a performance evaluation of the
implementation using a cluster of PCs.
2 Related Work
Parallel image processing and visualization is a widely
studied area. In this section, we describe some of the previ-
ous work in parallel visualization and image analysis.
In this paper, we target efficient use of general purpose
CPUs on PC clusters by use of task- and data-parallelism;
we did not assume availability of graphics processing units
(GPUs) on compute hosts and did not take advantage of
GPUs in carrying out Haralick texture computations. There
has been an increasing interest in applying programmable
GPUs to speed up parts of image processing operations
and general purpose computations [8, 9, 18, 12, 22, 29].
A future extension to our work could investigate how the
Haralick-basedtexture computations could be mapped onto
GPUs; in such an implementation, we anticipate that com-
bined use of functional decomposition and data parallelism
(the approach taken in this paper) will be an efficient
approach as it can enable decomposition and placement
of processing operations across multiple processing units
(CPUs and GPUs).
We are not aware of any parallel implementations of 4D
Haralick texture analysis. Fleig et al. [17, 27] implemented
a parallel Haralick texture analysis program that worked on
2D slices of a 3D volume. Each slice was treated sepa-
rately and processed by a single function in memory. Un-
like [17, 27], the implementation described in this paper
handles disk-resident 4D datasets and can carry out Har-
alick texture computations in 4-dimensions.
Chiang and Silva  propose methods for iso-surface
extraction for datasets that cannot fit in memory. They in-
troduce several techniques and indexing structures to ef-
ficiently search for cells through which the iso-surface
passes, andto reduceI/O costs anddisk spacerequirements.
Cox and Ellsworth  show that relying on operating sys-
tem virtual memory results in poor performance. They pro-
pose a paged system and algorithms for memory manage-
ment and paging for out-of-core visualization. Their results
show that application controlled paging can substantially
improve application performance.
Ueng et al.  present algorithms for streamline visualization
of large unstructured tetrahedral meshes. They
employ an octree to partition an out-of-core dataset into
subsets stored in disk files, and present techniques for
scheduling operations and managing memory for streamline
visualization. Arge et al.  present efficient external memory
algorithms for applications that make use of grid-based ter-
rains in Geographic Information Systems. Bajaj et al. 
present a parallel algorithm for out-of-core isocontouring of
scientific datasets for visualization. In , several image-
space partitioning algorithms are evaluated on parallel sys-
tems for visualization of unstructured grids.
Manolakos and Funk  describe a Java-based tool for
rapid prototyping of image processing operations. The
tool uses a component-based framework, called JavaPorts,
and implements a master-worker mechanism. Another
effort  presents an infrastructure for remote execution of
image processing applications using the SGI ImageVision
library, which is developed to run on SGI machines.
SCIRun  is a problem solving environment that enables
implementation and execution of visualization and
image processing applications from connected modules.
Dv  is a framework for developing applications for dis-
tributed visualization of large scientific datasets. It is based
on the notion of active frames, which are application level
mobile objects. An active frame contains application data,
called frame data, and a frame program that processes the
data. Active frames are executed by active frame servers.
Another toolkit  supports implementation and
execution of image analysis applications as a network of
ITK  and VTK  functions in a distributed environment.
3 Haralick Texture Analysis
The goal of texture analysis is to quantify the dependen-
cies between neighboringpixels and patterns of variation in
image brightness within a region of interest [24, 38, 13]. In
texture analysis, useful information can be found by exam-
ining local variations in image brightness. Haralick texture
analysis  is a form of a statistical texture analysis that
utilizes co-occurrence matrices. It relies on the joint statis-
tics ofneighboringpixelsorvoxelsin thedataset ratherthan
a structure definition.
The basis behind the method is the study of second-order
statistics relating neighboring pixels at various spacings
and directions. A second-order joint conditional probabil-
ity density function is computed, given a specific distance
between pixels and a specific direction. This second-order
joint conditional probability density function is referred to
as a co-occurrence matrix. A co-occurrence matrix can also
be thought of as a joint histogram of two random variables:
the gray level of a pixel and the gray level of its neighboring
pixel, where the neighborhood between two pixels is defined
by a user-specified distance and direction. This represents a
joint probability distribution function (p.d.f.), which gives
the probability of neighboring pixels changing from one
intensity to another. The co-occurrence matrix measures the
number of occurrences in which two neighboring pixels, one
with a given gray level and the other with another, occur a
specified distance apart along a certain direction. A short
description of co-occurrence matrix computation is given in
the Appendix.
There are three notable properties of the co-occurrence
matrix. First, the relationships between neighboring pixels
occur in both the forward and backward directions. Consider
a 2D case; there are 8 total directions: 0, 45, 90, 135, 180,
225, 270, and 315 degrees. However, opposite angles yield
the same co-occurrence matrix, so only four unique
directions exist (see Figure 12 in the Appendix for the eight
possible directions and the four unique vectors). Second,
along the same lines, the co-occurrence matrix is symmetric,
because the gray level relationships between the pixels occur
in both the forward and backward directions. The values
along the diagonal of the co-occurrence matrix are unique;
however, the values above the diagonal match the values
below the diagonal. Third, the co-occurrence matrix is a
square matrix whose dimensions equal the total number of
gray levels possible. Therefore, the size of the co-occurrence
matrix is fixed by the total number of gray levels and is
independent of the distance and direction values.
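For concreteness, the computation described above can be sketched as follows. This is an illustrative reimplementation, not the paper's code; encoding the distance and direction as a pixel offset (dy, dx) is an assumption made here.

```python
# A minimal sketch of 2D co-occurrence matrix computation for a fixed
# distance/direction, assuming pixel values are already requantized to
# gray levels 0..G-1.
def cooccurrence(image, G, dy, dx):
    """Symmetric G x G co-occurrence matrix for the offset (dy, dx)."""
    H, W = len(image), len(image[0])
    M = [[0] * G for _ in range(G)]
    for y in range(H):
        for x in range(W):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                i, j = image[y][x], image[ny][nx]
                # Count both forward and backward relations; this makes
                # the matrix symmetric (opposite angles coincide).
                M[i][j] += 1
                M[j][i] += 1
    return M

img = [[0, 0, 1],
       [1, 1, 0]]
M = cooccurrence(img, G=2, dy=0, dx=1)  # 0-degree direction, distance 1
```

Note that the matrix size depends only on G, never on the image size, distance, or direction, matching the fixed-size property discussed above.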
Once a co-occurrence matrix is computed, statistical parameters
can be calculated from the matrix. The fourteen
textural features described by Haralick  provide a wide
range of parameters that can be used in medical imaging.

Figure 1. Raster Scanning: A ROI window
scans through the image.
In medical images, many localized texture changes de-
noting tumors, capillaries, and differing tissues may exist.
Thus, it is often necessary to apply a series of texture
calculations, with each calculation performed on a localized
region of interest. This process is known as raster scanning.
Raster scanning begins with a fixed, specified region of in-
terest (ROI), where the size of the ROI depends on the size
of important structures within the image. Raster scanning
begins at the first pixel in the image set. Figure 1 illustrates
raster scanning for the 2D case. The region within the ROI is
used to generate a co-occurrence matrix. One or more of the
Haralick parameters is then calculated and sent to a storage
buffer. The ROI window is then shifted to an adjacent voxel.
Again, a co-occurrence matrix is generated based on the region
around that point. Haralick parameters are calculated
and sent to a storage buffer. This scanning window process
continues for all points at which the ROI falls within the
boundary of the image. The series of output parameters can
be used in computer aided diagnosis, stored to disk, or used
to construct a graphical view of the results. A pseudo-code
summarizing the 4D Haralick texture analysis algorithm is
given in Figure 2.
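As an illustration of the parameter-calculation step applied at each window position, two of the fourteen Haralick features (contrast and angular second moment, also called energy) can be computed from a co-occurrence matrix as follows. The formulas are the standard textbook definitions; the code is a sketch, not the paper's implementation.

```python
# Compute two standard Haralick features from one co-occurrence matrix,
# given as a nested list of counts.
def haralick_params(M):
    """Contrast and angular second moment (energy) of one matrix."""
    total = sum(sum(row) for row in M)
    contrast = asm = 0.0
    for i, row in enumerate(M):
        for j, count in enumerate(row):
            p = count / total        # normalize counts to a joint p.d.f.
            contrast += p * (i - j) ** 2
            asm += p * p
    return contrast, asm

# Example: a symmetric 2 x 2 co-occurrence matrix with uniform entries.
contrast, asm = haralick_params([[2, 2], [2, 2]])
```

During raster scanning, this calculation would be repeated once per ROI position, with the results appended to the storage buffer described above.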
4 Parallel 4D Haralick Texture Analysis
The 4D Haralick texture analysis application can be
modeled as four major stages. The first stage is to read
in the 4D raw image dataset from the storage system and
pass it to texture analysis operations. The second stage is
Figure 2. Sequential 4D Haralick texture analysis algorithm:

    Input: ROI lengths in each dimension; F – the set of selected Haralick functions.
    foreach x do
      foreach y do
        foreach z do
          foreach t do
            Compute the co-occurrence matrix for the ROI at (x, y, z, t)
            foreach f in F do
              Compute Haralick parameter f using the co-occurrence matrix

The algorithm iterates over all the
pixels/voxels in each dimension, creating local regions of interest (ROIs). Note that the entire ROI
must be contained within the dataset. For each ROI, a co-occurrence matrix is computed. Using the
co-occurrence matrix, the selected subset of Haralick parameters is calculated.
to compute the co-occurrence matrices. The calculation of
Haralick texture parameters from the 4D data is the third
stage in the processing structure. The resulting output is a
4D dataset for each Haralick parameter computed. The final
stage is to output the 4D Haralick texture analysis results in
a user-specified format. Based on this modeling of Haralick
texture analysis computations, we developed a task- and
data-parallel implementation. Data parallelism is achieved
by distributing data across the nodes in the system for both
storage and computing purposes. The task parallelism is
obtained by implementation and execution of the four major
stages as separate tasks using a component-based framework
[6, 5, 7]. In this section, we first briefly present
the underlying runtime framework used in our implementa-
tion. We then describe how data is distributed across nodes
for storage, individual components implementing different
stages of the texture analysis algorithm, and optimizations
for data retrieval and processing.
4.1 Runtime Middleware
In this project, distributed computing is accomplished
through a middleware framework, called DataCutter, de-
signed to process large datasets [6, 5, 7]. DataCutter is
based on a filter-stream programming model that repre-
sents operations of a data-intensive application as a set
of filters . Data are exchanged between filters using
streams, which are unidirectional pipes. Streams deliver
data from producer filters to consumer filters in user-defined
data chunks (data buffers). To achieve distributed comput-
ing, operational tasks are divided among a series of filters.
Each filter can be executedon a separate processor/machine
in the environment, or filters can be co-located. When fil-
ters are executed on separate processors, data exchange is
done using TCP/IP sockets. When filters are co-located, the
runtime system transfers a data buffer from a producer fil-
ter to a consumer filter by copying the pointer to the data
buffer. Consumer and producer filters can run concurrently
and process data chunks in a pipelined fashion.
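The pipelining idea can be illustrated with a toy sketch. DataCutter itself is a C++ framework; the Python code below only mimics the concept of a producer filter and a consumer filter running concurrently and exchanging fixed-size data chunks through a unidirectional, bounded stream.

```python
# Not DataCutter: a toy illustration of the filter-stream model, where a
# producer and a consumer filter run concurrently and exchange data
# chunks through a unidirectional stream.
import queue
import threading

def producer(stream, n_chunks):
    for i in range(n_chunks):
        stream.put([i] * 4)         # emit a "data buffer" (chunk)
    stream.put(None)                # end-of-stream marker

def consumer(stream, results):
    while True:
        chunk = stream.get()
        if chunk is None:
            break
        results.append(sum(chunk))  # process the chunk

stream = queue.Queue(maxsize=2)     # bounded: enforces pipelined flow
results = []
t1 = threading.Thread(target=producer, args=(stream, 5))
t2 = threading.Thread(target=consumer, args=(stream, results))
t1.start(); t2.start(); t1.join(); t2.join()
```

The bounded queue plays the role of the stream: the producer blocks when the consumer falls behind, so the two stages overlap in time instead of running strictly one after the other.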
Filters may be replicated and placed on different nodes.
Each replicated filter can process data independent of the
other identical filters. Data parallelism can be made pos-
sible by distributing data buffers among replicated filters
on-the-fly. Partitioning data into data chunks can help to
achieve load balance and reduce the memory requirements
of the nodes. Either explicit or transparent copies of a filter
can be instantiated and executed. If the copies of a filter are
transparent, the DataCutter scheduler controls which of the
identical filter copies receives a data buffer. The DataCut-
ter scheduler can schedule data buffers to transparent filter
copies in either round robin or demand driven sequences.
In a round robin distribution, the scheduler assigns data to
each transparent filter in turn. Thus, each transparent filter
receives roughly the same amount of data to process. In a
demand driven scheduling of data buffers, the DataCutter
scheduler assigns the distribution based on the buffer con-
sumption rate of the transparent filter copies. The goal of
the demand driven approach is to send data to the trans-
parent filter copies that can process them the fastest. Ex-
plicit filters are used to give the user control over which
consumer filter receives which data chunk from a producer
filter. Explicit filters are useful in situations where assign-
ment of data chunks to filter copies in a user-defined way is
required or can improve performance.
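The two scheduling policies for transparent copies can be contrasted with a small sketch. This is not DataCutter's actual scheduler; the per-copy processing times used to model demand-driven assignment are an assumption for illustration.

```python
# A sketch contrasting the two buffer-distribution policies for
# transparent filter copies.
import heapq

def round_robin(n_buffers, n_copies):
    """Buffer i goes to copy i mod n_copies: equal counts per copy."""
    return [i % n_copies for i in range(n_buffers)]

def demand_driven(n_buffers, unit_times):
    """unit_times[c] is the time copy c needs per buffer; each buffer is
    assigned to the copy that becomes free soonest, so faster consumers
    receive more work (approximating consumption-rate scheduling)."""
    free = [(0.0, c) for c in range(len(unit_times))]
    heapq.heapify(free)
    assignment = []
    for _ in range(n_buffers):
        t, c = heapq.heappop(free)           # earliest-available copy
        assignment.append(c)
        heapq.heappush(free, (t + unit_times[c], c))
    return assignment
```

With two copies, one of which is three times slower, demand-driven scheduling routes most buffers to the faster copy, while round robin splits them evenly regardless of speed.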
4.2 Data Distribution Among Storage Nodes
In MRI studies, a 4D image dataset can be composed
of a series of 3D volumes. A 3D volume is made up of
a number of image slices, each of which is usually small
to medium in size. However, a
3D volume can consist of many image slices (e.g., 1024
slices) and image acquisition can be performed over a large
number of time steps. Thus, it is possible that a 4D dataset
may not fit on a single storage node and need to be dis-
tributed across multiple nodes. In addition, distributing the
input dataset across multiple storage nodes has the advan-
tage that data retrieval can be parallelized. A number of
techniqueshavebeendevelopedfor partitioningand declus-
tering multi-dimensional datasets [15, 16, 31, 33]. Obvi-
ously, the effectiveness of a particular distribution depends
on how well it matches the common data access and query
patterns of the target application class. In MRI studies,
common analysis queries specify entire 3D volumes over
a range of time steps. In our current implementation, 2D
image slices that make a 3D volume at a time step are dis-
tributed across storage nodes in round robin fashion. Each
2D image is assigned to a single storage node and stored
on disk in a separate file. A simple index file is created on
each storage node for the images assigned to that storage
node. In this index file, each image file is associated with
the time step the image slice belongs to and the number of
the image slice within the 3D volume.
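The round-robin declustering described above can be sketched as follows. The index layout used here, a per-node list of (time step, slice number) entries, is an assumption about the index file's contents, not its actual on-disk format.

```python
# Assign the 2D slices of each time step's 3D volume to storage nodes in
# round-robin fashion, recording a per-node index of (time step, slice).
def distribute(n_timesteps, n_slices, n_nodes):
    index = {node: [] for node in range(n_nodes)}
    k = 0
    for t in range(n_timesteps):      # each time step's 3D volume
        for s in range(n_slices):     # each 2D slice in the volume
            index[k % n_nodes].append((t, s))
            k += 1
    return index

idx = distribute(n_timesteps=2, n_slices=3, n_nodes=2)
```

A query for an entire 3D volume over a range of time steps then touches every node, so slice retrieval is parallelized across the storage nodes.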
We have developed three sets of filters to carry out the
four stages of Haralick texture analysis operations (see Fig-
ure 3). These filter sets can be connected to form an end-
to-end Haralick texture analysis chain. For a more detailed
explanation of the filter sets developed and their implemen-
tation, please refer to the thesis by Woods . The filter
scheme also provides support for incremental development.
For instance, the filter developed to read in raw DCE-MRI
data may easily be replaced by a filter that reads DICOM
images. In our current implementation, we do not provide a
graphical user interface for composition of various filters
into filter groups, since our focus has been on evaluating
different strategies for parallel and distributed execution.
The filters are implemented in C++ using the base classes
provided by the DataCutter framework and the filter net-
work is expressed as an XML document . An extension
to our current implementation would be to investigate the
use of graphical tools, such as SCIRun  and AVS ,
which provide interfaces to compose applications from
individual modules, and of higher level languages [8, 20]
as front-ends for creating and composing filters and filter groups.
Figure 3. Three filter sets to carry out the main
tasks in a Haralick texture analysis application.

RAWFileReader (RFR)
The purpose of the RFR filter is to read raw image data
from disk and send them to other filters for processing.
Multiple RFR filters can be executed if the image dataset
is distributed across several storage nodes. In this case, a
RFR filter is placed on each node containing image data.
Each RFR filter extracts the local data needed to build a
ROI and sends that data to the input stitch filter.
InputImageConstructor (IIC) (Input Stitch)
In order to compute a co-occurrence matrix, the com-
plete Region-of-Interest (ROI) data are needed. If the 4D
image dataset is distributed across multiple storage nodes,
then a copy of the RAWFileReader filter will retrieve and
send only the local data portions to other filters. The In-
putImageConstructor (IIC) filter reconstructs full ROIs and
distributes them to the texture analysis filters. The inputs
to the IIC filter are portions of the image data from
the output of different RFR filters. The IIC filter places the
input MRI portions into temporary buffers. After all data
elements needed to build a complete ROI are received, the
ROI is put into a send buffer. When the send buffer is full,
the buffer is sent to the texture analysis filters.
4.3.2 Texture Analysis Filters
In our implementation, the Haralick texture analysis
algorithm can be carried out in a distributed environment in
various ways. The Haralick texture analysis operations for
computingco-occurrencematrices and Haralick parameters
can be contained in a single filter or task-distributed among
two pipelined filters. Dividing the operations among two
filters creates another level of task parallelism, but also
introduces communicationoverheadbetween the filters that
perform the operations.
The HMP filter carries out the entire Haralick texture
analysis processing. The filter receives a buffer of ROI
image data from the IIC filter. For each of the ROIs in
the input buffer, the co-occurrence matrix is calculated
based on the image data within the ROI. The co-occurrence
matrix is then used to generate any Haralick parameters
that have been chosen by the user.
The HCC filter is responsible for calculating just the
co-occurrence matrix from the input data. For each ROI in
the input buffer, a co-occurrence matrix is calculated. The
co-occurrence information is stored in an output buffer.
When the output buffer becomes full or the end of an input
data message is received, the data in the output buffer is
passed to the HaralickParameterCalculator filter.
The HPC filter is responsible for calculating the Haralick
parameters from the co-occurrence matrices received from
a HCC filter. All user-selected Haralick parameters are cal-
culated for each matrix. Each parameter is stored in its own
output buffer. When the output buffers are full or when the
end of an input data message is encountered, the data elements
stored within the output buffers are sent to an output filter.
The user may choose to send the output portions received
from texture analysis filters directly to disk. Once on disk,
the data may be postprocessed for purposes of computer
aided diagnosis. The user may also choose to store the
Haralick parameter results in a visual way. To accomplish
this, Haralick parameter output portions sent from the
texture analysis filters are received at an output stitch filter.
This filter reconstructs the parameter output portions into a
series of 4D datasets. Each 4D dataset is the output for a
single Haralick parameter. Once reconstructed, the 4D output
datasets can be written to disk as a series of JPEG images.
The USO filter is responsible for writing the Haralick
parameter information out to disk. The input to this filter
is a stream of data elements for a Haralick parameter. The
filter then writes the parameter data out to disk. Each
input stream is assigned a unique file name. A file is
opened, and the parameter values along with corresponding
positional information are stored to the file. Postprocessing
applications can then use the data stored in these files for
purposes such as computer aided diagnosis.
The HIC filter is used to build the Haralick parameter
information into images. This filter receives input streams
consisting of Haralick parameter information. Each input
stream contains a subset of the total output elements
for a single Haralick parameter. This output stitch uses
positional information stored in the input stream to place
the parameter elements into their appropriate positions
in the parameter image data structure. Once all data
elements for a Haralick parameter have been correctly
placed, a complete 4D dataset consisting of all elements
for that parameter has been built. Once the output dataset
is completely assembled, it is passed to the next filter for
further processing.
The JIW filter receives a stream of data containing el-
ements for a Haralick parameter that has been assembled
by position. The input stream also contains the minimum
and maximum values for the Haralick parameter elements.
Using the minimum and maximum values, the data can be
normalized so that each value lies between zero and one.
A zero results in a black pixel, and a one results in a white
pixel. Any intermediate elements are assigned a scaled gray
value. The filter then converts the 4D data into a series of
2D images that are stored in JPEG format.
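The normalization step can be sketched as follows. Scaling the normalized values to 8-bit gray levels (0 = black, 255 = white) is an assumption made here for the JPEG output; the paper specifies only the zero-to-one mapping.

```python
# Min-max normalize parameter values to [0, 1] using the stream's
# minimum and maximum, then scale to 8-bit gray values for image output.
def to_gray(values, lo, hi):
    span = hi - lo
    # A constant stream (span == 0) maps everything to black.
    return [round(255 * (v - lo) / span) if span else 0 for v in values]

gray = to_gray([2.0, 3.0, 4.0], lo=2.0, hi=4.0)
```

The minimum maps to a black pixel, the maximum to a white pixel, and intermediate values to proportionally scaled gray levels, as described above.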
Transparent filter copies of the RFR, HMP, HCC, HPC,
and USO filters can be instantiated and executed in the environment.
Figures 4 and 5 show two possible instantiations,
referred to here as the split HCC and HPC filter implementation
and the HMP filter implementation, of the 4D Haralick
texture analysis application.
As stated earlier, complete 4D ROI data are necessary to
build one co-occurrence matrix. Fig. 6(a) illustrates how a
2D image can be accessed by ROIs; each data packet contains
the data needed to build a co-occurrence matrix. In Fig. 6(a), ROIx and ROIy
correspond to the dimension lengths of the ROI, which are
supplied by the user. Also in Fig. 6(a) P1, P2, and P3 are
the data chunks sent to the texture analysis filters. Note that
most of the chunks contain overlapped data. If the input
data is retrieved by ROIs, the data elements in the over-
lapped regions must be retrieved and sent to the texture
analysis filters multiple times. Therefore, data retrieval in
Figure 4. An example instantiation of the split
HCC and HPC filter implementation. The input
data is distributed among four storage
nodes and ROIs are reconstructed using the
IIC filter. Texture analysis is performed using
transparent copies of the HCC filter
and transparent copies of the HPC filter,
thereby splitting the texture analysis operations
among two filters. The output Haralick
parameters are stored to disk.
terms of ROIs creates the largest volume of communication
between the input filters and the texture analysis filters. In
order to reduce the amount of data read from disk and com-
municated between RFR and IIC filters as well as IIC and
texture analysis filters, the data are retrieved in 4D chunks,
each of which contains a subset of ROIs. In Figure 6(b) a
2D image is partitioned into four data chunks, each with
user-specified dimensions. The amount of overlap between
two adjacent chunks in one dimension depends on the ROI
length in that dimension according to Eq. 1, and the amount
of overlap in the other dimension depends on the ROI length
according to Eq. 2.
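Eqs. 1 and 2 do not survive in this copy of the text. A natural requirement, assumed here, is that adjacent chunks overlap by (ROI length - 1) pixels in each dimension, so that every ROI is wholly contained in some chunk; a sketch for one dimension:

```python
# Compute the starting offsets of overlapping chunks along one
# dimension, assuming adjacent chunks must overlap by (roi_len - 1)
# pixels so every ROI fits entirely inside some chunk. This assumed
# overlap stands in for the missing Eqs. 1 and 2.
def chunk_starts(image_len, chunk_len, roi_len):
    overlap = roi_len - 1
    step = chunk_len - overlap
    starts = list(range(0, image_len - chunk_len + 1, step))
    # Add a final chunk if the regular stride leaves a gap at the edge.
    if starts[-1] + chunk_len < image_len:
        starts.append(image_len - chunk_len)
    return starts

starts = chunk_starts(image_len=10, chunk_len=5, roi_len=3)
```

With this overlap, every valid ROI position falls inside at least one chunk, while each interior pixel is read at most a small constant number of times instead of once per overlapping ROI.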
The current implementation has two types of chunks: an
RFR-to-IIC chunk for data retrieved from disk and sent to
the IIC filter, and an IIC-to-TEXTURE chunk for communication
between the IIC and Haralick texture analysis filters.
The input image data is stored as a set of image slices on
disk. A RFR filter reads a 2D subsection of each image slice
into a buffer. When the buffer is full, the RFR filter sends
the buffer to the IIC filter. When the IIC filter receives
buffers from RFR filters, it copies and reorganizes the
contents of the buffers into a set of buffers, each of which
is a 4D array and corresponds to a separate IIC-to-TEXTURE
chunk. When a buffer is full, it is sent to one of the copies
of the texture analysis
Figure 5. An example instantiation of the HMP
filter implementation. The input data is distributed
among four file systems and ROIs
are reconstructed using the IIC filter. Texture
analysis is performed using transparent
copies of the HMP filter, which combines
all texture analysis operations into one filter.
The output Haralick parameters are stored to
disk.
filters (i.e., HMP or HCC filters). An HMP or HCC fil-
ter then performs a raster scan of the chunk received from
the IIC filter for ROIs and computes co-occurrence matrices.
Having two types of chunks allows better optimization
of execution time for different types of overheads. For ex-
ample, a larger chunk size can be chosen for RFR-to-IIC
chunk so that the number of disk seek operations for data
retrieval can be reduced. With a smaller size for the IIC-to-
TEXTURE chunk, pipelining between the IIC and texture
analysis filters can be increased.
4.4.1 Full vs. Sparse Matrix Representation
The co-occurrence matrix is an N_g x N_g matrix that relates the intensities of neighboring pixels along a certain direction. The number of gray levels (N_g) may be relatively large, such as 65536 (16-bit) intensity levels, or relatively small, such as requantized 32 (5-bit) levels. Our experiments have shown that in some cases, matrices generated using a typical 5 x 5 ROI and requantized 32 levels can have on average as little as 10.7 non-zero entries per matrix (out of 32 x 32 = 1024 entries, about 1% of the matrix). We note that this average takes into account matrix symmetry, and the symmetric entries are only stored once. Knowing that many of the co-occurrence matrices are relatively sparse leads to the following matrix storage schemes.
The most obvious method of representing a co-occurrence matrix in memory is a 2D array of N_g x N_g elements. In this work, we refer to such a representation as a full matrix storage representation.
Figure 6. Two data retrieval strategies: (a) a 2D image partitioned according to ROI; (b) a 2D image partitioned into four data chunks.
Without optimization, all Haralick parameter calculations treat each element in the matrix the same. Therefore, zero entries in the matrix are added to running sums along with non-zero entries. However, before adding an entry from the co-occurrence matrix, the entry can be tested to see if it is zero. If the entry is zero valued, then the entry is simply skipped. By first checking for zero values, we are able to reduce the time needed to process relatively sparse matrices. In fact, this optimization allowed us to process a typical MRI dataset in one-fourth of the time.
The co-occurrence matrix may also be stored in a sparse matrix storage representation. Only the non-zero and non-duplicated (due to symmetry) entries are stored in memory, along with positional information. The positional information is needed to map each non-zero, non-duplicated entry to its position in the co-occurrence matrix. If a matrix is in the sparse form, then the Haralick parameter calculations do not have to check for zero entries. Therefore, the matrix can be processed directly from the sparse form, and no conversion back to a co-occurrence array is needed. In addition, the sparse matrix representation can greatly reduce the data traffic leaving the HCC filter if the texture analysis operations are split between the HCC and HPC filters. If the matrices are stored in sparse form, then they are also transmitted over the network in the sparse form.
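The two storage schemes and the zero-skipping optimization can be sketched as follows. This is an illustration in Python with invented names, not the paper's actual data structures; note the zero test in the optimized full-matrix loop and its absence in the sparse one:

```python
# Illustrative sketch of the two co-occurrence storage schemes
# discussed above (assumes a square, symmetric matrix).

def full_sum_naive(m):
    """Full 2D array: every entry, zero or not, enters the running sum."""
    return sum(m[i][j] for i in range(len(m)) for j in range(len(m)))

def full_sum_skip_zeros(m):
    """Full 2D array with the zero-check optimization."""
    total = 0
    for row in m:
        for v in row:
            if v != 0:          # skip zero-valued entries
                total += v
    return total

def to_sparse(m):
    """Sparse form: (i, j, count) triples for non-zero, non-duplicated
    entries (the matrix is symmetric, so only i <= j is stored)."""
    return [(i, j, m[i][j])
            for i in range(len(m))
            for j in range(i, len(m))
            if m[i][j] != 0]

def sparse_sum(entries):
    """No zero test needed: only non-zero entries are present.
    Off-diagonal entries count twice because of symmetry."""
    return sum(v if i == j else 2 * v for i, j, v in entries)
```

For a matrix with mostly zero entries, the sparse form also directly reduces the number of bytes the HCC filter must send to the HPC filter.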
5 Experimental Results
In the experiments presented in this paper, we used a dataset obtained from a DCE-MRI study. This dataset consists of 32 time steps. Each time step is made up of 32 images of 256 x 256 pixels each. Each pixel is 2 bytes in size.¹ The region of interest (ROI) was set according to the dimension lengths 5 x 5. This ROI size was chosen because previous studies on the analysis of 2D images showed that such a ROI would be typical for an MRI application. The number of gray levels, N_g, used to requantize the DCE-MRI dataset was set to 32, because in most cases values greater than 32 do not significantly improve the texture analysis results [24, 30].
Since each image slice in the input dataset is relatively small, the RFR-to-IIC chunk dimension lengths used in the experiments were chosen to cover a full 256 x 256 image slice. In this way, an RFR filter can read one image slice without any disk seek operations required to retrieve smaller image regions. The IIC-to-TEXTURE chunk dimension lengths used in data partitioning for distribution to texture analysis filters were set to 6 for all tests. When we conducted tests using smaller chunks, the overlap between partitions created a volume of communication that was too great. As a result, the program execution time was unacceptably large. Larger chunk sizes also produced poor results because the large data portions could not be distributed to the texture analysis filters fast enough, which left some texture analysis filters in an idle state. Therefore, we chose a chunk size that had a tolerable amount of overlap as a result of partitioning and also produced a balanced data distribution among the texture analysis filters. The HCC filters were configured to send out a packet of co-occurrence matrices whenever a fixed fraction of a chunk had been processed. Another possible packet size would be the entire chunk. However, for our configuration these settings result in good pipelining of data across different stages of the filter group, but do not cause excessive communication latencies.
¹Note that this sample dataset is small enough to fit in the memory of a processor. Hence, as an optimization the dataset can be replicated on all of the nodes and read into memory as a whole in order to eliminate the need for the IIC filter. However, for large datasets it may not be possible to apply this optimization. Hence, in our experiments we assume that the sample dataset is not replicated and is too big to fit in memory.
For all tests, the following Haralick parameters were calculated: Angular Second Moment, Correlation, Sum of Squares, and Inverse Difference Moment (see Appendix). We chose these parameters because they are four of the most commonly used Haralick parameters; a typical DCE-MRI study would likely not need all of the parameters in order to generate a useful analysis.
5.2 Homogeneous Cluster Experiments
For the experiments detailed in this section, a homoge-
neous PC cluster was used. The cluster, referred to here
as PIII, contains 24 nodes, each with a Pentium III proces-
sor and 512 MB of memory. All nodes are connected via a
Fast Ethernet switch capable of transmitting data at a rate of
100 Mbits per second.
In the first set of experiments, we investigate the impact of using full matrix representation vs. sparse matrix representation on the split HCC and HPC filter implementation and the HMP filter implementation (see Figures 4 and 5). In the experiments, the input dataset was
distributed across 4 I/O nodes. One of the nodes in the sys-
tem was used to run the IIC filter. One USO filter was used
for output. The remaining nodes were used to run the HMP
filters or the HCC and HPC filters. Figure 7(a) shows the
execution time when the number of nodes for HMP filters
is varied from 1 to 16. In each configuration, one transpar-
ent copy of the HMP filter was placed on one node. As
is seen from the figure, the implementation using sparse
matrix representation performs worse than the implementa-
tion using full matrix representation. When HMP filters are
used, the co-occurrence matrix computation and Haralick
parameter calculation are done in the same filter, and there
is no communication overhead between the two operations.
Thus, the overhead introduced due to storing and accessing
co-occurrence matrix in sparse representation degrades the
performance. On the other hand, using sparse matrix repre-
sentation achieves better performance in the split HCC and
HPC filter case, as seen in Figure 7(b). This is mainly be-
cause of the fact that with sparse representation the com-
munication overhead is reduced significantly. In this exper-
iment, multiple transparent copies of HCC and HPC filters
are created, but only one filter is executed on one node – we
should note that for the one-node configuration, both HCC
and HPC filter copies are executed on the same node. The
number of copies for HCC and HPC filters was determined
based on their relative processing times. We observed that the HCC filter was about 4 to 5 times more expensive than the HPC filter on average. Hence, the number of nodes in a given setup was partitioned so that a 4-to-1 ratio was maintained between HCC and HPC filters, when possible. For example, for the 16-node configuration, 13 HCC and 3 HPC filters were executed in the system.
The split HCC and HPC filter implementation provides
flexibility in that HCC and HPC filters can be executed on
separate nodes or run on the same node. The next set of
experiments examines the performanceimpact of executing
copies of HCC and HPC on the same node. When HCC and
HPC filters are placed on the same node, the communica-
tion overhead will be reduced since buffers from the HCC
filter that are delivered to the local copy of the HPC fil-
ter will incur no communication overhead (buffer exchange between two co-located filters is done via a simple pointer
copy operation). In addition, more copies of HCC and HPC
filters can be executed in the system. However, since a
node in the cluster used in these experiments has a single
processor, the CPU has to multiplex between the two fil-
ters and its power has to be shared. In Figure 8, No Over-
lap denotes the case in which no two filters are co-located,
whereas copies of HCC and HPC filters are executed on the
same node in the case of Overlap. In the experiments, the
HMP filter implementation used the full matrix representa-
tion and the split HCC and HPC filter implementation em-
ployed the sparse matrix representation for co-occurrence
matrices. As is seen in the figure, Overlap achieves better
performance compared to the HMP filter implementation
and the No Overlap HCC+HPC implementation. Although
the processing power is shared between two filters, the re-
duction in communication overhead and more copies of fil-
ters result in better performance. We also observe that in
the one-node case, the split HCC and HPC filter implemen-
tation performs better than the HMP filter implementation.
This result can be attributed to better pipelining that can be
achieved by the split implementation; when HCC or HPC
filter is waiting for send and receive operations to complete,
the other filter can be doing computation.
Figure 9 shows the processing time of each filter (i.e., RFR, IIC, HCC, HPC, and USO) for the split HCC and HPC filter implementation. The read (RFR) and write (USO) overheads are negligible compared to the time taken by the other filters. We observe that the execution times of the HCC and HPC filters decrease as more nodes are added. However, the IIC filter becomes a bottleneck in the 16-node configuration and adversely affects the scalability of the application to larger numbers of nodes. In order to alleviate this
bottleneck, multiple explicit copies of the IIC filter should
be instantiated. While transparent copies of the RFR, HCC,
and HPC filters can be executed to take advantage of de-
mand driven buffer scheduling, explicit copies of the IIC
filter need to be created. This is because pieces of the same
RFR-to-IIC chunk retrieved by multiple RFR filters must
Figure 7. The performance impact of using full matrix representation vs. sparse matrix representation. (a) The HMP filter implementation. (b) The split HCC and HPC filter implementation.
Figure 8. The performance impact of co-locating HCC and HPC filters vs. running them on separate processors.
be assembled together to form complete IIC-to-TEXTURE
chunks and ROIs. We examined round robin distribution
of RFR-to-IIC chunks across multiple copies of the IIC fil-
ter. Our results showed that as the number of IIC filters is
increased, the processing time of each IIC filter decreases
almost linearly, as expected.
5.3 Heterogeneous Environment Experiments
In this set of experiments, we investigated execution of
the parallel implementation in a heterogeneous environ-
ment. In addition to the PIII cluster used in the homo-
geneous experiments, two additional clusters were made
available. The first additional cluster, referred to here as
XEON, contains five nodes. Each node of the XEON clus-
ter has dual Xeon 2.4GHz processors and 2GB of mem-
ory. The nodes on the XEON cluster are connected by a
Gigabit Switch. The second additional cluster, referred to
Figure 9. The processing time of each filter in
the split HCC and HPC filter implementation.
In this experiment, HCC and HPC filters are
executed on separate nodes.
here as OPTERON, contains six nodes. Each node of the OPTERON cluster has Opteron processors and 8GB of memory. The nodes on the OPTERON cluster are also connected by a Gigabit Switch. PIII is connected to XEON and OPTERON through a shared 100 Mbit/s network. XEON and OPTERON are connected to each other using a Gigabit network.
We conducted two experiments to evaluate the implementations in a heterogeneous environment. The first experiment
provides a comparison of the HMP filter implementation
and the split HCC and HPC filter implementation using the
PIII and XEON clusters. In this experiment, 4 RFR filters,
4 IIC filters, and 2 USO filters were executed on the PIII
cluster. The texture analysis filters were placed across the
two clusters on a total of 18 nodes (13 nodes from the PIII
cluster and 5 from the XEON cluster). For the HMP filter
implementation, one transparent copy of the HMP filter was instantiated on each processor.
Figure 10. Performance comparison of the HMP filter implementation and the split HCC and HPC filter implementation in a heterogeneous environment.
Since the XEON cluster has
10 processors (on 5 nodes), the total number of HMP filters
was 23. For the split HCC and HPC filter implementation,
one copy of HCC and one copy of HPC were co-located
on each node, which resulted in 18 copies of HCC and 18
copies of HPC filters. While the HMP filter implementation
aims to achieve good performance by spreading data across
more HMP filters, the split HPC and HCC implementation
targets a more balanced use of task- and data-parallelism by
splitting the co-occurrence matrix computation and Haral-
ick parameter calculation operations and creating multiple
copies of individual filters. However, fewer copies of each
filter are created in the split HCC and HPC filter implemen-
tation. As is seen in Figure 10, the split implementation
achieves better performance. First, although 10 copies of
HMP filter can be created on the XEON cluster, more data
has to flow from the PIII cluster to this cluster across a rela-
tively slow network to make optimal use of these copies. On
the other hand, the split HCC and HPC filter implementa-
tion can take advantage of demand driven scheduling of co-
occurrence matrix buffers across the filters within the same
cluster, once an IIC-to-TEXTURE chunk is received. Sec-
ond, better pipelining of computations and better overlap
between computation and communication can be achieved
with the split HCC and HPC filter implementation, especially on the XEON cluster, where the HCC and HPC filters are co-located on the same node but run on separate processors.
In the second heterogeneous environment experiment,
the XEON and OPTERON clusters were used to compare
round robin and demand driven buffer scheduling policies.
Four RFR filters, 1 IIC filter, 2 HPC filters, and 1 USO filter
were executed on separate nodes on the OPTERON cluster.
Because the HCC filter is the most computation-expensive
filter, the HCC filters were used to evaluate the round robin and demand driven scheduling policies.
Figure 11. Performance comparison of the round robin and demand driven buffer scheduling methods.
Four HCC filters
were placed on the XEON nodes and 4 HCC filters were
placed on the OPTERON nodes. In this filter layout no
more than one filter is assigned to any processor. When
using the round robin mechanism, the DataCutter scheduler assures that all transparent filter copies receive approximately the same number of data buffers. The demand driven mechanism allows the DataCutter scheduler to assign data buffers to the transparent copy that will likely process the data the fastest. As shown in Figure 11, the demand driven method performs better than the round robin method. Filter placement also becomes important when using the demand driven policy. Because the OPTERON HCC filters receive more data packets under demand driven scheduling, there is less communication overhead, since the HPC filters are also placed on the OPTERON nodes. In this experiment, the round robin scheduling method causes the XEON HCC filters to receive more data packets; therefore, more HCC-to-HPC data must cross the slower network than with the demand driven method.
Network latency and bandwidth play an important role in choosing the implementation to use, the filter layout, and the sizes of buffers used for transferring data between two filters. For example, if network latency is high and bandwidth is low, the communication overhead incurred by transmitting small buffers can outweigh the gain from more pipelining. In such a case, larger buffers might achieve better performance. Also, filters that exchange large volumes of data can be co-located to minimize communication volume. We plan to carry out a more extensive investigation of the impact of architecture parameters on the choice of implementation in a future work.
6 Conclusions
Haralick texture analysis is a computation-intensive application that involves repeated co-occurrence matrix generation and repeated computations on the co-occurrence matrices. The 4D datasets produced by time-dependent imaging methods also affect the amount of computation, as such datasets can be large. The storage and memory resources available on a single computer may not be sufficient to manage large datasets. We developed a parallel 4D Haralick texture analysis implementation to address these challenges. Our implementation demonstrates how operations of the texture analysis program may be data- and task-distributed to allow parallel and pipelined operation. We have evaluated different implementations and optimizations on clusters of PCs. Our results show that the split HCC and HPC filter implementation achieves good performance when sparse matrix representation is employed. The results also show that in a heterogeneous computing environment, the split HCC and HPC filter implementation provides greater flexibility and improved pipelining compared to the HMP filter implementation.
Acknowledgment. We would like to express our gratitude to the reviewers of our paper. Their comments and suggestions helped us greatly in improving the content and presentation of the paper.
References
M. Aeschlimann, P. Dinda, J. Lopez, B. Lowekamp, L. Kallivokas, and D. O'Hallaron. Preliminary report on the design of a framework for distributed visualization. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'99), pages 1833–1839, Las Vegas, NV, June 1999.
L. Arge, L. Toma, and J. S. Vitter. I/O-efficient algo-
rithms for problems on grid-based terrains. In Pro-
ceedings of 2nd Workshop on Algorithm Engineering
and Experimentation (ALENEX ’00), 2000.
 C. L. Bajaj, V. Pascucci, D. Thompson, and X. Y.
Zhang. Parallel accelerated isocontouring for out-of-
core visualization. In Proceedings of the 1999 IEEE
Symposium on Parallel Visualization and Graphics,
pages 97–104, San Francisco, CA, USA, Oct 1999.
 M. Beynon, T. Kurc, A. Sussman, and J. Saltz. Design
of a framework for data-intensive wide-area applica-
tions. In Proceedings of the 9th Heterogeneous Com-
puting Workshop (HCW2000), pages 116–130. IEEE
Computer Society Press, May 2000.
 M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and
J. Saltz. DataCutter: Middleware for filtering very
large scientific datasets on archival storage systems.
In Proceedings of the Eighth Goddard Conference on
Mass Storage Systems and Technologies/17th IEEE
Symposium on Mass Storage Systems, pages 119–133.
National Aeronautics and Space Administration, Mar.
2000. NASA/CP 2000-209888.
 M. D. Beynon, T. Kurc, U. Catalyurek, C. Chang,
A. Sussman, and J. Saltz. Distributed processing of
very large datasets with DataCutter. Parallel Comput-
ing, 27(11):1457–1478,Oct. 2001.
BrookGPU: Brook for GPUs. A compiler and runtime system for the Brook stream programming language. http://graphics.stanford.edu/projects/brookgpu/.
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Transactions on Graphics (SIGGRAPH 2004 Proceedings), 2004.
H. Casanova and J. Dongarra. NetSolve: a network server for solving computational science problems. The International Journal of Supercomputer Applications and High Performance Computing, 1997.
Y.-J. Chiang and C. Silva. External memory techniques for isosurface extraction in scientific visualization. In J. Abello and J. Vitter, editors, External Memory Algorithms and Visualization, volume 50, pages 247–277. DIMACS Book Series, American Mathematical Society, 1999.
P. Colantoni, N. Boukala, and J. da Rugna. Fast and accurate color image processing using 3D graphics cards. In Proceedings of the 8th International Fall Workshop on Vision, Modeling, and Visualization 2003 (VMV 2003), Nov. 2003.
 R. W. Conners and C. A. Harlow. A theoretical com-
parison of texture algorithms. IEEE Trans. on Pattern
Analysis and Machine Intelligence, PAMI-2(3):204–
222, May 1980.
M. Cox and D. Ellsworth. Application-controlled demand paging for out-of-core visualization. In Proceedings of the 8th IEEE Visualization '97 Conference, 1997.
 D. DeWitt and J. Gray. Parallel database systems: the
future of high performance database systems. Com-
munications of the ACM, 35(6):85–98, June 1992.
 C. Faloutsos and P. Bhagwat. Declustering using frac-
tals. In Proceedings of the 2nd International Confer-
ence on Parallel and Distributed Information Systems,
pages 18–25, Jan. 1993.
D. Fleig. DCE-MRI medical image processing using Haralick texture analysis. Master's thesis, Ohio State University.
N. K. Govindaraju, B. Lloyd, W. Wang, M. C. Lin, and D. Manocha. Fast database operations using graphics processors. In Proceedings of ACM SIGMOD 2004,
R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural features for image classification. IEEE Trans. on Systems, Man, and Cybernetics, 3(6):610–621, Nov. 1973.
Haskell: A Purely Functional Language. http://www.haskell.org/.
 S. Hastings, T. Kurc, S. Langella, U. Catalyurek,
T. Pan, and J. Saltz. Image processing for the grid:
A toolkit for building grid-enabled image processing
applications. In CCGrid: IEEE International Sympo-
sium on Cluster Computing and the Grid. IEEE Press, 2003.
M. Hopf and T. Ertl. Accelerating morphological analysis with graphics hardware. In Proceedings of Workshop on Vision, Modeling, and Visualization (VMV 2000), 2000.
 National Library of Medicine. Insight Segmentation
and Registration Toolkit (ITK). http://www.itk.org/.
D. C. James. Haralick texture analysis of simulated microcalcification effects in breast magnetic resonance imaging. Master's thesis, Ohio State University.
 M. V. Knopp, F. Giesel, H. Marcos, H. von Tengg-
Kobligk, and P. Choyke. Dynamic contrast-enhanced
magnetic resonance imaging in oncology. Topics in
Magnetic Resonance Imaging, 12(2):301–308,2001.
 M. V. Knopp,E.Weiss, H. Sinn,J. Mattern,H. Junker-
mann, J. Radeleff, A. Magener, G. Brix, S. Delorme,
I. Zuna, and G. van Kaick. Pathophysiologic basis
of contrast enhancement in breast tumors. Journal of
Magnetic Resonance Imaging, 10:260–266,1999.
 T. Kurc, S. Hastings, U. Catalyurek, J. Saltz, J. D.
Fleig, B. D. Clymer, H. von Tengg-Kobligk, K. T.
Baudendistel, R. Machiraju, and M. V. Knopp. A dis-
tributed execution environment for analysis of DCE-
MR image datasets. In The Society for Computer Ap-
plications in Radiology (SCAR 2003), published as an abstract.
 H. Kutluca, T. Kurc, and C. Aykanat. Image-space de-
composition algorithms for sort-first parallel volume
rendering of unstructured grids. The Journal of Supercomputing.
A. Lefohn, J. Cates, and R. Whitaker. Interactive, GPU-based level sets for 3D segmentation. In Medi-
cal Image Computing andComputerAssistedInterven-
tion (Miccai), pages 564–572, 2003.
 R. A. Lerski, K. Straughan, L. R. Schad, D. Boyce,
S. Bluml, and I. Zuna. MR image texture analysis -
An approachto tissue characterization. Magnetic Res-
onance Imaging, 11:873–887,1993.
 D.-R. Liu and S. Shekhar. A similarity graph-based
approach to declustering problems and its application
towards parallelizing grid files. In the 11th Inter. Con-
ference on Data Engineering, pages 373–381, Taipei,
Taiwan, Mar. 1995.
E. Manolakos and A. Funk. Rapid prototyping of component-based distributed image processing applications using JavaPorts. In Workshop on Computer-Industrial Collaboration Conference, 2002.
B. Moon and J. H. Saltz. Scalability analysis of declustering methods for multidimensional range
queries. IEEE Transactions on Knowledge and Data
Engineering, 10(2):310–327,March/April 1998.
M. Oberhuber. Distributed high-performance image processing on the internet. Master's thesis, Technische Universitat Graz, 2002.
A. R. Padhani. Dynamic contrast-enhanced MRI in clinical oncology: current status and future directions. Journal of Magnetic Resonance Imaging, 2002.
 W. Schroeder, K. Martin, and B. Lorensen. The Vi-
sualization Toolkit: An Object-Oriented Approach To
3D Graphics. Prentice Hall, 2nd edition, 1997.
SCIRun: A Scientific Computing Problem Solving En-
vironment. Scientific Computing and Imaging Insti-
tute (SCI), http://software.sci.utah.edu/scirun.html.
Figure 12. The directions relative to the center pixel for 2D Haralick texture analysis.
G. D. Tourassi. Journey toward computer-aided diagnosis: Role of image texture analysis. Radiology, 213:317–320, 1999.
 S.-K. Ueng, K. Sikorski, and K.-L. Ma. Out-of-core
streamline visualizationon large unstructuredmeshes.
IEEE Transactions on Visualization and Computer
Graphics, 3(4):370–380,Dec. 1997.
B. J. Woods. 4-D Haralick texture analysis of DCE-MRI datasets using distributed computing. Undergraduate Honors Research Thesis, Ohio State University.
Appendix
The Co-occurrence Matrix
The co-occurrence matrix measures the number of occurrences in which two neighboring pixels, one with gray level i and the other with gray level j, occur a distance d apart along a certain direction. A neighboring pixel is therefore defined by a distance and a direction from another pixel. In our implementation no interpolation was used; therefore, distance refers to the number of discrete pixels between two neighboring pixels. For example, pixels that neighbor at the corner, as in the 45 degree case, are still considered to be one unit of distance apart.
Given a pixel with gray level i, the co-occurrence matrix can be used to determine the probability that a neighboring pixel a certain distance away and along a certain direction has a gray level j. For a simple 2D case, the directions can be broken down according to an angle, a, from the pixel. For 2D, these angles are 0, 45, 90, and 135 degrees (see Figure 12). The co-occurrence matrices p(i, j, d, a) for 0 and 135 degrees, for example, can be defined as
p(i, j, d, 0°) = #{((k,l),(m,n)) : k − m = 0, |l − n| = d, I(k,l) = i, I(m,n) = j}
p(i, j, d, 135°) = #{((k,l),(m,n)) : (k − m = d, l − n = −d) or (k − m = −d, l − n = d), I(k,l) = i, I(m,n) = j}
where # denotes the number of elements in the set and I(k,l) is the gray-tone value at pixel (k,l).
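The 0-degree case above can be transcribed directly into code. This is an unoptimized sketch with names of our choosing, not the paper's implementation; both orderings of each pixel pair are counted, which makes the matrix symmetric, as assumed throughout the paper:

```python
# 2D co-occurrence matrix for the 0-degree direction at distance d:
# counts pairs of pixels in the same row with |l - n| = d.
def cooccurrence_0deg(img, levels, d=1):
    p = [[0] * levels for _ in range(levels)]
    for row in img:
        for col in range(len(row) - d):
            i, j = row[col], row[col + d]  # same row, columns d apart
            p[i][j] += 1
            p[j][i] += 1                   # count the symmetric pair too
    return p
```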
Original work on texture analysis was only concerned with applying texture analysis to 2D images. As a result, only four unique directions need to be considered in two dimensions, and the directions were defined as angles: 0, 45, 90, and 135 degrees. As the dimension increases, the number of total directions also increases. In a 4D image, 40 unique directions exist.
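The direction count follows from the grid geometry: in d dimensions each neighbor offset has a component of −1, 0, or +1 per axis, the all-zero offset is excluded, and an offset and its negation describe the same direction, giving (3^d − 1)/2 unique directions. A quick check of the figures quoted above:

```python
# Unique co-occurrence directions in d dimensions: (3^d - 1) / 2,
# since opposite offsets define the same direction.
def unique_directions(d):
    return (3 ** d - 1) // 2
```

This gives 4 directions in 2D and 40 in 4D, matching the text.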
The Haralick parameter functions used in the experiments are described below.
Angular Second Moment
The first Haralick texture parameter used in our experi-
ments is angular second moment. The moments of a ran-
dom variable help quantify the distribution of values in the
co-occurrence matrix. Given an image, angular second mo-
ment can be used to measure the amount of homogeneity
in the image. Eq. 5 describes the formula used to calculate
angular second moment.
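In the notation of Haralick's original paper (cited above), with p(i, j) the normalized co-occurrence matrix entry, the angular second moment to which Eq. 5 presumably corresponds is:

```latex
% Angular second moment (Haralick's f_1); p(i,j) is the normalized
% co-occurrence matrix entry.
f_{1} = \sum_{i} \sum_{j} p(i,j)^{2}
```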
Correlation
The second parameter, correlation, measures the relation-
ship between two variables using the covariance values of
the two random variables. A high correlation value indi-
cates that the pixels in an image have similar tonal values.
Correlation is calculated according to Eq. 6.
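The standard form of this parameter, from Haralick's paper, uses the means and standard deviations of the marginal distributions of p; Eq. 6 presumably matches:

```latex
% Correlation (Haralick's f_3); \mu_x, \mu_y, \sigma_x, \sigma_y are the
% means and standard deviations of the row and column marginals of p.
f_{3} = \frac{\sum_{i} \sum_{j} (i \cdot j)\, p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y}
```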
Sum of Squares
Sum of squares relates to the study of second order statistics. This parameter, which is also the
same as second central moment, measures the spread of the
distribution of the two random variables. Eq. 7 calculates
the sum of squares.
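The standard definition from Haralick's paper, with mu the mean of p, to which Eq. 7 presumably corresponds:

```latex
% Sum of squares / variance (Haralick's f_4); \mu is the mean of p.
f_{4} = \sum_{i} \sum_{j} (i - \mu)^{2}\, p(i,j)
```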
Inverse Difference Moment
Inverse difference moment also relates to the study of sec-
ond order statistics. In the calculation of this parameter,
ference moment is described by Eq. 8. Note: one is added
to the denominator to avoid dividing by zero.
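The standard definition from Haralick's paper, consistent with the note above (the 1 in the denominator prevents division by zero when i = j), to which Eq. 8 presumably corresponds:

```latex
% Inverse difference moment (Haralick's f_5); the added 1 keeps the
% denominator non-zero on the matrix diagonal (i = j).
f_{5} = \sum_{i} \sum_{j} \frac{1}{1 + (i - j)^{2}}\, p(i,j)
```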