Adaptive techniques for microarray image analysis with related quality assessment
Sebastiano Battiato
Gianpiero Di Blasi
Giovanni Maria Farinella
Giovanni Gallo
Giuseppe Claudio Guarnera
University of Catania
Image Processing Laboratory
Catania, Italy
Email: battiato@dmi.unict.it
Abstract. We propose novel techniques for microarray image analysis. In particular, we describe an overall pipeline able to solve the most common problems of microarray image analysis. We propose the microarray image rotation algorithm (MIRA) and the statistical gridding pipeline (SGRIP) as two advanced modules devoted, respectively, to restoring the original microarray grid orientation and to detecting the correct geometrical information about each spot of the input microarray. Both solutions make use of statistical observations, obtaining adaptive and reliable information about each spot property. They improve the performance of the microarray image segmentation pipeline (MISP) we recently developed. MIRA, MISP, and SGRIP modules have been developed as plugins for an advanced framework for microarray image analysis. A new quality measure able to effectively evaluate the adaptive segmentation with respect to the fixed (i.e., nonadaptive) circle segmentation of each spot is proposed. Experiments confirm the effectiveness of the proposed techniques in terms of visual and numerical data. © 2007 SPIE and IS&T. [DOI: 10.1117/1.2816445]
1 Introduction
DNA microarray technology [1, 2] is a fundamental biotechnology for gene expression profiling and biomedical studies. Image analysis has found applications in microarray technology because it is able to extract new and nontrivial knowledge that is partially hidden in the images. In a typical microarray experiment, two 16-bit TIFF images are obtained using microarray scanners. The two images are conventionally assigned to a red and a green channel so that they can be processed with conventional image processing software. The processing pipeline for these input data can be summarized in the following steps [3]: gridding, segmentation, intensity extraction, and quality measures [4, 5]. Gridding and segmentation are crucial steps: they have a potentially large impact on subsequent analysis (e.g., clustering or identification of differentially expressed genes [6]). In the last decade, many academic [7] and commercial microarray image analysis software packages and methods have been developed. Some of them [5, 8] improve a primordial simple solution [9, 10] with heuristic strategies. Nevertheless, human interaction is still usually required to obtain a high level of accuracy. Microarray images contain a set of grids that are organized at two levels (e.g., 4×4, 16×16, etc.). As in Refs. 11 and 12, we focus our attention on inner grids, since several authors have successfully addressed the problem of locating and segmenting the outer grid.
In this paper we introduce a novel algorithm, the microarray image rotation algorithm (MIRA), to take into account typical rotation problems of microarray images. MIRA is aimed at correcting, in a preprocessing phase, rotation problems in microarray images, ensuring that successive pipeline steps are not affected by such geometrical distortions. MIRA uses statistical analysis on the rows and columns of a binary map to infer a correction angle. The microarray images obtained by MIRA are then processed by the statistical gridding pipeline (SGRIP) and the microarray image segmentation pipeline (MISP) [13] to perform all the other steps involved in microarray image analysis: gridding, segmentation, and data/quality measure extraction.
In particular, the gridding process is realized by the SGRIP module: histogram analysis on the rows and columns of a binary mask of the input data yields statistical information about the spatial distribution of the signal. Starting from an initial guess, the final grid mask is refined according to local considerations about typical spot acquisition problems (e.g., spot overlap, comet tails, etc.). SGRIP realizes a fully unsupervised gridding and leads to an accurate grid where each single spot is correctly addressed. Starting from the robust gridding provided by SGRIP, MISP reliably performs data extraction and quality measure evaluation. Together, MIRA, MISP, and SGRIP allow us to automatically detect the microarray grid, correct the rotation angle, assign coordinates to each spot, discriminate among the foreground, background, and local background, calculate intensities, and extract quality measures.
To test our algorithms, we have developed a framework called the microarray image analysis framework (MIAF). The overall architecture of the system is plugin-based (Fig. 1). The pipeline has been validated through an extensive experimental phase.
Paper 06056RR received Apr. 5, 2006; revised manuscript received Jun. 7, 2007; accepted for publication Jun. 23, 2007; published online Dec. 17, 2007. This paper is an extension of a paper presented at the SPIE Conference on Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, Jan. 2006, San Jose, Calif. The paper presented there appears (unrefereed) in SPIE Proceedings Vol. 6064.
1017-9909/2007/16(4)/043013/20/$25.00 © 2007 SPIE and IS&T.
Journal of Electronic Imaging 16(4), 043013 (Oct–Dec 2007)
Journal of Electronic Imaging Oct–Dec 2007/Vol. 16(4) 043013-1
The paper is organized as follows: Section 2 briefly reviews microarray image analysis, focusing on the state of the art for rotation and gridding algorithms. Section 3 describes the preprocessing step performed by MIRA, and Section 4 summarizes the global MISP pipeline. SGRIP is described in detail in Section 5. A quality measure used to compare segmentation algorithms is presented in Section 6. Finally, experiments and conclusions are reported in Sections 7 and 8, respectively.
2 Related Works
A microarray typically consists of several blocks with the same layout, and the print tips on the arrayer are normally arranged in a regular array [14]. Assuming ideal conditions, the spots in each block are located in an evenly spaced lattice corresponding to the layout of the print tips. The ideal microarray image has the following properties [15]:
1. all grids are of the same size;
2. subgrids are regularly spaced;
3. the spots are centered on the intersections of the subgrid lines;
4. all spots have the same size and a perfectly circular shape;
5. grid geometry does not change within a given type of slide;
6. no dust or contamination affects the slide;
7. the background intensity is minimal and uniform across the image.
In most real cases, microarray images violate some of these conditions. Variations during the printing of the array shift the exact locations of the spots in an unpredictable way. Other problems can be related to spot position, irregularities in spot shape and size, and contamination.
In the following subsections, we briefly review the capabilities of existing microarray image analysis tools, specifying in detail the solutions they adopt for visualization, segmentation, and intensity and quality measure extraction.
Rotation and gridding processes are reported in separate subsections to better analyze and compare existing solutions with SGRIP and MISP capabilities. We also briefly compare their overall characteristics and major drawbacks.
2.1 Microarray Image Analysis Tools
Scanalyze [9] starts by converting each pair of input TIFF images into a single RGB image as follows:

$$R = \frac{\mathrm{Gain}\cdot \mathrm{Cy5\_pixel}}{256\cdot \mathrm{Normalization}}, \qquad G = \frac{\mathrm{Gain}\cdot \mathrm{Cy3\_pixel}}{256}, \qquad B = 0. \tag{1}$$
The brightness is controlled by the Gain parameter, while Normalization is a balancing parameter. The Scanalyze gridding phase relies on user intervention to specify various parameters (e.g., the separation between rows and columns). Scanalyze uses the fixed-circle segmentation method. The circle size has to be user-provided and guides the gridding phase. All spots are considered valid even if they do not enclose a signal; such spots can be manually excluded from the successive information extraction phase. Median values are used to estimate the local background intensity. The G/R ratio of the average foreground is obtained by taking the background correction into account. The software produces a set of information and several quality values (e.g., channel correlation) for each spot.
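Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not Scanalyze code: the function name `scanalyze_rgb` is hypothetical, and clipping to the 8-bit display range [0, 255] is our assumption, since the text does not say how out-of-range values are handled.

```python
import numpy as np

def scanalyze_rgb(cy5: np.ndarray, cy3: np.ndarray,
                  gain: float = 1.0, normalization: float = 1.0) -> np.ndarray:
    """Compose a display RGB image from two 16-bit channels following Eq. (1).

    R = Gain * Cy5 / (256 * Normalization), G = Gain * Cy3 / 256, B = 0.
    Clipping to [0, 255] and the uint8 output are assumptions for display.
    """
    r = gain * cy5.astype(np.float64) / (256.0 * normalization)
    g = gain * cy3.astype(np.float64) / 256.0
    b = np.zeros_like(r)
    rgb = np.stack([r, g, b], axis=-1)          # H x W x 3
    return np.clip(rgb, 0, 255).astype(np.uint8)
```

Dividing by 256 maps the 16-bit range onto roughly 8 bits, so Gain acts as a brightness multiplier and Normalization rescales only the red channel relative to the green one.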
GenePix [8] uses a square root transformation to reduce the dynamic range of input images. It is also possible to manually select the 8 bits to be saved, or to use the predefined options to save the high, low, or central bits. Spot addressing is automatic. In the first versions, a circle-based segmentation with variable dimension was used, and a spot could be classified as "not found" according to some conditions (e.g., "spot diameter less than 6 pixels," "spot position overlaps another spot," or "spot diameter is outside the planned options limits"). In more recent releases, this segmentation method has been replaced by irregular spot segmentation. Local and global methods are used to compute the background and foreground intensities for each spot. Spot circularity is also adopted as a quality measure.
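The two dynamic-range reductions mentioned above can be illustrated as follows. This is a hypothetical sketch, not GenePix code: the helper names are ours, and the exact bit ranges used for the "high", "central", and "low" options (bits 15..8, 11..4, and 7..0 here) are assumptions. The square-root mapping fits into 8 bits because the square root of 65535 is just below 256.

```python
import numpy as np

def sqrt_to_8bit(img16: np.ndarray) -> np.ndarray:
    """Square-root compression of a 16-bit image into the 8-bit range."""
    return np.floor(np.sqrt(img16.astype(np.float64))).astype(np.uint8)

def select_8_bits(img16: np.ndarray, which: str = "high") -> np.ndarray:
    """Keep 8 of the 16 bits; the bit ranges per option are assumed."""
    shift = {"high": 8, "central": 4, "low": 0}[which]
    return ((img16.astype(np.uint16) >> shift) & 0xFF).astype(np.uint8)
```

The square root spends more output levels on dim pixels, where most spot intensities lie, whereas a fixed bit window simply discards either the faint or the saturated part of the range.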
Fig. 1 The global architecture of the proposed MIAF framework.
Battiato et al.: Adaptive techniques for microarray image analysis…
Spot [6] uses a linear combination weighted by the median values to obtain a single image, $I = G' + (m_G/m_R)\,R'$, where $G'$ and $R'$ are the images obtained from G and R using a square root transformation, and $m_G$ and $m_R$ are the corresponding median values. Spot addressing is based on batch processing over a collection of microarray images with the same geometric structure. Successive steps are automatic and produce two estimated grids:
• fitted foreground grids: horizontal and vertical lines passing through the centers of the estimated spots;
• fitted background grids: horizontal and vertical lines passing through the gaps between the estimated spot centers.
In the segmentation phase, a seeded region-growing algorithm is used [16]. The seeds are chosen according to the estimated grids. Background and foreground intensities and quality measures are computed similarly to GenePix.
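The Spot combination $I = G' + (m_G/m_R)\,R'$ can be sketched as below. We assume the medians are taken on the square-root images, so that the factor $m_G/m_R$ brings the median of the rescaled red image to that of the green one; the function name `spot_combine` is hypothetical.

```python
import numpy as np

def spot_combine(g: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Single image I = G' + (m_G / m_R) * R' with G' = sqrt(G), R' = sqrt(R).

    Medians are computed on the square-root images (an assumption).
    """
    g_t = np.sqrt(g.astype(np.float64))
    r_t = np.sqrt(r.astype(np.float64))
    m_g, m_r = np.median(g_t), np.median(r_t)
    if m_r == 0:
        raise ValueError("median of the red channel is zero")
    return g_t + (m_g / m_r) * r_t  # red rescaled to the green median level
```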
Angulo and Serrà [7] combine the input images using a linear combination weighted by the median values. Their overall pipeline combines addressing and gridding techniques, making use of morphological operators together with classical segmentation algorithms; overall performance is evaluated in terms of segmentation accuracy, without providing quality measures.
Matarray [17] uses a combination of intensity and spatial information for spot detection and signal segmentation. The anchor point and grid dimension are specified by the user. Starting from a first rough identification of the spot centers, the overall area is split into patches, and a circular mask is defined for each patch and used for spot segmentation. An iterative process then computes the signal intensity and local background to improve detection. The combined quality index described in Ref. 17 is used for quality assessment.
MicroArray Genome Imaging and Clustering Tool (MAGIC) [10] analyzes all types of gene expression data on all major operating systems. Visualization is performed by a weighted linear combination. Gridding requires the number of grids and the number of rows and columns for each grid; moreover, the coordinates of the top-left spot, the top-right spot, and the bottom row must be indicated. Segmentation is performed by choosing one of three algorithms: fixed circle, adaptive circle, and seeded region growing. The fixed circle is centered in the grid square, with a user-specified radius. The adaptive circle algorithm analyzes the signal in each square of the grid to determine the most appropriate center and radius (within a user-specified range) for each circle. Finally, the adaptive circle's center is set so that the circle contains the largest number of "on" pixels. A seeded region-growing algorithm connects each pixel to a background or foreground region until all pixels have been properly labeled. A user-specified threshold and geometric considerations determine which pixels may be used to "seed" the regions. The user can choose whether to consider the background in the computation of the green-to-red ratio signal. Each spot can be ignored using manual flag selection. MAGIC creates an "Expression file" containing the foreground and background spot intensities for each channel and the channel ratio intensity.
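As an illustration of the center search performed by the adaptive circle algorithm, the sketch below scans all candidate centers in a binary grid cell and keeps the one whose disk covers the most "on" pixels. The radius is held fixed because the text does not describe how MAGIC selects it within the user range; the function name `best_circle_center` and the exhaustive search are our assumptions.

```python
import numpy as np

def best_circle_center(cell: np.ndarray, radius: int) -> tuple[int, int]:
    """(row, col) whose disk of the given radius covers the most "on" pixels.

    `cell` is a binary patch (one grid square); all centers are tried.
    """
    h, w = cell.shape
    yy, xx = np.mgrid[0:h, 0:w]
    best, best_count = (h // 2, w // 2), -1
    for cy in range(h):
        for cx in range(w):
            disk = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
            count = int(cell[disk].sum())
            if count > best_count:  # keep the first center reaching the maximum
                best, best_count = (cy, cx), count
    return best
```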
Only a few tools are able to consider and properly manage all the steps involved in microarray image analysis. Table 1 shows the main characteristics of each reviewed tool, also reporting the presence or absence of fundamental modules such as rotation correction, quality measure extraction, etc. To better highlight differences and similarities with the proposed approach, we also report MIAF at the end of Table 1.
Table 1 Tool's characteristics.

| Software | Correction of grid rotation | Segmentation: manual/automatic | Segmentation: type | Gridding: manual/automatic | Parameters required | Iterative/single step |
|---|---|---|---|---|---|---|
| Scanalyze [9] | No | Manual | Fixed circle, adaptive circle | Manual | Yes | Manual refinement |
| GenePix [8] | No | Automatic | Adaptive circle, seeded region growing | Automatic | Yes | Single step |
| Spot [6] | No | Automatic | Seeded region growing | Automatic | Yes, batch procedure | Iterative |
| Angulo and Serrà [7] | No | Automatic | Morphological operators and watershed transformation | Automatic | No | Single step |
| Matarray [17] | No | Automatic | Fixed circle | Manual | Yes | Iterative |
| MAGIC [10] | No | Automatic | Fixed/adaptive circle, seeded region growing | Automatic | Yes | Single step |
| MIAF | Yes | Automatic | Ad hoc technique (MISP) | Automatic | No | Single step |
2.2 Rotation
Microarray image analysis can be affected by several errors when input grids are slightly rotated. Despite this, only a few authors have tried to address and solve this problem.
In Refs. 11 and 12, the two input images are filtered according to the orientation matching transform (OM), which is aimed at detecting candidate points for the spot centers. To compute the angles between the grid directions and the axis, the Radon transform (RT) of the OM-filtered images is computed and its peaks are analyzed. The projection directions are obtained by integrating the RT over the space variable s: the algorithm computes the two main peaks of the function

$$\Phi(\theta) = \int_s R^2\{\mathrm{OM}(I)\}(s, \theta)\, ds. \tag{2}$$

It is important to notice that the OM transformation used in Refs. 11 and 12 requires the minimum and maximum spot radii as input parameters, which are usually hard to know in real cases.
In our algorithm, we also compute the peaks of a function f(I), but, unlike Refs. 11 and 12, the algorithm does not require parameters. In Ref. 19, rotation detection is done after a preprocessing step that estimates some global variables for gridding. Subarray rotation is identified by examining the intensity projection profiles along the x and y axes of a black-and-white binary image obtained from previous steps. A subarray is identified as a "rotated region" if the size of the block is greater than the average subarray width and height. To determine whether the rotation is clockwise or counterclockwise, the intensity sum of the top one-third region is compared with that of the bottom one-third region, along the horizontal and vertical axes of the rotated region. To calculate the rotation angle, the authors of Ref. 19 iteratively rotate the region by a quarter degree until its projection profile matches the normal one. The method proposed in Ref. 19 is affected by two main problems:
1. If a subgrid is affected by acquisition problems such as wide areas with a very high noise level or weak spot signals, its profile will differ from the others even in the absence of a rotation. In this case, the subgrid may be identified as rotated and, as shown in Ref. 19, the iterative refinement may enter a loop, since at each iteration the subgrid profile will still differ from the others.
2. If the problems indicated above are located in the small regions of the subgrid used for this purpose, or if those regions are empty (no spot signal), the detection results may be wrong.
2.3 Gridding
One of the most challenging problems in microarray image analysis is the gridding step. The gridding process assigns coordinates to each microarray spot. This phase may be carried out manually or automatically. Automatic addressing increases the speed of the analysis, but few of the commonly available software packages offer this option.
We discuss the gridding process in a separate section because, even though almost all tools provide some heuristic solution, several advanced and reliable ad hoc solutions deserve to be described in detail.
Some of the proposed methods require user intervention for setting the grid anchor points, the grid dimension in terms of rows and columns, etc. A good gridding method should therefore be fully automatic, fast, simple, and adaptive with respect to the real microarray structure and to fluctuations of the parameters.
In Ref. 14, the spots are located by finding a rectangle containing the pixels of each spot and using it as a valid mask for gridding. Two main steps are performed: first, the algorithm sums up the intensities across the pixels in each row (column); next, it finds the local minima of the summed intensities using a sliding window whose span is approximately equal to the width of a typical spot. Although this method does not require human interaction, some parameters have to be known in advance: the number of spots in each row, the number of spot columns, and the size of the sliding window. In any case, the final gridding results do not take into account local spot irregularities or different spot sizes and shapes.
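The profile-based step of Ref. 14 can be sketched as follows, assuming a grayscale array and a sliding window of odd width roughly equal to the spot width. The function name `profile_minima` is hypothetical, and ties within a window are resolved by keeping the first position, which is our choice rather than a detail given in Ref. 14.

```python
import numpy as np

def profile_minima(image: np.ndarray, axis: int, window: int) -> list[int]:
    """Candidate grid lines as local minima of the summed intensity profile.

    axis=1 sums each row (one value per row); axis=0 sums each column.
    A position is kept if it is the minimum of the window centered on it.
    """
    profile = image.astype(np.float64).sum(axis=axis)
    half = window // 2
    minima = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        if lo + int(np.argmin(profile[lo:hi])) == i:
            minima.append(i)
    return minima
```

The window width encodes the expected spot size: a window much narrower than a spot would also report the small intensity dips inside spots.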
Local information approximating spot size and shape through advanced segmentation strategies is indeed crucial to derive reliable gridding information; our proposed solution starts from these considerations.
The gridding algorithm proposed in Ref. 18 is fully automatic. In the first stage, the algorithm is applied to the whole image in order to find the positions of the subgrids. Then it is applied again to each subgrid to find the positions of the spots. The first step computes the average intensities row by row and column by column over the whole image. To remove noise, a low-pass filter is applied. Taking into account the regularity of the microarray image structure, the method assumes that the distances between adjacent cells containing the spots are approximately equal. The initial "guess" is obtained by finding the minima of the average intensity, and an iterative refinement then adjusts it. The initial guess may not form a regular grid; a regular grid is instead obtained after the final refinement, which removes extra lines due to noise contamination and adds missing lines due to low-intensity rows and columns. Spot borders are identified using adaptive circles. The method assumes that the grid axes are parallel to the image borders and that no rotation of the grids has occurred during the digitization process. As a result, only regular grids are obtained, without considering spot irregularity. To overcome such problems, our approach is based on statistical observations of subgrid parameters that lead, in a first phase, to an approximately regular orthogonal subgrid. Extra or missing lines are removed or added by means of a single-step correction algorithm that uses the median distance between adjacent rows or columns as the reference parameter, rather than the average distance as in Ref. 18. A second phase yields adaptive subgrids, in which spot centers are correctly addressed, taking into account local variations and problems such as spot merging. We do not assume that subgrid axes are parallel to the image border, since MIRA performs a preprocessing phase to restore grid rotation. We identify the spot border using MISP, which performs an adaptive shape segmentation.
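A minimal sketch of a single-pass, median-based line correction in the spirit described above. The thresholds (dropping a line closer than half the median spacing d, and filling a gap of about k·d with k−1 equally spaced lines) are hypothetical choices for illustration, not the SGRIP parameters, and `regularize_lines` is a hypothetical name. The median is used because a few spurious or missing lines barely move it, whereas they can noticeably shift the average spacing.

```python
import numpy as np

def regularize_lines(positions: list[float]) -> list[float]:
    """Single pass over sorted candidate lines using the median spacing d.

    Hypothetical rules: a line closer than d/2 to the last kept line is
    dropped as spurious; a gap of about k*d (k >= 2) gets k-1 new lines.
    """
    pos = sorted(positions)
    if len(pos) < 2:
        return pos
    d = float(np.median(np.diff(pos)))  # robust reference spacing
    kept = [pos[0]]
    for p in pos[1:]:
        gap = p - kept[-1]
        if gap < d / 2:
            continue  # spurious line, e.g., noise between two spot rows
        k = max(1, round(gap / d))
        base = kept[-1]
        kept.extend(base + j * gap / k for j in range(1, k))  # missing lines
        kept.append(p)
    return kept
```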
In Ref. 11, a gridding algorithm that uses the Radon transform (RT) to compute the parameters of a regular grid is proposed. By analyzing the RT peaks, several parameters are estimated: in particular, the coordinates (x0, y0) of the upper-left spot, the two directional angles, and the grid spacings along x and y, which are then used to determine each grid point.
Such an algorithm has been tested on synthetic and real microarray images where the real position of the grids is known in advance. In Ref. 12, this work has been extended with some new considerations. The first step, based on the Radon transform, is aimed at generating a grid hypothesis, while the second step accounts for local grid deformations. To refine the grid hypothesis, a Bayesian approach is used to maximize the posterior probability (MAP estimate). The observed datum is the input image, a raw visual representation of an ideal grid with a well-defined organization; the unobserved image is the most likely grid, given by the MAP estimate. A further refinement is achieved by means of an iterative metaheuristic approach.
In Ref. 19, a three-step algorithm is used to detect the information related to the grid: preprocessing, rotation detection, and local gridding refinement. Some global parameters are first estimated by a simple preprocessing heuristic. For rotation detection, the microarray image is iteratively rotated by a quarter degree until its projection profile closely matches the normal one. The gridding refinement uses the global values obtained in the preprocessing step to guess the location of each subarray. The parameter estimation is simple, mainly when applied to subarray structures.
In Ref. 20, an automatic iterative algorithm is proposed. The algorithm assumes that spot centers deviate from a sequence of similarity transformations whose parameters vary smoothly. Using this assumption, the authors formulate the spot center gridding problem as a constrained optimization problem, combining a quantitative criterion that measures the correctness of the gridding result with constraints that reduce local parameter variation. The problem is tackled by analyzing the cause of the deviation of the spot centers, assuming that spot center deviations can be modeled by three parameters: scaling, rotation, and translation. The mean squared error $e_e$ of all matched centers is defined, and a smoothness constraint that minimizes parameter variation is introduced together with an error measure $e_s$ of the smoothness. The problem is solved by a numerical iterative algorithm that searches for the solution minimizing the weighted sum $e_e + \lambda e_s$, where $\lambda$ is a nonnegative parameter. The method assumes that each block in the analyzed microarray has the same rotation angle, both in the initial distortion estimation and in a tree-based outlier correction, so that a subblock with a different rotation angle is considered an "outlier" and its parameters are adjusted accordingly. Each step is performed iteratively. Our approach is instead based on the idea that every grid in a microarray may have a different rotation angle, which occurs independently of the neighboring grids. Our methods for rotation detection and gridding use a single-step algorithm, because they assume that neighboring grid parameters are independent.
The gridding method proposed in Ref. 21 uses a scheme that combines global and local segmentation mechanisms to define the boundaries of each microarray spot. It initially creates global boundaries, using the midpoint of two successive peaks of the sums of the R and G intensities along the rows and columns of the microarray image. In the next step, the global boundaries are refined as follows: the horizontal (vertical) final boundary between two spots is refined by locating the minimum of the row (column) sums, taking into account only the area left outside the global boundary grid of these spots. Working directly on pixel values, this approach may be affected by perturbations induced by noise. A more robust solution should be based on a binary guide mask obtained by effective and accurate spot segmentation.
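The global-boundary step of Ref. 21 can be sketched as below. Taking the peaks as strict local maxima of the summed R+G profile, without smoothing, is a simplification of ours, and `global_boundaries` is a hypothetical name; the local refinement step is not shown.

```python
import numpy as np

def global_boundaries(red: np.ndarray, green: np.ndarray, axis: int) -> list[float]:
    """Boundaries halfway between successive peaks of the summed R+G profile.

    axis=1 sums each row (row boundaries); axis=0 sums each column.
    Peaks are strict local maxima (a simplification: no smoothing).
    """
    profile = (red.astype(np.float64) + green.astype(np.float64)).sum(axis=axis)
    peaks = [i for i in range(1, len(profile) - 1)
             if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]
    return [(a + b) / 2 for a, b in zip(peaks, peaks[1:])]
```

Because the peaks come straight from raw pixel sums, a single noisy row can create a spurious peak and thus an extra boundary, which is the weakness noted above.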
3 Preprocessing: A Microarray Image Rotation Algorithm
In this section, we formalize MIRA, an algorithm based on histogram analysis that automatically detects and corrects rotation problems of microarray grids. The technique is designed for orthogonal grids, the most common type for microarrays. It can be formalized as follows. Let I be a black-and-white binary image. The function f : I → ℕ is defined as

$$f(I) = \max\big(h(I)\big) + \max\big(v(I)\big), \tag{3}$$

where h(I) and v(I) are, respectively, the integral projection profiles [22] along the x and y axes of I, obtained by summing up the binary value of each pixel in I (0 for black and 1 for white) (Fig. 2(d) and 2(e)). Let M be a binary map (Fig. 2(c)) of a microarray image (Fig. 2(a)) that captures where the spot pixels are approximately located in both input channels. One way to obtain M is to partition each original microarray channel into two classes (using the K-means algorithm [23]) and then to combine the resulting binary images by a logical OR.
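Eq. (3) and the construction of M can be sketched with NumPy alone. The 1-D two-class k-means below, initialized at the channel minimum and maximum, is our own minimal stand-in for the K-means partition [23], and the function names are hypothetical.

```python
import numpy as np

def two_means_mask(channel: np.ndarray, iters: int = 50) -> np.ndarray:
    """Binarize one channel with a 1-D two-class k-means; True = brighter class."""
    x = channel.astype(np.float64)
    lo, hi = x.min(), x.max()  # initial centroids
    for _ in range(iters):
        bright = np.abs(x - hi) < np.abs(x - lo)
        if bright.all() or not bright.any():
            break
        new_lo, new_hi = x[~bright].mean(), x[bright].mean()
        if new_lo == lo and new_hi == hi:
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return np.abs(x - hi) < np.abs(x - lo)

def spot_map(red: np.ndarray, green: np.ndarray) -> np.ndarray:
    """Binary map M: spot pixels detected in either channel (logical OR)."""
    return two_means_mask(red) | two_means_mask(green)

def f_score(binary: np.ndarray) -> int:
    """Eq. (3): f(I) = max(h(I)) + max(v(I)) from the integral projections."""
    b = binary.astype(np.int64)
    h = b.sum(axis=0)  # one count per column: projection onto the x axis
    v = b.sum(axis=1)  # one count per row: projection onto the y axis
    return int(h.max() + v.max())
```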
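Let $M_\theta$ be the map obtained by rotating M by θ radians. A rotation by a quarter turn only swaps the roles of h and v, so $f(M_\theta) = f(M_{\theta + k\pi/2})$ for any integer k; it is therefore sufficient to search an interval of width π/2, and we set $\theta_{\max} = \pi/4$. The estimated correction angle θ* is defined as

$$\theta^* = \arg\max_{-\pi/4 \le \theta \le \pi/4} f(M_\theta). \tag{4}$$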
To find the main direction of the grid, we only need to consider the directions of the projections. This allows us to select the direction with the maximum value of f(I) (Fig. 2(b)), which corresponds to the angle at which the maximum number of spot centers are aligned. The pseudocode of the algorithm is
1. Input: a binary map I of the input microarray;
2. maxf = f(I), θ* = 0;
3. for θ = −θ_max to θ_max do
4.     I_θ = Rotate(I, θ);
5.     if f(I_θ) > maxf then
6.         maxf = f(I_θ);
7.         θ* = θ;
M is hence affine-transformed by a rotation of angle θ* around the image center (Fig. 2(f)–2(h)). A simple bilinear interpolation is used to reconstruct the signal after rotation. We safely assume that both input channels have the same rotation angle.
MIRA relies only on the value of f(I), which depends only on the profile of the entire grid, without comparison with other profiles. This property makes the system reliable, since an error in detecting the rotation of one grid does not affect the others, and there is no risk of entering a loop. A rotation test is always performed, since MIRA cannot know a priori whether the grid profile shows no rotation.
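The search of Eq. (4) can be sketched as a runnable loop. To stay dependency-free, this version does not resample the image with bilinear interpolation; instead it rotates the coordinates of the foreground pixels about their centroid and rounds them back onto the pixel lattice, which approximates the projection profiles of the rotated map (moving the rotation center only translates the profiles, so their maxima are unchanged up to rounding). The angular step of 0.25° is an assumption, since the text does not specify how θ is sampled; `f_rotated` and `mira_angle` are hypothetical names.

```python
import numpy as np

def f_rotated(points: np.ndarray, theta: float) -> int:
    """Approximate f(M_theta) = max(h) + max(v) for foreground pixels
    `points` (N x 2 array of row, col) rotated by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    centered = points - points.mean(axis=0)  # rotate about the centroid
    rows = np.rint(c * centered[:, 0] - s * centered[:, 1]).astype(np.int64)
    cols = np.rint(s * centered[:, 0] + c * centered[:, 1]).astype(np.int64)
    h_max = np.bincount(rows - rows.min()).max()  # most populated row
    v_max = np.bincount(cols - cols.min()).max()  # most populated column
    return int(h_max + v_max)

def mira_angle(binary: np.ndarray, step_deg: float = 0.25) -> float:
    """Correction angle theta* of Eq. (4), in degrees, searched in [-45, 45]."""
    points = np.argwhere(binary).astype(np.float64)
    if points.size == 0:
        return 0.0  # empty map: nothing to align
    best_theta, best_f = 0.0, f_rotated(points, 0.0)
    for deg in np.arange(-45.0, 45.0 + step_deg / 2, step_deg):
        value = f_rotated(points, np.deg2rad(deg))
        if value > best_f:  # strict: keep the first maximum, as in the pseudocode
            best_theta, best_f = float(deg), value
    return best_theta
```

The returned angle has the opposite sign of the grid rotation, because rotating the map by θ* undoes it; the images are then rotated by θ* as described above.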
Fig. 2 Main steps involved in the MIRA algorithm.
4 Segmentation Pipeline
In this section, we briefly describe the main steps of the image segmentation pipeline MISP [13]. The proposed process is fully automatic. The technique processes each microarray image to produce five semantic regions (Fig. 3):
• background,
• local background,
• red channel foreground,
• green channel foreground,
• red channel and green channel foreground.
The pipeline can be ideally subdivided into two sequential modules (Fig. 4):
• spot-background separation (Fig. 5),
• foreground and local background identification (Fig. 6).
The Spot-Background separation module separates the spot signal pixels from the background. Using statistical region merging (SRM) [24] on each channel, it is possible to extract the spot shape by making use of the local mean intensity rather than the single pixel intensity. Further processing is devoted to better distinguishing the involved signals by making use of an ad hoc look-up table (LUT) [25] and k-means clustering [26]. The spot shape is further refined at its edges by taking into account the deviation of edge pixels from the local mean. Two binary masks, GBin and RBin, are the output of the Spot-Background separation module and become the input of the next module.
The second module, Foreground and Local Background identification, takes GBin and RBin as the Green Mask Foreground (GMF) and the Red Mask Foreground (RMF), respectively. It also builds a Spot Guide Mask (SGM) as the logical OR of these two maps.
Moreover, the set of pixels belonging to SGM but not to RMF (GMF) is called the internal background of the spot relative to the red (green) channel.
Let the Grid Guide Mask (GGM) be the minimum square containing SGM. The difference between GGM and SGM forms the RGBackMask. The local background relative to the red channel is obtained by augmenting RGBackMask with the pixels belonging to the internal background relative to the red channel. The local background relative to the green channel is obtained similarly.
Fig. 3 Microarray image semantic color regions. Background (black), local background (blue), red channel foreground (red), green channel foreground (green), red channel and green channel foreground (yellow).
Fig. 4 MISP: microarray image segmentation pipeline. The cyan dashed line refers to the steps involved in the spot–background separation module introduced in Sec. 4. Green refers to the steps involved in the foreground and local background identification module introduced in Sec. 4 (see Ref. 13 for more details).
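The mask algebra above maps directly onto Boolean array operations. The NumPy sketch below works on the red and green foreground masks of a single spot area; `misp_region_masks` is a hypothetical helper, and it uses the bounding rectangle of SGM as GGM, a simplification of the "minimum square" wording (Section 5.3 also works with minimum rectangular regions).

```python
import numpy as np

def misp_region_masks(rmf: np.ndarray, gmf: np.ndarray) -> dict:
    """Combine the red/green foreground masks of one spot area into MISP masks."""
    sgm = rmf | gmf                   # Spot Guide Mask: logical OR of RMF and GMF
    internal_bg_red = sgm & ~rmf      # in SGM but not in RMF
    internal_bg_green = sgm & ~gmf    # in SGM but not in GMF
    ggm = np.zeros_like(sgm)
    if sgm.any():
        # Simplification: bounding rectangle of SGM instead of the minimum square.
        rows = np.flatnonzero(sgm.any(axis=1))
        cols = np.flatnonzero(sgm.any(axis=0))
        ggm[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = True
    rg_back = ggm & ~sgm              # RGBackMask = GGM minus SGM
    return {
        "SGM": sgm,
        "GGM": ggm,
        "RGBackMask": rg_back,
        "local_bg_red": rg_back | internal_bg_red,
        "local_bg_green": rg_back | internal_bg_green,
    }
```

Each local background is thus the part of the guide rectangle outside both foregrounds, plus the pixels that belong to the other channel's foreground only.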
Figure 7 summarizes the overall process together with the binary masks involved. Further details can be found in Ref. 13.
We point out that the SGRIP module described in the following section detects GGM more accurately than the approach presented in Ref. 13, which cannot deal with the merging of neighboring spots or with large noisy areas. A variety of quality measures and useful data may be obtained by gathering information inside the different masks created so far. SGM is used to derive quality measures for each spot (e.g., spot area). GGM is used to assign coordinates to each spot. The other masks are used to characterize the pixels belonging to the foreground, background, and local background, to calculate intensities (Refs. 3 and 27), and to extract quality measures for each spot (Refs. 4, 5, and 17).
5 Statistical Gridding Pipeline
The authors previously proposed a simple gridding technique (Ref. 13). That algorithm is supplanted by a more effective approach, implemented in the new module called SGRIP. The SGRIP pipeline includes two phases (Fig. 8):
• Grid Finding–Correction,
• GGM Creation–Refinement.
In the first phase, Grid Finding approximates the spot centers. It works on SGM, assuming a locally homogeneous background. The final output is obtained after a Correction step that recovers spot centers missed by Grid Finding. The spot center prototypes are stored in an m×n matrix P, where m and n are, respectively, the inferred numbers of rows and columns in the array.
The second phase produces a GGM starting from the data in matrix P. GGM Creation uses P and SGM to create a first approximation: each connected component in SGM is assigned the minimum rectangular region containing it. The final step (GGM Refinement) separates spots erroneously merged with others.
Fig. 5 Spot–background separation.
Fig. 6 Channel foreground and local background identification.
We assume that SGRIP is applied to images whose rotation has already been corrected. In the following subsections, we describe the Grid Finding, Grid Correction, GGM Creation, and GGM Refinement steps in greater detail.
5.1 Grid Finding
Grid Finding detects the grid location and the number of spot rows and columns in the grid. We assume that both the horizontal and the vertical histograms of SGM have the typical shape of an almost regularly spaced sequence of peaks separated by valleys. The shape of a single peak can be approximated by a Gaussian-like distribution (e.g., a doubly truncated Gaussian, Ref. 28). In the following, the term "Expected Values" refers to the expected value of such a Gaussian-like distribution.
The algorithm is:
Grid Finding Algorithm
Input: a binary map SGM of the input microarray;
1. Let Hh and Vh be, respectively, the horizontal and vertical cumulative histograms of SGM;
2. Let MHh and MVh be, respectively, the means of Hh and Vh;
3. Let CHh = Hh − MHh;
4. Let CVh = Vh − MVh;
5. Let HG_SGM and VG_SGM be the families of peaks in CHh and CVh, respectively;
6. Let EHG_SGM = {Expected Values(d) : d ∈ HG_SGM};
7. Let EVG_SGM = {Expected Values(d) : d ∈ VG_SGM};
Output: X, the set of pairs (i, j), where i ∈ EHG_SGM and j ∈ EVG_SGM.
Steps 3 and 4 separate each peak in the sequence from the others, since only the parts of each histogram that lie above its mean remain positive. A side effect of these steps is that if a row or column contains just a few spots, the corresponding peak may fall below the mean and be lost (Fig. 9); this is why the subsequent correction block is needed.
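The Grid Finding steps can be sketched in Python with NumPy as follows. The sketch assumes that Hh has one bin per image column (so its peaks give spot-column positions, as in Fig. 9), treats each maximal run of bins above the mean as one peak, and uses the centroid of the mean-subtracted histogram within that run as its "Expected Value"; these choices and the helper names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from itertools import product

def peak_centres(hist: np.ndarray) -> list:
    """Subtract the mean, split the positive part into peaks (maximal runs of
    bins above the mean), and return the weighted centroid of each peak."""
    centred = hist - hist.mean()
    centres, start = [], None
    # A trailing False sentinel closes a run that reaches the last bin.
    for pos, above in enumerate(np.append(centred > 0, False)):
        if above and start is None:
            start = pos
        elif not above and start is not None:
            weights = centred[start:pos]
            idx = np.arange(start, pos)
            centres.append(float((idx * weights).sum() / weights.sum()))
            start = None
    return centres

def grid_finding(sgm: np.ndarray):
    """Return column centres, row centres, and all (x, y) spot-centre pairs."""
    hh = sgm.sum(axis=0).astype(float)  # one bin per image column (x)
    vh = sgm.sum(axis=1).astype(float)  # one bin per image row (y)
    xs, ys = peak_centres(hh), peak_centres(vh)
    return xs, ys, list(product(xs, ys))
```

With a sparse column (for instance, a single spot in a column next to six full ones), that column's histogram bins can stay below the mean, and `peak_centres` then returns no peak for it, which is exactly the side effect described above.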
Fig. 7 MISP: software prototype architecture. Details are described in Ref. 13.
5.2 Grid Correction
A Correction algorithm is used to reconstruct the missed spots and to correctly infer the number of spot rows and columns. The algorithm is applied separately to the row and column coordinates. It relies on the simple idea that if a row (column) has been missed, the distance between the two detected peaks surrounding it becomes larger than the median gap between successive peaks. We adopt the following simple rule to identify missed spots: if the distance between one peak and the next is greater than K·m, where m is the median of all the gaps between successive peaks in the sequence, then a row (column) is missing. The rationale for this rule is as follows. Ideally, each microarray grid is composed of regularly distributed rows and columns, and spots are perfectly circular. Considering the horizontal and vertical cumulative histograms, in the ideal case one observes an almost regularly spaced sequence of peaks separated by valleys. Each column contains the same number of spots and is equidistant from the previous and the next column (Δx = constant); hence, the histogram of each column i = 1, ..., #columns may be represented by a doubly truncated Gaussian distribution (Ref. 28). In reality, many uncontrollable factors shape the signal distribution in microarray experiments, so each doubly truncated Gaussian can be approximated by a Gaussian distribution N_i(μ_i, σ), where
$$\mu_i = \mu_1 + (i-1)\,\Delta x .$$
Each Gaussian is obtained by shifting the previous mean by Δx while keeping the same standard deviation σ.
In real microarrays these parameters are not constant; however, when a grid has a large number of spots, Δx can be approximated by the median m of the distances between columns, so that
$$\mu_i = \mu_1 + (i-1)\,m .$$
Recall that a random variable X ~ N(μ, σ) satisfies
$$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 1 \quad (\approx 0.997).$$
Taking into account the above considerations, we approximate this interval by
$$P\left(\mu - \frac{K\,m}{2} \le X \le \mu + \frac{K\,m}{2}\right) \approx 1 .$$
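A short worked step makes the role of K concrete. Equating the two intervals above, and assuming the idealized Gaussian model holds, the half-widths coincide when 3σ = K·m/2; with the value K = 1.8 reported later in this section this gives

$$3\sigma = \frac{K\,m}{2} \;\Longrightarrow\; \sigma = \frac{K\,m}{6} = \frac{1.8\,m}{6} = 0.3\,m .$$

Correspondingly, the gap threshold K·m = 1.8m lies above the nominal spacing m but below the gap of about 2m left by a single missing column, so one missing column is flagged while moderate spacing jitter is not.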
Fig. 8 SGRIP pipeline: The cyan line refers to the Grid Finding–Correction phase introduced in Section 5 and detailed in Sections 5.1 and 5.2, while the green line refers to the GGM Creation–Refinement phase introduced in Section 5 and detailed in Sections 5.3 and 5.4.
Fig. 9 (a) An example of the horizontal cumulative histogram of SGM (Hh). The red dashed line indicates the mean value (MHh). In this example, one of the Gaussians lies below the mean value. (b) The corresponding family of Gaussians HG_SGM. The expected value of each Gaussian in HG_SGM corresponds to the central point of a spot column (blue lines). The purple line corresponds to the one unlocalized column, which is discussed later.
The Grid Correction algorithm below uses the parameter K (step 6) with the following meaning: if no column found by the Grid Finding algorithm falls inside the interval [μ − K·m/2, μ + K·m/2] computed for a pair of successive columns, then with high probability a column is missing, and it is reconstructed using the median value m. This is justified by the above considerations. The algorithm can be sketched as follows:
Input: an array Coord of sorted coordinates (e.g., EVG_SGM);
1. For each pair of successive coordinates (Coord[i], Coord[i+1]) in Coord
2.    Calculate d[i] = Coord[i+1] − Coord[i];
3.    Insert d[i] in a vector D at position i;
4. i = 0;
5. While i < |D|
6.    If D[i] > K·median(D)
7.       |Coord| = |Coord| + 1;
8.       Insert a new element in Coord between Coord[i] and Coord[i+1] with value Coord[i] + median(D);
9.       |D| = |D| + 1;
10.      Insert a new element in D between D[i] and D[i+1] with value D[i] − median(D);
11.      D[i] = median(D);
12.   i = i + 1;
The output is a new matrix P of pairs mapping the final prototype spot centers (Figs. 9 and 10). In our experiments, the parameter K has been estimated to be 1.8 using the least-squares method on a training microarray data set. The position of a missing row (column) is restored by using the median as an ad hoc guess. We use the median to reconstruct rows (columns) because it is more robust than the mean, which is typically unstable when the number of missing rows (columns) is large. Moreover, using the median rather than the mean yields a single-step method for estimating the positions of missing spots: the guess does not need to be re-estimated after a missed spot is inserted. Analogous considerations apply to the GGM refinement algorithm reported in Section 5.4.
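A minimal Python sketch of the Grid Correction step on a sorted list of peak coordinates follows. It computes the median gap m once, before any insertion, reflecting the single-step use of the median described above (whereas the pseudocode evaluates median(D) on the updated vector), and keeps inserting a coordinate at distance m while the remaining gap exceeds K·m, which is how steps 8–11 handle several consecutive missing rows (columns). The function name is hypothetical; the default K = 1.8 is the value reported above.

```python
import statistics

def grid_correction(coords, k=1.8):
    """Insert missing row (column) coordinates into a list of peak positions.

    A gap larger than k * m, with m the median gap, is taken to hide one or
    more missing rows (columns); each is placed at distance m from the last.
    """
    coords = sorted(coords)
    if len(coords) < 2:
        return list(coords)
    m = statistics.median(b - a for a, b in zip(coords, coords[1:]))
    if m <= 0:  # degenerate input (duplicate peaks): nothing sensible to insert
        return coords
    corrected = [coords[0]]
    for nxt in coords[1:]:
        while nxt - corrected[-1] > k * m:
            corrected.append(corrected[-1] + m)
        corrected.append(nxt)
    return corrected
```

For instance, `grid_correction([10, 20, 50, 60])` has median gap 10, so the 30-pixel gap is filled with 30 and 40, while a gap of 17 (below 1.8 × 10) would be left untouched.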
5.3 GGM Creation
The Grid Guide Mask Creation algorithm starts from a matrix P of pixel coordinates. These coordinates are used as starting points to assign to each detected spot the minimum rectangular region containing the spot itself, computed over SGM (Fig. 11).
The Grid Guide Mask Creation algorithm can be described as follows:
Input: an m×n matrix P of pixel coordinates and SGM;
1. For each row i of P
2.   For each column j of P
3.     Let (x, y) be the coordinates in P(i, j);
4.     Initialize count, the number of neighboring signal pixels of P(i, j), to zero;
5.     If SGM(x, y) ≠ 0, then
6.       push the coordinates (x, y) onto a stack S;
7.       Initialize the corners (minx, maxx), (miny, maxy) of the smallest rectangular region related to P(i, j) to the seed coordinates (x, x), (y, y);
8.       while nonEmpty(S)
9.         tmp = pop(S);
10.        count = count + 1;
11.        if (minx, maxx), (miny, maxy) change with respect to tmp, update the corner information;
12.        Mark the coordinates (tmpx, tmpy) as controlled;
13.        Let R = {p : p is a pixel whose distance from P(i, j) is less than Range, p is uncontrolled, and SGM(px, py) ≠ 0};
14.        If R ≠ Ø, then mark all p ∈ R as controlled and push them onto S;
15.    If count ≥ 1
16.      GGM(i, j) = ((minx, miny), (maxx, maxy), "found");
17.    else GGM(i, j) = ((x, y), (x, y), "not found");
For each seed P(i, j), we create a record at position (i, j) of the GGM matrix containing the coordinates of the top-left and bottom-right corners of the minimum rectangular region that contains the spot guide, together with the status of the corresponding spot ("found" or "not found"). The search for the minimum region containing the spot is carried out only within a square area of side 2·Range + 1. This makes it possible to include spots that exhibit doughnut shapes or that are split into two or more connected components. Note that when the GGM rectangle stops growing, there are only two mutually exclusive reasons:
1. no more foreground spot pixels lie outside the rectangle (this includes the special case of spot absence);
2. the rectangle has merged with some previously processed rectangle, that is, the bounding rectangles of two or more spots are not disjoint.
Fig. 10 Spot centers (a) before and (b) after the Grid Correction step. In blue are the inferred positions for a spot column not localized in the Grid Finding step.
Fig. 11 GGM creation: (a) SGM input obtained by the segmentation pipeline and (b) the corresponding output after the GGM creation step.
Observe now that, since spot center coordinates are processed in left-to-right, top-to-bottom order, the rectangle of the spot with center (xn, yn) may be merged only with rectangles of previously processed spots with center (xf, yf), for which the following holds (image coordinates, with y growing downward):
$$\big((x_f < x_n) \wedge (y_f \le y_n)\big) \vee \big((x_f \ge x_n) \wedge (y_f < y_n)\big). \qquad (5)$$
This property is used below to restore spots erroneously merged with others.
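The following Python sketch mirrors the GGM Creation pseudocode under a few stated assumptions. Because the set R in step 13 depends only on the seed P(i, j) and on the controlled flags, the stack loop ends up collecting the seed together with every uncontrolled signal pixel inside the search window; the sketch therefore gathers that set directly instead of using an explicit stack. It assumes a square window of side 2·Range + 1 (Chebyshev distance at most Range), seeds given as (x, y) pairs, and that a seed already claimed by a previous spot is reported as "not found"; `ggm_creation` and `rng` are hypothetical names.

```python
import numpy as np

def ggm_creation(P, sgm, rng=3):
    """Assign to each seed the bounding rectangle of its unclaimed signal pixels.

    P is an m x n nested list of (x, y) seeds; sgm is a Boolean SGM image.
    Returns {(i, j): ((minx, miny), (maxx, maxy), status)}.
    """
    h, w = sgm.shape
    controlled = np.zeros(sgm.shape, dtype=bool)  # pixels already claimed
    ggm = {}
    for i, row in enumerate(P):
        for j, (x, y) in enumerate(row):
            # Assumption: a seed already claimed by an earlier spot is "not found".
            if sgm[y, x] and not controlled[y, x]:
                # Square search window of side 2 * rng + 1 centered on the seed.
                y0, y1 = max(0, y - rng), min(h, y + rng + 1)
                x0, x1 = max(0, x - rng), min(w, x + rng + 1)
                free = sgm[y0:y1, x0:x1] & ~controlled[y0:y1, x0:x1]
                controlled[y0:y1, x0:x1] |= free
                ys, xs = np.nonzero(free)
                ggm[(i, j)] = ((x0 + int(xs.min()), y0 + int(ys.min())),
                               (x0 + int(xs.max()), y0 + int(ys.max())), "found")
            else:
                ggm[(i, j)] = ((x, y), (x, y), "not found")
    return ggm
```

When two touching spots fall in the same window, the first seed claims all of their pixels and the second seed is reported as "not found"; this is the overmerging situation that the refinement step of Section 5.4 is designed to undo.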
5.4 GGM Refinement
The final GGM is obtained by a refinement algorithm designed to solve the overmerging problem. Observe that a rectangle may be merged only with a rectangle located W, N, NW, or NE of it. The refinement strategy is based on the following claim: if a spot sn is marked as "not found," but a foreground region that can be assigned to sn exists, then the sn region has been assigned to another spot sf, where sf ∈ C, with
C = Rw_nw(sn) ∪ Rn_ne(sn),
where Rw_nw(sn) and Rn_ne(sn) are the two regions shown in Fig. 12. These two regions correspond to the two disjoint clauses in Eq. (5). Following the same order (left to right and top to bottom), we can restrict C to the region containing the spots of the row preceding sn and the spot located to the left of sn. In the following pseudocode, this region is denoted by Cr(sn) for the northwest merging case. The other three cases are similar.
Input: SGM, GGM, and the spot status flags
1. Let Wset = {w : w is the width of a region in GGM with status = "found"};
2. Let Hset = {h : h is the height of a region in GGM with status = "found"};
3. Let Wmedian and Hmedian be the medians of the Wset and Hset values, respectively;
4. Let Wvar and Hvar be the variances of the Wset and Hset values, respectively;
5. ∀ spot sn : status(sn) = "not found" do
6.   If ∃ sf ∈ Cr(sn) : sn_area ⊂ sf_area
7.     If sf is a northwest neighbor of sn
8.       Let Hsf and Vsf be, respectively, the horizontal and vertical cumulative histograms of the sf region;
9.       Let p be the point in SGM at which Hsf and Vsf assume their minimum values;
10.      If (|px − sfx| ≤ K·Wvar/2) and (|py − sfy| ≤ K′·Hvar/2)
11.        Split the sf region into four parts using p and the top-left and bottom-right corners;
12.        Update the sf region with the NW subregion obtained in step 11;
13.        Assign to sn the SE subregion obtained in step 11;
14.      Else
15.        Assign to sf and sn two regions of size Wmedian × Hmedian centered on sf and sn, respectively;
16.    If sf falls in one of the other cases (W, N, or NE)
17.      Proceed similarly to the northwest case.
The values K and K′ are obtained by variance analysis on the Wset and Hset values of different microarrays. In our experiments, K = 1.2 and K′ = 1.3.
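For the northwest case, steps 8–15 can be sketched as below. The sketch assumes that (sfx, sfy) is the center of the merged sf rectangle, searches the histogram minima only on interior bins to avoid degenerate splits at the borders, and takes the tolerance test of step 10 literally (K·Wvar/2 and K′·Hvar/2). All names are hypothetical; the defaults K = 1.2 and K′ = 1.3 are the values reported above.

```python
import numpy as np

def refine_nw_merge(sgm, rect, sf_center, sn_center, w_median, h_median,
                    w_var, h_var, k=1.2, k_prime=1.3):
    """Split a rectangle wrongly shared by sf (NW) and sn (SE).

    rect = ((x0, y0), (x1, y1)) with inclusive corners; returns (sf_rect, sn_rect).
    """
    (x0, y0), (x1, y1) = rect
    region = sgm[y0:y1 + 1, x0:x1 + 1]
    h_sf = region.sum(axis=0)  # one bin per column (x) of the sf region
    v_sf = region.sum(axis=1)  # one bin per row (y) of the sf region
    # Assumption: look for the histogram minima on interior bins only.
    px = x0 + 1 + int(np.argmin(h_sf[1:-1]))
    py = y0 + 1 + int(np.argmin(v_sf[1:-1]))
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # assumed (sfx, sfy)
    if abs(px - cx) <= k * w_var / 2 and abs(py - cy) <= k_prime * h_var / 2:
        # Steps 11-13: the NW part stays with sf, the SE part goes to sn.
        return ((x0, y0), (px - 1, py - 1)), ((px, py), (x1, y1))

    # Step 15: fall back to median-sized boxes centered on the two spots.
    def box(center):
        x, y = center
        return ((int(round(x - w_median / 2)), int(round(y - h_median / 2))),
                (int(round(x + w_median / 2)), int(round(y + h_median / 2))))

    return box(sf_center), box(sn_center)
```

The fallback keeps the two spots at the typical (median) size when the histogram valley is not where a clean split would place it.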
Figure 13 shows the final output of the overall SGRIP
pipeline applied to the input image SGM reported in Fig.
11.
Fig. 12 Regions involved in GGM refinement. The blue circle indicates the spot sn. The white line encloses the Rn_ne(sn) region, while the magenta line encloses the Rw_nw(sn) region.
6 Quality Measures
One of the main drawbacks of microarray imaging tools and algorithms is the difficulty of evaluating the real performance of each technique involved. Various quality measures have been proposed in the literature (Refs. 4, 5, and 17). Here we propose a new quality index, defined as follows:
$$q_{\text{index}}(ID_{\text{Spot}}) = \frac{q_{\text{com2}}^{R}(ID_{\text{Spot}}) + q_{\text{com2}}^{G}(ID_{\text{Spot}})}{2}, \qquad (6)$$
where
$$q_{\text{com2}} = \left(q_{\text{sig-noise}} \cdot q_{\text{bkg1}} \cdot q_{\text{bkg2}}\right)^{1/3}. \qquad (7)$$
That is, q_index is the mean of q_com2 computed for each channel. q_com2 is partially derived from the combined quality index (see Ref. 4 for details):
$$q_{\text{com}} = \left(q_{\text{size}} \cdot q_{\text{sig-noise}} \cdot q_{\text{bkg1}} \cdot q_{\text{bkg2}}\right)^{1/4} \cdot q_{\text{sat}}. \qquad (8)$$
The combined quality index q_com accounts for the size of the spot (q_size), the signal-to-noise ratio (q_sig-noise), the local background variability (q_bkg1), excessively high local background (q_bkg2), and saturation in photo intensity detection (q_sat). q_size assesses irregularities of spot size, q_sig-noise measures the signal-to-noise ratio, q_bkg1 quantifies the variability of the local background, q_bkg2 reflects the level of the local background, and q_sat indicates whether the percentage of saturated pixels in each spot is less than 10%.
The original combined quality index q_com has been modified so that it can be used for software comparison, rather than only as a flag-checking measure associated with each spot. Preliminary results show that the measures q_size and q_sat involved in q_com may produce errors in the evaluation and comparison of different segmentation methodologies, for the following reasons:
• q_size is related to the regularity of the spot with respect to an ideal spot whose size equals the mean spot size in the microarray. Fixed- and variable-circle segmentations produce foreground masks that either include background or discard irregular foreground. In the "fixed" case, every spot in the mask has the same size; hence q_size = 1 for every spot in the microarray. Adaptive techniques instead produce irregular foreground masks and discard the background; hence q_size < 1, even though the method is more reliable than a "fixed" circle. q_size is therefore useful only for comparing segmentation techniques that follow the same methodology (fixed or adaptive).
• Analogously, q_sat penalizes adaptive techniques even when they are better than fixed- or variable-circle techniques. Example: let s be a spot with 4 saturated pixels. Suppose that the spot is segmented using the fixed-circle methodology with a circle area of 50 pixels, 10 of which belong to the background. In this case q_sat = 1, because the saturated pixels are fewer than 10% of the spot area (5 pixels). Suppose now that the same spot is segmented correctly using an adaptive technique. The spot area is then 40 pixels, 10% of which is exactly 4 pixels, so the saturated pixels are no longer fewer than 10% and q_sat = 0. Hence, q_sat is not suitable for comparing adaptive segmentation techniques with circle-based ones.
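The arithmetic of this example can be checked with a one-line rule. The sketch below encodes q_sat as 1 when saturated pixels are strictly fewer than 10% of the spot area and 0 otherwise, matching the description above; `q_sat` is a hypothetical helper, and integer arithmetic is used so that the 10% boundary is compared exactly.

```python
def q_sat(n_saturated: int, spot_area: int, limit_percent: int = 10) -> int:
    """1 if saturated pixels are fewer than limit_percent % of the area, else 0."""
    # 100 * n < limit * area is the exact integer form of n / area < limit / 100.
    return 1 if 100 * n_saturated < limit_percent * spot_area else 0

print(q_sat(4, 50))  # -> 1 (fixed circle: 4 of 50 pixels is 8%)
print(q_sat(4, 40))  # -> 0 (adaptive mask: 4 of 40 pixels is exactly 10%)
```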
Fig. 14 Microarray image from the Stanford Microarray Database, ExptID 15739 (Ref. 34).
Fig. 15 Examples of microarray used to test MIRA.
Fig. 13 GGM refinement output.
The other measures, q_sig-noise, q_bkg1, and q_bkg2, are calculated for each channel and measure the quality of the foreground–background separation.
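The quality indices defined above combine per-spot sub-scores with geometric means. A minimal Python sketch follows, assuming the sub-scores q_size, q_sig-noise, q_bkg1, q_bkg2, and q_sat have already been computed for each channel as in Ref. 4 (their definitions are not reproduced here); the function names are illustrative.

```python
def q_com(q_size, q_sig_noise, q_bkg1, q_bkg2, q_sat):
    """Combined quality index of Ref. 4: geometric mean of four scores times q_sat."""
    return (q_size * q_sig_noise * q_bkg1 * q_bkg2) ** 0.25 * q_sat

def q_com2(q_sig_noise, q_bkg1, q_bkg2):
    """Modified index: drops q_size and q_sat, geometric mean of the other three."""
    return (q_sig_noise * q_bkg1 * q_bkg2) ** (1.0 / 3.0)

def q_index(red_scores, green_scores):
    """Mean of q_com2 over the red and green channels of one spot."""
    return (q_com2(*red_scores) + q_com2(*green_scores)) / 2.0

# A perfect green channel and a red channel with q_sig_noise = 0.125:
print(q_index((0.125, 1.0, 1.0), (1.0, 1.0, 1.0)))  # about 0.75
```

Because of the geometric mean, a zero in any sub-score drives that channel's q_com2 to zero, so q_index is zero only when both channels score zero.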
7 Experiments and Discussion
The first image data set used for testing each module in MIAF consists of the "Whole Yeast Genome" microarrays, freely downloadable from the MAGIC Website (Ref. 29). Microarray data denoted by Mi, i = 1, ..., 18, addressing specific problems (rotation, segmentation, gridding), have been selected to exemplify MIAF performance. A comparison with the output obtained with Scanalyze (Ref. 9) has been carried out. We believe that the main strength of our adaptive approach is revealed when it is compared with techniques based on circle segmentation: only with adaptive segmentation strategies can the real amount of gene expression for each spot be effectively measured. For fairness of comparison, care has therefore been taken to use Scanalyze with the best possible choice of user-selected parameters; in particular, the parameters have been tuned to obtain optimal quality measures. MIAF can also perform quality measurement on an imported Scanalyze grid, which makes the comparison easier. MISP segmentation performance has also been compared with Scanalyze using a calibration data set (Ref. 30) accessible from the U.S. National Cancer Institute, generated by Incyte Genomics to assess quality assurance parameters within microarray experiments. To test the gridding performed by MIAF (SGRIP) more accurately, we also consider the microarray data set used in Ref. 31, related to the whole genome of Saccharomyces cerevisiae and freely downloadable from Pat Brown's lab homepage (Ref. 32). Further testing has been carried out on the collection of microarray images available in the Stanford Microarray Database (SMD, Ref. 33). Using SMD, researchers can store, retrieve, display, and analyze the complete raw data produced with one of the interactive image processing platforms compatible with SMD. In particular, we refer to the experiments ExptID 15739 (Ref. 34; Fig. 14) and ExptID 51509 (Ref. 35). We compare our results with those obtained in Ref. 12, where Spot (Ref. 6) has been used on the same data set. We apply the proposed pipeline algorithms to each inner grid.
Table 2 Experimental rotation assessment of MIRA on M1, M2, M3, M4, M5, and M6.
Microarray | Rows | Columns | Absent spots (%) | Real rotation angle (deg) | Angle estimated by MIRA (deg) | Error (deg)
M1 | 4 | 3 | 0 | −10 | −10 | 0
M2 | 24 | 12 | 6.25 | −38 | −39 | 1
M3 | 24 | 12 | 6.25 | 1 | 1 | 0
M4 | 23 | 24 | 14.7 | 14 | 13 | 1
M5 | 42 | 40 | 0.71 | −44 | −44 | 0
M6 | 40 | 40 | 1.44 | 1.2 | 1 | 0.2
Fig. 16 Results obtained using different segmentation methods on the microarray M7, with a plot of the corresponding q_index for each method. (a) Input microarray. (b) Segmentation obtained by MISP. (c) and (d) Results obtained using Scanalyze.
7.1 Rotation Tests
To test MIRA, we use a data set in which each microarray has been rotated by a known global angle. More precisely, microarrays have been manually rotated by angles in the range [−44°, +44°] (Fig. 15). Results for microarrays M1–M6 are reported in Table 2; they confirm the good performance of the proposed technique (mean error: 0.36°; error standard deviation: 0.49°) for both large and small rotations. The method is not sensitive to the number of spots involved; however, for grids with thousands of spots, MIRA benefits from the greater statistical robustness of h(I) and v(I).
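The summary statistics quoted above can be recomputed from the Error column of Table 2. The short Python check below, using only the standard library, suggests that the reported values correspond to the sample standard deviation (n − 1 in the denominator) and that both figures were truncated, rather than rounded, to two decimals.

```python
import math
import statistics

# Error column of Table 2, in degrees, for M1-M6.
errors_deg = [0.0, 1.0, 0.0, 1.0, 0.0, 0.2]

mean_err = statistics.mean(errors_deg)   # 2.2 / 6, about 0.3667
std_err = statistics.stdev(errors_deg)   # sample (n - 1) std, about 0.4967
pop_std = statistics.pstdev(errors_deg)  # population std, about 0.4534

def truncate2(value: float) -> float:
    """Truncate (not round) to two decimals."""
    return math.floor(value * 100) / 100

print(truncate2(mean_err), truncate2(std_err))  # 0.36 0.49
```

Rounding instead of truncating would give 0.37 and 0.50, and the population standard deviation would give about 0.45.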
7.2 Segmentation Tests
To evaluate MISP performance, we performed tests for both visual and numerical assessment. The tests take into account 114 different spots, classified as follows:
1. 42 spots with good signal intensity and a clearly circular shape;
2. 34 spots with irregular shape and good signal intensity;
3. 38 spots with low signal intensity and shape variability.
Figures 16–20 show the input microarray and the results
obtained using different methods of segmentation. For each
Fig. 17 Results obtained using different segmentation methods on the microarray M8. The absent spots are correctly identified by our processing pipeline: the corresponding q_index is equal to zero.
Fig. 18 Results obtained using different segmentation methods on the microarray M9. The plot reports the corresponding q_index for each segmentation method. (c) and (d) are obtained using Scanalyze. Our solution (b) outperforms them in almost all cases.