
Invariant Delineation of Nuclear Architecture in

Glioblastoma Multiforme for Clinical and

Molecular Association

Hang Chang∗, Member, IEEE, Ju Han, Member, IEEE, Alexander Borowsky, Leandro Loss, Member, IEEE, Joe

W. Gray, Paul T. Spellman and Bahram Parvin∗, Senior Member, IEEE

Abstract—Automated analysis of whole mount tissue sections

can provide insights into tumor subtypes and the underlying

molecular basis of neoplasm. However, since tumor sections

are collected from different laboratories, inherent technical and

biological variations impede analysis for very large datasets

such as The Cancer Genome Atlas (TCGA). Our objective is

to characterize tumor histopathology, through the delineation

of the nuclear regions, from hematoxylin and eosin (H&E)

stained tissue sections. Such a representation can then be mined

for intrinsic subtypes across a large dataset for prediction

and molecular association. Furthermore, nuclear segmentation

is formulated within a multi-reference graph framework with

geodesic constraints, which enables computation of multidimen-

sional representations, on a cell-by-cell basis, for functional

enrichment and bioinformatics analysis. Here, we present a

novel method, Multi-Reference Graph Cut (MRGC), for nuclear

segmentation that overcomes technical variations associated with

sample preparation by incorporating prior knowledge from man-

ually annotated reference images and local image features. The

proposed approach has been validated on manually annotated

samples and then applied to a dataset of 377 Glioblastoma

Multiforme (GBM) whole slide images from 146 patients. For

the GBM cohort, multidimensional representation of the nuclear

features and their organization have identified (i) statistically

significant subtypes based on several morphometric indices, (ii)

whether each subtype can be predictive or not, and (iii) that the

molecular correlates of predictive subtypes are consistent with

the literature.

Data and intermediaries for a number of tumor types (GBM,

low grade glial, and kidney renal clear carcinoma) are available

at: http://tcga.lbl.gov for correlation with TCGA molecular

data. The website also provides an interface for panning and

zooming of whole mount tissue sections with/without overlaid

segmentation results for quality control.

Index Terms—Nuclear Segmentation, Tumor Histopathology,

Subtyping, Molecular Pathology

Copyright (c) 2010 IEEE. Personal use of this material is permitted.

However, permission to use this material for any other purposes must be

obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Asterisk indicates corresponding authors.

Hang Chang∗, Ju Han, Leandro Loss and Bahram Parvin∗ are with

Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley,

CA, 94720 U.S.A ( email: hchang@lbl.gov; jhan@lbl.gov; laloss@lbl.gov;

b parvin@lbl.gov )

Joe W. Gray and Paul T. Spellman are with Center for Spatial Systems

Biomedicine, Oregon Health Sciences University, Portland, Oregon, 97239

U.S.A (email: grayjo@ohsu.edu;spellmap@ohsu.edu)

Alexander Borowsky is with Center for Comparative Medicine, University of California, Davis, California, 95616 U.S.A (email: adborowsky@ucdavis.edu)

This work was supported by NIH U24 CA1437991 carried out at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231.

I. INTRODUCTION

Our main motivation for quantifying morphometric compo-

sition from histology sections is to gain insight into cellular

morphology, organization, and sample tumor heterogeneity in

a large cohort. In tumor sections, robust representation and

classification can identify mitotic cells, cellular aneuploidy,

and autoimmune responses. More importantly, if tissue mor-

phology and architecture can be quantified on a very large

scale dataset, then it will pave the way for constructing

databases that are prognostic, the same way that genome-wide

array technologies have identified molecular subtypes and

predictive markers. Genome-wide molecular characterization

(e.g., transcriptome analysis) has the advantage of standardized

techniques for data analysis and pathway enrichment, which

can enable hypothesis generation for the underlying mech-

anisms. However, array-based analysis (i) can only provide

an average measurement of the tissue biopsy, (ii) can be

expensive, (iii) can hide occurrences of rare events, and (iv)

lacks the clarity for translating molecular signature into a

phenotypic signature. Though nuclear morphology and con-

text are difficult to compute as a result of intrinsic cellular

characteristic and technical variations, histology sections can

offer insights into tumor architecture and heterogeneity (e.g.,

mixed populations), in addition to, rare events. Moreover, in

the presence of a very large dataset, phenotypic signatures

can be used to identify intrinsic subtypes within a specific

tumor bank through unsupervised clustering. This facet is

orthogonal to histological grading, where tumor sections are

classified against known grades. The tissue sections are often

visualized with hematoxylin and eosin stains, which label

DNA content (e.g., nuclei) and protein contents, respectively,

in various shades of color. Even though there are inter- and

intra- observer variations [1], a trained pathologist can charac-

terize the rich content, such as the various cell types, cellular

organization, cell state and health, and cellular secretion. If

hematoxylin and eosin (H&E) stained tissue sections can be

quantified in terms of cell type (e.g., epithelial, stromal), tumor

subtype, and histopathological descriptors (e.g., necrotic rate,

nuclear size and shape), then a richer description can be linked

with genomic information for improved diagnosis and therapy.

This is the main benefit of histological imaging since it can

capture tumor architecture.

Ultimately, our goal is to mine a large cohort of tumor data

in order to identify morphometric indices (e.g., nuclear size)


that have prognostic and/or predictive subtypes. The Cancer

Genome Atlas (TCGA) offers such a collection; however, the

main issue with processing a large cohort is the inherent

variation that results from (i) the sample preparation protocols

(e.g., fixation, staining), practiced by different laboratories,

and (ii) the intrinsic tumor architecture (e.g., cell type, cell

state). For example, with respect to heterogeneity in the tumor

architecture, the nuclear color in the RGB space found in

one tissue section may be similar to the cytoplasmic color

in another tissue section. Simultaneously, the nuclear color

intensity (e.g., chromatin content) can vary within a whole

slide image. Therefore, image analysis should be tolerant and

robust, with respect to variations in sample preparation and

tumor architecture, within the entire slide image and across

the tumor cohort.

Stained whole mount tissue sections are scanned at either

20X or 40X, which results in large images on the order of

40k-by-40k pixels or higher. Each image is partitioned into blocks

of 1k-by-1k pixels for processing, and cells at the borders of

each block are excluded during the processing. The details of

the computational pipeline can be found in our earlier paper

[2]. Our approach evolved from our observation that simple

color decomposition and thresholding misses or over-estimates

some of the nuclei in the image, i.e., nuclei with low chromatin

contents are excluded. Further complications ensue as a result

of diversity in nuclear size and shape (e.g., the classic scale

problem).

Fig. 1. Work flow in Nuclear Segmentation for a cohort of whole mount tissue sections.

The general approach is shown in Figure 1, where the

primary novelty is in the image-based modeling of inherent

ambiguities that are associated with technical variations and

biological heterogeneity. Image-based modeling captures prior

knowledge from a diverse set of annotated images (e.g., a

dictionary) needed in order to model the foreground and back-

ground representations. Each annotated image is independent

of other images and signifies one facet (e.g., color space,

nuclear shape and size) of the diversity within the cohort.

Moreover, each image is represented in the feature-space

as the Gaussian Mixture Model (GMM) of the Laplacian

of Gaussian (LoG) and RGB responses. Collectively, the

reference dictionary of annotated images provides the means

for color normalization and for capturing global statistics

for segmenting test images. The computed global statistics

can then be coupled, through a graph cut formulation, with

the intrinsic local image statistics and spatial continuity for

binarization. Having segmented an input test image, each

segmented foreground region is subsequently validated for

nuclear shape. If needed, it is decomposed through geometric

reasoning. A secondary novelty is in the details of the com-

putational pipeline. For example, we introduce the concept

of (i) “color map normalization” for registering a test image

against each of the images in the reference library, and (ii)

“blue ratio image” for mapping RGB images into the gray

space; thus, LoG responses can be computed efficiently in one

channel. All important free parameters are selected through

cross-validation. Thus far, close to 1000 whole slide images

have been processed, and the data has been made publicly

available through our website at http://tcga.lbl.gov. In addition,

segmentation results, from the whole mount tissue sections, are

available for quality control through a web-based zoomable

interface.

Essentially, nuclear segmentation provides the basis for

morphometric representation on a cell-by-cell basis. As a

result, tumor histology can be represented as a meaningful

data matrix, where well-known bioinformatics and statistical

tools can be readily applied for hypotheses generation. For

example, a large cohort facilitates tumor subtyping based on

computed morphometric features. Each subtype can then be (i)

tested for its prognostic value, and (ii) utilized for identifying

molecular basis of each subtype for hypothesis generation. In

the case of GBM, prognostic and/or predictive subtypes have

also been posted on our Web site.

Organization of this paper is as follows: Section II reviews

previous research with a focus on quantitative representation of

the H&E sections for translational medicine. Sections III and

IV describe the details of the image-based modeling for nuclear

segmentation and experimental validation, respectively.

Section V examines one application of nuclear segmentation

to morphometric subtyping and molecular association for

hypothesis generation. Lastly, section VI concludes the paper.

II. REVIEW OF PREVIOUS WORK

Several excellent reviews for the analysis of histology

sections can be found in [3], [4]. From our perspective, four

distinct works have defined the trends in tissue histology analysis:

(i) one group of researchers proposed nuclear segmentation

and organization for tumor grading and/or prediction of tumor

recurrence [5], [6], [7], [8]. (ii) A second group of researchers

focused on patch level analysis (e.g., small regions) [9], [10],

[11], using color and texture features, for tumor representa-

tion. (iii) A third group focused on block-level analysis to

distinguish different states of tissue development using cell-

graph representation [12], [13]. (iv) Finally, a fourth group

has suggested detection and representation of the auto-immune

response as a prognostic tool in cancer [14]. In contrast

to previous research, our strategy is based on processing a

large cohort of tumors, to compute morphometric subtypes,


and to examine whether computed subtypes are predictive of

outcome. Since tumor histology is characterized in terms of

nuclear and cellular features, a more detailed review of nuclear

segmentation strategies follows.

The main barriers in nuclear segmentation are technical

variations (e.g., fixation) and biological heterogeneity (e.g.,

cell type). These factors are visible in TCGA dataset. Present

techniques have focused on adaptive thresholding followed

by morphological operators [15], [16]; fuzzy clustering [17],

[18]; level set method using gradient information [14], [19];

color separation followed by optimum thresholding and learn-

ing [20], [21]; hybrid color and texture analysis followed by

learning and unsupervised clustering [6]; and representation of

nuclei organization in tissues [22], [23] that is computed from

either interactive segmentation or a combination of feature

detectors. Some applications combine the above techniques;

several examples are given below. In [24], iterative radial voting

[25] was used to estimate seeds for partitioning perceptual

boundaries between neighboring nuclei. Subsequently, seeds

were used to segment each nucleus through the application

of multiphase level sets [26], [27]. In [28], the input image

was initially binarized into foreground and background regions

with a graph cut framework, the seeds were then selected

from a binarized image using a constrained multi-scale LoG

filter, with the combined results being refined using a second

iteration of the graph cut. Similarly, in [29], the input image was first normalized through histogram equalization, and then binarized based on color-texture extracted from the most discriminant color space. This was followed by an iterative operation to split touching nuclei based on concave-points and radial-symmetry. In their experiment, they had 21 images, of which 5 were annotated. Nuclei, in all images, had similar size with high chromaticity. Recently, a spatially constrained expectation maximization algorithm [30] was demonstrated to be robust to “color nonstandardness” in histological sections with color being represented in the HSV space. However, our analysis of the GBM cohort indicates that strict incorporation of color and spatial information will not be sufficient, as demonstrated in Section IV-B (MRGC vs MRGC-CF). A more related work, described in [31], was based on a voting system that uses multiple classifiers built from different reference images; we will refer to this method as MCV, for short, in the rest of the paper. Compared to the previous approaches, MCV provides a better way to handle the variation among different batches. However, due to the lack of smoothness constraints and local statistical information, the classification results can be noisy and erroneous, as demonstrated in Figure 8. Some of these concepts have also been utilized in our earlier paper [2], but the results posted on our website are for the current implementation outlined in this paper.

In summary, the main limitations of the above techniques are that they are often applied to a small dataset that originates from a single laboratory, ignore technical variations that are manifested in both nuclear and background signals, and are insensitive to cellular heterogeneity (e.g., variation in chromatin contents). Our goal is to address these issues by processing whole mount tissue sections, from multiple laboratories, to construct a large database of morphometric features, and to enable subtyping and genomic association.

III. APPROACH

Details of the proposed approach are shown in Figure 2,

which leverages several key observations for segmenting nu-

clear regions: (i) global variations across a large cohort of

tissue sections can be captured by a representative set of

reference images, (ii) local variations within an image can

be captured by local foreground(nuclei)/background samples

detected by LoG filter, and (iii) color normalization, against

a reference image, reduces variations in image statistics and

batch effects between a test and a reference image. These

concepts are integrated within a graph cut framework to

delineate nuclei or clumps of nuclei from the background.

Having performed foreground and background segmentation,

we then partitioned potential clumps of nuclei through geomet-

ric reasoning. In the rest of this section, we summarize (a) the

representation of prior models from a diverse set of reference

images, (b) the methodology for color normalization, (c) an

effective approach for color transformation for dimensionality

reduction, (d) the details of feature extraction from each

test image, (e) the multi-reference graph cut formalism for

nuclei/background separation, and (f) the partitioning of a

clump of nuclei into individual nucleus.

A. Construction and Representation of Priors

The purpose of this step is to capture the global variations

for an entire cohort from a reference library. For bioinformatics

analysis, the target dataset consists of 377 individual tissue

sections, and a representative of N (N = 20) reference images

of 1k-by-1k pixels at 20X have been selected. Each reference

image is selected to be an exemplar of tumor phenotypes

based on staining and morphometric properties. Therefore, it

is reasonable to suggest that each reference image has its own

unique feature space, in terms of RGB and LoG responses,

which leads to 2N feature spaces for all reference images:

$$\{F^1_{RGB_1}, F^2_{RGB_2}, \cdots, F^N_{RGB_N}, F^{N+1}_{LoG_1}, F^{N+2}_{LoG_2}, \cdots, F^{2N}_{LoG_N}\} \tag{1}$$

where $F^i_{RGB_i}$ and $F^{N+i}_{LoG_i}$ are the RGB feature space and LoG feature space for the ith reference image, 1 ≤ i ≤ N.

Subsequently, each reference image is hand segmented and processed with a LoG filter (please refer to Section III-C for the details on our LoG integration), at a single scale, followed by the collection of foreground (nuclei) and background statistics in both the RGB space and LoG response. Our experience indicates that even within a single reference image, there could be distinct modes in terms of RGB color and nuclear size. One way to capture these heterogeneities is to represent foreground and background distributions with GMM. Hence, the conditional probability for pixel p, with feature $f^k(p)$ in the kth (k ∈ [1,2N]) feature space, belonging to Nuclei (l = 1)/Background (l = 0) can be expressed as a mixture with D component densities:

$$GMM^k_l(p) = \sum_{j=1}^{D} \tilde{p}(f^k(p)|j)\, P(j) \tag{2}$$


Fig. 2. Steps in Nuclear Segmentation.

where a mixing parameter P(j) corresponds to the weight of component j and $\sum_{j=1}^{D} P(j) = 1$. Each mixture component is a Gaussian with mean µ and covariance matrix Σ in the corresponding feature space (e.g., 3-by-3 and 1-by-1 matrices in RGB and single scale LoG spaces, respectively):

$$\tilde{p}(f^k(p)|j) = \frac{1}{(2\pi)^{\frac{3}{2}} |\Sigma_j|^{\frac{1}{2}}} \cdot \exp\left(-\frac{1}{2}(f^k(p) - \mu_j)^T \Sigma_j^{-1} (f^k(p) - \mu_j)\right) \tag{3}$$

P(j) and $(\mu_j, \Sigma_j)$ for $\tilde{p}(f^k(p)|j)$ were estimated by the expectation maximization (EM) algorithm [32].
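As an illustration of Equations 2 and 3, the mixture density can be evaluated as in the following minimal sketch. It is not the authors' implementation: the function names `gaussian_diag` and `gmm_density` are hypothetical, and a diagonal covariance is assumed for brevity, whereas Equation 3 allows a full covariance matrix Σ. The parameters would come from an EM fit, which is not shown.

```python
import math

def gaussian_diag(f, mean, var):
    """Gaussian density of feature vector f, assuming a diagonal covariance.

    var holds the diagonal entries of the covariance matrix (an assumption
    made here to avoid a matrix inverse; Equation 3 uses a full matrix).
    """
    d = len(f)
    quad = sum((x - m) ** 2 / v for x, m, v in zip(f, mean, var))
    det = math.prod(var)  # determinant of a diagonal matrix
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** d * det)

def gmm_density(f, weights, means, variances):
    """Mixture density of Equation 2: sum over components of P(j) * p(f | j)."""
    return sum(w * gaussian_diag(f, m, v)
               for w, m, v in zip(weights, means, variances))

# Two-component mixture in RGB space (values are made up for illustration).
density = gmm_density([90, 60, 150],
                      weights=[0.6, 0.4],
                      means=[[85, 55, 145], [120, 90, 170]],
                      variances=[[100, 100, 100], [225, 225, 225]])
```

With d = 3 the normalizing constant reduces to the $(2\pi)^{3/2}$ factor of Equation 3; with d = 1 the same function covers the single-scale LoG feature space.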

B. Color Normalization

The purpose of color normalization is to close the gap,

in color space, between an input test image and a reference

image. As a result, the prior models, constructed from each

reference image, can be better utilized. We evaluated a number

of color normalization methods and chose the color map nor-

malization described in [31] for its effectiveness in handling

histological data. Let

• input image I and reference image Q have $K_I$ and $K_Q$ unique color triplets in terms of (R,G,B), respectively;
• $R^{I/Q}_C$ be a monotonic function, which maps the color channel intensity, C ∈ {R,G,B}, from image I/Q to a rank that is in the range $[0,K_I)/[0,K_Q)$;
• $(r_p,g_p,b_p)$ be the color of pixel p, in image I, and $(R^I_R(r_p), R^I_G(g_p), R^I_B(b_p))$ be the ranks for each color channel intensity; and
• the color channel intensity values $r_{ref}$, $g_{ref}$ and $b_{ref}$, from image Q, have ranks:

$$R^Q_R(r_{ref}) = \left\lfloor \frac{R^I_R(r_p)}{K_I} \times K_Q + \frac{1}{2} \right\rfloor$$

$$R^Q_G(g_{ref}) = \left\lfloor \frac{R^I_G(g_p)}{K_I} \times K_Q + \frac{1}{2} \right\rfloor$$

$$R^Q_B(b_{ref}) = \left\lfloor \frac{R^I_B(b_p)}{K_I} \times K_Q + \frac{1}{2} \right\rfloor$$

As a result of color map normalization, the color for pixel p: $(r_p,g_p,b_p)$, will be normalized as $(r_{ref},g_{ref},b_{ref})$. In contrast to standard quantile normalization, which utilizes all pixels in the image, color map normalization is based on the unique colors in the image, thereby excluding the frequency of any color. Our experience suggests that this method is quite powerful for normalizing histology sections, since the color frequencies vary widely as a result of technical variations and tumor heterogeneity. Examples of color map normalization can be found in Figure 2.

Fig. 3. (a) Two diverse pinholes of tumor signatures; (b) Decompositions by [33]; (c) Blue ratio images.
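The rank mapping used by color map normalization can be sketched as follows. This is one possible reading of the definitions in this section, not the implementation of [31]: the rank of a channel intensity is assumed to be its first position among the sorted channel values of the unique color triplets, and the function name `color_map_normalize` is hypothetical.

```python
import bisect
import math

def color_map_normalize(pixels, ref_pixels):
    """Map each (r, g, b) pixel of the input onto the reference color map.

    pixels, ref_pixels: lists of (r, g, b) tuples.
    Ranks are computed over unique color triplets only, so the frequency
    of a color has no influence (unlike quantile normalization).
    """
    uniq_i, uniq_q = set(pixels), set(ref_pixels)
    k_i, k_q = len(uniq_i), len(uniq_q)
    # Sorted channel values of the unique triplets; index = rank.
    src = [sorted(t[c] for t in uniq_i) for c in range(3)]
    ref = [sorted(t[c] for t in uniq_q) for c in range(3)]
    out = []
    for px in pixels:
        new = []
        for c in range(3):
            rank = bisect.bisect_left(src[c], px[c])      # rank in [0, K_I)
            target = math.floor(rank / k_i * k_q + 0.5)   # rank in the reference
            new.append(ref[c][min(target, k_q - 1)])      # clamp to a valid rank
        out.append(tuple(new))
    return out

# A dark/bright input pair is mapped onto the dark/bright reference pair.
normalized = color_map_normalize([(0, 0, 0), (10, 10, 10)],
                                 [(100, 100, 100), (200, 200, 200)])
```

Normalizing an image against itself returns the image unchanged, which is a convenient sanity check for any implementation of this step.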

C. Color transformation

In order to reduce the computational complexities for in-

tegrating the LoG responses, the RGB space is transformed

into a gray level image to accentuate the nuclear dye. While

several techniques for color decomposition have been pro-

posed [34], [33], they are either too time-consuming or do

not yield favorable outcomes. The color transformation policy

needs to enhance the nuclear stain while attenuating the

background stain. One way to realize such a transformation is by:

$$BR(x,y) = \frac{100 * B(x,y)}{1 + R(x,y) + G(x,y)} \times \frac{256}{1 + B(x,y) + R(x,y) + G(x,y)},$$

where B(x,y), R(x,y) and G(x,y) are the blue, red and green intensities at position (x,y). We refer to this transformation as the blue ratio image in the rest of this manuscript. In this formulation, the first and second terms accentuate and attenuate nuclear and background signals, respectively. Subsequently, the LoG responses are always computed at a single scale from the blue ratio image. Figure 3 demonstrates that the blue ratio image method has an improved performance compared to an alternative method [33].
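The blue ratio transformation is simple enough to state per pixel. The sketch below follows the formula of this section directly; the function name `blue_ratio` and the example colors are illustrative assumptions, not values from the paper.

```python
def blue_ratio(r, g, b):
    """Blue ratio of one RGB pixel (channel intensities in 0..255).

    The first factor grows with the blue (hematoxylin-like) signal;
    the second factor shrinks for bright, unstained background.
    """
    return (100.0 * b / (1 + r + g)) * (256.0 / (1 + b + r + g))

# Illustrative colors (assumed, not taken from the dataset):
nucleus_like = blue_ratio(80, 60, 160)       # bluish-purple pixel
background_like = blue_ratio(230, 170, 200)  # pale pink pixel
```

Under these assumed colors the nucleus-like pixel receives a markedly higher value than the background-like pixel, which is the contrast the single-channel LoG filter relies on.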

D. Feature Extraction

Our approach integrates both color and scale information,

where the scale is encoded by the LoG response. Features are extracted in four steps:

1) Normalization of the input test image against every

reference image, as described in Section III-B;

2) Conversion of each normalized image into the blue ratio

image, as described in Section III-C;

3) Application of a LoG filter on each of the blue ratio

images, at a single scale; and

4) Representation of each pixel, from the test image, by its

RGB color in each of the normalized images and LoG

response from each of the blue ratio images.

As a result, each pixel in the test input image is represented

by 2N features, where the first N features are RGB colors

from the normalized images, and the last N features are LoG

responses computed from the blue ratio of the normalized

images. All 2N features are assumed to be independent

per selection of images in Section III-A. The rationale for

integrating both color and scale information is that: (i) in some

cases, color information is insufficient to differentiate nuclear

regions from background; (ii) the scales (e.g., LoG responses)

of the background structure and nuclear region are typically

different; and (iii) the nuclear region responds well to blob

detectors, such as a LoG filter [28].
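To make the role of the LoG response concrete, the following sketch builds a single-scale LoG kernel and evaluates it at one pixel of a gray-level (e.g., blue ratio) image. It is a plain-Python illustration under stated assumptions: the kernel is truncated to a square window and shifted to zero mean, borders are handled by clamping, and the names `log_kernel` and `log_response` are hypothetical.

```python
import math

def log_kernel(sigma, radius):
    """Sampled Laplacian-of-Gaussian kernel, shifted to have zero mean."""
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            gauss = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4 * gauss)
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (2 * radius + 1) ** 2
    return [[v - mean for v in row] for row in kernel]

def log_response(image, y, x, kernel):
    """LoG response at pixel (y, x); image is a list of rows of gray values."""
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), h - 1)  # clamp at the image border
            xx = min(max(x + dx, 0), w - 1)
            total += kernel[dy + radius][dx + radius] * image[yy][xx]
    return total

# A bright blob (high blue ratio) on a dark background.
blob = [[100.0 if (y - 7) ** 2 + (x - 7) ** 2 <= 4 else 0.0
         for x in range(15)] for y in range(15)]
kernel = log_kernel(sigma=2.0, radius=6)
center = log_response(blob, 7, 7, kernel)
```

With this sign convention a bright blob yields a negative response at its center, which matches the later use of negative LoG peaks as foreground (nuclei) seeds.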

E. Multi-Reference Graph Cut Model

In this section, we first present the background material on

graph cut formalism, and then proceed to the details of the

image-based modeling for incorporating intrinsic and extrinsic

variations.

Within the graph cut formulation, an image is represented

as a graph $G = \langle \bar{V}, \bar{E} \rangle$, where $\bar{V}$ is the set of all nodes, and

$\bar{E}$ is the set of all arcs connecting adjacent nodes. Usually, the

nodes and edges correspond to pixels (P) and their adjacency

relationship, respectively. Additionally, there are special nodes

known as terminals, which correspond to the set of labels that

can be assigned to pixels. In the case of a graph with two

terminals, the terminals are referred to as the source (S) and

the sink (T), which correspond to specific labels. The labeling

problem is to assign a unique label $x_p$ (0 for background, and 1 for foreground) for each node $p \in \bar{V}$, and the image cutout is performed by minimizing the Gibbs energy E [35]:

$$E = \sum_{p \in \bar{V}} E_{fitness}(x_p) + \beta \sum_{(p,q) \in \bar{E}} E_{smoothness}(x_p, x_q) \tag{4}$$

where $E_{fitness}(x_p)$ is the likelihood energy, encoding the data fitness cost for assigning $x_p$ to p, and $E_{smoothness}(x_p,x_q)$ is the prior energy, denoting the cost when the labels of adjacent nodes, p and q, are $x_p$ and $x_q$, respectively; β is the weight for $E_{smoothness}$.
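The Gibbs energy of Equation 4 can be evaluated for a candidate labeling as in the sketch below. Two assumptions are made for illustration only: the smoothness cost of an edge is taken to be its weight when the two labels differ and zero otherwise, and the minimizer is found by exhaustive search, which is feasible only for toy graphs. The names `gibbs_energy` and `brute_force_labeling` are hypothetical.

```python
from itertools import product

def gibbs_energy(labels, fitness, edges, beta):
    """Energy of a labeling: data fitness plus beta times smoothness.

    fitness[p][x] is the cost of assigning label x (0 or 1) to node p.
    edges is a list of (p, q, weight); an edge contributes its weight
    only when p and q receive different labels (assumed cost model).
    """
    data = sum(fitness[p][x] for p, x in enumerate(labels))
    smooth = sum(w for p, q, w in edges if labels[p] != labels[q])
    return data + beta * smooth

def brute_force_labeling(fitness, edges, beta):
    """Exhaustive minimizer, usable only for a handful of nodes."""
    n = len(fitness)
    return min(product((0, 1), repeat=n),
               key=lambda lab: gibbs_energy(lab, fitness, edges, beta))

# Three pixels in a row; the middle one weakly prefers background (noise).
fitness = [(3, 0), (0, 1), (3, 0)]
edges = [(0, 1, 1.0), (1, 2, 1.0)]
without_smoothing = brute_force_labeling(fitness, edges, beta=0.0)
with_smoothing = brute_force_labeling(fitness, edges, beta=1.0)
```

In this toy case the smoothness term overrides the weak preference of the middle pixel, so the labeling becomes uniform once β is large enough.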

The optimization algorithms could be classified into two

groups: Goldberg-Tarjan “push-relabel” methods [36], and

Ford-Fulkerson “augmenting paths” [37]. The details of the

two methods can be found in [38].

We recognize that the training data set cannot fully capture

the intrinsic variations of the nuclear signature. Therefore, the

data fitness term is expressed as a combination of the intrinsic

local probability map and learned global property map. The

local probability map has the advantage of capturing local

intrinsic image property in the absence of colormap normal-

ization, thus, diversifying the data fitness term. Equation 4 is

rewritten as

$$E = \sum_{p \in \bar{V}} \left(E_{gf}(x_p) + E_{lf}(x_p)\right) + \beta \sum_{(p,q) \in \bar{E}} E_{smoothness}(x_p, x_q) \tag{5}$$

where Egf is the global data fitness term encoding the fitness

cost for assigning xp to p, Elf is the local data fitness term

encoding the fitness cost for assigning xp to p. Each term

together with the optimization process is discussed below.

1) Global fitness term: The global fitness is established

based on manually annotated reference images. Let’s assume

N reference images: Qi,i ∈ [1,N], and for each reference im-

age, GMMs are used to represent the nuclei and background

in both RGB space and LoG response space, respectively: $GMM^k_{Nuclei}$, $GMM^k_{Background}$, in which k ∈ [1,2N], and the first N GMMs are for RGB space, and the last N GMMs are for LoG response space. Details can be found in Section III-A.

An input test image I is first normalized as $U_i$ with respect to every reference image, $Q_i$. Subsequently, RGB color and LoG responses of $U_i$ are collected to construct 2N features per pixel, where the first N features are from the normalized color (RGB) space, and the second N features are from LoG response. Let

• p be a node corresponding to a pixel;
• $f^k(p)$ be the kth feature of p;
• α be the weight of LoG response;
• $p^k_l$ be the probability function of $f^k$ being Nuclei (l = 1)/Background (l = 0):

$$p^k_l(p) = \frac{GMM^k_l(p)}{\sum_{j=0}^{1} GMM^k_j(p)}$$

• $\lambda_i$ be the weight for $Q_i$:

$$\lambda_i = \frac{1}{3} \sum_{C \in \{R,G,B\}} \lambda^C_i, \qquad \lambda^C_i = \frac{H_C(Q_i) \cdot H_C(U_i)}{\|H_C(Q_i)\| \cdot \|H_C(U_i)\|}$$

where ||.|| is the L2 norm, $H_C(\cdot)$ is the histogram function
on a single color channel C ∈ {R,G,B} of an image.

Intuitively, λ measures similarity between two histograms

derived from Qi and Ui, which are represented with

20 bins. Based on our experiments, the λs become

stable when the number of bins reaches 20; conversely,

histograms with fewer than 20 bins are considered to have

insufficient resolution. The similarity parameter weighs

the fitness of the prior model, constructed from Qi, to

the features extracted from the normalized image Ui.
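The reference weight λ can be sketched as the mean, over the three color channels, of the cosine similarity between 20-bin histograms. The code below is an illustration under assumptions: intensities are taken to lie in 0..255 with equal-width bins, images are given as lists of (r, g, b) tuples, and the function names are hypothetical.

```python
import math

def histogram(values, bins=20, upper=256):
    """Equal-width histogram of integer intensities in [0, upper)."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v) * bins // upper, bins - 1)] += 1
    return counts

def cosine(h1, h2):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm > 0 else 0.0

def reference_weight(ref_pixels, norm_pixels, bins=20):
    """Weight for one reference image: average channel-wise similarity."""
    sims = []
    for c in range(3):
        h_q = histogram([px[c] for px in ref_pixels], bins)
        h_u = histogram([px[c] for px in norm_pixels], bins)
        sims.append(cosine(h_q, h_u))
    return sum(sims) / 3.0

weight = reference_weight([(0, 0, 0)], [(0, 0, 0), (255, 255, 255)])
```

The weight is 1 when the two histograms have the same shape and 0 when they share no occupied bin, so poorly matching reference images contribute little to the global fitness term.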

The global fitness term is now defined as

$$E_{gf}(x_p = i) = -\sum_{k=1}^{N} \lambda_k \log(p^k_i(f^k(p))) - \alpha \cdot \sum_{k=N+1}^{2N} \lambda_{k-N} \log(p^k_i(f^k(p))) \tag{6}$$

where the first and second terms integrate normalized color

features and LoG responses, respectively.
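For a single pixel and a single candidate label, the global fitness term of Equation 6 reduces to two weighted sums of log-probabilities. The sketch below assumes the per-reference probabilities have already been computed; the function name `global_fitness` and the small constant `eps`, which guards against log(0), are additions for illustration.

```python
import math

def global_fitness(post_rgb, post_log, lambdas, alpha, eps=1e-12):
    """Global data fitness cost of one label at one pixel (Equation 6).

    post_rgb[k]: probability of the label under the k-th RGB model.
    post_log[k]: probability of the label under the k-th LoG model.
    lambdas[k]:  weight of the k-th reference image.
    alpha:       weight of the LoG term.
    eps is a hypothetical floor that keeps the logarithm finite.
    """
    color = sum(lam * math.log(max(p, eps)) for lam, p in zip(lambdas, post_rgb))
    scale = sum(lam * math.log(max(p, eps)) for lam, p in zip(lambdas, post_log))
    return -color - alpha * scale

# Two reference images (N = 2), made-up probabilities.
cost = global_fitness(post_rgb=[0.5, 0.25], post_log=[0.5, 0.5],
                      lambdas=[1.0, 0.5], alpha=2.0)
```

For these made-up inputs the cost equals 5·ln 2: 2·ln 2 from the color term and 3·ln 2 from the LoG term after weighting by α.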

2) Local Fitness Term: While the global fitness term uti-

lizes both color and LoG information in the normalized space,

it does not utilize information in the original color space

of the input image. As a result, local variation may be lost

for a number of reasons, e.g., non-uniformity in the tissue

sections or local lesions. The local data fitness of a pixel, p,

is computed from foreground and background seeds in a local

neighborhood around p that correspond to peaks detected by a

LoG filter on the blue ratio image, where positive and negative

peaks often, but not always, correspond to the background

and foreground (nuclei), respectively. The accuracy can be

improved by a cascade of filters as follows:

Fig. 4. (a) An example of the LoG response for detection of foreground (green dot) and background (blue dot) signals indicates an excellent performance on the initial estimate; (b) Histogram of the blue ratio intensity derived from image (a) indicates that the peak of the distribution corresponds to the occurrence frequency of the background pixels.

1) Seeds detection: This step aims to collect local foreground and background seeds by incorporating local and global image statistics. Typical positive and negative peak responses, associated with the LoG filter, are shown in Figure 4(a). Most of the time, the LoG filter detects foreground and background locations correctly, but there is a potential for errors. The protocol consists of three steps:

a) Create a blue ratio image (Section III-C): In this

transformed space, the peak of the intensity his-

togram always corresponds to the preferred fre-

quency of the background intensity as shown in

Figure 4(b).

b) Construct distributions of the foreground and back-

ground: Apply the LoG filter on the blue ratio

image, detect peaks, and construct a distribution of

the blue ratio intensity at the peaks corresponding

to the negative and positive LoG responses. A

small subset of seeds can be mislabeled, but most

can be corrected in the following step.

c) Constrain the seed selection: Seeds (e.g., peaks of

the LoG response) are constrained by three criteria:

(i) the LoG responses must be above a minimum

conservative threshold for removing strictly noisy

artifacts; (ii) the intensity associated with the peak

of the negative LoG responses (e.g., foreground

peaks) must concur with the background peak,

specified in step (a); and (iii) within a small


neighborhood of w1×w1, the minimum blue ratio

intensity, at the location of negative seeds, is set

as the threshold for background peaks, as shown

in Figure 5.

Fig. 5. LoG responses can be either positive (e.g., potential background) or negative (e.g., foreground or part of foreground) in the transformed blue ratio image. In the blue ratio image with the most negative LoG response, the threshold is set at the minimum intensity.

2) Local foreground/background color modeling: For each pixel, p, foreground and background statistics within a local neighborhood, $w_2 \times w_2$, are represented by two GMMs in the original color space. These GMMs correspond to the nuclei and background models (e.g., $GMM^{Local}_{Nuclei}$ and $GMM^{Local}_{Background}$), respectively.

The local fitness term is defined as:

$$E_{lf}(x_p = i) = -\gamma \log(p_i(f(p))) \tag{7}$$

where f(p) refers to the RGB feature of node p in the original color space, γ is the weight for local fitness, and $p_l$ is the probability function of f being Nuclei (l = 1)/Background (l = 0):

$$p_l(p) = \frac{GMM^{Local}_l(p)}{\sum_{j=0}^{1} GMM^{Local}_j(p)}$$
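The local fitness term can be sketched in the same spirit: the two local GMM likelihoods at a pixel are normalized into a probability, whose negative logarithm is scaled by γ. The function names, the tie-breaking value of 0.5 when both likelihoods vanish, and the constant `eps` are assumptions made for this illustration.

```python
import math

def posterior(like_label, like_other):
    """Normalized probability of a label given the two local GMM likelihoods."""
    total = like_label + like_other
    return like_label / total if total > 0 else 0.5  # assumed tie value

def local_fitness(like_nuclei, like_background, label, gamma, eps=1e-12):
    """Local data fitness cost of assigning label (1 nuclei, 0 background)."""
    if label == 1:
        p = posterior(like_nuclei, like_background)
    else:
        p = posterior(like_background, like_nuclei)
    return -gamma * math.log(max(p, eps))  # eps keeps the logarithm finite

# A pixel whose original color is three times likelier under the nuclei model.
cost_nuclei = local_fitness(0.03, 0.01, label=1, gamma=2.0)
cost_background = local_fitness(0.03, 0.01, label=0, gamma=2.0)
```

The label that is better supported by the local seeds receives the lower cost, which is what the t-links of the graph encode together with the global term.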

3) Smoothness Term: While both local and global data

fitness terms are encoded by t-links (links between node

and terminals) in the graph, the smoothness term, which

ensures the smoothness of labeling between adjacent nodes, is

represented by n-links (links between adjacent nodes). Here,

we adopt the setup from [39] for n-links, which approximates a

continuous Riemannian metric by a discrete weighted graph so

that the max-flow/min-cut solution for the graph corresponds

to a local geodesic or minimal surface in the continuous case.

Consider a weighted graph constructed in Section III-E: $G = \langle \bar{V}, \bar{E} \rangle$, where $\bar{V}$ is the set of image pixels, and $\bar{E}$ is the set of all edges connecting adjacent pixels. Let,

• $\{e_k | 1 \le k \le n_G\}$ be a set of vectors for the neighborhood system, where $n_G$ is the neighborhood order, and the vectors are ordered by their corresponding angle $\phi_k$ w.r.t. the +x axis, such that $0 \le \phi_1 < \phi_2 \cdots < \phi_{n_G} < \pi$. For example, when $n_G = 8$, we have $e_1 = (1,0)$, $e_2 = (1,1)$, $e_3 = (0,1)$, $e_4 = (-1,1)$, as shown in Figure 6(a);
• $w_k$ be the weight for the edge between pixels p and q, where p and q belong to the same neighborhood system, and $\vec{pq} = \pm e_k$;
• L be a line formed by the edges in the graph, as shown in Figure 6(c);
• C be a contour in the same 2D space where the graph G is embedded, as shown in Figure 6(b);
• $|C|_G$ be the cut metric of C:

$$|C|_G = \sum_{e \in \bar{E}_C} w_e$$

where $\bar{E}_C$ is the set of edges intersecting contour C;
• $|C|_R$ be the Riemannian length of contour C; and,
• D(p) be the metric (tensor), which continuously varies over points p in the 2D Riemannian space.

Fig. 6. (a) Eight-neighborhood system: $n_G = 8$; (b) Contour on eight-neighborhood 2D grid; (c) One family of lines formed by edges of the graph.

Based on Integral Geometry [40], the Crofton-style formula for Riemannian length $|C|_R$ of contour C can be written as,

\[ \int \frac{\det D(p)}{2\,(u_L^T \cdot D(p) \cdot u_L)^{3/2}}\; n_C \, dL = 2|C|_R \]

where $u_L$ is the unit vector in the direction of the line L, and $n_C$ is a function that specifies how many times line L intersects contour C. Following the approach in [39], the local geodesic can be approximated by the max-flow/min-cut solution ($|C|_G \to |C|_R$) with the following edge weight setting:

\[ w_k(p) = \frac{\delta^2 \cdot |e_k|^2 \cdot \Delta\phi_k \cdot \det D(p)}{2 \cdot (e_k^T \cdot D(p) \cdot e_k)^{3/2}} \qquad (8) \]

where δ is the cell-size of the grid, $\Delta\phi_k$ is the angular difference between the kth and (k+1)th edge lines, $\Delta\phi_k = \phi_{k+1} - \phi_k$, and

\[ D(p) = g(|\nabla I|) \cdot \mathbf{I} + (1 - g(|\nabla I|)) \cdot u \cdot u^T \qquad (9) \]

where $u = \frac{\nabla I}{|\nabla I|}$ is a unit vector in the direction of image gradient at point p, $\mathbf{I}$ is the identity matrix, and $g(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right)$.
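The edge weight of Eq. (8) and the metric tensor of Eq. (9) can be sketched for a single pixel as follows. This is an illustrative sketch only: the function names are hypothetical, the closing angle of the last direction is assumed to be π (so that the angular differences sum to π), and a zero gradient is assumed to give the identity tensor since g(0) = 1.

```python
import math

def metric_tensor(gx, gy, sigma):
    """D(p) = g*I + (1-g)*u*u^T as a 2x2 nested list, from the gradient (gx, gy)."""
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # g(0) = 1, so D(p) is the identity
    g = math.exp(-norm * norm / (2.0 * sigma * sigma))
    ux, uy = gx / norm, gy / norm
    return [[g + (1 - g) * ux * ux, (1 - g) * ux * uy],
            [(1 - g) * ux * uy, g + (1 - g) * uy * uy]]

def edge_weights(D, delta=1.0):
    """w_k(p) of Eq. (8) for the directions e_1..e_4 of the 8-neighborhood."""
    dirs = [(1, 0), (1, 1), (0, 1), (-1, 1)]
    # Assumption: the angle following the last direction is pi.
    angles = [math.atan2(ey, ex) for ex, ey in dirs] + [math.pi]
    det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    weights = []
    for k, (ex, ey) in enumerate(dirs):
        dphi = angles[k + 1] - angles[k]
        # Quadratic form e_k^T * D(p) * e_k
        quad = ex * (D[0][0] * ex + D[0][1] * ey) + ey * (D[1][0] * ex + D[1][1] * ey)
        weights.append(delta ** 2 * (ex * ex + ey * ey) * dphi * det / (2.0 * quad ** 1.5))
    return weights

flat = edge_weights(metric_tensor(0.0, 0.0, sigma=4.0))   # homogeneous region
edge = edge_weights(metric_tensor(20.0, 0.0, sigma=4.0))  # strong gradient along +x
```

In a homogeneous region the weights reduce to the Euclidean case (π/8 for axis-aligned edges with δ = 1). With a strong gradient along +x, the weight of the n-link in direction e_1 becomes very small, so a cut that separates horizontally adjacent pixels, i.e., a contour running along the image edge, is cheap.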

Edge | Weight | For
$p \to S$ | $E_{gf}(x_p = 1) + E_{lf}(x_p = 1)$ | $p \in P$
$p \to T$ | $E_{gf}(x_p = 0) + E_{lf}(x_p = 0)$ | $p \in P$
$w_e(p,q)$ | $\beta \cdot w_k(p)$ | $\{p,q\} \in N$, $\phi_{\vec{pq}} \in \{\phi_k, \pi + \phi_k\}$

TABLE I

EDGE WEIGHTS FOR THE GRAPH CONSTRUCTION, WHERE N IS THE NEIGHBORHOOD SYSTEM, AND β IS THE WEIGHT FOR SMOOTHNESS.

4) Optimization: The construction of the graph, with two

terminals, source S and sink T, is defined in Table I. This graph

is partitioned via the max-flow/min-cut algorithm proposed in

[41] to label the input image into foreground and background.

The optimization method belongs to a class of algorithms

based on augmenting paths, and the details can be found in

[41].
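To illustrate how the graph of Table I yields a labeling, the sketch below runs a max-flow/min-cut on a toy chain of three pixels. It uses a plain breadth-first augmenting-path search (Edmonds-Karp) rather than the algorithm of [41], and all costs are hypothetical. Since a pixel that stays on the source side has its T-link severed and therefore pays E(x_p = 0), source-side pixels are read as label 0 and sink-side pixels as label 1.

```python
from collections import deque

def min_cut_labels(cost0, cost1, smooth):
    """Label a 1-D chain of pixels by max-flow/min-cut.

    cost0[p], cost1[p]: data costs E(x_p=0), E(x_p=1); smooth: n-link weight.
    Returns (labels, max_flow).
    """
    n = len(cost0)
    S, T = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for p in range(n):
        cap[S][p] = cost1[p]  # t-link to S carries E(x_p = 1), as in Table I
        cap[p][T] = cost0[p]  # t-link to T carries E(x_p = 0)
    for p in range(n - 1):
        cap[p][p + 1] = cap[p + 1][p] = smooth  # n-links between neighbors

    def bfs_parents():
        """Parents of all nodes reachable from S in the residual graph."""
        parent = [-1] * (n + 2)
        parent[S] = S
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if parent[T] == -1:
            break  # no augmenting path left: the flow is maximal
        bottleneck, v = float("inf"), T
        while v != S:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while v != S:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

    # Source-side pixels have their T-link severed, so they pay E(x_p = 0).
    reachable = bfs_parents()
    labels = [0 if reachable[p] != -1 else 1 for p in range(n)]
    return labels, flow

labels, flow = min_cut_labels(cost0=[5, 4, 1], cost1=[1, 5, 6], smooth=2)
```

For this toy input the value of the maximum flow equals the minimum energy over all eight possible labelings, which is the property that makes the min-cut solution a global optimum of the binary energy.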


F. Nuclear Mask Partitioning

A key observation we made is that the nuclear shape

is typically convex. Therefore, ambiguities associated with

the delineation of overlapping nuclei could be resolved by

detecting concavities and partitioning them through geometric

reasoning. The process, shown in Figure 7, consists of the

following steps:

1) Detection of Points of Maximum Curvature: The contours of the nuclear mask were extracted, and the curvature along the contour was computed by using $k = \frac{x'y'' - y'x''}{(x'^2 + y'^2)^{3/2}}$, where x and y are coordinates of the boundary points. The derivatives were then computed by convolving the boundary with derivatives of Gaussian. An example of detected points of maximum curvature is shown in Figure 7.

2) Delaunay Triangulation (DT) of Points of Maximum Curvature for Hypothesis Generation and Edge Removal: DT was applied to all points of maximum curvature to hypothesize all possible groupings. The main advantage of DT is that the edges are non-intersecting, and the Euclidean minimum spanning tree is a sub-graph of DT. This hypothesis space was further refined by removing edges based on certain rules, e.g., no background intersection.

3) Geometric reasoning: Properties of both the hypothesis graph (e.g., degree of vertex), and the shape of the object (e.g., convexity) were integrated for edge inference.

This method is similar to the one proposed in our previous work [42]; however, a significant performance improvement has been made through triangulation and subsequent geometric reasoning. Please refer to [43] for details.

Fig. 7. Steps in the delineation of overlapping nuclei: (top row) identifying points of maximum curvature where potential folds are formed, (middle row) formation of partitioning hypotheses through triangulation, (bottom row) stepwise application of geometric constraints for deleting and pruning edges.
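The curvature computation of step 1 can be sketched on a closed contour as follows. For brevity, this sketch replaces the derivative-of-Gaussian filters described above with central finite differences, so it omits the smoothing; the function name and the test contour are illustrative assumptions.

```python
import math

def contour_curvature(xs, ys):
    """Signed curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) on a closed contour."""
    n = len(xs)
    ks = []
    for i in range(n):
        xp, xn = xs[i - 1], xs[(i + 1) % n]  # previous/next point (wraps around)
        yp, yn = ys[i - 1], ys[(i + 1) % n]
        dx, dy = (xn - xp) / 2.0, (yn - yp) / 2.0            # first derivatives
        ddx, ddy = xn - 2 * xs[i] + xp, yn - 2 * ys[i] + yp  # second derivatives
        denom = (dx * dx + dy * dy) ** 1.5
        ks.append((dx * ddy - dy * ddx) / denom if denom > 0 else 0.0)
    return ks

# Counter-clockwise circle of radius 5: curvature should be close to 1/5.
N, R = 200, 5.0
xs = [R * math.cos(2 * math.pi * i / N) for i in range(N)]
ys = [R * math.sin(2 * math.pi * i / N) for i in range(N)]
ks = contour_curvature(xs, ys)
```

Under this sign convention a counter-clockwise contour has positive curvature on convex stretches, so concavities between touching nuclei appear as negative values whose extrema can serve as candidate points for the triangulation step.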

IV. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we (i) discuss parameter setting, and (ii)

evaluate performance of the system against previous methods.

Fig. 8. A comparison between MCV and MRGC (as shown in (c) and (d), respectively) based on the same reference image, as shown in (a). Even though the test image and the reference image are slightly different in color space, compared with MCV, MRGC still produces 1) more accurate classification, due to the encoding of statistics from the test image's color space via the local probability map; 2) less noisy classification due to the smoothness constraint. Panels: (a) Reference image; (b) Test image; (c) Results via MCV; (d) Results via MRGC.

Fig. 9. A subset of reference image ROI, with manual annotation overlaid as green contours, indicating significant amounts of technical variation. Nuclei with white hollow regions inside are pointed out by arrows.

A. Experimental design and parameter setting

In order to capture the technical variation, we manually selected and annotated 20 reference images of size 1k-by-1k pixels at 20X; a subset is shown in Figure 9. Nuclear segmentation was also performed at 20X, and only the top M = 10 reference images with the highest weight of λ were used. Essentially, this was a trade-off between performance and computational cost (see Figure 13). The number of components for GMM was selected to be D = 20, while the parameters for GMM were estimated via the EM algorithm. Other parameter settings were: α = 0.1, β = 10.0, γ = 0.1, w1 = 100, w2 = 100, and σ = 4.0 (the scale for both seed detection and LoG feature extraction), in which σ was determined based on the preferred nuclear size at 20X, w1 was selected to minimize the seed detection error on the annotated reference images, and all other parameters were selected to minimize the cross validation error over the following discretization: D ∈ {5,10,15,20,25,30}, α ∈ {0.05,0.10,...,0.95,1.00}, β ∈ {5,10,...,95,100}, γ ∈ {0.05,0.10,...,0.95,1.00}, w2 ∈ {50,60,...,190,200}. The optimal γ value is relatively small, which can be attributed to the fact that the global statistics from the well-constructed reference images cover most of the heterogeneity in our dataset, and the role of local statistics is simply to assist the global statistics with improved discriminating power.

Fig. 10. A comparison among our approach, MCV, and random forest. (a) Original image patch; (b) Detected seeds, Green: nuclei region; Blue: background; (c) Local nuclei probability established based on seeds; (d) Classification by our approach; (e) Classification by MCV; (f) Classification by random forest.

Fig. 11. Segmentation on low chromatin nuclei. (a) Original image patch; (b) Segmentation by our approach.
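The discretization above can be written out explicitly, which also makes the size of the search space concrete. The sketch below only enumerates the candidate settings; whether the full Cartesian product or a coordinate-wise search was evaluated is not stated here, so the exhaustive enumeration is an assumption, and the helper `frange` is a hypothetical name.

```python
from itertools import product

def frange(start, stop, step, ndigits=2):
    """Inclusive arithmetic grid, rounded to avoid floating-point drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, ndigits) for i in range(count)]

grid = {
    "D": [5, 10, 15, 20, 25, 30],        # number of GMM components
    "alpha": frange(0.05, 1.00, 0.05),
    "beta": list(range(5, 101, 5)),      # smoothness weight
    "gamma": frange(0.05, 1.00, 0.05),   # local fitness weight
    "w2": list(range(50, 201, 10)),      # local neighborhood size
}

n_combinations = 1
for values in grid.values():
    n_combinations *= len(values)

# Lazily enumerate candidate settings; each would be scored by cross validation.
candidates = (dict(zip(grid, combo)) for combo in product(*grid.values()))
first = next(candidates)
```

Under the exhaustive assumption the grid contains 6 × 20 × 20 × 20 × 16 = 768,000 settings, and the reported optimum (D = 20, α = 0.1, β = 10, γ = 0.1, w2 = 100) is one of its points.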

Fig. 13. Top and bottom rows show average classification performance and computational time as a function of number of reference images used. It is clear that the top M = 10 reference images with highest λ is a reasonable trade-off between performance and computational time.

Approach | Precision | Recall | F-Measure
MRGC-MS (Multi-Scale LoG) | 0.77 | 0.82 | 0.794
MRGC | 0.79 | 0.78 | 0.785
MRGC-CF (Color Feature Only) | 0.72 | 0.83 | 0.771
MRGC-GF (Global Fitness Only) | 0.80 | 0.71 | 0.752
Our Previous approach | 0.78 | 0.65 | 0.709
MCV | 0.69 | 0.75 | 0.719
Random Forest | 0.59 | 0.76 | 0.664

TABLE II

COMPARISON OF AVERAGE CLASSIFICATION PERFORMANCE AMONG OUR APPROACH (MRGC), OUR PREVIOUS APPROACH [2], MCV APPROACH IN [31], AND RANDOM FOREST. FOR MCV, ONLY COLOR IN RGB SPACE IS USED, WHICH IS IDENTICAL TO [31]. FOR RANDOM FOREST, THE SAME FEATURES ARE USED: {R,G,B,LoG}, AND THE PARAMETER SETTINGS ARE: ntree = 100, mtry = 2, node = 1.

B. Evaluation

Two-fold cross validation, with optimized parameter settings, was applied to the reference images, and a comparison of average classification performance was made between our approach, random forest [44], and the most related work in [31] (referred to here as MCV, multi-classifier voting, for short), as shown in Table II. Our experiment indicates that

1) By incorporating both global and local statistics (MRGC vs MRGC-GF), our system better characterizes the variation in the data.

2) By incorporating the LoG response as a feature (MRGC vs MRGC-CF), we can encode the prior scale information into the system. As a result, ambiguous background structures are excluded, which leads to an increase in precision. However, there is also a decrease in recall when compared to MRGC-CF, because the tiny fragments inside the nuclei, as indicated by Figure 9, can also be eliminated.

3) MRGC with multi-scale LoG features (MRGC-MS) has the best performance. We evaluated LoG responses at three scales, σ ∈ {2,4,6}, to compensate for a wide variation in the nuclear size. However, the improvement in segmentation is marginal, and it comes with a significant increase in the computational cost of about 40%. The LoG filter is simply used for seed detection to represent the underlying image statistics, and as long as a single scale can provide sufficient statistics, multiscale LoG is redundant. Besides, in processing whole slide images, computational throughput is an important factor.

Approach | Precision | Recall | F-Measure
MRGC | 0.75 | 0.85 | 0.797
Our previous approach | 0.63 | 0.75 | 0.685

TABLE III

COMPARISON OF AVERAGE SEGMENTATION PERFORMANCE BETWEEN OUR CURRENT APPROACH (MRGC), AND OUR PREVIOUS APPROACH [2], IN WHICH precision = (#correctly segmented nuclei) / (#segmented nuclei), AND recall = (#correctly segmented nuclei) / (#manually segmented nuclei).

We also provide an intuitive example, shown in Figure 10,

demonstrating the effectiveness of the local probability map.

It is clear that the local probability map (Figure 10(c)) helps

to characterize nuclei with the low chromatin content, as

shown in the blue bounding boxes. Another example, shown

in Figure 11, further demonstrates the effectiveness of our

approach on the segmentation of low chromatin nuclei.

Finally, a comparison of the segmentation performance

between our current approach and our previous approach [2] is

indicated in Table III, where the correct nuclear segmentation

is defined as follows. Let

• MaxSize(a,b) be the maximum nuclear size of nuclei

a and b, and

• Overlap(a,b) be the amount of overlap between nuclei

a and b.

Subsequently, for any nucleus, $n_G$, from ground truth, if there is one and only one nucleus, $n_S$, in the segmentation result, that satisfies $\frac{Overlap(n_G, n_S)}{MaxSize(n_G, n_S)} > T$, then $n_S$ is considered to be a correct segmentation of $n_G$. The threshold was set to be T = 0.8.

The reader may question the classification performance since both precision and recall are not very high. The major reason is that the ground truth (annotation) for the reference images is created at the object (nucleus) level, which means the hollow regions (loss of chromatin content for various reasons) inside the nuclei will be marked as the nuclear region rather than the background, as indicated by Figure 9.
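The matching rule and the measures of Table III can be sketched as follows. The sketch assumes that each nucleus is represented as a non-empty set of pixels, that Overlap is the number of shared pixels, that MaxSize is the larger of the two areas, and that the F-measure is the harmonic mean 2PR/(P+R), which reproduces the tabulated values (e.g., P = 0.75 and R = 0.85 give 0.797). The toy masks are hypothetical.

```python
def evaluate_segmentation(ground_truth, segmented, T=0.8):
    """Count correct segmentations and return (precision, recall, f_measure).

    Each nucleus is a non-empty set of pixel coordinates (or pixel indices).
    """
    correct = 0
    for n_g in ground_truth:
        matches = [n_s for n_s in segmented
                   if len(n_g & n_s) / max(len(n_g), len(n_s)) > T]
        if len(matches) == 1:  # "one and only one" matching nucleus
            correct += 1
    precision = correct / len(segmented) if segmented else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure

# Hypothetical toy masks, using 1-D pixel indices for brevity.
gt = [set(range(0, 10)), set(range(20, 30))]
seg = [set(range(0, 9)), set(range(20, 25)), set(range(40, 45))]
precision, recall, f_measure = evaluate_segmentation(gt, seg)
```

In the toy example only the first ground-truth nucleus is matched (overlap ratio 0.9), the second is under-segmented (ratio 0.5), and the third segmented object has no counterpart, which lowers precision without affecting recall.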

V. ANALYSIS OF TCGA GBM COHORT

Having evaluated the performance of the system, we applied

our method to a cohort of 377 GBM whole slide images, from

146 patients, for bioinformatics analysis. Figure 12 shows a

few snapshots of our classification and segmentation results.

Complete results for all the GBM tissue sections (and a few

other tumor types) are available through the NIH web site

at http://tcga-data.nci.nih.gov/tcga/. Following segmentation,

each nucleus is represented by a multidimensional feature

vector, which includes over 52 morphometric indices such

as nuclear size, cellularity, cytoplasmic features, etc., [2].

The density distribution of each index is then computed per

histology section and aggregated per patient.

A particular aspect of bioinformatics analysis relies on

subtyping based on a subset of computed morphometric in-

dices (e.g., cellular density), where subtyping is performed

through consensus clustering [45], [46]. In our experiment,

we evaluated all morphometric indices and discovered that

subtyping based on (i) nuclear size and cellularity, and (ii)

nuclear intensity and gradient, are statistically stable, where

four and two subtypes were inferred, respectively. Figure 14

shows the computed subtypes based on nuclear size and cellu-

larity, where one of the subtypes is predictive of the outcome

based on the clinical data. The patients in the GBM cohort received one of two types of therapy: (i) an intensive therapy with either concurrent radiation and chemotherapy, or 4 or more cycles of chemotherapy only, or (ii) a less intensive therapy of either non-concurrent radiation and chemotherapy or fewer than 4 cycles of chemotherapy only [47]. Although the sample size for the patients receiving the less intensive therapy is small, survival analyses [48] for one of the subtypes in each of the clustering experiments point to a trend of improved survival for patients receiving the more intensive therapy, as shown in Figure 15. In addition, several computed subtypes,

based on other morphometric indices, have also been found

to be predictive of the outcome. We also examined molecular

correlates of the predictive subtypes. With respect to the predictive subtype computed from nuclear size and cellularity indices, we used a moderated t-test [49] and identified a set of differentially regulated transcripts for subtype 2 (i.e., the predictive subtype), as shown in Figure 16. A total of 10 differentially regulated transcripts were then subjected to subnetwork enrichment analysis using Pathway Logic, which computes and ranks hubs according to their p-values, as shown in Table IV. Hubs such as IL1 and IL6 impact tumor proliferation and migration in both normal and malignant cells [50], [51], as well as the recruitment of the immune response. The

relationships between these hubs and the genes associated with

them are shown in Figure 17. Among the common regulators,

MAPK1 and FN1, which are involved in the proliferation,

are highly ranked transcripts in TCGA’s gene tracker for

GBM. Furthermore, FN1 is (i) implicated in the invasion

and angiogenesis, and (ii) validated as differentially expressed

transcripts in GBM versus benign tumors [52]. Finally, TGFB1

is well known to be involved in tumor maintenance and

progression through suppression of the immune response and

is abundantly produced by GBM [53]. These molecular associations indicate that morphometric subtyping can suggest relevant transcripts that are potential targets of therapy, which is consistent with the current literature; one example is FN1 and its role in the induction of angiogenesis. With respect to

the predictive subtype computed from nuclear intensity and

gradient indices, subnetwork enrichment analysis revealed a

large number of hubs from a set of differentially regulated

transcripts. In this case, VEGF was discovered to be at the

intersection of all pathways curated through enrichment analysis. VEGF is well known to be the hallmark of glioblastoma for the induction of microvasculature formation [54] and has been suggested as a therapeutic target in GBM [55].
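The survival comparison above relies on the product-limit estimator of [48]. A minimal sketch of that estimator is given below; the follow-up times and censoring flags are hypothetical toy values, not data from the GBM cohort, and the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up durations; events: 1 if death observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk  # product-limit update
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve

# Hypothetical toy data: five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Computing one such curve per therapy group within a subtype gives the kind of comparison summarized in Figure 15; a significance test between the curves would be a separate step not covered by this sketch.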


Hub name | p-value
IL1A | 0.0003
MAPK1 | 0.0005
FN1 | 0.0005
TNF | 0.003
TGFB1 | 0.009
IL6 | 0.03

TABLE IV

KEY HUBS IDENTIFIED THROUGH PATHWAY ENRICHMENT ANALYSIS.

Fig. 14. Morphometric subtyping reveals four subtypes based on cellularity index and nuclear area: (a) visualization of consensus clustering with four clusters; and (b) distribution of cellularity index per subtype. Recoverable plot labels: x-axis, Cellularity (0 to 500); y-axis, Probability (×10^-3); legend, Subtype1 to Subtype4.

VI. CONCLUSION

We have shown that morphometric representation of cellular architecture from a large cohort of histology sections can provide new opportunities for hypothesis generation. The main barriers are the batch effect and tumor heterogeneity, which hinder nuclear segmentation. However, through image-based modeling, technical and tumor variations can be captured for robust nuclear segmentation from whole slide images. Subsequently, segmented nuclei and the corresponding computed morphometric representation enable characterization of tumor histopathology. Our approach for nuclear segmentation addresses technical and biological variations by (i) utilizing global information from a diverse set of annotated reference images, (ii) normalizing the test image against the reference images in the color space, and (iii) incorporating local variations in the test image. Segmentation is formulated within a graph cut framework with geodesic constraint for improved accuracy of the nuclear boundaries. The method has been validated against annotated data and applied to a large GBM tumor cohort to identify subtypes as a function of cellularity and nuclear size. One of these subtypes is shown to have an increase in survival as a result of a more aggressive therapy, with an underlying molecular signature that is consistent with invasiveness and proliferation.

Fig. 15. Computed subtypes with cellularity and nuclear size are predictive as a result of more aggressive therapy.

Fig. 16. Heat map representing a subset of differentially regulated transcripts for Subtype 2. Row labels (transcripts, as listed): ASCL1, BCAN, BCAN, LUZP2, TRO, HIP1, ETV1, ETV1, ETV1, TNXA///TNXB, C4A///C4B, C4A///C4B, COL9A3, HTATIP2, RNF128, DPP4, TPD52, RND3, ADAMTS1, LRRFIP1, PLIN2, P4HA2, STC1, LAMB1, LOXL2. Column labels: Subtype 1, Subtype 2, Subtype 3, Subtype 4.

Fig. 17. Subnetwork enrichment analysis, for the predictive subtype in Figure 15(a), reveals inflammatory hubs that promote tumor differentiation and invasiveness in GBM.

REFERENCES

[1] L. Dalton, S. Pinder, C. Elston, I. Ellis, D. Page, W. Dupont, and R. Blamey, “Histological gradings of breast cancer: linkage of patient outcome with level of pathologist agreements,” Modern Pathology, vol. 13, no. 7, pp. 730–735, 2000.

[2] H. Chang, G. Fontenay, J. Han, G. Cong, F. Baehner, J. Gray, P. Spellman, and B. Parvin, “Morphometric analysis of TCGA Glioblastoma Multiforme,” BMC Bioinformatics, vol. 12, no. 1, 2011.

[3] C. Demir and B. Yener, “Automated cancer diagnosis based on histopathological images: A systematic survey,” Technical Report, Rensselaer Polytechnic Institute, Department of Computer Science, 2009.


[4] M. Gurcan, L. Boucheron, A. Can, A. Madabhushi, N. Rajpoot, and B. Yener, “Histopathological image analysis: a review,” IEEE Reviews in Biomedical Engineering, vol. 2, pp. 147–171, 2009.

[5] D. Axelrod, N. Miller, H. Lickley, J. Qian, W. Christens-Barry, Y. Yuan, Y. Fu, and J. Chapman, “Effect of quantitative nuclear features on recurrence of ductal carcinoma in situ (DCIS) of breast,” Cancer Informatics, vol. 4, pp. 99–109, 2008.

[6] M. Datar, D. Padfield, and H. Cline, “Color and texture based segmentation of molecular pathology images using HSOMs,” in ISBI, 2008, pp. 292–295.

[7] A. Basavanhally, J. Xu, A. Madabhushi, and S. Ganesan, “Computer-aided prognosis of ER+ breast cancer histopathology and correlating survival outcome with oncotype DX assay,” in ISBI, 2009, pp. 851–854.

[8] S. Doyle, M. Feldman, J. Tomaszewski, N. Shih, and A. Madabhushi, “Cascaded multi-class pairwise classifier (CASCAMPA) for normal, cancerous, and cancer confounder classes in prostate histology,” in ISBI, 2011, pp. 715–718.

[9] R. Bhagavatula, M. Fickus, W. Kelly, C. Guo, J. Ozolek, C. Castro, and J. Kovacevic, “Automatic identification and delineation of germ layer components in H&E stained images of teratomas derived from human and nonhuman primate embryonic stem cells,” in ISBI, 2010, pp. 1041–1044.

[10] J. Kong, L. Cooper, A. Sharma, T. Kurk, D. Brat, and J. Saltz, “Texture based image recognition in microscopy images of diffuse gliomas with multi-class gentle boosting mechanism,” in ICASSP, 2010, pp. 457–460.

[11] J. Han, H. Chang, L. Loss, K. Zhang, F. Baehner, J. Gray, P. Spellman, and B. Parvin, “Comparison of sparse coding and kernel methods for histopathological classification of glioblastoma multiforme,” in ISBI, 2011, pp. 711–714.

[12] E. Acar, G. Plopper, and B. Yener, “Coupled analysis of in vitro and histology samples to quantify structure-function relationships,” PLoS One, vol. 7, no. 3, p. e32227, 2012.

[13] C. Bilgin, S. Ray, B. Baydil, W. Daley, M. Larsen, and B. Yener, “Multiscale feature analysis of salivary gland branching morphogenesis,” PLoS One, vol. 7, no. 3, p. e32906, 2012.

[14] H. Fatakdawala, J. Xu, A. Basavanhally, G. Bhanot, S. Ganesan, F. Feldman, J. Tomaszewski, and A. Madabhushi, “Expectation-maximization-driven geodesic active contours with overlap resolution (EMaGACOR): Application to lymphocyte segmentation on breast cancer histopathology,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 7, pp. 1676–1690, 2010.

[15] P. Phukpattaranont and P. Boonyaphiphat, “Color based segmentation of nuclear stained breast cancer cell images,” ECTI Transactions on Electrical Engineering, and Communication, vol. 5, no. 2, pp. 158–164, 2007.

[16] B. Ballaro, A. Florena, V. Franco, D. Tegolo, C. Tripodo, and C. Valenti, “An automated image analysis methodology for classifying megakaryocytes in chronic myeloproliferative disorders,” Medical Image Analysis, vol. 12, pp. 703–712, 2008.

[17] L. Latson, N. Sebek, and K. Powell, “Automated cell nuclear segmentation in color images of hematoxylin and eosin-stained breast biopsy,” Analytical and Quantitative Cytology and Histology, vol. 26, no. 6, pp. 321–331, 2003.

[18] W. Land, D. McKee, T. Zhukov, D. Song, and W. Qian, “A kernelized fuzzy support vector machine CAD system for the diagnostic of lung cancer from tissue,” International Journal of Functional Informatics and Personalised Medicine, vol. 1, no. 1, pp. 26–52, 2008.

[19] D. Glotsos, P. Spyridonos, D. Cavouras, P. Ravazoula, P. Dadioti, and G. Nikiforidis, “Automated segmentation of routinely hematoxylin-eosin stained microscopic images by combining support vector machine, clustering, and active contour models,” Anal Quant Cytol Histol, vol. 26, no. 6, pp. 331–340, 2004.

[20] H. Chang, R. Defilippis, T. Tlsty, and B. Parvin, “Graphical methods for quantifying macromolecules through bright field imaging,” Bioinformatics, vol. 25, no. 8, pp. 1070–1075, 2009.

[21] E. Cosatto, M. Miller, H. Graf, and J. Meyer, “Grading nuclear pleomorphism on histological micrographs,” in International Conference on Pattern Recognition, 2008, pp. 1–4.

[22] S. Petushi, F. Garcia, M. Haber, C. Katsinis, and A. Tozeren, “Large-scale computations on histology images reveal grade-differentiation parameters for breast cancer,” BMC Medical Imaging, vol. 6, no. 14, pp. 1070–1075, 2006.

[23] S. Doyle, S. Agner, A. Madabhushi, M. Feldman, and J. Tomaszewski, “Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features,” in ISBI, 2008, pp. 496–499.

[24] F. Bunyak, A. Hafiane, and K. Palaniappan, “Histopathology tissue segmentation by combining fuzzy clustering with multiphase vector level set,” Adv Exp Med Biol., vol. 696, pp. 413–424, 2011.

[25] B. Parvin, Q. Yang, J. Han, H. Chang, B. Rydberg, and M. Barcellos-Hoff, “Iterative voting for inference of structural saliency and characterization of subcellular events,” IEEE Transactions on Image Processing, vol. 16, no. 3, pp. 615–623, March 2007.

[26] S. Nath, K. Palaniappan, and F. Bunyak, “Cell segmentation using coupled level sets and graph-vertex,” in Medical Image Computing and Computer-Assisted Intervention-MICCAI, 2006, pp. 101–108.

[27] H. Chang and B. Parvin, “Multiphase level set for automated delineation of membrane-bound macromolecules,” in ISBI, 2010, pp. 165–168.

[28] Y. Al-Kofahi, W. Lassoued, W. Lee, and B. Roysam, “Improved automatic detection and segmentation of cell nuclei in histopathology images,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 4, pp. 841–852, 2010.

[29] H. Kong, M. Gurcan, and K. Belkacem-Boussaid, “Partitioning histopathological images: an integrated framework for supervised color-texture segmentation and cell splitting,” IEEE Transactions on Medical Imaging, vol. 30, no. 9, pp. 1661–1677, 2011.

[30] J. Monaco, J. Hipp, D. Lucas, S. Smith, U. Balis, and A. Madabhushi, “Image segmentation with implicit color standardization using spatially constrained expectation maximization: Detection of nuclei,” in Medical Image Computing and Computer-Assisted Intervention-MICCAI, 2012, pp. 365–372.

[31] S. Kothari, J. H. Phan, R. A. Moffitt, T. H. Stokes, S. E. Hassberger, Q. Chaudry, A. N. Young, and M. D. Wang, “Automatic batch-invariant color segmentation of histological cancer images,” in ISBI. IEEE, 2011, pp. 657–660.

[32] C. Tomasi, “Estimating Gaussian Mixture Densities with EM - A Tutorial,” www.cs.duke.edu/courses/spring04/cps196.1/handouts/EM/tomasiEM.pdf, 2004.

[33] A. Ruifrok and D. Johnston, “Quantification of histochemical staining by color decomposition,” Anal Quant Cytol Histology, vol. 23, no. 4, pp. 291–299, 2001.

[34] A. Rabinovich, S. Agarwal, C. Laris, J. H. Price, and S. Belongie, “Unsupervised color decomposition of histologically stained tissue samples,” in NIPS, 2003, pp. 667–674.

[35] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images,” IEEE Transaction on PAMI, vol. 6, no. 6, pp. 721–741, 1984.

[36] A. V. Goldberg and R. E. Tarjan, “A New Approach to Maximum-Flow Problem,” Journal of the Association for Computing Machinery, vol. 35, no. 4, pp. 921–940, 1988.

[37] L. Ford and D. Fulkerson, Flows in Networks. Princeton University Press, 1962.

[38] W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, and A. Schrijver, Combinatorial Optimization. John Wiley & Sons, 1998.

[39] Y. Boykov and V. Kolmogorov, “Computing geodesics and minimal surfaces via graph cuts,” in Proc. of IEEE ICCV, vol. 1, 2003, pp. 26–33.

[40] L. A. Santalo, Integral geometry and geometric probability. Addison-Wesley, 1979.

[41] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Transaction on PAMI, vol. 26, no. 9, pp. 1124–1137, 2004.

[42] S. Raman, C. Maxwell, M. Barcellos-Hoff, and B. Parvin, “Geometric approach segmentation and protein localization in cell cultured assays,” Journal of Microscopy, pp. 427–436, 2007.

[43] Q. Wen, H. Chang, and B. Parvin, “A Delaunay triangulation approach for segmenting clumps of nuclei,” in ISBI, 2009, pp. 9–12.

[44] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[45] S. Monti, P. Tamayo, J. Mesirov, and T. Golub, “Consensus clustering – a resampling-based method for class discovery and visualization of gene expression microarray data,” in Machine Learning, Functional Genomics Special Issue, 2003, pp. 91–118.

[46] J. Han, H. Chang, O. Giricz, G. Lee, F. Baehner, J. Gray, M. Bissell, P. Kenny, and B. Parvin, “Molecular predictors of 3D morphogenesis by breast cancer cells in 3D culture,” PLoS Computational Biology, vol. 6, no. 2, p. e1000684, 2010.

[47] R. G. Verhaak, K. A. Hoadley, E. Purdom, V. Wang, Y. Qi, M. D. Wilkerson, C. R. Miller, L. Ding, T. Golub, J. P. Mesirov, G. Alexe, M. Lawrence, M. O’Kelly, P. Tamayo, B. A. Weir, S. Gabriel, W. Winckler, S. Gupta, L. Jakkula, H. S. Feiler, J. G. Hodgson, C. D. James, J. N. Sarkaria, C. Brennan, A. Kahn, P. T. Spellman, R. K. Wilson, T. P. Speed, J. W. Gray, M. Meyerson, G. Getz, C. M. Perou, D. N. Hayes, and The Cancer Genome Atlas Research Network, “Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1,” Cancer Cell, vol. 17, no. 1, pp. 98–110, 2010.

[48] P. Meier and E. Kaplan, “Nonparametric estimation from incomplete observations,” Journal of American Statistical Association, vol. 53, pp. 457–481, 1958.

[49] G. Smyth, “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 3, 2004.

[50] B. Paugh, L. Bryan, S. Paugh, K. Wilczynska, S. Alvarez, S. Singh, D. Kapitonov, H. Rokita, S. Wright, I. Griswold-Prenner, S. Milstien, S. Spiegel, and T. Kordula, “Interleukin-1 regulates the expression of sphingosine kinase 1 in glioblastoma cells,” The Journal of Biological Chemistry, vol. 284, no. 6, pp. 3408–3417, 2009.

[51] Q. Liu, R. Li, J. Shen, Q. He, L. Deng, C. Zhang, and J. Zhang, “IL-6 promotion of glioblastoma cell invasion and angiogenesis in U251 and T98 cell lines,” Journal of Neurooncology, vol. 100, no. 2, pp. 165–176, 2010.

[52] C. Colin, N. Baeza, C. Bartoli, F. Fina, N. Eudes, I. Nanni, P. Martin, L. Ouafik, and D. Figarella-Branger, “Identification of genes differentially expressed in glioblastoma versus pilocytic astrocytoma using suppression subtractive hybridization,” Oncogenomics, vol. 25, pp. 2818–2826, 2006.

[53] M. Barcellos-Hoff, E. Newcomb, D. Zagzag, and A. Narayana, “Therapeutic targets in malignant glioblastoma microenvironment,” Seminars in Radiation Oncology, vol. 19, pp. 163–170, 2009.

[54] R. Jain, T. Di, D. Duda, J. Loeffler, A. Sorensen, and T. Batchelor, “Angiogenesis in brain tumors,” Nature Reviews Neuroscience, vol. 8, no. 8, pp. 610–622, 2007.

[55] A. Hormigo, B. Ding, and S. Rafii, “A target for antiangiogenic therapy: Vascular endothelium derived from glioblastoma,” Proceedings of the National Academy of Sciences, vol. 108, no. 11, pp. 4271–4272, 2011.

Fig. 12. Classification and segmentation results indicate tolerance to intrinsic variations: (a) Original images; (b) Nuclear/Background classification results via our approach (MRGC); (c) Nuclear partition results via geometric reasoning.