We envision a future where the people who imagine and build technology mirror the people and societies for whom they build it.
Toward Zero Human Efforts: Iterative Training Framework for Noisy
Segmentation Label
Xiaoyang Rebecca Li
University of Houston
4726 Calhoun Rd
1-832-488-9766
Xiaoyang.rebecca.li@gmail.com
Badri Roysam*
University of Houston
4726 Calhoun Rd
1-713-743-1773
broysam@central.uh.edu
Hien Nguyen*
University of Houston
4726 Calhoun Rd
1-713-743-8615
hvnguy35@central.uh.edu
* co-advisors
1. INTRODUCTION
Nuclear detection and segmentation are challenging
in large-scale brain images because of the
heterogeneity within brain cell spatial distributions
and morphologies [1]. Recently, several well-designed networks, such as U-Net [2] and Mask R-CNN [3], have achieved state-of-the-art accuracy on instance segmentation tasks. However, the
performance of these supervised learning networks
heavily relies upon the quality of training samples [4].
Furthermore, human annotation is extremely labor-intensive. Thus, the ability to build a reliable deep network with minimal human annotation is essential.
In this research, we propose an efficient unsupervised
learning framework to robustly segment nuclei. We
first use an iterative training process to improve
segmentation quality without human labels. We then introduce a background boosting technique to enhance segmentation accuracy.
2. RELATED WORK
The primary goal of deep learning is to learn from a
set of training samples and produce a model capable
of predicting similar outcomes on new data. However, when high-quality training samples are unavailable, the model overfits and reproduces the same errors present in its inputs. Recent work on early-stopping techniques offers one way to limit this in deep learning (DL) models, but requires a good validation set to determine when to terminate.
Bootstrap training [5] claims that the classification performance of neural networks can be improved by retraining the network on the results of prior testing outputs. It provides a good foundation for iterative training but does not apply corrections over iterations. A prior study [6], which applies iterative training and graph-cut refinement to the output predictions, is closely related to our work. We expand this idea by addressing the segmentation of crowded objects with a background boosting technique.
3. METHODOLOGY
Motivated by the limitations above, we propose an
unsupervised pipeline that trains deep networks for
cell segmentation, as shown in Fig. 1. Given an
unlabeled training dataset, a watershed clustering
method is used to generate noisy detection and segmentation masks from the DAPI-stained grayscale image. These noisy labels serve as the input for training the initial Mask R-CNN model. Our pipeline then uses a background boosting technique to enhance the output of Mask R-CNN, especially in crowded regions with many cells. The rectified Mask R-CNN output then serves as the new input for updating the model. Our experiments show that this iterative training process can significantly reduce noise and improve the model's performance.
Fig. 1. Fully automatic pipeline for brain cell nuclear
segmentation
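The first stage of the pipeline can be illustrated with a minimal sketch. Here a simple threshold plus connected-component labeling stands in for the compactness-constrained watershed described in the paper (which we do not reproduce); the point is only to show how rough instance labels are produced from an unlabeled DAPI image, noise included. The function name and threshold are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy import ndimage

def noisy_instance_labels(img, thresh=0.5):
    """Generate rough instance labels from a grayscale nuclear image.

    Simplified stand-in for the compactness-constrained watershed:
    threshold the DAPI channel, then label connected components.
    Touching nuclei will be (wrongly) merged -- exactly the kind of
    label noise that iterative training is meant to correct.
    """
    fg = img > thresh                 # crude foreground mask
    labels, n = ndimage.label(fg)     # connected-component instances
    return labels, n
```

These noisy instance maps are then converted to per-object binary masks and fed to the network as its initial training labels.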
3.1 Iterative Training
For an image $x$ with pixel position $p$, its segmentation label at pixel $p$ is denoted by $y_p \in \{0, \dots, N\}$, where $N$ is the total number of objects in the image. For a deep learning network with parameters $\theta$, the output at pixel $p$ is $P(y_p \mid x; \theta)$. However, in our setting, the ground-truth value is unknown and should be regarded as an unobserved latent variable. Thus, the weakly annotated label $\hat{y}_p$ can only be used for initial guidance, i.e.

$\hat{y}_p = \arg\max_{y_p} P(y_p \mid x; \theta_0)$.   (1)
To estimate the ground-truth label $\hat{y}$ and learn the network parameters $\theta$ at the same time, we adopt Expectation-Maximization (EM): following [7], the E-step estimates the latent segmentation using the previous network parameters $\theta$, i.e. the input labels are the network's outputs from the previous iteration; the M-step assigns the optimal value of $\theta$ by minimizing the loss on these estimated labels. We define the loss function as the average IoU between the labels of the current and previous iterations; convergence of the algorithm is observed when this loss barely changes.
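The EM loop above can be sketched as follows. This is a minimal illustration, not the paper's code: `train_fn` and `predict_fn` are hypothetical stand-ins for Mask R-CNN training and inference, and the convergence test uses the mean mask IoU between consecutive iterations as described.

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two binary masks (used as the convergence measure)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def iterative_train(images, noisy_labels, train_fn, predict_fn,
                    max_iters=10, tol=0.99):
    """EM-style iterative training with hypothetical train/predict hooks.

    E-step: predict latent segmentations with the current model.
    M-step: retrain the model on those predictions.
    Stop when the mean IoU between consecutive label sets barely changes.
    """
    labels = noisy_labels
    model = train_fn(images, labels)        # initial fit on noisy labels
    for _ in range(max_iters):
        new_labels = [predict_fn(model, im) for im in images]
        mean_iou = np.mean([mask_iou(a, b)
                            for a, b in zip(labels, new_labels)])
        labels = new_labels
        model = train_fn(images, labels)    # refit on refined labels
        if mean_iou > tol:                  # labels stopped changing
            break
    return model, labels
```

In practice each "iteration" here corresponds to one full Mask R-CNN training run, which is why the number of iterations is kept small (Section 4).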
3.2 Background Boosting (BgBoost)
In weakly supervised training, there commonly exist objects that are never labeled, or labeled so poorly that the network treats them as out-of-model noise. Many of these out-of-model objects lie in the background, outside of the detected regions. Such missing objects cannot be detected because the network never learns their correct labels. To reduce these effects, we apply background boosting to detect objects in the blind regions of the original MRCNN model.
Given an input image $x$ and a trained model $\mathcal{F}$, the segmentation output can be denoted as $B = \mathcal{F}(x)$, where $B$ is a set of binary masks, i.e. $B = \{b_1, b_2, \dots, b_N\}$. The foreground mask describes the location of the pixels that have been detected; we compute it as the union of all binary masks, i.e. $\bar{B} = \bigcup_{i=1}^{N} b_i$.
Background boosting first removes the union of binary masks from the previous iterations, then runs the same detection model on the remaining regions of the image. The maximum number of iteration steps can be set approximately to the maximum number of objects in the image. Note that this technique does not require retraining the network; it aims to make as much use of the single trained model as possible.
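The background boosting loop can be sketched as below. This is an illustrative skeleton under stated assumptions: `model_fn(img, ignore_mask)` is a hypothetical wrapper around the already-trained detector that returns binary masks for objects found outside the ignored region; no retraining happens inside the loop.

```python
import numpy as np

def background_boost(img, model_fn, max_steps=10):
    """Repeatedly re-run one trained detector on the undetected background.

    At each step, the union of all masks found so far is removed from
    consideration and the same model is applied to what remains, so
    objects in the model's former blind regions get a chance to be found.
    """
    detected = np.zeros(img.shape[:2], dtype=bool)
    all_masks = []
    for _ in range(max_steps):
        masks = model_fn(img, detected)      # detect only outside `detected`
        new = [m for m in masks
               if not np.logical_and(m, detected).any()]
        if not new:                          # nothing left in the background
            break
        for m in new:
            detected |= m                    # grow the union of binary masks
        all_masks.extend(new)
    return all_masks
```

The early exit when no new masks appear matches the behavior observed in Section 4.1, where all validation samples stop before 10 iterations.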
Nuclear clumps, i.e. groups of closely adjacent or partially overlapping cells, are one of the main challenges in nuclear segmentation. Our method (*) can tackle the clump problem because it is able to segment a subset of the objects in a clump, and the region proposal network tends to look for strong candidates given a fixed number of proposals. In this way, the network favors the correctly segmented objects and neglects the wrong ones. Examples of objects removed in previous iterations are shown as gray areas in Fig. 2. In general, big clumps break into small clumps and become much easier to recognize.
4. RESULTS AND CONTRIBUTIONS
The training set consists of 6,000 small images of size 512x512, cropped from a whole rat-brain image. 181 of these randomly cropped images are human-annotated for result validation. None of the human annotations are used in training. Training MRCNN for one iteration takes five hours on a GPU, while testing on the whole dataset takes three hours. Due to the extensive training time, we ran only two iterations of training; nonetheless, the results show stable improvement after these training sessions.
4.1 BgBoost Discussions
Background boosting helps detect out-of-model objects in the background. We examine the 181 validation images of size 512x512 and record the cell detection results from each iteration. As can be seen in Fig. 3(1), all the testing samples stop before 10 iterations, and the average number of cells detected in the background increases over iterations, which also verifies the convergence of the algorithm. Fig. 3(2) shows the distribution of the stopping iterations over all
validation samples.

Fig. 2. An example of background boosting (panels: input image, iterations 0, 1, and 2, and the final result)

Most of the samples stop
between iterations 3 and 5. The image samples
stopping at early iterations often have sparsely
distributed cells, whereas the nuclei in the images
that stop at a later iteration tend to be densely
packed. It verified that background boosting greatly
benefits the crowded regions.
4.2 Overall Performance
Method                              mIoU   F1@IoU0.5   F1@IoU0.75
Yousef                              56.3   46.1        8.4
Watershed (training input)          75.7   73.2        55.6
Original MRCNN                      74.2   72.3        48.9
Iterative training                  76.2   74.8        53.5
*Iterative training with BgBoost    79.1   81.1        63.9

Table 1. Performance metrics comparison
To evaluate the segmentation performance, we measure the Intersection-over-Union (IoU) and F1 scores at IoU thresholds 0.5 and 0.75, as shown in Table 1. The performance of the parametric methods, i.e. the multi-scale LoG with graph cuts by Yousef and the compactness-constrained watershed, is listed in the first two rows of Table 1. The watershed result is also used as the noisy labeled annotation input for MRCNN training. Directly applying the original MRCNN to the noisy labeled training set yields a performance drop compared with the training input itself. Iterative training and refinement primarily help to recover the out-of-model objects and increase the IoU and F1 performance.
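For reference, F1 at an IoU threshold can be computed as below. The paper does not spell out its matching rule, so the greedy one-to-one matching used here is an assumption, chosen because it is a common convention for instance segmentation evaluation.

```python
import numpy as np

def f1_at_iou(pred_masks, gt_masks, thresh=0.5):
    """F1 score with greedy one-to-one matching at an IoU threshold.

    A prediction counts as a true positive if it overlaps an unmatched
    ground-truth mask with IoU >= thresh; each ground-truth mask can be
    matched at most once.
    """
    def iou(a, b):
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union if union else 0.0

    unmatched_gt = list(gt_masks)
    tp = 0
    for p in pred_masks:
        scores = [iou(p, g) for g in unmatched_gt]
        if scores and max(scores) >= thresh:
            tp += 1
            unmatched_gt.pop(int(np.argmax(scores)))
    fp = len(pred_masks) - tp
    fn = len(gt_masks) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Raising the threshold from 0.5 to 0.75 demands tighter mask overlap, which is why the F1@IoU0.75 column in Table 1 separates the methods more sharply.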
4.3 Contributions
Our framework of iterative training from noisy models with background boosting shows significant improvement on the compacted-object separation problem. Our main contributions are:
1. Using an iterative training process to improve segmentation quality without human labels.
2. Introducing a background boosting technique to enhance segmentation accuracy.
3. Showing that the technique can easily be applied to data-driven models other than MRCNN.
5. REFERENCES
[1] Bougen-Zhukov, N., Loh, S.Y., Lee, H.K., and Loo, L.H. "Large-scale image-based screening and profiling of cellular phenotypes." Cytometry Part A 91.2 (2017): 115-125.
[2] Ronneberger, O., Fischer, P., and Brox, T. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
[3] He, Kaiming, et al. "Mask R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[4] Caicedo, J.C., Goodman, A., Karhohs, K.W., Cimini, B.A., Ackerman, J., Haghighi, M., Heng, C., Becker, T., Doan, M., McQuin, C., and Rohban, M. "Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl." Nature Methods 16.12 (2019): 1247-1253.
[5] Reed, Scott, et al. "Training deep neural networks on noisy labels with bootstrapping." arXiv preprint arXiv:1412.6596 (2014).
[6] Zhao, Xiangyun, Shuang Liang, and Yichen Wei. "Pseudo mask augmented object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[7] Khoreva, Anna, et al. "Simple does it: Weakly supervised instance and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Fig. 3. BgBoost over iterations: (1) cells detected in the background per iteration; (2) distribution of stopping iterations over the validation samples