
Toward Zero Human Efforts: Iterative Training Framework for Noisy Segmentation Label


Abstract

We propose an efficient unsupervised learning framework to robustly segment nuclei. We first use an iterative training process to improve segmentation quality without human labels. We then introduce a background boosting technique to enhance segmentation accuracy.
Xiaoyang Rebecca Li
University of Houston
4726 Calhoun Rd
1-832-488-9766
Xiaoyang.rebecca.li@gmail.com

Badri Roysam*
University of Houston
4726 Calhoun Rd
1-713-743-1773
broysam@central.uh.edu

Hien Nguyen*
University of Houston
4726 Calhoun Rd
1-713-743-8615
hvnguy35@central.uh.edu

* co-advisors
1. INTRODUCTION
Nuclear detection and segmentation are challenging in large-scale brain images because of the heterogeneity of brain cell spatial distributions and morphologies [1]. Recently, several well-designed networks, such as UNet [2] and MaskRCNN [3], have achieved state-of-the-art accuracy on instance segmentation tasks. However, the performance of these supervised learning networks relies heavily on the quality of the training samples [4]. Furthermore, human annotation is extremely labor-intensive. Thus, being able to build a reliable deep network with minimal human annotation is essential.
In this research, we propose an efficient unsupervised learning framework to robustly segment nuclei. We first use an iterative training process to improve segmentation quality without human labels. We then introduce a background boosting technique to enhance segmentation accuracy.
2. RELATED WORK
The primary goal of deep learning is to learn from a set of training samples and produce a model capable of predicting a similar outcome. However, when high-quality training samples are not available, the model suffers from overfitting and reproduces the same errors as its inputs. Recent work on early stopping techniques offers a new perspective on training deep learning (DL) models but requires a good validation set to determine termination. Bootstrap training [5] claims that the classification performance of neural networks can be improved by retraining the network on the results of prior testing outputs. It provides a good foundation for iterative training but does not involve corrections over iterations. A prior study [6], which applies iterative training and graph-cut refinements to the output predictions, is very similar to our work. Our work expands this idea by addressing the segmentation of crowded objects with a background boosting technique.
3. METHODOLOGY
Motivated by the limitations above, we propose an unsupervised pipeline that trains deep networks for cell segmentation, as shown in Fig. 1. Given an unlabeled training dataset, a watershed clustering method is used to generate noisy detection and segmentation masks from the DAPI-stained grayscale image. These noisy labels serve as the input for training the initial MaskRCNN model. Our pipeline then uses a background boosting technique to enhance the output of MaskRCNN, especially in crowded regions with many cells. The rectified MaskRCNN output then serves as the new input for updating the model. Our experiments show that this iterative training process can significantly reduce noise and improve the model's performance.
Fig. 1. Fully automatic pipeline for brain cell nuclear
segmentation
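As a rough illustration of the label-generation step, the sketch below produces noisy instance masks from a grayscale nuclear image with a distance-transform watershed. It is a minimal version assuming scipy is available; the threshold values are illustrative, not the ones used in our experiments.

```python
import numpy as np
from scipy import ndimage as ndi

def noisy_watershed_labels(gray, fg_thresh=0.5, peak_frac=0.7):
    """Generate noisy instance labels from a grayscale image,
    with no human annotation involved."""
    fg = gray > fg_thresh                         # crude foreground mask
    dist = ndi.distance_transform_edt(fg)         # distance to background
    markers, _ = ndi.label(dist > peak_frac * dist.max())  # one seed per peak
    # flood the inverted distance map outward from the seeds
    elevation = (dist.max() - dist).astype(np.uint16)
    labels = ndi.watershed_ift(elevation, markers)
    labels[~fg] = 0                               # keep background unlabeled
    return labels
```

These labels are deliberately imperfect; they only need to be good enough to bootstrap the first round of MaskRCNN training.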
3.1 Iterative Training
For an image I, let x denote a pixel position and y_x the segmentation label at that pixel, where y_x ∈ {0, ..., K} and K is the total number of objects in the image. For a deep learning network with parameters θ, the output at pixel x is the posterior p(y_x | x; θ). However, in our setting, the ground-truth value of y_x is unknown and should be regarded as an unobserved latent variable. Thus, the weakly annotated label can only be used for initial guidance, i.e.,

ŷ_x = argmax_y p(y | x; θ_0). (1)
To estimate the ground-truth labels and learn the network parameters θ at the same time, we adopt Expectation-Maximization (EM): the E-step infers the latent segmentation using the previous network parameters θ, following [7], i.e., the input labels to the network are its own outputs from the previous iteration; the M-step assigns the optimal value of θ by minimizing the training loss on those labels. We define the convergence measure as the average IoU between the labels of the current and previous iterations; the algorithm has converged when this value barely changes.
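The alternation above can be sketched as follows. Here train_fn and predict_fn are hypothetical placeholders for the MaskRCNN training and inference routines, and convergence is declared when the labels' IoU with the previous iteration approaches 1:

```python
import numpy as np

def mean_iou(a, b):
    """Average IoU between two equally shaped stacks of binary masks."""
    inter = np.logical_and(a, b).sum(axis=(1, 2))
    union = np.logical_or(a, b).sum(axis=(1, 2))
    return float(np.mean(inter / np.maximum(union, 1)))

def iterative_training(images, noisy_labels, train_fn, predict_fn,
                       max_iters=10, tol=1e-3):
    """EM-style loop: the M-step refits the model on the current labels,
    the E-step replaces the labels with the model's own predictions."""
    labels = noisy_labels
    for _ in range(max_iters):
        model = train_fn(images, labels)          # M-step
        new_labels = predict_fn(model, images)    # E-step
        converged = mean_iou(labels, new_labels) > 1 - tol
        labels = new_labels
        if converged:
            break
    return model, labels
```

In practice each train_fn call is a full MaskRCNN training round, so the number of outer iterations is small.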
3.2 Background Boosting (BgBoost)
In weakly supervised training, there commonly exist objects that are never labeled or are labeled poorly, so the network treats them as out-of-model noise. Many of these out-of-model objects are located in background regions, outside the detected areas. These missing objects cannot be detected because the network never learns their correct labels. To reduce this effect, we apply background boosting to detect the objects in the blind regions of the original MRCNN model.
Given an input image I and a trained model f, the segmentation output can be denoted as S = f(I), where S is a set of binary masks, i.e., S = {s_1, s_2, ..., s_n}. Each foreground mask describes the locations of the pixels that have been detected. We compute B as the binary union of all the masks, i.e., B = ∪_i s_i.
Background boosting first removes the union of the binary masks from the previous iterations, then runs the same detection model on the remaining regions of the image. The maximum number of iterations can be set approximately to the maximum number of objects in the image. Note that this technique does not require training the network again; it aims to reuse the single trained model as much as possible.
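A minimal sketch of this loop, where predict_fn stands in for the single trained detector (the mask format and stopping rule are simplified assumptions):

```python
import numpy as np

def background_boost(predict_fn, image, max_steps=20):
    """Re-run one trained detector while hiding everything it has
    already found, so out-of-model background objects get detected.
    No retraining is involved."""
    all_masks = []
    covered = np.zeros(image.shape, dtype=bool)   # union of prior masks
    for _ in range(max_steps):
        remaining = image * ~covered              # blank out detected pixels
        masks = predict_fn(remaining)             # list of binary masks
        new = [m for m in masks if not (m & covered).any()]
        if not new:
            break                                 # nothing new: converged
        all_masks.extend(new)
        covered |= np.any(np.stack(new), axis=0)
    return all_masks
```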
Nuclear clumps, i.e., groups of closely adjacent or partially overlapping cells, are one of the main challenges in nuclear segmentation. Our method can tackle the clump problem because it is able to segment a subset of the objects in a clump at each pass, and the region proposal network tends to select strong candidates given a fixed number of proposals. In this way, the network favors the correctly segmented objects and neglects the wrong ones. Examples of objects removed in previous iterations are shown as the gray areas in Fig. 2. In general, big clumps break into small clumps that become much easier to recognize.
4. RESULTS AND CONTRIBUTIONS
The training set consists of 6,000 small images of size 512x512, cropped from a whole rat brain image. 181 of these randomly cropped images are human-annotated for result validation; none of the human annotations are used in training. Training MRCNN for one iteration takes five hours on a GPU, while testing on the whole dataset takes three hours. Due to the extensive training time, we ran only two iterations of training; nonetheless, the results show stable improvement after these training sessions.
4.1 BgBoost Discussions
Background boosting helps detect out-of-model objects in the background. We examine 181 validation images of size 512x512 and record the cell detection results from each iteration. As can be seen in Fig. 3(1), all the testing samples stop before 10 iterations, and the average number of cells detected in the background increases over iterations, which also verifies the convergence of the algorithm. Fig. 3(2) shows the distribution of the stopping iterations over all validation samples. Most of the samples stop between iterations 3 and 5. The image samples stopping at early iterations often have sparsely distributed cells, whereas the nuclei in images that stop at a later iteration tend to be densely packed. This verifies that background boosting greatly benefits crowded regions.

Fig. 2. An example of background boosting (columns: input image, Iter 0, Iter 1, Iter 2, final result)
4.2 Overall Performance
Method              mIoU   F1@IoU0.5   F1@IoU0.75
Multi-scale LoG     56.3   46.1        8.4
Watershed           75.7   73.2        55.6
MRCNN               74.2   72.3        48.9
Iterative training  76.2   74.8        53.5
BgBoost             79.1   81.1        63.9

Table 1. Performance metrics comparisons
To evaluate the segmentation performance, we measure the Intersection-over-Union (IoU) and the F1 scores at IoU thresholds 0.5 and 0.75, as shown in Table 1. The performance of the parametric methods, including multi-scale LoG with graph cuts by Yousef and compactness-constrained watershed, is listed in the first two rows of Table 1. The watershed result is also used as the noisy label input for MRCNN training. Directly applying the original MRCNN to the noisy-labeled training set yields a performance drop relative to its noisy training labels. Iterative training and refinement primarily help to recover the out-of-model objects and increase the IoU and F1 performance.
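For reference, F1 at an IoU threshold can be computed with a simple greedy matching between predicted and ground-truth instance masks. This is a sketch of the metric; public implementations differ slightly in how they resolve matches:

```python
import numpy as np

def f1_at_iou(pred_masks, gt_masks, thresh=0.5):
    """A prediction counts as a true positive if it matches an
    unused ground-truth mask with IoU >= thresh."""
    used, tp = set(), 0
    for p in pred_masks:
        best, best_j = 0.0, None
        for j, g in enumerate(gt_masks):
            if j in used:
                continue
            iou = np.logical_and(p, g).sum() / max(np.logical_or(p, g).sum(), 1)
            if iou > best:
                best, best_j = iou, j
        if best >= thresh:
            tp += 1
            used.add(best_j)
    fp = len(pred_masks) - tp          # unmatched predictions
    fn = len(gt_masks) - tp            # missed ground-truth objects
    return 2 * tp / max(2 * tp + fp + fn, 1)
```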
4.3 Contributions
Our framework of iterative training from noisy labels with background boosting shows significant improvement on the compacted-object separation problem. Our main contributions are:
1. Using an iterative training process to improve segmentation quality without human labels
2. Introducing a background boosting technique to enhance segmentation accuracy
3. Providing a technique that can easily be applied to data-driven models other than MRCNN
5. REFERENCES
[1] Bougen-Zhukov, N., Loh, S. Y., Lee, H. K., and Loo, L. H. "Large-scale image-based screening and profiling of cellular phenotypes." Cytometry Part A 91.2 (2017): 115-125.
[2] Ronneberger, O., Fischer, P., and Brox, T. "U-Net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
[3] He, Kaiming, et al. "Mask R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[4] Caicedo, J. C., et al. "Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl." Nature Methods 16.12 (2019): 1247-1253.
[5] Reed, Scott, et al. "Training deep neural networks on noisy labels with bootstrapping." arXiv preprint arXiv:1412.6596 (2014).
[6] Zhao, Xiangyun, Shuang Liang, and Yichen Wei. "Pseudo mask augmented object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[7] Khoreva, Anna, et al. "Simple does it: Weakly supervised instance and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Fig. 3. BgBoost over iterations. (1) Average number of background cells detected per iteration. (2) Distribution of stopping iterations over validation samples.