We envision a future where the people who imagine and build technology mirror the people and societies for whom they build it.
Toward Zero Human Efforts: Iterative Training Framework for Noisy
Segmentation Label
Xiaoyang Rebecca Li
University of Houston
4726 Calhoun Rd
1-832-488-9766
Xiaoyang.rebecca.li@gmail.com
Badri Roysam*
University of Houston
4726 Calhoun Rd
1-713-743-1773
broysam@central.uh.edu
Hien Nguyen*
University of Houston
4726 Calhoun Rd
1-713-743-8615
hvnguy35@central.uh.edu
* co-advisors
1. INTRODUCTION
Nuclear detection and segmentation are challenging
in large-scale brain images because of the
heterogeneity within brain cell spatial distributions
and morphologies [1]. Recently, several well-designed networks, such as U-Net [2] and Mask R-CNN [3], have achieved state-of-the-art accuracy on instance segmentation tasks. However, the
performance of these supervised learning networks
heavily relies upon the quality of training samples [4].
Furthermore, human annotation is extremely labor-intensive. Thus, the ability to build a reliable deep network with minimal human annotation is essential.
In this research, we propose an efficient unsupervised
learning framework to robustly segment nuclei. We
first use an iterative training process to improve
segmentation quality without human labels. We then introduce a background boosting technique to enhance segmentation accuracy.
2. RELATED WORK
The primary goal of deep learning is to learn from a
set of training samples and produce a model capable
of predicting similar outcomes on new data. However, when high-quality training samples are unavailable, the model overfits and reproduces the same errors present in its inputs. Recent work on early-stopping techniques offers one way to limit this in deep learning (DL) models, but requires a good validation set to determine when to terminate.
Bootstrap training [5] claims that the classification performance of neural networks can be improved by retraining the network on the results of prior testing outputs. It provides a good foundation for iterative training but does not apply corrections over iterations. A prior study [6], which applies iterative training and graph-cut refinement to the output predictions, is closely related to our work. We expand this idea by addressing the segmentation of crowded objects with a background boosting technique.
3. METHODOLOGY
Motivated by the limitations above, we propose an
unsupervised pipeline that trains deep networks for
cell segmentation, as shown in Fig. 1. Given an
unlabeled training dataset, a watershed clustering
method is used to generate noisy detection and segmentation masks from the DAPI-stained grayscale image. These noisy labels serve as the input for training the initial Mask R-CNN model. Our pipeline then uses a background boosting technique to enhance the output of Mask R-CNN, especially in crowded regions with many cells. The rectified Mask R-CNN output then serves as the new input for updating the model. Our experiments show that this iterative training process can significantly reduce noise and improve the model's performance.
Fig. 1. Fully automatic pipeline for brain cell nuclear
segmentation
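The first stage of the pipeline can be illustrated with a minimal sketch. Here a simple threshold plus connected-component labeling stands in for the compactness-constrained watershed described in the paper (which we do not reproduce); the point is only to show how rough instance labels are produced from an unlabeled DAPI image, noise included. The function name and threshold are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy import ndimage

def noisy_instance_labels(img, thresh=0.5):
    """Generate rough instance labels from a grayscale nuclear image.

    Simplified stand-in for the compactness-constrained watershed:
    threshold the DAPI channel, then label connected components.
    Touching nuclei will be (wrongly) merged -- exactly the kind of
    label noise that iterative training is meant to correct.
    """
    fg = img > thresh                 # crude foreground mask
    labels, n = ndimage.label(fg)     # connected-component instances
    return labels, n
```

These noisy instance maps are then converted to per-object binary masks and fed to the network as its initial training labels.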
3.1 Iterative Training
For an image $x$ with pixel position $p$, its segmentation label at pixel $p$ is denoted by $y_p \in \{0, \dots, N\}$, where $N$ is the total number of objects in the image. For a deep learning network with parameters $\theta$, the output at pixel $p$ is $P(y_p \mid x; \theta)$. However, in our setting, the ground-truth value is unknown and should be regarded as an unobserved latent variable. Thus, the weakly annotated label $\hat{y}_p$ can only be used for initial guidance, i.e.

$\hat{y}_p = \arg\max_{y_p} P(y_p \mid x; \theta_0)$.   (1)
To estimate the ground-truth label $\hat{y}$ and learn the network parameters $\theta$ at the same time, we adopt Expectation-Maximization (EM): following [7], the E-step estimates the latent segmentation using the previous network parameters $\theta$, i.e. the input labels are the network's outputs from the previous iteration; the M-step assigns the optimal value of $\theta$ by minimizing the loss on these estimated labels. We define the loss function as the average IoU between the labels of the current and previous iterations; convergence of the algorithm is observed when this loss barely changes.
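The EM loop above can be sketched as follows. This is a minimal illustration, not the paper's code: `train_fn` and `predict_fn` are hypothetical stand-ins for Mask R-CNN training and inference, and the convergence test uses the mean mask IoU between consecutive iterations as described.

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two binary masks (used as the convergence measure)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def iterative_train(images, noisy_labels, train_fn, predict_fn,
                    max_iters=10, tol=0.99):
    """EM-style iterative training with hypothetical train/predict hooks.

    E-step: predict latent segmentations with the current model.
    M-step: retrain the model on those predictions.
    Stop when the mean IoU between consecutive label sets barely changes.
    """
    labels = noisy_labels
    model = train_fn(images, labels)        # initial fit on noisy labels
    for _ in range(max_iters):
        new_labels = [predict_fn(model, im) for im in images]
        mean_iou = np.mean([mask_iou(a, b)
                            for a, b in zip(labels, new_labels)])
        labels = new_labels
        model = train_fn(images, labels)    # refit on refined labels
        if mean_iou > tol:                  # labels stopped changing
            break
    return model, labels
```

In practice each "iteration" here corresponds to one full Mask R-CNN training run, which is why the number of iterations is kept small (Section 4).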
3.2 Background Boosting (BgBoost)
In weakly supervised training, there commonly exist objects that are never labeled, or labeled so poorly that the network treats them as out-of-model noise. Many of these out-of-model objects lie in the background, outside of the detected regions. Such missing objects cannot be detected because the network never learns their correct labels. To reduce these effects, we apply background boosting to detect objects in the blind regions of the original MRCNN model.
Given an input image $x$ and a trained model $\mathcal{F}$, the segmentation output can be denoted as $B = \mathcal{F}(x)$, where $B$ is a set of binary masks, i.e. $B = \{b_1, b_2, \dots, b_N\}$. The foreground mask describes the location of the pixels that have been detected; we compute it as the union of all binary masks, i.e. $\bar{B} = \bigcup_{i=1}^{N} b_i$.
Background boosting first removes the union of binary masks from the previous iterations, then runs the same detection model on the remaining regions of the image. The maximum number of iteration steps can be set approximately to the maximum number of objects in the image. Note that this technique does not require retraining the network; it aims to make as much use of the single trained model as possible.
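The background boosting loop can be sketched as below. This is an illustrative skeleton under stated assumptions: `model_fn(img, ignore_mask)` is a hypothetical wrapper around the already-trained detector that returns binary masks for objects found outside the ignored region; no retraining happens inside the loop.

```python
import numpy as np

def background_boost(img, model_fn, max_steps=10):
    """Repeatedly re-run one trained detector on the undetected background.

    At each step, the union of all masks found so far is removed from
    consideration and the same model is applied to what remains, so
    objects in the model's former blind regions get a chance to be found.
    """
    detected = np.zeros(img.shape[:2], dtype=bool)
    all_masks = []
    for _ in range(max_steps):
        masks = model_fn(img, detected)      # detect only outside `detected`
        new = [m for m in masks
               if not np.logical_and(m, detected).any()]
        if not new:                          # nothing left in the background
            break
        for m in new:
            detected |= m                    # grow the union of binary masks
        all_masks.extend(new)
    return all_masks
```

The early exit when no new masks appear matches the behavior observed in Section 4.1, where all validation samples stop before 10 iterations.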
Nuclear clumps, i.e. groups of closely adjacent or partially overlapping cells, are one of the main challenges in nuclear segmentation. Our method (*) can tackle the clump problem because it is able to segment a subset of the objects in a clump, and the region proposal network tends to look for strong candidates given a fixed number of proposals. In this way, the network favors the correctly segmented objects and neglects the wrong ones. Examples of objects removed in previous iterations are shown as gray areas in Fig. 2. In general, big clumps break into small clumps and become much easier to recognize.
4. RESULTS AND CONTRIBUTIONS
The training set consists of 6,000 small images of size 512x512, cropped from a whole rat-brain image. 181 of these randomly cropped images are human-annotated for result validation. None of the human annotations are used in training. Training MRCNN for one iteration takes five hours on a GPU, while testing on the whole dataset takes three hours. Due to the extensive training time, we ran only two iterations of training; nonetheless, the results show stable improvement after these training sessions.
4.1 BgBoost Discussions
Background boosting helps detect out-of-model objects in the background. We examine the 181 validation images of size 512x512 and record the cell detection results from each iteration. As can be seen in Fig. 3(1), all the testing samples stop before 10 iterations, and the average number of cells detected in the background increases over iterations, which also verifies the convergence of the algorithm. Fig. 3(2) shows the distribution of the stopping iterations over all
validation samples.

Fig. 2. An example of background boosting (panels: input image, iterations 0, 1, and 2, and the final result)

Most of the samples stop
between iterations 3 and 5. The image samples
stopping at early iterations often have sparsely
distributed cells, whereas the nuclei in the images
that stop at a later iteration tend to be densely
packed. It verified that background boosting greatly
benefits the crowded regions.
4.2 Overall Performance
Method                              mIoU   F1@IoU0.5   F1@IoU0.75
Yousef                              56.3   46.1        8.4
Watershed (training input)          75.7   73.2        55.6
Original MRCNN                      74.2   72.3        48.9
Iterative training                  76.2   74.8        53.5
*Iterative training with BgBoost    79.1   81.1        63.9

Table 1. Performance metrics comparison
To evaluate the segmentation performance, we measure the Intersection-over-Union (IoU) and F1 scores at IoU thresholds 0.5 and 0.75, as shown in Table 1. The performance of the parametric methods, i.e. the multi-scale LoG with graph cuts by Yousef and the compactness-constrained watershed, is listed in the first two rows of Table 1. The watershed result is also used as the noisy labeled annotation input for MRCNN training. Directly applying the original MRCNN to the noisy labeled training set yields a performance drop compared with the training input itself. Iterative training and refinement primarily help to recover the out-of-model objects and increase the IoU and F1 performance.
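For reference, F1 at an IoU threshold can be computed as below. The paper does not spell out its matching rule, so the greedy one-to-one matching used here is an assumption, chosen because it is a common convention for instance segmentation evaluation.

```python
import numpy as np

def f1_at_iou(pred_masks, gt_masks, thresh=0.5):
    """F1 score with greedy one-to-one matching at an IoU threshold.

    A prediction counts as a true positive if it overlaps an unmatched
    ground-truth mask with IoU >= thresh; each ground-truth mask can be
    matched at most once.
    """
    def iou(a, b):
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union if union else 0.0

    unmatched_gt = list(gt_masks)
    tp = 0
    for p in pred_masks:
        scores = [iou(p, g) for g in unmatched_gt]
        if scores and max(scores) >= thresh:
            tp += 1
            unmatched_gt.pop(int(np.argmax(scores)))
    fp = len(pred_masks) - tp
    fn = len(gt_masks) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Raising the threshold from 0.5 to 0.75 demands tighter mask overlap, which is why the F1@IoU0.75 column in Table 1 separates the methods more sharply.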
4.3 Contributions
Our framework of iterative training from noisy models with background boosting shows significant improvement on the compacted-object separation problem. Our main contributions are:
1. Using an iterative training process to improve segmentation quality without human labels.
2. Introducing a background boosting technique to enhance segmentation accuracy.
3. Showing that the technique can easily be applied to data-driven models other than MRCNN.
5. REFERENCES
[1] Bougen-Zhukov, N., Loh, S.Y., Lee, H.K., and Loo, L.H. "Large-scale image-based screening and profiling of cellular phenotypes." Cytometry Part A 91.2 (2017): 115-125.
[2] Ronneberger, O., Fischer, P., and Brox, T. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
[3] He, Kaiming, et al. "Mask R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[4] Caicedo, J.C., Goodman, A., Karhohs, K.W., Cimini, B.A., Ackerman, J., Haghighi, M., Heng, C., Becker, T., Doan, M., McQuin, C., and Rohban, M. "Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl." Nature Methods 16.12 (2019): 1247-1253.
[5] Reed, Scott, et al. "Training deep neural networks on noisy labels with bootstrapping." arXiv preprint arXiv:1412.6596 (2014).
[6] Zhao, Xiangyun, Shuang Liang, and Yichen Wei. "Pseudo mask augmented object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[7] Khoreva, Anna, et al. "Simple does it: Weakly supervised instance and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Fig. 3. BgBoost over iterations: (1) cells detected in the background per iteration; (2) distribution of stopping iterations over the validation samples