Fig. 2, uploaded by Olaf Ronneberger

# The 3D u-net architecture. Blue boxes represent feature maps. The number of channels is denoted above each feature map.

Source publication
Article
Full-text available
This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases of this method: (1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. (2)...

## Contexts in source publication

Context 1
... biomedical 2D images can be segmented with an accuracy close to human performance by CNNs today [11,12,3]. Due to this success, several attempts have been made to apply 3D CNNs to biomedical volumetric data. Milletari et al. [9] present a CNN combined with a Hough voting approach for 3D segmentation. However, their method is not end-to-end and only works for compact blob-like structures. The approach of Kleesiek et al. [6] is one of the few end-to-end 3D CNN approaches for 3D segmentation. However, their network is not deep and has only one max pooling after the first convolutions; therefore, it is unable to analyze structures at multiple scales. Our work is based on the 2D u-net [11], which won several international segmentation and tracking competitions in 2015. The architecture and the data augmentation of the u-net allow learning models with very good generalization performance from only a few annotated samples. It exploits the fact that properly applied rigid transformations and slight elastic deformations still yield biologically plausible images. Up-convolutional architectures like the fully convolutional networks for semantic segmentation [8] and the u-net are still not widespread, and we know of only one attempt to generalize such an architecture to 3D [14]. In that work by Tran et al., the architecture is applied to videos, and full annotation is available for training. The highlight of the present paper is that the network can be trained from scratch on sparsely annotated volumes and can work on arbitrarily large volumes thanks to its seamless tiling strategy. Figure 2 illustrates the network architecture. Like the standard u-net, it has an analysis and a synthesis path, each with four resolution steps. In the analysis path, each layer contains two 3 × 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and then a 2 × 2 × 2 max pooling with strides of two in each dimension. 
In the synthesis path, each layer consists of a 2 × 2 × 2 up-convolution with strides of two in each dimension, followed by two 3 × 3 × 3 convolutions, each followed by a ReLU. Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path. In the last layer, a 1 × 1 × 1 convolution reduces the number of output channels to the number of labels, which is 3 in our case. The architecture has 19,069,955 parameters in total. As suggested in [13], we avoid bottlenecks by doubling the number of channels already before max pooling. We also adopt this scheme in the synthesis ...
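As a rough sanity check, the layer sizes described above can be tallied in a few lines of Python. The channel widths below are read off Figure 2 of the 3D U-Net paper and should be treated as assumptions of this sketch; a plain weights-plus-bias count lands within about 0.005% of the quoted 19,069,955 total (the small residual presumably comes from batch-normalization or bias bookkeeping not modeled here).

```python
# Rough parameter count for the 3D U-Net as described above (a sketch;
# the exact bookkeeping in the original may differ slightly).
K = 27   # 3 x 3 x 3 convolution kernel volume
UP = 8   # 2 x 2 x 2 up-convolution kernel volume

def conv(cin, cout, k=K):
    """Parameters of one convolution: weights plus one bias per output channel."""
    return k * cin * cout + cout

total = 0
# Analysis path: two 3x3x3 convs per resolution step; the channel count
# doubles already before each max pooling to avoid bottlenecks.
enc = [(1, 32), (32, 64), (64, 64), (64, 128),
       (128, 128), (128, 256), (256, 256), (256, 512)]
for cin, cout in enc:
    total += conv(cin, cout)
# Synthesis path: a 2x2x2 up-conv, then two 3x3x3 convs whose input channel
# counts include the concatenated shortcut features from the analysis path.
dec = [(512, 512, UP), (768, 256, K), (256, 256, K),
       (256, 256, UP), (384, 128, K), (128, 128, K),
       (128, 128, UP), (192, 64, K), (64, 64, K)]
for cin, cout, k in dec:
    total += conv(cin, cout, k)
total += conv(64, 3, 1)  # final 1x1x1 convolution to the 3 labels
print(total)  # 19069123, close to the quoted 19,069,955
```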

## Citations

... The deep-learning-based contouring software (INTContour, Carina Medical LLC, Lexington, KY) employs a 3D U-Net structure [27] for organ segmentation. The algorithm achieved good performance in the 2017 AAPM thoracic challenge [28] and the 2019 RT-MAC challenge [29]. ...
Article
Full-text available
Abstract Background Impaired function of the masticatory muscles will lead to trismus. Routine delineation of these muscles during planning may improve dose tracking and facilitate dose reduction, resulting in decreased radiation-related trismus. This study aimed to compare a deep learning model with a commercial atlas-based model for fast auto-segmentation of the masticatory muscles on head and neck computed tomography (CT) images. Material and methods Paired masseter (M), temporalis (T), and medial and lateral pterygoid (MP, LP) muscles were manually segmented on 56 CT images. CT images were randomly divided into training (n = 27) and validation (n = 29) cohorts. Two methods were used for automatic delineation of the masticatory muscles (MMs): deep learning auto-segmentation (DLAS) and atlas-based auto-segmentation (ABAS). The automatic algorithms were evaluated using the Dice similarity coefficient (DSC), recall, precision, Hausdorff distance (HD), HD95, and mean surface distance (MSD). A consolidated score was calculated by normalizing the metrics against interobserver variability and averaging over all patients. Differences in dose (∆Dose) to MMs for DLAS and ABAS segmentations were assessed. A paired t-test was used to compare the geometric and dosimetric differences between the DLAS and ABAS methods. Results DLAS outperformed ABAS in delineating all MMs (p < 0.05). DLAS-based contours had dose endpoints more closely matched with those of the manually segmented contours when compared with ABAS. Conclusions DLAS auto-segmentation of the masticatory muscles for head and neck radiotherapy had improved segmentation accuracy compared with ABAS, with no qualitative difference in dosimetric endpoints compared to manually segmented contours.
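The Dice similarity coefficient (DSC) that anchors the geometric evaluation above has a one-line definition, 2|A∩B| / (|A| + |B|). A minimal sketch on flattened binary masks (the function name and toy masks are illustrative, not from the paper):

```python
def dice(a, b):
    """Dice similarity coefficient of two binary masks given as flat 0/1 lists."""
    intersection = sum(x & y for x, y in zip(a, b))
    return 2.0 * intersection / (sum(a) + sum(b))

# Toy example: two 2x2 masks flattened, overlapping in one voxel.
auto = [1, 1, 0, 0]    # automatic segmentation
manual = [1, 0, 1, 0]  # manual reference
print(dice(auto, manual))  # 0.5
```

A DSC of 1.0 means perfect overlap; normalizing such metrics against interobserver variability, as done above, accounts for the fact that even two human raters rarely reach 1.0.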
... Recent advances in deep learning, such as the emergence of fully convolutional networks (FCNs), have enabled the training of models for semantic segmentation tasks 6 . In particular, the 3D U-Net, which has a contracting path and a symmetric expanding path, has proven effective for 3D medical image segmentation 7 . Some authors have proposed novel cascaded architectures, such as segmentation-by-detection networks and cascaded 3D FCNs, that improve segmentation performance by using a region proposal network prior to segmentation [8][9][10][11][12] . ...
Article
Full-text available
Segmentation is fundamental to medical image analysis. Recent advances in fully convolutional networks have enabled automatic segmentation; however, high labeling effort and the difficulty of acquiring sufficient high-quality training data remain a challenge. In this study, a cascaded 3D U-Net with active learning is proposed to increase training efficiency with exceedingly limited data and to reduce labeling effort. Abdominal computed tomography images of 50 kidneys were used for training. In stage I, 20 kidneys with renal cell carcinoma and four substructures were used for training by manually labelling ground truths. In stage II, the 20 kidneys from the previous stage and 20 newly added kidneys were used, with convolutional neural network (CNN)-corrected labelling for the newly added data. Similarly, in stage III, 50 kidneys were used. The Dice similarity coefficient increased with the completion of each stage and shows superior performance when compared with a recent segmentation network based on the 3D U-Net. The labeling time for CNN-corrected segmentation was reduced by more than half compared to that of manual segmentation. Active learning was therefore concluded to be capable of reducing labeling effort through CNN-corrected segmentation and of increasing training efficiency by iterative learning with limited data.
... However, the proposed works only use 2D CNNs or hand-engineered features. Moreover, because the fully convolutional network (FCN) has greatly improved the state of the art in image segmentation, it also plays an important role in the detection of lesions in medical imaging [18][19][20]. ...
Article
Full-text available
Background: As the rupture of a cerebral aneurysm may be fatal, early detection of unruptured aneurysms may save lives. At present, contrast-unenhanced time-of-flight magnetic resonance angiography is one of the most commonly used methods for screening aneurysms. A computer-assisted detection system for cerebral aneurysms can help clinicians improve the accuracy of aneurysm diagnosis. As a fully convolutional network can classify an image pixel-wise, its three-dimensional implementation is highly suitable for classifying vascular structure. However, because the volume of blood vessels in the image is relatively small, a 3D convolutional neural network does not work well for blood vessels. Results: The presented study developed a computer-assisted detection system for cerebral aneurysms in contrast-unenhanced time-of-flight magnetic resonance angiography images. The system first extracts the volume of interest with a fully automatic vessel segmentation algorithm, then uses a 3D-UNet-based fully convolutional network to detect the aneurysm areas. A total of 131 magnetic resonance angiography datasets are used in this study, of which 76 form the training set, 20 the internal test set and 35 the external test set. The presented system obtained 94.4% sensitivity in fivefold cross-validation on the internal test sets and 82.9% sensitivity with 0.86 false positives/case in the detection of the external test sets. Conclusions: The proposed computer-assisted detection system can automatically detect suspected aneurysm areas in contrast-unenhanced time-of-flight magnetic resonance angiography images. It can be used for aneurysm screening in daily physical examinations.
... One crucial step in the evaluation of medical image content is the recognition and segmentation of specific organs or tissues, i.e. performing a voxel-wise classification known as semantic segmentation. Several methods for automated segmentation of MR images have been proposed, including methods relying on explicit models (9), general correspondence (10), and random forests (11), as well as deep learning (DL) approaches (12) including convolutional neural networks (CNNs) (13)(14)(15). Most CNN-based segmentation architectures are derived from the UNet (15), due to its good generalizability and performance (16,17). Landmark detection (18,19) and anatomical object localization (20) were demonstrated first and can be an important pre-processing step for segmentation. ...
Preprint
Purpose: To enable fast and reliable assessment of subcutaneous and visceral adipose tissue compartments derived from whole-body MRI. Methods: Quantification and localization of different adipose tissue compartments from whole-body MR images is of high interest for examining metabolic conditions. For correct identification and phenotyping of individuals at increased risk for metabolic diseases, a reliable automatic segmentation of adipose tissue into subcutaneous and visceral adipose tissue is required. In this work we propose a 3D convolutional neural network (DCNet) to provide a robust and objective segmentation. In this retrospective study, we collected 1000 cases (66 $\pm$ 13 years; 523 women) from the Tuebingen Family Study and from the German Center for Diabetes research (TUEF/DZD), as well as 300 cases (53 $\pm$ 11 years; 152 women) from the German National Cohort (NAKO) database for model training, validation, and testing, with transfer learning between the cohorts. These datasets had variable imaging sequences, imaging contrasts, receiver coil arrangements, scanners and imaging field strengths. The proposed DCNet was compared against a comparable 3D UNet segmentation in terms of sensitivity, specificity, precision, accuracy, and Dice overlap. Results: Fast (5-7 seconds) and reliable adipose tissue segmentation can be obtained with high Dice overlap (0.94), sensitivity (96.6%), specificity (95.1%), precision (92.1%) and accuracy (98.4%) from 3D whole-body MR datasets (field of view coverage $450\times450\times2000$ mm${}^3$). Segmentation masks and adipose tissue profiles are automatically reported back to the referring physician. Conclusion: Automatic adipose tissue segmentation is feasible in 3D whole-body MR datasets and generalizes to different epidemiological cohort studies with the proposed DCNet.
... U-Net became widely used for medical image segmentation, and several improvements soon followed. Çiçek et al. created a version of U-Net capable of using 3D inputs instead of 2D images [26]. Similarly, Milletari et al. proposed V-Net, a volumetric version of U-Net, and incorporated the Dice coefficient into the loss function [27]. ...
Article
Full-text available
Radiation oncology for prostate cancer is important as it can decrease the morbidity and mortality associated with this disease. Planning for this modality of treatment is fundamental, time-consuming and prone to human error, leading to potentially avoidable delays in the start of treatment. A fundamental step in radiotherapy planning is the contouring of radiation targets, where medical specialists contour, i.e., segment, the boundaries of the structures to be irradiated. Automating this step can potentially lead to faster treatment planning without a decrease in quality, while increasing the time available to physicians and yielding more consistent treatment results. This can be framed as an image segmentation task, which has been studied for many decades in the fields of Computer Vision and Machine Learning. With the advent of Deep Learning, there have been many proposals for different network architectures achieving high performance levels. In this review, we searched the literature for those methods and describe them briefly, grouping them by their use of Computed Tomography (CT) or Magnetic Resonance Imaging (MRI). This is a booming field, as evidenced by the dates of the publications found. However, most publications use data from a very limited number of patients, which presents an obstacle to training deep learning models. Although the performance of the models has achieved very satisfactory results, there is still room for improvement, and there is arguably a long way to go before these models can be used safely and effectively in clinical practice.
... Thus, temporal as well as spatial information is available to the convolution kernels, which in theory enables learning of spatial and temporal features. The idea follows Çiçek et al. (2016), who adapted the U-Net (Ronneberger et al., 2015) for the segmentation of 3D data. In our implementation, the bands of three input frames are stacked according to their colour spectrum (R1R2R3G1G2G3B1B2B3), which forms the input of the network. ...
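The band ordering described here, grouping by colour band first and frame index second, can be sketched as a simple reordering. The dict-based frame representation below is an assumption for illustration, with strings standing in for 2-D band arrays:

```python
def stack_bands(frames):
    """Stack the bands of three RGB input frames grouped by colour band,
    producing the channel order R1 R2 R3 G1 G2 G3 B1 B2 B3. Each frame is a
    dict mapping a band name to its 2-D band array (any array-like works)."""
    return [frame[band] for band in ("R", "G", "B") for frame in frames]

frames = [{"R": "R1", "G": "G1", "B": "B1"},
          {"R": "R2", "G": "G2", "B": "B2"},
          {"R": "R3", "G": "G3", "B": "B3"}]
print(stack_bands(frames))  # ['R1', 'R2', 'R3', 'G1', 'G2', 'G3', 'B1', 'B2', 'B3']
```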
Preprint
Full-text available
Various lake observables, including lake ice, are related to climate and climate change and provide a good opportunity for long-term monitoring. Lakes (and, as part of them, lake ice) are therefore considered an Essential Climate Variable (ECV) of the Global Climate Observing System (GCOS). Following the need for integrated multi-temporal monitoring of lake ice in Switzerland, MeteoSwiss, in the framework of GCOS Switzerland, supported this 2-year project to explore not only the use of satellite images but also the possibilities of webcams and in-situ measurements. The aim of this project is to monitor some target lakes and detect the extent of ice and especially the ice-on/off dates, with a focus on the integration of various input data and processing methods. The target lakes are St. Moritz, Silvaplana, Sils, Sihl, Greifen and Aegeri, whereby only the first four were mainly frozen during the observation period and thus processed. The observation period was mainly the winter of 2016-17. During the project, various approaches were developed, implemented, tested and compared. Firstly, low spatial resolution (250 - 1000 m) but high temporal resolution (1 day) satellite images from the optical sensors MODIS and VIIRS were used. Secondly, and as a pilot project, the use of existing public webcams was investigated for (a) validation of results from satellite data, and (b) independent estimation of lake ice, especially for small lakes like St. Moritz that could not possibly be monitored in the satellite images. Thirdly, in-situ measurements were made in order to characterize the development of the temperature profiles, and partly pressure, before freezing and under the ice cover until melting. This report presents the results of the project work.
... In our work, we chose to use a fully 3D U-Net-type [3] CNN. To circumvent the problem of preserving both high-level features and high resolution, we use cascading networks (as e.g. in [4]). ...
Article
Task Based Semantic Segmentation of Soft X-ray CT Images Using 3D Convolutional Neural Networks - Axel Ekman, Jian-Hua Chen, Gerry Mc Dermott, Mark A. Le Gros, Carolyn Larabell
... Therefore, it would not work on routine contrast-enhanced CT scans, which are done during the portal venous phase. Recently, in [11], the 3D U-Net [4] was trained for small bowel segmentation with sparsely annotated CT volumes (seven axial slices per volume) to avoid dense annotation. ...
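The sparse-annotation training referenced here follows the weighted-loss idea of the 3D U-Net paper: voxels on unlabeled slices receive weight zero, so they contribute nothing to the voxel-wise loss. A minimal per-slice sketch (function and variable names are illustrative, not from either paper):

```python
def loss_weights(depth, annotated_slices):
    """Per-axial-slice loss weights for sparsely annotated training:
    1.0 on annotated slices, 0.0 elsewhere, so voxels without labels are
    simply masked out of the voxel-wise cross-entropy."""
    annotated = set(annotated_slices)
    return [1.0 if z in annotated else 0.0 for z in range(depth)]

# E.g. a 100-slice CT volume with seven annotated axial slices:
weights = loss_weights(100, [5, 20, 35, 50, 65, 80, 95])
print(sum(weights))  # 7.0 -- only the seven labeled slices carry loss
```

Because the network is fully convolutional, it still produces predictions for every voxel; the mask only restricts which predictions are penalized during training.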
Preprint
We present a novel method for small bowel segmentation in which a cylindrical topological constraint based on persistent homology is applied. To address the touching issue, which could break the applied constraint, we propose to augment the network with an additional branch that predicts an inner cylinder of the small bowel. Since the inner cylinder is free of the touching issue, a cylindrical shape constraint applied to this augmented branch guides the network to generate a topologically correct segmentation. For a strict evaluation, we acquired an abdominal computed tomography dataset with dense segmentation ground truths. The proposed method showed clear improvements in terms of four different metrics compared to the baseline method, and the improvements were statistically significant in a paired t-test.
... Computer-aided diagnosis of brain tumors is also an active field of study in medical imaging. In this regard, research has been carried out over the years not only on the classification and detection [51], [52], [53] of brain tumor types but also on their segmentation [54], [55] using deep learning methods. An automatic method for brain tumor segmentation from 3D MRI images was proposed by H. Dong et al. [56]. ...
... Our aim is to provide a segmentation framework for different types of MS lesions, large and small, with a minimum lesion size of 3 voxels as recommended in the guidelines for MS CLs [23]. We propose a fully-convolutional architecture inspired by the 3D U-Net [24]. Compared to our previous studies, we significantly extend our cohort of patients to 90 subjects from two different clinical centers. ...
Article
Full-text available
The presence of cortical lesions in multiple sclerosis patients has emerged as an important biomarker of the disease. They appear in the earliest stages of the illness and have been shown to correlate with the severity of clinical symptoms. However, cortical lesions are hardly visible in conventional magnetic resonance imaging (MRI) at 3T, and thus their automated detection has been so far little explored. In this study, we propose a fully-convolutional deep learning approach, based on the 3D U-Net, for the automated segmentation of cortical and white matter lesions at 3T. For this purpose, we consider a clinically plausible MRI setting consisting of two MRI contrasts only: one conventional T2-weighted sequence (FLAIR), and one specialized T1-weighted sequence (MP2RAGE). We include 90 patients from two different centers with a total of 728 and 3856 gray and white matter lesions, respectively. We show that two reference methods developed for white matter lesion segmentation are inadequate to detect small cortical lesions, whereas our proposed framework is able to achieve a detection rate of 76% for both cortical and white matter lesions with a false positive rate of 29% in comparison to manual segmentation. Further results suggest that our framework generalizes well for both types of lesion in subjects acquired in two hospitals with different scanners.