Article

ResAttenGAN: Simultaneous segmentation of multiple spinal structures on axial lumbar MRI image using residual attention and adversarial learning


Abstract

An axial MRI image of the lumbar spine generally contains multiple spinal structures, and segmenting them simultaneously helps analyze the pathogenesis of spinal disease, generate spinal medical reports, and plan clinical surgery for treatment. However, simultaneous and accurate segmentation of multiple spinal structures remains challenging because of the large diversity of each structure in intensity, resolution, position, shape, and size, the implicit borders between different structures, and overfitting caused by insufficient training data. In this paper, we propose a novel network framework, ResAttenGAN, to address these challenges and achieve simultaneous, accurate segmentation of the disc, neural foramina, thecal sac, and posterior arch. ResAttenGAN comprises three modules: a full feature fusion (FFF) module, a residual refinement attention (RRA) module, and an adversarial learning (AL) module. The FFF module captures multi-scale feature information and fully fuses the features at all hierarchies to generate a discriminative feature representation. The RRA module combines a local position attention block and a residual border refinement block to accurately locate the implicit borders and refine their pixel-wise classification. The AL module smooths and strengthens higher-order spatial consistency to mitigate overfitting. Experimental results show that the three integrated modules each help tackle the above challenges and that ResAttenGAN outperforms existing segmentation methods on the evaluation metrics.
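Although the abstract does not name its metrics, spinal segmentation work of this kind is typically scored with the Dice similarity coefficient. As a toy illustration only (not the paper's evaluation code; names are hypothetical), a minimal plain-Python sketch over binary masks given as flat 0/1 sequences:

```python
def dice_coefficient(pred, target):
    """Dice = 2*|A∩B| / (|A| + |B|) for binary masks given as 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_coefficient(pred, target))  # 2*2/(3+3) ≈ 0.667
```

Real pipelines compute this per structure (disc, neural foramina, thecal sac, posterior arch) and average over the test set.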


... An axial MRI image of the lumbar spine often contains multiple spinal structures, and their simultaneous segmentation will help analyze the pathogenesis of spine disease, generate the spine medical report, and establish a clinical surgical plan for the treatment of spine disease [7]. However, simultaneous and accurate segmentation of multiple spine structures is challenging due to the large variations in density, resolution, location, shape, and size of the same spine structure, implicit boundaries between different structures, and overfitting. ...
Article
Full-text available
The most important point to note is that LSS is not a hernia. While a hernia occurs with a rupture in the disc, LSS results from calcification due to deformation of the bone over the following years. In addition, correct interpretation and diagnosis of biomedical images requires serious expertise, making the diagnosis of LSS difficult. In the literature, the U-Net method can perform semantic segmentation with high success, and in recent years the success of the classical U-Net has been shown to increase when architectural elements from other deep learning methods are incorporated. To segment the LSS region, semantic segmentation was performed on lumbar spine MR images with three different deep learning methods, and their success was measured with Dice and IoU scores. The highest segmentation success among 1560 images was obtained by the ResUNet model, with a Dice score of 0.93. Treatment of LSS, which negatively affects human life, is very important because MR images are difficult to interpret and LSS is easily confused with lumbar hernia. Today, expert decision support systems have become essential for the correct diagnosis on which any treatment or surgical operation must be based. The high classification/segmentation success achieved by deep learning methods has also been demonstrated in LSS segmentation, the subject of our study.
... The transverse axis image primarily concerns the T2-weighted scanned image. For automatic diagnosis of lumbar disc herniation, most existing studies use only images of the central sagittal plane [6][7][8], design models within the object-detection paradigm [9,10], or label intervertebral disc tissues of different morphologies and build assisted diagnostic models using detection and segmentation [11,12], which may ignore much important information. However, the cross-sectional image contains more detailed information. ...
Article
Full-text available
Lumbar disc herniation is a common disease that causes low back pain. Due to the high cost of medical diagnosis, as well as a shortage and uneven distribution of medical resources, a system that can automatically analyze and diagnose lumbar spine Magnetic Resonance Imaging (MRI) is becoming an urgent need. This study uses deep learning methods to establish a classifier to diagnose lumbar disc herniation. An MRI classification dataset of lumbar disc herniation consisting of public MRI images is presented and is used to train the proposed classifier. Because a common difficulty in applying computer vision technology to medical images is labeling training data, we use a semi-supervised model training method, while multilayer transverse axial MRI images are used as the model input. In this method, we first use unlabelled MRI images for random self-supervised pre-training and the pre-trained model as a feature extractor for MRI images. Then, all marked cross-sections of each intervertebral disc are used to calculate the feature vector through the feature extractor. The information of all feature vectors is integrated, while a multilayer perceptron is used for classification training. After training, the model achieved 87.11% accuracy, 87.50% sensitivity, 86.72% specificity and a 0.9487 AUC (Area Under the ROC Curve) on the test set. To analyze the rationality of the diagnostic results more quickly, we output the severity of degenerative changes in each region using a heatmap.
... The segmentation of magnetic resonance (MR) images has various applications in the process of disease's diagnosis [1], treatment planning [2], and quantification of image-derived metrics [3]. One of these applications is the generation of pseudo computed tomography (CT) images for positron emission tomography (PET) attenuation correction [4]. ...
Article
The segmentation of magnetic resonance (MR) images is a crucial task for creating pseudo computed tomography (CT) images, which are used to achieve positron emission tomography (PET) attenuation correction. One of the main challenges of creating pseudo CT images is the difficulty of obtaining an accurate segmentation of bone tissue in brain MR images. Deep convolutional neural networks (CNNs) have been widely and efficiently applied to perform MR image segmentation. The aim of this work is to propose a segmentation approach that combines multiresolution handcrafted features with CNN-based features to add directional properties and enrich the feature set used for segmentation. The main objective is to efficiently segment the brain into three tissue classes: bone, soft tissue, and air. The proposed method combines non-subsampled Contourlet transform (NSCT) and non-subsampled Shearlet transform (NSST) coefficients with the CNN's features using different mechanisms. The entropy value is calculated to select the most useful coefficients and reduce the input's dimensionality. The segmentation results are evaluated using fifty clinical brain MR and CT images by calculating the precision, recall, dice similarity coefficient (DSC), and Jaccard similarity coefficient (JSC). The results are also compared to other methods reported in the literature. The DSC of the bone class is improved from 0.6179 ± 0.0006 to 0.6416 ± 0.0006. The addition of multiresolution NSCT and NSST features to the CNN's features demonstrates promising results. Moreover, NSST coefficients provide more useful information than NSCT coefficients.
Article
Full-text available
Accurate classification and segmentation of brain tumors is a critical task. Classification is the process of grading tumors, i.e., deciding whether a tumor is malignant (cancerous) or benign (not cancerous), and segmentation is the process of extracting the region of interest. In the last few years, with the development of computer vision, deep learning, and machine learning algorithms, Magnetic Resonance Imaging (MRI) has become the most widely used modality for tumor screening and diagnosis. The process is automated and attains high accuracy. Nowadays, physicians also use MRI automated diagnosis systems, making diagnosis faster, more reliable, reproducible, and notably less expensive. In this paper, we present an extensive survey of brain tumor classification and segmentation approaches based on MRI images. The manuscript mainly explores recently used deep learning methods and approaches, and concludes with various state-of-the-art findings.
Article
Full-text available
Accurate image segmentation plays an essential role in diagnosing and treating various spinal diseases. However, traditional segmentation methods often consume a lot of time and energy. This research proposes an innovative deep-learning-based automatic segmentation method for spine magnetic resonance imaging (MRI) images. The proposed method, DAUNet++, builds on UNet++ by adding a residual structure and attention mechanisms. Specifically, a residual block is utilized for down-sampling to construct RVNet as a new skeleton structure. Furthermore, two novel dual channel and spatial attention modules are proposed to emphasize feature-rich regions, enhance useful information, and improve network performance by recalibrating feature responses. The published spinesagt2wdataset3 spinal MRI image dataset is adopted in the experiment. The Dice similarity coefficient score on the test set is 0.9064. Higher segmentation accuracy and efficiency are achieved, indicating the effectiveness of the proposed method.
Article
Recently, Artificial Intelligence, namely Deep Learning methods, has revolutionized a wide range of domains and applications. Digital Pathology has so far played a major role in the diagnosis and prognosis of tumors. However, the characteristics of Whole Slide Images, namely their gigapixel size, high resolution, and the shortage of richly labeled samples, have hindered the efficiency of classical Machine Learning methods, and traditional methods generalize poorly to different tasks and data contents. Given the success of Deep Learning in large-scale applications, we resort to such models for histopathological image segmentation tasks. First, we review and compare the classical UNet and Att-UNet models for colon cancer WSI segmentation in a sparsely annotated data scenario. Then, we introduce novel enhanced models of the Att-UNet in which different schemes are proposed for the skip connections and the positions of spatial attention gates in the network. Spatial attention gates assist the training process and enable the model to avoid irrelevant feature learning. Alternating the presence of such modules, namely in our Alter-AttUNet model, adds robustness and ensures better image segmentation results. To cope with the lack of richly annotated data in our AiCOLO colon cancer dataset, we suggest a multi-step training strategy that also deals with the sparse WSI annotations and unbalanced class issues. All proposed methods outperform state-of-the-art approaches, but Alter-AttUNet generates the best compromise between accurate results and a light network. The model achieves 95.88% accuracy on our sparse AiCOLO colon cancer datasets. Finally, to evaluate and validate our proposed architectures we resort to publicly available WSI data: the NCT-CRC-HE-100K, the CRC-5000 and the Warwick colon cancer histopathological datasets.
Respective accuracies of 99.65%, 99.73% and 79.03% were reached. A comparison with state-of-art approaches is established to view and compare the key solutions for histopathological image segmentation.
Article
Full-text available
Automated medical report generation in spine radiology, i.e., given spinal medical images and directly create radiologist-level diagnosis reports to support clinical decision making, is a novel yet fundamental study in the domain of artificial intelligence in healthcare. However, it is incredibly challenging because it is an extremely complicated task that involves visual perception and high-level reasoning processes. In this paper, we propose the neural-symbolic learning (NSL) framework that performs human-like learning by unifying deep neural learning and symbolic logical reasoning for the spinal medical report generation. Generally speaking, the NSL framework firstly employs deep neural learning to imitate human visual perception for detecting abnormalities of target spinal structures. Concretely, we design an adversarial graph network that interpolates a symbolic graph reasoning module into a generative adversarial network through embedding prior domain knowledge, achieving semantic segmentation of spinal structures with high complexity and variability. NSL secondly conducts human-like symbolic logical reasoning that realizes unsupervised causal effect analysis of detected entities of abnormalities through meta-interpretive learning. NSL finally fills these discoveries of target diseases into a unified template, successfully achieving a comprehensive medical report generation. When employed in a real-world clinical dataset, a series of empirical studies demonstrate its capacity on spinal medical report generation and show that our algorithm remarkably exceeds existing methods in the detection of spinal structures. These indicate its potential as a clinical tool that contributes to computer-aided diagnosis.
Article
Full-text available
Traumatic Brain Injury (TBI) is a major cause of death and disability worldwide. Automated brain hematoma segmentation and outcome prediction for patients with TBI can effectively facilitate patient management. In this study, we propose a novel Multi-view convolutional neural network with a mixed loss to segment total acute hematoma on head CT scans collected within 24 hours after the injury. Based on the automated segmentation, the volumetric distribution and shape characteristics of the hematoma were extracted and combined with other clinical observations to predict 6-month mortality. The proposed hematoma segmentation network achieved an average Dice coefficient of 0.697 and an intraclass correlation coefficient of 0.966 between the volumes estimated from the predicted hematoma segmentation and volumes of the annotated hematoma segmentation on the test set. Compared with other published methods, the proposed method has the most accurate segmentation performance and volume estimation. For 6-month mortality prediction, the model achieved an average area under the precision-recall curve (AUCPR) of 0.559 and area under the receiver operating characteristic curve (AUC) of 0.853 using 10-fold cross-validation on a dataset consisting of 828 patients. The average AUCPR and AUC of the proposed model are respectively more than 10% and 5% higher than those of the widely used IMPACT model.
Article
Full-text available
Simultaneous and automatic segmentation of the blood pool and myocardium is an important precondition for early diagnosis and pre-operative planning in patients with complex congenital heart disease. However, due to the high diversity of cardiovascular structures and changes in mechanical properties caused by cardiac defects, the segmentation task still faces great challenges. To overcome these challenges, in this study we propose an integrated multi-task deep learning framework based on the dilated residual and hybrid pyramid pooling network (DRHPPN) for joint segmentation of the blood pool and myocardium. The framework consists of three closely connected progressive sub-networks. An inception module is used to realize the initial multi-level feature representation of cardiovascular images. A dilated residual network (DRN), as the main body of feature extraction and pixel classification, preliminarily predicts segmentation regions. A hybrid pyramid pooling network (HPPN) is designed to facilitate the aggregation of local information into global information, complementing the DRN. Extensive experiments on three-dimensional cardiovascular magnetic resonance (CMR) images (the available dataset of the MICCAI 2016 HVSMR challenge) demonstrate that our approach can accurately segment the blood pool and myocardium and achieve competitive performance compared with state-of-the-art segmentation methods.
Article
Full-text available
We propose a semi-automatic algorithm for the segmentation of vertebral bodies in magnetic resonance (MR) images of the human lumbar spine. Quantitative analysis of spine MR images often necessitate segmentation of the image into specific regions representing anatomic structures of interest. Existing algorithms for vertebral body segmentation require heavy inputs from the user, which is a disadvantage. For example, the user needs to define individual regions of interest (ROIs) for each vertebral body, and specify parameters for the segmentation algorithm. To overcome these drawbacks, we developed a semi-automatic algorithm that considerably reduces the need for user inputs. First, we simplified the ROI placement procedure by reducing the requirement to only one ROI, which includes a vertebral body; subsequently, a correlation algorithm is used to identify the remaining vertebral bodies and to automatically detect the ROIs. Second, the detected ROIs are adjusted to facilitate the subsequent segmentation process. Third, the segmentation is performed via graph-based and line-based segmentation algorithms. We tested our algorithm on sagittal MR images of the lumbar spine and achieved a 90% dice similarity coefficient, when compared with manual segmentation. Our new semi-automatic method significantly reduces the user’s role while achieving good segmentation accuracy.
Article
Full-text available
Spinal clinicians still rely on laborious workloads to conduct comprehensive assessments of multiple spinal structures in MRIs in order to detect abnormalities and discover possible pathological factors. The objective of this work is to perform automated segmentation and classification (i.e., normal and abnormal) of intervertebral discs, vertebrae, and neural foramen in MRIs in one shot, a semantic segmentation task that is urgently needed to assist spinal clinicians in diagnosing neural foraminal stenosis, disc degeneration, and vertebral deformity, as well as discovering possible pathological factors. However, no work has simultaneously achieved the semantic segmentation of intervertebral discs, vertebrae, and neural foramen due to three unusual challenges: 1) Multiple tasks: simultaneous semantic segmentation of multiple spinal structures is more difficult than the individual tasks; 2) Multiple targets: an average of 21 spinal structures per MRI require automated analysis yet have high variety and variability; 3) Weak spatial correlations and subtle differences between normal and abnormal structures generate dynamic complexity and indeterminacy. In this paper, we propose a Recurrent Generative Adversarial Network called Spine-GAN to resolve the aforementioned challenges. Firstly, Spine-GAN explicitly handles the high variety and variability of complex spinal structures through an atrous convolution (i.e., convolution with holes) autoencoder module that is capable of obtaining a semantic task-aware representation and preserving fine-grained structural information. Secondly, Spine-GAN dynamically models the spatial pathological correlations between both normal and abnormal structures thanks to a specially designed long short-term memory module. Thirdly, Spine-GAN obtains reliable performance and efficient generalization by leveraging a discriminative network that is capable of correcting predicted errors and enforcing global-level contiguity.
Extensive experiments on MRIs of 253 patients have demonstrated that Spine-GAN achieves high pixel accuracy of 96.2%, Dice coefficient of 87.1%, Sensitivity of 89.1% and Specificity of 86.0%, which reveals its effectiveness and potential as a clinical tool.
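The atrous (dilated) convolution that Spine-GAN's autoencoder relies on enlarges the receptive field by spacing kernel taps apart, without adding weights. As an illustration only (a 1-D toy sketch in plain Python, not the paper's implementation; names and kernels are hypothetical):

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """1-D atrous (dilated) convolution with 'valid' padding: kernel taps are
    spaced `dilation` samples apart, widening the receptive field for free."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

x = [1, 2, 3, 4, 5, 6]
print(atrous_conv1d(x, [1, 1], dilation=1))  # adjacent pairs: [3, 5, 7, 9, 11]
print(atrous_conv1d(x, [1, 1], dilation=2))  # pairs two apart: [4, 6, 8, 10]
```

With dilation 1 this reduces to ordinary convolution; stacking layers with growing dilation covers large contexts while preserving fine-grained resolution, which is the property the abstract highlights.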
Conference Paper
Full-text available
In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively.
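UNet++ reports its gains in IoU points; the IoU (Jaccard) metric itself is a small computation. A toy plain-Python sketch for binary masks (not the authors' evaluation code; names are hypothetical):

```python
def iou(pred, target):
    """Intersection over Union for binary masks given as 0/1 integer sequences."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    # Convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union

print(iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2/4 = 0.5
```

Note that IoU is always at most the Dice score on the same pair of masks (Dice = 2*IoU/(1+IoU)), so the two metrics rank methods similarly but are not interchangeable as numbers.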
Article
Full-text available
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and the most efficient inference memory-wise compared to other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.
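The mechanism SegNet describes, upsampling with the max-pooling indices recorded by the encoder, can be sketched in 1-D plain Python. This is a toy illustration of the idea, not the SegNet implementation (names are hypothetical):

```python
def max_pool_with_indices(x, size=2):
    """1-D max pooling that also records argmax positions, as SegNet's encoder does."""
    pooled, indices = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        k = max(range(size), key=lambda j: window[j])
        pooled.append(window[k])
        indices.append(start + k)
    return pooled, indices

def max_unpool(pooled, indices, length):
    """Sparse non-linear upsampling: place each value back at its recorded index;
    everything else stays zero, to be densified by subsequent trainable filters."""
    out = [0] * length
    for v, i in zip(pooled, indices):
        out[i] = v
    return out

x = [1, 5, 2, 8]
pooled, idx = max_pool_with_indices(x)  # ([5, 8], [1, 3])
print(max_unpool(pooled, idx, len(x)))  # [0, 5, 0, 8]
```

Because only integer indices are stored rather than full feature maps, this is what gives SegNet its memory efficiency relative to decoders that learn their upsampling.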
Conference Paper
Full-text available
Mass segmentation provides effective morphological features which are important for mass diagnosis. In this work, we propose a novel end-to-end network for mammographic mass segmentation which employs a fully convolutional network (FCN) to model a potential function, followed by a conditional random field (CRF) to perform structured learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a position prior. Further, we employ adversarial training to eliminate over-fitting due to the small sizes of mammogram datasets. A multi-scale FCN is employed to improve the segmentation performance. Experimental results on two public datasets, INbreast and DDSM-BCRP, demonstrate that our end-to-end network achieves better performance than state-of-the-art approaches.
Article
Full-text available
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
Poster
Full-text available
Diseases of the spine are quite common, especially due to degenerative changes of the ligamentous and osseous structures. When deciding on an adequate procedure, neuro-imaging plays a main role in estimating the extent of surgical treatment. Accurate and objective evaluation of vertebral deformations is of significant importance in clinical diagnostics and therapy of pathological conditions affecting the spine. A computer-assisted diagnosis system aims to facilitate characterization and quantification of abnormalities. Our aim is to perform semi-automated segmentation of vertebral bodies derived from MRI acquisitions to speed up a purely manual analysis.
Article
Full-text available
Objective: The purpose of this study was to evaluate the reliability of a new magnetic resonance imaging (MRI) grading system for cervical neural foraminal stenosis (NFS). Materials and methods: Cervical NFS at bilateral C4/5, C5/6, and C6/7 was classified into the following three grades based on the T2-weighted axial images: Grade 0 = absence of NFS, with the narrowest width of the neural foramen greater than the width of the extraforaminal nerve root (EFNR); Grade 1 = the narrowest width of the neural foramen the same or less than (but more than 50% of) the width of the EFNR; Grade 2 = the width of the neural foramen the same or less than 50% of the width of the EFNR. The MRIs of 96 patients who were over 60 years old (M:F = 50:46; mean age 68.4 years; range 61-86 years) were independently analyzed by seven radiologists. Interobserver and intraobserver agreements were analyzed using the percentage agreement, kappa statistics, and intraclass correlation coefficient (ICC). Results: For the distinction among the three individual grades at all six neural foramina, the ICC ranged from 0.68 to 0.73, indicating fair to good reproducibility. The percentage agreement ranged from 60.2% to 70.6%, and the kappa values (κ = 0.50-0.58) indicated fair to moderate agreement. The percentages of intraobserver agreement ranged from 85.4% to 93.8% (κ = 0.80-0.92), indicating near perfect agreement. Conclusion: The new MRI grading system shows sufficient interobserver and intraobserver agreement to reliably assess cervical NFS.
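The grading rules quoted above are mechanical enough to express directly in code. A sketch that simply transcribes the three grade definitions, assuming both widths are measured in the same units (the function name is hypothetical):

```python
def nfs_grade(foramen_width, nerve_root_width):
    """Grade cervical neural foraminal stenosis per the rules stated above:
    Grade 0: narrowest foramen width greater than the extraforaminal nerve
             root (EFNR) width;
    Grade 1: foramen width the same or less than the EFNR width, but more
             than 50% of it;
    Grade 2: foramen width the same or less than 50% of the EFNR width."""
    if foramen_width > nerve_root_width:
        return 0
    if foramen_width > 0.5 * nerve_root_width:
        return 1
    return 2

print(nfs_grade(4.0, 3.0))  # 0: foramen wider than the nerve root
print(nfs_grade(2.0, 3.0))  # 1: narrowed, but more than 50% of EFNR width
print(nfs_grade(1.0, 3.0))  # 2: at or below 50% of EFNR width
```

In practice the widths come from manual measurements on T2-weighted axial images, so the study's interobserver variability applies to the inputs, not to this deterministic rule.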
Article
Full-text available
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the fully convolutional network (FCN) architecture and its variants. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. The design of SegNet was primarily motivated by road scene understanding applications. Hence, it is efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than competing architectures and can be trained end-to-end using stochastic gradient descent. We also benchmark the performance of SegNet on Pascal VOC12 salient object segmentation and the recent SUN RGB-D indoor scene understanding challenge. We show that SegNet provides competitive performance although it is significantly smaller than other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/
Article
Full-text available
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
Article
Full-text available
Despite its clinical importance, accurate identification of vertebral fractures is problematic and time-consuming. There is a recognized need to improve the detection of vertebral fractures so that appropriate high-risk patients can be selected to initiate clinically beneficial therapeutic interventions. The objective of this study was to develop and evaluate semi-automatic algorithms for detailed annotation of vertebral bodies from T4 to L4 in digitised lateral spinal Dual-energy X-ray absorptiometry (DXA) vertebral fracture assessment (VFA) images. Using lateral spinal DXA VFA images from subjects imaged at a University Hospital fracture liaison service (FLS), image algorithms were developed for semi-automatic detailed annotation of vertebral bodies from T4 to L4. The study included 201 women aged 50 years or older with non-vertebral fractures; the outcome measures were algorithm accuracy and precision. Statistical models of vertebral shape and appearance from T4 to L4 were constructed using VFA images from 130 subjects. The resulting models form part of an algorithm for performing semi-automatic detailed annotation of vertebral bodies from T4 to L4. Algorithm accuracy and precision were evaluated on a test-set of 71 independent images. Overall accuracy was 0.72 mm (3.00% of vertebral height) and overall precision was 0.26 mm (1.11%) for point-to-line distance. Accuracy and precision were best on normal vertebrae (0.65 mm (2.67%) and 0.21 mm (0.90%), respectively) and mild fractures (0.78 mm (3.18%) and 0.32 mm (1.39%), respectively), but accuracy and precision errors were higher for moderate (1.07 mm (4.66%) and 0.48 mm (2.15%), respectively) and severe fractures (2.07 mm (9.65%) and 1.10 mm (5.09%), respectively). Accuracy and precision results for the algorithm were comparable with other reported results in the literature. This semi-automatic image analysis had high overall accuracy and precision on normal vertebrae and mild fractures, but performed less well in moderate and severe fractures.
It is therefore a useful tool to identify normality of vertebral shape and to identify mild fractures.
Conference Paper
Segmentation of vertebral structures in magnetic resonance (MR) images is challenging because of poor contrast between bone surfaces and surrounding soft tissue. This paper describes a semi-automatic method for segmenting vertebral bodies in multi-slice MR images. In order to achieve a fast and reliable segmentation, the method takes advantage of the correlation between shape and pose of different vertebrae in the same patient by using a statistical multi-vertebrae anatomical shape+pose model. Given a set of MR images of the spine, we initially reduce the intensity inhomogeneity in the images by using an intensity-correction algorithm. Then a 3D anisotropic diffusion filter smooths the images. Afterwards, we extract edges from a relatively small region of the pre-processed image with a simple user interaction. Subsequently, an iterative Expectation Maximization technique is used to register the statistical multi-vertebrae anatomical model to the extracted edge points in order to achieve a fast and reliable segmentation for lumbar vertebral bodies. We evaluate our method in terms of speed and accuracy by applying it to volumetric MR images of the spine acquired from nine patients. Quantitative and visual results demonstrate that the method is promising for segmentation of vertebral bodies in volumetric MR images.
Article
Clinical routine often requires to analyze spinal images of multiple anatomic structures in multiple anatomic planes from multiple imaging modalities (M3). Unfortunately, existing methods for segmenting spinal images are still limited to one specific structure, in one specific plane or from one specific modality (S3). In this paper, we propose a novel approach, Regression Segmentation, that is for the first time able to segment M3 spinal images in one single unified framework. This approach formulates the segmentation task innovatively as a boundary regression problem: modeling a highly nonlinear mapping function from substantially diverse M3 images directly to desired object boundaries. Leveraging the advancement of sparse kernel machines, regression segmentation is fulfilled by a multi-dimensional support vector regressor (MSVR) which operates in an implicit, high dimensional feature space where M3 diversity and specificity can be systematically categorized, extracted, and handled. The proposed regression segmentation approach was thoroughly tested on images from 113 clinical subjects including both disc and vertebral structures, in both sagittal and axial planes, and from both MRI and CT modalities. The overall result reaches a high Dice similarity index (DSI) of 0.912 and a low boundary distance (BD) of 0.928 mm. With our unified and extendable framework, an efficient clinical tool for M3 spinal image segmentation can be easily achieved, and will substantially benefit the diagnosis and treatment of spinal diseases.
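The DSI reported above is the standard region-overlap measure used throughout these segmentation papers. As a toy illustration (not the paper's own code), the Dice similarity index between two binary masks can be computed like this:

```python
import numpy as np

def dice_similarity(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity index between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

# Two tiny example masks: 2 overlapping foreground pixels out of 3 each.
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_similarity(a, b), 3))  # 0.667
```

A DSI of 0.912, as reported, therefore corresponds to a very high overlap between predicted and ground-truth boundaries.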
Article
The application of kinematic data acquired during biomechanical testing to specimen-specific, three-dimensional models of the spine has emerged as a useful tool in spine biomechanics research. However, the development of these models is subject to segmentation error because of complex morphology and pathologic changes of the spine. This error has not been previously characterized. Eight cadaveric lumbar spines were prepared and underwent computed tomography (CT) scanning. After disarticulation and soft-tissue removal, 5 individual vertebrae from these specimens were scanned a second time. The CT images of the full lumbar specimens were segmented twice each by 2 operators, and the images of the individual vertebrae with soft tissue removed were segmented as well. The solid models derived from these differing segmentation sessions were registered, and the distribution of distances between nearest neighboring points was calculated to evaluate the accuracy and precision of the segmentation technique. Manual segmentation yielded root-mean-square errors below 0.39 mm for accuracy, 0.33 mm for intrauser precision, and 0.35 mm for interuser precision. Furthermore, the 95th percentile of all distances was below 0.75 mm for all analyses of accuracy and precision. These findings indicate that such models are highly accurate and that a high level of intrauser and interuser precision can be achieved. The magnitude of the error presented here should inform the design and interpretation of future studies using manual segmentation techniques to derive models of the lumbar spine.
Article
Examinations of the spinal column with both Magnetic Resonance (MR) imaging and Computed Tomography (CT) often require a precise three-dimensional positioning, angulation and labeling of the spinal disks and the vertebrae. A fully automatic and robust approach is a prerequisite for an automated scan alignment as well as for the segmentation and analysis of spinal disks and vertebral bodies in Computer Aided Diagnosis (CAD) applications. In this article, we present a novel method that combines Marginal Space Learning (MSL), a recently introduced concept for efficient discriminative object detection, with a generative anatomical network that incorporates relative pose information for the detection of multiple objects. It is used to simultaneously detect and label the spinal disks. While a novel iterative version of MSL is used to quickly generate candidate detections comprising position, orientation, and scale of the disks with high sensitivity, the anatomical network selects the most likely candidates using a learned prior on the individual nine-dimensional transformation spaces. Finally, we propose an optional case-adaptive segmentation approach that allows segmenting the spinal disks and vertebrae in MR and CT, respectively. Since the proposed approaches are learning-based, they can be trained for MR or CT alike. Experimental results based on 42 MR and 30 CT volumes show that our system not only achieves superior accuracy but also is among the fastest systems of its kind in the literature. On the MR data set the spinal disks of a whole spine are detected in 11.5 s on average with 98.6% sensitivity and 0.073 false positive detections per volume. On the CT data a comparable sensitivity of 98.0% with 0.267 false positives is achieved. Detected disks are localized with an average position error of 2.4 mm/3.2 mm and angular error of 3.9°/4.5° in MR/CT, which is close to the employed hypothesis resolution of 2.1 mm and 3.3°.
Article
In this paper, we propose and validate a deep learning framework that incorporates both multi-atlas registration and level-set for segmenting pancreas from CT volume images. The proposed segmentation pipeline consists of three stages, namely coarse, fine, and refine stages. Firstly, a coarse segmentation is obtained through multi-atlas based 3D diffeomorphic registration and fusion. After that, to learn the connection feature, a 3D patch-based convolutional neural network (CNN) and three 2D slice-based CNNs are jointly used to predict a fine segmentation based on a bounding box determined from the coarse segmentation. Finally, a 3D level-set method is used, with the fine segmentation being one of its constraints, to integrate information of the original image and the CNN-derived probability map to achieve a refined segmentation. In other words, we jointly utilize global 3D location information (registration), contextual information (patch-based 3D CNN), shape information (slice-based 2.5D CNN) and edge information (3D level-set) in the proposed framework. These components form our cascaded coarse-fine-refine segmentation framework. We test the proposed framework on three different datasets with varying intensity ranges obtained from different sources, respectively containing 36, 82 and 281 CT volume images. In each dataset, we achieve an average Dice score over 82%, being superior or comparable to other existing state-of-the-art pancreas segmentation algorithms.
Chapter
Multi-vertebrae segmentation plays an important role in spine disease diagnosis and treatment planning. Global spatial dependencies between vertebrae are essential prior information for automatic multi-vertebrae segmentation. However, due to the lack of global information, previous methods have to localize specific vertebrae regions first, then segment and recognize the vertebrae in the region, resulting in a reduction in feature reuse and an increase in computation. In this paper, we propose to leverage both global spatial and label information for multi-vertebrae segmentation from arbitrary MR images in one go. Specifically, a spatial graph convolutional network (GCN) is designed to first automatically learn an adjacency matrix and construct a graph on local feature maps, then adopt stacked GCNs to capture the global spatial relationships between vertebrae. A label attention network is built to predict the appearance probabilities of all vertebrae using an attention mechanism to reduce the ambiguity caused by variant FOVs or similar appearances of adjacent vertebrae. The proposed method is trained in an end-to-end manner and evaluated on a challenging dataset of 292 MRI scans with various fields of view, image characteristics and vertebra deformations. The experimental results show that our method achieves high performance (IDR of 89.28 ± 5.21% and mIoU of 85.37 ± 4.09%) from arbitrary input images.
Article
In this paper, we embed two types of attention modules in the dilated fully convolutional network (FCN) to solve biomedical image segmentation tasks efficiently and accurately. Different from previous work on image segmentation through multiscale feature fusion, we propose the fully convolutional attention network (FCANet) to aggregate contextual information at long-range and short-range distances. Specifically, we add two types of attention modules, the spatial attention module and the channel attention module, to the Res2Net network, which has a dilated strategy. The features of each location are aggregated through the spatial attention module, so that similar features promote each other across spatial positions. At the same time, the channel attention module treats each channel of the feature map as a feature detector and emphasizes the channel dependency between any two channel maps. Finally, we weight the sum of the output features of the two types of attention modules to retain the feature information of the long-range and short-range distances, to further improve the representation of the features and make the biomedical image segmentation more accurate. In particular, we verify that the proposed attention module can seamlessly connect to any end-to-end network with minimal overhead. We perform comprehensive experiments on three public biomedical image segmentation datasets, i.e., the Chest X-ray collection, the Kaggle 2018 data science bowl and the Herlev dataset. The experimental results show that FCANet can improve the segmentation effect of biomedical images. The source code and models are available at https://github.com/luhongchun/FCANet.
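The two attention patterns described above can be sketched framework-free with NumPy. This is only an illustration of the idea, not FCANet's actual modules: the learned 1×1 projections (query/key/value convolutions) and residual scaling are omitted, so raw feature affinities stand in for the learned ones.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """Each position aggregates features from all positions,
    weighted by pairwise similarity (long-range spatial context)."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)          # N x C, N = h*w positions
    attn = softmax(x @ x.T, axis=-1)    # N x N position affinities
    return (attn @ x).reshape(h, w, c)

def channel_attention(feat):
    """Channels are re-weighted by channel-wise affinities,
    emphasizing dependencies between any two channel maps."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)
    attn = softmax(x.T @ x, axis=-1)    # C x C channel affinities
    return (x @ attn).reshape(h, w, c)

feat = np.random.rand(4, 4, 8)
# A weighted sum of both attention outputs, as in the abstract.
out = 0.5 * spatial_attention(feat) + 0.5 * channel_attention(feat)
print(out.shape)  # (4, 4, 8) — same shape as the input feature map
```

Because both modules preserve the feature-map shape, they can be dropped into an existing network with minimal overhead, which is the property the abstract highlights.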
Article
Computer vision systems have numerous tools to assist in various medical fields, notably in image diagnosis. Computed tomography (CT) is the principal imaging method used to assist in the diagnosis of diseases such as bone fractures, lung cancer, heart disease, and emphysema, among others. Lung cancer is one of the four main causes of death in the world. The lung regions in the CT images are marked manually by a specialist as this initial step is a significant challenge for computer vision techniques. Once defined, the lung regions are segmented for clinical diagnoses. This work proposes an automatic segmentation of the lungs in CT images, using the Convolutional Neural Network (CNN) Mask R-CNN, to specialize the model for lung region mapping, combined with supervised and unsupervised machine learning methods (Bayes, Support Vectors Machine (SVM), K-means and Gaussian Mixture Models (GMMs)). Our approach using Mask R-CNN with the K-means kernel produced the best results for lung segmentation reaching an accuracy of 97.68 ± 3.42% and an average runtime of 11.2 s. We compared our results against other works for validation purposes, and our approach had the highest accuracy and was faster than some state-of-the-art methods.
Article
Nowadays, Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI) has demonstrated to be a valid complementary diagnostic tool for early detection and diagnosis of breast cancer. However, without a CAD (Computer Aided Detection) system, manual DCE-MRI examination can be difficult and error-prone. The early stage of breast tissue segmentation, in a typical CAD, is crucial to increase reliability and reduce the computational effort by reducing the number of voxels to analyze and removing foreign tissues and air. In recent years, deep convolutional neural networks (CNNs) enabled a sensible improvement in many visual task automations, such as image classification and object recognition. These advances also involved radiomics, enabling high-throughput extraction of quantitative features, resulting in a strong improvement in automatic diagnosis through medical imaging. Machine learning and, in particular, deep learning approaches are thus gaining popularity in the radiomics field for tissue segmentation. This work aims to accurately segment breast parenchyma from the air and other tissues (such as the chest wall) by applying an ensemble of deep CNNs on 3D MR data. The novelty, besides applying cutting-edge techniques in the radiomics field, is a multi-planar combination of U-Net CNNs by a suitable projection-fusing approach, enabling multi-protocol applications. The proposed approach has been validated over two different datasets for a total of 109 DCE-MRI studies with histopathologically proven lesions and two different acquisition protocols. The median Dice similarity index for the two datasets is 96.60% (±0.30%) and 95.78% (±0.51%), respectively, with p < 0.05 and 100% neoplastic lesion coverage.
Article
Cancer is the second leading cause of death after cardiovascular diseases. Out of all types of cancer, brain cancer has the lowest survival rate. Brain tumors can have different types depending on their shape, texture, and location. Proper diagnosis of the tumor type enables the doctor to make the correct treatment choice and help save the patient's life. There is a high need in the Artificial Intelligence field for a Computer Assisted Diagnosis (CAD) system to assist doctors and radiologists with the diagnosis and classification of tumors. Over recent years, deep learning has shown an optimistic performance in computer vision systems. In this paper, we propose an enhanced approach for classifying brain tumor types using Residual Networks. We evaluate the proposed model on a benchmark dataset containing 3064 MRI images of 3 brain tumor types (Meningiomas, Gliomas, and Pituitary tumors). We have achieved the highest accuracy of 99% outperforming the other previous work on the same dataset.
Article
Objective: Automatic artery/vein (A/V) segmentation from fundus images is required to track blood vessel changes occurring with many pathologies including retinopathy and cardiovascular pathologies. One of the clinical measures that quantifies vessel changes is the arterio-venous ratio (AVR) which represents the ratio between artery and vein diameters. This measure significantly depends on the accuracy of vessel segmentation and classification into arteries and veins. This paper proposes a fast, novel method for semantic A/V segmentation combining deep learning and graph propagation. Methods: A convolutional neural network (CNN) is proposed to jointly segment and classify vessels into arteries and veins. The initial CNN labeling is propagated through a graph representation of the retinal vasculature, whose nodes are defined as the vessel branches and edges are weighted by the cost of linking pairs of branches. To efficiently propagate the labels, the graph is simplified into its minimum spanning tree. Results: The method achieves an accuracy of 94.8% for vessel segmentation. The A/V classification achieves a specificity of 92.9% with a sensitivity of 93.7% on the CT-DRIVE database, compared to the state-of-the-art specificity and sensitivity, both of 91.7%. Conclusion: The results show that our method outperforms the leading previous works on a public dataset for A/V classification and is by far the fastest. Significance: The proposed global AVR calculated on the whole fundus image using our automatic A/V segmentation method can better track vessel changes associated with diabetic retinopathy than the standard local AVR calculated only around the optic disc.
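The graph-simplification step above reduces the vasculature graph to its minimum spanning tree before propagating A/V labels. As a minimal sketch of that step (Kruskal's algorithm over a tiny, entirely hypothetical branch graph; the paper's real edge weights are branch-linking costs derived from the vasculature):

```python
def mst(n, edges):
    """Kruskal's minimum spanning tree.
    n: number of nodes (vessel branches); edges: (weight, u, v) tuples.
    Returns the list of (u, v, weight) edges kept in the tree."""
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, u, v in sorted(edges):       # cheapest links first
        ru, rv = find(u), find(v)
        if ru != rv:                    # keep the edge only if it joins two components
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# 4 hypothetical branches with invented linking costs.
print(mst(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3)]))
# → [(0, 1, 1), (2, 3, 2), (1, 2, 3)]
```

On the resulting tree, each branch has a unique path to every other branch, which makes label propagation along low-cost links straightforward and fast.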
Article
The finding of splenomegaly, abnormal enlargement of the spleen, is a non-invasive clinical biomarker for liver and spleen disease. Automated segmentation methods are essential to efficiently quantify splenomegaly from clinically acquired abdominal magnetic resonance imaging (MRI) scans. However, the task is challenging due to (1) large anatomical and spatial variations of splenomegaly, (2) large inter- and intra-scan intensity variations on multi-modal MRI, and (3) limited numbers of labeled splenomegaly scans. In this paper, we propose the Splenomegaly Segmentation Network (SS-Net) to introduce the deep convolutional neural network (DCNN) approaches in multimodal MRI splenomegaly segmentation. Large convolutional kernel layers were used to address the spatial and anatomical variations, while conditional generative adversarial networks (GANs) were employed to leverage the segmentation performance of SS-Net in an end-to-end manner. A clinically acquired cohort containing both T1-weighted (T1w) and T2-weighted (T2w) MRI splenomegaly scans was used to train and evaluate the performance of multi-atlas segmentation (MAS), 2D DCNN networks, and a 3D DCNN network. From the experimental results, the DCNN methods achieved superior performance to the state-of-the-art MAS method. The proposed SS-Net method achieved the highest median and mean Dice scores among the investigated baseline DCNN methods.
Article
Precise segmentation of the vertebrae is often required for automatic detection of vertebral abnormalities. This especially enables incidental detection of abnormalities such as compression fractures in images that were acquired for other diagnostic purposes. While many CT and MR scans of the chest and abdomen cover a section of the spine, they often do not cover the entire spine. Additionally, the first and last visible vertebrae are likely only partially included in such scans. In this paper, we therefore approach vertebra segmentation as an instance segmentation problem. A fully convolutional neural network is combined with an instance memory that retains information about already segmented vertebrae. This network iteratively analyzes image patches, using the instance memory to search for and segment the first not yet segmented vertebra. At the same time, each vertebra is classified as completely or partially visible, so that partially visible vertebrae can be excluded from further analyses. We evaluated this method on spine CT scans from a vertebra segmentation challenge and on low-dose chest CT scans. The method achieved an average Dice score of 95.8% and 92.1%, respectively, and a mean absolute surface distance of 0.194 mm and 0.344 mm.
Article
Modern semantic segmentation frameworks usually combine low-level and high-level features from pre-trained backbone convolutional models to boost performance. In this paper, we first point out that a simple fusion of low-level and high-level features could be less effective because of the gap in semantic levels and spatial resolution. We find that introducing semantic information into low-level features and high-resolution details into high-level features is more effective for the later fusion. Based on this observation, we propose a new framework, named ExFuse, to bridge the gap between low-level and high-level features and thus significantly improve the segmentation quality by 4.0% in total. Furthermore, we evaluate our approach on the challenging PASCAL VOC 2012 segmentation benchmark and achieve 87.9% mean IoU, which outperforms the previous state-of-the-art results.
Conference Paper
The fundus blood vessels form the only vascular system that can be observed noninvasively in the human body. From fundus photographs, arterial and venous structures can be obtained. Changes in the shape and size of blood vessels are important features for the diagnosis of diabetes, hypertension and other diseases. The segmentation of the blood vessels and the classification of arteries and veins are the basis for obtaining these characteristics and quantitative indicators. This paper discusses the research progress of retinal vessel segmentation and arteriovenous classification on fundus images, and summarizes the research background and various methods along with their advantages and disadvantages. It aims to guide researchers in understanding the research content and progress in this field, and to provide a comprehensive foundation for follow-up research work.
Article
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
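The minimax game can be made concrete with a small numeric sketch. This is not a training loop, just the two objectives evaluated on hypothetical discriminator outputs (the generator's non-saturating loss variant is used, as in the original GAN paper's practical recommendation):

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy of predicted probabilities p against labels y."""
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Hypothetical discriminator outputs: D(x) on real samples, D(G(z)) on fakes.
d_real = np.array([0.9, 0.8, 0.95])   # D should push these toward 1
d_fake = np.array([0.1, 0.2, 0.05])   # D should push these toward 0

# Discriminator objective: maximize log D(x) + log(1 - D(G(z))),
# i.e. minimize BCE against labels 1 (real) and 0 (fake).
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

# Generator objective (non-saturating form): maximize log D(G(z)),
# i.e. minimize BCE of the fake outputs against label 1.
g_loss = bce(d_fake, np.ones_like(d_fake))

print(round(d_loss, 3), round(g_loss, 3))
```

Here the discriminator is currently winning (low `d_loss`, high `g_loss`); at the unique equilibrium described in the abstract, D outputs 1/2 everywhere and neither player can improve.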
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
The aim of this study was to assess the concordance between magnetic resonance imaging (MRI) grading for cervical neural foraminal stenosis (CNFS) based on axial and oblique sagittal images and evaluate the reliability of each grading plane. CNFS was graded at C2–3 to C7–T1 levels based on axial and oblique sagittal images separately by three radiologists. The concordance between CNFS grading based on axial and oblique sagittal images was strong for all three observers (Kendall's W = 0.80, 0.79, and 0.82), despite the tendency of higher grading with oblique sagittal images. Both imaging planes supported strong interobserver reliability.
Conference Paper
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
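The contracting/expanding-path idea with skip connections can be caricatured in one dimension. This is a toy NumPy sketch of the data flow only, not the U-Net architecture itself: convolutions are omitted, and pooling/upsampling stand in for the two paths.

```python
import numpy as np

def down(x):
    """2x average pooling along the spatial axis (contracting path)."""
    return x.reshape(-1, 2).mean(axis=1)

def up(x):
    """2x nearest-neighbour upsampling (expanding path)."""
    return np.repeat(x, 2)

signal = np.arange(8, dtype=float)
skip = signal                 # high-resolution feature kept from the contracting path
coarse = down(signal)         # context captured at lower resolution
restored = up(coarse)         # expanding path recovers the original resolution
# Concatenating the skip feature with the upsampled one is what
# enables the "precise localization" the abstract refers to.
fused = np.stack([restored, skip])
print(fused.shape)  # (2, 8)
```

In the real network, this down/up/concatenate pattern repeats at several resolutions, with learned convolutions at each level; the skip connections carry fine spatial detail that pooling alone would destroy.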
Article
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
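The residual reformulation y = F(x) + x can be sketched with plain NumPy, using toy dense layers in place of the convolutional ones (an illustration of the principle, not the paper's architecture):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)), where F(x) = relu(x @ w1) @ w2.
    The layers learn the residual F rather than the full mapping;
    if F collapses to zero, the block degenerates to an identity,
    which is why very deep stacks remain easy to optimize."""
    return relu(x + relu(x @ w1) @ w2)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = np.zeros((8, 8))             # a zero residual branch...
out = residual_block(x, w1, w2)
print(np.allclose(out, relu(x)))  # ...reduces the block to relu(identity): True
```

This identity fallback is the key property: adding more residual blocks can never make the representable function worse, so accuracy can keep improving with depth.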
Article
This paper presents a user-guided segmentation approach for extracting the vertebral bodies of the spine from MRI. The proposed approach, called VBSeg, takes advantage of superpixels to reduce the image complexity and thus ease the detection of each vertebral body contour. Superpixels adapt themselves to the image structures, since their formation follows the homogeneity of the image regions. However, for some diseases or abnormalities, the boundary of each superpixel does not fit the vertebra contour well. To avoid this drawback, we propose to use Otsu's method as a post-segmentation step to divide the superpixels into smaller ones. The final segmentation is obtained through a region growing approach using points manually selected by the specialist. It can produce masks of the five lumbar vertebrae with an average precision of 80% and recall of 87% when compared to the manual segmentation of a trained specialist. These values show that VBSeg is a valuable asset to assist the medical specialist in the task of vertebral body segmentation, with much less effort and time demand.
Article
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.