Radiomics vs. Deep Learning to predict lipomatous soft tissue tumors malignancy on Magnetic Resonance Imaging (report)

Abstract

Lipomatous soft-tissue tumors arise from mesenchymal tissue and are referred to as lipoma when benign and liposarcoma when malignant. Five subclasses of liposarcoma exist, requiring different patient treatments. Most subtypes are easily distinguishable on Magnetic Resonance Imaging (MRI), but lipoma and Atypical Lipomatous Tumor / Well-Differentiated Liposarcoma (ALT/WDL) have overlapping MR imaging characteristics. A biopsy is usually performed to detect cancerous cells and classify the tumor. However, biopsies are invasive, costly, and unnecessary in most cases, as the malignant/benign ratio is very low (about 1/100). This work aims to provide efficient MRI-based decision-support tools to discriminate malignant tumors, as an alternative to biopsy. 85 MRI scans from patients with lipoma or ALT/WDL were gathered from 43 different centers with non-uniform protocols. We compared approaches based on three versions of the MRI dataset: a 2D version with only one slice per patient (the slice where the tumor is largest), a 3D version (with all slices where the tumor is visible), and a 3D version with batch-effect correction to remove inter-site technical variability. Radiomic features were extracted from the datasets, producing 35 and 92 features for the 2D and 3D collections, respectively. In parallel, the MR images were normalized for pixel intensity and corrected for inhomogeneities. We compared traditional machine learning algorithms trained on the radiomic data with deep learning approaches using Convolutional Neural Networks (CNNs) applied directly to the MR images. Three CNN-based architectures were compared: a custom CNN trained from scratch, a fine-tuned pre-trained ResNet50 model, and an XGBoost classifier trained on CNN-extracted features. Model performance was assessed using 10-fold cross-validation. On the batch-corrected 3D radiomic dataset, all validation folds were classified correctly (mean AUROC = 1) using a linear Support Vector Machine (SVM) with feature selection, as well as a Random Forest classifier. The best image-based classification performance was obtained by fine-tuning the pre-trained ResNet50 (mean AUROC = 0.878, std AUROC = 0.11). In our context of very limited observations, radiomics-based models outperformed the image-based approaches. Importantly, the batch-effect normalization, which removed the inter-site technical variability from the radiomic data, had a major effect on model performance. With such a small dataset, training complex architectures such as CNNs was difficult. Moreover, the MRI scans were acquired on various body regions, resulting in high heterogeneity across images and making generalization even harder. These promising radiomics results need to be confirmed on an external validation cohort, but they could impact clinical practice by differentiating lipoma from ALT/WDL based on MRI alone and, more broadly, by helping classify all types of lipomatous soft-tissue tumors.
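For orientation, the following is a minimal, hypothetical sketch of the kind of radiomics workflow described above (linear SVM with feature selection, evaluated by 10-fold cross-validated AUROC) using scikit-learn; the placeholder arrays, the SelectKBest selector and k=10 are illustrative assumptions, not the report's exact configuration.

```python
# Hedged sketch: linear SVM with univariate feature selection, evaluated with
# 10-fold cross-validated AUROC on a radiomic feature matrix. The data and the
# choice of k are illustrative, not taken from the report.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.rand(85, 92)          # 85 patients x 92 3D radiomic features (placeholder data)
y = np.random.randint(0, 2, 85)     # 0 = lipoma, 1 = ALT/WDL (placeholder labels)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),   # keep the 10 most discriminative features
    ("svm", SVC(kernel="linear", probability=True)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUROC = {aucs.mean():.3f} +/- {aucs.std():.3f}")
```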
References
Article
Deep learning methods, and in particular convolutional neural networks (CNNs), have led to an enormous breakthrough in a wide range of computer vision tasks, primarily by using large-scale annotated datasets. However, obtaining such datasets in the medical domain remains a challenge. In this paper, we present methods for generating synthetic medical images using recently presented deep learning Generative Adversarial Networks (GANs). Furthermore, we show that generated medical images can be used for synthetic data augmentation, and improve the performance of CNN for medical image classification. Our novel method is demonstrated on a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases and 65 hemangiomas). We first exploit GAN architectures for synthesizing high quality liver lesion ROIs. Then we present a novel scheme for liver lesion classification using CNN. Finally, we train the CNN using classic data augmentation and our synthetic data augmentation and compare performance. In addition, we explore the quality of our synthesized examples using visualization and expert assessment. The classification performance using only classic data augmentation yielded 78.6% sensitivity and 88.4% specificity. By adding the synthetic data augmentation the results increased to 85.7% sensitivity and 92.4% specificity. We believe that this approach to synthetic data augmentation can generalize to other medical classification applications and thus support radiologists' efforts to improve diagnosis.
Article
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP as described has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
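For illustration only (not part of the abstract), the umap-learn package exposes UMAP through a scikit-learn-style interface; the data and parameter values below are placeholders.

```python
# Illustrative use of the umap-learn package (scikit-learn-style API).
# The dataset and parameter values are placeholders.
import numpy as np
import umap

X = np.random.rand(500, 50)                      # 500 samples, 50 features
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
embedding = reducer.fit_transform(X)             # shape (500, 2), ready for plotting
```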
Article
Abstract Background Soft tissue sarcomas are rare entities with over 50 histological subtypes. Liposarcoma (LS) is the most common neoplasm in this group; it is a complex neoplasm that is divided into different histological subtypes. Different therapy options, such as surgical resection, radiation, and chemotherapy, are available. Depending on the subtype, location, status of the resection margins and metastatic status, different therapy options are used. Therefore, the aim of this study was to determine the prognostic factors influencing the survival of patients affected by LS with consideration for the grading, histological subtype, state of the resection margin, size, location, metastases and local recurrence in a retrospective, single-centre analysis over 15 years. Methods We included 133 patients (male/female = 67/66) in this study. We recorded the histologic subtype, grade, TNM classification, localization, biopsy technique, tumour margins, number of operations, complications, radiation and dose, chemotherapy, survival, recrudescence, metastases and follow-up. Survivorship analysis was performed. Results We detected 56 (43%; 95%-CI 34.6–51.6%) atypical LS cases, 21 (16.2%; 95%-CI 9.8–22.5) dedifferentiated LS cases, 40 (30.8%; 95%-CI 22.8–38.7) myxoid LS cases and 12 (9.2%; 95%-CI 4.3–14.2) pleomorphic LS cases. G1 was the most common grade, which was followed by G3. Negative margins (R0) were detected in 67 cases (53.6%; 95%-CI 44.9–62.3) after surgical resection. Local recurrence was detected in 23.6% of cases. The presence of metastases and dedifferentiated LS subtype as well as negative margins, grade and tumour size are significant prognostic factors of the survival rates (p
Conference Paper
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300×300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for 512×512 input, SSD achieves 76.9% mAP, outperforming a comparable state of the art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
Article
Purpose To evaluate whether radiomic feature-based magnetic resonance (MR) imaging signatures allow prediction of survival and stratification of patients with newly diagnosed glioblastoma with improved accuracy compared with that of established clinical and radiologic risk models. Materials and Methods Retrospective evaluation of data was approved by the local ethics committee and informed consent was waived. A total of 119 patients (allocated in a 2:1 ratio to a discovery [n = 79] or validation [n = 40] set) with newly diagnosed glioblastoma were subjected to radiomic feature extraction (12 190 features extracted, including first-order, volume, shape, and texture features) from the multiparametric (contrast material-enhanced T1-weighted and fluid-attenuated inversion-recovery imaging sequences) and multiregional (contrast-enhanced and unenhanced) tumor volumes. Radiomic features of patients in the discovery set were subjected to a supervised principal component (SPC) analysis to predict progression-free survival (PFS) and overall survival (OS) and were validated in the validation set. The performance of a Cox proportional hazards model with the SPC analysis predictor was assessed with C index and integrated Brier scores (IBS, lower scores indicating higher accuracy) and compared with Cox models based on clinical (age and Karnofsky performance score) and radiologic (Gaussian normalized relative cerebral blood volume and apparent diffusion coefficient) parameters. Results SPC analysis allowed stratification based on 11 features of patients in the discovery set into a low- or high-risk group for PFS (hazard ratio [HR], 2.43; P = .002) and OS (HR, 4.33; P < .001), and the results were validated successfully in the validation set for PFS (HR, 2.28; P = .032) and OS (HR, 3.45; P = .004). The performance of the SPC analysis (OS: IBS, 0.149; C index, 0.654; PFS: IBS, 0.138; C index, 0.611) was higher compared with that of the radiologic (OS: IBS, 0.175; C index, 0.603; PFS: IBS, 0.149; C index, 0.554) and clinical risk models (OS: IBS, 0.161, C index, 0.640; PFS: IBS, 0.139; C index, 0.599). The performance of the SPC analysis model was further improved when combined with clinical data (OS: IBS, 0.142; C index, 0.696; PFS: IBS, 0.132; C index, 0.637). Conclusion An 11-feature radiomic signature that allows prediction of survival and stratification of patients with newly diagnosed glioblastoma was identified, and improved performance compared with that of established clinical and radiologic risk models was demonstrated. (©) RSNA, 2016 Online supplemental material is available for this article.
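As a loose illustration of the modeling step described above, a Cox proportional hazards model can be fitted with the lifelines package; the toy data frame and column names below are assumptions, not the study's data or its supervised principal component predictor.

```python
# Hedged sketch: fitting a Cox proportional hazards model on a hypothetical
# table of predictors with survival time and event columns, using lifelines.
# Column names and values are illustrative only.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "os_months": [10, 24, 7, 30, 15, 18, 5, 27],
    "event":     [1, 0, 1, 0, 1, 1, 1, 0],     # 1 = death observed, 0 = censored
    "feature_1": [0.8, 0.2, 0.9, 0.1, 0.7, 0.3, 0.6, 0.4],
    "feature_2": [1.2, 0.4, 1.5, 0.3, 1.1, 0.9, 0.2, 0.8],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="os_months", event_col="event")
print(cph.concordance_index_)                  # C index, the metric reported in the abstract
```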
Article
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
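As a brief illustration (using today's tf.keras API rather than the 2015 graph interface described in the abstract), a small model can be defined, trained, and evaluated in a few lines; the data and architecture are placeholders.

```python
# Minimal TensorFlow/Keras example: define, train, and evaluate a tiny model.
# Data and architecture are placeholders for illustration.
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
```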
Article
In the past decade, the field of medical image analysis has grown exponentially, with an increased number of pattern recognition tools and an increase in data set sizes. These advances have facilitated the development of processes for high-throughput extraction of quantitative features that result in the conversion of images into mineable data and the subsequent analysis of these data for decision support; this practice is termed radiomics. This is in contrast to the traditional practice of treating medical images as pictures intended solely for visual interpretation. Radiomic data contain first-, second-, and higher-order statistics. These data are combined with other patient data and are mined with sophisticated bioinformatics tools to develop models that may potentially improve diagnostic, prognostic, and predictive accuracy. Because radiomics analyses are intended to be conducted with standard of care images, it is conceivable that conversion of digital images to mineable data will eventually become routine practice. This report describes the process of radiomics, its challenges, and its potential power to facilitate better clinical decision making, particularly in the care of patients with cancer.
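A hedged sketch of the high-throughput feature extraction step described above, using the pyradiomics package; the file paths are placeholders.

```python
# Hedged sketch of radiomic feature extraction with the pyradiomics package;
# the image and mask file paths are placeholders.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()                      # first-order, shape, texture, ...
features = extractor.execute("image.nii.gz", "tumor_mask.nii.gz")

for name, value in features.items():
    if not name.startswith("diagnostics_"):        # skip metadata entries
        print(name, value)
```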
Article
We provide examples and highlights of Advanced Normalization Tools (ANTS) that address practical problems in real data.
Article
Purpose: To review the reliability of MR imaging features for the purpose of distinguishing lipoma and atypical lipomatous tumor/well-differentiated liposarcoma (ALT/WDL). Materials and methods: A retrospective review of 87 patients with histologically proven lipomatous tumors was performed. All underwent MR imaging, assessing lipomatous content, septation, and nodules. The associations between these features and tumor diagnosis based on morphology and the presence or absence of MDM2 amplification were explored. The age of the patient and the size and location of the lesion were also recorded for statistical analysis. Results: Of the 87 patients, 54 were classified as lipomas and 33 as ALT/WDL. MR identified ALT/WDL with a sensitivity of 90.9 % (CI 74.5-97.6) and a specificity of 37.0 % (CI 24.6-51.3). The positive and negative predictive values were 46.9 % (CI 34.5-59.7) and 86.9 % (CI 65.3-96.6), respectively. The mean age of patients with ALT/WDL was greater (60 years [range 40-83 years]) than those with lipoma (52 years [range 10-79 years]) (p = 0.025). The mean size of ALT/WDL (18.7 cm [range 5-36 cm]) was significantly greater than lipoma (13.9 cm [range 3-32 cm]) (p = 0.003). Features that increased the likelihood of ALT/WDL included: patient age over 60 years, maximal lesion dimension over 10 cm, location in lower limb, and presence of non-fatty areas, by a factor of 2.61-6.25 times. Conclusions: ALT/WDL and lipoma have overlapping MR imaging characteristics. The most reliable imaging discriminators of ALT/WDL were size of lesion and lipomatous content, but due to the overlap in the MRI appearances of lipoma and ALT/WDL, discrimination should be based on molecular pathology rather than imaging.
Article
This paper presents a new wrapper-based feature selection method for multilayer perceptron (MLP) neural networks. It uses a feature ranking criterion to measure the importance of a feature by computing the aggregate difference, over the feature space, of the probabilistic outputs of the MLP with and without the feature. Thus, a score of importance with respect to every feature can be provided using this criterion. Based on the numerical experiments on several artificial and real-world data sets, the proposed method performs, in general, better than several selected feature selection methods for MLP, particularly when the data set is sparse or has many redundant features. In addition, as a wrapper-based approach, the computational cost for the proposed method is modest.
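For orientation, a generic wrapper-style selection around an MLP can be expressed with scikit-learn's SequentialFeatureSelector, as sketched below; note that this is not the probabilistic-output ranking criterion proposed in the paper.

```python
# Illustrative wrapper-style feature selection around an MLP using scikit-learn's
# SequentialFeatureSelector. This is NOT the paper's criterion (which ranks
# features by the change in the MLP's probabilistic outputs); it is a generic
# wrapper shown for orientation only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import SequentialFeatureSelector

X = np.random.rand(200, 30)
y = np.random.randint(0, 2, 200)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
selector = SequentialFeatureSelector(mlp, n_features_to_select=5,
                                     direction="forward", cv=3)
selector.fit(X, y)
print(selector.get_support(indices=True))          # indices of the selected features
```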
Article
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
Article
Non-biological experimental variation or "batch effects" are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
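The following is a deliberately simplified location/scale sketch of the batch-adjustment idea; the actual ComBat method additionally shrinks the per-batch estimates with an empirical Bayes prior.

```python
# Deliberately simplified sketch of batch-effect adjustment: per-batch centering
# and scaling of each feature toward the pooled distribution. The real ComBat
# method additionally shrinks the batch estimates with an empirical Bayes prior;
# this toy version only illustrates the location/scale idea.
import numpy as np

def naive_batch_adjust(X, batches):
    """X: (samples, features) array; batches: (samples,) array of batch labels."""
    X_adj = X.astype(float).copy()
    grand_mean = X_adj.mean(axis=0)
    grand_std = X_adj.std(axis=0) + 1e-8
    for b in np.unique(batches):
        idx = batches == b
        b_mean = X_adj[idx].mean(axis=0)
        b_std = X_adj[idx].std(axis=0) + 1e-8
        # standardize within the batch, then map onto the pooled location/scale
        X_adj[idx] = (X_adj[idx] - b_mean) / b_std * grand_std + grand_mean
    return X_adj
```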
Conference Paper
Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realistic but completely different from the original ones, filling gaps in the real image distribution. However, we cannot easily use them to locate disease areas, considering expert physicians' expensive annotation cost. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating highly rough bounding box conditions incrementally into PGGANs to place brain metastases at desired positions/sizes on 256 × 256 Magnetic Resonance (MR) images, for Convolutional Neural Network-based tumor detection; this first GAN-based medical DA using automatic bounding box annotation improves the training robustness. The results show that CPGGAN-based DA can boost diagnostic sensitivity by 10% with clinically acceptable additional False Positives. Surprisingly, further tumor realism, achieved with additional normal brain MR images for CPGGAN training, does not contribute to detection performance, while even three physicians cannot accurately distinguish them from the real ones in a Visual Turing Test.
Conference Paper
Image synthesis learns a transformation from the intensity features of an input image to yield a different tissue contrast of the output image. This process has been shown to have application in many medical image analysis tasks including imputation, registration, and segmentation. To carry out synthesis, the intensities of the input images are typically scaled (i.e., normalized) both in training to learn the transformation and in testing when applying the transformation, but it is not presently known what type of input scaling is optimal. In this paper, we consider seven different intensity normalization algorithms and three different synthesis methods to evaluate the impact of normalization. Our experiments demonstrate that intensity normalization as a preprocessing step improves the synthesis results across all investigated synthesis algorithms. Furthermore, we show evidence that suggests intensity normalization is vital for successful deep learning-based MR image synthesis.
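As a minimal example of one of the simpler normalization schemes commonly compared in this setting, z-scoring within a foreground mask can be written as follows; the variable names are illustrative.

```python
# One of the simplest intensity normalization schemes: z-scoring voxel
# intensities within a foreground mask. Variable names are illustrative.
import numpy as np

def zscore_normalize(volume, mask=None):
    """Return (volume - mean) / std, with statistics computed inside the mask."""
    voxels = volume[mask > 0] if mask is not None else volume
    return (volume - voxels.mean()) / (voxels.std() + 1e-8)
```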
Article
With the proliferation of multi-site neuroimaging studies, there is a greater need for handling non-biological variance introduced by differences in MRI scanners and acquisition protocols. Such unwanted sources of variation, which we refer to as "scanner effects", can hinder the detection of imaging features associated with clinical covariates of interest and cause spurious findings. In this paper, we investigate scanner effects in two large multi-site studies on cortical thickness measurements across a total of 11 scanners. We propose a set of tools for visualizing and identifying scanner effects that are generalizable to other modalities. We then propose to use ComBat, a technique adopted from the genomics literature and recently applied to diffusion tensor imaging data, to combine and harmonize cortical thickness values across scanners. We show that ComBat removes unwanted sources of scan variability while simultaneously increasing the power and reproducibility of subsequent statistical analyses. We also show that ComBat is useful for combining imaging data with the goal of studying life-span trajectories in the brain.
Article
Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
Article
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Conference Paper
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
Article
Diffusion tensor imaging (DTI) is a well-established magnetic resonance imaging (MRI) technique used for studying microstructural changes in the white matter. As with many other imaging modalities, DTI images suffer from technical between-scanner variation that hinders comparisons of images across imaging sites, scanners and over time. Using fractional anisotropy (FA) and mean diffusivity (MD) maps of 205 healthy participants acquired on two different scanners, we show that the DTI measurements are highly site-specific, highlighting the need of correcting for site effects before performing downstream statistical analyses. We first show evidence that combining DTI data from multiple sites, without harmonization, may be counter-productive and negatively impacts the inference. Then, we propose and compare several harmonization approaches for DTI data, and show that ComBat, a popular batch-effect correction tool used in genomics, performs best at modeling and removing the unwanted inter-site variability in FA and MD maps. Using age as a biological phenotype of interest, we show that ComBat both preserves biological variability and removes the unwanted variation introduced by site. Finally, we assess the different harmonization methods in the presence of different levels of confounding between site and age, in addition to test robustness to small sample size studies.
Article
In this work, we revisit atrous convolution, a powerful tool to explicitly adjust a filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed 'DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
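Atrous (dilated) convolution is exposed directly in common frameworks; the Keras sketch below is illustrative and unrelated to the DeepLabv3 implementation itself.

```python
# Illustration of atrous (dilated) convolution in Keras: the dilation_rate
# argument enlarges the filter's field of view without adding parameters.
import tensorflow as tf

inputs = tf.keras.Input(shape=(256, 256, 3))
# rate 1 would be a standard 3x3 convolution; rates 2 and 4 probe wider context
x = tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=2, activation="relu")(inputs)
x = tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=4, activation="relu")(x)
model = tf.keras.Model(inputs, x)
```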
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
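A minimal, hypothetical use of XGBoost through its scikit-learn wrapper is sketched below; the data and hyperparameters are placeholders, not the configuration used in the report above.

```python
# Illustrative XGBoost classifier via its scikit-learn wrapper; hyperparameters
# and data are placeholders.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(300, 50)
y = np.random.randint(0, 2, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_tr, y_tr)
print(model.predict_proba(X_te)[:5, 1])           # predicted probabilities for a few samples
```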
Conference Paper
There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512×512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.
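A toy Keras sketch of the contracting/expanding architecture with skip connections is given below for orientation; it is not the original Caffe implementation referenced in the abstract.

```python
# Minimal U-Net-style sketch in Keras (contracting path, bottleneck, expanding
# path with skip connections). A toy version for orientation only.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = tf.keras.Input(shape=(128, 128, 1))
c1 = conv_block(inputs, 16)
p1 = layers.MaxPooling2D()(c1)
c2 = conv_block(p1, 32)
p2 = layers.MaxPooling2D()(c2)

b = conv_block(p2, 64)                                   # bottleneck

u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
c3 = conv_block(layers.Concatenate()([u2, c2]), 32)      # skip connection
u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
c4 = conv_block(layers.Concatenate()([u1, c1]), 16)      # skip connection

outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # per-pixel mask
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```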
Article
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets. © 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.
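In Keras, dropout is a single layer inserted between dense (or convolutional) layers, as in this minimal sketch.

```python
# Dropout in Keras: units are randomly zeroed during training only, which
# approximates averaging over an ensemble of "thinned" networks at test time.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),        # drop half the activations during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
```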
Article
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference while using fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error.
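For orientation, a generic Keras transfer-learning sketch with an ImageNet-pretrained backbone (InceptionV3 here) is shown below; the frozen-feature strategy, input size, and binary classification head are assumptions for illustration.

```python
# Generic Keras transfer-learning sketch with an ImageNet-pretrained backbone;
# the head and freezing strategy are illustrative assumptions.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                          input_shape=(299, 299, 3), pooling="avg")
base.trainable = False                                    # freeze the pretrained features

inputs = tf.keras.Input(shape=(299, 299, 3))
x = base(inputs, training=False)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
```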
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
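In Keras, batch normalization is likewise a single layer, typically placed between a convolution and its activation, as in this minimal sketch.

```python
# Batch Normalization in Keras: layer inputs are normalized per mini-batch,
# then rescaled with learned parameters.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(32, 3, padding="same"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
])
```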
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has low memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
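The update rule summarized above can be written in a few lines of NumPy; the sketch below uses the commonly cited default hyperparameters.

```python
# Toy NumPy implementation of the Adam update rule for a single parameter
# vector; hyperparameter values are the commonly used defaults.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```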
Conference Paper
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in understanding an object's precise 2D location. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old along with per-instance segmentation masks. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Article
Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that by simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
Article
A variant of the popular nonparametric nonuniform intensity normalization (N3) algorithm is proposed for bias field correction. Given the superb performance of N3 and its public availability, it has been the subject of several evaluation studies. These studies have demonstrated the importance of certain parameters associated with the B-spline least-squares fitting. We propose the substitution of a recently developed fast and robust B-spline approximation routine and a modified hierarchical optimization scheme for improved bias field correction over the original N3 algorithm. Similar to the N3 algorithm, we also make the source code, testing, and technical documentation of our contribution, which we denote as "N4ITK," available to the public through the Insight Toolkit of the National Institutes of Health. Performance assessment is demonstrated using simulated data from the publicly available Brainweb database, hyperpolarized ³He lung image data, and 9.4T postmortem hippocampus data.
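A hedged sketch of applying N4 bias field correction through SimpleITK's implementation is shown below; the file paths and the Otsu-based mask are placeholders.

```python
# Hedged sketch of N4 bias field correction using SimpleITK's implementation of
# the N4ITK algorithm; file paths are placeholders.
import SimpleITK as sitk

image = sitk.ReadImage("t1.nii.gz", sitk.sitkFloat32)
mask = sitk.OtsuThreshold(image, 0, 1, 200)        # rough foreground mask

corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(image, mask)
sitk.WriteImage(corrected, "t1_n4.nii.gz")
```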
Article
Previous run length based texture analysis studies have mostly relied upon the use of run length or gray level distributions of the number of runs for characterizing the textures of images. In this study, some new joint run length-gray level distributions are proposed which offer additional insight into the image characterization problem and serve as effective features for a texture-based classification of images. Experimental evidence is offered to demonstrate the utilitarian value of these new concepts.
Conference Paper
A major problem in magnetic resonance imaging (MRI) is the lack of a pulse sequence dependent standardized intensity scale like the Hounsfield units in computed tomography. This affects the post-processing of the acquired images as, in general, segmentation and registration methods depend on the observed image intensities. Different approaches dealing with this problem were proposed recently. In this article we will describe and compare five state-of-the-art standardization methods regarding speed, applicability and accuracy. As a quality measure the mean distance and the Kullback-Leibler divergence are considered. For the experiments 28 MRI head volume images, acquired during clinical routine, were used.
Article
Most of the texture measures based on run lengths use only the lengths of the runs and their distribution. We propose to use the gray value distribution of the runs to define two new features, viz., low gray level run emphasis (LGRE) and high gray level run emphasis (HGRE).
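Given a precomputed gray-level run-length matrix, the two features defined above reduce to simple weighted sums, as in this illustrative NumPy sketch.

```python
# Illustration of the two run-length features defined above, computed from a
# precomputed gray-level run-length matrix P[i, j] (rows = gray levels starting
# at 1, columns = run lengths). Building P itself is omitted here.
import numpy as np

def lgre_hgre(P):
    gray_levels = np.arange(1, P.shape[0] + 1)[:, None]    # i = 1..N_g as a column
    n_runs = P.sum()
    lgre = (P / gray_levels ** 2).sum() / n_runs            # low gray level run emphasis
    hgre = (P * gray_levels ** 2).sum() / n_runs            # high gray level run emphasis
    return lgre, hgre
```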
Article
From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.
Article
One of the major drawbacks of magnetic resonance imaging (MRI) has been the lack of a standard and quantifiable interpretation of image intensities. Unlike in other modalities, such as X-ray computerized tomography, MR images taken for the same patient on the same scanner at different times may appear different from each other due to a variety of scanner-dependent variations and, therefore, the absolute intensity values do not have a fixed meaning. We have devised a two-step method wherein all images (independent of patients and the specific brand of the MR scanner used) can be transformed in such a way that for the same protocol and body region, in the transformed images similar intensities will have similar tissue meaning. Standardized images can be displayed with fixed windows without the need of per-case adjustment. More importantly, extraction of quantitative information about healthy organs or about abnormalities can be considerably simplified. This paper introduces and compares new variants of this standardizing method that can help to overcome some of the problems with the original method.
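A deliberately simplified, single-image sketch of the landmark-based idea is shown below: percentile landmarks of the input are mapped piecewise-linearly onto a fixed standard scale; the full method additionally learns the standard landmarks from a training set of images.

```python
# Deliberately simplified, single-image sketch of landmark-based intensity
# standardization: percentile landmarks of the input image are mapped
# piecewise-linearly onto a fixed standard scale. The real method learns the
# standard landmarks from a training set of images.
import numpy as np

def standardize_intensities(volume, mask, standard_scale=(0, 20, 40, 60, 80, 100),
                            percentiles=(1, 20, 40, 60, 80, 99)):
    voxels = volume[mask > 0]
    landmarks = np.percentile(voxels, percentiles)
    return np.interp(volume, landmarks, standard_scale)
```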
Article
The techniques available for the interrogation and analysis of neuroimaging data have a large influence in determining the flexibility, sensitivity, and scope of neuroimaging experiments. The development of such methodologies has allowed investigators to address scientific questions that could not previously be answered and, as such, has become an important research area in its own right. In this paper, we present a review of the research carried out by the Analysis Group at the Oxford Centre for Functional MRI of the Brain (FMRIB). This research has focussed on the development of new methodologies for the analysis of both structural and functional magnetic resonance imaging data. The majority of the research laid out in this paper has been implemented as freely available software tools within FMRIB's Software Library (FSL).
Article
Five properties of texture, namely, coarseness, contrast, busyness, complexity, and texture strength, are given conceptual definitions in terms of spatial changes in intensity. These conceptual definitions are then approximated in computational forms. In comparison with human perceptual measurements, the computational measures have shown good correspondences in the rank ordering of ten natural textures. The extent to which the measures approximate visual perception was investigated in the form of texture similarity measurements. These results were also encouraging, although not as good as in the rank ordering of the textures. The differences may be due to the complex mechanism of human usage of multiple cues. Improved classification results were obtained using the above features as compared with two existing texture analysis techniques. The application of the features in agricultural land-use classification is considered.