Conference Paper

U-Net: Convolutional Networks for Biomedical Image Segmentation

Authors:
Olaf Ronneberger, Philipp Fischer, Thomas Brox

Abstract

It is widely agreed that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC), we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast: segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
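The contracting/expanding design described above maps naturally to code. Below is a minimal PyTorch sketch of the U-Net idea, assuming padded 3x3 convolutions and only two resolution levels for brevity; the original network is deeper, uses unpadded convolutions, and crops the skip connections accordingly.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 conv + ReLU pairs: the basic block at every U-Net stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)                  # contracting path
        self.pool = nn.MaxPool2d(2)
        self.enc2 = double_conv(64, 128)                    # context at 1/2 resolution
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # expanding path
        self.dec1 = double_conv(128, 64)                    # 128 = 64 skip + 64 upsampled
        self.head = nn.Conv2d(64, n_classes, 1)             # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                       # high-resolution features
        e2 = self.enc2(self.pool(e1))           # coarser, more contextual features
        d1 = self.up(e2)                        # back to full resolution
        d1 = self.dec1(torch.cat([e1, d1], 1))  # skip connection enables precise localization
        return self.head(d1)

# Segmenting a 512x512 image, as in the abstract:
logits = TinyUNet()(torch.randn(1, 1, 512, 512))  # -> (1, 2, 512, 512)
```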


... With advancements in deep neural networks (DNN) [10]-[13], most speech researchers have begun developing speech denoising and dereverberation strategies to estimate TF-masks. Many researchers have also looked into denoising autoencoders, recurrent networks, gated recurrent units (GRU) [14], convolution networks and their variants such as temporal convolution networks (TCN) [15], [16], fully convolutional networks (FCN) [17], [18], gated convolution networks, and so on. Unlike most signal processing methods, deep neural networks can learn patterns for either denoising or dereverberation and generalize them to larger unseen scenarios with the help of nonlinear optimization. ...
... As previously stated, the proposed network is built using a fully convolutional complex-valued encoder-decoder network (Cplx-UNet), which is a standard U-Net [18], [36] framework with traditional convolution blocks substituted with complex equivalents. A U-Net is an encoder-decoder based fully convolutional network (FCN) that was originally developed for image segmentation tasks [17], [18], [36]. Recently, the speech community has seen a surge in the adoption of this architecture in conjunction with magnitude responses or complex spectrograms for a variety of applications. ...
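The "complex equivalent" of a convolution block mentioned in the excerpt above can be sketched as follows (a minimal PyTorch illustration, not the actual Cplx-UNet code): for input z = a + ib and kernel W = X + iY, complex convolution gives (a*X - b*Y) + i(a*Y + b*X).

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution built from two real convolutions (real/imag kernels)."""
    def __init__(self, in_ch, out_ch, kernel_size, **kw):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)  # X: real kernel
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)  # Y: imaginary kernel

    def forward(self, real, imag):
        out_r = self.conv_r(real) - self.conv_i(imag)  # Re(z * W)
        out_i = self.conv_i(real) + self.conv_r(imag)  # Im(z * W)
        return out_r, out_i

# e.g. applied to the real/imaginary planes of a complex spectrogram:
cc = ComplexConv2d(1, 16, 3, padding=1)
re, im = cc(torch.randn(1, 1, 257, 100), torch.randn(1, 1, 257, 100))
```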
Preprint
Full-text available
With advancements in deep learning approaches, the performance of speech enhancement systems in the presence of background noise has improved significantly. However, improving a system's robustness against reverberation is still a work in progress, as reverberation tends to cause loss of formant structure due to smearing effects in time and frequency. A wide range of deep learning-based systems either enhance the magnitude response and reuse the distorted phase, or enhance the complex spectrogram using a complex time-frequency mask. Though these approaches have demonstrated satisfactory performance, they do not directly address the lost formant structure caused by reverberation. We believe that retrieving the formant structure can help improve the efficiency of existing systems. In this study, we propose SkipConvGAN, an extension of our prior work SkipConvNet. The proposed system's generator network tries to estimate an efficient complex time-frequency mask, while the discriminator network aids in driving the generator to restore the lost formant structure. We evaluate the performance of our proposed system on simulated and real recordings of reverberant speech from the single-channel task of the REVERB challenge corpus. The proposed system shows a consistent improvement across multiple room configurations over other deep learning-based generative adversarial frameworks.
... In this study, we use a CNN, named MarsQuakeNet (MQNet), for the detection of marsquake energy in the time-frequency domain, and highlight its value for waveform denoising. MQNet is based on the UNet architecture developed by Ronneberger et al. (2015) for medical image segmentation. The UNet consists of a set of convolutional layers that are arranged in an encoder-decoder setup. ...
... In medical image analysis (Ronneberger et al., 2015), this technique is used to build segmentation masks that pixel-wise classify cells or organs in the human body. Jansson et al. (2017), Chandna et al. (2017), and others showed how this method can decompose the time-frequency representation of audio tracks into different instruments or vocals. ...
... The UNet (Ronneberger et al., 2015) builds on the fully convolutional architecture, but is extended by an expansive path that enables high-resolution localization of features. The network typically consumes 2D data in the form of images or time-frequency representations of time series. ...
Article
Full-text available
NASA's Interior Exploration using Seismic Investigations, Geodesy and Heat Transport (InSight) seismometer has been recording Martian seismicity since early 2019, and to date, over 1,300 marsquakes have been cataloged by the Marsquake Service (MQS). Due to typically low signal-to-noise ratios (SNR) of marsquakes, their detection and analysis remain challenging: while event amplitudes are relatively low, the background noise has large diurnal and seasonal variations and contains various signals originating from the interactions of the local atmosphere with the lander and seismometer system. Since noise can resemble marsquakes in a number of ways, the use of conventional detection methods for catalog curation is limited. Instead, MQS finds events through manual data inspection. Here, we present MarsQuakeNet (MQNet), a deep convolutional neural network for the detection of marsquakes and the removal of noise contamination. Based on three-component seismic data, MQNet predicts segmentation masks that identify and separate event and noise energy in the time-frequency domain. As the number of cataloged MQS events is small, we combine synthetic event waveforms with recorded noise to generate a training data set. We apply MQNet to the entire continuous 20 samples-per-second waveform data set available to date (>1,000 Martian days), for automatic event detection and for retrieving denoised amplitudes. The algorithm reproduces all high-quality events, as well as the majority of low-quality events, in the manually and carefully curated MQS catalog. Furthermore, MQNet detects ∼60% additional, previously unknown events, mostly with low SNR, which were verified in manual review. Our analysis of the event rate confirms seasonal trends and shows a substantial increase in the second Martian year.
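The masking step described in the abstract can be summarized in a few lines: a network predicts a time-frequency segmentation mask, and applying the mask to the noisy spectrogram yields denoised amplitudes. The sketch below, assuming SciPy and a placeholder in place of the trained model, illustrates only this mask-and-invert pattern, not MQNet itself (which operates on three-component data).

```python
import numpy as np
from scipy import signal

fs = 20.0                                     # 20 samples per second, as in the data set
x = np.random.randn(int(600 * fs))            # placeholder single-component trace
f, t, Z = signal.stft(x, fs=fs, nperseg=256)  # time-frequency representation

mask = np.random.rand(*Z.shape)               # stand-in for model(|Z|), values in [0, 1]
Z_event = mask * Z                            # keep energy classified as marsquake
_, x_denoised = signal.istft(Z_event, fs=fs, nperseg=256)  # denoised waveform
```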
... On the other hand, fully automated segmentation can substantially reduce the time required for target volume delineation and produce more consistent segmentation masks [2], [3]. Over the last several years, deep learning methods have produced impressive results for segmentation tasks such as labeling tumors or various anatomical structures [4]-[7]. ...
... To do so, we propose the PocketNet paradigm for deep learning models, a straightforward modification to existing architectures that substantially reduces the number of parameters and maintains the same performance as the original architecture. This modification questions the long-held assumption that doubling the number of features after each downsampling operation (i.e., pooling or convolution) is necessary for convolutional neural networks [4]-[7], [10]-[15]. Our work demonstrates the effectiveness of our PocketNets in several 3D segmentation tasks and a classification problem. ...
... Many common network architectures for imaging tasks rely on manipulating images (or image features) at multiple scales because natural images encapsulate data at multiple resolutions. As a result, most CNN architectures, including popular state-of-the-art methods such as nnUNet [14] and HRNet [13], follow a pattern of downsampling and upsampling, in line with the intuition of the original U-Net paper [4] that popularized this approach. In the architecture first presented there, the number of feature maps (i.e., channels) in each convolution operator is doubled each time the resolution of the images decreases, the justification being that the increased number of feature maps offsets the loss of information from downsampling. ...
Article
Full-text available
Medical imaging deep learning models are often large and complex, requiring specialized hardware to train and evaluate these models. To address such issues, we propose the PocketNet paradigm to reduce the size of deep learning models by throttling the growth of the number of channels in convolutional neural networks. We demonstrate that, for a range of segmentation and classification tasks, PocketNet architectures produce results comparable to that of conventional neural networks while reducing the number of parameters by multiple orders of magnitude, using up to 90% less GPU memory, and speeding up training times by up to 40%, thereby allowing such models to be trained and deployed in resource-constrained settings.
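The channel-width schedules contrasted above are easy to quantify. The sketch below, counting only 3x3 convolution weights with illustrative widths, compares the conventional doubling-per-downsampling rule with a PocketNet-style constant width; the exact savings reported in the paper depend on the full architectures.

```python
def conv_params(in_ch, out_ch, k=3):
    # Parameters of one k x k convolution: weights plus biases.
    return in_ch * out_ch * k * k + out_ch

def total_params(widths):
    # Chain of convolutions following the given channel widths.
    return sum(conv_params(a, b) for a, b in zip(widths[:-1], widths[1:]))

doubling = [32, 64, 128, 256, 512]  # classic rule: double after each downsampling
pocket   = [32, 32, 32, 32, 32]     # PocketNet-style: constant channel count

print(total_params(doubling))  # 1,567,680 parameters
print(total_params(pocket))    # 36,992 parameters
```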
... The backbone of an FCN is usually a pre-trained CNN such as VGG16. The idea behind the FCN later became the backbone of U-Nets [19]. The fully convolutional DenseNet is another approach [17]. ...
... Graphical comparison between (a) U-Net [19], (b) the fully convolutional DenseNet [17], and (c) the proposed AID-U-Net. ...
... Deep models that use convolutional neural network (CNN) structures have shown promising ability in overcoming the obstacles posed by real-world conditions. Han et al. [9] and Akinlar et al. [10] both used prior knowledge to improve pupil segmentation accuracy with U-Net [11] architectures, employing a shape-prior loss and an ellipse-fit error loss, respectively. ...
... In this work, we consider another follow-up work of the ViT framework, termed Swin-Transformer [18], which constructs hierarchical feature maps with computational complexity linear in image size. Swin-Transformer can conveniently leverage advanced techniques for dense prediction, such as U-Net [11] and Feature Pyramid Networks (FPN) [19]. In image segmentation tasks, the hierarchical architecture of Swin-Transformer has achieved excellent performance, e.g., Swin U-Net [20] and DS-Trans U-Net [21] in 2D medical image segmentation, and Swin-UNETR [22] in 3D medical segmentation with state-of-the-art results. ...
Conference Paper
Detecting the pupil in an image is critical for human-machine interaction and biomedical computing applications, and it is essentially an image segmentation problem. Recently developed deep learning models provide a variety of novel approaches to the pupil segmentation task. However, preparing datasets and acquiring annotations to build pupil image datasets are labor-intensive and time-consuming, and the shortage of labeled samples has restricted the improvement of deep learning models. In this work, we use a masked image modeling mechanism to learn latent representations from limited data samples, which significantly helps in training deep models. Further, we propose a novel pupil segmentation model based on the recently proposed Swin-Transformer to validate the effectiveness of the masking mechanism. In comparison experiments with other related deep learning models, the proposed computational framework achieves better performance on pupil segmentation tasks based on the LPW dataset. The proposed framework is a promising solution for pupil segmentation and detection in small-sample learning applications.
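The masked-image-modeling mechanism described above can be illustrated with a toy masking routine (NumPy, hypothetical patch size and mask ratio): random patches of an unlabeled eye image are hidden, and a model would be trained to reconstruct them, yielding representations that transfer to segmentation.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((224, 224))          # placeholder grayscale eye image
patch, ratio = 16, 0.6                # illustrative patch size and mask ratio
n = img.shape[0] // patch

masked = img.copy()
hidden = rng.choice(n * n, size=int(ratio * n * n), replace=False)
for k in hidden:                      # zero out the selected patches
    r, c = divmod(k, n)
    masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
# Pretraining objective: train model(masked) to reconstruct img on the hidden patches.
```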
... Recently, some works have presented different CNN-based approaches to solving the inverse scattering problem, showing the potential of the model [8,11,12]. The CNNs in these works are based on the U-Net [13] and use both amplitude and phase information. Surprisingly, to our knowledge, a CNN has not yet been employed to solve the microwave electromagnetic (EM) inverse problem with amplitude-only information. ...
Article
An inverse method for estimating the parameters of dielectric cylinders (dielectric properties, location, and radius) from amplitude-only microwave information is presented. To this end, two different Artificial Neural Network (ANN) topologies were compared: a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN). Several two-dimensional (2D) simulations with different sizes and locations of homogeneous dielectric cylinders, employing the Finite Differences Time Domain (FDTD) method, were performed to generate training, validation, and test sets for both ANN models. The prediction errors were lower for the CNN in high Signal-to-Noise Ratio (SNR) scenarios, although the MLP was more robust in low SNR situations. The CNN model's performance was also tested on 2D simulations of dielectrically homogeneous and heterogeneous cylinders placed in acrylic holders, showing potential experimental applications. Moreover, the CNN was tested on a three-dimensional model simulated as realistically as possible, showing good results in predicting all parameters directly from the S-parameters.
... To address feature loss issues, Ronneberger et al. [18] proposed U-Net, which adopts an encoder-decoder structure for small targets and adds richer feature-information fusion. Badrinarayanan et al. [19] proposed the Segmentation Network (SegNet), which adopts the same structure as U-Net. ...
Article
Full-text available
In computer vision, convolution and pooling operations tend to lose high-frequency information, and contour details likewise disappear as the network deepens, especially in image semantic segmentation. For RGB-D image semantic segmentation, not all of the effective information in the RGB and depth images can be used effectively, whereas the wavelet transform can perfectly retain the low- and high-frequency information of the original image. To solve these information-loss problems, we propose an RGB-D indoor semantic segmentation network based on multi-scale fusion: we design a wavelet-transform fusion module to retain contour details, a nonsubsampled contourlet transform to replace the pooling operation, and a multiple-pyramid module to aggregate multi-scale information and global context. The proposed method retains multi-scale information with the help of the wavelet transform and makes full use of the complementarity of high- and low-frequency information. As the depth of the convolutional neural network increases without losing multi-frequency characteristics, the segmentation accuracy of image edge contour details also improves. We evaluated our proposed efficient method on the commonly used indoor datasets NYUv2 and SUNRGB-D, and the results showed that we achieved state-of-the-art performance and real-time inference.
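The wavelet-based replacement for pooling described above can be sketched with PyWavelets: a single-level 2D discrete wavelet transform halves the spatial resolution, like pooling, but keeps the high-frequency sub-bands instead of discarding them. This is only an illustration of the idea; the paper's fusion module and nonsubsampled contourlet transform are more elaborate.

```python
import numpy as np
import pywt

feat = np.random.rand(256, 256)             # placeholder feature map
LL, (LH, HL, HH) = pywt.dwt2(feat, "haar")  # each sub-band is 128x128

downsampled = LL                            # low-frequency path (pooling substitute)
detail = np.stack([LH, HL, HH])             # high-frequency contour detail, retained
```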
... The modeling pipeline of Figure 2 has three key steps, described as follows: using DeepTracer [12], we first predict the 3D backbone coordinates of the protein complex directly from the cryo-EM density map. DeepTracer uses a 3D U-Net architecture, modified from the original 2D U-Net [13] architecture developed especially for biomedical image segmentation. The output from the DeepTracer block is a predicted 3D backbone coordinate structure containing the carbon, carbon-alpha, nitrogen, and oxygen atoms in the Protein Data Bank (PDB) format, which is standardized by wwPDB [14]. ...
Preprint
Full-text available
Elucidating protein-ligand interactions is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein-ligand interaction is traditionally tackled by molecular docking and simulation based on physical forces and statistical potentials, which cannot effectively leverage cryo-EM data and existing protein structural information in the protein-ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein-ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure into which ligands are added. The ligands are then identified and added to generate a protein-ligand complex structure, which is further refined. The method, based on deep learning prediction and template-based modeling, was blindly tested in the 2021 EMDataResource Ligand Challenge and ranked first in fitting ligands to cryo-EM density maps. The results demonstrate that this deep learning bioinformatics approach is a promising direction for modeling protein-ligand interactions from cryo-EM data using prior structural information. The source code, data, and instructions to reproduce the results are open-sourced and available in the GitHub repository: https://github.com/jianlin-cheng/DeepProLigand
... In this study, a U-Net model was also found to be the best among the three considered models (pyramid scene parsing network (PSPNet), DeepLabv3, and U-Net). Ronneberger et al. [13] proposed the U-Net model to effectively solve the semantic segmentation problem in medical images. The U-Net structure consists of two parts: an encoding module, the contracting path that captures context, and a decoding module, the symmetric expanding path that enables precise localization. ...
Article
Full-text available
Wheat stripe rust-damaged leaves present challenges to automatic disease index calculation, including high similarity between spores and spots and difficulty in distinguishing edge contours. In actual field applications, investigators rely on the naked eye to judge the disease extent, which is subjective, of low accuracy, and essentially qualitative. To address the above issues, this study undertook the task of semantic segmentation of wheat stripe rust damage images using deep learning. To address the problem of small available datasets, the first large-scale open dataset of wheat stripe rust images from Qinghai province was constructed through field and greenhouse image acquisition, screening, filtering, and manual annotation. There are 33,238 images in our dataset, each 512 × 512 pixels. A new segmentation paradigm was defined: dividing indistinguishable spores and spots into different classes, we investigated the task of accurately segmenting the background, leaf (containing spots), and spores. To assign different weights to high- and low-frequency features, we used the Octave-UNet model, which replaces the original convolution operation in the U-Net model with the octave convolution. The Octave-UNet model obtained the best benchmark results among the four models (PSPNet, DeepLabv3, U-Net, Octave-UNet): its mean intersection over union was 83.44%, its mean pixel accuracy was 94.58%, and its accuracy was 96.06%. The results showed that the state-of-the-art Octave-UNet model can better represent and discern semantic information over a small region and improve the segmentation accuracy of spores, leaves, and backgrounds in our constructed dataset.
... Fully convolutional neural networks (FCN) were the first deep learning methods applied for this purpose [13]. Subsequently, the emergence of U-Net pushed deep learning methods to their pinnacle [14]. ...
Article
Full-text available
Diabetic Retinopathy (DR) is a diabetic complication that predisposes patients to visual impairments that could lead to blindness. Lesion segmentation using deep learning algorithms is an effective measure for screening and preventing early DR. However, there are several types of DR lesions with varying sizes and high inter-class similarity, making segmentation difficult. In this paper, we propose a supervised segmentation method (MSLF-Net) based on multi-scale, multi-level feature fusion to achieve accurate end-to-end DR lesion segmentation. MSLF-Net builds a Multi-Scale Feature Extraction (MSFE) module to extract multi-scale information and provide more comprehensive features for segmentation. This paper further introduces the Multi-Level Feature Fusion (MLFF) module to improve feature fusion using a cross-layer structure. This structure only fuses low- and high-level features of the same class based on category supervision, avoiding feature contamination. Moreover, this paper produces additional masked images for the dataset and performs image enhancement operations to ensure that the proposed method is trainable and functional on small datasets. Extensive experiments were conducted on the public datasets IDRID and e_ophtha. The results showed that our proposed feature enhancement method performs feature fusion more effectively; in the end-to-end DR segmentation setting, MSLF-Net is superior to other similar models and can effectively improve DR lesion segmentation performance.
... The classic development path of semantic segmentation networks includes FCN [16], U-Net [23], the DeepLab series (v1 [5], v2 [6], v3 [7]), RefineNet [15], MTI-Net [27], etc. We adopt RefineNet as the semantic segmentation branch unless otherwise specified. ...
Conference Paper
Full-text available
From traditional handcrafted priors to learning-based neural networks, image dehazing techniques have gone through great development. In this paper, we propose an end-to-end Semantic Guided Network (SG-Net) for directly restoring haze-free images. Inspired by the high similarity (mapping relationship) between the transmission maps and the segmentation results of hazy images, we find that the semantic information of the scene provides a strong natural prior for image restoration. To guide the dehazing more effectively and systematically, we utilize the information of semantic segmentation in three easily portable modes: Semantic Fusion (SF), Semantic Attention (SA), and Semantic Loss (SL), which compose our Semantic Guided (SG) mechanisms. By embedding these SG mechanisms into existing dehazing networks, we construct the SG-Net series: SG-AOD, SG-GCA, SG-FFA, and SG-AECR. Experiments demonstrate the superior dehazing performance of these SG networks, both quantitatively and qualitatively. It is worth mentioning that SG-FFA achieves state-of-the-art performance.
... Comparisons with two renowned models, U-Net [25] and LeNet [26], are made in the five-fold validation test. Frames with an IoU ≥ 0.5 are fed to the hard attention models, LeNet-5 and U-Net. ...
Article
Full-text available
This article designs a cascaded neural network to diagnose colonoscopic images automatically. With a limited number of labeled polyps masked in binary, the proposed detection network uses a hetero-encoder to map a colonoscopic image to an aggregated set of exemplified images as data augmentation, forcing the successive autoencoder to learn important features in the manner of a denoising autoencoder. In other words, the autoencoder denoises the transient images generated in the preceding hetero-encoder training process by auto-associating the ground truth and its variants. A hard attention model classifies the segmented image and applies a local region proposal network (RPN) to the generation and aggregation of bounding boxes only on the segmented images, allowing more precise detection while avoiding computations on bounding boxes with little information. Experiments on endoscopic images show that the proposed system can outperform complex state-of-the-art methods such as Faster R-CNN.
... Here, Cin/Cout represent the input/output channels of an encoder/decoder block, and K, S, and P represent the kernel size, stride, and padding parameters used for the convolution layers. We use a 6-layer U-Net [29], which is an encoder-decoder network with two layers of gated recurrent units (GRU). The encoder extracts spectral and temporal features from an input complex spectrogram, and the decoder constructs an enhanced complex spectrogram from the encoded features. ...
Preprint
Full-text available
Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks. However, the majority of DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued T-F attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across time and frequency dimensions. We validate the effectiveness of our proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) using the REVERB challenge corpus. Experimental findings indicate that integrating our complex-TFA module with DCCRN improves overall speech quality and performance of back-end speech applications, such as automatic speech recognition, compared to earlier approaches for self-attention.
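The two-dimensional T-F attention idea can be sketched with a simplified real-valued module (assumed PyTorch; the paper's module is complex-valued and couples real and imaginary features): one attention map is computed along time, another along frequency, and both rescale the feature map.

```python
import torch
import torch.nn as nn

class TFAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.time_att = nn.Sequential(nn.Conv1d(ch, ch, 1), nn.Sigmoid())
        self.freq_att = nn.Sequential(nn.Conv1d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, ch, freq, time)
        a_t = self.time_att(x.mean(dim=2))     # pool over frequency -> (b, ch, time)
        a_f = self.freq_att(x.mean(dim=3))     # pool over time      -> (b, ch, freq)
        return x * a_t.unsqueeze(2) * a_f.unsqueeze(3)  # axis-wise rescaling

y = TFAttention(16)(torch.randn(2, 16, 257, 100))  # same shape out
```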
Article
Full-text available
Is there an initialization for deep networks that requires no learning? ExplainFix adopts two design principles: the "fixed filters" principle, that all spatial filter weights of convolutional neural networks can be fixed at initialization and never learned, and the "nimbleness" principle, that only a few network parameters suffice. We contribute (a) visual model-based explanations, (b) speed and accuracy gains, and (c) novel tools for deep convolutional neural networks. ExplainFix gives key insights: spatially fixed networks should have a steered initialization, spatial convolution layers tend to prioritize low frequencies, and most network parameters are not necessary in spatially fixed models. ExplainFix models have up to 100× fewer spatial filter kernels than fully learned models with matching or improved accuracy. Our extensive empirical analysis confirms that ExplainFix guarantees nimbler models (training up to 17% faster with channel pruning), matching or improved predictive performance (spanning 13 distinct baseline models, four architectures, and two medical image datasets), improved robustness to larger learning rates, and robustness to varying model size. We are the first to demonstrate that all spatial filters in state-of-the-art convolutional deep networks can be fixed at initialization rather than learned. This article is categorized under: Technologies > Machine Learning; Fundamental Concepts of Data and Knowledge > Explainable AI; Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining.
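The "fixed filters" principle above amounts to excluding spatial convolution kernels from optimization. A minimal PyTorch sketch, assuming default initialization rather than the steered initialization the article recommends:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),  # spatial 3x3 filters
    nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 16, 1),            # pointwise 1x1 mixing stays learnable
)

for m in model.modules():
    if isinstance(m, nn.Conv2d) and m.kernel_size != (1, 1):
        for p in m.parameters():
            p.requires_grad = False  # spatial filters fixed at initialization

trainable = [p for p in model.parameters() if p.requires_grad]  # pass these to the optimizer
```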
Article
Full-text available
This work presents the use of a high-fidelity neural network surrogate model within a Modular Optimization Framework for treatment of crud deposition as a constraint within light-water reactor core loading pattern optimization. The neural network was utilized for the treatment of crud constraints within the context of an advanced genetic algorithm applied to the core design problem. This proof-of-concept study shows that loading pattern optimization aided by a neural network surrogate model can optimize the manner in which crud distributes within a nuclear reactor without impacting operational parameters such as enrichment or cycle length. Several analysis methods were investigated. Analysis found that the surrogate model and genetic algorithm successfully minimized the deviation from a uniform crud distribution against a population of solutions from a reference optimization in which the crud distribution was not optimized. Strong evidence is presented that shows boron deposition in crud can be optimized through the loading pattern. This proof-of-concept study shows that the methods employed provide a powerful tool for mitigating the effects of crud deposition in nuclear reactors.
Article
Full-text available
Visual Attention Prediction (VAP) is widely applied in GIS research, for example in navigation task identification and driver assistance systems. Previous studies commonly used color information to detect the visual saliency of natural scene images. However, these studies rarely considered adaptive feature integration across different geospatial scenes in specific tasks. To better predict visual attention during driving tasks, in this paper we first propose an Adaptive Feature Integration Fully Convolutional Network (AdaFI-FCN) using Scene-Adaptive Weights (SAW) to integrate RGB-D, motion, and semantic features. Quantitative comparison results on the DR(eye)VE dataset show that the proposed framework achieved the best accuracy and robustness compared with state-of-the-art models (AUC-Judd = 0.971, CC = 0.767, KL = 1.046, SIM = 0.579). In addition, the results of an ablation study demonstrated the positive effect of the SAW method on prediction robustness under scene changes. The proposed model has the potential to benefit adaptive VAP research in universal geospatial scenes, such as AR-aided navigation, indoor navigation, and street-view image reading.
Article
Full-text available
Fibrous materials play a significant role in many industries, such as lightweight automotive materials, filtration, or as constituents of hygiene products. The properties of fibrous materials are governed to a large extent by their microstructure. One way to access the microstructure is micro-Computed Tomography (micro-CT). Completely characterizing the microstructure requires geometrically characterizing each individual fiber, which in turn requires identifying the individual fibers. Our method achieves this by finding the centerlines of all individual fibers in segmented micro-CT scans. It uses a convolutional neural network trained on automatically generated synthetic training data. From the centerlines, analytic descriptions of the individual fibers are constructed. These analytic representations allow detailed insights into the statistics of the geometric properties of the fibrous material, such as the fibers' orientation, length, or curvature. The method is validated on artificial datasets, and its usefulness is demonstrated on a very large micro-CT scan of a nonwoven composed of long fibers with random curvature.
Preprint
Full-text available
Neovascular age-related macular degeneration (nAMD) is one of the major causes of irreversible blindness and is characterized by accumulations of different fluids inside the retina. Early detection and activity monitoring of the three predominant fluid types, namely intra-retinal fluid (IRF), sub-retinal fluid (SRF), and pigment epithelium detachment (PED), is critical for successful treatment. Spectral-domain optical coherence tomography (SD-OCT) revolutionized nAMD treatment by providing cross-sectional, high-resolution images of the retina. Automatic segmentation and quantification of IRF, SRF, and PED in SD-OCT images can be extremely useful for clinical decision-making. Despite the use of state-of-the-art convolutional neural network (CNN)-based methods, the task remains challenging due to relevant variations in the location, size, shape, and texture of the fluids. This work is the first to adopt a transformer-based method to automatically segment retinal fluid from SD-OCT images and to evaluate its performance qualitatively and quantitatively against CNN-based methods. The method combines the efficient long-range feature extraction and aggregation capabilities of Vision Transformers (ViTs) with the data-efficient training of CNNs. The proposed method was tested on a private dataset containing 3842 2-dimensional SD-OCT retina images, manually labeled by experts of the Franziskus-Eye-Hospital. While one of the competitors presents better performance in terms of Dice score, the proposed method is significantly less computationally expensive. Thus, future research will focus on the proposed network's architecture to increase its segmentation performance while maintaining its computational efficiency.
Article
Purpose: Personalized synthetic MRI (syn-MRI) uses MR images of an individual subject acquired at a few design parameters (echo time, repetition time, flip angle) to obtain underlying parametric (ρ, T1, T2) maps, from which MR images of that individual at other design parameter settings are synthesized. However, classical methods that use least-squares (LS) or maximum likelihood estimators (MLE) are unsatisfactory at higher noise levels because the underlying inverse problem is ill-posed. This article provides a pipeline to enhance the synthesis of such images in three dimensions (3D) using a deep learning (DL) neural network architecture for spatial regularization in a personalized setting where having more than a few training images is impractical. Methods: Our DL enhancements employ a Deep Image Prior (DIP) with a U-net type denoising architecture suited to situations with minimal training data, such as personalized syn-MRI. We provide a general workflow for syn-MRI from three or more training images. Our workflow, called DIPsyn-MRI, uses DIP to enhance training images, then obtains parametric images using LS or MLE before synthesizing images at desired design parameter settings. DIPsyn-MRI is implemented in our publicly available Python package DeepSynMRI, available at https://github.com/StatPal/DeepSynMRI. Results: We demonstrate the feasibility and improved performance of DIPsyn-MRI on 3D datasets acquired using the Brainweb interface for spin-echo and FLASH imaging sequences, at different noise levels. Our DL enhancements improve syn-MRI in the presence of different intensity nonuniformity levels of the magnetic field, for all but very low noise levels. Conclusion: This article provides recipes and software to realistically facilitate DL-enhanced personalized syn-MRI.
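The synthesis step of the workflow above follows from the signal equation once parametric maps are in hand; for a spin-echo sequence, S = ρ (1 - e^(-TR/T1)) e^(-TE/T2). A NumPy sketch with placeholder map values (not real estimates):

```python
import numpy as np

rho = np.full((64, 64), 0.9)    # proton density map (placeholder)
T1  = np.full((64, 64), 900.0)  # T1 map in ms (placeholder)
T2  = np.full((64, 64), 80.0)   # T2 map in ms (placeholder)

def synthesize_spin_echo(rho, T1, T2, TE, TR):
    # Spin-echo signal equation: S = rho * (1 - exp(-TR/T1)) * exp(-TE/T2)
    return rho * (1.0 - np.exp(-TR / T1)) * np.exp(-TE / T2)

# Synthesize an image at a new design-parameter setting (TE, TR in ms):
image = synthesize_spin_echo(rho, T1, T2, TE=30.0, TR=2000.0)
```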
Article
Full-text available
Spartina alterniflora (S. alterniflora) was introduced into China in the 1980s to control coastal erosion. However, it has significant negative impacts on coastal wetlands by encroaching on the habitat of native plant communities, which has seriously threatened coastal wetland function and biodiversity maintenance. The core area of the Migratory Bird Sanctuaries along the Coast of Yellow Sea-Bohai Gulf of China was taken as the study area, and GF-2 remote sensing images from 2017, 2019 and 2020 were selected for this study. Using U-Net, this study extracted habitat information, identified variation in habitat patterns, and investigated interspecific competition and the characteristics of habitat type shifts. The results indicated that the U-Net model showed excellent classification performance, with the highest F1-score and MIoU. The habitat of S. alterniflora expanded from 3,920 to 4,350 hm² between 2017 and 2020. The habitat of Suaeda salsa (S. salsa) continuously fragmented and had been reduced to 1,414 hm² by 2020. The degree of habitat fragmentation in the core area strengthened and heterogeneity was enhanced. Moreover, while the competition between S. alterniflora and Phragmites australis (P. australis) intensified, both encroached on the habitat of S. salsa, which aggravated the reduction and fragmentation of the S. salsa habitat, causing the loss of food and habitat for red-crowned cranes and further threatening biodiversity maintenance and ecosystem stability. These results can offer new insights into the competitive mechanism among native plant communities and alien invasive species in coastal wetlands under global warming and anthropogenic influences.
Article
Purpose: The objective of this study was to develop a numeric tool to automate the analysis of deformity from lower limb telemetry and assess its accuracy. Our hypothesis was that artificial intelligence (AI) algorithm would be able to determine mechanical and anatomical angles to within 1°. Methods: After institutional review board approval, 1175 anonymized patient telemetries were extracted from a database of more than ten thousand telemetries. From this selection, 31 packs of telemetries were composed and sent to 11 orthopaedic surgeons for analysis. Each surgeon had to identify on the telemetries fourteen landmarks allowing determination of the following four angles: hip-knee-ankle angle (HKA), medial proximal tibial angle (MPTA), lateral distal femoral angle (LDFA), and joint line convergence angle (JLCA). An algorithm based on a machine learning process was trained on our database to automatically determine angles. The reliability of the algorithm was evaluated by calculating the difference of determination precision between the surgeons and the algorithm. Results: The analysis time for obtaining 28 points and 8 angles per image was 48 ± 12 s for the algorithm. The average difference between the angles measured by the surgeons and the algorithm was around 1.9° for all the angles of interest: 1.3° for HKA, 1.6° for MPTA, 2.1° for LDFA, and 2.4° for JLCA. Intraclass correlation was greater than 95% for all angles. Conclusion: The algorithm showed high accuracy for automated angle measurement, allowing the estimation of limb frontal alignment to the nearest degree.
Article
Cage farming is the mainstream farming mode in China. Accurate individual identification and behavioral detection of caged chickens can provide managers with a better understanding of chicken status. However, for image detection of caged chickens, the cage may affect the accuracy of the detection algorithm. For this reason, CCD (caged chicken defencing), a defencing algorithm based on U-Net and pix2pixHD, was proposed to improve caged chickens' detection accuracy. The proposed defencing algorithm can accurately identify the cage wire mesh and recover the chicken contours completely. In the test set, the detection accuracy of the cage wire mesh was 94.71%, while a structural similarity (SSIM) of 90.04% and a peak signal-to-noise ratio (PSNR) of 25.24 dB were obtained in the image recovery. To verify the practicality of the method proposed in this paper, we analyzed the performance of the object detection algorithm before and after defencing from the perspective of the most basic individual detection in the caged chicken detection task. We validated the defencing algorithm with different YOLOv5 detection algorithms, including YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The experimental results showed that the defencing algorithm improved the detection precision of caged chickens by 16.1%, 12.1%, 7.3%, and 5.4%, respectively, compared with before defencing. The recall improvement was 29.1%, 16.4%, 8.5%, and 6.8%. To our knowledge, this is the first time that a deep learning-based defencing algorithm has been applied to caged chickens, and the detection accuracy can be significantly improved. The method proposed in this paper can remove cage wire mesh greatly and provide a technical reference for subsequent poultry researchers.
Article
Full-text available
A plant leaf-vein coupling feature representation and measurement method based on DeepLabV3+ is proposed to address slow segmentation, partial occlusion of leaf veins, and low measurement accuracy of leaf-vein parameters. First, to solve the problem of slow segmentation, the lightweight MobileNetV2 is selected as the extraction network for DeepLabV3+. On this basis, the Convex Hull-Scan method is applied to repair leaf veins. Subsequently, a refinement algorithm, Floodfill MorphologyEx Medianblur Morphological Skeleton (F-3MS), is proposed, reducing burring in the leaf veins' skeleton lines. Finally, the relevant parameters of the leaf veins are measured. In this study, the mean intersection over union (MIoU) and mean pixel accuracy (mPA) reach 81.50% and 92.89%, respectively, and the average segmentation speed reaches 9.81 frames per second. Furthermore, the network model's parameters are compressed by 89.375%, down to 5.813M. Meanwhile, leaf vein length and width are measured with accuracies of 96.3642% and 96.1358%, respectively.
Article
Convolutional neural networks (CNNs) play a crucial role and achieve top results in computer vision tasks, but at the cost of high computational and storage complexity. One way to solve this problem is to approximate the convolution kernel using tensor decomposition methods, replacing the original kernel with a sequence of kernels in a lower-dimensional space. This study proposes a novel CNN compression technique based on the hierarchical Tucker-2 (HT-2) tensor decomposition and makes an important contribution to the field of neural network compression based on low-rank approximations. We demonstrate the effectiveness of our approach on many CNN architectures on the CIFAR-10 and ImageNet datasets. The obtained results show a significant reduction in parameters and FLOPS with a minor drop in classification accuracy. Compared to different state-of-the-art compression methods, including pruning and matrix/tensor decomposition, the HT-2, as a new alternative, outperforms most of the cited methods. The implementation of the proposed approach is very straightforward and can be easily coded in every deep learning library.
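The kernel-approximation idea above can be sketched with a Tucker-2-style factorization (the paper's hierarchical Tucker-2 decomposes the core further): one KxK convolution is replaced by a 1x1 channel compression, a small KxK core, and a 1x1 expansion. Ranks below are illustrative.

```python
import torch
import torch.nn as nn

def tucker2_conv(in_ch, out_ch, k, r1, r2):
    # Low-rank replacement for a dense k x k convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, r1, 1),               # compress channels to rank r1
        nn.Conv2d(r1, r2, k, padding=k // 2),  # small spatial core
        nn.Conv2d(r2, out_ch, 1),              # expand back to out_ch
    )

dense = nn.Conv2d(256, 256, 3, padding=1)          # ~590k parameters
compact = tucker2_conv(256, 256, 3, r1=64, r2=64)  # ~70k parameters

x = torch.randn(1, 256, 32, 32)
assert dense(x).shape == compact(x).shape          # drop-in replacement shape-wise
```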
Article
Full-text available
Medical studies have shown that the condition of human retinal vessels may reveal the physiological relationship between ophthalmic diseases such as age-related macular degeneration, glaucoma, atherosclerosis, cataracts, and diabetic retinopathy and systemic diseases, and that their abnormal changes often serve as a diagnostic basis for the severity of the condition. In this paper, we design and implement a deep learning-based algorithm for automatic segmentation of retinal vessels (CSP_UNet). It mainly adopts a U-shaped structure composed of an encoder and a decoder and utilizes a cross-stage local connectivity mechanism, an attention mechanism, and multi-scale fusion, which can obtain better segmentation results with limited dataset capacity. The experimental results show that, compared with several existing classical algorithms, the proposed algorithm has the highest vessel intersection over union on a dataset composed of four retinal fundus image collections, reaching 0.6674. Then, based on CSP_UNet and introducing hard parameter sharing from multi-task learning, we propose MTNet, a combined vessel segmentation and diabetic retinopathy diagnosis algorithm for retinal images. The experiments show that MTNet achieves higher accuracy than the single-task setting, with a 0.4% higher vessel segmentation IoU and a 5.2% higher diabetic retinopathy classification accuracy.
Article
Deep learning-based segmentation methods have demonstrated significant performance gains over their traditional counterparts. However, striving for better accuracy with such networks usually degrades their computational efficiency, rendering them unsuitable for deployment on resource-constrained devices. Establishing the required tradeoff between pixel-prediction accuracy and computational efficiency remains challenging. In this article, a lightweight multiscale segmentation framework is proposed. We leverage the representation power of different receptive fields to attain optimal accuracy while maintaining computational efficiency by embedding a sparse network architecture with depthwise separable convolutions at the multiscale level. Experimental results from two challenging remote sensing segmentation datasets show that the proposed network achieves substantial pixel-prediction accuracy at relatively low computational overhead compared to state-of-the-art networks.
Article
Research on developing CNN-based fully automated brain-tumor segmentation systems has been progressing rapidly. For such systems to be applicable in practice, good processing quality and reliability are necessary. Moreover, as the parameters in a CNN are determined by training based on statistical losses over training epochs, more parameters may cause more randomness in the process, and minimizing the number of parameters is required to achieve good reproducibility of the results. To this end, the CNN in the proposed system has a unique structure with two distinguishing characteristics. First, the three paths of its feature extraction block are designed to extract, from the multi-modality input, comprehensive feature information of mono-modality, paired-modality, and cross-modality data, respectively. It also has a particular three-branch classification block to distinguish pixels in each of the three intra-tumoral classes from the background. Each branch is trained separately so that the parameters are updated specifically with the corresponding ground truth data of target tumor areas. The convolutional layers of the system are custom-designed with specific purposes, resulting in a very simple configuration of 61,843 parameters in total. The proposed system has been tested extensively with the BraTS2019 and BraTS2018 datasets. The mean Dice scores, obtained from ten experiments on BraTS2019 validation samples, are 0.751 ± 0.007, 0.885 ± 0.002, and 0.776 ± 0.004 for enhancing tumor, whole tumor, and tumor core, respectively. The test results demonstrate that the proposed system reproduces high-quality segmentation results quite consistently. Furthermore, its extremely low computational complexity will facilitate its implementation and application in various environments.
Article
Many image translation methods based on conditional generative adversarial networks can transform images from one domain to another, but many produce results only at low resolution. We present a modified pix2pixHD model that generates high-resolution tiled clothing from an image of a model wearing clothes. We choose a single Markovian discriminator instead of a multi-scale discriminator for faster training, add a perceptual loss term, and improve the feature matching loss, giving deeper feature maps lower weights when calculating losses. A dataset was built specifically for this improved model, containing over 20,000 pairs of high-quality tiled clothing images. The experimental results demonstrate the feasibility of our improved method, which can be extended to other fields.
Article
Crowd counting is an effective tool for situational awareness in public places. Automated crowd counting using images and videos is an interesting yet challenging problem that has gained significant attention in computer vision. Over the past few years, various deep learning methods have been developed to achieve state-of-the-art performance. The methods vary in many aspects, such as model architecture, input pipeline, learning paradigm, computational complexity, and accuracy gains. In this paper, we present a systematic and comprehensive review of the most significant contributions in the area of crowd counting. Although a few surveys exist on the topic, ours is the most up-to-date and differs in several aspects. First, it provides a more meaningful categorization of the most significant contributions by model architecture, learning method (i.e., loss function), and evaluation method (i.e., evaluation metric). We chose prominent and distinct works and excluded similar ones. We also rank the well-known crowd counting models by their performance over benchmark datasets. We believe that this survey can be a good resource for novice researchers to understand the progressive developments and contributions over time, as well as the current state of the art.
Chapter
In Northern Europe, parish records provide centuries of lineage information, useful not only for settling inheritance disputes, but also for studying hereditary diseases, social mobility, etc. The key information to extract from scans of parish records to obtain lineage information is dates: birth dates (of children and their parents) and dates of baptisms. We present a new dataset of birth dates from Danish parish records and use it to benchmark different approaches to handwritten date recognition, some based on classification and some based on transduction. We evaluate these approaches across several experimental protocols and different segmentation strategies. A state-of-the-art transformer-based transduction model exhibits lower error rates than image classifiers in most scenarios. The image classifiers can nevertheless offer a compelling trade-off in terms of accuracy and computational resource requirements.
Article
Falls are one of the most critical issues faced by the elderly in daily life. The consequences of falls range from no injuries to severe and even fatal injuries. Therefore, an effective system for detecting falls and treating post-fall injuries is important. Unlike wearable sensors, camera-based systems are more comfortable and flexible for daily-life monitoring. However, monitoring human activities in real settings using cameras is not only challenging but also poses privacy issues. To mitigate this problem, we propose a surveillance camera-based framework for fall detection and post-fall classification in which the human silhouette is extracted and used instead of raw images. The human silhouette is obtained using pixel-level classification based on a Multi-Scale Skip Connection Segmentation Network (MSSkip), which is shown to achieve state-of-the-art performance on the PASCAL VOC 2012 validation set for the class Person, with an IoU of 90%. The temporal and spatial variations of human poses are fed to a Convolutional Long Short-Term Memory (ConvLSTM) network to detect whether a fall has occurred. The proposed fall detection method achieved an F1-score of 97.68% on the UP-Fall dataset and state-of-the-art performance on the UR Fall Detection dataset. For post-fall posture classification, the Xception network achieves an F1-score of 97.85% on our customized post-fall dataset.
Article
Seismic reflection is one of the most widely used geophysical methods in the oil and gas (O&G) industry for hydrocarbon prospecting. In particular, for some Brazilian onshore fields, this method has been used to estimate the location and volume of gas accumulations. However, the analysis and interpretation of seismic data are time-consuming due to the large amount of information and noisy nature of the acquisitions. To help geoscientists with these tasks, computational tools based on machine learning have been proposed considering direct hydrocarbon indicators. In this study, we present a methodology for detecting gas accumulation based on the convolutional long short-term memory model and particle swarm optimization scheme. In the best scenario, the proposed method achieved an F1-score of 84.22%, sensitivity of 98.06%, specificity of 99.44%, and accuracy of 99.42%. We present tests performed on the Parnaíba Basin, indicating that the proposed method is promising for gas exploration.
Article
Full-text available
Accurate and reliable lung nodule segmentation in computed tomography (CT) images is required for early diagnosis of lung cancer. Some of the difficulties in detecting lung nodules include the various types and shapes of lung nodules, lung nodules near other lung structures, and similar visual aspects. This study proposes a new model named Lung_PAYNet, a pyramidal attention-based architecture, for improved lung nodule segmentation in low-dose CT images. In this architecture, the encoder and decoder are designed using an inverted residual block and swish activation function. It also employs a feature pyramid attention network between the encoder and decoder to extract exact dense features for pixel classification. The proposed architecture was compared to the existing UNet architecture, and the proposed methodology yielded significant results. The proposed model was comprehensively trained and validated using the LIDC-IDRI dataset available in the public domain. The experimental results revealed that the Lung_PAYNet delivered remarkable segmentation with a Dice similarity coefficient of 95.7%, mIOU of 91.75%, sensitivity of 92.57%, and precision of 96.75%.
Article
This paper reviews recent developments in deep learning-based crack segmentation methods and investigates their performance under the impact from different image types. Publicly available datasets and commonly adopted performance evaluation metrics are also summarized. Moreover, an image dataset, namely the Fused Image dataset for convolutional neural Network based crack Detection (FIND), was released to the public for deep learning analysis. The FIND dataset consists of four different image types including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused image by combining the raw intensity and raw range image. To validate and demonstrate the performance boost through data fusion, a benchmark study is performed to compare the performance of nine (9) established convolutional neural network architectures trained and tested on the FIND dataset; furthermore, through the cross comparison, the optimal architectures and image types can be determined, offering insights to future studies and applications.
Article
The development of self-driving cars increases driving safety and accelerates urban transportation. These systems must have a robust, real-time understanding of traffic conditions and surroundings, both day and night. Many semantic image segmentation techniques based on deep neural networks have been proposed to partition traffic scene images as a substantial step. However, the proposed algorithms and public datasets are mostly based on visible images taken during the daytime, and most of these algorithms are computationally intensive. Little research has been done to date on the fusion of thermal and visible images or on high-performance, low-volume deep convolutional networks. In this paper, a multispectral Encoder Fused Atrous Spatial Pyramid Pooling (EFASPP) U-Net deep network is proposed to merge the features of visible and thermal images recorded in night traffic scenes. The proposed network is designed based on the structure of the U-Net due to its high accuracy and processing speed, as well as its lack of need for large training datasets. The fusion of visible and thermal features in the encoders of the EFASPP U-Net is performed using standard and atrous convolution layers. A new multispectral dataset for night-time traffic scenes is also developed in this work, due to the lack of a sufficient public dataset in this field. The major contributions of this work include a low-volume, high-performance multispectral semantic segmentation network for smart vehicles and a new dataset for this application. The experimental results show the high accuracy and speed of the proposed method.
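The atrous-spatial-pyramid-pooling component named above can be sketched as parallel dilated convolutions whose outputs are concatenated and fused (a generic PyTorch illustration with made-up channel counts and rates, not the EFASPP module itself):

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=rate keeps spatial size.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)  # merge branches

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = ASPP(64, 64)(torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)
```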
Article
Deep learning methods have been successfully applied to brain tumor segmentation. However, extreme data imbalance exists across the different tumor sub-regions, and training deep learning methods on such data reduces segmentation accuracy. To address this challenge, we introduce a deep mutual learning strategy that integrates transformer layers into both the encoder and decoder of a U-Net architecture. In the network, the prediction of each up-sampled layer deeply supervises the training process, enlarging the receptive field for feature extraction; the feature map of the shallowest layer supervises the feature maps of subsequent layers to retain more edge information and guide sub-region segmentation; and the classification logits of the deepest layer supervise the logits of previous layers to capture more semantic information for distinguishing tumor sub-regions. Furthermore, the feature maps and the classification logits supervise each other to improve overall segmentation accuracy. Experimental results on a benchmark dataset show that our method achieves significant performance gains over existing methods.
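The deep-supervision pattern described above, in which intermediate decoder outputs are supervised alongside the final prediction, can be sketched as a weighted multi-depth loss (assumed PyTorch; the paper's mutual supervision between feature maps and logits is richer, and the weights here are illustrative):

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(preds, target, weights=(1.0, 0.5, 0.25)):
    # preds: logits from shallow-to-deep decoder stages, each (B, C, H_i, W_i);
    # target: (B, H, W) integer class labels at full resolution.
    loss = 0.0
    for p, w in zip(preds, weights):
        p = F.interpolate(p, size=target.shape[-2:], mode="bilinear",
                          align_corners=False)      # upsample to label size
        loss = loss + w * F.cross_entropy(p, target)
    return loss

preds = [torch.randn(2, 4, s, s, requires_grad=True) for s in (128, 64, 32)]
target = torch.randint(0, 4, (2, 128, 128))
deep_supervision_loss(preds, target).backward()     # trains all depths jointly
```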
Article
Background: Stereotactic radiotherapy is a standard treatment option for patients with brain metastases. The planning target volume is based on gross tumor volume (GTV) segmentation. The aim of this work is to develop and validate a neural network for automatic GTV segmentation to accelerate clinical daily routine practice and minimize interobserver variability.
Methods: We analyzed MRIs (T1-weighted sequence ± contrast enhancement, T2-weighted sequence, and FLAIR sequence) from 348 patients with at least one brain metastasis from different cancer primaries treated in six centers. To generate reference segmentations, all GTVs and the FLAIR hyperintense edematous regions were segmented manually. A 3D U-Net was trained on a cohort of 260 patients from two centers to segment the GTV and the surrounding FLAIR hyperintense region. During training, varying degrees of data augmentation were applied. Model validation was performed using an independent international multicenter test cohort (n = 88) including four centers.
Results: Our proposed U-Net reached a mean overall Dice similarity coefficient (DSC) of 0.92 ± 0.08 and a mean individual metastasis-wise DSC of 0.89 ± 0.11 in the external test cohort for GTV segmentation. Data augmentation improved the segmentation performance significantly. Detection of brain metastases was effective, with a mean F1-score of 0.93 ± 0.16. The model performance was stable independent of the center (p = 0.3). There was no correlation between metastasis volume and DSC (Pearson correlation coefficient 0.07).
Conclusion: Reliable automated segmentation of brain metastases with neural networks is possible and may support radiotherapy planning by providing more objective GTV definitions.
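For reference, the Dice similarity coefficient reported above is DSC = 2|A ∩ B| / (|A| + |B|); a minimal NumPy version for binary masks:

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return float(2.0 * inter / (pred.sum() + truth.sum() + eps))
```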
Chapter
We develop a machine-learning image segmentation pipeline that detects ductile (as opposed to brittle) fracture in fractography images. To demonstrate the validity of our approach, we use a set of fractography images representing fracture surfaces from cold-spray deposits. The coatings have been subjected to varying heat treatments in an effort to improve their mechanical properties. These treatments yield markedly different microstructures and result in a wide range of mechanical properties that combine brittle and ductile fracture once the materials undergo rupture. To detect regions of ductile fracture, we propose a simple machine-learning network based on a 32-layer U-Net framework and trained on a set of small image patches. These regions most often contain small dimples and differ in surface roughness. Overall, the machine-learning method shows good predictive capabilities when compared to segmentation by a human expert. Finally, we highlight other possible applications and improvements of the proposed method.
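The patch-based training described above can be sketched as a simple tiling of the image and its expert mask; patch size and stride here are assumptions:

```python
import numpy as np

def extract_patches(image: np.ndarray, mask: np.ndarray, size: int = 64, stride: int = 32):
    """Cut a large fractography image and its label mask into aligned tiles."""
    patches = []
    for y in range(0, image.shape[0] - size + 1, stride):
        for x in range(0, image.shape[1] - size + 1, stride):
            patches.append((image[y:y + size, x:x + size],
                            mask[y:y + size, x:x + size]))
    return patches
```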
Article
Full-text available
This article addresses the problem of verifying oil spills on the water surfaces of rivers, seas, and oceans in optical aerial photographs obtained from cameras of unmanned aerial vehicles, using deep learning methods. The specificity of this problem is the presence of areas on water surfaces that are visually similar to oil spills but are caused by blooms of specific algae, by substances that do not cause environmental damage (for example, palm oil), or by glare during shooting (so-called look-alikes). Many studies in this area are based on the analysis of synthetic aperture radar (SAR) images, which do not provide accurate classification and segmentation. Follow-up verification helps reduce environmental and property damage, and oil spill size monitoring is used to make further response decisions. A new approach to the verification of optical images as a binary classification problem is proposed, based on a Siamese network: a fragment of the original image is repeatedly compared with representative examples from the class of marine oil slicks. The Siamese network is built on the lightweight VGG16 network. When the threshold value of the output function is exceeded, a decision is made that an oil spill is present. To train the networks, we collected and labeled our own dataset from open Internet resources. A significant problem is the class imbalance in the dataset, which required augmentation methods based not only on geometric and color manipulations but also on a Generative Adversarial Network (GAN). Experiments have shown that the classification accuracy for oil spills and look-alikes on the test set reaches 0.91 and 0.834, respectively. An additional problem of accurate semantic segmentation of an oil spill is then solved using convolutional neural networks (CNNs) of the encoder-decoder type. Three deep network architectures, U-Net, SegNet, and Poly-YOLOv3, were explored for segmentation. The Poly-YOLOv3 network demonstrated the best results, reaching an accuracy of 0.97 and an average image processing time of 385 s with the Google Colab web service. A database was also designed to store both original and verified images with problem areas.
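The verification step can be sketched as follows: a shared embedding network maps the query fragment and reference oil-slick examples to feature vectors, and a spill is declared when the best similarity exceeds a threshold. The cosine similarity and threshold below are assumptions, not the paper's exact output function.

```python
import torch
import torch.nn.functional as F

def verify(embed, query: torch.Tensor, references: list, threshold: float = 0.8) -> bool:
    """embed: shared network mapping a (1, C, H, W) image tensor to a feature vector."""
    q = F.normalize(embed(query), dim=-1)
    sims = [F.cosine_similarity(q, F.normalize(embed(r), dim=-1), dim=-1)
            for r in references]
    # Repeated comparison with representative spill examples; keep the best match.
    return max(s.item() for s in sims) > threshold
```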
Article
Full-text available
Longitudinal magnetic resonance imaging (MRI) has an important role in multiple sclerosis (MS) diagnosis and follow-up. Specifically, the presence of new lesions on brain MRI scans is considered a robust predictive biomarker of disease progression. New lesions are a high-impact prognostic factor for predicting evolution to MS or the risk of disability accumulation over time. However, this disease activity is detected visually by comparing the follow-up and baseline scans, and due to the presence of small lesions, misregistration, and high inter-/intra-observer variability, the detection of new lesions is prone to errors. In this direction, a recent Medical Image Computing and Computer Assisted Intervention (MICCAI) challenge addressed this automatic new-lesion quantification. The MSSEG-2 (MS new lesions segmentation) challenge offers an evaluation framework for this task with a large database (100 patients, each with two time points) compiled from the OFSEP (Observatoire français de la sclérose en plaques) cohort, the French MS registry, including 3D T2-weighted fluid-attenuated inversion recovery (T2-FLAIR) images from different centers and scanners. Apart from the change in centers, MRI scanners, and acquisition protocols, further challenges hinder the automated detection of new lesions, such as the need for large annotated datasets, which may not be easily available, and the fact that new lesions are small areas producing a class imbalance problem that can bias trained models toward the non-lesion class. In this article, we present a novel automated method for new-lesion detection in MS patient images. Our approach is based on a cascade of two 3D patch-wise fully convolutional neural networks (FCNNs): the first FCNN is trained to be highly sensitive, revealing possible candidate new-lesion voxels, while the second FCNN is trained to reduce the number of voxels misclassified by the first network. The 3D T2-FLAIR images from the two time points were pre-processed and linearly co-registered, and the cascade, whose inputs are only the baseline and follow-up images, was trained to detect new MS lesions. Our approach obtained a mean segmentation Dice similarity coefficient of 0.42 with a detection F1-score of 0.5. Compared to the challenge participants, we obtained one of the highest precision scores (PPVL = 0.52), the best PPVL rate (0.53), and a lesion detection sensitivity (SensL) of 0.53.
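The two-stage cascade can be sketched as candidate generation followed by false-positive reduction; the thresholds below are illustrative assumptions:

```python
import numpy as np

def cascade_detect(net1, net2, baseline, followup, t1: float = 0.3, t2: float = 0.5):
    """net1/net2: callables mapping (baseline, follow-up) volumes to voxel probabilities."""
    p1 = net1(baseline, followup)
    candidates = p1 > t1                        # stage 1: low threshold, high sensitivity
    p2 = net2(baseline, followup)
    return np.logical_and(candidates, p2 > t2)  # stage 2: prune false positives
```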
Article
Fashion image understanding is a popular research field with many different machine learning applications. There have been many studies on outfit prediction and outfit composition in the field of fashion, but few works explain the prediction. This paper presents a method for diagnosing outfit compatibility from clothing images. The proposed system not only predicts compatibility but also identifies the incompatible clothing items in an outfit. First, a new dataset named ModAI, containing clothing images and compatibility comments from different users, was created. Then, a common compatibility comment was derived from the user comments for each clothing image. Lastly, image captioning techniques were used to generate compatibility suggestion texts from clothing images, and different segmentation techniques were used to improve the captioning capabilities. The model achieves a 0.62 BLEU-4 score. Experiments show that image captioning techniques can also be used to diagnose outfit compatibility.
Article
Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in Generative Adversarial Networks (GANs), data synthesis, and adversarial training, we assess the potential of these technologies to address a number of key challenges of cancer imaging. We categorise these challenges into (a) data scarcity and imbalance, (b) data access and privacy, (c) data annotation and segmentation, (d) cancer detection and diagnosis, and (e) tumour profiling, treatment planning and monitoring. Based on our analysis of 164 publications that apply adversarial training techniques in the context of cancer imaging, we highlight multiple underexplored solutions with research potential. We further contribute the Synthesis Study Trustworthiness Test (SynTRUST), a meta-analysis framework for assessing the validation rigour of medical image synthesis studies. SynTRUST is based on 26 concrete measures of thoroughness, reproducibility, usefulness, scalability, and tenability. Based on SynTRUST, we analyse 16 of the most promising cancer imaging challenge solutions and observe a high validation rigour in general, but also several desirable improvements. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on adversarial networks in the artificial intelligence community.
Article
Liver segmentation is a critical step in liver cancer diagnosis and surgical planning. The U-Net architecture is one of the most efficient deep networks for medical image segmentation. However, the repeated downsampling operators in U-Net cause a loss of spatial information. To address this problem, we propose a global context and hybrid attention network, called GCHA-Net, to adaptively capture structural and detailed features. To capture global features, a global attention module (GAM) is designed to model interdependencies along the channel and positional dimensions. To capture local features, a feature aggregation module (FAM) is designed, in which a local attention module (LAM) captures spatial information. LAM makes our model focus on local liver regions and suppress irrelevant information. The experimental results on the LiTS2017 dataset show that the dice per case (DPC) and dice global (DG) values for the liver were 96.5% and 96.9%, respectively. Compared with state-of-the-art models, our model shows superior performance in liver segmentation. We also evaluate the model on the 3Dircadb dataset, where it obtains the highest accuracy among closely related models. These results show that the proposed model can effectively capture global context information and build correlations between different convolutional layers. The code is available at: https://github.com/HuaxiangLiu/GCAU-Net.
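The abstract does not spell out GAM and LAM, but the general pattern of combining a channel gate with a spatial gate can be sketched as follows; this is a generic squeeze-style design, not the authors' exact modules.

```python
import torch
import torch.nn as nn

class ChannelSpatialGate(nn.Module):
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        # Channel gate: global average pooling, bottleneck, sigmoid weights.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        # Spatial gate: one map of per-position weights.
        self.spatial = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)     # reweight channels by global context
        return x * self.spatial(x)  # then emphasise informative positions
```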
Article
Full-text available
Identifying and locating track areas in images through machine vision technology is the primary task of autonomous UAV inspection. Railway track images are strongly affected by lighting and perspective, the complex background environment is easy to misidentify, and existing methods struggle to reason correctly about occluded track areas. To address these problems, this paper proposes a generative adversarial network (GAN)-based railway track precision segmentation framework, RT-GAN. RT-GAN consists of an encoder-decoder generator (named RT-seg) and a patch-based track discriminator. For the generator design, a linear span unit (LSU) and linear extension pyramid (LSP) are used to concatenate network features at different resolutions. In addition, a loss function containing gradient information is designed, and the gradient image of the segmentation result is added to the input of the track discriminator, guiding the generator, RT-seg, to focus on the linear features of the railway tracks faster and more accurately. Experiments on the railway track dataset proposed in this paper show that, with the improved loss function and adversarial training, RT-GAN segments rail tracks more accurately than state-of-the-art techniques and has stronger occlusion inference capabilities, achieving 88.07% and 81.34% IoU on the unaugmented and augmented datasets, respectively.
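One plausible reading of a "loss function containing gradient information" is an edge-consistency term: Sobel gradients of the predicted and reference masks are compared so the generator attends to the tracks' linear structure. The sketch below is such an assumed formulation, not RT-GAN's published loss.

```python
import torch
import torch.nn.functional as F

def sobel(x: torch.Tensor) -> torch.Tensor:
    """Gradient magnitude of a (N, 1, H, W) tensor via Sobel filters."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def gradient_loss(pred_mask: torch.Tensor, true_mask: torch.Tensor) -> torch.Tensor:
    # Penalise disagreement between the edge maps of prediction and reference.
    return F.l1_loss(sobel(pred_mask), sobel(true_mask))
```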
Article
Full-text available
Objectives: Evaluation and follow-up of idiopathic pulmonary fibrosis (IPF) mainly rely on high-resolution computed tomography (HRCT) and pulmonary function tests (PFTs). The elastic registration technique can quantitatively assess lung shrinkage. We aimed to investigate the correlation between lung shrinkage and morphological and functional deterioration in IPF.
Methods: Patients with IPF who underwent at least two HRCT scans and PFTs were retrospectively included. Elastic registration was performed on the baseline and follow-up HRCTs to obtain deformation maps of the whole lung. Jacobian determinants were calculated from the deformation fields and, after logarithmic transformation, log_jac values were represented on color maps to describe morphological deterioration and to assess the correlation between log_jac values and PFTs.
Results: A total of 69 patients with IPF (66 male) were included. Jacobian maps demonstrated constriction of the lung parenchyma, most marked at the lung base, in patients who had deteriorated on visual and PFT assessment. The log_jac values were significantly reduced in the deteriorated patients compared to the stable patients. Mean log_jac values showed positive correlations with the baseline percentage of predicted vital capacity (VC%) (r = 0.394, p < 0.05) and the percentage of predicted forced vital capacity (FVC%) (r = 0.395, p < 0.05). Additionally, the mean log_jac values were positively correlated with pulmonary vascular volume (r = 0.438, p < 0.01) and the number of pulmonary vascular branches (r = 0.326, p < 0.01).
Conclusions: Elastic registration between baseline and follow-up HRCT was helpful to quantitatively assess the morphological deterioration of lung shrinkage in IPF, and the quantitative log_jac values were significantly correlated with PFTs.
Key Points:
• Elastic registration on HRCT was helpful to quantitatively assess the deterioration of IPF.
• The Jacobian logarithm was significantly reduced in deteriorated patients, and mean log_jac values were correlated with PFTs.
• The mean log_jac values were related to changes in pulmonary vascular volume and the number of vascular branches.
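The log_jac quantity can be made concrete: for a displacement field u from the registration, the local volume ratio is det(I + ∇u), and its logarithm is negative where tissue shrinks. A 2D NumPy sketch (the study uses 3D fields):

```python
import numpy as np

def log_jacobian(u: np.ndarray) -> np.ndarray:
    """u: displacement field of shape (H, W, 2) in voxel units;
    channel 0 is the displacement along axis 0, channel 1 along axis 1."""
    duy_dy, duy_dx = np.gradient(u[..., 0])
    dux_dy, dux_dx = np.gradient(u[..., 1])
    # Determinant of the 2x2 Jacobian of the deformation x -> x + u(x).
    det = (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy
    return np.log(np.clip(det, 1e-6, None))  # log_jac < 0 means local shrinkage
```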
Article
Full-text available
The heart's mitral valve separates the left atrium from the left ventricle. Heart valve disease is fairly common, and one type is mitral regurgitation, an abnormality in which the mitral valve on the left side of the heart cannot close properly. A convolutional neural network (CNN) is a type of deep learning model well suited to image analysis. Segmentation is widely used in analyzing medical images because it divides an image into simpler parts to facilitate analysis, separating objects that are not analyzed into the background and objects to be analyzed into the foreground. This study builds a dataset from patients with mitral regurgitation and patients with normal hearts, and analyzes heart valve images by segmenting the mitral valves. Several CNN architectures were applied in this research, including U-Net, SegNet, V-Net, FractalNet, and ResNet. The experimental results show that the best architecture is U-Net3 in terms of pixel accuracy (97.59%), intersection over union (86.98%), mean accuracy (93.46%), precision (85.60%), recall (88.39%), and Dice coefficient (86.58%).
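All of the reported metrics derive from the confusion counts of the binary masks; a minimal NumPy sketch:

```python
import numpy as np

def seg_metrics(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> dict:
    """Pixel accuracy, IoU, precision, recall, and Dice for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    return {
        "pixel_accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
    }
```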
Article
Full-text available
Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [21], where we improve the state-of-the-art from 49.7 mean AP^r [21] to 59.0; keypoint localization, where we get a 3.3 point boost over [19]; and part labeling, where we show a 6.6 point gain over a strong baseline.
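The hypercolumn construction is straightforward to sketch: feature maps from several depths are upsampled to a common resolution and concatenated, giving each pixel a descriptor that stacks activations from all chosen layers.

```python
import torch
import torch.nn.functional as F

def hypercolumns(feature_maps: list, out_size: tuple) -> torch.Tensor:
    """feature_maps: list of (N, C_i, h_i, w_i) tensors from different depths."""
    ups = [F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
           for f in feature_maps]
    # (N, sum(C_i), H, W): each spatial location holds one hypercolumn vector.
    return torch.cat(ups, dim=1)
```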
Article
Full-text available
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
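Deployment with Caffe's Python bindings reflects the representation/implementation split the abstract describes: the network structure lives in a prototxt file and the learned weights in a separate caffemodel. A minimal sketch with placeholder file names:

```python
import numpy as np
import caffe

caffe.set_mode_gpu()
# Structure ("deploy.prototxt") and weights ("weights.caffemodel") are separate files.
net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)
x = np.zeros(net.blobs["data"].data.shape, dtype=np.float32)  # placeholder input
net.blobs["data"].data[...] = x
out = net.forward()  # dict mapping output blob names to arrays
```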
Conference Paper
Full-text available
Contextual information plays an important role in solving vision problems such as image segmentation. However, extracting contextual information and using it effectively remains a difficult problem. To address this challenge, we propose a multi-resolution contextual framework, called the cascaded hierarchical model (CHM), which learns contextual information in a hierarchical framework for image segmentation. At each level of the hierarchy, a classifier is trained on downsampled input images and the outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at the original resolution. We repeat this procedure by cascading the hierarchical framework to improve segmentation accuracy. Multiple classifiers are learned in the CHM; therefore, a fast and accurate classifier is required to make training tractable. The classifier also needs to be robust against overfitting due to the large number of parameters learned during training. We introduce a novel classification scheme, called logistic disjunctive normal networks (LDNN), which consists of one adaptive layer of feature detectors implemented by logistic sigmoid functions, followed by two fixed layers of logical units that compute conjunctions and disjunctions, respectively. We demonstrate that LDNN outperforms state-of-the-art classifiers and can be used in the CHM to improve object segmentation performance.
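The LDNN forward pass can be written down directly from this description: an adaptive layer of logistic sigmoids, grouped into soft conjunctions (products), combined by a soft disjunction. Group sizes below are assumptions:

```python
import numpy as np

def ldnn_forward(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> float:
    """x: (d,); W: (n_groups, units_per_group, d); b: (n_groups, units_per_group)."""
    s = 1.0 / (1.0 + np.exp(-(W @ x + b)))    # adaptive layer of logistic sigmoids
    conj = np.prod(s, axis=1)                  # fixed AND units: product within a group
    return float(1.0 - np.prod(1.0 - conj))    # fixed OR unit over all groups
```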
Article
Full-text available
Motivation: Automatic tracking of cells in multidimensional time-lapse fluorescence microscopy is an important task in many biomedical applications. A novel framework for objective evaluation of cell tracking algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2013 Cell Tracking Challenge. In this paper, we present the logistics, datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark.
Results: The main contributions of the challenge include the creation of a comprehensive video dataset repository and the definition of objective measures for comparison and ranking of the algorithms. With this benchmark, six algorithms covering a variety of segmentation and tracking paradigms have been compared and ranked based on their performance on both synthetic and real datasets. Given the diversity of the datasets, we do not declare a single winner of the challenge. Instead, we present and discuss the results for each individual dataset separately.
Availability and implementation: The challenge website (http://www.codesolorzano.com/celltrackingchallenge) provides access to the training and competition datasets, along with the ground truth of the training videos. It also provides access to Windows and Linux executable files of the evaluation software and most of the algorithms that competed in the challenge.
Contact: codesolorzano@unav.es
Supplementary information: Supplementary data, including video samples and algorithm descriptions, are available at Bioinformatics online.
Article
Full-text available
The analysis of microcircuitry (the connectivity at the level of individual neuronal processes and synapses), which is indispensable for our understanding of brain function, is based on serial transmission electron microscopy (TEM) or one of its modern variants. Due to technical limitations, most previous studies that used serial TEM recorded relatively small stacks of individual neurons. As a result, our knowledge of microcircuitry in any nervous system is very limited. We applied the software package TrakEM2 to reconstruct neuronal microcircuitry from TEM sections of a small brain, the early larval brain of Drosophila melanogaster. TrakEM2 enables us to embed the analysis of the TEM image volumes at the microcircuit level into a light microscopically derived neuro-anatomical framework, by registering confocal stacks containing sparsely labeled neural structures with the TEM image volume. We imaged two sets of serial TEM sections of the Drosophila first instar larval brain neuropile and one ventral nerve cord segment, and here report our first results pertaining to Drosophila brain microcircuitry. Terminal neurites fall into a small number of generic classes termed globular, varicose, axiform, and dendritiform. Globular and varicose neurites have large diameter segments that carry almost exclusively presynaptic sites. Dendritiform neurites are thin, highly branched processes that are almost exclusively postsynaptic. Due to the high branching density of dendritiform fibers and the fact that synapses are polyadic, neurites are highly interconnected even within small neuropile volumes. We describe the network motifs most frequently encountered in the Drosophila neuropile. Our study introduces an approach towards a comprehensive anatomical reconstruction of neuronal microcircuitry and delivers microcircuitry comparisons between vertebrate and insect neuropile.
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
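Both contributions are compact enough to state in code: PReLU is f(y) = y for y > 0 and a·y otherwise, and the proposed initialisation draws weights with standard deviation sqrt(2 / ((1 + a^2) · fan_in)). A NumPy sketch:

```python
import numpy as np

def prelu(y: np.ndarray, a: float) -> np.ndarray:
    """Parametric ReLU: identity for positive inputs, learned slope a otherwise."""
    return np.where(y > 0, y, a * y)

def he_init(fan_in: int, fan_out: int, a: float = 0.0) -> np.ndarray:
    # Variance-preserving std for rectifiers; a = 0 recovers plain ReLU init.
    std = np.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))
    return np.random.randn(fan_out, fan_in) * std
```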
Article
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
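The skip architecture can be sketched as scoring a deep layer and a shallow layer into the class space, upsampling the coarse scores, and summing (the FCN-16s pattern). Channel sizes below are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipFuse(nn.Module):
    def __init__(self, deep_ch: int, shallow_ch: int, n_classes: int):
        super().__init__()
        self.score_deep = nn.Conv2d(deep_ch, n_classes, 1)
        self.score_shallow = nn.Conv2d(shallow_ch, n_classes, 1)

    def forward(self, deep, shallow):
        # Upsample coarse, semantically strong scores to the finer layer's size.
        up = F.interpolate(self.score_deep(deep), size=shallow.shape[-2:],
                           mode="bilinear", align_corners=False)
        return up + self.score_shallow(shallow)  # coarse semantics + fine detail
```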
We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or non-membrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific post-processing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. rand error, warping error and pixel error. For pixel error, our approach is the only one outperforming a second human observer.
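The sliding-window scheme is easy to sketch, and the sketch also shows why it is slow: every pixel requires its own forward pass over a patch centred on it. The window size below is an assumption:

```python
import numpy as np

def classify_pixels(image: np.ndarray, classify, w: int = 65) -> np.ndarray:
    """classify: maps a (w, w) patch to P(membrane); w assumed odd."""
    r = w // 2
    padded = np.pad(image, r, mode="reflect")
    out = np.zeros(image.shape, dtype=np.float32)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            # One network evaluation per pixel: accurate but highly redundant.
            out[y, x] = classify(padded[y:y + w, x:x + w])
    return out
```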
Article
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
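The design being evaluated is a repeated pattern of 3x3 convolutions between 2x2 max-poolings; a PyTorch sketch of such a block (the doubling channel progression is an assumption here):

```python
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    """A stack of 3x3 conv + ReLU layers followed by 2x2 max-pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# Depth grows by repeating the same small-kernel pattern.
features = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2),
                         vgg_block(128, 256, 3))
```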
Article
The ability of learning networks to generalize can be greatly enhanced by providing constraints from the task domain. This paper demonstrates how such constraints can be integrated into a backpropagation network through the architecture of the network. This approach has been successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service. A single network learns the entire recognition operation, going from the normalized image of the character to the final classification.
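The architectural constraint in question, weight sharing, can be seen directly by counting parameters: a convolutional layer reuses one small kernel at every position, while a dense layer on the same input learns a separate weight per connection. A PyTorch comparison (sizes are illustrative):

```python
import torch.nn as nn

dense = nn.Linear(16 * 16, 16 * 16)               # unconstrained: 65,792 parameters
conv = nn.Conv2d(1, 1, kernel_size=5, padding=2)  # shared 5x5 kernel: 26 parameters
print(sum(p.numel() for p in dense.parameters()),
      sum(p.numel() for p in conv.parameters()))
```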
Discriminative unsupervised feature learning with convolutional neural networks
A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, T. Brox
Deep neural networks segment neuronal membranes in electron microscopy images
D. C. Ciresan, L. M. Gambardella, A. Giusti, J. Schmidhuber