Citation: Mazhar, M.; Fakhar, S.; Rehman, Y. Semantic Segmentation for Various Applications: Research Contribution and Comprehensive Review. Eng. Proc. 2023, 32, 21.

Academic Editors: Muhammad Faizan Shirazi, Saba Javed, Sundus Ali and Muhammad Imran Aslam

Published: 5 May 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Proceeding Paper
Semantic Segmentation for Various Applications: Research
Contribution and Comprehensive Review
Madiha Mazhar *, Saba Fakhar and Yawar Rehman
Department of Electronic Engineering, NED University of Engineering and Technology, Karachi 75270, Pakistan
Presented at the 2nd International Conference on Emerging Trends in Electronic and Telecommunication
Engineering, Karachi, Pakistan, 15–16 March 2023.
Abstract: Semantic image segmentation is used to analyse visual content and carry out real-time decision-making. This narrative literature review evaluates the many innovations and advancements in semantic segmentation architectures by presenting an overview of the algorithms used in medical image analysis, lane detection, and face recognition. Numerous groundbreaking works are examined from a variety of angles (e.g., network structures, algorithms, and the problems addressed). A review of recent developments in semantic segmentation networks, such as U-Net, ResNet, SegNet, LCSegNet, FLSNet, and G-Net, is presented together with evaluation metrics across a range of applications to facilitate new research in this field.

Keywords: semantic segmentation; encoder–decoder; applications; medical imaging; face recognition; lane detection
1. Introduction
Convolutional neural networks (CNNs) have achieved remarkable success in semantic segmentation in recent years. Semantic segmentation assigns each pixel of an image to a class label, such as car, pedestrian, or tree. Nowadays, most techniques for generating pixel-by-pixel segmentation predictions use an encoder–decoder architecture: the encoder extracts feature maps, while the decoder recovers their spatial resolution.
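The encoder–decoder idea can be sketched with plain NumPy. This is a toy illustration of the resolution bookkeeping only, not any specific network from the reviewed literature:

```python
import numpy as np

def encode(image, pool=2):
    """Toy 'encoder': 2x2 max-pooling halves the spatial resolution."""
    h, w = image.shape
    return image.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

def decode(features, scale=2):
    """Toy 'decoder': nearest-neighbour upsampling restores the resolution."""
    return features.repeat(scale, axis=0).repeat(scale, axis=1)

def segment(class_scores):
    """Pixel-wise labelling: argmax over per-class score maps of shape (C, H, W)."""
    return class_scores.argmax(axis=0)

img = np.arange(16, dtype=float).reshape(4, 4)
feat = encode(img)                   # (2, 2) low-resolution feature map
restored = decode(feat)              # back to (4, 4)
scores = np.stack([img, img[::-1]])  # two made-up class-score maps
mask = segment(scores)               # (4, 4) label map with values in {0, 1}
```

A real encoder learns convolutional filters, and a real decoder typically uses transposed convolutions and skip connections, but the downsample–upsample–classify structure is the same.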
Owing to significant improvements in diagnostic efficiency and accuracy, medical image segmentation frequently plays a crucial part in computer-aided diagnosis and smart medicine. Liver and liver tumor segmentation [1,2], as well as brain and brain tumor segmentation [3,4], are common medical image segmentation tasks. Moreover, segmentation of the optic disc [5,6], cell segmentation [7], lung segmentation and pulmonary nodules [8,9], and heart image segmentation [10,11] are commonly studied problems. Early methods for segmenting medical images frequently relied on edge detection, machine learning, template matching, statistical shape models, and active contours. Deep learning models, particularly CNNs, have recently proven useful for a wide variety of image segmentation tasks.
2. Applications of Semantic Segmentation
Semantic segmentation has found a variety of applications in many areas, such as medical diagnostics and scanning, face recognition, scene understanding, autonomous driving, and handwriting recognition. This literature survey covers three broad applications of semantic segmentation to help researchers transfer the network architectures of one application to another.
2.1. Semantic Segmentation in Medical Imaging
One of the most well-known CNN designs for semantic segmentation is the U-Net architecture, which has achieved outstanding results in a wide range of medical image segmentation applications. A novel Dense-Res-Inception Net (DRINet) is proposed in [12] to address this challenging problem by learning distinctive features; it has found applications in brain CT, brain tumor, and abdominal CT images. The authors of [13] proposed a brand-new high-resolution multi-scale encoder–decoder network (HMEDN), in which dense multi-scale connections allow the encoder–decoder structure to precisely use all of the available semantic data. Skip connections, as well as extra extensively trained high-resolution pathways (made up of densely connected dilated convolutions), are added to gather high-resolution semantic data for precise border localization; the network was successfully validated on pelvic CT images and a multi-modal brain tumor dataset. In [14], the prediction uncertainty of FCNs for segmentation was assessed by systematically comparing cross-entropy loss with Dice loss in terms of segmentation quality and uncertainty estimation, and model ensembling was used for confidence calibration of FCNs trained with batch normalization and Dice loss, tested on prostate, heart, and brain applications. For an accurate diagnosis of interstitial lung diseases (ILDs), [15] proposed an FCN-based semantic segmentation of ILD patterns to avoid the limitations of sliding-window models. Training complexities are addressed in [16] by decomposing a single task into three sub-tasks (pixel-wise segmentation, prediction, and classification of an image), and a novel sync-regularization was proposed to penalize nonconformity between the outputs.
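Several of the works above weigh Dice loss against cross-entropy loss. The two objectives can be sketched for a binary mask as follows; these are generic textbook formulations, not the exact implementations used in the cited papers:

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-7):
    """1 - soft Dice overlap between predicted foreground probabilities and a binary mask."""
    inter = (probs * target).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)

def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean pixel-wise cross-entropy for the same prediction."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))

target = np.array([1.0, 0.0, 1.0, 0.0])
print(soft_dice_loss(target, target))  # 0.0 for a perfect prediction
```

Dice loss is computed over the overlap of the whole mask, so it is less sensitive to foreground/background imbalance than the per-pixel average taken by cross-entropy, which is one reason the comparison matters for medical images with small structures.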
To overcome the drawbacks of feature fusion methods, INet was proposed in [17]; it uses two overlapping max-poolings to extract sharp features and has contributed positively to applications such as biomedical MRI, X-ray, CT, and endoscopic imaging. The automatic identification of breast arterial calcification (BAC) in mammograms is not yet possible with any currently used methods; in [18], a UNet model with dense connectivity is proposed that aids in reusing computation and enhances gradient flow, resulting in greater accuracy and simpler model training. A novel Multi-Scale Residual Fusion Network (MSRF-Net) [19] uses a Dual-Scale Dense Fusion (DSDF) block, enabling the network to exchange multi-scale features with different receptive fields. Table 1 illustrates network architectures, methods, problems addressed, performance metrics, and the regions of interest/applications.
Table 1. Network architecture implementations in medical imaging.

S. No. | Method/CNN | Backbone/Network Architecture | Problem Addressed | Performance Metric | Applications
1 | UNet | DRI-Net | Distinctive features | Dice coefficient, sensitivity | Medical imaging
2 | Encoder–Decoder | HMEDN | Exploits comprehensive semantic information | Dice ratio | Pelvic CT and brain tumor
3 | FCN | - | Predictive uncertainty estimation | Dice loss, cross-entropy loss | Medical imaging
4 | FCN | FCN with dilated filters | Sliding-window model limitations | Accuracy | Medical imaging
5 | FCN | Source image | Training complexities | Loss function, Dice, IoU | Medical imaging
6 | - | INet and Dense INet compared with Dense UNet | Feature fusion and feature concatenation | Dice ratio, TPR, specificity, TNR | Biomedical (MRI, X-ray, CT, endoscopic images)
7 | UNet | - | Re-use of computation | - | Calcification in mammograms
8 | CNN | MSRF-Net | Efficient multi-scale segmentation | Dice coefficient (DSC) | Skin lesion
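Most rows in the table above report Dice, IoU, or pixel accuracy. For reference, these metrics can be computed for binary masks in a few lines; this is a generic sketch, not tied to any particular paper's evaluation code:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """Intersection over Union (Jaccard index) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (inter + eps) / (np.logical_or(pred, target).sum() + eps)

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return (pred == target).mean()

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
d, j = dice_coefficient(pred, gt), iou(pred, gt)  # d ≈ 0.5, j ≈ 0.33
```

Dice and IoU are monotonically related (IoU = Dice / (2 − Dice)), which is why papers tend to report either one rather than both.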
2.2. Semantic Segmentation in Face Recognition
In the realm of machine vision, facial analysis has recently emerged as an active research topic. Neural networks are trained on extracted facial features to accurately predict attributes such as age and gender.
A particular type of semantic segmentation is face labelling. The goal of face labelling is to assign each pixel in a picture a specific semantic category, such as eye, brow, nose, or mouth. End-to-end face labelling with a pyramid FCN is proposed in [20] while maintaining a small network size. In order to detect each face in the frame regardless of alignment, [21] created a binary face classifier and presented a technique for creating precise face segmentation masks from input images of any size. A method that uses semantic segmentation to enhance the prediction of facial attributes is discussed in [22]. FaceNet and VGG-Face were utilized in [23] as the foundation for face semantic segmentation, which solves the issue of exact, pixel-level localization of face regions. A technique for precisely obtaining facial landmarks is presented in [24] to enhance pixel classification performance by correcting the imbalance in the number of pixels belonging to each facial landmark. Table 2 illustrates network architectures, methods, problems addressed, performance metrics, and regions of interest/applications.
Table 2. Network architecture implementations in face recognition.

S. No. | Method/CNN | Backbone/Network Architecture | Problem Addressed | Performance Metric
1 | FCN | Pyramid FCN (end-to-end face labelling) | End-to-end manner | F-score
2 | FCN | Binary face classifier | Mask generated from arbitrary-size input image | Pixel accuracy
3 | FCN | - | Improvement in facial attribute prediction | Classification error, average
4 | FCN | Face semantic segmentation | Added generalization and local feature information | Pearson correlation (P), MAE, RMS error
5 | FCN | Facial Landmark Net | Improved pixel imbalance | Pixel accuracy, IoU
6 | CNN | UNet | Supplemental bypass in the conventional optical character recognition (OCR) process | Recall, precision, F-measure
7 | - | LCSegNet (Label Coding Segmentation Net) | Recognition of large-scale Chinese characters | -
8 | Deep learning | UNet | Improve quality of output, digitization | Jaccard index, TN, TP
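Pixel imbalance of the kind addressed by FLSNet (small landmarks versus a large background) is commonly countered by weighting the loss per class. A standard remedy is inverse-frequency class weights; this is a generic sketch, and the cited paper's exact weighting scheme may differ:

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Per-class loss weights inversely proportional to pixel frequency."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    weights = np.where(counts > 0, 1.0 / np.maximum(freq, 1e-12), 0.0)
    # normalise so the weights average to 1 over the classes
    return weights / weights.sum() * num_classes

labels = np.array([[0, 0, 0, 1],
                   [0, 0, 0, 2],
                   [0, 0, 0, 0]])   # class 0 dominates; classes 1 and 2 are rare
w = inverse_frequency_weights(labels, 3)
```

Rare landmark classes receive larger weights than the abundant background class, so misclassifying their few pixels costs more during training.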
2.3. Semantic Segmentation in Lane Detection
To increase the road safety of cars and reduce road accidents, Advanced Driver Assistance Systems (ADAS) play a vital role in the design of intelligent driving systems (IDSs). In lane segmentation algorithms, each pixel of an image is labelled as belonging to a lane or non-lane class. Some commonly used lane detection algorithms are reviewed below.
SUPER, a novel lane detection system, was introduced in [25]; it consists of a semantic segmentation network and a physics-enhanced multi-lane parameter optimization module, combining learning-based and physics-based techniques. To overcome the drawback of convolutional neural networks (CNNs), which rely only on information transfer between layers without using the spatial information within the layers, SCNN (Spatial CNN) was proposed in [26]. Further improvements in the use of spatial information within CNN layers are introduced by the attention-based segmentation network ABSSNet [27]. Aerial LaneNet [28] applies a lane-marking segmentation network to airborne imagery in order to cover larger areas in a short span of time. The problem of essential feature information being overlooked by most lane segmentation methods is resolved in [29], where an aggregator network based on multiscale features is proposed.
Since pixel-level segmentation is tedious and poses a computational burden, an alternative grid-level semantic segmentation scheme, G-Net [30], is proposed. Another grid-based segmentation is proposed in [31] for free space and lane detection.
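The grid-level idea behind such methods can be illustrated by collapsing a pixel mask onto a coarse grid. This is a simplified sketch of the general principle only, not the cited models' actual pipelines:

```python
import numpy as np

def to_grid_labels(mask, cell=4):
    """Mark a grid cell as 'lane' (1) if any pixel inside it is a lane pixel."""
    h, w = mask.shape
    return mask.reshape(h // cell, cell, w // cell, cell).max(axis=(1, 3))

mask = np.zeros((8, 8), dtype=int)
mask[:, 3] = 1                        # a thin vertical lane marking
grid = to_grid_labels(mask, cell=4)   # 64 pixel labels reduced to 4 cell labels
```

Predicting cell labels (plus an offset locating the exact key point inside each cell, as G-Net does with its position vector) is far cheaper than classifying every pixel.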
Society has a responsibility to help blind people walk and cross roads, and to design devices that assist them efficiently. To this end, a low-depth semantic segmentation network is proposed in [32] for detecting blind roads and crosswalks, with accurate features extracted using an atrous pyramid module.
The combined power of handcrafted features and convolutional neural networks is utilized in [33]: localization ability is achieved using hand-crafted features, and the integration of both also predicts a vanishing line. Semantic segmentation utilizing an encoder–decoder for detecting multiple lanes is proposed in [34]; in this work, the pixel accuracy of weak-class objects is improved through the depiction of a ground truth dataset. Table 3 illustrates network architectures, methods, problems addressed, performance metrics, and regions of interest/applications.
Table 3. Network architecture implementations in lane detection.

S. No. | Method/CNN | Backbone/Network Architecture | Problem Addressed | Performance Metric
1 | CNN | SUPER | Optimization of lane parameters | TPR, FPR, Fmax
2 | CNN (Spatial CNN) | ABSSNet | Spatial information inside the layers | MIoU
3 | Encoder–Decoder | Aerial LaneNet | Captures large areas in a short span of time | Loss function, Dice coefficient, forward time
4 | Encoder–Decoder | MFIALane | Simultaneous handling of multiple perceptual tasks | Accuracy, F1, PA, IoU
5 | Encoder–Decoder | G-Net | Releases the detection burden | Accuracy, FP, FN, FPS
6 | CNN | - | Network learns the spatial relationship for points of interest | MIoU
7 | Encoder–Decoder | Lightweight segmentation network | Reduces the number of parameters | Computation time
8 | CNN | - | Accuracy of location | Correct rate
9 | CNN | Multilane encoder–decoder | Accuracy of weak-class objects | Speed and accuracy
10 | CNN | Spatial CNN | Strong spatial relationship | Accuracy
3. Conclusions
This study aims to establish the state of the art as a baseline against which researchers can compare various machine learning and deep learning techniques for semantic segmentation. In total, 34 research articles gathered from different research databases were reviewed. It is concluded that convolutional neural networks and encoder–decoder architectures are the dominant backbones for implementing semantic segmentation; however, the detection accuracy of a network depends on the depth of the neural network chosen.
Author Contributions:
Conceptualization, M.M., S.F. and Y.R.; literature review, M.M. and S.F.;
writing, original draft, M.M. and S.F.; writing, review, Y.R. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Li, W.; Jia, F.; Hu, Q. Automatic Segmentation of Liver Tumor in CT Images with Deep Convolutional Neural Networks. J. Comput. Commun. 2015, 3, 146–151. [CrossRef]
2. Vivanti, R.; Ephrat, A.; Joskowicz, L.; Karaaslan, O.A.; Lev-Cohain, N.; Sosna, J. Automatic Liver Tumor Segmentation in Follow-up CT Studies Using Convolutional Neural Networks. Available online: and_Results/links/58f84cfd0f7e9bfcf93c1292/Automatic-Liver-Tumor-Segmentation-in-Follow-Up-CT-Scans-Preliminary-Method-and-Results.pdf (accessed on 25 January 2023).
3. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 2014, 34, 1993–2024. [CrossRef]
4. Cherukuri, V.; Ssenyonga, P.; Warf, B.C.; Kulkarni, A.V.; Monga, V.; Schiff, S.J. Learning Based Segmentation of CT Brain Images: Application to Postoperative Hydrocephalic Scans. IEEE Trans. Biomed. Eng. 2017, 65, 1871–1884. [CrossRef] [PubMed]
5. Cheng, J.; Liu, J.; Xu, Y.; Yin, F.; Wong, D.W.K.; Tan, N.-M.; Tao, D.; Cheng, C.-Y.; Aung, T.; Wong, T.Y. Superpixel Classification Based Optic Disc and Optic Cup Segmentation for Glaucoma Screening. IEEE Trans. Med. Imaging 2013, 32, 1019–1032. [CrossRef]
6. Fu, H.; Cheng, J.; Xu, Y.; Wong, D.W.K.; Liu, J.; Cao, X. Joint Optic Disc and Cup Segmentation Based on Multi-Label Deep Network and Polar Transformation. IEEE Trans. Med. Imaging 2018, 37, 1597–1605. [CrossRef] [PubMed]
7. Song, T.-H.; Sanchez, V.; Eidaly, H.; Rajpoot, N.M. Dual-Channel Active Contour Model for Megakaryocytic Cell Segmentation in Bone Marrow Trephine Histology Images. IEEE Trans. Biomed. Eng. 2017, 64, 2913–2923. [CrossRef]
8. Wang, S.; Zhou, M.; Liu, Z.; Liu, Z.; Gu, D.; Zang, Y.; Dong, D.; Gevaert, O.; Tian, J. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation. Med. Image Anal. 2017, 40, 172–183. [CrossRef]
9. Onishi, Y.; Teramoto, A.; Tsujimoto, M.; Tsukamoto, T.; Saito, K.; Toyama, H.; Imaizumi, K.; Fujita, H. Multiplanar analysis for pulmonary nodule classification in CT images using deep convolutional neural network and generative adversarial networks. Int. J. Comput. Assist. Radiol. Surg. 2019, 15, 173–178. [CrossRef]
10. Chen, C.; Qin, C.; Qiu, H.; Tarroni, G.; Duan, J.; Bai, W.; Rueckert, D. Deep Learning for Cardiac Image Segmentation: A Review. Front. Cardiovasc. Med. 2020, 7, 25. [CrossRef]
11. Wu, F.; Zhuang, X. CF Distance: A New Domain Discrepancy Metric and Application to Explicit Domain Adaptation for Cross-Modality Cardiac Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 4274–4285. [CrossRef]
12. Chen, L.; Bentley, P.; Mori, K.; Misawa, K.; Fujiwara, M.; Rueckert, D. DRINet for Medical Image Segmentation. IEEE Trans. Med. Imaging 2018, 37, 2453–2462. [CrossRef] [PubMed]
13. Zhou, S.; Nie, D.; Adeli, E.; Yin, J.; Lian, J.; Shen, D. High-Resolution Encoder–Decoder Networks for Low-Contrast Medical Image Segmentation. IEEE Trans. Image Process. 2019, 29, 461–475. [CrossRef] [PubMed]
14. Mehrtash, A.; Wells, W.M.; Tempany, C.M.; Abolmaesumi, P.; Kapur, T. Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 3868–3878. [CrossRef] [PubMed]
15. Anthimopoulos, M.M.; Christodoulidis, S.; Ebner, L.; Geiser, T.; Christe, A.; Mougiakakou, S.G. Semantic Segmentation of Pathological Lung Tissue With Dilated Fully Convolutional Networks. IEEE J. Biomed. Health Inform. 2019, 23, 714–722. [CrossRef]
16. Ren, X.; Ahmad, S.; Zhang, L.; Xiang, L.; Nie, D.; Yang, F.; Wang, Q.; Shen, D. Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation. IEEE Trans. Image Process. 2020, 29, 7497–7510. [CrossRef]
17. Weng, W.; Zhu, X. INet: Convolutional Networks for Biomedical Image Segmentation. IEEE Access 2021, 9, 16591–16603. [CrossRef]
18. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [CrossRef]
19. Srivastava, A.; Jha, D.; Chanda, S.; Pal, U.; Johansen, H.; Johansen, D.; Riegler, M.; Ali, S.; Halvorsen, P. MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 2252–2263. [CrossRef]
20. Wen, S.; Dong, M.; Yang, Y.; Zhou, P.; Huang, T.; Chen, Y. End-to-End Detection-Segmentation System for Face Labeling. IEEE Trans. Emerg. Top. Comput. Intell. 2019, 5, 457–467. [CrossRef]
21. Meenpal, T.; Balakrishnan, A.; Verma, A. Facial Mask Detection using Semantic Segmentation. In Proceedings of the 2019 4th International Conference on Computing, Communications and Security, ICCCS, Rome, Italy, 10–12 October 2019. [CrossRef]
22. Kalayeh, M.M.; Shah, M. Improving Facial Attribute Prediction using Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
23. Yousaf, N.; Hussein, S.; Sultani, W. Estimation of BMI from facial images using semantic segmentation based region-aware pooling. Comput. Biol. Med. 2021, 133, 104392. [CrossRef]
24. Kim, H.; Kim, H.; Rew, J.; Hwang, E. FLSNet: Robust Facial Landmark Semantic Segmentation. IEEE Access 2020, 8, 116163–116175. [CrossRef]
25. Lu, P.; Cui, C.; Xu, S.; Peng, H.; Wang, F. SUPER: A Novel Lane Detection System. IEEE Trans. Intell. Veh. 2021, 6, 583–593. [CrossRef]
26. Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as Deep: Spatial CNN for Traffic Scene Understanding. Proc. Conf. AAAI Artif. Intell. 2018, 32, 7276–7283. [CrossRef]
27. Li, X.; Zhao, Z.; Wang, Q. ABSSNet: Attention-Based Spatial Segmentation Network for Traffic Scene Understanding. IEEE Trans. Cybern. 2021, 52, 9352–9362. [CrossRef] [PubMed]
28. Azimi, S.M.; Fischer, P.; Korner, M.; Reinartz, P. Aerial LaneNet: Lane-Marking Semantic Segmentation in Aerial Imagery Using Wavelet-Enhanced Cost-Sensitive Symmetric Fully Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2920–2938. [CrossRef]
29. Qiu, Z.; Zhao, J.; Sun, S. MFIALane: Multiscale Feature Information Aggregator Network for Lane Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24263–24275. [CrossRef]
30. Wang, H.; Liu, B. G-NET: Accurate Lane Detection Model for Autonomous Vehicle. IEEE Syst. J. 2022, early access. [CrossRef]
31. Shao, M.-E.; Haq, M.A.; Gao, D.-Q.; Chondro, P.; Ruan, S.-J. Semantic Segmentation for Free Space and Lane Based on Grid-Based Interest Point Detection. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8498–8512. [CrossRef]
32. Cao, Z.; Xu, X.; Hu, B.; Zhou, M. Rapid Detection of Blind Roads and Crosswalks by Using a Lightweight Semantic Segmentation Network. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6188–6197. [CrossRef]
33. Wang, Q.; Han, T.; Qin, Z.; Gao, J.; Li, X. Multitask Attention Network for Lane Detection and Fitting. IEEE Trans. Neural Networks Learn. Syst. 2020, 33, 1066–1078. [CrossRef]
34. Chougule, S.; Ismail, A.; Soni, A.; Kozonek, N.; Narayan, V.; Schulze, M. An efficient encoder-decoder CNN architecture for reliable multilane detection in real time. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1444–1451. [CrossRef]
Disclaimer/Publisher’s Note:
The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Methods based on convolutional neural networks have improved the performance of biomedical image segmentation. However, most of these methods cannot efficiently segment objects of variable sizes and train on small and biased datasets, which are common for biomedical use cases. While methods exist that incorporate multi-scale fusion approaches to address the challenges arising with variable sizes, they usually use complex models that are more suitable for general semantic segmentation problems. In this paper, we propose a novel architecture called Multi-Scale Residual Fusion Network (MSRF-Net), which is specially designed for medical image segmentation. The proposed MSRF-Net is able to exchange multi-scale features of varying receptive fields using a Dual-Scale Dense Fusion (DSDF) block. Our DSDF block can exchange information rigorously across two different resolution scales, and our MSRF sub-network uses multiple DSDF blocks in sequence to perform multi-scale fusion. This allows the preservation of resolution, improved information flow and propagation of both high- and low-level features to obtain accurate segmentation maps. The proposed MSRF-Net allows to capture object variabilities and provides improved results on different biomedical datasets. Extensive experiments on MSRF-Net demonstrate that the proposed method outperforms the cutting edge medical image segmentation methods on four publicly available datasets. We achieve the Dice Coefficient (DSC) of 0.9217, 0.9420, and 0.9224, 0.8824 on Kvasir-SEG, CVC-ClinicDB, 2018 Data Science Bowl dataset, and ISIC-2018 skin lesion segmentation challenge dataset respectively. We further conducted generalizability tests that also achieved the highest DSC score with 0.7921 and 0.7575 on CVC-ClinicDB and Kvasir-SEG, respectively.
Full-text available
AI-based lane detection algorithms were actively studied over the last few years. Many have demonstrated superior performance compared with traditional feature-based methods. However, most methods remain riddled with assumptions and limitations, still not good enough for safe and reliable driving in the real world. In this paper, we propose a novel lane detection system, called Scene Understanding Physics-Enhanced Real-time (SUPER) algorithm. The proposed method consists of two main modules: 1) a hierarchical semantic segmentation network as the scene feature extractor and 2) a physics enhanced multi-lane parameter optimization module for lane inference. We train the proposed system using heterogeneous data from Cityscapes, Vistas and Apollo, and evaluate the performance on four completely separate datasets (that were never seen before), including Tusimple, Caltech, URBAN KITTI-ROAD, and Mcity-3000. The proposed approach performs the same or better than lane detection models already trained on the same dataset, and performs well even on datasets it was never trained on. Real-world vehicle tests were also conducted. Preliminary test results show promising real-time lane-detection performance compared with the Mobileye.
Full-text available
Encoder–decoder networks are state-of-the-art approaches to biomedical image segmentation, but have two problems: i.e., the widely used pooling operations may discard spatial information, and therefore low-level semantics are lost. Feature fusion methods can mitigate these problems but feature maps of different scales cannot be easily fused because down- and upsampling change the spatial resolution of feature map. To address these issues, we propose INet, which enlarges receptive fields by increasing the kernel sizes of convolutional layers in steps (e.g., from 3 × 3 to 7 × 7 and then 15 × 15) instead of downsampling. Inspired by an Inception module, INet extracts features by kernels of different sizes through concatenating the output feature maps of all preceding convolutional layers.We also find that the large kernel makes the network feasible for biomedical image segmentation. In addition, INet uses two overlapping max-poolings, i.e., max-poolings with stride 1, to extract the sharpest features. Fixed-size and fixed-channel feature maps enable INet to concatenate feature maps and add multiple shortcuts across layers. In this way, INet can recover low-level semantics by concatenating the feature maps of all preceding layers and expedite the training by adding multiple shortcuts. Because INet has additional residual shortcuts, we compare INet with a UNet system that also has residual shortcuts (ResUNet). To confirm INet as a backbone architecture for biomedical image segmentation, we implement dense connections on INet (called DenseINet) and compare it to a DenseUNet system with residual shortcuts (ResDenseUNet). INet and DenseINet require 16.9% and 37.6% fewer parameters than ResUNet and ResDenseUNet, respectively. In comparison with six encoder–decoder approaches using nine public datasets, INet and DenseINet demonstrate efficient improvements in biomedical image segmentation. 
INet outperforms DeepLabV3, which implementing atrous convolution instead of downsampling to increase receptive fields. INet also outperforms two recent methods (named HRNet and MS-NAS) that maintain high-resolution representations and repeatedly exchange the information across resolutions.
Lane detection is an essential task in autonomous driving. A good lane detection model should achieve many objectives, such as high accuracy, rapid detection, and low memory. In this article, a grid-based network (G-NET) is designed to realize the aforementioned goals. In G-NET, the traditional pixel-level semantic segmentation is replaced with the area-level grid segmentation to release the detection burden. Then, a position vector is introduced to indicate where lane key point is in the grid. Meanwhile, the novel rolling convolution layer following with the down-sampling and up-sampling convolution layer has been designed for good feature extraction, ensuring each feature grid perceives all other grid features in the feature map. Then, an adaptive hyperparameter branch is introduced to calculate the binary threshold effectively. Finally, the detected lane points are classified into different lanes by introducing distance-based quaternion. G-NET is extensively evaluated on three most widely datasets: TuSimple, CULane, and CurveLanes. The results show that G-NET has a state-of-the-art performance. Meanwhile, field tests are conducted.
Lane detection differs from general object detection in that lane lines are usually long and narrow in the road image, and more attention to image features at different scales is required to reason about lane lines under occlusion, degradation, and bad weather. However, most existing semantic segmentation-based lane detection methods focus on solving the convolutional receptive field through aggregating information vertically and horizontally in the same feature map, which may ignore important information contained in multi-scale features. Besides, the high-level semantic information of whether the lane exists is not fully utilized, as they often add a module at the final stage of the network output to determine whether the lane exists, which is a dispensable for their network. Based on the above analysis, we design a novel lane detection network based on semantic segmentation which consists of a Multi-scale Feature Information Aggregator (MFIA) module and a Channel Attention (CA) module. Many experiments on the TRLane dataset, the generated Lane dataset, BDD100K dataset, TuSimple dataset, VIL-100 dataset and CULane dataset show that our approach can achieve the state-of-the-art performance (our code will be available at ). In addition, considering that different perceptual tasks in autonomous driving are able to share the feature extraction network, we also conduct the experiment for drivable area segmentation on BDD100K dataset. Our approach also achieves good results compared to many existing methods, showing that our proposed model is capable of simultaneously handling multiple perceptual tasks in autonomous driving scenarios.
An increasing number of tasks have been developed for autonomous driving and advanced driver assistance systems. However, this gives rise to the problem of incorporating plural functionalities to be ported into a power-constrained computing device. Therefore, the objective of this work is to alleviate the complex learning procedure of the pixel-wise approach for driving scene understanding. In this paper, we go beyond the pixel-wise detection of the semantic segmentation task as a point detection task and implement it to detect free space and lane. Instead of pixel-wise learning, we trained a single deep convolution neural network for point of interest detection in a grid-based level and followed with a computer vision (CV) based post-processing of end branches corresponding to the characteristic of target classes. To achieve the corresponding final result of pixel-wise detection of semantic segmentation and parametric description of lanes, we propose a CV-based post-processing to decode points of output from the neural network. The final results showed that the network could learn the spatial relationship for point of interest, including the representative points on the contour of the free space segmented region and the representative points along the center of the road lane. We verify our method on two publicly available datasets, which achieved 98.2% mIoU on the KITTI dataset for the evaluation of free space and 97.8% accuracy on the TuSimple dataset (with the field of view below the $y=320$ axis) for the evaluation of the lane.
Body Mass Index (BMI) conveys important information about one’s life, such as health and socio-economic conditions. Large-scale automatic estimation of BMI can help predict several societal behaviors related to health, job opportunities, friendships, and popularity. Recent works have employed either hand-crafted geometrical face features or face-level deep convolutional neural network features for face-to-BMI prediction. Although useful, hand-crafted geometrical features lack generalizability, and face-level deep features miss the detailed local information that is essential for accurate BMI prediction. In this paper, we propose to use deep features pooled from different face regions (eyes, nose, eyebrows, lips, etc.) and demonstrate that this explicit pooling from face regions can significantly boost the performance of BMI prediction. To address the problem of accurate, pixel-level face region localization, we use face semantic segmentation in our framework. Extensive experiments are performed using different Convolutional Neural Network (CNN) backbones, including FaceNet and VGG-Face, on three publicly available datasets: VisualBMI, Bollywood, and VIP-attribute. Experimental results demonstrate that, compared to recent works, the proposed Reg-GAP gives a percentage improvement of 22.4% on VIP-attribute, 3.3% on VisualBMI, and 63.09% on the Bollywood dataset.
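The region-pooled feature idea can be sketched as follows: a face parser assigns each pixel a region label, and deep features are average-pooled separately inside each region mask before being concatenated into one descriptor. The region ids, shapes, and the name `region_gap` are illustrative assumptions in the spirit of Reg-GAP, not the authors' exact implementation.

```python
import numpy as np

def region_gap(features, seg_mask, region_ids):
    """Pool CNN features within semantic face regions (Reg-GAP-style sketch).

    features: (C, H, W) feature map; seg_mask: (H, W) integer region labels
    from a face parser. Returns a concatenated per-region descriptor.
    """
    descs = []
    for rid in region_ids:
        mask = seg_mask == rid
        if mask.any():
            descs.append(features[:, mask].mean(axis=1))   # (C,) mean per region
        else:
            descs.append(np.zeros(features.shape[0]))      # absent region -> zeros
    return np.concatenate(descs)

feat = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
mask = np.array([[1, 1, 0], [0, 2, 2], [0, 0, 0]])  # 1 = "eyes", 2 = "nose" (illustrative)
vec = region_gap(feat, mask, region_ids=[1, 2])
print(vec.shape)  # (4,)
```

Concatenating per-region descriptors preserves the local detail (e.g. around the cheeks or jawline) that a single face-level global pool averages away.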
The location information of roads and lane lines is critically important for autonomous driving and driver assistance. The detection accuracy of these two elements dramatically affects the reliability and practicality of the whole system. In real applications, the traffic scene can be very complicated, which makes it particularly challenging to obtain the precise locations of roads and lane lines. Commonly used deep learning-based object detection models perform well on lane line and road detection tasks, but they still produce frequent false and missed detections. Moreover, existing convolutional neural network (CNN) structures only attend to the information flow between layers and cannot fully utilize the spatial information inside each layer. To address these problems, we propose an attention-based spatial segmentation network for traffic scene understanding. We use a convolutional attention module to improve the network's capacity to model the spatial distribution of locations. A Spatial CNN (SCNN) enables information flow within a single convolutional layer and improves the spatial relationship modeling ability of the network. The experimental results demonstrate that this method effectively improves the neural network's ability to exploit spatial information, thereby improving traffic scene understanding. Furthermore, a pixel-level road segmentation dataset called the NWPU Road Dataset is built to help advance traffic scene understanding.
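The SCNN-style within-layer information flow can be illustrated with a minimal sketch: feature map rows pass messages to their neighbors slice by slice, so evidence for long, thin structures like lane lines accumulates along the map. The scalar mixing weight `alpha` below stands in for the learned slice-wise convolutions of the real SCNN, so this is a simplified illustration of the propagation scheme only.

```python
import numpy as np

def scnn_downward(feat, alpha=0.5):
    """SCNN-style message passing: propagate information row by row (top to
    bottom) within a single feature map. alpha is an illustrative mixing
    weight standing in for a learned slice convolution.
    """
    out = feat.astype(float).copy()
    for r in range(1, out.shape[0]):
        # each row receives a ReLU-gated contribution from the row above
        out[r] += alpha * np.maximum(out[r - 1], 0.0)
    return out

feat = np.zeros((3, 2)); feat[0] = 1.0
out = scnn_downward(feat)
print(out[:, 0])  # evidence decays but propagates down the map
```

The real SCNN repeats this in four directions (down, up, left, right), which is what lets a layer relate spatially distant pixels without extra depth.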
Many CNN-based segmentation methods have recently been applied to lane marking detection and have achieved great success owing to their strong ability to model semantic information. Although the accuracy of lane line prediction keeps improving, the localization ability for lane markings is relatively weak, especially when the lane marking point is remote. Traditional lane detection methods usually rely on highly specialized handcrafted features and carefully designed post-processing to detect the lanes; however, these methods are based on strong assumptions and thus scale poorly. In this work, we propose a novel multitask method that 1) integrates the semantic modeling ability of CNNs with the strong localization ability provided by handcrafted features and 2) predicts the position of the vanishing line. A novel lane fitting method based on vanishing line prediction is also proposed for sharp curves and non-flat roads. By integrating segmentation, specialized handcrafted features, and fitting, both the localization accuracy and the convergence speed of the network are improved. Extensive experimental results on four lane marking detection datasets show that our method achieves state-of-the-art performance.
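The fitting stage can be sketched as follows: detected lane points are fit with a polynomial in the vertical coordinate, and the curve is evaluated only below a predicted vanishing line, beyond which the fit is not trusted. The `x = f(y)` parameterization, the polynomial degree, and the hard clipping at `y_vanish` are illustrative assumptions rather than the paper's exact fitting procedure.

```python
import numpy as np

def fit_lane(points, y_vanish, degree=2):
    """Fit x = f(y) through lane points and sample the curve only below the
    predicted vanishing line (smaller y = higher in the image).
    """
    ys = np.array([p[1] for p in points], dtype=float)
    xs = np.array([p[0] for p in points], dtype=float)
    coeffs = np.polyfit(ys, xs, degree)             # lane as a polynomial in y
    y_eval = np.arange(y_vanish, ys.max() + 1)      # clip at the vanishing line
    return np.stack([np.polyval(coeffs, y_eval), y_eval], axis=1)

pts = [(100, 300), (110, 350), (125, 400)]          # (x, y) lane points
curve = fit_lane(pts, y_vanish=300)
print(curve.shape)  # (101, 2)
```

Anchoring the fit to the vanishing line is what keeps extrapolation stable on sharp curves, where a naive polynomial would diverge toward the horizon.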
Domain adaptation has great value in unpaired cross-modality image segmentation, where training images with gold-standard segmentations are not available from the target image domain. The aim is to reduce the distribution discrepancy between the source and target domains, so an effective measurement of this discrepancy is critical. In this work, we propose a new metric based on the characteristic functions of distributions. This metric, referred to as the CF distance, enables explicit domain adaptation, in contrast to implicit approaches that minimize domain discrepancy via adversarial training. Based on this CF distance, we propose an unsupervised domain adaptation framework for cross-modality cardiac segmentation, which consists of image reconstruction and prior distribution matching. We validated the method on two tasks, i.e., CT-MR cross-modality segmentation and multi-sequence cardiac MR segmentation. Results showed that the proposed explicit metric was effective for domain adaptation, and the segmentation method delivered promising and superior performance compared to other state-of-the-art techniques. The data and source code of this work have been released via .
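The core of a characteristic-function distance can be sketched directly: the empirical characteristic function of a sample is the mean of complex exponentials, and the distance between two distributions is measured by how far these functions disagree across a set of frequencies. The uniform frequency grid and squared-modulus averaging below are simplifying assumptions; the paper's exact weighting and multivariate form may differ.

```python
import numpy as np

def cf_distance(x, y, ts):
    """Empirical characteristic-function distance between 1-D samples x and y,
    evaluated on a frequency grid ts: mean over t of |phi_x(t) - phi_y(t)|^2,
    where phi(t) = E[exp(i * t * X)].
    """
    phi_x = np.exp(1j * np.outer(ts, x)).mean(axis=1)   # phi_x(t) for each t
    phi_y = np.exp(1j * np.outer(ts, y)).mean(axis=1)
    return float(np.mean(np.abs(phi_x - phi_y) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)
ts = np.linspace(-3.0, 3.0, 61)
print(cf_distance(a, a, ts))  # 0.0 for identical samples
```

Because the characteristic function uniquely determines a distribution, driving this distance toward zero aligns the source and target feature distributions explicitly, without the minimax instability of adversarial training.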