Citation: Mazhar, M.; Fakhar, S.; Rehman, Y. Semantic Segmentation for Various Applications: Research Contribution and Comprehensive Review. Eng. Proc. 2023, 32, 21.
Academic Editors: Muhammad
Faizan Shirazi, Saba Javed,
Sundus Ali and Muhammad
Published: 5 May 2023
Copyright: © 2023 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
Semantic Segmentation for Various Applications: Research Contribution and Comprehensive Review †
Madiha Mazhar *, Saba Fakhar and Yawar Rehman
Department of Electronic Engineering, NED University of Engineering and Technology,
Karachi 75270, Pakistan; email@example.com (S.F.); firstname.lastname@example.org (Y.R.)
† Presented at the 2nd International Conference on Emerging Trends in Electronic and Telecommunication
Engineering, Karachi, Pakistan, 15–16 March 2023.
Abstract: Semantic image segmentation is used to analyse visual content and carry out real-time decision-making. This narrative literature review evaluates innovations and advancements in semantic segmentation architectures by presenting an overview of the algorithms used in medical image analysis, lane detection, and face recognition. Numerous groundbreaking works are examined from several angles (e.g., network structures, algorithms, and the problems addressed). Recent developments in semantic segmentation networks, such as U-Net, ResNet, SegNet, LCSegnet, FLSNet, and GNet, are reviewed together with their evaluation metrics across a range of applications to facilitate new research in this field.
Keywords: semantic segmentation; encoder–decoder; applications; medical imaging; face recognition; …
1. Introduction
Convolutional neural networks (CNNs) have achieved remarkable success in semantic segmentation in recent years. Semantic segmentation assigns each pixel of an image to a class label, such as car, pedestrian, or tree. Nowadays, most techniques for generating pixel-by-pixel segmentation predictions use an encoder–decoder architecture: the encoder extracts features, while the decoder recovers the feature map resolution.
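The resolution flow of an encoder–decoder can be illustrated with a toy sketch. It assumes the per-class score maps are already available and has no learned weights: a 2x2 max-pool stands in for the encoder (halving resolution), nearest-neighbour upsampling stands in for the decoder (restoring it), and each pixel is finally labelled with the class of highest score. The function names and the 4x4 "background"/"car" score maps are hypothetical illustrations, not taken from any of the reviewed networks.

```python
from typing import List

Grid = List[List[float]]


def max_pool_2x2(x: Grid) -> Grid:
    """Toy encoder step: halve the resolution, keeping the strongest response in each 2x2 block."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def upsample_2x(x: Grid) -> Grid:
    """Toy decoder step: nearest-neighbour upsampling back to twice the resolution."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def segment(score_maps: List[Grid]) -> List[List[int]]:
    """Encode and decode each class score map, then give every pixel the index
    of the class whose decoded score is highest (per-pixel argmax)."""
    decoded = [upsample_2x(max_pool_2x2(m)) for m in score_maps]
    h, w = len(decoded[0]), len(decoded[0][0])
    return [
        [max(range(len(decoded)), key=lambda c: decoded[c][i][j]) for j in range(w)]
        for i in range(h)
    ]


# Hypothetical 4x4 score maps for two classes: 0 = background, 1 = car.
background = [[0.9, 0.8, 0.1, 0.2],
              [0.7, 0.9, 0.3, 0.1],
              [0.2, 0.1, 0.8, 0.9],
              [0.3, 0.2, 0.7, 0.6]]
car = [[0.1, 0.2, 0.9, 0.8],
       [0.3, 0.1, 0.7, 0.9],
       [0.8, 0.9, 0.2, 0.1],
       [0.7, 0.8, 0.3, 0.4]]

labels = segment([background, car])
for row in labels:
    print(row)
```

The output is blocky because the decoder only copies coarse values; architectures such as U-Net add skip connections from encoder to decoder partly to recover sharper boundaries.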
Because it significantly improves diagnostic efficiency and accuracy, medical image segmentation frequently plays a crucial part in computer-aided diagnosis and smart medicine. Liver and liver tumor segmentation [1,2], as well as brain and brain tumor segmentation [3,4], are common medical image segmentation tasks. Other common tasks include segmentation of the optic disc [5,6], cells [7], lungs and pulmonary nodules [8,9], and the heart [10,11]. Early methods for segmenting medical images frequently relied on edge detection, machine learning, template matching, statistical shape models, and active contours. Deep learning models, in particular CNNs, have recently proven useful for a variety of image segmentation tasks.
2. Applications of Semantic Segmentation
Semantic segmentation has found applications in many areas, such as medical diagnostics and scanning, face recognition, scene understanding, autonomous driving, and handwriting recognition. This literature survey covers three broad applications of semantic segmentation to help researchers apply the network architectures developed for one application to the others.
Eng. Proc. 2023,32, 21. https://doi.org/10.3390/engproc2023032021 https://www.mdpi.com/journal/engproc
2.1. Semantic Segmentation in Medical Imaging
One of the best-known CNN designs for semantic segmentation is the U-Net architecture, which has achieved outstanding results in a wide range of medical image segmentation applications. The Dense-Res-Inception Net (DRINet) proposed in [12] addresses this challenging problem by learning distinctive features and has been applied to brain CT, brain tumor, and abdominal CT images. The authors of [13] proposed a high-resolution multi-scale encoder–decoder network (HMEDN), in which dense multi-scale connections allow the encoder–decoder structure to exploit all of the available semantic information. Skip connections are added, together with extra, extensively trained high-resolution pathways (made up of densely connected dilated convolutions) that gather high-resolution semantic information for precise boundary localization; the network was successfully validated on pelvic CT images and a multi-modal brain tumor dataset. In [14], the predictive uncertainty of fully convolutional networks (FCNs) for segmentation was assessed by systematically comparing cross-entropy loss with Dice loss in terms of segmentation quality and uncertainty estimation, and model ensembles were used for confidence calibration of FCNs trained with batch normalization and Dice loss; the approach was tested on prostate, heart, and brain segmentation. For accurate diagnosis of interstitial lung diseases (ILDs), [15] proposed an FCN-based semantic segmentation of ILD patterns that avoids the limitations of sliding-window models. Training complexities are addressed in [16] by decomposing a single task into three sub-tasks (pixel-wise segmentation, prediction, and classification of an image), and a novel sync-regularization was proposed to penalize nonconformity between the outputs.
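The comparison between cross-entropy loss and Dice loss in the uncertainty study above can be made concrete with a minimal sketch. It assumes a binary mask flattened to a list and per-pixel foreground probabilities (e.g., from a sigmoid); the function names and the ten-pixel example are illustrative and are not the code used in that study.

```python
import math
from typing import Sequence


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean pixel-wise binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def soft_dice_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """1 minus the soft Dice coefficient, computed over all pixels of one image."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# One foreground pixel among ten; the network predicts 0.1 everywhere.
targets = [1] + [0] * 9
probs = [0.1] * 10
print(f"BCE  = {binary_cross_entropy(probs, targets):.4f}")  # low: the background dominates the mean
print(f"Dice = {soft_dice_loss(probs, targets):.4f}")        # high: the foreground pixel is missed
```

In this example the averaged cross-entropy stays fairly small because nine of the ten pixels are correctly "background", whereas the Dice loss is close to 1 because it only measures overlap with the (missed) foreground.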
To overcome the drawbacks of feature fusion methods, INet was proposed in [17]; it uses two overlapping max-pooling operations to extract sharp features and has been applied to biomedical MRI, X-ray, CT, and endoscopic imaging. No currently used method can yet automatically identify breast arterial calcifications (BAC) in mammograms. In [18], a UNet model with dense connectivity is proposed that reuses computation and improves gradient flow, resulting in greater accuracy and simpler model training. The Multi-Scale Residual Fusion Network (MSRF-Net) [19] uses a Dual-Scale Dense Fusion (DSDF) block that allows it to exchange multi-scale features with different receptive fields. Table 1 summarizes the network architectures, methods, problems addressed, performance metrics, and regions of interest/applications.
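Overlapping max-pooling, as used in INet, can be contrasted with ordinary non-overlapping pooling in a short sketch. The kernel and stride values (2/2 versus 3/2) are assumptions chosen for illustration; the exact settings used by INet are not given in this review.

```python
from typing import List

Grid = List[List[float]]


def max_pool2d(x: Grid, kernel: int, stride: int) -> Grid:
    """Max pooling without padding; windows overlap whenever kernel > stride."""
    out_h = (len(x) - kernel) // stride + 1
    out_w = (len(x[0]) - kernel) // stride + 1
    return [
        [max(x[i * stride + di][j * stride + dj]
             for di in range(kernel) for dj in range(kernel))
         for j in range(out_w)]
        for i in range(out_h)
    ]


# A 5x5 map with one sharp response in the centre.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0

print(max_pool2d(image, kernel=2, stride=2))  # [[0.0, 0.0], [0.0, 9.0]]
print(max_pool2d(image, kernel=3, stride=2))  # [[9.0, 9.0], [9.0, 9.0]]
```

With overlapping windows, the sharp response survives in every output cell whose window touches it, so its presence depends less on where it happens to fall relative to the pooling grid.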
Table 1. Network architectures implemented in medical imaging.
S. No. | Method/CNN | Backbone/Network Architecture | Problem Addressed | Performance Metric | Region of Interest/Application
1 | UNET | DRI-Net | Distinctive features | Sensitivity | Medical imaging
2 | Encoder–Decoder | HMEDN | Exploits comprehensive … | … | Pelvic CT and …
3 | FCN | … | Predictive uncertainty | Dice loss, cross-entropy loss | …
4 | FCN | FCN with dilated filters | Sliding window model | Accuracy | Medical imaging
5 | FCN | … | Source image … | Dice, IoU | Medical imaging
6 | … | INet and Dense INet (compared with Dense …) | Feature fusion and … | Dice ratio, TPR, … | …
7 | UNet | … | Re-use of computation | … | …
8 | CNN | MSRF-Net | Efficiently segments … | … (DSC) | Skin lesion …
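Several entries in Table 1 report Dice (DSC), IoU, and sensitivity (TPR). A minimal sketch of these overlap metrics for two flattened binary masks follows; the helper names are illustrative, and returning 1.0 when a denominator is zero (e.g., both masks empty) is an assumed convention, not one stated in the reviewed papers.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for two flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


def _ratio(num: float, den: float) -> float:
    # Assumed convention: a zero denominator (nothing to compare) counts as perfect agreement.
    return num / den if den else 1.0


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return _ratio(2 * tp, 2 * tp + fp + fn)


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return _ratio(tp, tp + fp + fn)


def sensitivity(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Also called recall or true positive rate (TPR)."""
    tp, _, fn, _ = confusion_counts(pred, truth)
    return _ratio(tp, tp + fn)


pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
print(dice(pred, truth), iou(pred, truth), sensitivity(pred, truth))
```

Dice and IoU rank predictions identically, since Dice = 2·IoU / (1 + IoU); they differ only in scale.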
2.2. Semantic Segmentation in Face Recognition
In machine vision, facial analysis has recently emerged as an active research subject. Neural networks are trained on the extracted facial characteristics to predict age, gender, and other attributes accurately.
Face labelling is a particular type of semantic segmentation. Its goal is to assign each pixel in an image a specific semantic category, such as eye, brow, nose, or mouth. End-to-end face labelling with a pyramid FCN that keeps the network small is proposed in [20]. To detect every face in a frame regardless of alignment, [21] built a binary face classifier and presented a technique for generating precise face segmentation masks from input images of arbitrary size. A method that uses semantic segmentation to enhance the prediction of facial attributes is discussed in [22]. FaceNet and VGG-Face were used in [23] as the foundation for face semantic segmentation, which addresses exact, pixel-level localization of face regions. A technique for precisely obtaining facial landmarks is presented in [24]; it improves pixel classification performance by adjusting for the imbalance in the number of pixels belonging to each facial landmark. Table 2 summarizes the network architectures, methods, problems addressed, performance metrics, and regions of interest/applications.
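The review does not detail how the landmark-based method above corrects pixel imbalance. As a swapped-in illustration of the general problem, the sketch below uses median-frequency balancing, a common way to give rare classes (such as eyes) larger loss weights than frequent ones (such as skin). It is simplified: frequencies are computed over all pixels of all maps, and the class indices (0 = skin, 1 = eye, 2 = hair) are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, List


def median_frequency_weights(label_maps: List[List[List[int]]]) -> Dict[int, float]:
    """Weight each class by median_frequency / class_frequency (simplified: frequencies
    are taken over all pixels of all label maps)."""
    counts: Counter = Counter()
    for label_map in label_maps:
        for row in label_map:
            counts.update(row)
    total = sum(counts.values())
    freqs = {c: n / total for c, n in counts.items()}
    med = median(freqs.values())
    return {c: med / f for c, f in freqs.items()}


# Hypothetical face-labelling map: 0 = skin, 1 = eye, 2 = hair.
label_map = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0],
             [2, 2, 2, 2]]
weights = median_frequency_weights([label_map])
print(weights)
```

Such weights would typically multiply each pixel's cross-entropy term, so that misclassifying a rare-class pixel costs more than misclassifying a common one.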
Table 2. Network architectures implemented in face recognition.
S. No. | Method/CNN | Backbone/Network Architecture | Problem Addressed | Performance Metric
1 | Pyramid FCN | End-to-end face labelling | End-to-end manner | F-score
2 | FCN | Binary face classifier | Mask generated from an input image of arbitrary size | Pixel accuracy
3 | FCN | … | Improvement in facial attribute … | Classification error, average …
4 | FCN | Face semantic … | Added generalization and features | P (Pearson correlation), MAE, …
5 | FCN | Facial landmark net | Improved pixel imbalance | Pixel accuracy, IoU
6 | CNN | UNet | Supplemental bypass in the conventional optical character recognition (OCR) process | Recall, precision, F-measure
7 | … | … | Recognition of large-scale Chinese … | …
8 | Deep learning | UNet | Improve quality of output, digitization | Jaccard index, TN, TP
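Pixel accuracy and IoU, which appear in Table 2, behave differently for multi-class face labelling. A minimal sketch follows, assuming label maps given as nested lists of class indices; the classes (0 = skin, 1 = eye, 2 = mouth) and the 3x3 maps are hypothetical.

```python
from typing import List

LabelMap = List[List[int]]


def pixel_accuracy(pred: LabelMap, truth: LabelMap) -> float:
    """Fraction of pixels whose predicted label equals the ground truth."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)


def mean_iou(pred: LabelMap, truth: LabelMap, num_classes: int) -> float:
    """Average IoU over the classes that appear in either map."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in pairs)
        union = sum(p == c or t == c for p, t in pairs)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


truth_map = [[0, 0, 1],
             [0, 1, 1],
             [2, 2, 2]]
pred_map = [[0, 0, 1],
            [0, 0, 1],
            [2, 2, 1]]
print(pixel_accuracy(pred_map, truth_map), mean_iou(pred_map, truth_map, 3))
```

Pixel accuracy is dominated by large regions, whereas mean IoU weights every class equally, so small facial parts influence it as much as the skin region does.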
2.3. Semantic Segmentation in Lane Detection
Advanced driver assistance systems (ADAS) play a vital role in designing intelligent driving systems (IDSs) that increase road safety and reduce road accidents.
In lane segmentation algorithms, each pixel of an image is labelled as either lane or non-lane. Some commonly used lane detection algorithms are reviewed below.
SUPER, a novel lane detection system, was introduced in [25]; it consists of a semantic segmentation network and physics-enhanced multilane parameter optimization that combines learning-based and physics-based techniques. Conventional CNNs pass information only between layers and do not use the spatial information within a layer. To overcome this drawback, the Spatial CNN (SCNN), which passes messages slice-by-slice within feature maps, was proposed in [26]. Further improvements in exploiting spatial information within CNN layers were introduced by the attention-based segmentation network ABSSNet [27]. Aerial LaneNet [28] performs lane-marking semantic segmentation on airborne imagery, which allows large areas to be covered in a short time. Most lane segmentation methods overlook essential information in features; [29] addresses this with an aggregator network based on multiscale features.
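The slice-by-slice message passing of SCNN can be sketched in a simplified form. This sketch assumes a single-channel feature map and shows only the top-to-bottom pass: each row adds ReLU(conv(previous updated row)), so a response in one row can propagate along a thin, continuous lane. SCNN itself also runs bottom-to-top, left-to-right, and right-to-left passes with learned multi-channel kernels; the kernel values here are hypothetical.

```python
from typing import List


def conv1d_same(row: List[float], kernel: List[float]) -> List[float]:
    """1-D convolution (cross-correlation) with zero padding; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for j in range(len(row)):
        s = 0.0
        for t, w in enumerate(kernel):
            idx = j + t - half
            if 0 <= idx < len(row):
                s += w * row[idx]
        out.append(s)
    return out


def scnn_downward(feature_map: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Top-to-bottom pass: each row receives ReLU(conv(updated previous row)) as a message."""
    out = [list(feature_map[0])]
    for i in range(1, len(feature_map)):
        message = conv1d_same(out[i - 1], kernel)
        out.append([x + max(m, 0.0) for x, m in zip(feature_map[i], message)])
    return out


# Hypothetical single-channel map: a lane response is visible only in the top row.
fmap = [[0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
print(scnn_downward(fmap, kernel=[0.0, 1.0, 0.0]))
print(scnn_downward(fmap, kernel=[0.5, 1.0, 0.5]))
```

Because the rows are updated sequentially, information from the top row reaches every row below it in a single pass, which is what lets the network exploit spatial relationships inside one layer.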
Because pixel-level segmentation is tedious and computationally burdensome, grid-level semantic segmentation with G-Net [30] was proposed as an alternative scheme. Another grid-based segmentation approach, for free-space and lane detection, is proposed in [31].
Helping blind people walk and cross roads is a responsibility of society, which includes designing efficient devices that help them cross roads. To this end, a lightweight semantic segmentation network for detecting blind roads and crosswalks is proposed in [32]; accurate features are extracted using an atrous pyramid module.
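The atrous (dilated) convolutions behind an atrous pyramid module can be illustrated in one dimension. The sketch assumes a single 3-tap kernel shared by all branches and zero padding; pyramid modules of this kind typically use separate learned 2-D kernels per branch and fuse the branches afterwards. All names and values are illustrative.

```python
from typing import List


def atrous_conv1d(signal: List[float], kernel: List[float], rate: int) -> List[float]:
    """Dilated (atrous) 1-D convolution with zero padding; taps are `rate` samples apart."""
    half = len(kernel) // 2
    out = []
    for j in range(len(signal)):
        s = 0.0
        for t, w in enumerate(kernel):
            idx = j + (t - half) * rate
            if 0 <= idx < len(signal):
                s += w * signal[idx]
        out.append(s)
    return out


def receptive_field(kernel_size: int, rate: int) -> int:
    """Span covered by one dilated kernel: (k - 1) * r + 1."""
    return (kernel_size - 1) * rate + 1


def atrous_pyramid(signal: List[float], kernel: List[float], rates: List[int]) -> List[List[float]]:
    """Run one branch per dilation rate and stack the branch outputs at each position."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [list(features) for features in zip(*branches)]


impulse = [0.0] * 9
impulse[4] = 1.0
for r in (1, 2, 4):
    print(r, receptive_field(3, r), atrous_conv1d(impulse, [1.0, 1.0, 1.0], r))
```

The same three weights cover 3, 5, and 9 samples at rates 1, 2, and 4, so the pyramid gathers multi-scale context without adding parameters per branch, which suits lightweight networks.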
The combined power of handcrafted features and convolutional neural networks is exploited in [33]: the handcrafted features provide localization ability, and integrating both also allows a vanishing line to be predicted. A semantic segmentation encoder–decoder for detecting multiple lanes is proposed in [34]; in this work, the pixel accuracy of weak-class objects is improved by preparing a ground-truth dataset. Table 3 summarizes the network architectures, methods, problems addressed, performance metrics, and regions of interest/applications.
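How the hybrid handcrafted-plus-CNN approach above predicts its vanishing line is not described in this review. As a generic geometric illustration only (not that method), the sketch fits a straight line to points on each of two lanes and intersects them to estimate a vanishing point. Lanes are near-vertical in the image, so x is regressed on y; the lane points are hypothetical pixel coordinates with y growing downwards.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates, y growing downwards


def fit_lane_line(points: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of x = a * y + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / syy
    return a, mean_x - a * mean_y


def vanishing_point(line1: Tuple[float, float], line2: Tuple[float, float]) -> Optional[Point]:
    """Intersection of two fitted lane lines, or None if they are parallel in the image."""
    (a1, b1), (a2, b2) = line1, line2
    if a1 == a2:
        return None
    y = (b2 - b1) / (a1 - a2)
    return a1 * y + b1, y


left = fit_lane_line([(150, 400), (200, 300), (250, 200)])
right = fit_lane_line([(450, 400), (400, 300), (350, 200)])
print(left, right, vanishing_point(left, right))
```

The estimated point lies above both lanes (smaller y), as expected for lanes converging toward the horizon.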
Table 3. Network architectures implemented in lane detection.
S. No. | Method/CNN | Backbone/Network Architecture | Problem Addressed | Performance Metric
1 | CNN | SUPER | Optimization of lane parameters | TPR, FPR, Fmax
2 | Spatial-SCNN | ABSSNet | Spatial information inside the layers | MIoU
3 | Encoder–Decoder | Aerial LaneNet | Captures a large area in a short span of time | Loss function, Dice coefficient, forward time
4 | Encoder–Decoder | MFIA Lane | Simultaneous handling of multiple perceptual tasks | Accuracy, F1, PA, IoU
5 | Encoder–Decoder | G-Net | Releases the detection burden | Accuracy, FP, FN, FPS
6 | CNN | … | Network can learn the spatial relationship for points of interest | MIoU
7 | Encoder–Decoder | Lightweight segmentation network | Reduce the number of parameters | Computation time per …
8 | CNN | … | Accuracy of location | Correct rate
9 | CNN | Multilane encoder–decoder | Accuracy of weak-class objects | Speed and accuracy
10 | CNN | Spatial-SCNN | Strong spatial relationship | Accuracy
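Table 3 lists TPR, FPR, and Fmax for SUPER. A minimal pixel-level sketch of these metrics follows, assuming per-pixel lane probabilities and binary lane labels; Fmax is taken here as the best F-measure over a small set of candidate thresholds. The scores, labels, and thresholds are hypothetical, and the evaluation protocol used for SUPER (e.g., lane-level matching) may differ.

```python
from typing import Sequence, Tuple


def rates_at_threshold(scores: Sequence[float], labels: Sequence[int], thr: float) -> Tuple[float, float, float]:
    """Return (TPR, FPR, F-measure) when pixels with score >= thr are predicted as lane."""
    tp = fp = fn = tn = 0
    for s, lane in zip(scores, labels):
        predicted = s >= thr
        if predicted and lane:
            tp += 1
        elif predicted:
            fp += 1
        elif lane:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, fpr, f


def f_max(scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]) -> float:
    """Best F-measure over a set of candidate thresholds."""
    return max(rates_at_threshold(scores, labels, t)[2] for t in thresholds)


scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]  # hypothetical per-pixel lane probabilities
labels = [1, 1, 1, 0, 0, 0]
for t in (0.25, 0.35, 0.5):
    print(t, rates_at_threshold(scores, labels, t))
print("Fmax =", f_max(scores, labels, (0.25, 0.35, 0.5)))
```

Raising the threshold lowers FPR but can also lower TPR; reporting Fmax summarizes this trade-off with the single best operating point.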
3. Conclusions
This study aims to establish the state of the art as a baseline against which researchers can compare their knowledge of various machine learning and deep learning techniques for semantic segmentation. In total, 34 research articles were selected for this investigation from different research databases. We conclude that convolutional neural networks and encoder–decoder architectures are the predominant backbones for implementing semantic segmentation. However, the detection accuracy of a network depends on the depth of the chosen neural network.
Author Contributions: Conceptualization, M.M., S.F. and Y.R.; literature review, M.M. and S.F.; writing, original draft, M.M. and S.F.; writing, review, Y.R. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Li, W.; Jia, F.; Hu, Q. Automatic Segmentation of Liver Tumor in CT Images with Deep Convolutional Neural Networks. J. Comput. Commun. 2015, 3, 146–151. [CrossRef]
2. Vivanti, R.; Ephrat, A.; Joskowicz, L.; Karaaslan, O.A.; Lev-Cohain, N.; Sosna, J. Automatic Liver Tumor Segmentation in Follow-up CT Studies Using Convolutional Neural Networks. Available online: https://www.researchgate.net/profile/Leo- … Method-and-Results.pdf (accessed on 25 January 2023).
3. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 2014, 34, 1993–2024. [CrossRef]
4. Cherukuri, V.; Ssenyonga, P.; Warf, B.C.; Kulkarni, A.V.; Monga, V.; Schiff, S.J. Learning Based Segmentation of CT Brain Images: Application to Postoperative Hydrocephalic Scans. IEEE Trans. Biomed. Eng. 2017, 65, 1871–1884. [CrossRef] [PubMed]
5. Cheng, J.; Liu, J.; Xu, Y.; Yin, F.; Wong, D.W.K.; Tan, N.-M.; Tao, D.; Cheng, C.-Y.; Aung, T.; Wong, T.Y. Superpixel Classification Based Optic Disc and Optic Cup Segmentation for Glaucoma Screening. IEEE Trans. Med. Imaging, 32, 1019–1032. [CrossRef]
6. Fu, H.; Cheng, J.; Xu, Y.; Wong, D.W.K.; Liu, J.; Cao, X. Joint Optic Disc and Cup Segmentation Based on Multi-Label Deep Network and Polar Transformation. IEEE Trans. Med. Imaging 2018, 37, 1597–1605. [CrossRef] [PubMed]
7. Song, T.-H.; Sanchez, V.; Eidaly, H.; Rajpoot, N.M. Dual-Channel Active Contour Model for Megakaryocytic Cell Segmentation in Bone Marrow Trephine Histology Images. IEEE Trans. Biomed. Eng. 2017, 64, 2913–2923. [CrossRef]
8. Wang, S.; Zhou, M.; Liu, Z.; Liu, Z.; Gu, D.; Zang, Y.; Dong, D.; Gevaert, O.; Tian, J. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation. Med. Image Anal. 2017, 40, 172–183. [CrossRef]
9. Onishi, Y.; Teramoto, A.; Tsujimoto, M.; Tsukamoto, T.; Saito, K.; Toyama, H.; Imaizumi, K.; Fujita, H. Multiplanar analysis for pulmonary nodule classification in CT images using deep convolutional neural network and generative adversarial networks. Int. J. Comput. Assist. Radiol. Surg. 2019, 15, 173–178. [CrossRef]
10. Chen, C.; Qin, C.; Qiu, H.; Tarroni, G.; Duan, J.; Bai, W.; Rueckert, D. Deep Learning for Cardiac Image Segmentation: A Review. Front. Cardiovasc. Med. 2020, 7, 25. [CrossRef]
11. Wu, F.; Zhuang, X. CF Distance: A New Domain Discrepancy Metric and Application to Explicit Domain Adaptation for Cross-Modality Cardiac Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 4274–4285. [CrossRef]
12. Chen, L.; Bentley, P.; Mori, K.; Misawa, K.; Fujiwara, M.; Rueckert, D. DRINet for Medical Image Segmentation. IEEE Trans. Med. Imaging 2018, 37, 2453–2462. [CrossRef] [PubMed]
13. Zhou, S.; Nie, D.; Adeli, E.; Yin, J.; Lian, J.; Shen, D. High-Resolution Encoder–Decoder Networks for Low-Contrast Medical Image Segmentation. IEEE Trans. Image Process. 2019, 29, 461–475. [CrossRef] [PubMed]
14. Mehrtash, A.; Wells, W.M.; Tempany, C.M.; Abolmaesumi, P.; Kapur, T. Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 3868–3878. [CrossRef] [PubMed]
15. Anthimopoulos, M.M.; Christodoulidis, S.; Ebner, L.; Geiser, T.; Christe, A.; Mougiakakou, S.G. Semantic Segmentation of Pathological Lung Tissue With Dilated Fully Convolutional Networks. IEEE J. Biomed. Health Inform., 23, 714–722. [CrossRef]
16. Ren, X.; Ahmad, S.; Zhang, L.; Xiang, L.; Nie, D.; Yang, F.; Wang, Q.; Shen, D. Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation. IEEE Trans. Image Process. 2020, 29, 7497–7510. [CrossRef]
17. Weng, W.; Zhu, X. INet: Convolutional Networks for Biomedical Image Segmentation. IEEE Access
18. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [CrossRef]
19. Srivastava, A.; Jha, D.; Chanda, S.; Pal, U.; Johansen, H.; Johansen, D.; Riegler, M.; Ali, S.; Halvorsen, P. MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation. IEEE J. Biomed. Health Inform., 26, 2252–2263. [CrossRef]
20. Wen, S.; Dong, M.; Yang, Y.; Zhou, P.; Huang, T.; Chen, Y. End-to-End Detection-Segmentation System for Face Labeling. IEEE Trans. Emerg. Top. Comput. Intell. 2019, 5, 457–467. [CrossRef]
21. Meenpal, T.; Balakrishnan, A.; Verma, A. Facial Mask Detection using Semantic Segmentation. In Proceedings of the 2019 4th International Conference on Computing, Communications and Security, ICCCS, Rome, Italy, 10–12 October 2019. [CrossRef]
22. Kalayeh, M.M.; Shah, M. Improving Facial Attribute Prediction using Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
23. Yousaf, N.; Hussein, S.; Sultani, W. Estimation of BMI from facial images using semantic segmentation based region-aware pooling. Comput. Biol. Med. 2021, 133, 104392. [CrossRef]
24. Kim, H.; Kim, H.; Rew, J.; Hwang, E. FLSNet: Robust Facial Landmark Semantic Segmentation. IEEE Access
25. Lu, P.; Cui, C.; Xu, S.; Peng, H.; Wang, F. SUPER: A Novel Lane Detection System. IEEE Trans. Intell. Veh.
26. Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as Deep: Spatial CNN for Traffic Scene Understanding. Proc. Conf. AAAI Artif. Intell. 2018, 32, 7276–7283. [CrossRef]
27. Li, X.; Zhao, Z.; Wang, Q. ABSSNet: Attention-Based Spatial Segmentation Network for Traffic Scene Understanding. IEEE Trans. Cybern. 2021, 52, 9352–9362. [CrossRef] [PubMed]
28. Azimi, S.M.; Fischer, P.; Korner, M.; Reinartz, P. Aerial LaneNet: Lane-Marking Semantic Segmentation in Aerial Imagery Using Wavelet-Enhanced Cost-Sensitive Symmetric Fully Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens.
29. Qiu, Z.; Zhao, J.; Sun, S. MFIALane: Multiscale Feature Information Aggregator Network for Lane Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24263–24275. [CrossRef]
30. Wang, H.; Liu, B. G-NET: Accurate Lane Detection Model for Autonomous Vehicle. IEEE Syst. J. 2022, early access. [CrossRef]
31. Shao, M.-E.; Haq, M.A.; Gao, D.-Q.; Chondro, P.; Ruan, S.-J. Semantic Segmentation for Free Space and Lane Based on Grid-Based Interest Point Detection. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8498–8512. [CrossRef]
32. Cao, Z.; Xu, X.; Hu, B.; Zhou, M. Rapid Detection of Blind Roads and Crosswalks by Using a Lightweight Semantic Segmentation Network. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6188–6197. [CrossRef]
33. Wang, Q.; Han, T.; Qin, Z.; Gao, J.; Li, X. Multitask Attention Network for Lane Detection and Fitting. IEEE Trans. Neural Networks Learn. Syst. 2020, 33, 1066–1078. [CrossRef]
34. Chougule, S.; Ismail, A.; Soni, A.; Kozonek, N.; Narayan, V.; Schulze, M. An efficient encoder-decoder CNN architecture for reliable multilane detection in real time. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1444–1451. [CrossRef]
The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.