Conference Paper

A Study on Single and Multi-layer Perceptron Neural Network


... Assessing model performance using metrics such as accuracy, F1 Score, Mean Square Error (MSE), and R2 Score, our approach demonstrated comparable or superior performance to traditional methods. This was consistent across various models, including the Random Forest Classifier [37], XGBoost Classifier [8], and a single-layer Neural Network [38] for classification, and their counterparts for regression tasks. ...
... The classification tasks were performed using the Random Forest Classifier [37], XGBoost Classifier [8], and a single-layer Neural Network [38] trained for 5 epochs each. The regression tasks used the Random Forest Regressor [37], XGBoost Regressor [34], and a single-layer Neural Network [38]. The Random Forest and XGBoost were implemented using scikit-learn [12] for both classification and regression tasks, and PyTorch was used for developing and training the Neural Network [31]. ...
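The excerpts above describe a fairly standard tooling split. As a rough, hedged sketch of that setup (synthetic data and illustrative hyperparameters; the cited work's actual datasets and settings are not reproduced here, and the xgboost and torch packages are assumed to be installed), the tree ensembles and a 5-epoch single-layer network could look like this:

```python
# Sketch only: Random Forest and XGBoost via scikit-learn-style APIs, plus a
# single-layer neural network in PyTorch trained for 5 epochs on synthetic data.
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for clf in (RandomForestClassifier(random_state=0), XGBClassifier(eval_metric="logloss")):
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(type(clf).__name__, accuracy_score(y_te, pred), f1_score(y_te, pred))

# Single-layer (no hidden layer) network: one linear map from features to logits.
net = nn.Linear(X.shape[1], 2)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
X_t, y_t = torch.tensor(X_tr, dtype=torch.float32), torch.tensor(y_tr)
for epoch in range(5):                      # 5 epochs, as in the excerpt
    opt.zero_grad()
    loss = loss_fn(net(X_t), y_t)
    loss.backward()
    opt.step()
pred = net(torch.tensor(X_te, dtype=torch.float32)).argmax(dim=1)
print("single-layer NN", accuracy_score(y_te, pred.numpy()))
```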
... The individual perceptron was invented by Frank Rosenblatt in 1957 [2] and was used to construct the single-layered perceptron, which aimed at performing linearly separable classification. The subsequent invention of the more complex multi-layered perceptron was another milestone in the research of artificial neural networks. ...
... The one used for binary classification problems is the bipolar step function (Figure 3: bipolar step function), which outputs +1 for a positive argument and -1 for a negative argument. Other common activation functions include the step function, Sigmoid function, Tanh function, and ReLU function [2]. Ultimately, the values of +1 or -1 are plotted on the graph, separated by a linear function that indicates which category each point belongs to. ...
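For readers who want the activations named above in concrete form, here is a small illustrative sketch using standard textbook definitions; the convention of mapping an input of exactly zero to +1 is an assumption, since the excerpt only specifies positive and negative arguments:

```python
# Illustrative activation functions: bipolar step, sigmoid, tanh, ReLU.
import numpy as np

def bipolar_step(x):
    return np.where(x >= 0, 1, -1)      # +1 for non-negative input, -1 otherwise (convention)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(bipolar_step(x))   # [-1 -1  1  1]
print(sigmoid(x))        # squashes values into (0, 1)
print(np.tanh(x))        # squashes values into (-1, 1)
print(np.maximum(0, x))  # ReLU keeps positive values, zeroes out the rest
```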
Article
Multi-layered perceptron (MLP) is the first artificial neural network with a complete structure, which is mainly used to perform the tasks of pattern classification and function regression. Its original idea was inspired by biological neural networks in animal brains. Based on the process of electrical signals traveling through biological neural networks, this similar structure was designed to receive, process, and transmit data just like the brain. The multi-layered perceptron uses a feedforward path to complete the prediction task and backpropagation to train itself and optimize its performance. Since then, the artificial neural networks pioneered by the multi-layered perceptron have become closely tied to everyday life, and many more advanced derivatives that are good at solving more complex problems have emerged. Although the development of multi-layered perceptrons belongs to artificial intelligence and machine learning, their applications can be helpful to researchers in diverse fields such as engineering, finance, and medicine. This paper will focus on the multi-layered perceptron, introduce its development history, network structure, and algorithm (mainly the learning algorithm), and briefly discuss its application in the specific field of biotechnology.
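As a minimal sketch of the feedforward and backpropagation mechanism the abstract describes (one hidden layer, sigmoid activations, mean-squared-error gradients; the data, layer sizes, and learning rate below are illustrative and not taken from the paper):

```python
# Tiny MLP trained by hand-written backpropagation on a toy AND-like target.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output weights and biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [0.0], [0.0], [1.0]])
lr = 0.5

for _ in range(5000):
    # feedforward path
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backpropagation: chain rule from the output error back to each weight
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))   # predictions approach the targets as training proceeds
```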
... An artificial neural network consists of three main processing layers: the input, hidden, and output layers. Numerous ANN algorithms are employed for classification tasks, but Multi-layer Perceptron (MLP) [5], [41], [61], [63], Convolutional Neural Network [5], [31], [41], [64], and Long Short-Term Memory Recurrent Neural Network [31], [41], [43], [65]- [67] are most popular for classification prediction in literature. ...
Article
Full-text available
This study presents RetenNet, a comprehensive framework for managing customer churn in telecommunications, integrating predictive modelling, prescriptive optimization, and explainable artificial intelligence (XAI) incorporated with Large Language Models (LLMs). The process commences with the IBM Telco dataset, divided in an 80:20 ratio into training and testing sets. Categorical variables are converted by one-hot and label encoding, whilst class imbalance is mitigated using SMOTEENN. Min-max scaling and mutual information-based feature selection guarantee data appropriateness for machine learning models. Five classification algorithms, i.e., Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), Logistic Regression (LR), and Multi-Layer Perceptron (MLP), are assessed. The SVM model utilizing an RBF kernel exhibits optimal performance. In conjunction with nested cross-validation, Bayesian optimization guarantees excellent hyperparameter optimization and generalization. Performance is evaluated using the F1-score to highlight the implications of false negatives and false positives in churn situations. The methodology additionally incorporates fuzzy rule-based clustering, facilitating flexibility in customer segment identification for intervention priority. Prescriptive optimization uses linear integer programming to distribute the retention budget according to model results and business constraints. A SHAP waterfall plot is employed to guarantee transparency and facilitate actionable insights. Furthermore, Gemini 1.5 Flash, a multimodal LLM, generates analysis and produces contextual recommendations derived from the SHAP waterfall plot. RetenNet offers a comprehensive and interpretable approach to the churn management pipeline, including classical machine learning, prescriptive optimization, and LLM-driven explainable artificial intelligence to enhance decision-making in customer retention efforts.
... A multi-layer perceptron neural network (MLP) is a type of artificial neural network composed of multiple layers of interconnected nodes, known as neurons, which operate based on assigned weights and biases (Singh & Banerjee 2019). These components collectively enable the network to learn complex patterns by processing input data and comparing the generated outputs with known target values. ...
Article
Full-text available
Urbanization and industrial growth have severely impacted water quality, creating urgent demand for intelligent prediction frameworks to support sustainable urban water management. Traditional monitoring methods are often limited by their inability to capture real-time or high-resolution data, necessitating data-driven alternatives. This study proposes a hybrid deep learning and machine learning framework that classifies Water Quality Index (WQI) categories using Convolutional Neural Networks (CNN), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP). CNN processes time-series groundwater data as 2D matrices, leveraging dilated convolutions and regularization techniques to extract multiscale temporal patterns. The model was trained and validated on post-monsoon groundwater datasets from Telangana, India, employing fourfold cross-validation. Among the algorithms tested, CNN achieved the best performance with RMSE: 0.0654 and R²: 0.9981, reducing prediction error by 18–48% over KNN, NB, and MLP. These results highlight CNN’s superior ability to learn from spatial–temporal dynamics while maintaining computational efficiency. Unlike previous studies focused solely on regression models or singular algorithms, this study combines multiple classifiers into a unified prediction pipeline, enhancing adaptability across heterogeneous datasets. Furthermore, the model’s strong performance relative to state-of-the-art hybrid architectures such as LSTM-GRU and attention-based networks demonstrates its suitability for deployment in regions lacking dense sensor infrastructure. The findings support real-time pollution classification, guiding policy decisions for targeted remediation. Future implementations could integrate real-time sensor data and employ federated learning for broader applicability. This framework offers a robust, scalable, and interpretable solution for sustainable water quality management amid rapid urban expansion and climate variability.
... Prediction algorithms. Prediction models are constructed using tree-based gradient boosting algorithms [49] and neural networks with three hidden layers [50]. Different groups of personal human traits are assessed for the capacity to causally explain shifts between different ballot aggregation methods. ...
Preprint
Full-text available
Voting methods are an instrumental design element of democracies. Citizens use them to express and aggregate their preferences to reach a collective decision. However, voting outcomes can be as sensitive to voting rules as they are to people's voting choices. Despite the significance and inter-disciplinary scientific progress on voting methods, several democracies keep relying on outdated voting methods that do not fit modern, pluralistic societies well, while lacking social innovation. Here, we demonstrate how one can upgrade real-world democracies, namely by using alternative preferential voting methods such as cumulative voting and the method of equal shares designed for a proportional representation of voters' preferences. By rigorously assessing a new participatory budgeting approach applied in the city of Aarau, Switzerland, we unravel the striking voting outcomes of fair voting methods: more winning projects with the same budget and broader geographic and preference representation of citizens by the elected projects, in particular for voters who used to be under-represented, while promoting novel project ideas. We provide profound causal evidence showing that citizens prefer proportional voting methods, which possess strong legitimacy without the need for very technical specialized explanations. We also reveal strong underlying democratic values exhibited by citizens who support fair voting methods such as altruism and compromise. These findings come with a global momentum to unleash a new and long-awaited participation blueprint of how to upgrade democracies.
... We used the scikit-learn implementation of LR as a binary classifier with all default values for the hyperparameters. The single-layer perceptron was developed in the 1950s by Frank Rosenblatt and is the most basic form of neural network [25]. The input features are weighted in a linear combination which can either be sent through a sigmoidal activation function for binary classification or through a linear activation function for regression. ...
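A hedged sketch of the two single-layer variants described in this excerpt: the same weighted linear combination of inputs feeds either a sigmoid head for binary classification or a linear head for regression. The weights and bias below are arbitrary illustrative values, not fitted parameters from the cited work:

```python
# One "neuron": weighted sum of features plus bias, with two possible output heads.
import numpy as np

w = np.array([0.4, -0.7, 1.2])   # one weight per input feature (illustrative)
b = 0.1

def linear_combination(x):
    return x @ w + b             # weighted sum of the inputs plus bias

def classify(x):                 # sigmoid head -> probability of the positive class
    return 1.0 / (1.0 + np.exp(-linear_combination(x)))

def regress(x):                  # linear (identity) head -> real-valued output
    return linear_combination(x)

x = np.array([1.0, 2.0, 0.5])
print(classify(x), regress(x))
```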
Preprint
Full-text available
Spaceflight presents unique environmental stressors, such as microgravity and radiation, that significantly affect biological systems at the molecular, cellular, and organismal levels. Female astronauts, in particular, face an increased risk of developing breast cancer due to exposure to ionizing radiation and other spaceflight-related factors. Age also plays a crucial role in the mammary gland’s response to these stressors, with younger organisms generally exhibiting more efficient response mechanisms than older ones. In this study, we utilized an ensemble of machine learning algorithms to analyze gene expression profiles from mammary tissue of young and old female mice exposed to spaceflight to predict age (old vs young) and condition (spaceflight vs ground control). Using the genes our ensemble identified as most predictive, we investigate the molecular pathways involved in spaceflight-related health risks, particularly in the context of breast cancer and cardiovascular health. We identified age-dependent differences in the gene expression profiles of spaceflight-exposed mice compared to ground control. Specifically, younger mice exhibited enriched pathways related to cellular structure, while older mice showed activation of pathways involved in cortisol synthesis and muscle contraction. All mice responded to spaceflight with evidence of elevated lipid metabolic function. These findings highlight the critical role of age in modulating the response to spaceflight-induced stress and suggest that these molecular pathways may contribute to differential outcomes in tissue homeostasis, cardiovascular and metabolic disorders, and breast cancer susceptibility.
... The model's output is then compared with the actual value to determine the error. This calculated error is propagated backward through the network to adjust related parameters, a method known as error backpropagation (Singh & Banerjee, 2019). ...
Article
Full-text available
This research introduces an innovative approach to flood vulnerability reduction by integrating the Particle Swarm Optimization (PSO) algorithm with the Random Forest (RF) model to optimize hyperparameters through a parallel and simultaneous search. This methodology aims to accurately identify flood-prone areas. The study also evaluates the performance of the proposed model against other machine learning models, such as the Alternative Decision Tree (ADTree) and Multilayer Perceptron (MLP). Two datasets were utilized for model analysis: ground-based data, including rainfall, proximity to rivers, and roads, and remote sensing data, including elevation, slope, and land use. The Ottawa-Gatineau region in Canada was chosen for modeling. When both ground and remote sensing data were combined, the RF-PSO model achieved a Kappa coefficient of 0.74, outperforming the ADTree (0.70) and MLP (0.69) models. The study further explored the use of remote sensing data alone, with the RF-PSO model yielding a Kappa coefficient of 0.68, suggesting that even without ground-based data, remote sensing alone can produce reliable results. Notably, when high-resolution remote sensing data was applied, the Kappa coefficient increased to 0.80, demonstrating that improved spatial resolution reduces the dependence on ground-based data, thus enhancing model accuracy. This research highlights the potential of using high-resolution satellite data for flood risk assessment, offering significant insights into crisis management and flood vulnerability reduction.
... The ensemble of (SVM + MLP), (KNN + MLP), and (RF + MLP) learners was applied to the battery charging detection data, fall detection data, and motion state recognition data, respectively, where SVM is the Support Vector Machine [27] model, MLP is the Multi-Layer Perceptron [28] model, KNN is the K-Nearest Neighbors [29] model, and RF is the Random Forest [30] model. The hyperparameters of each model are shown in Table 4. ...
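For illustration only, the learner pairs named in this excerpt could be combined in scikit-learn roughly as follows. The cited work uses an adaptive weighting scheme (func-bagging), so the plain soft-voting ensemble here is only a stand-in, with synthetic data and default hyperparameters rather than those listed in Table 4:

```python
# Heterogeneous learner pairs (SVM+MLP, KNN+MLP, RF+MLP) combined by soft voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
pairs = {
    "svm+mlp": [("svm", SVC(probability=True)), ("mlp", MLPClassifier(max_iter=500))],
    "knn+mlp": [("knn", KNeighborsClassifier()), ("mlp", MLPClassifier(max_iter=500))],
    "rf+mlp":  [("rf", RandomForestClassifier()), ("mlp", MLPClassifier(max_iter=500))],
}
for name, estimators in pairs.items():
    ens = VotingClassifier(estimators=estimators, voting="soft")  # average predicted probabilities
    print(name, ens.fit(X, y).score(X, y))
```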
Article
Full-text available
Featured Application: This work presents an adaptive weight distribution strategy for bagging-based heterogeneous ensemble learning, with a particular focus on improving classification performance in imbalanced datasets. The proposed method can be applied to tasks such as anomaly detection, where class imbalance is common, and it offers a robust solution for enhancing model accuracy and stability in real-world applications, particularly in domains like fall detection, fault diagnosis, and action identification.
Abstract: In the field of ensemble learning, bagging and stacking are two widely used ensemble strategies. Bagging enhances model robustness through repeated sampling and weighted averaging of homogeneous classifiers, while stacking improves classification performance by integrating multiple models using meta-learning strategies, taking advantage of the diversity of heterogeneous classifiers. However, the fixed weight distribution strategy in traditional bagging methods often has limitations when handling complex or imbalanced datasets. This paper combines the concept of heterogeneous classifier integration in stacking with the weighted averaging strategy of bagging, proposing a new adaptive weight distribution approach to enhance bagging’s performance in heterogeneous ensemble settings. Specifically, we propose three weight generation functions with “high at both ends, low in the middle” curve shapes and demonstrate the superiority of this strategy over fixed weight methods on two datasets. Additionally, we design a specialized neural network, and by training it adequately, validate the rationality of the proposed adaptive weight distribution strategy, further improving the model’s robustness. The above methods are collectively called func-bagging. Experimental results show that func-bagging has an average 1.810% improvement in extreme performance compared to the base classifier, and is superior to stacking and bagging methods. It also has better dataset adaptability and interpretability than stacking and bagging. Therefore, func-bagging is particularly effective in scenarios with class imbalance and is applicable to classification tasks with imbalanced classes, such as anomaly detection.
... In the literature, many AD approaches based on machine learning exist, including principal component analysis [15], clustering [7,13], Support Vector machines [11], logistic regression [16], and decision trees [22]. Deep learning methods are also utilized, such as long short-term memory (LSTM) [12], auto-encoders [19], and Multi-Layer Perceptron (MLP) models [26]. Additionally, hybrid and ensemble models combining multiple techniques have been investigated. ...
... All processing units from each layer are connected to all processing units of the next layer. The processing units of the input layer are homogeneous and linear, while neurons with non-linear and continuously differentiable functions are used in the hidden layer [4]. ...
Chapter
Full-text available
Neural networks excel in handling complex, non-linear relationships, making them suitable for predictions where traditional linear models fall short. Applications include forecasting exchange rates, stock prices, and bankruptcy risks, demonstrating superior accuracy compared to conventional methods. In the realm of economics, neural networks facilitate the integration of theories like the Kuznets curve with advanced modeling techniques, allowing for nuanced analyses of economic development and environmental impacts. They also play a critical role in identifying financial risks, enabling policymakers to respond effectively during economic crises. Furthermore, neural networks are instrumental in business and marketing, providing insights into consumer behavior and market demand. They enhance decision support systems, guiding strategic investments and financial decisions. By classifying data sets, they assist in predicting bankruptcy among various economic entities. Overall, the diverse applications of neural networks across disciplines underscore their significance in contemporary research and practical implementations, paving the way for future advancements in artificial intelligence and data analytics. Thus neural networks, a type of machine learning, are increasingly used in economics and management due to their ability to analyze complex data and make predictions.
... Tuning hyperparameters, such as the number of hidden layers, neurons per layer and learning rate, is crucial for optimal performance. MLPs are versatile and widely used in image recognition, natural language processing and medical diagnosis [26]. ...
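As a small, hedged example of tuning the hyperparameters listed above (number of hidden layers, neurons per layer, learning rate) with a plain grid search; the grid values and data are placeholders rather than settings from the cited work:

```python
# Grid search over MLP depth/width and initial learning rate on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=15, random_state=0)
param_grid = {
    "hidden_layer_sizes": [(32,), (64,), (32, 16)],   # one or two hidden layers, varying width
    "learning_rate_init": [1e-3, 1e-2],
}
search = GridSearchCV(MLPClassifier(max_iter=800, random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```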
... BioBERT, pre-trained on a vast corpus of biomedical literature, further enhances this capability, making it exceptionally well-suited for biomedical text mining and QA tasks [6]. Despite these advancements, integrating these models effectively to handle the vast and nuanced biomedical data remains a significant challenge [12]. ...
Preprint
We present a refined approach to biomedical question-answering (QA) services by integrating large language models (LLMs) with Multi-BERT configurations. By enhancing the ability to process and prioritize vast amounts of complex biomedical data, this system aims to support healthcare professionals in delivering better patient outcomes and informed decision-making. Through innovative use of BERT and BioBERT models, combined with a multi-layer perceptron (MLP) layer, we enable more specialized and efficient responses to the growing demands of the healthcare sector. Our approach not only addresses the challenge of overfitting by freezing one BERT model while training another but also improves the overall adaptability of QA services. The use of extensive datasets, such as BioASQ and BioMRC, demonstrates the system's ability to synthesize critical information. This work highlights how advanced language models can make a tangible difference in healthcare, providing reliable and responsive tools for professionals to manage complex information, ultimately serving the broader goal of improved care and data-driven insights.
... IntuWition is inspired by radar polarimetry [84] and employs a vertically-polarized transmitter antenna and three mutually-perpendicular polarized receiving antennas to measure the Wi-Fi signal, as shown in Fig. 6. The measured power is then used to infer the material composition via a multi-layer perceptron model [85]. Additionally, the system can accurately locate surrounding objects by measuring the time of flight along different paths. ...
Article
Full-text available
As an application of fine-grained wireless sensing, RF-based material identification follows the paradigm of RF computing that fetches the information during RF signal propagation. Specifically, the RF signal accesses the objects’ material-related information and carries the information with its electromagnetic properties. With a variety of important applications, research on RF-based material identification has gained significant progress in recent years. However, several fundamental problems remain insufficiently studied, such as the sensing models, signal processing approaches, performance and future extensions. This paper presents the first comprehensive survey of RF-based material identification. According to the basic sensing model used for sensing, we propose a taxonomy to classify the existing works into two categories: reflection-based and penetration-based. The works in each category are further grouped by the type of RF signals used, with elaborated discussion of the detailed approaches and the common challenges. We provide a framework that benchmarks the performance of the existing works, followed by a thorough discussion of future extensions.
... Forward propagation of input data through the network is required for classification, and the output layer assigns probabilities to the various classes. The final classification decision is the class with the highest probability [38,39]. ...
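A concrete toy version of that decision rule, with arbitrary example logits standing in for real output-layer scores:

```python
# Softmax over output-layer scores, then pick the class with the highest probability.
import numpy as np

logits = np.array([1.2, 0.3, 2.5, -0.7])        # raw output-layer scores (illustrative)
probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> class probabilities
predicted_class = int(np.argmax(probs))         # decision: most probable class
print(probs.round(3), predicted_class)
```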
Article
Full-text available
Introduction: Users access websites for many purposes, such as obtaining information about a particular topic, buying items, accessing their accounts, etc. Cybercriminals use phishing websites to obtain sensitive information from users, like usernames and passwords, credit card details, etc. Detecting phishing websites helps in protecting the information and the money of people. Machine learning algorithms can be applied to detect phishing websites.
Methods: In this paper, a model based on various machine learning algorithms is developed to detect phishing websites. The machine learning algorithms used in this model are Decision Tree, Random Forest, Extra Trees, K-Nearest Neighbors, Multilayer Perceptron and Support Vector Machine. The dataset of phishing websites is taken from the Kaggle website. The algorithms of the developed model are compared to identify which algorithm has better classification results.
Results: The Extra Trees algorithm offers the best results for accuracy, precision, and F1-Score. This paper also compares the developed model with a previous model that uses the same dataset and relies upon Decision Tree, Random Forest, and Support Vector Machine to determine which model has better classification report results. The developed model, depending on the Decision Tree and SVM, offers better classification results than those of the previous models. The developed model is compared with another preceding model relying upon Decision Tree and Random Forest algorithms to determine which model generates better results for accuracy, precision, recall/sensitivity, and F1-Score.
Conclusion: The developed model, depending on the Decision Tree, presents better results for accuracy, recall, and F1-Score than the accuracy, sensitivity, and F1-Score of the preceding model based on the Decision Tree.
... The hyperplane is obtained from the weights and biases in the Neural Network model. In this research, a Multi-Layer Perceptron is used, where the Neural Network model has a hidden layer [33], [34]. The number of input neurons in the Neural Network equals the number of data attributes used, and the number of output neurons likewise adjusts to the number of classes contained in the data. ...
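A brief sketch of that sizing rule, assuming a PyTorch-style definition with placeholder data; the hidden width of 16 is an arbitrary choice, not a value from the cited work:

```python
# Input width = number of attributes, output width = number of classes, one hidden layer.
import numpy as np
import torch.nn as nn

X = np.zeros((100, 12))                  # 100 samples, 12 attributes (placeholder data)
y = np.array([0, 1, 2] * 33 + [0])       # labels drawn from 3 classes
n_inputs, n_classes, n_hidden = X.shape[1], len(np.unique(y)), 16

mlp = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),       # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(n_hidden, n_classes),      # hidden layer -> one output neuron per class
)
print(mlp)
```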
Article
Full-text available
Stroke is a disease which causes the death of brain cells, so that the part of the body controlled by the brain loses its function. If not treated immediately, this disease can cause long-term disability, brain damage, and death. In this research, stroke prediction was carried out on the Stroke dataset acquired from Kaggle using various machine learning models. Then, data sampling techniques are used to handle data imbalance problems in the stroke dataset, which include the Random Undersampling, Random Oversampling, and SMOTE techniques. Pearson Correlation and Principal Component Analysis are also used for dimensionality reduction and for analyzing the important features that are most influential in predicting stroke. Pearson Correlation produces five attributes that have the highest Pearson coefficient, namely age, hypertension, heart disease, blood sugar level, and marital status. Experimental results have demonstrated that the utilization of the RUS, ROS, and SMOTE sampling techniques can significantly boost the testing F1-Score by an impressive 43.44%, 34.44%, and 35.55% respectively, as compared to experiments conducted without implementing any data sampling techniques. The highest testing F1-Score, namely 0.83, was achieved using the Support Vector Machine and Gaussian Naïve Bayes models.
... The signs "-" and "+" refer to different patterns from a classification perspective, as the linear regression line distinguishes between these two patterns. This limitation necessitates the use of the more advanced multilayer perceptron for a broader range of classification and nonlinear problems/patterns (Singh & Banerjee, 2019). ...
Thesis
Full-text available
Road Traffic Collisions (RTCs) are a leading cause of death and injury globally and are a considerable economic cost. Priority give-way junctions are the most common form of traffic junction in the UK and are associated with a high proportion of all reported collisions. Initially, the form or “design” of priority junctions evolved in the UK, based on the movement needs of particular vehicles. In the early to mid-twentieth century, their designs were formalised into standards. These were then updated to cater for increased car movements, by enlarging the dimensions and visibilities at these junctions; thus, increasing their capacity. It was believed that greater visibilities at the junctions would also improve safety. In the latter part of the 20th century, it was suggested that an unintended consequence of greater visibilities at these junctions was that drivers’ risk perceptions reduce, they drove through the junctions faster, and hence the high visibilities were leading to a greater number of collisions and increased severity. Design guidance was updated to reduce visibilities and dimensions of priority junctions in Manual for Streets (MfS and MfS 2). This study contributes to understanding of the precise nature of the associations between junction visibilities, geometries, vehicle speeds and the frequency of collisions for different road users, making different turning movements. A sample of 120 junctions in Portsmouth UK was used to develop a novel-to-the-field piecewise Structural Equation Model and an Artificial Neural Networks model. Results support the reductions in visibilities proposed in Manual for Streets and suggest particular combinations of geometries and visibilities as being important determinants of specific vehicle-movement accident combinations. The findings may be of interest to traffic engineers, and highway authorities, and could inform further updates to design guidance.
... The model multiplies each feature by its weight, then the weighted features are summed to obtain the scalar product, and the latter is added to the bias that feeds the activation function as the output of the neuron. MLP can approximate smooth and measurable functions by selecting the appropriate connection weights and transfer functions (Singh & Banerjee, 2019). ...
Article
Full-text available
Accurate and reliable forecasting of cancellations is important for successful revenue management in the tourism industry. The objective of this study is to develop classification models to predict hotel booking cancellations. The work involves a number of key steps, such as data preprocessing to properly prepare the data; feature engineering to identify relevant attributes to help improve the predictive ability of the models; hyperparameter settings of the models, including choice of optimizers and incorporation of dropout layers to avoid overfitting in the neural networks; potential overfitting is evaluated using K‐fold cross‐validation; and performance is analysed using the confusion matrix and various performance metrics. The algorithms used are Multilayer Perceptron Neural Network, Radial Basis Function Neural Network, Deep Neural Network, Decision Tree Classifier, Random Forest Classifier, Ada Boost Classifier and XgBoost Classifier. Finally, the results of all models are compared, visualizing Deep Neural Network and XgBoost as the most suitable models for predicting hotel reservation cancellations.
... Figure 3 explains the LSTM cells further. LSTM units are used as the building blocks of a recurrent neural network; LSTM cells can read, write, and delete their memory [15]. In neural networks, especially within long short-term memory (LSTM) cells, the gating system plays a crucial role. ...
... Excessive use of neurons can lead to a significant increase in training time and a potential risk of overfitting the network to the data. While an optimal solution to this problem remains elusive, numerous methodologies have been devised to facilitate the determination of the most suitable number of hidden layers [26,27]. ...
Article
Full-text available
In the context of Industry 4.0, intelligent manufacturing and maintenance play a significant role. The fault detection and diagnosis (FDD) of an industrial gas turbine (IGT) engine is crucial in smart manufacturing. With the advancement of machine learning and sensor technology, artificial neural networks (ANN) and multi-sensor data fusion have made it possible to solve the above issues. In this work, a hybrid model is proposed for the FDD of an IGT engine. Principal component analysis (PCA) is first employed to combine the multi-sensor monitoring data as a pre-processing step. The PCA approach has the capacity to glean insights from raw data and optimize the amalgamation of various condition monitoring datasets, with the aim of enhancing accuracy and maximizing the utility of gas turbine information. Later, an ANN-based FDD method is applied to the fused multi-sensor monitoring data. The present work also presents a comparative account of supervised and unsupervised ANN learning techniques, like the multilayer perceptron and the self-organizing map, and their pattern classification evaluations. The proposed model facilitates early FDD with minimum error and has been validated and tested using real-time data from actual operating environments. The data is collected from a twin-shaft (18.7 MW) IGT engine as a case study. Results demonstrate that the proposed hybrid model is able to detect the condition of the industrial gas turbine engine with the best diagnosis accuracy and calculated errors of 0.00173 and 1.9498. Comparison of the two learning techniques demonstrates the superior performance of the supervised learning technique.
... Multilayer perceptrons (MLPs) are a class of feedforward artificial neural networks with multiple layers, including input, hidden, and output layers [28]. The MLP is called a feedforward network because it transfers data from input to output in a single direction without requiring loops or feedback connections between neurons. ...
Article
Full-text available
In recent years, Mobile Edge Computing (MEC) has revolutionized the landscape of the telecommunication industry by offering low-latency, high-bandwidth, and real-time processing. With this advancement comes a broad range of security challenges, the most prominent of which is Distributed Denial of Service (DDoS) attacks, which threaten the availability and performance of MEC’s services. In most cases, Intrusion Detection Systems (IDSs), a security tool that monitors networks and systems for suspicious activity and notify administrators in real time of potential cyber threats, have relied on shallow Machine Learning (ML) models that are limited in their abilities to identify and mitigate DDoS attacks. This article highlights the drawbacks of current IDS solutions, primarily their reliance on shallow ML techniques, and proposes a novel hybrid Autoencoder–Multi-Layer Perceptron (AE–MLP) model for intrusion detection as a solution against DDoS attacks in the MEC environment. The proposed hybrid AE–MLP model leverages autoencoders’ feature extraction capabilities to capture intricate patterns and anomalies within network traffic data. This extracted knowledge is then fed into a Multi-Layer Perceptron (MLP) network, enabling deep learning techniques to further analyze and classify potential threats. By integrating both AE and MLP, the hybrid model achieves higher accuracy and robustness in identifying DDoS attacks while minimizing false positives. As a result of extensive experiments using the recently released NF-UQ-NIDS-V2 dataset, which contains a wide range of DDoS attacks, our results demonstrate that the proposed hybrid AE–MLP model achieves a high accuracy of 99.98%. Based on the results, the hybrid approach performs better than several similar techniques.
... The best model is the one with the lowest value of the Akaike criterion. A multilayer perceptron (MLP) neural network [64] is a feedforward ANN formed by fully connected neurons organized in a minimum of three layers. In this article, we used four layers, two of which are hidden. ...
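As a side note on the model-selection rule mentioned in this excerpt, a common least-squares form of the Akaike criterion can be computed as below; the residuals and parameter counts are made-up illustrative numbers, not results from the cited study:

```python
# Least-squares Akaike Information Criterion: n*ln(RSS/n) + 2k; lower is better.
import numpy as np

def aic_least_squares(residuals, n_params):
    residuals = np.asarray(residuals)
    n = len(residuals)
    rss = np.sum(residuals ** 2)          # residual sum of squares
    return n * np.log(rss / n) + 2 * n_params

small_model = aic_least_squares(residuals=[0.9, -1.1, 0.8, -0.7, 1.0], n_params=3)
large_model = aic_least_squares(residuals=[0.5, -0.6, 0.4, -0.3, 0.5], n_params=8)
print(small_model, large_model)           # prefer whichever model has the lower AIC
```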
Article
Full-text available
Modeling and forecasting the river flow is essential for the management of water resources. In this study, we conduct a comprehensive comparative analysis of different models built for the monthly water discharge of the Buzău River (Romania), measured in the upper part of the river’s basin from January 1955 to December 2010. They employ convolutional neural networks (CNNs) coupled with long short-term memory (LSTM) networks, named CNN-LSTM, sparrow search algorithm with backpropagation neural networks (SSA-BP), and particle swarm optimization with extreme learning machines (PSO-ELM). These models are evaluated based on various criteria, including computational efficiency, predictive accuracy, and adaptability to different training sets. The models obtained applying CNN-LSTM stand out as top performers, demonstrating a superior computational efficiency and a high predictive accuracy, especially when built with the training set containing the data series from January 1984 (putting the Siriu Dam in operation) to September 2006 (Model type S2). This research provides valuable guidance for selecting and assessing river flow prediction models, offering practical insights for the scientific community and real-world applications. The findings suggest that Model type S2 is the preferred choice for the discharge forecast predictions due to its high computational speed and accuracy. Model type S (considering the training set recorded from January 1955 to September 2006) is recommended as a secondary option. Model type S1 (with the training period January 1955–December 1983) is suitable when the other models are unavailable. This study advances the field of water discharge prediction by presenting a precise comparative analysis of these models and their respective strengths
... On the other hand, with respect to renewable energy systems, as indicated in Table 1, given the inherent instability of such resources, the primary emphasis has been on accurately forecasting the production capacity. Throughout these research endeavors, the predominant methods employed have been LSTM, followed by the combination of LSTM with various other techniques, and ultimately, the utilization of the multi-layer perceptron (MLP) [3] method. ...
Article
Full-text available
This study focuses on using machine learning techniques to accurately predict the generated power in a two-stage back-pressure steam turbine used in the paper production industry. In order to accurately predict power production by a steam turbine, it is crucial to consider the time dependence of the input data. For this purpose, the long-short-term memory (LSTM) approach is employed. Correlation analysis is performed to select parameters with a correlation coefficient greater than 0.8. Initially, nine inputs are considered, and the study showcases the superior performance of the LSTM method, with an accuracy rate of 0.47. Further refinement is conducted by reducing the inputs to four based on correlation analysis, resulting in an improved accuracy rate of 0.39. The comparison between the LSTM method and the Willans line model evaluates the efficacy of the former in predicting production power. The root mean square error (RMSE) evaluation parameter is used to assess the accuracy of the prediction algorithm used for the generator’s production power. By highlighting the importance of selecting appropriate machine learning techniques, high-quality input data, and utilising correlation analysis for input refinement, this work demonstrates a valuable approach to accurately estimating and predicting power production in the energy industry.
... • KNN is a lazy learning method widely used in data mining, especially when datasets have little or no prior knowledge of the data distribution [43]. • Multilayer Perceptron (MLP) is a method that repeatedly adjusts the weights and thresholds to minimize the difference between the target output and the resulting output [44]; it is also a type of artificial neural network that can be used for deep learning tasks, can learn complex hierarchical representations of data, and can handle various types of data
Article
Full-text available
In recent years, many studies on medical texts have attracted the attention of researchers. Medical text studies have few multi-label data targets because it is challenging to understand dependencies between labels. Therefore, this study investigates a collection of medical texts by addressing complex problems in the behavioural pattern of Doctor’s answer text in Online Health Consultation (OHC) by suggesting a pattern of six medical interview functions ranging from fostering doctor-patient relationships to treatment-related behaviours and responding to emotions. There are many proposed MLC methods to solve a multi-label problem. However, this study proposes an MLC model that can improve MLC accuracy, especially in multilingual medical datasets: English and Indonesian. This study proposes 16 MLC models using two feature extraction methods, compares all proposed models, and evaluates model performance using three perspectives. The results show that from 3 perspectives, the MLC model that consistently outperforms other models in the English dataset is a T-BR-RF model (TF/IDF, Binary Relevance, and Random Forest). In contrast, using the Indonesian dataset, the T-BR-AD Model (TF/IDF, Binary Relevance and Adaboost) outperforms other MLC models. The feature extraction method that helps optimize the performance of MLC models is TF-IDF compared to the Word2Vec method. read more : https://www.ijeei.org/docs-1697368590653d365d25542.pdf
... In the context of ANN, the term learning refers to the method of varying the weights of the connecting links between the neurons of the network to match to the desired output. Further background on the MLP can be found in [24]. ...
Article
Full-text available
Conventional methods of circuit simulation such as full-wave electromagnetic field solvers can be very slow. Machine learning is an emerging technology in modelling, simulation, optimization, and design that presents attractive alternatives to the conventional methodologies because models can be trained with a small amount of data and then used to perform fast circuit predictions within the same design space. In this paper, we present applications of machine learning techniques for the modelling of transmission lines from their impulse responses. The standard multilayer perceptron (MLP) neural network and the Gaussian process (GP) regression techniques are demonstrated, and both models are successfully implemented to model the impulse responses of transmission lines with great accuracy. We show that the GP outperforms the MLP in terms of prediction accuracy and that the GP is more data efficient than the MLP. This is beneficial considering that each training sample is expensive, making the GP a good candidate for the task, compared to the more popular MLP.
Article
The continuous exposure of excavator bucket teeth to abrasive materials during excavation causes wear, leading to frequent replacements and unplanned downtime. Therefore, developing reliable models for accurately predicting bucket teeth wear is crucial for effective maintenance strategies. In this study, novel hybrid artificial intelligence models are introduced to achieve accurate predictions of excavator bucket teeth wear. These models were built on the Multilayer Perceptron Neural Network (MLPNN) and optimised using five metaheuristic optimisation techniques: Whale Optimisation Algorithm (WOA), Particle Swarm Optimisation (PSO), Ant Lion Optimisation (ALO), Grey Wolf Optimisation (GWO), and Genetic Algorithm (GA). These optimisation techniques improved the effectiveness of the MLPNN model by adjusting its weights and biases. The accuracy of the optimised models, along with the standalone MLPNN models, was assessed using six statistical indicators: coefficient of determination (R2), relative root mean square error (RRMSE), mean absolute error (MAE), correlation coefficient (R), root mean square error (RMSE) and mean absolute percentage error (MAPE). Additionally, the Bayesian Information Criterion was employed to choose the top-performing predictive method. The statistical results confirmed the superior performance of the ALO-MLPNN hybrid model, which achieved the lowest error values for RMSE (0.22007), RRMSE (0.00597), MAE (0.10014), and MAPE (0.26704), alongside high values for R² (0.99962) and R (0.99981). Additionally, ALO-MLPNN recorded the lowest Bayesian Information Criterion (BIC) value of -245.53662, reinforcing its effectiveness in predicting on-site wear of excavator bucket teeth. These findings emphasise the model’s strong potential to enhance the accuracy of AI-based wear prediction systems.
Article
Full-text available
The milling performance of thermally modified wood is an essential step in its actual processing and production. Accurate prediction of the milling performance of thermally modified wood is significantly meaningful for subsequent parameter optimization to improve product surface quality and increase product competitiveness. Hence, based on machine learning, four models, Random Forest (RF), Support Vector Machine (SVM), Gaussian Process Regression (GPR) and Multilayer Perceptron (MLP), were established to predict two milling parameters of thermally modified wood. In addition, four characteristic factors were set up to evaluate the cutting force (F) and the surface roughness (Ra): the modification temperature (T) of the thermally modified wood, the depth of cut (h), the feed rate (u), and the spindle speed (n) of the tool during the milling process. In order to reflect the scientific nature of the research process, normal distribution analysis was additionally used as a dataset preprocessing step. The final comparison found the GPR model to be the best fitting and most accurate method for predicting milling performance.
Article
Full-text available
Objectives Accurate identification of molecular subtypes in breast cancer is critical for personalized treatment. This study introduces a novel neural network model, RAE-Net, based on Multimodal Feature Fusion (MFF) and the Evidential Deep Learning Algorithm (EDLA) to improve breast cancer subtype prediction using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Methods A dataset of 344 patients with histologically confirmed breast cancer was divided into training (n = 200), validation (n = 60), and testing (n = 62) cohorts. RAE-Net, built on ResNet-50 with Multi-Head Attention (MHA) fusion and Multi-Layer Perceptron (MLP) mechanisms, combines radiomic and deep learning features for subtype prediction. The EDLA module adds uncertainty estimation to enhance classification reliability. Results The RAE-Net model incorporating the MFF module demonstrated superior performance, achieving a mean accuracy of 0.83 and a Macro-F1 score of 0.78, surpassing traditional radiomics models (accuracy: 0.79, Macro-F1: 0.75) and standalone deep learning models (accuracy: 0.80, Macro-F1: 0.76). When an EDLA uncertainty threshold of 0.2 was applied, the performance significantly improved, with accuracy reaching 0.97 and Macro-F1 increasing to 0.92. Additionally, RAE-Net outperformed two recent deep learning networks, ResGANet and HIFUSE. Specifically, RAE-Net showed a 0.5% improvement in accuracy and a higher AUC compared to ResGANet. In comparison to HIFUSE, RAE-Net reduced both the number of parameters and computational cost by 90% while only increasing computation time by 5.7%. Conclusions RAE-Net integrates feature fusion and uncertainty estimation to predict breast cancer subtypes from DCE-MRI. The model achieves high accuracy while maintaining computational efficiency, demonstrating its potential for clinical use as a reliable and resource-efficient diagnostic tool.
Article
Full-text available
Due to the aging of the global population and lifestyle changes, cardiovascular disease has become the leading cause of death worldwide, causing serious public health problems and economic pressures. Early and accurate prediction of cardiovascular disease is crucial to reducing morbidity and mortality, but traditional prediction methods often lack robustness. This study focuses on integrating swarm intelligence feature selection algorithms (including whale optimization algorithm, cuckoo search algorithm, flower pollination algorithm, Harris hawk optimization algorithm, particle swarm optimization algorithm, and genetic algorithm) with machine learning technology to improve the early diagnosis of cardiovascular disease. This study systematically evaluated the performance of each feature selection algorithm under different population sizes, specifically by comparing their average running time and objective function values to identify the optimal feature subset. Subsequently, the selected feature subsets were integrated into ten classification models, and a comprehensive weighted evaluation was performed based on the accuracy, precision, recall, F1 score, and AUC value of the model to determine the optimal model configuration. The results showed that random forest, extreme gradient boosting, adaptive boosting and k-nearest neighbor models performed best on the combined dataset (weighted score of 1), where the feature set consisted of 9 key features selected by the cuckoo search algorithm when the population size was 25; while on the Framingham dataset, the k-nearest neighbor model performed best (weighted score of 0.92), and its feature set was derived from 10 features selected by the whale optimization algorithm when the population size was 50. The results of this study show that swarm intelligence algorithms can effectively screen key and informative feature sets, significantly improve model classification accuracy, and provide strong support for the early diagnosis of cardiovascular diseases.
Article
In the hospitality business, cancellations negatively affect the precise estimation of revenue management. With today’s powerful computational advances, it is feasible to develop a model to predict cancellations to reduce the risks for business owners. Although these models have not yet been tested in real-world conditions, several prototypes were developed and deployed in two hotels. Their main goal was to study how these models could be incorporated into a decision support system and to assess their influence on demand-management decisions. In our study, we introduce a tree-based neural network (TNN) that combines a tree-based learning algorithm with a feed-forward neural network as a computational method for predicting hotel booking cancellation. Experimental results indicated that the TNN model significantly improved the predictive power on two benchmark datasets compared to tree-based models and baseline artificial neural networks alone. Also, the preliminary success of our study confirmed that tree-based neural networks are promising in dealing with tabular data.
Chapter
The power of deep learning-based models to process and interpret complex biological data has opened new avenues for research and discovery, offering innovative solutions to longstanding and emerging challenges in bioinformatics. This chapter explores how deep learning is helping researchers solve some of the most pressing challenges in bioinformatics today. From predicting protein structures to speeding up the development of new drugs, these models are at the forefront of bioinformatic research, redefining the boundaries of the field. The authors explore applications where deep learning has made significant contributions and the synergy between deep learning techniques and bioinformatics, illustrating how this powerful collaboration is helping to advance medical research, drug discovery, and beyond. The current challenges and limitations of such methods are also covered in this chapter. The goal is to provide insights into the transformative impact of deep learning on bioinformatics, underscoring its potential to revolutionize our understanding of life.
Article
Full-text available
Breast Cancer is the leading form of cancer found in women and a major cause of increased mortality rates among them. However, manual diagnosis of the disease is time-consuming and often limited by the availability of screening systems. Thus, there is a pressing need for an automatic diagnosis system that can quickly detect cancer in its early stages. Data mining and machine learning techniques have emerged as valuable tools in developing such a system. In this study, we investigated the performance of several machine learning models on the Wisconsin Breast Cancer (original) dataset with a particular emphasis on finding which models perform the best for breast cancer diagnosis. The study also explores the contrast between the proposed ANN methodology and conventional machine learning techniques. The methods employed in the current study are also compared with those utilized in earlier research on the Wisconsin Breast Cancer dataset. The findings of this study are in line with those of previous studies which also highlighted the efficacy of SVM, Decision Tree, CART, ANN, and ELM ANN for breast cancer detection. Several classifiers achieved high accuracy, precision and F1 scores for benign and malignant tumours. It is also found that models with hyperparameter adjustment performed better than those without, and boosting methods such as XGBoost, AdaBoost, and Gradient Boost consistently performed well across benign and malignant tumours. The study emphasizes the significance of hyperparameter tuning and the efficacy of boosting algorithms in addressing the complexity and nonlinearity of data. Using the Wisconsin Breast Cancer (original) dataset, a detailed summary of the current status of research on breast cancer diagnosis is provided.
Article
Traffic-state forecasting is crucial for traffic management and control strategies, as well as user- and system-level decision-making in the transportation network. While traffic forecasting has been approached with a variety of techniques over the last couple of decades, most approaches simply rely on endogenous traffic variables for state prediction, despite the evidence that exogenous factors can significantly affect traffic conditions. This paper proposes a multidimensional spatiotemporal graph attention-based traffic-prediction approach (M-STGAT), which predicts traffic based on past observations of speed, along with lane-closure events, temperature, and visibility across a large transportation network. The approach is based on a graph attention network architecture, which learns based on the structure of the transportation network on which these variables are observed. Numerical experiments are performed using traffic-speed and lane-closure data from the Caltrans Performance Measurement System (PeMS) and corresponding weather data from the National Oceanic and Atmospheric Administration (NOAA) Automated Surface Observing Systems (ASOS). The numerical experiments implement three alternative models which do not allow for multidimensional input, along with two alternative multidimensional models, based on the literature. The M-STGAT outperforms the five alternative models in validation and testing with the primary data set, as well as for one transfer data set across all three prediction horizons for all error measures. However, the model’s transferability varies for the remaining two transfer data sets, which may require further investigation. The results demonstrate that M-STGAT has the most consistently low error values across all transfer data sets and prediction horizons.
Article
Full-text available
The increasing demand for electricity in daily life highlights the need for Smart Cities (SC) to use energy efficiently. Both technical and Non‐Technical Losses (NTL), particularly those resulting from electricity theft, present powerful obstacles; NTL alone can reach billions of dollars. Although Machine Learning (ML) based approaches for NTL detection have been embraced by numerous utilities, there is still a lack of thorough analysis of these methods. Limited research exists on NTL identification evaluation criteria and unbalanced data management in the context of SC. This research compares ML algorithms and data balancing methods to optimize electricity consumption detection. The given research applied the 15 ML techniques of Logistic regression, Bernoulli naive Bayes, Gaussian naive Bayes, K‐Nearest Neighbour, perceptron, passive‐aggressive classifier, quadratic discriminant analysis, SGD classifier, ridge classifier, linear discriminant analysis, decision tree, nearest centroid classifier, multi‐nomial naive Bayes, complement naive Bayes and dummy classifier. While SMOTE, AdaSyn, NRAS, and CCR are considered for data balancing. AUC, F1‐score, and seven relevant performance metrics were used for comparison. We have also implemented SHapely Additive exPlanations (SHAP) for feature importance and model interpretation. Results show varying classifier performance with different balancing methods, emphasizing data preprocessing's role in NTL detection for smart grid security.
Article
Malware is one of the most common and severe cyber threats today. Malware infects millions of devices and can perform several malicious activities including compromising sensitive data, encrypting data, crippling system performance, and many more. Hence, malware detection is crucial to protect our computers and mobile devices from malware attacks. Recently, Deep Learning (DL) has emerged as one of the promising technologies for detecting malware. The recent high production of malware variants against desktop and mobile platforms makes DL algorithms powerful approaches for building scalable and advanced malware detection models as they can handle big datasets. This work explores current deep learning technologies for detecting malware attacks on Windows, Linux, and Android platforms. Specifically, we present different categories of DL algorithms, network optimizers, and regularization methods. Different loss functions, activation functions, and frameworks for implementing DL models are discussed. We also present feature extraction approaches and a review of DL-based models for detecting malware attacks on the above platforms. Furthermore, this work presents major research issues on DL-based malware detection including future research directions to further advance knowledge and research in this field.
Book
This book provides comprehensive coverage of neural networks, their evolution, their structure, the problems they can solve, and their applications. The first half of the book looks at theoretical investigations on artificial neural networks and addresses the key architectures that are capable of implementation in various application scenarios. The second half is designed specifically for the production of solutions using artificial neural networks to solve practical problems arising from different areas of knowledge. It also describes the various implementation details that were taken into account to achieve the reported results. These aspects contribute to the maturation and improvement of experimental techniques to specify the neural network architecture that is most appropriate for a particular application scope. The book is appropriate for students in graduate and upper undergraduate courses in addition to researchers and professionals.
Conference Paper
This paper explains the network structures and methods of the single-layer perceptron and the multi-layer perceptron. It also analyses the linearly separable and linearly non-separable problems in logical operations performed by the single-layer perceptron. XOR is a linearly non-separable operation, which cannot be handled by a single-layer perceptron. Based on this analysis, several solutions are proposed in the paper to solve the XOR problem: the single-layer perceptron can be improved by a multi-layer perceptron, a functional perceptron, or a quadratic function. These solutions are designed and analyzed.
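The XOR point in this abstract is easy to reproduce. The following hedged sketch (off-the-shelf scikit-learn models with illustrative sizes, not the constructions proposed in the paper) shows a single-layer perceptron failing on XOR while a one-hidden-layer MLP typically fits it:

```python
# XOR: linearly non-separable, so a single-layer perceptron cannot fit all four points,
# while an MLP with one hidden layer usually can.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                       # XOR truth table

single = Perceptron(max_iter=1000).fit(X, y)
print("single-layer:", single.predict(X))        # misclassifies at least one point

mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0).fit(X, y)
print("multi-layer: ", mlp.predict(X))           # typically recovers [0 1 1 0]
```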
Article
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement, and much more efficient in terms of computation time. We also show that our algorithm can be efficiently used in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on supp...
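To make the kernel remark concrete, here is a minimal sketch of a plain kernel perceptron (not the voted variant analysed in the paper); the RBF kernel, toy data, and epoch count are illustrative assumptions:

```python
# Kernel perceptron: mistakes are stored and predictions use a kernel expansion
# instead of an explicit weight vector.
import numpy as np

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def train_kernel_perceptron(X, y, epochs=10):
    alpha = np.zeros(len(X))                     # one mistake counter per training example
    for _ in range(epochs):
        for i, x in enumerate(X):
            score = sum(alpha[j] * y[j] * rbf(X[j], x) for j in range(len(X)))
            if y[i] * score <= 0:                # mistake -> remember this example
                alpha[i] += 1
    return alpha

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                     # labels in {-1, +1}
alpha = train_kernel_perceptron(X, y)
preds = [np.sign(sum(alpha[j] * y[j] * rbf(X[j], x) for j in range(len(X)))) for x in X]
print(preds)                                     # the RBF kernel separates this non-linear pattern
```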