Random Forest Structure

Source publication

Phishing Detection Based on Machine Learning and Feature Selection Methods

Article

Dec 2019

With ongoing technological development, the Internet has become ubiquitous and accessible to everyone. There is a considerable number of web pages serving different purposes. Despite this enormous number, not all of these sites are legitimate. There are so-called phishing sites that deceive users in order to serve the phishers' interests. This paper dealt with t...

Context 1

... forest is a classification method based on the decision tree algorithm. It is well suited to large datasets because it can handle a considerable number of variables; in the training phase, it builds a group of different decision trees (Fig. 2), each of which runs on a randomly selected subset of the attributes. ...
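
The following minimal plain-Python sketch shows the structure the excerpt describes. Several decision trees are each trained on a bootstrap sample of the rows and on a randomly selected subset of attributes, and their outputs are combined by majority vote. It illustrates the idea under stated assumptions and is not the paper's implementation. The Gini split criterion, tree depth, number of trees, and the toy feature names (URL length, IP-in-URL flag, dot count, HTTPS flag) are all hypothetical. Following the excerpt, attributes are sampled once per tree; Breiman's original random forest instead re-samples candidate attributes at every split.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, features, depth=0, max_depth=3):
    """Grow a binary tree that may only split on the given feature indices.

    Leaves are class labels; internal nodes are (feature, threshold, left, right).
    """
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    best = None  # (weighted Gini, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable split on these features
        return majority(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in li], [y[i] for i in li], features, depth + 1, max_depth),
        build_tree([X[i] for i in ri], [y[i] for i in ri], features, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, n_features=2, max_depth=3, seed=0):
    """Train each tree on a bootstrap sample and a random subset of attributes."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), n_features)  # random attribute subset for this tree
        forest.append(build_tree([X[i] for i in rows], [y[i] for i in rows], feats, 0, max_depth))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Hypothetical toy data: [url_length, has_ip, dot_count, uses_https]; 1 = phishing, 0 = legitimate.
X = [
    [80, 1, 5, 0], [75, 1, 4, 0], [90, 1, 6, 0], [82, 1, 5, 0], [78, 1, 4, 0], [88, 1, 6, 0],
    [30, 0, 2, 1], [28, 0, 1, 1], [35, 0, 2, 1], [32, 0, 1, 1], [29, 0, 2, 1], [33, 0, 1, 1],
]
y = [1] * 6 + [0] * 6

forest = train_forest(X, y)
print(predict_forest(forest, [85, 1, 5, 0]))  # phishing-like row
print(predict_forest(forest, [25, 0, 1, 1]))  # legitimate-like row
```

Each tree sees only two of the four attributes. The vote should still label the phishing-like row 1 and the legitimate-like row 0, because every attribute in this toy data separates the classes. On real phishing data, a library implementation such as scikit-learn's RandomForestClassifier would normally be preferred over a hand-written forest.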

A Smart Model for Web Phishing Detection Based on New Proposed Feature Selection Technique

Article

Jul 2020

Mohamed Ahmed Elrashidy

Phishing Website Detection Using Effective Classifiers and Feature Selection Techniques

Conference Paper

Dec 2019

DeepEdge: A Novel Appliance Identification Edge Platform for Data Gathering, Capturing and Labeling

Article

Mar 2022

With the development of the Internet of Things for smart grid, the requirement for appliance monitoring has become an important topic. The first and most important step in appliance monitoring is to identify the type of appliance. Most of the existing appliance identification platforms are cloud based, thus they consume large computing resources an...

Figure 1. Hierarchical federated learning architecture of Internet of...

Training duration on the MNIST dataset under edge IID and edge non-IID distributions

Hierarchical federated learning with mobile edge computing in the Internet of Vehicles

Article

Apr 2023

Federated Learning is a distributed machine learning framework, which can be used in the Internet of Vehicles to train deep learning models without directly accessing the original data of mobile edge vehicle nodes. ECS can access massive data, but it has the characteristics of high latency and high communication overhead. However, mobile edge compu...

Feature Selection for Phishing Website Classification

Article

Jan 2020

Phishing is an attempt to obtain confidential information about a user or an organization. It is an act of impersonating a credible webpage to lure users to expose sensitive data, such as username, password and credit card information. It has cost the online community and various stakeholders hundreds of millions of dollars. There is a need to dete...

MDepthNet based phishing attack detection using integrated deep learning methodologies for cyber security enhancement

Article

Feb 2024
CLUSTER COMPUT

Emerging digital technologies have grown in popularity and have also become a threat to society and cybersecurity. Phishing is commonly used by hackers to access data without the knowledge of the users whose data is stolen. Several techniques are used to detect whether data is phished or non-phished, and some anti-phishing software is used to identify phishing data. However, some of these techniques do not perform efficiently. Hence, the proposed model is introduced to overcome these issues and improve the efficiency of detecting whether data is phished or non-phished. The data is gathered from the PhishStorm dataset and pre-processed using Z-score normalization and data cleaning. The dataset is balanced with the Advanced synthetic sampling approach (Adv-SyN), and features are extracted using a Double self-sparse autoencoder (DSelSa). The Opposition Gazelle optimization algorithm (OpGoA) is used for optimal feature selection, and finally, the data is classified using the Multi Head Depth wise Tern integrated long short term memory (MDepthNet). Sooty tern optimization is used to evaluate the loss function of the network model. The performance of the proposed model is analyzed using several evaluation metrics and compared with other models, demonstrating its efficiency. The main objective of the proposed technique is to detect phishing attacks by classifying data as phishing or non-phishing. An automated deep learning (DL) methodology is introduced for effective detection of phishing attacks to enhance cybersecurity. The proposed model achieves an accuracy of 99.45% and a precision of 99.45%. The RMSE and MSE of the proposed model are reduced to 0.73 and 0.05, respectively.
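
Z-score normalization, named above as a pre-processing step, maps each feature value x to z = (x - μ) / σ, where μ and σ are that feature's mean and standard deviation. A minimal sketch follows. The abstract does not say whether the population or the sample standard deviation was used, or how constant features were handled. Both choices below (population σ, constant columns mapped to 0) are therefore assumptions, and the helper names are hypothetical.

```python
import statistics


def z_score(column):
    """Return (x - mean) / std for each value, using the population standard deviation."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:  # constant feature: nothing to scale, map every value to 0 (assumption)
        return [0.0] * len(column)
    return [(x - mu) / sigma for x in column]


def z_score_columns(rows):
    """Standardize each feature (column) of a row-major table independently."""
    columns = [z_score(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]


print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In a train/test setting, μ and σ would normally be computed on the training split only and then reused to transform the test split. This keeps test-set statistics from leaking into training.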

An Investigation into the Performances of the State-of-the-art Machine Learning Approaches for Various Cyber-attack Detection: A Survey

Preprint

Feb 2024

To secure computers and information systems from attackers who exploit system vulnerabilities to commit cybercrime, several methods have been proposed for real-time detection of vulnerabilities to improve the security of information systems. Of all the proposed methods, machine learning has been the most effective, with capabilities ranging from early detection of software vulnerabilities to real-time detection of an ongoing compromise. Because there are different types of cyberattacks, each existing state-of-the-art machine learning model relies on a different training algorithm, which affects its suitability for detecting a particular type of cyberattack. In this research, we analyzed the current state-of-the-art machine learning models for detecting different types of cyberattacks over the past 10 years, with a major emphasis on the most recent works, in a comparative study that identifies the knowledge gaps where work is still needed for each category of cyberattack.

Analysis of Ensemble Methods for Phishing Detection

Chapter

Oct 2023

Phishing is one of the biggest issues in cyberspace. It leads to monetary losses for both public and private industries. The escalating number of phishing attacks is a major concern for security experts. High-accuracy phishing attack detection has always been a difficult problem. Conventional tools for detecting phishing webpages use signature-based methods, which cannot detect zero-day phishing webpages. Thus, security researchers have started to use machine and deep learning algorithms to detect newly created phishing webpages. This chapter studies and compares various machine learning and ensemble methods for classification and detection of phishing webpages. A comparative analysis of machine learning techniques such as Naïve Bayes (NB), logistic regression (LR), k-nearest neighbor (k-NN), decision table (DT) and random forest (RF), and of ensemble methods such as bagging, boosting, stacking and voting, is carried out. Experiments are conducted on a phishing dataset with 30 features containing 6157 benign and 4898 phishing webpages. Experimental results reveal that the stacking ensemble method provides the best accuracy, 96.987%, compared with the other methods used for detecting phishing webpages.

Phishing Attacks Detection by Using Support Vector Machine

Article

Sep 2023

Today's world is heading towards complete digital transformation, and despite its advantages, this transformation involves many risks, the most important of which is phishing. This article proposes a system that extracts features from all parts of an email, drawn initially from different datasets, and uses a machine learning algorithm (K-means) to extract the valuable features, with four different methods used to calculate distance in K-means. SVM is used as the classifier to label emails as phishing or legitimate, and its parameters are tuned to obtain high accuracy. The proposed model achieved an accuracy of 98.8%.

A Feature Extraction Approach for the Detection of Phishing Websites Using Machine Learning

Article

Jun 2023
J CIRCUIT SYST COMP

In this growing world of the internet, most of our daily routine tasks are somehow connected to the internet, from smartphones to internet of things (IoT) devices to cloud networks. The number of internet users is growing rapidly, and the internet is accessible to everyone from anywhere. Data phishing is a cybersecurity attack that uses deception to trick internet users into revealing their content and information. In this attack, malicious users try to steal personal data, such as login credentials, credit card details and health care information, from internet users. They exploit users' sensitive information using vulnerabilities. These information stealers are known as phishers. Phishers use different techniques; one of the most common is to direct users to a fake website and have them enter their login credentials and other details there. Phishing websites look like the original websites. Phishers use these details to access users' accounts and hijack them for monetary gain. Many internet users fall into this trap and share their personal and sensitive details. In this paper, we analyze and implement machine learning (ML) techniques to detect phishing attacks. There are different methods to identify phishing attacks; one of them is checking the uniform resource locator (URL) address using ML, which teaches a machine to differentiate between phishing and original site URLs. There are many different techniques to counter this attack. This research paper aims to provide accurate phishing detection with lower time complexity.

Feature Selection to Enhance Phishing Website Detection Based On URL Using Machine Learning Techniques

Article

May 2023

The detection of phishing websites based on machine learning has gained much attention due to its ability to detect newly generated phishing URLs. To detect phishing websites, most techniques combine URLs, web page content, and external features. However, the content of the web page and external features are time-consuming, require large computing power, and are not suitable for resource-constrained devices. To overcome this problem, this study applies feature selection techniques based on the URL to improve the detection process. The methodology for this study consists of seven stages, including data preparation, preprocessing, splitting the dataset into training and validation, feature selection, 10-fold cross-validation, validating the model, and finally performance evaluation. Two public datasets were used to validate the method. TreeSHAP and Information Gain were used to rank features and select the top 10, 15, and 20. These features are fed into three machine learning classifiers which are Naïve Bayes, Random Forest, and XGBoost. Their performance is evaluated based on accuracy, precision, and recall. As a result, the features ranked by TreeSHAP contributed most to improving detection accuracy. The highest accuracy of 98.59 percent was achieved by XGBoost for the first dataset with 15 features. For the second dataset, the highest accuracy is 90.21 percent using 20 features and Random Forest. As for Naïve Bayes, the highest accuracy recorded is 98.49 percent using the first dataset.
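
Ranking features and keeping the top 10, 15, or 20, as described above, can be sketched with Information Gain, the simpler of the two ranking methods mentioned. TreeSHAP needs a trained tree model and is not shown. The sketch assumes discrete feature values. The helper names, feature names, and toy data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def top_k_features(X, y, names, k):
    """Rank features by information gain and keep the k highest-scoring names."""
    scores = {name: information_gain([row[j] for row in X], y) for j, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binary URL features; label 1 = phishing, 0 = legitimate.
names = ["has_ip", "has_at_symbol", "random_noise"]
X = [
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0],
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(top_k_features(X, y, names, k=2))  # ['has_ip', 'has_at_symbol']
```

Continuous features, such as URL length, would first need to be discretized or split at thresholds before this form of Information Gain applies.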

PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Polymorphic Treatment of Features

Article

May 2023

Phishing scams are increasing drastically, causing Internet users to have their personal credentials compromised. This paper proposes a novel feature utilization method for phishing URL detection called the polymorphic property of features. In the initial stage, 46 URL-related features were extracted. Later, a subset of 19 of these features with the polymorphic property was identified, and these were extracted from different parts of the URL (the domain and the path). After extracting the features, various machine learning classification algorithms were applied to build models using monomorphic treatment of features, polymorphic treatment of features, and both treatments together. By the polymorphic property of features, we mean that the same feature provides different interpretations when considered in different parts of the URL. The machine learning models were built on two different datasets. A comparison of the models derived from the two datasets reveals that the model built with both monomorphic and polymorphic treatment of features yielded higher accuracy in phishing URL detection than existing works. While testing the model on phishing URL datasets, we found that the most challenging task was detecting phishing URLs with a valid SSL certificate. Existing works that detect phishing URLs using only digital certificate-related features do not perform adequately. To address this problem, we combined certificate-related and URL-related features to improve performance.
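
One way to read the polymorphic property described above is as follows. A single feature, such as a character count, is computed separately for the domain and for the path, while the monomorphic treatment computes it once over the whole URL. The sketch below illustrates that reading with Python's standard urllib.parse.urlsplit. The character set, feature names, and example URL are hypothetical and are not the paper's 46 or 19 features.

```python
from urllib.parse import urlsplit

# Hypothetical character-count features: feature name -> character counted.
SYMBOLS = {"dot": ".", "hyphen": "-", "slash": "/", "at": "@"}


def counts(text, prefix):
    """Count each symbol in `text`, naming features '<prefix>_<symbol name>'."""
    return {f"{prefix}_{name}": text.count(ch) for name, ch in SYMBOLS.items()}


def url_features(url):
    """Monomorphic counts over the whole URL plus polymorphic counts per URL part."""
    parts = urlsplit(url)
    features = counts(url, "url")  # monomorphic: one value per feature
    features.update(counts(parts.netloc, "domain"))  # polymorphic: same feature, domain part
    features.update(counts(parts.path, "path"))  # polymorphic: same feature, path part
    return features


for name, value in url_features("http://secure-login.example.com/account-verify/update.php").items():
    print(name, value)
```

Here netloc is used as the domain part, so any port or user-info prefix (text before @) is counted with it. A stricter extractor could use the hostname attribute of the split result instead.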

Analysis of the Performance Impact of Fine-Tuned Machine Learning Model for Phishing URL Detection

Article

Mar 2023

Phishing leverages people's tendency to share personal information online. Phishing attacks often begin with an email and can be used for a variety of purposes. The cybercriminal employs social engineering techniques to get the target to click on the link in the phishing email, which takes them to the infected website. These attacks become more complex as hackers personalize their fraud and craft convincing messages. Phishing with a malicious URL is an advanced kind of cybercrime, and it can be challenging even for cautious users to spot phishing URLs. Researchers have proposed different techniques to address this challenge. Machine learning models improve detection by using URLs, web page content and external features. This article presents the findings of an experimental study that attempted to enhance the performance of machine learning models to obtain improved accuracy on the two most commonly used phishing datasets. Three distinct types of tuning factors are utilized: data balancing, hyperparameter optimization and feature selection. The experiment utilizes the eight most prevalent machine learning methods and two distinct datasets obtained from online sources, the UCI repository and the Mendeley repository. The results demonstrate that data balancing improves accuracy marginally, whereas hyperparameter adjustment and feature selection improve accuracy significantly. Combining all fine-tuned factors improves the performance of the machine learning algorithms, outperforming existing research works. The results show that tuning factors enhance the efficiency of machine learning algorithms. For Dataset-1, Random Forest (RF) and Gradient Boosting (XGB) achieve accuracy rates of 97.44% and 97.47%, respectively. Gradient Boosting (GB) and Extreme Gradient Boosting (XGB) achieve accuracy values of 98.27% and 98.21%, respectively, for Dataset-2.
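
The abstract lists data balancing as one tuning factor but does not name the technique. Random oversampling, sketched below, is one common option and is shown only as an illustration. Rows of smaller classes are duplicated at random until every class matches the largest one. The function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class. Original rows are kept first."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced toy set: 7 legitimate (0) rows and 3 phishing (1) rows.
X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 7, 1: 7})
```

Oversampling is normally applied after the train/test split and only to the training portion. Duplicating rows before splitting can place copies of the same example in both sets and inflate the measured accuracy.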

An automated multi-class skin lesion diagnosis by embedding local and global features of Dermoscopy images

Article

Mar 2023
MULTIMED TOOLS APPL

Skin cancer is an increasing cause of concern among cancers worldwide. Extensive research has been carried out across the globe on the early detection of skin cancer to increase patients' life expectancy. Decision support systems and computer-aided diagnosis systems aid in detecting cancer at an early stage. The increasing ability of Convolutional Neural Networks (CNN) to extract delicate patterns has made them a popular choice in automated decision support systems. This work proposes a novel U-Net segmentation network with Spatial Attention Blocks (SPAB), called SASegNet, to segment skin lesions accurately. The spatial attention blocks help the model focus on a particular region. The proposed SASegNet model achieves an accuracy of 95% on the PH2 dataset. In this work, EfficientNet B1 is used for classification. The local features from the segmentation results are passed to EfficientNet B1 to extract features for classification, and the pre-processed original images are passed to EfficientNet B1 to extract global features. Finally, these two sets of features are concatenated to extract the best patterns for classification. Experiments are carried out on the International Skin Imaging Collaboration (ISIC) datasets. The proposed methodology obtains an area under the receiver operating characteristic curve (AUC-ROC) of 0.974, 0.972, 0.962, and 0.937 on the ISIC 2017, 2018, 2019, and 2020 datasets, respectively. To the best of our knowledge, these are benchmark results. This automated methodology can aid practising dermatologists in robust diagnosis.

From Phishing Behavior Analysis and Feature Selection to Enhance Prediction Rate in Phishing Detection

Article

Jan 2023
