Fig 2 - uploaded by Mouhammd Alkasassbeh
Random Forest Structure

Source publication
Article
With increasing technological development, the Internet has become ubiquitous and accessible to everyone. There is a considerable number of web pages serving different purposes, but not all of these sites are legitimate. So-called phishing sites deceive users in order to serve the attackers' interests. This paper dealt with t...

Contexts in source publication

Context 1
... forest is a classification method based on the decision tree algorithm. It is well suited to very large datasets because it can handle a considerable number of variables. In the training phase, it builds an ensemble of different decision trees (Fig. 2), each of which works on a randomly selected subset of the attributes. ...
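The ensemble idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the source paper's implementation: each tree is grown on a bootstrap sample and restricted to a randomly chosen subset of attributes (following the wording above; Breiman's original formulation instead re-samples attributes at every split), splits minimize Gini impurity, and the trees' predictions are combined by majority vote. The function names, the depth limit, the number of trees, and the toy data are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, features, depth=0, max_depth=3):
    """Grow a tree that may only split on the given feature indices.

    Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] <= t]
    hi = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in lo], [y[i] for i in lo], features, depth + 1, max_depth),
        build_tree([X[i] for i in hi], [y[i] for i in hi], features, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=15, n_features=2, max_depth=3, seed=0):
    """Train n_trees trees, each on a bootstrap sample and a random attribute subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)  # random attribute subset
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], feats, 0, max_depth))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Synthetic toy data: three hypothetical binary URL indicators
# (e.g. uses an IP address, lacks HTTPS, very long URL); label 1 ("phishing")
# when at least two indicators are set.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 4
y = [1 if sum(row) >= 2 else 0 for row in X]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 1, 1]), predict_forest(forest, [0, 0, 0]))
```

Because every tree sees only part of the data and part of the attributes, individual trees disagree on borderline inputs, and the vote smooths those disagreements out; clear-cut inputs such as all indicators set or none set are classified consistently.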

Similar publications

Article
With the development of the Internet of Things for smart grid, the requirement for appliance monitoring has become an important topic. The first and most important step in appliance monitoring is to identify the type of appliance. Most of the existing appliance identification platforms are cloud based, thus they consume large computing resources an...
Article
Federated Learning is a distributed machine learning framework, which can be used in the Internet of Vehicles to train deep learning models without directly accessing the original data of mobile edge vehicle nodes. ECS can access massive data, but it has the characteristics of high latency and high communication overhead. However, mobile edge compu...
Article
Phishing is an attempt to obtain confidential information about a user or an organization. It is an act of impersonating a credible webpage to lure users to expose sensitive data, such as username, password and credit card information. It has cost the online community and various stakeholders hundreds of millions of dollars. There is a need to dete...

Citations

... A phishing email is a special type of spam message that purports to come from a legitimate organization but is sent to users to steal their information. Such an email contains malicious Uniform Resource Locators (URLs) or embedded malicious attachments that install malware on the user's applications and system [11]. Malware can cause system failure because it damages the operating system's Internet components. ...
Article
Technologies developed through digitalization have become increasingly popular, but they have also become a threat to society and cybersecurity. Hackers commonly use phishing to access data without the knowledge of the users whose data is stolen. Several techniques, including anti-phishing software, are used to detect whether data is phished or non-phished; however, some of these techniques do not perform efficiently. The proposed model is therefore introduced to overcome these issues and improve the efficiency of detecting whether data is phished or non-phished. The data are gathered from the Phishstorm dataset and pre-processed using data cleaning and Z-score normalization. The dataset is balanced with the Advanced synthetic sampling approach (Adv-SyN), and features are extracted using a Double self-sparse autoencoder (DSelSa). The Opposition Gazelle optimization algorithm (OpGoA) is used for optimal feature selection, and the data are finally classified using Multi Head Depth wise Tern integrated long short term memory (MDepthNet). Sooty tern optimization is used to evaluate the loss function of the network model. The performance of the proposed model is analyzed on several evaluation metrics and compared with other models, demonstrating its efficiency. The main objective of the proposed technique is to detect phishing attacks by classifying data as phishing or non-phishing; an automated deep learning (DL) methodology is introduced to detect phishing attacks effectively and enhance cybersecurity. The proposed model achieves an accuracy of 99.45% and a precision of 99.45%, with RMSE and MSE reduced to 0.73 and 0.05, respectively.
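Z-score normalization, named in the abstract above as a pre-processing step, rescales each feature to zero mean and unit standard deviation, z = (x - μ) / σ. The following is a minimal Python sketch, assuming the population standard deviation and mapping a constant column to zeros; the abstract does not specify either choice, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_score(column):
    """Return (x - mean) / std for each value, using the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant feature: nothing to rescale
    return [(x - mu) / sigma for x in column]


print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
# → [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In practice μ and σ should be computed on the training split only and then reused to transform the test split, so that no information from the test data leaks into pre-processing.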
... Algorithm | Authors | Mean score
Naive Bayes | [27], [28], [29], [30], [31], [32], [33] | 90.4
SVM | [28], [29], [34], [30], [32], [35], [33] | 87.63
Random Forest | [27], [28], [32], [35], [36], [37], [33] | 93.71
Logistic Regression | [27], [30], [35], [38], [37], [33], [39] | 89.68
KNN | [32], [37], [33], [40], [41], [39], [42] | 87.2
Decision Tree | [35], [29], [36], [33], [40], [41], [43] | 90.04
... | [44], [45], [46], [47], [48], [49], [50] | 90.0
Naive Bayes | [44], [51], [46], [52], [53], [54], [47] | 84.6
Random Forest | [55], [51], [46], [56], [52], [53], [57] | 93.34
Decision Tree | [55], [44], [46], [56], [47], [49], [58] | 96
XGBoost | [55], [56], [50], [57], [59], [60] | 96.2
KNN | [44], [56], [47], [61], [62], [63], [58] | 96.5
Naive Bayes | [23], [64], [65], [66], [24], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76] | 80.431
SVM | [23], [64], [66], [24], [77], [67], [78], [79], [80], [81], [82], [75] | 89.429
Random Forest | [23], [64], [65], [66], [24], [83], [77], [67], [80], [84], [78], [85], [82] | 97.065
Decision Tree | [64], [66], [24], [77], [67], [78], [80], [86], [82], [70], [71], [75] | 95.248
Logistic Regression | [23], [64], [24], [79], [80], [87], [85], [82], [88], [69], [70] | 92.589
KNN | [23], [64], [66], [24], [67], [84], [79], [80], [87], [85], [89], [82], [71], [75] | 90.479
1. Insufficient research on the capability of machine learning algorithms to detect drive-by downloads, man-in-the-middle attacks, and malware attacks. ...
Preprint
To secure computers and information systems from attackers who exploit system vulnerabilities to commit cybercrime, several methods have been proposed for real-time vulnerability detection. Of all the proposed methods, machine learning has been the most effective, with capabilities ranging from early detection of software vulnerabilities to real-time detection of an ongoing compromise. Because there are different types of cyberattacks, each state-of-the-art machine learning model relies on different training algorithms, which also affects its suitability for detecting a particular type of cyberattack. In this research, we analyzed the state-of-the-art machine learning models for detecting different types of cyberattacks from the past 10 years, with a major emphasis on the most recent works, in a comparative study to identify the knowledge gaps where work is still needed for each category of cyberattack.
... In this work, the document object model, search engines, and machine learning techniques were used to classify webpages. In [34], the authors presented a feature selection and machine learning based study to improve the efficiency of phishing detection. A dataset of 10,000 webpages (5,000 benign and 5,000 phishing) with 48 features was used. ...
Chapter
Phishing is one of the biggest issues in cyberspace. It leads to monetary losses in both the public and private sectors. The escalating number of phishing attacks is a major concern for security experts, and highly accurate phishing detection has always been a difficult problem. Conventional tools for detecting phishing webpages use signature-based methods, which cannot detect zero-day phishing webpages. Security researchers have therefore started to use machine and deep learning algorithms to detect newly created phishing webpages. This chapter studies and compares various machine learning and ensemble methods for the classification and detection of phishing webpages. A comparative analysis is carried out of machine learning techniques such as Naïve Bayes (NB), logistic regression (LR), k-nearest neighbor (k-NN), decision table (DT), and random forest (RF), and of ensemble methods such as bagging, boosting, stacking, and voting. Experiments are conducted on a phishing dataset with 30 features containing 6157 benign and 4898 phishing webpages. The results show that the stacking ensemble method provides the best accuracy, 96.987%, compared with the other methods used for detecting phishing webpages.
... With the ubiquity of the Internet today, society uses Internet products for various purposes, such as sharing knowledge, socializing, and conducting financial activities including purchases, advertising, and sending money [1], [2]. This has led to the emergence of cybercrime, that is, the use of computers, communication devices, or networks as tools for illicit purposes. ...
Article
Today's world is heading towards complete digital transformation, and for all its advantages, this transformation involves many risks, the most important of which is phishing. This article proposes a system that extracts features from all parts of emails drawn from different datasets and uses a machine learning algorithm (K-means) to extract the most valuable features, testing four methods of calculating distance in K-means. An SVM classifier, with tuned parameters to obtain high accuracy, then classifies the emails as phishing or legitimate. The proposed model achieved an accuracy of 98.8%.
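The abstract mentions four distance calculations inside K-means but does not name them. The sketch below assumes Euclidean, Manhattan, Chebyshev, and cosine distance purely as illustrative choices, and shows a plain Lloyd-style K-means loop (assignment step, then update step) with the distance passed in as a parameter. The function names and the seeding rule (first k points) are hypothetical. Note that the arithmetic-mean update is the exact minimizer only for squared Euclidean distance; with other distances it is a common heuristic.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.hypot(*a), math.hypot(*b)
    # A zero vector has no direction; treat it as orthogonal (an arbitrary convention).
    return 1.0 - dot / (na * nb) if na and nb else 1.0


DISTANCES = {"euclidean": euclidean, "manhattan": manhattan,
             "chebyshev": chebyshev, "cosine": cosine}


def kmeans(points, k, distance, iters=20):
    """Lloyd-style K-means with a pluggable distance (first k points as seeds)."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: distance(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters


points = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
for name in ("euclidean", "manhattan", "chebyshev"):
    centroids, clusters = kmeans(points, 2, DISTANCES[name])
    print(name, [len(c) for c in clusters])
```

Swapping the distance only changes the assignment step, which is what makes it straightforward to compare several distance measures on the same email feature vectors.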
... Almseidin et al. [11] used Google's image search to find phony identities for verification, and ML techniques to find the correct logo image. They treated the domain name as the logo's identity because of the unique connection between the two. ...
... Provides proactive phishing URL detection but depends on whitelists and blacklists.
Ref. 11: Used the relationship between logo and domain name to distinguish legitimate from phishing sites. Used SVM to classify URLs and calculated the TP, FP, TN, FN, accuracy, and F1-score. ...
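The comparison above lists TP, FP, FN, TN, accuracy, and F1-score together. The standard relationships between them can be written as a small helper, treating "phishing" as the positive class. The function name and the example counts are hypothetical, and mapping a zero denominator to 0.0 is a convention chosen here, not something the source specifies.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a phishing (positive) vs. legitimate (negative) classifier.
print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
```

With these counts, accuracy is 185/200 = 0.925, precision is 0.9, and F1 equals 2TP / (2TP + FP + FN) = 180/195 ≈ 0.923; F1 is usually reported alongside accuracy because it ignores the (often large) TN count.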
Article
As the Internet grows, most of our daily routine tasks are connected to it in some way, from smartphones to Internet of Things (IoT) devices to cloud networks. The number of Internet users is growing rapidly, and the Internet is accessible to everyone from anywhere. Data phishing is a cybersecurity attack that uses deception to trick Internet users into giving up their content and information. In this attack, malicious users try to steal users' personal data, such as login credentials, credit card details, and health care information, exploiting vulnerabilities to obtain this sensitive information. These information stealers are known as phishers, and they use different phishing techniques. One of the most common is to direct users to a fake website that looks like the original and have them enter their login credentials and other details there. Phishers then use these details to access users' accounts and hijack them for monetary gain. Many Internet users fall into this trap and share their personal and sensitive details. In this paper, we analyze and implement machine learning (ML) techniques to detect phishing attacks. There are different methods for identifying phishing attacks; one of them is to check the uniform resource locator (URL) using ML, teaching a machine to differentiate between phishing URLs and original site URLs. Many different techniques exist to counter this attack. This research paper aims to provide accurate phishing detection with lower time complexity.
... Phishing is a social engineering attack that aims to steal a user's identity data and financial account credentials [1]. Attackers exploit users' lack of cybersecurity awareness and insecure Internet protocols to trick them into visiting phishing websites. ...
... The objectives of this research are (1) to study feature profiles for phishing website detection based on URL, (2) to develop a feature selection model for phishing website detection based on URL using machine learning techniques, (3) to validate the website phishing detection model in terms of accuracy, precision, and recall. This study will make three contributions. ...
... URL-based features depend on characteristics of the URL, such as the use of an IP address, blacklisted words, the use of HTTPS, and the length of the URL. The content-based approach requires an in-depth analysis of the page's content [1]. One of the most common tricks attackers use when designing a phishing website is to prevent users from viewing the web page's source code. ...
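The URL-based features listed above can be computed from the URL string alone, without fetching the page, which is what makes them cheap. A minimal Python sketch of four of them follows, using only the standard library. The word list, the feature names, and the choice to count words anywhere in the lowercased URL are hypothetical assumptions for illustration; real studies use their own curated lists and definitions.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical word list; published studies use their own curated lists.
BLACKLISTED_WORDS = ("login", "verify", "account", "secure", "update")


def uses_ip_address(host):
    """True if the host is a literal IPv4/IPv6 address instead of a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Extract a few illustrative URL-based features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "uses_ip": int(uses_ip_address(host)),
        "uses_https": int(parsed.scheme == "https"),
        "url_length": len(url),
        "blacklisted_words": sum(w in lowered for w in BLACKLISTED_WORDS),
    }


print(url_features("http://192.168.10.5/secure/login.php"))
print(url_features("https://example.com/about"))
```

The first URL uses a raw IP address, no HTTPS, and two listed words; the second scores zero on all three, which is the kind of contrast a classifier trained on these features would exploit.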
Article
The detection of phishing websites with machine learning has gained much attention because of its ability to detect newly generated phishing URLs. To detect phishing websites, most techniques combine URLs, web page content, and external features. However, extracting web page content and external features is time-consuming, requires substantial computing power, and is not suitable for resource-constrained devices. To overcome this problem, this study applies feature selection techniques to URL-based features to improve the detection process. The methodology consists of seven stages: data preparation, preprocessing, splitting the dataset into training and validation sets, feature selection, 10-fold cross-validation, model validation, and performance evaluation. Two public datasets were used to validate the method. TreeSHAP and Information Gain were used to rank the features and select the top 10, 15, and 20. These features were fed into three machine learning classifiers, Naïve Bayes, Random Forest, and XGBoost, whose performance was evaluated in terms of accuracy, precision, and recall. The features ranked by TreeSHAP contributed most to improving detection accuracy. The highest accuracy, 98.59 percent, was achieved by XGBoost on the first dataset with 15 features. On the second dataset, the highest accuracy was 90.21 percent, using 20 features and Random Forest. For Naïve Bayes, the highest accuracy recorded was 98.49 percent, on the first dataset.
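Ranking features by Information Gain and keeping the top k, as described above, can be sketched as follows. This assumes discrete (for example, binary or pre-binned) feature values, since the abstract does not say how continuous features were handled; the feature names and toy data are hypothetical. TreeSHAP is not reproduced here, as it requires a trained tree model and a dedicated library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG = H(Y) - H(Y | X) for a discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(rows, labels, names, k):
    """Rank features by information gain (highest first) and keep the k best."""
    scores = {name: information_gain([r[i] for r in rows], labels)
              for i, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: "uses_ip" perfectly separates the classes; the other two do not.
names = ["uses_ip", "no_https", "long_url"]
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = [1, 1, 0, 0]
print(top_k_features(rows, labels, names, k=1))
```

In the study's setting, k would take the values 10, 15, and 20, and the selected subset would then be passed to each classifier.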
... Almseidin et al. [16] developed a machine learning-based phishing detection model with a novel dataset that contains 5000 phishing and 5000 legitimate web pages. Forty-eight features were extracted from the dataset, and different feature optimization techniques were applied to improve performance. ...
... Since the proposed work's objective is to detect phishing URLs, possible URL-related features were collected. Existing works on phishing URL detection were studied thoroughly, along with code on GitHub, to gather URL-related features [16], [19], [24]. After a thorough investigation, 46 features were identified as the best-performing features for phishing URL detection. ...
Article
Phishing scams are increasing drastically, leading Internet users to compromise their personal credentials. This paper proposes a novel feature utilization method for phishing URL detection called the polymorphic property of features. In the initial stage, 46 URL-related features were extracted. Later, a subset of 19 of these 46 features with the polymorphic property was identified, and these were extracted from different parts of the URL (the domain and the path). By the polymorphic property of features, we mean that the same feature provides different interpretations when considered in different parts of the URL. After feature extraction, various machine learning classification algorithms were applied to build models using monomorphic treatment of features, polymorphic treatment of features, and both monomorphic and polymorphic treatment. The models were built on two different datasets. A comparison of the models derived from the two datasets reveals that the model built with both monomorphic and polymorphic treatment of features yielded higher phishing URL detection accuracy than existing works. While testing the model on phishing URL datasets, the most challenging task we noticed was detecting phishing URLs that have a valid SSL certificate. Existing works that detect phishing URLs using only digital certificate-related features do not perform well. To address this problem, we combined certificate-related and URL-related features to improve performance.
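To make the monomorphic versus polymorphic treatment concrete: under monomorphic treatment a feature such as "number of dots" is computed once over the whole URL, while under polymorphic treatment the same feature is recomputed separately on the domain and on the path, yielding features with different interpretations. The three count features below are hypothetical examples; the paper's actual 19 polymorphic features are not listed in the abstract, and the function names are illustrative only.

```python
from urllib.parse import urlparse

# Hypothetical count features; the paper's actual polymorphic features are not listed here.
COUNTERS = {
    "dots": lambda s: s.count("."),
    "hyphens": lambda s: s.count("-"),
    "digits": lambda s: sum(ch.isdigit() for ch in s),
}


def url_parts(url):
    """Return the whole URL plus its domain (netloc) and path components."""
    parsed = urlparse(url)
    return {"url": url, "domain": parsed.netloc, "path": parsed.path}


def extract(url, monomorphic=True, polymorphic=True):
    """Monomorphic: each feature once on the whole URL.
    Polymorphic: the same feature recomputed on the domain and on the path."""
    parts = url_parts(url)
    features = {}
    for name, count in COUNTERS.items():
        if monomorphic:
            features[f"{name}_url"] = count(parts["url"])
        if polymorphic:
            features[f"{name}_domain"] = count(parts["domain"])
            features[f"{name}_path"] = count(parts["path"])
    return features


print(extract("http://secure-pay.example.com/a-b/c.html"))
```

For this URL the monomorphic "dots" value is 3, but the polymorphic treatment separates it into 2 dots in the domain (which may indicate subdomain abuse) and 1 in the path (an ordinary file extension), distinctions a single whole-URL count cannot express.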
... Both abnormal-based and HTML/JavaScript features rely on HTML and JavaScript techniques incorporated within the website's source code. Downloading items from other websites is an example of abnormal-based functionality [30]. An overview of the dataset is presented in Table 4, and Tables 5 and 6 provide its characteristics. ...
Article
Phishing leverages people's tendency to share personal information online. Phishing attacks often begin with an email and can be used for a variety of purposes. The cybercriminal employs social engineering techniques to get the target to click on the link in the phishing email, which takes them to the infected website. These attacks become more complex as hackers personalize their fraud and craft convincing messages. Phishing with a malicious URL is an advanced kind of cybercrime, and spotting phishing URLs can be challenging even for cautious users. Researchers have proposed different techniques to address this challenge; machine learning models improve detection by using URLs, web page content, and external features. This article presents the findings of an experimental study that attempted to enhance the performance of machine learning models to obtain improved accuracy on the two most commonly used phishing datasets. Three types of tuning factors are used: data balancing, hyperparameter optimization, and feature selection. The experiment uses the eight most prevalent machine learning methods and two datasets obtained from online sources such as the UCI repository and the Mendeley repository. The results demonstrate that data balancing improves accuracy marginally, whereas hyperparameter tuning and feature selection improve accuracy significantly. Combining all the fine-tuned factors improves the performance of the machine learning algorithms beyond existing research works, which shows that the tuning factors enhance their efficiency. For Dataset-1, Random Forest (RF) and Gradient Boosting (XGB) achieve accuracy rates of 97.44% and 97.47%, respectively. Gradient Boosting (GB) and Extreme Gradient Boosting (XGB) achieve accuracy values of 98.27% and 98.21%, respectively, for Dataset-2.
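The abstract lists data balancing as one of the three tuning factors without naming the method used. One of the simplest balancing techniques is random oversampling, which duplicates randomly chosen minority-class rows until every class is as large as the largest one. The sketch below uses that technique as a stand-in; it is not necessarily what the study did, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


rows = [[0.1], [0.2], [0.3], [0.9]]
labels = [0, 0, 0, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # Counter({0: 3, 1: 3})
```

Oversampling should be applied only to the training split; duplicating rows before splitting can place copies of the same example in both training and test data and inflate the measured accuracy.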
... Feature extraction is essential for detecting and classifying images in machine learning and deep learning applications. In [1], a machine learning approach was used to extract features with J48, Random Forest, and a multilayer perceptron, achieving 98.11% accuracy for phishing detection. Deep learning algorithms require high computational resources for feature extraction [52]. ...
Article
Skin cancer is an increasing cause of concern among cancers worldwide. Extensive research has been carried out around the globe on the early detection of skin cancer to increase patients' life expectancy. Decision support systems and computer-aided diagnosis systems help detect cancer at an early stage. The increasing ability of Convolutional Neural Networks (CNN) to extract subtle patterns has made them a popular choice in automated decision support systems. This work proposes a novel U-Net segmentation network with Spatial Attention Blocks (SPAB), called SASegNet, to segment skin lesions accurately. The spatial attention blocks make the model focus on particular regions. The proposed SASegNet model achieves an accuracy of 95% on the PH2 dataset. In this work, EfficientNet B1 is used for classification. The local features from the segmentation results are passed to EfficientNet B1 to extract features for classification, and the pre-processed original images are passed to EfficientNet B1 to extract global features. Finally, these two sets of features are concatenated to extract the best patterns for classification. Experiments are carried out on the International Skin Imaging Collaboration (ISIC) datasets. The proposed methodology achieves an area under the receiver operating characteristic curve (AUC-ROC) of 0.974, 0.972, 0.962, and 0.937 on the ISIC-2017, 2018, 2019, and 2020 datasets, respectively. To the best of our knowledge, these are benchmark results. This automated methodology can help practising dermatologists make robust diagnoses.
... The authors of [7] propose a machine learning-based detection model and compare various algorithms. They also used various feature selection tools to select the 20 most valuable of the 48 features. ...