Conference Paper

Users' behavioral prediction for phishing detection


Abstract

This study explores users' web browsing behaviors in phishing situations for context-aware phishing detection. We extract discriminative features from each clicked URL, i.e., domain name, bag-of-words, generic Top-Level Domains, IP address, and port number, to develop a linear-chain CRF model for users' behavioral prediction. Large-scale experiments show that our method achieves promising performance in predicting the phishing threats of users' next accesses. Error analysis indicates that our model yields a favorably low false positive rate. In practice, our solution is complementary to existing anti-phishing techniques for cost-effectively blocking phishing threats from a user-behavior perspective.
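
The per-URL features named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the features are encoded, so the function names (`url_features`, `session_features`), the feature keys, the tokenization rule, and the short `GENERIC_TLDS` set are all assumptions made for illustration, not the authors' implementation.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Illustrative subset of generic TLDs (assumption; the paper's list is not given).
GENERIC_TLDS = {"com", "org", "net", "info", "biz"}


def url_features(url):
    """Map one clicked URL to a feature dict (hypothetical feature names)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # IP address feature: is the host a literal IP rather than a domain name?
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    # Generic TLD feature: last label of the host name, if it is a known gTLD.
    tld = "" if host_is_ip else host.rsplit(".", 1)[-1]

    # Bag-of-words feature: alphanumeric tokens from the path and query string.
    text = parts.path + " " + parts.query
    words = [w for w in re.split(r"[^A-Za-z0-9]+", text) if w]

    features = {
        "domain": host,
        "host_is_ip": host_is_ip,
        "gtld": tld if tld in GENERIC_TLDS else "other",
        "port": parts.port if parts.port is not None else "default",
    }
    for word in words:
        features["word=" + word.lower()] = True
    return features


def session_features(urls):
    """Turn one user's ordered click sequence into a sequence of feature dicts."""
    return [url_features(u) for u in urls]
```

A sequence model such as a linear-chain CRF would then be trained on one such list per browsing session, paired with a label per click; the labeling scheme is not described in the abstract and is left out here.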


... [ a Yahoo URL Generator (http://random.yahoo.com/bin/ryl) b CommonCrawl [33], Gmail directory [34], Weibo Sina API [63], Google Top 1000, Yahoo, Netcraft, Millersmiles [54] c URL feeds and logs from Web [55], Cyveillance [64], Bitdefender Laboratories [68] d Datasets collected by authors in [22], [56], [57] e Phishing messages from Weibo Sina API [63], URLs from UAB Spam DataMine email messages [38], [64] f Click-through data from the Trend Micro research laboratory [69], Private honeypot [67], T.Co. Labs [66] [77] a MalwareDomainslist [35] and malwareurl [48] b Malware private sources [77] imbalanced, or almost balanced, 9 data. ...
... , [29], [66], [69], [76], [77], [84] ≥4 5 [21], [41], [49], [67], [88] We next describe the different evaluation metrics used in phishing URL detection papers. ...
... Misspelled/Bad domain name 8 U [36], [42], [43], [56], [68], [71], [79], [88] Top Level Domain features 22 U,W,E [36], [38], [43], [46], [64], [68], [69], [77], [84], uW: [104], [106], [109], [112], [137], [139], [152], [154], [156], uE: [162], [169], [183], [193] TTL c value of DNS 9 U,W,E [28], [43], [47], [69], uW: [92], [101], [131], [144], uE: [189] Age of Domain 13 U,W,E [52], uW: [106], [112], [118], [119], [123], [124], [126], [131], [138], [150], [154], uE: [183] Ranking d based features 30 U,W,E [28], [33], [42], [49]- [52], [56], [58], [65], [75], [80], [87], uE: [183], [189] uW: [93], [99], [101], [104], [106], [110], [112], [113], [119], [121], [125], [130], [131], [138], [154] Hostname based ...
Article
Phishing and spear phishing are typical examples of masquerade attacks since trust is built up through impersonation for the attack to succeed. Given the prevalence of these attacks, considerable research has been conducted on these problems along multiple dimensions. We reexamine the existing research on phishing and spear phishing from the perspective of the unique needs of the security domain, which we call security challenges: real-time detection, active attacker, dataset quality and base-rate fallacy. We explain these challenges and then survey the existing phishing/spear phishing solutions in their light. This viewpoint consolidates the literature and illuminates several opportunities for improving existing solutions. We organize the existing literature based on detection techniques for different attack vectors (e.g., URLs, websites, emails) along with studies on user awareness. For detection techniques we examine properties of the dataset, feature extraction, detection algorithms used, and performance evaluation metrics. This work can help guide the development of more effective defenses for phishing, spear phishing and email masquerade attacks of the future, as well as provide a framework for a thorough evaluation and comparison.
... [35], [48] [59] Twitter (pub.) [29], [30], [ a CommonCrawl [33], Gmail directory [34], Weibo Sina API [64] b URL feeds and logs from Web [55], Cyveillance [65], Bitdefender Laboratories [69] c Datasets collected by authors in [22], [56], [57] d Phishing messages from Weibo Sina API [64], URLs from UAB Spam DataMine email messages [38], [65] e Click-through data from the Trend Micro research laboratory [70], Private honeypot [68], T.Co. Labs [67] f Yahoo URL Generator (http://random.yahoo.com/bin/ryl) ...
... , [29], [67], [70], [77], [78], [85] ≥4 5 [21], [41], [49], [89], [68] We next describe the different evaluation metrics used in phishing URL detection papers. ...
... Misspelled/Bad domain name 8 U [36], [42], [43], [56], [69], [72], [80], [89] Top Level Domain features 22 U,W,E [36], [38], [43], [46], [65], [69], [70], [78], [85], uW: [105], [107], [110], [113], [138], [139], [152], [154], [157], uE: [163], [170], [182], [194] TTL c value of DNS 9 U,W,E [28], [43], [47], [70], uW: [93], [102], [132], [144], uE: [186] Age of Domain 13 U,W,E [52], uW: [107], [113], [119], [120], [124], [125], [127], [132], [150], [154], [155], uE: [182] Ranking d based features 30 U,W,E [28], [33], [42], [49]- [52], [56], [58], [66], [76], [81], [88], uE: [182], [186] uW: [94], [100], [102], [105], [107], [111], [113], [114], [120], [122], [126], [131], [132], [154], [155] Hostname based ...
Preprint
Phishing and spear-phishing are typical examples of masquerade attacks since trust is built up through impersonation for the attack to succeed. Given the prevalence of these attacks, considerable research has been conducted on these problems along multiple dimensions. We reexamine the existing research on phishing and spear-phishing from the perspective of the unique needs of the security domain, which we call security challenges: real-time detection, active attacker, dataset quality and base-rate fallacy. We explain these challenges and then survey the existing phishing/spear phishing solutions in their light. This viewpoint consolidates the literature and illuminates several opportunities for improving existing solutions. We organize the existing literature based on detection techniques for different attack vectors (e.g., URLs, websites, emails) along with studies on user awareness. For detection techniques, we examine properties of the dataset, feature extraction, detection algorithms used, and performance evaluation metrics. This work can help guide the development of more effective defenses for phishing, spear-phishing, and email masquerade attacks of the future, as well as provide a framework for a thorough evaluation and comparison.
... Lee et al. [19] exploit a linear-chain CRF model to study users' web browsing behaviors when faced with phishing situations and then make behavioral predictions for context-aware phishing detection. Experiments showed good performance in predicting and blocking phishing threats based on user behaviors. ...
... complexity score of the P , C(P );18 compute the complexity score of page L, C(L);19 compute the match score of page L and P , M (P, L);20 compute the similarity score between page L and P , ...
Article
Social networks have become one of the most popular platforms for users to interact with each other. Given the huge amount of sensitive data available on social network platforms, user privacy protection on social networks has become one of the most urgent research issues. As a traditional information-stealing technique, phishing attacks still cause many privacy violation incidents. In a web-based phishing attack, an attacker sets up scam web pages (pretending to be an important web site such as a social network portal) to lure users into entering their private information, such as passwords, social security numbers, and credit card numbers. In fact, the appearance of web pages is among the most important factors in deceiving users, and thus the similarity among web pages is a critical metric for detecting phishing web sites. In this paper, we present a new solution, called Phishing-Alarm, to detect phishing attacks using features that are hard for attackers to evade. In particular, we present an algorithm to quantify the suspiciousness ratings of web pages based on the similarity of visual appearance between the web pages. Since Cascading Style Sheets (CSS) is the technique used to specify page layout across browser implementations, our approach uses CSS as the basis to accurately quantify the visual similarity of each page element. As page elements do not all have the same influence on pages, we base our rating method on weighted page-component similarity. We prototyped our approach in the Google Chrome browser. Our large-scale evaluation using real-world websites shows the effectiveness of our approach. The proof-of-concept implementation verifies the correctness and accuracy of our approach with a relatively low performance overhead.
... Several techniques have been developed by considering user behavior (Lee et al. [10], Rao & Pais [11], Rao & Pais [12]). Zhang and Gupta [13] state in their paper that user behavior in phishing attacks is influenced by user attributes and online experience. ...
Article
Phishing remains a popular security attack with specific characteristics. The attacks have an economic impact and depend on user behaviour to succeed. Many solution frameworks and programs have been developed, but they fail to eliminate the attack and only minimize its impact. Mitigating and coping with phishing attacks involves user education and training to raise user awareness. The mitigation process itself has two main steps: prevention and detection. Many algorithms employ machine learning to counter the attack intelligently. The remaining problem is that detection is slower than with blacklist filtering. The paper reviews social and technical aspects to identify the potential solution that will be proposed in the next stage of the research. The potential solution will include future data analytics, such as immersive and augmented analytics, regardless of computing issues. Immersive and augmented analytics should be able to learn from past data on user and attack behaviour to direct the system and the user in combating the attacks. The learning result should give preventive suggestions and evaluate the user's awareness level.
... It trains these features using SVM (Support Vector Machine) to detect phishing attacks. Abu-Nimeh et al. [2] compared six machine learning algorithms for phishing detection, including Bayesian Additive Regression Trees, Logistic Regression, SVM, RF, Neural Network, and Regression Tree. Lee et al. [13] leverage a linear-chain CRF model to understand the web browsing behaviors of users on phishing web sites. It then predicts behavior in context to detect phishing attacks. ...
Article
Phishing websites are typical starting points of online social engineering attacks, including many recent online scams. The attackers develop web pages mimicking legitimate websites and send the malicious URLs to victims to lure them into entering their sensitive information. Existing phishing defense mechanisms are not sufficient to detect new phishing attacks. In this paper, we aim to improve phishing detection using machine learning techniques. In particular, we propose a learning-based aggregation analysis mechanism to decide page layout similarity, which is used to detect phishing pages. Our experimental results show that our approach is accurate and effective in detecting phishing pages.
... Phishing is metaphorically similar to fishing in the water, but instead of trying to catch a fish, attackers try to steal consumers' personal information [10,11]. When a user opens a fake webpage and enters a username and password, the user's credentials are acquired by the attacker and can be used for malicious purposes [12][13][14][15][16][17][18][19][20][21][22]. Phishing websites look very similar in appearance to their corresponding legitimate websites to attract a large number of Internet users. ...
Article
Phishing is one of the major problems faced by the cyber-world and leads to financial losses for both industries and individuals. Detecting phishing attacks with high accuracy has always been a challenging issue. At present, visual similarity based techniques are very useful for detecting phishing websites efficiently. A phishing website looks very similar in appearance to its corresponding legitimate website to deceive users into believing that they are browsing the correct website. Visual similarity based phishing detection techniques utilise feature sets such as text content, text format, HTML tags, Cascading Style Sheets (CSS), images, and so forth, to make the decision. These approaches compare the suspicious website with the corresponding legitimate website using various features, and if the similarity is greater than a predefined threshold value, the website is declared phishing. This paper presents a comprehensive analysis of phishing attacks, their exploitation, some of the recent visual similarity based approaches for phishing detection, and a comparative study of them. Our survey provides a better understanding of the problem, the current solution space, and the scope of future research to deal with phishing attacks efficiently using visual similarity based approaches.
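
The threshold rule described in this abstract can be sketched as follows. This is an illustration only: the exact-match comparison per feature, the weights, the feature names, and the default threshold of 0.7 are assumptions, and the sketch presumes the suspicious page is hosted on a different domain from the legitimate one it is compared against.

```python
def weighted_similarity(suspicious, legitimate, weights):
    """Weighted fraction of features on which the two pages agree.

    Each page is a dict mapping a feature name (e.g. "text", "css") to a
    comparable value; `weights` gives the importance of each feature.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for name, weight in weights.items()
        if name in suspicious and suspicious.get(name) == legitimate.get(name)
    )
    return matched / total


def is_phishing(suspicious, legitimate, weights, threshold=0.7):
    """Declare phishing when similarity exceeds the predefined threshold."""
    return weighted_similarity(suspicious, legitimate, weights) > threshold
```

Real systems replace the exact-match test with a graded similarity per feature (for example, a distance between CSS property sets), but the final comparison against a threshold has the same shape.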
... Mis-classifications affect the perceived reliability of the service and users are likely to be quite intolerant to "losing" legitimate messages. Techniques to detect phishing websites include blacklists, machine learning (Whittaker et al., 2010), URL feature classification and domain name analysis, visual similarity assessment (Fu et al., 2006), contextual analysis and user behavioural prediction (Lee et al., 2014), and crowdsourcing (OpenDNS, 2014). Some blacklists, such as Google's (Whittaker et al., 2010), use automated machine learning. ...
Article
We have conducted a user study to assess whether improved browser security indicators and increased awareness of phishing have led to users' improved ability to protect themselves against such attacks. Participants were shown a series of websites and asked to identify the phishing websites. We use eye tracking to obtain objective quantitative data on which visual cues draw users' attention as they determine the legitimacy of websites. Our results show that users successfully detected only 53% of phishing websites even when primed to identify them and that they generally spend very little time gazing at security indicators compared to website content when making assessments. However, we found that gaze time on browser chrome elements does correlate to increased ability to detect phishing. Interestingly, users' general technical proficiency does not correlate with improved detection scores.
... The access contexts in which users fall into phishing situations have been explored from a behavioral perspective [7]. Users' browsing behaviors when confronting phishing dangers are studied for context-aware phishing detection [8]. ...
Conference Paper
This study explores the existing blacklists to discover suspected URLs that refer to on-the-fly phishing threats in real time. We propose a PhishTrack framework that includes redirection tracking and form tracking components to update the phishing blacklists. It actively finds phishing URLs as early as possible. Experimental results show that our proactive phishing update method is an effective and efficient approach for improving the coverage of the blacklists. In practice, our solution is complementary to the existing anti-phishing techniques for providing secured web surfing.
Chapter
The electroencephalogram is a test used to keep track of brain activity. These signals are generally used in clinical settings to identify various brain activities that occur during specific tasks and to design brain–machine interfaces to help with prostheses, orthoses, exoskeletons, etc. One of the tedious tasks in designing a brain–machine interface application is processing EEG signals acquired from a real-time environment. The complexity arises from the fact that the signals are noisy, non-stationary, and high-dimensional in nature. So, building a robust BMI depends on the efficient processing of these signals. Optimal selection of features from the signals and of the classifiers used plays a vital role in building efficient devices. This paper concentrates on surveying the recent feature selection, feature extraction, and classification algorithms used in various applications for the development of BMI. Keywords: EEG, Prosthesis, Orthosis, Exoskeletons
Article
Web technology has become the cornerstone of a wide range of platforms, such as mobile services and smart Internet-of-things (IoT) systems. In such platforms, users' data are aggregated to a cloud-based platform, where web applications are used as a key interface to access and configure user data. Securing the web interface requires solutions to deal with threats from both technical vulnerabilities and social factors. Phishing attacks are one of the most commonly exploited vectors in social engineering attacks. The attackers use web pages visually mimicking legitimate web sites, such as banking and government services, to collect users' sensitive information. Existing phishing defense mechanisms based on URLs or page contents are often evaded by attackers. Recent research has demonstrated that visual layout similarity can be used as a robust basis to detect phishing attacks. In particular, features extracted from CSS layout files can be used to measure page similarity. However, this requires human expertise in specifying how to measure page similarity based on such features. In this paper, we aim to enable automated page-layout-based phishing detection using machine learning techniques. We propose a learning-based aggregation analysis mechanism to decide page layout similarity, which is used to detect phishing pages. We prototype our solution and evaluate four popular machine learning classifiers on their accuracy and the factors affecting their results.
Conference Paper
In a web-based phishing attack, an attacker sets up scam web pages to deceive users into entering their sensitive information. The appearance of web pages plays an important role in deceiving users, and thus is a critical metric for detecting phishing web sites. In this paper, we propose a robust phishing page detection mechanism based on web pages' visual similarity. To measure the similarity of the suspicious pages and victim pages accurately, we extract features from the Cascading Style Sheets (CSS) of web pages, and select the effective feature sets for similarity rating. We prototyped our approach in the Google Chrome browser and used it to analyze suspicious web pages. The proof-of-concept implementation verifies the effectiveness of our algorithm with a low performance overhead.
Conference Paper
This paper studies the feasibility of an early warning system that protects users from the dangerous situations they may fall into during web surfing. Our approach adopts behavioral Hidden Markov Models to explore the collective intelligence embedded in users' browsing behaviors for context-aware category prediction, and applies the results to web security threat prevention. Large-scale experiments show that our proposed method achieves an accuracy of 0.463 in predicting the fine-grained categories of users' next accesses. In real-life filtering simulations, our method achieves a macro-averaged blocking rate of 0.4293 in finding web security threats that cannot be detected at an early stage by existing security protection solutions, while maintaining a low macro-averaged over-blocking rate of 0.0005 over time. In addition, the behavioral HMM is able to alert users to security threats 8.4 hours earlier than the current URL filtering engine does. Our simulations show that shortening this lag time is critical to avoiding severe diffusion of security threats.
Conference Paper
Phishing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. In this paper, we present the design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm. We also discuss the design and evaluation of several heuristics we developed to reduce false positives. Our experiments show that CANTINA is good at detecting phishing sites, correctly labeling approximately 95% of phishing sites.
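
To make the TF-IDF weighting mentioned in this abstract concrete, the following sketch ranks the terms of one page against a small reference corpus. The abstract does not say how CANTINA turns these scores into a phishing decision, so this shows only the generic weighting step; the function names, the smoothed IDF formula, and the choice of `k` are assumptions.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each term of a tokenized page against a corpus of tokenized pages."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        # Document frequency: how many corpus pages contain the term.
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed inverse document frequency (one common variant, assumed here).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        # Term frequency normalized by page length.
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def top_terms(doc_tokens, corpus, k=5):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = tf_idf(doc_tokens, corpus)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

Terms that are frequent on the page but rare in the corpus score highest, which is why brand names tend to dominate such a ranking.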
Conference Paper
Phishing is a significant security threat to the Internet, which causes tremendous economic loss every year. In this paper, we proposed a novel hybrid phish detection method based on information extraction (IE) and information retrieval (IR) techniques. The identity-based component of our method detects phishing webpages by directly discovering the inconsistency between their identity and the identity they are imitating. The keywords-retrieval component utilizes IR algorithms exploiting the power of search engines to identify phish. Our method requires no training data, no prior knowledge of phishing signatures and specific implementations, and thus is able to adapt quickly to constantly appearing new phishing patterns. Comprehensive experiments over a diverse spectrum of data sources with 11449 pages show that both components have a low false positive rate and the stacked approach achieves a true positive rate of 90.06% with a false positive rate of 1.95%.