
Secure Short URL Generation Method that Recognizes Risk of Target URL


Abstract and Figures

All information and data on the Internet are linked through URLs. Many people share information by URL, but a URL is hard to pass along when it is long or contains special characters. A short URL service converts a long URL into a short one that still leads to the page holding the information. Recently, attackers have abused short URLs, sent through SMS or SNS, to distribute malicious code. Because the original URL is hard to predict from the short URL alone, short URLs are also vulnerable to phishing attacks. This study proposes a method that writes the destination information when a short URL is generated, so that a user can check whether the destination is a web document or a file. The short URL service provider monitors the risk of the target page of each generated short URL and decides whether to provide the service. By monitoring modifications to the web document, the provider measures and evaluates the risk of the web page and decides whether to block the short URL according to a threshold, which prevents attacks such as “drive-by download” through the short URL.
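The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the destination information is encoded, how risk is scored, or what the threshold is. The `w-`/`f-` prefix, the extension list, the fixed risk increment on each page change, and the threshold value are all hypothetical choices made only for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values; the abstract does not specify them.
FILE_EXTENSIONS = (".exe", ".apk", ".zip", ".pdf")
RISK_THRESHOLD = 0.5


@dataclass
class ShortLink:
    code: str              # short code, prefixed with the destination type
    target: str            # original long URL
    risk: float = 0.0      # current risk score kept by the provider
    fingerprint: str = ""  # hash of the last observed page content


def destination_type(url: str) -> str:
    """Return 'f' if the URL path looks like a file, else 'w' (web document)."""
    path = urlparse(url).path.lower()
    return "f" if path.endswith(FILE_EXTENSIONS) else "w"


def make_short_link(url: str) -> ShortLink:
    """Build a short code that carries the destination type as a visible prefix."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:6]
    return ShortLink(code=f"{destination_type(url)}-{digest}", target=url)


def update_risk(link: ShortLink, page_bytes: bytes, increment: float = 0.3) -> None:
    """Assumed scoring rule: raise the risk each time the page content changes."""
    fingerprint = hashlib.sha256(page_bytes).hexdigest()
    if link.fingerprint and fingerprint != link.fingerprint:
        link.risk += increment
    link.fingerprint = fingerprint


def resolve(link: ShortLink) -> Optional[str]:
    """Return the target URL, or None when the risk exceeds the threshold."""
    return None if link.risk > RISK_THRESHOLD else link.target
```

In this sketch a user can tell from the prefix of the short code whether the link leads to a web document or a file, and the provider stops resolving the link once repeated modifications push the assumed risk score past the threshold.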
Hyung-Jin Mun (1) · Yongzhen Li (2)
Published online: 24 December 2016
Springer Science+Business Media New York 2016
Keywords: Short URL · Short link · URL shortening · Security · Malware · Phishing · Drive by download · Smishing · Shortener service
1 Introduction
With the spread of SNS driven by advances in ICT, the need to convey various information and data on the Internet arises whenever messages are transmitted. Diverse techniques about methods to transmit the information and data on the Internet
Yongzhen Li: lyz2008@ybu.edu.cn
(1) Division of Information and Communication, Baekseok University, Cheonan 31065, Korea
(2) Department of Computer Science and Technology, Yanbian University, Yanji 133002, China
Wireless Pers Commun (2017) 93:269–283
DOI 10.1007/s11277-016-3866-8
... Malicious URLs contain vulnerabilities and pose a significant threat to the computer. This malicious website threat has become an important rising issue [17,18,19]. Many studies have proposed methods for analyzing and detecting malicious URLs. ...
... Much research is concerned with the risk to and impact on the user's computer when browsing websites. The authors of [18] proposed a risk assessment that monitors the risk of a URL by using the destination information written when a short URL is generated. By monitoring the URL, any risky URL, or any URL whose risk exceeds the threshold, is blocked to prevent malicious attacks [19], especially drive-by downloads through the short URL. ...
Article
The openness of the World Wide Web (Web) has left it more exposed to cyber-attacks. Attackers carry out cyber-attacks on the Web using malicious Uniform Resource Locators (URLs), since URLs are widely used by internet users. Therefore, an effective approach is required to detect malicious URLs and identify the nature of their attacks. This study aims to assess the efficiency of the machine learning approach in detecting and identifying malicious URLs. In this study, we applied feature optimization using a bio-inspired algorithm to select significant URL features that are able to detect malicious URLs. A machine learning approach with a static analysis technique is used for detection. Based on this combination and the significant features, this paper shows promising results with higher detection accuracy. The bio-inspired algorithm, particle swarm optimization (PSO), is used to optimize the URL features. In detecting malicious URLs, naïve Bayes and support vector machine (SVM) achieve a high detection accuracy of 99%, using URL as a feature.
... Therefore, most attackers send a short URL to victims, and it is difficult to verify which file or webpage the short URL leads to. Therefore, [35] proposed a method that composes the destination information of the short URL. Furthermore, the method analyzed the webpage, measured its risk, and blocked the short URL by comparison with a predefined threshold. ...
Article
SMS phishing is another method, in which the phisher uses SMS as the medium to communicate with victims; it is known as smishing (SMS + phishing). Researchers have proposed several anti-phishing methods in which a correlation algorithm is applied to explore the relevance of features, since the feature corpus contains numerous features. The correlation algorithm ranks the features, with the highest rank indicating the feature most relevant to the task. This paper therefore analyses four rank correlation algorithms, namely Pearson rank correlation, Spearman’s rank correlation, Kendall rank correlation, and point biserial rank correlation, with a machine-learning algorithm to determine the best feature set for detecting smishing messages. The results of the investigation reveal that the AdaBoost classifier offered better accuracy. Further analysis shows that the classifier combined with the Kendall rank correlation algorithm achieved higher accuracy than with the other correlation algorithms. The experiment confirms that the ranking algorithm was able to reduce the feature dimension by 61.53% and achieved an accuracy of 98.40%.
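Ranking features by their Kendall rank correlation with the class label, as this abstract describes, can be sketched in plain Python. This is an illustration only: the abstract does not say which Kendall variant was used, so the tie-aware tau-b form is an assumption, and the feature names and toy values in the usage lines are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b: pair-based rank correlation that accounts for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied in both variables: ignored by tau-b
        elif dx == 0:
            ties_x += 1         # tied only in x
        elif dy == 0:
            ties_y += 1         # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def rank_features(features, labels):
    """Order feature names by |tau-b| against the labels, strongest first."""
    scores = {name: abs(kendall_tau_b(values, labels))
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 1 = smishing, 0 = legitimate (hypothetical features).
labels = [1, 1, 1, 0, 0, 0]
features = {
    "has_greeting": [1, 0, 1, 0, 1, 0],
    "has_url": [1, 1, 1, 0, 0, 0],
}
ranking = rank_features(features, labels)
```

Dimension reduction then amounts to keeping only the top-ranked names from `ranking` before training a classifier.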
... It is exceptionally difficult to verify whether the short URL refers to a file or a web document. Therefore, a model was proposed by Mun et al. (2017) using the destination information of the short URL, which helps the user verify whether the destination is a web document or a file. Furthermore, the method analyzes the webpage, measures its risk, and blocks the short URL by comparison with the threshold. ...
Thesis
Phishing is a dominant threat in cyberspace where the attacker collects sensitive information from the victims by designing a fake site that appears highly similar to a legitimate website. The aim of the phishing attack is to steal sensitive credentials such as passwords, social security numbers. Recently, the phishing attacks have been expanding exponentially and target all categories of web users, which lead to a huge amount of financial loss to consumers and businesses. Although several studies have proposed a wide number of anti-phishing models to lessen the phishing attacks, there are still many unsolved research problems in the phishing domain. The attacker primarily employs three steps to trap the victims into the phishing:(a) utilize a communication channel for distributing the phishing link; (b) setup a fake URL for addressing the phishing website; (c) launch a fake site for stealing the credentials from the victims. Hence, the phishing scam could be detected in any of the aforementioned steps. This thesis classifies the anti-phishing models into four groups: communication channel-based model, URL-based model, content-based model and hybrid (URL and content) model. The objective of this thesis is to develop anti-phishing models at each of these steps so that the users get completely shielded. In each of these steps multiple filters are used making the phishing detection efficient. At the communication layer, this thesis proposes two models to stop phishing: text message-based model and email-based model. Text message on the mobile phones is one kind of communication channel where the attacker sends the phishing message(SMS phishing :smishing) to the victims to acquire the credentials. For this purpose, a text-message based model entitled SmiDCA is proposed to verify the phishing message. This model extracts 39 features from the text messages and adopts four well-known machine learning algorithms to classify a text message as a phishing or genuine message. 
Furthermore, a feature selection algorithm is employed to find the best feature set to detect smishing messages efficiently. Many attackers prefer email for distributing the phishing link as legitimate organizations consider emails as an official communication channel. Although several models have attempted to find the best feature set for detecting phishing emails, they suffer from the time delay in searching the best feature set. This thesis proposes a novel feature selection algorithm BSFS(Binary Search Feature Selection) which determines the best feature set with minimum time compared to the SFFS(Sequential Forward Feature Selection) algorithm. In some scenarios, the attacker applies other communication channels such as social networking sites, blogs or employs novel features to escape the boundaries of anti-phishing models which indicates that the user clicks the phishing links. In the URL layer, this thesis has proposed a model to detect phishing. A prominent aspect that the existing models have not taken into account concerning phishing URLs is the phoneme-based phishing domain. Hence, this thesis has addressed this novel category of phishing attacks and has developed a model entitled MMSPhiD which verifies the phoneme-based phishing URL and alerts the users. As the phoneme based phishing URL sounds very similar to a legitimate URL, it would be near impossible for users with visual impairments to detect this attack, who use a screen reader to access the computer. This model has incorporated accessibility as one of its objectives. Although many features are available in phishing URL detection models, some attackers employ genuine features to overtake the barriers of anti-phishing. This category of phishing attacks primarily concentrates on the contents of the website. The aurally similar and visual dissimilar website is one of the critical phishing attacks to fool especially persons with visual impairments. 
Thus, another content-based model aphid is proposed in this thesis which identifies phishing websites using aurally similar but visually dissimilar metrics. According to the literature survey, this would be the first anti-phishing model developed which incorporates the needs of persons with visual impairments. The one restriction of the aforementioned models is the usage of a single step to detect a phishing attack. As a result, the attacker attempts to defraud victims either by manipulating the URL or contents of the website. Hence, this thesis proposes another model entitled PhiDMA which merges the URL and contents of the website using a multi-layer approach. The novelty of this model is an accessibility based layer which harnesses the accessibility errors of the web pages in detecting the phishing scam. All the proposed models are validated using the experiments. The first model SmiDCA was able to achieve an accuracy of 96.16% even after half of the features were pruned. The Email-based model provided a better mechanism to explore the best feature set with the least time delay and minimal features to detect phishing emails. The MMSphiD model achieved an accuracy of 99.03% and has the potential to handle multiple attacks. The PhiDMA model produced the accuracy of 92.72% which performed better in comparison to URL based approach(87.18%) and search engine based approach(91.45%). The anti-phishing approaches proposed in this thesis focus on two components: a) phishing detection at various layers and b) making it accessible to persons with visual impairments. This research is a small step towards evolving anti-phishing models that incorporates the needs of persons with visual impairments as well.
... The short URL is one of the novel phishing or smishing attacks, in which users are unable to perceive the features of the linked information or data, and it is exceptionally hard to verify which file or web page the short URL leads to. A novel method was proposed [29] which composes the destination information when generating a short URL so that the user can verify whether the destination is a web page or a file. On analyzing the web page, the method measures and evaluates the risk of the web page and decides whether to block the short URL according to the threshold, which prevents attacks. ...
Article
Phishing has become a serious cyber-security issue, and it is spreading through various media such as e-mail and SMS to capture victims’ critical profile information. Although many novel anti-phishing techniques have been developed to forestall the progress of phishing, it remains an unresolved issue. Smishing is a form of phishing attack that uses the Short Messaging Service (SMS), or simple text messages on mobile phones, to lure victims into giving up their online credentials. This paper presents an anti-phishing model entitled ‘SmiDCA’ (SMIshing Detection based on Correlation Algorithm). The proposed model collected different smishing messages from various sources, and 39 distinct features were extracted initially. The SmiDCA model incorporates dimensionality reduction, and machine learning-based experiments were conducted without (BFSA) and with (AFSA) feature reduction. The model has been validated with experiments on both English and non-English datasets, and the results of both experiments are encouraging in terms of accuracy: 96.40% for the English dataset and 90.33% for the non-English dataset. In addition, the model achieved an accuracy of 96.16% even after nearly half of the features were pruned.
... The knowledge-based CES-D matrix is constructed based on highly reliable data determined from context information within the smart health system. The paper by Mun and Li [18] introduces a secure short URL-generation method that recognizes the risk posed by the target URL. The service provider of a short URL monitors the risk of the target URL page for the generated short URL and decides whether to provide the service. ...
Chapter
The second author’s affiliation was wrong in the original version. The correct affiliation is given below: Jae Dong Lee Department of Computer Science and Engineering, Seoul National University of Science and Technology, Korea
Article
Trojan malicious code is one of the largest classes of malicious code and has been known as a virus that damages the system itself. However, it has changed into a type that stealthily extracts user information through a backdoor, and worms and viruses that exhibit the characteristics of Trojan malicious code have recently increased. Although several modeling methods for analyzing the diffusion characteristics of worms have been proposed, they allow only macroscopic analysis and show limitations in estimating specific viruses and malicious codes. Thus, this study proposes an EMP model that can estimate future occurrences of Trojan malicious codes using previous Trojan data. A comparison with the result obtained by the Markov chain verifies that the value estimated by the proposed model is similar to the actual observed frequency.
Article
To protect stored personal information, many organizations and information systems adopt the role-based access control model (RBAC) or the mandatory access control model (MAC). Although individuals want to control their personal information, an individual-needs-based access control system is difficult to adopt in the existing environment. Recent proposals have included privacy-enhancing technologies such as communication anonymizers, shared bogus online accounts, and access to personal data. However, these systems cannot satisfy users’ privacy requirements. In this paper we propose two confidential access control models that apply individually established policy to existing RBAC and MAC technologies. In the SpRBAC model, a user’s right to access would follow organizational policy and accessing personal information would be restricted by subject policy. In the SpMAC model, users would have to satisfy the subject policy established by the provider of information in addition to the requirements of normal MAC policy. In the proposed models, it is possible to restrict access by authorized users according to the subject policy, that is, the policy defined by the subject (or informant—the one providing the personal information), and personal information can thus be protected.
Article
In the past, about 10 different kinds of malicious code were found per day on average. However, the number of malicious codes found has rapidly increased, reaching over 55,000 during the last 10 years. Most of these, however, are not new kinds of malicious code but variants of existing ones, created when functions are added to existing malicious code or when existing code is modified to evade anti-virus detection. To deal with the many malicious codes, including new ones and variants of existing ones, we need to measure their similarity to past malicious codes and classify the new codes and the variants accordingly. Earlier similarity calculation methods compare external factors such as IPs, URLs, APIs, and strings, or compare at the source code level. These methods take time as the number of malicious codes and comparable factors increases, which has led to the use of fuzzy hashing to reduce the amount of calculation. Existing fuzzy hashing, however, has some limitations, which cause problems for the similarity calculation. Therefore, this paper suggests a new comparison method for malicious codes that improves the performance of similarity calculation using fuzzy hashing, along with a classification method that employs the new comparison method.
Conference Paper
Drive-by download attacks, where web browsers are subverted by malicious content delivered by web servers, have become a common attack vector in recent years. Several methods for the detection of malicious content on web pages, using data mining techniques to classify web pages as malicious or benign, have been proposed in the literature. However, each proposed method uses different content features for the classification, and there is a lack of high-level frameworks for comparing these methods based upon their choice of detection features. The lack of a framework makes it problematic to develop experiments that compare the effectiveness of methods based upon different selections of features. This paper presents such a framework, derived from an analysis of drive-by download attacks that focuses upon potential state changes seen when Internet browsers render HTML documents. This framework can be used to identify potential features that have not yet been exploited and to reason about the challenges of using those features in detecting drive-by download attacks.
Article
A URL shortening service provides a short URL in place of a long URL and redirects the short URL to the long one. As the number of microblog users has increased rapidly, and because short URLs are convenient to create and use, they have become common under the length limits of microblog posts. E-mail, SMS, and books also make good use of short URLs because of their simplicity. However, most short URLs bear no relation to their target URLs, so the user cannot anticipate the target. To address this problem, there have been attempts such as changing the name of the shortening service, inserting information about the website into the short URL, and using a shortcode of a physical address. Each has its limits: difficulty of automation, relatively long addresses, and a narrow range of applicable targets. SHRT complements these attempts, drawing its idea from the Arabic writing system. Although Arabic writing has no vowel letters, Arabs have no difficulty understanding it. This paper proposes SHRT, a new method of URL shortening. SHRT lets the user guess the target URL by using a related word, taken from the lowest-level domain of the target URL, with its vowels removed.
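The vowel-dropping idea behind SHRT can be sketched in a few lines of Python. The abstract does not define precisely which part of the host counts as the "lowest domain" or how the related word is chosen, so this sketch makes a simplifying assumption: it takes the first host label after an optional `www` and removes the vowels a, e, i, o, u. The function name `shrt_hint` is hypothetical.

```python
from urllib.parse import urlparse

VOWELS = set("aeiou")


def shrt_hint(url: str) -> str:
    """Return a vowel-free hint word derived from the URL's host name.

    Assumption: the hint word is the first host label after an optional
    'www'; the SHRT paper may select the word differently.
    """
    host = urlparse(url).hostname or ""   # hostname is already lowercased
    labels = [label for label in host.split(".") if label and label != "www"]
    if not labels:
        return ""
    return "".join(ch for ch in labels[0] if ch not in VOWELS)


hint = shrt_hint("http://www.wikipedia.org/wiki/URL_shortening")
```

For the URL above, `hint` is `"wkpd"`, which a reader can plausibly map back to the target site, unlike a random short code.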
Article
Recently, the paradigm of cyber attacks has been changing due to improvements in information security technology. Cyber attacks that use social engineering and target end users have been increasing as organizations' system and network security controls have been tightened. 91% of APT (Advanced Persistent Threat) attacks, which target an enterprise or a government agency to obtain important data and disable critical services, start with a spear phishing email. In this paper, we analyse the security threats and characteristics of spear phishing in detail and explain why technical solutions are not enough to prevent spear phishing attacks. We therefore propose administrative prevention methods for spear phishing attacks.
Article
People-centric sensing (PCS) is an emerging sensor network paradigm that turns everyday mobile devices (such as smartphones and PDAs) into sensors. It is promising but faces severe security problems. Smartphones are already attractive targets for attackers and will remain so; moreover, with strong connectivity and homogeneous applications, all mobile devices in PCS risk being infected by malware more rapidly. Even worse, attackers usually obfuscate their malware in order to avoid simple (syntactic signature based) detection. Thus, more intelligent (behavioral signature based) detection is needed. But in the field of network security, the state-of-the-art behavioral signature, the behavior graph, is too complicated to be used on mobile devices. This paper proposes a novel behavioral signature generation system, SimBehavior, to generate lightweight behavioral signatures for malware detection in PCS. A generated lightweight behavioral signature is somewhat like a regex (regular expression) rule. Thus, unlike malware detection using behavior graphs, which is NP-complete, detection using our lightweight behavioral signatures is efficient and well suited to malware detection in PCS. Our experimental results show that SimBehavior can extract behavioral signatures effectively, and that the generated lightweight behavioral signatures can be used to detect new malware samples in PCS efficiently and effectively.
Article
URL shortener services today have come to play an important role in our social media landscape. They direct user attention and disseminate information in online social media such as Twitter or Facebook. Shortener services typically provide short URLs in exchange for long URLs. These short URLs can then be shared and diffused by users via online social media, e-mail or other forms of electronic communication. When another user clicks on the shortened URL, she will be redirected to the underlying long URL. Shortened URLs can serve many legitimate purposes, such as click tracking, but can also serve illicit behavior such as fraud, deceit and spam. Although usage of URL shortener services today is ubiquitous, our research community knows little about how exactly these services are used and what purposes they serve. In this paper, we study usage logs of a URL shortener service that has been operated by our group for more than a year. We expose the extent of spamming taking place in our logs, and provide first insights into the planetary scale of this problem. Our results are relevant for researchers and engineers interested in understanding the emerging phenomenon and dangers of spamming via URL shortener services.
Article
As well-developed civilian wireless network infrastructure has been built and mobile devices have become increasingly widespread, many mobile police information systems have been established to provide information to police anywhere and anytime. The Short Message Service (SMS) is a widely used means of communication in these applications, but its lack of security makes it unsuitable for transmitting confidential data. In this paper, we first discuss the security requirements of the mobile police information system, then propose a security strategy for transmitting information confidentially over the public mobile network using SMS as a bearer, and describe the authentication and communication protocol in detail. Lastly, we give a performance comparison of the mobile police information system working in secure mode with that working in ordinary mode.