Article

Short links under attack: Geographical analysis of spam in a URL shortener network

Abstract

URL shortener services today have come to play an important role in our social media landscape. They direct user attention and disseminate information in online social media such as Twitter or Facebook. Shortener services typically provide short URLs in exchange for long URLs. These short URLs can then be shared and diffused by users via online social media, e-mail or other forms of electronic communication. When another user clicks on the shortened URL, she will be redirected to the underlying long URL. Shortened URLs can serve many legitimate purposes, such as click tracking, but can also enable illicit behavior such as fraud, deceit and spam. Although usage of URL shortener services today is ubiquitous, our research community knows little about how exactly these services are used and what purposes they serve. In this paper, we study usage logs of a URL shortener service that has been operated by our group for more than a year. We expose the extent of spamming taking place in our logs, and provide first insights into the planetary scale of this problem. Our results are relevant for researchers and engineers interested in understanding the emerging phenomenon and dangers of spamming via URL shortener services.

... These malicious links can be: i) spam, irrelevant messages sent to a large number of people online; ii) scam, online fraud to mislead people; iii) phishing, online fraud to obtain user credentials; or iv) malware, automatically downloaded content that damages the system. 2 Link obfuscation makes short URL spam more difficult to detect than traditional long URL spam. Malicious long URLs can be detected with a direct domain lookup or a simple blacklist check, while short URLs can easily evade such techniques. ...
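The evasion described in the excerpt above can be illustrated with a minimal sketch (the domain names and blacklist entries are hypothetical): a naive blacklist check inspects only the domain visible in the URL, so a short URL passes even when its hidden landing page would be blocked.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of known-bad landing domains.
BLACKLIST = {"malware-landing.example", "phish.example"}

def is_blacklisted(url: str) -> bool:
    """Naive check: looks only at the domain visible in the URL itself."""
    return urlparse(url).hostname in BLACKLIST

# The long URL is caught by a direct domain lookup...
assert is_blacklisted("http://malware-landing.example/payload")
# ...but the same page behind a shortener escapes, because only the
# shortener's domain, not the hidden landing domain, is visible.
assert not is_blacklisted("http://sho.rt/a1B2c3")
```

A robust filter would therefore first resolve the redirect chain (e.g. by following HTTP `Location` headers) and apply the check to the final landing domain instead.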
... According to a threat activity report by Symantec in 2010 [20], around 65% of malicious URLs on OSM were shortened URLs. Another study, in 2012, investigated a particular URL shortening service (qr.cx) and revealed that around 80% of shortened URLs from this service contained spam-related content [2]. Research by the URL shortener yi.tl reveals that, because of the deep penetration of spam, 614 out of 1,002 URL shortening services became non-functional in 2012. ...
... Their results show that most of the tweets containing phishing URLs come from inorganic (automated) accounts. Later, in 2012, Klien et al. presented the global usage pattern of short URLs by setting up their own URL shortening service, and found 80% of short URL content to be spam related [2]. In 2013, Maggi et al. performed a large-scale study on 25 million short URLs belonging to 622 distinct URL shortening services [4]. ...
Article
Full-text available
Existence of spam URLs over emails and Online Social Media (OSM) has become a massive e-crime. To counter the dissemination of long complex URLs in emails and the character limit imposed on various OSM (like Twitter), the concept of URL shortening has gained a lot of traction. URL shorteners take as input a long URL and output a short URL with the same landing page (as in the long URL) in return. With their immense popularity over time, URL shorteners have become a prime target for attackers, giving them a way to conceal malicious content. Bitly, a leading service among all shortening services, is being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This imposes additional performance pressure on Bitly and other URL shorteners to detect and take timely action against illegitimate content. In this study, we analyzed a dataset of 763,160 short URLs marked suspicious by Bitly in the month of October 2013. Our results reveal that Bitly is not using its claimed spam detection services very effectively. We also show how a suspicious Bitly account goes unnoticed despite prolonged, recurrent illegitimate activity. Bitly displays a warning page on identification of suspicious links, but we observed this approach to be weak in controlling the overall propagation of spam. We also identified some short URL based features and coupled them with two domain specific features to classify a Bitly URL as malicious or benign, achieving an accuracy of 86.41%. The feature set identified can be generalized to other URL shortening services as well. To the best of our knowledge, this is the first large scale study to highlight the issues with the implementation of Bitly's spam detection policies and to propose suitable countermeasures.
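The abstract above does not enumerate the "short URL based features" it uses. Purely as an illustration, here is a sketch of the kind of lexical features such URL classifiers commonly compute; these particular features are our assumption, not the paper's actual feature set.

```python
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """Extract simple lexical features often used in malicious-URL classifiers."""
    parsed = urlparse(url)
    token = parsed.path.lstrip("/")  # the short token after the domain
    return {
        "url_length": len(url),
        "token_length": len(token),
        "digit_ratio": sum(c.isdigit() for c in token) / max(len(token), 1),
        "has_uppercase": any(c.isupper() for c in token),
        "scheme_https": parsed.scheme == "https",
    }

feats = url_features("http://bit.ly/1aB2cD")
assert feats["token_length"] == 6
assert feats["has_uppercase"] is True
```

A trained classifier would consume such a feature vector alongside domain-specific signals (e.g. account activity) rather than any single feature in isolation.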
... In 2012, one major attack occurred in which the U.S. federal government's official short link service usa.gov (in collaboration with Bitly) was hijacked to spread a work-from-home scam. 2 Such attacks, which target seemingly secure and highly trusted web sources, are alarming and bring to light the massive impact of exploiting shortening services. All this imposes additional performance pressure on Bitly and other URL shorteners to detect and take timely action against illegitimate content. ...
... According to a threat activity report by Symantec in 2010 [20], around 65% of malicious URLs on OSM are shortened URLs. Another study, in 2012, reveals that around 80% of shortened URLs contained spam-related content [2]. ...
... These third party services, like the TweetDeck API, Twitterfeed, Tweetbot, etc., provide a single interface for users to shorten and share links on multiple OSM. 2 Since we collect our dataset using Twitter, we focus only on Twitter based applications. Twitterfeed is a service to feed content to Twitter. ...
Article
Full-text available
Existence of spam URLs over emails and Online Social Media (OSM) has become a growing phenomenon. To counter the dissemination issues associated with long complex URLs in emails and the character limit imposed on various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take as input a long URL and give a short URL with the same landing page in return. With its immense popularity over time, URL shortening has become a prime target for attackers, giving them a way to conceal malicious content. Bitly, a leading service in this domain, is being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This imposes additional performance pressure on Bitly and other URL shorteners to detect and take timely action against illegitimate content. In this study, we analyzed a dataset marked as suspicious by Bitly in the month of October 2013 to highlight some ground issues in their spam detection mechanism. In addition, we identified some short URL based features and coupled them with two domain specific features to classify a Bitly URL as malicious or benign, achieving a maximum accuracy of 86.41%. To the best of our knowledge, this is the first large scale study to highlight the issues with Bitly's spam detection policies and to propose a suitable countermeasure.
... Some microblogging sites like Twitter limit the number of characters for every tweet. To reduce the content into this limited view, various software and algorithms are used for preprocessing the contents [29]. Shortened URLs hide malicious links that spread spam messages and obscure the original meaning of the content. ...
Chapter
In recent years, spammers have permeated every social network platform. The growth in users of social platforms like Instagram, Facebook, Sina Weibo, Twitter, YouTube, etc. has opened new avenues for spammers. To gain financial benefit, spammers exploit social network platforms through the flood of content shared by users. In order to protect user information and accounts from spammers, many researchers have proposed defensive mechanisms and detection methods. Over the last few years, researchers have developed many spam detection methods which improve the protection of users' accounts. We are therefore motivated to survey the various methods by which spammers spread spam and the detection mechanisms available to protect users and their account information. The appropriate spammer detection solution differs based on the collected dataset and predictive user analysis. This survey includes: (1) various detection mechanisms, including the ways spam spreads; (2) analysis of user content based on account information and posts, including a comparative analysis of various methodologies for the detection of spammers; and (3) open issues and challenges in social network spam detection techniques.
... The most popular key segments are about URLs. The related works that fall into this category are listed as follows: Chu et al. (2010), Gao et al. (2010), Grier et al. (2010), Klien and Strohmaier (2012), Lee and Kim (2012), Ma et al. (2009), McGrath and Gupta (2008), Stringhini et al. (2010), Thomas et al. (2011), Whittaker et al., and Zhang et al. (2012). In particular, the works Lee and Kim (2013) and Zhang et al. (2016) were published in the last three years. ...
... First, many URL based detection methods focus on shortened URLs. Since Twitter limits the number of characters in every tweet, URLs which are usually composed of many characters will be pre-processed by shortening services in order to accommodate more descriptive words in tweets (Chu et al., 2010;Grier et al., 2010;Klien and Strohmaier, 2012;McGrath and Gupta, 2008;Stringhini et al., 2010). Shortened URLs may obfuscate the direct lexical meanings and hide malicious links. ...
Article
Twitter spam has long been a critical but difficult problem to address. So far, researchers have proposed many detection and defence methods to protect Twitter users from spamming activities. Particularly in the last three years, many innovative methods have been developed, which have greatly improved detection accuracy and efficiency compared to those proposed earlier. We are therefore motivated to work out a new survey of Twitter spam detection techniques. This survey includes three parts: 1) A literature review of the state of the art: this part provides detailed analysis (e.g. taxonomies and biases in feature selection) and discussion (e.g. pros and cons of each typical method); 2) Comparative studies: we compare the performance of various typical methods on a universal testbed (i.e. same datasets and ground truths) to provide a quantitative understanding of current methods; 3) Open issues: the final part summarises the unsolved challenges in current Twitter spam detection techniques. Solutions to these open issues are of great significance to both academia and industry. Readers of this survey may include those who do or do not have expertise in this area and those who are looking for a deep understanding of this field in order to develop new methods.
... This threatens the development of the younger generation drawn into cybercrime, and can leave permanent scars and damage on that generation if it cannot be restricted. Clustering can be regarded as the most important unsupervised learning problem [14]. ...
Article
Full-text available
Cyber law is a term that refers to all legal and regulatory aspects of the World Wide Web and the internet. Since cybercrime is a newly specialized and growing field, a lot of development has to take place in cyber laws in terms of putting in place the relevant legal mechanisms for controlling and preventing cybercrime. Social networks are susceptible to malicious messages containing URLs for spam, phishing and malware distribution. We deal mainly with Twitter, where we find suspicious addresses by using a manually coded application tool from which we call the API. In this research, we propose Warning Tool, a suspicious address detection scheme for Twitter. Our structure investigates the correlation of URL redirect chains that most regularly share the same URLs. We develop methods to determine correlated URLs using the application tool. Phishing website detection can be said to be new to the arena. Phishing websites are considered one of the most lethal weapons to embezzle one's personal information and use it for the cracker's benefit. The detection course has been divided into two steps: 1) feature extraction, where the ideal features are extracted to capture the nature of the file samples and the phishing websites; and 2) categorization, where exceptional techniques are used to automatically group the file samples and the websites into different classes.
... As with traditional email spam [40] as well as social spam [10], URLs play a key role, comprising 57% of all spam messages. Spammers also use URL shorteners, which are known to be a key mechanism for hiding spam URLs from users [25,42]. Since WhatsApp is phone-based, 1/4 of the spam messages include a phone number, possibly as a way to engage users outside of the platform. ...
Preprint
Full-text available
WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, spam on WhatsApp is an important issue. Despite this, the distribution of spam via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. This paper addresses this gap by studying spam on a dataset of 2.6 million messages sent to 5,051 public WhatsApp groups in India over 300 days. First, we characterise spam content shared within public groups and find that nearly 1 in 10 messages is spam. We observe a wide selection of topics ranging from job ads to adult content, and find that spammers post both URLs and phone numbers to promote material. Second, we inspect the nature of spammers themselves. We find that spam is often disseminated by groups of phone numbers, and that spam messages are generally shared for longer duration than non-spam messages. Finally, we devise content and activity based detection algorithms that can counter spam.
... Our analysis relies on short URLs collected from two vantage points: i) crawling URL shortening services on a broad scale, and ii) gathering messages from Twitter. The former offers a general view of the usage of short URLs, while the latter gives an additional focused view of the use of shortening services by social networks [9]. Our research shows that the domain and website popularity observed from short URLs differs greatly from the distributions offered by well-published rankings such as Alexa. ...
Article
Full-text available
In this paper, we present a fake-detection system for identifying suspicious Twitter URLs. Our system looks at URL links extracted from tweets. Given that attackers have limited assets and usually reuse them, their redirect chains often share the same URLs. We construct techniques to detect these frequently shared URLs and evaluate their suspiciousness. We acquire numerous public Twitter tweets and build a classification model from them. Assessment results indicate that the classifier identifies questionable URLs correctly and efficiently. Furthermore, we deploy the fake-detection system within Twitter as a connected, ongoing mechanism to flag dubious URLs.
... Spammers therefore can exploit the time gap between spreading unknown URLs and the time required for the blacklists to be updated [5]. This is achieved by creating URLs with no historical profile via URL shortening [6] and low-cost domain registration and hosting services [7]. The consequence to users is that they are exposed to links to malicious content in real-time, with spam campaigns achieving 80% of their spreading target within the first 24 hours [5]. ...
Conference Paper
The increasing volume of malicious content in social networks requires automated methods to detect and eliminate such content. This paper describes a supervised machine learning classification model that has been built to detect the distribution of malicious content in online social networks (OSNs). Multisource features have been used to detect social network posts that contain malicious Uniform Resource Locators (URLs). These URLs could direct users to websites that contain malicious content, drive-by download attacks, phishing, spam, and scams. For the data collection stage, the Twitter streaming application programming interface (API) was used, and VirusTotal was used for labelling the dataset. A random forest classification model was used with a combination of features derived from a range of sources. The random forest model without any tuning and feature selection produced a recall value of 0.89. After further investigation and applying parameter tuning and feature selection methods, however, we were able to improve the classifier performance to a recall of 0.92.
... Bilge et al. [46] report that 45% of users on a social media platform readily click on links posted by their "friends", even though they may not know that person in real life. Content-filtering approaches are not effective for Twitter since spammers tend to share shortened URLs in order to (1) overcome the character limitation defined by Twitter, and (2) circumvent spam filtering methods based on URL blacklisting [28,36,[47][48][49][50][51][52]. The major contributions of this paper are given as follows: The rest of the paper is structured as follows: Section 2 describes the background, including features of Twitter and how Twitter deals with spam. ...
Article
Full-text available
Twitter is one of the most popular social media platforms, with 313 million monthly active users who post 500 million tweets per day. This popularity attracts the attention of spammers, who use Twitter for malicious aims such as phishing legitimate users or spreading malicious software and advertisements through URLs shared within tweets, aggressively following/unfollowing legitimate users, hijacking trending topics to attract attention, and propagating pornography. In August 2014, Twitter revealed that 8.5% of its monthly active users, approximately 23 million users, had automatically contacted its servers for regular updates. Thus, detecting and filtering spammers from legitimate users is mandatory in order to provide a spam-free environment on Twitter. In this paper, features used in Twitter spam detection are presented and their effectiveness discussed. Also, Twitter spam detection methods are categorized and discussed with their pros and cons. The outdated features of Twitter which are commonly used by Twitter spam detection approaches are highlighted. Some new features of Twitter which, to the best of our knowledge, have not been mentioned by any other works are also presented.
... A user would not know until they had followed it. The loss of semantic information has made shortening services a vector for phishing attacks (Chhabra et al., 2011;Klien and Strohmaier, 2012). Despite these shortcomings, shortening services have a number of benefits. ...
Article
Full-text available
Link-shortening services save space and make the manual entry of URLs less onerous. Short links are often included on printed materials so that people using mobile devices can quickly enter URLs. Although mobile transcription is a common use-case, link-shortening services generate output that is poorly suited to entry on mobile devices: links often contain numbers and capital letters that require time consuming mode switches on touch screen keyboards. With the aid of computational modeling, we identified problems with the output of a link-shortening service, bit.ly. Based on the results of this modeling, we hypothesized that longer links that are optimized for input on mobile keyboards would improve link entry speeds compared to shorter links that required keyboard mode switches. We conducted a human performance study that confirmed this hypothesis. Finally, we applied our method to a selection of different non-word mobile data-entry tasks. This work illustrates the need for service design to fit the constraints of the devices people use to consume services.
... The ability to disguise a URL's destination has made Twitter and other social networks an attractive target for spammers. In the first study focusing on spam detection [3], a number of user accounts are collected. The users are identified as spammers by use of special methods and algorithms, and the false positive rate is determined. ...
Article
Full-text available
Social networks such as Twitter and Facebook have shown remarkable growth in recent years. The ratio of tweets or messages in the form of URLs increases day by day. As the number of URLs increases, the probability of fabrication also increases, through their HTML content as well as through the usage of tiny URLs. It is important to classify the URLs by means of some modern techniques. A conditional redirection method is used here, by which the URLs get classified and the target page that the user needs is reached. Learning methods are also introduced to differentiate the URLs, so that fabrication is not possible. Also, the classifiers will efficiently detect suspicious URLs using a link analysis algorithm.
... Klien and Strohmaier [24] investigated the use of short URLs for fraud, deceit, and spam. Based on the logs of the URL shortener their group operated over several months, they concluded that about 80% of the URLs they shortened are spam-related. ...
Article
Modern cloud services are designed to encourage and support collaboration. To help users share links to online documents, maps, etc., several services, including cloud storage providers such as Microsoft OneDrive and mapping services such as Google Maps, directly integrate URL shorteners that convert long, unwieldy URLs into short URLs, consisting of a domain such as 1drv.ms or goo.gl and a short token. In this paper, we demonstrate that the space of 5- and 6-character tokens included in short URLs is so small that it can be scanned using brute-force search. Therefore, all online resources that were intended to be shared with a few trusted friends or collaborators are effectively public and can be accessed by anyone. This leads to serious security and privacy vulnerabilities. In the case of cloud storage, we focus on Microsoft OneDrive. We show how to use short-URL enumeration to discover and read shared content stored in the OneDrive cloud, including even files for which the user did not generate a short URL. 7% of the OneDrive accounts exposed in this fashion allow anyone to write into them. Since cloud-stored files are automatically copied into users' personal computers and devices, this is a vector for large-scale, automated malware injection. In the case of online maps, we show how short-URL enumeration reveals the directions that users shared with each other. For many individual users, this enables inference of their residential addresses, true identities, and extremely sensitive locations they visited that, if publicly revealed, would violate medical and financial privacy.
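The brute-force feasibility claim above follows directly from the token alphabet: with 62 case-sensitive alphanumeric characters, the 5-character space is under a billion tokens. A quick sketch of the arithmetic (the scan rate is a purely illustrative assumption):

```python
ALPHABET = 62  # a-z, A-Z, 0-9

five_char = ALPHABET ** 5   # possible 5-character tokens
six_char = ALPHABET ** 6    # possible 6-character tokens

assert five_char == 916_132_832
assert six_char == 56_800_235_584

# At a hypothetical 1,000 requests/second, exhausting the 5-character
# space takes on the order of days, not years.
days = five_char / 1000 / 86400
assert 10 < days < 11
```

The 6-character space is 62 times larger, but still within reach of a distributed scanner, which is what makes "unguessable" short URLs effectively public.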
... On the other hand, shortened URLs are normally made of a randomly generated meaningless string [5]. These are known to be vulnerable to cyber-attacks because short URLs work by redirecting visitors to the original long URL [20]. The destination may be a spam site or a site with malicious script code. ...
Article
Automatic identification is pervasive in many areas, and its areas of application are increasing gradually. 2D bar-codes, NFC, and RFID technologies are representative examples of automatic identification. This paper explains the implementation of mobile tourism application software on RFID technology. The mobile application provides location and navigation information by combining the tag inventory and a web database. The interactions among the user, application and database server are described in detail. This paper proposes a simple way of minimizing the effort to build the entire system by storing URLs in the tags and accessing existing tourism information services through those URLs.
... They discovered that most phishing on Twitter aims at stealing social network credentials rather than those of other services. Klien and Strohmaier [14] performed a geographical analysis of the short URLs from the qr.cx service. Kandylas et al. [13] show that the quality of the landing pages pointed to by bit.ly ...
Conference Paper
Full-text available
URL shortening services facilitate the need of exchanging long URLs using limited space, by creating compact URL aliases that redirect users to the original URLs when followed. Some of these services show advertisements (ads) to link-clicking users and pay a commission of their advertising earnings to link-shortening users. In this paper, we investigate the ecosystem of these increasingly popular ad-based URL shortening services. Even though traditional URL shortening services have been thoroughly investigated in previous research, we argue that, due to the monetary incentives and the presence of third-party advertising networks, ad-based URL shortening services and their users are exposed to more hazards than traditional shortening services. By analyzing the services themselves, the advertisers involved, and their users, we uncover a series of issues that are actively exploited by malicious advertisers and endanger the users. Moreover, next to documenting the ongoing abuse, we suggest a series of defense mechanisms that services and users can adopt to protect themselves.
... A (not ideal) solution for this is short links [23], but they require an additional shortening service, acting as a translator from long to short links and vice versa, causing performance and security issues (e.g., phishing [24], spam [25]). ...
Conference Paper
Full-text available
An important challenge for the Internet of Things is the gap between scientific environments and real life deployments. Smart objects need to be accessible and usable by ordinary users through familiar software and access technologies to facilitate any interaction and to increase their acceptance rate. This work deals with a seamless integration, discovery, and employment of smart objects into the Internet infrastructure under Human-to-Machine (H2M) communication aspects. We introduce an XMPP-based service provisioning sublayer for the IoT to integrate resource constrained devices seamlessly into the Internet by showing how XMPP can empower the collaboration between humans and smart objects. To meet the requirements of constrained devices, we propose to extend XMPP's publish-subscribe capabilities with a topic-based filter mechanism to effectively reduce the number of exchanged XMPP messages. We further present standardized bootstrapping and handling processes for smart objects that adapt automatically to infrastructure and ad hoc network environments and do not require predefined parameters or user interaction. The applicability of XMPP for constrained devices is further demonstrated with an XMPP client and mDNS/DNS-SD service for the Contiki operating system.
Article
Full-text available
Social networks are susceptible to malicious messages containing URLs for spam, phishing, and malware distribution. We deal mainly with Twitter, in which we find suspicious addresses by using a manually coded application tool from which we call the API. Conventional spam detection schemes are ineffective against feature fabrication or consume abundant time and resources. In this paper, we propose WarningBird, a suspicious address detection system for Twitter. Our system investigates correlations of URL redirect chains that frequently share the same URLs. We develop methods to discover correlated URLs using the application tool. We collect numerous tweets from the Twitter public timeline and build a statistical classifier using them, and we also create multiple accounts in the tool. We find suspicious URLs in each created account by analyzing the URLs published in that account. Evaluation results show that our classifier accurately and efficiently detects suspicious URLs. We also present our system as a real-time detection system for classifying suspicious addresses in the Twitter stream.
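The core correlation idea in the abstract above, that attackers reuse URLs across the redirect chains of many tweets, can be sketched as an inverted index from URLs to the tweets whose chains contain them; the chains below are hypothetical:

```python
from collections import defaultdict

# Hypothetical resolved redirect chains, one per observed tweet.
chains = {
    "tweet1": ["http://sho.rt/a", "http://hop.example/x", "http://spam.example/"],
    "tweet2": ["http://sho.rt/b", "http://hop.example/x", "http://spam.example/"],
    "tweet3": ["http://sho.rt/c", "http://news.example/story"],
}

# Index tweets by every URL appearing in their redirect chains.
by_url = defaultdict(set)
for tweet, chain in chains.items():
    for url in chain:
        by_url[url].add(tweet)

# URLs shared by several chains are candidate entry points of a campaign.
shared = {url: tweets for url, tweets in by_url.items() if len(tweets) > 1}
assert set(shared) == {"http://hop.example/x", "http://spam.example/"}
```

A real system would compute features over these shared URLs (position in chain, frequency, diversity of entry points) and feed them to the statistical classifier rather than flag shared URLs directly.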
Article
With the increasing use of social networking sites, there is an increase in malicious and fake content and viruses. Approximately 20 million users register per day across different social networking sites. Unfortunately, hackers have realized the potential of using apps for spreading malware and spam. Nowadays this problem is even more critical, as our survey finds that at least 13% of apps in our dataset are malicious. Because of this, the research community has focused on detecting malicious posts and campaigns. In this paper, we survey some social networking sites and applications and the malicious activity related to them. We also describe different techniques to control malicious activities on different social networking sites such as Twitter and Facebook.
Article
Full-text available
As cyber-attacks grow fast and complicated, the cybersecurity industry faces challenges to utilize state-of-the-art technology and strategies to battle the consistently present malicious threats. Phishing is a sort of social engineering attack produced technically and classified as identity theft and complicated attack vectors to steal information of internet users. In this perspective, our main objective of this study is to propose a unique, robust ensemble machine learning model architecture that provides the highest prediction accuracy with a low error rate, while proposing a few other robust machine learning models. Both supervised and unsupervised techniques were used for the detection process. For our experiments, seven classification algorithms, one clustering algorithm, two ensemble techniques, and two large standard legitimate datasets with 73,575 URLs and 100,000 URLs were used. Two test modes (percentage split, K-Fold cross-validation) were utilized for conducting experiments and final predictions. Mechanisms were developed to (i) identify the best N, which is the optimal heuristic-based threshold value for splitting words into subwords for each classifier, (ii) tune hyperparameters for each classifier to specify the best parameter combination, (iii) select prominent features using various feature selection techniques, (iv) propose a robust ensemble model (classifier) called the Expandable Random Gradient Stacked Voting Classifier (ERG-SVC) utilizing a voting classifier along with a model architecture, (v) analyze possible clusters of the dataset using k-means clustering, (vi) thoroughly analyze the gradient boost classifier (GB) with respect to utilizing the "criterion" parameter with the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Friedman MSE, and (vii) propose a lightweight preprocessor to reduce computational cost and preprocessing time.
Initial experiments were carried out with 46 features; the number of features was reduced to 22 after the experiments. The results show that the GB classifier outperformed with the least number of NLP based features by achieving a 98.118% prediction accuracy. Furthermore, our stacking ensemble model and proposed voting ensemble model (ERG-SVC) outperformed other tested approaches and yielded reliable prediction accuracy results in detecting malicious URLs at rates of 98.23% and 98.27%, respectively.
Article
Full-text available
With the increasing use of social networking sites, malicious and fake content and viruses are on the rise. Approximately 20 million users register each day across different social networking sites. Unfortunately, hackers have realized the potential of using apps for spreading malware and spam, and the problem is now critical: our survey finds that at least 13% of apps in our dataset are malicious. Consequently, the research community has focused on detecting malicious posts and campaigns. In this paper, we survey several social networking sites and applications and the malicious activity related to them. We also describe different techniques for controlling malicious activities on social networking sites such as Twitter and Facebook.
Article
This study investigates running a program in a controlled environment in which various URLs are blocked while the program executes. The program selects HTML content from certain websites, extracts the target program from that content, and runs it automatically on the system. While the process is running, every URL that is accessed is actively monitored every 5 seconds until the browser is closed.
Conference Paper
Twitter is one of the most popular social networking and micro-blogging Web sites in the world. TinyURL is a URL (link) shortening web service used on Twitter. Recently, it has been exploited by spammers as a platform to transmit malicious information. TinyURL spam detection on Twitter is a challenging task. In this paper, an efficient scheme is proposed to detect spam in TinyURLs, with particular focus on Twitter tweets. A set of features is first extracted from the tweets. This feature set is analyzed to select a reduced set of features. The reduced feature set is fed as input to train three classifiers, namely simple logistic regression, decision tree, and SVM. The classification results show that the SVM classifier has the highest accuracy in detecting spam in TinyURLs.
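A sketch of the kind of lexical tweet/URL features such classifiers consume before training (the paper's exact feature set is not reproduced here; these features are illustrative assumptions):

```python
import math
from collections import Counter

def url_entropy(url):
    """Shannon entropy of the URL's character distribution, in bits."""
    counts = Counter(url)
    n = len(url)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def extract_features(tweet, url):
    """Build a small lexical feature dictionary for one (tweet, URL) pair."""
    return {
        "url_length": len(url),
        "digit_ratio": sum(ch.isdigit() for ch in url) / len(url),
        "num_hashtags": tweet.count("#"),
        "num_mentions": tweet.count("@"),
        "entropy": round(url_entropy(url), 3),
    }

feats = extract_features("Win a prize! #free #win", "http://tinyurl.com/ab12cd")
print(feats["num_hashtags"])  # 2
print(feats["url_length"])    # 25
```

Feature dictionaries like this would then be vectorized and passed to an off-the-shelf SVM, decision tree, or logistic regression implementation.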
Chapter
Various research findings suggest that humans often mistake social robot ('bot) accounts for humans in a microblogging context. The core research question here asks whether social network analysis may help identify whether a social media account is fully automated, semi-automated, or fully human (embodied personhood) in the contexts of Twitter and Wikipedia. Three hypotheses are considered: that automated social media account networks will have less diversity and less heterophily; that automated social media accounts will tend to have a botnet social structure; and that cyborg accounts will have select features of both human and robot social media accounts. The findings suggest a limited ability to differentiate the level of automation of a social media account based on social network analysis alone in the face of a determined and semi-sophisticated adversary, given the ease of network account sock-puppetry, but they do suggest some effective detection approaches in combination with other information streams.
Chapter
As the source of spamming, phishing, malware, and many more such attacks, malicious URLs are a chronic and complicated problem on the Internet. Machine learning approaches have taken effect and obtained high accuracy in detecting malicious URLs, but the tedious process of extracting features from URLs and the high dimension of the feature vectors make implementations time-consuming. This paper presents a deep learning method that uses a stacked denoising autoencoder (SdA) model to learn and detect intrinsic malicious features. We employ an SdA network to analyze URLs and extract features automatically. A logistic regression is then applied to distinguish malicious from benign URLs, which generates detection models without manual feature engineering. We implemented our network model using Keras, a high-level neural networks API, with a TensorFlow backend, an open-source deep learning library. Five datasets were used, and four other methods were compared with our model. As a result, our architecture achieves an accuracy of 98.25% and a micro-averaged F1 score of 0.98, tested on a mixed dataset containing around 2 million samples.
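An autoencoder consumes fixed-length numeric vectors, so URLs must first be encoded. One common way to get there is character-by-character ordinal encoding with padding/truncation to a fixed width (the paper's exact encoding is not reproduced here; this is an illustrative assumption):

```python
def encode_url(url, width=20):
    """Map characters to their ordinals, zero-padded/truncated to `width`."""
    codes = [ord(c) for c in url[:width]]
    return codes + [0] * (width - len(codes))

vec = encode_url("http://a.io")
print(len(vec))  # 20
print(vec[:4])   # [104, 116, 116, 112]
```

Vectors like this (typically normalized) would then be fed to the denoising autoencoder for unsupervised feature learning, with a logistic regression on top for the final benign/malicious decision.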
Article
Phishing has become an increasing threat in online space, largely driven by the evolving web, mobile, and social networking technologies. Previous phishing taxonomies have mainly focused on the underlying mechanisms of phishing but ignored the emerging attacking techniques, targeted environments, and countermeasures for mitigating new phishing types. This survey investigates phishing attacks and antiphishing techniques developed not only in traditional environments such as e-mails and websites, but also in new environments such as mobile and social networking sites. Taking an integrated view of phishing, we propose a taxonomy that involves attacking techniques, countermeasures, targeted environments and communication media. The taxonomy will not only provide guidance for the design of effective techniques for phishing detection and prevention in various types of environments, but also facilitate practitioners in evaluating and selecting tools, methods, and features for handling specific types of phishing problems.
Article
Full-text available
All information and data on the Internet are connected through URLs. Although many people use URLs to share and convey information, it is difficult to transmit information when a URL is long and mixed with special characters. A short URL service transforms a long URL into a short form that conveys the same information, making it possible to access the page with the necessary information. Recently, attackers who want to distribute malicious code have abused short URLs through SMS or SNS. Because the original URL is difficult to predict from the short URL, short URLs are vulnerable to phishing attacks. In this study, a method is proposed that writes the destination information when generating a short URL, so that a user is able to check whether the destination is a web document or a file. The short URL service provider monitors the risk of the target URL page of the generated short URL and decides whether to provide service. By monitoring modifications of the web document, it measures and evaluates the risk of the webpage and decides whether to block the short URL according to a threshold, which prevents attacks such as drive-by downloads through short URLs.
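The destination check and threshold-based blocking described above can be sketched as follows; the extension list, scores, and threshold are hypothetical, not values from the paper:

```python
# File extensions commonly associated with downloadable payloads (assumption).
FILE_EXTENSIONS = (".exe", ".apk", ".zip", ".scr")

def destination_type(target_url):
    """Classify a short URL's destination as a web document or a file."""
    path = target_url.split("?", 1)[0].lower()
    return "file" if path.endswith(FILE_EXTENSIONS) else "document"

def should_block(risk_score, threshold=0.7):
    """Block the short URL once the monitored risk crosses the threshold."""
    return risk_score >= threshold

print(destination_type("http://example.com/setup.exe"))  # file
print(should_block(0.9))                                 # True
```

In the scheme described, the risk score would be updated continuously as the service re-crawls the target page and observes modifications.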
Thesis
More and more attention is paid to crossmedia publishing in publishing houses, predominantly because of the growing number of output channels and their associated economic importance. Especially in the field of education, crossmedia publishing of textbooks can provide additional value for teachers and learners, given educational suitability in combination with technical and legal frameworks. Content, design, legal, and technical production requirements for crossmedia textbooks, obtained from existing scientific publications and surveys, form the basis of discussion and evaluation. In this work, a concept for crossmedia publishing of textbooks is developed, and a list of criteria consisting of content, design, and technical production conditions is derived. These criteria should be met and taken into account at the secondary-school stage in German and English subjects in order to achieve a high educational effect on learners.
Conference Paper
With the rapid development of the Internet, micro-blog services have become the fastest growing Internet application, and URLs play an important role in these social networks. However, studies analyzing URL resources, especially in the Chinese micro-blog system, are extremely scarce. In this paper, we construct a corpus that contains the dissemination and classification information of URLs in Sina Weibo. We then focus on the typical questions of who publishes the URLs, what the URLs point to, and how the URLs are disseminated, and answer all of these questions by analyzing a recent Sina Weibo corpus. We find that verified users tend to publish about twice as many URLs as non-verified users, and that video URLs disseminate more easily in Sina Weibo. Our findings provide insights for downstream IR applications such as search engines and recommender systems.
Conference Paper
With the prevalence of cutting-edge technology, the social media network is gaining popularity and is becoming a worldwide phenomenon. Twitter is one of the most widely used social media sites, with over 500 million users all around the world. Along with its rapidly growing number of users, it has also attracted unwanted users such as scammers, spammers and phishers. Research has already been conducted to prevent such issues using network or contextual features with supervised learning. However, these methods are not robust to changes, such as temporal changes or changes in phishing trends. Current techniques also use additional network information. However, these techniques cannot be used before spammers form a particular number of user relationships. We propose an unsupervised technique that detects phishing in Twitter using a 2-phase unsupervised learning algorithm called PDT (Phishing Detector for Twitter). From the experiments we show that our technique has high accuracy ranging between 0.88 and 0.99.
Conference Paper
This paper addresses the challenge of detecting spam URLs in social media, which is an important task for shielding users from links associated with phishing, malware, and other low-quality, suspicious content. Rather than rely on traditional blacklist-based filters or content analysis of the landing page for Web URLs, we examine the behavioral factors of both who is posting the URL and who is clicking on the URL. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. Concretely, we propose and evaluate fifteen click and posting-based features. Through extensive experimental evaluation, we find that this purely behavioral approach can achieve high precision (0.86), recall (0.86), and area-under-the-curve (0.92), suggesting the potential for robust behavior-based spam detection.
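Behavioral signals of the kind described above can be computed directly from a click log. The toy log and the single feature below are illustrative assumptions, not the paper's fifteen features:

```python
click_log = [  # (short_url, clicking_user) pairs; hypothetical data
    ("http://sho.rt/a", "u1"), ("http://sho.rt/a", "u1"),
    ("http://sho.rt/a", "u1"), ("http://sho.rt/b", "u2"),
    ("http://sho.rt/b", "u3"),
]

def unique_clicker_ratio(log, url):
    """Unique clickers / total clicks; low values can indicate inflated
    traffic, e.g. a spammer repeatedly clicking their own link."""
    clicks = [user for (link, user) in log if link == url]
    return len(set(clicks)) / len(clicks)

print(round(unique_clicker_ratio(click_log, "http://sho.rt/a"), 3))  # 0.333
print(unique_clicker_ratio(click_log, "http://sho.rt/b"))            # 1.0
```

The appeal of such signals, as the abstract notes, is that genuine clicking behavior across many independent users is harder for a spammer to fabricate than the content of a landing page.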
Article
Full-text available
Social networks such as Twitter and Facebook have shown remarkable growth in recent years. The ratio of tweets or messages containing URLs increases day by day. As the number of URLs increases, the probability of fabrication also increases, both through their HTML content and through the use of tiny URLs, so it is important to classify URLs using modern techniques. A conditional redirection method is used here, by which the URLs are classified and the target page that the user needs is reached. Learning methods are also introduced to differentiate the URLs so that fabrication is not possible. The classifiers also efficiently detect suspicious URLs using a link analysis algorithm.
Article
Nowadays, in the context of online social media, hackers have started using social networks like Twitter, Facebook, and Google+ for their unauthorized activities. These very popular social networking sites are used by numerous people to connect with each other and share their everyday happenings. In this paper we consider Twitter as the social networking site for our experiments. Twitter is extremely popular for micro-blogging, where people post short messages of 140 characters called tweets. It has over 200 million active users who post approximately 300 million tweets every day. Hackers and attackers have started using Twitter as a medium to spread viruses, as the available information is vast and scattered and it is very easy to post and spread URLs on Twitter. Our experiment shows the detection of malicious URLs on Twitter in real time. We test a method that discovers correlated URL redirect chains using frequently shared URLs. From a collection of tweets, we extract features based on URL redirection and then find the entry points of correlated URLs. A crawler browser marks suspicious URLs. The system shows the expected results for the detection of malicious URLs.
Conference Paper
With social networks like Facebook, Twitter and Google+ attracting audiences of millions of users, they have become an important communication platform in daily life. This in turn attracts malicious users to the social networks as well, causing an increase in the incidence of low quality information. Low quality information such as spam and rumors is a nuisance to people and hinders them from consuming information that is pertinent to them or that they are looking for. Although individual social networks are capable of filtering a significant amount of the low quality information they receive, they usually require large amounts of resources (e.g., personnel) and incur a delay before detecting new types of low quality information. The evolution of low quality information also poses many challenges to defensive techniques. My PhD thesis work focuses on the analysis and detection of low quality information in social networks. We introduce SPADE, a social spam analytics and detection framework across multiple social networks, showing the efficiency and flexibility of cross-domain classification and associative classification. For an evolutionary study of low quality information, we present the results of a large-scale study of Web spam and email spam over a long period of time. Furthermore, we provide activity-based detection approaches to filter out low quality information in social networks: click traffic analysis of short URL spam, behavior analysis of URL spam, and information diffusion analysis of rumor. Our framework and detection techniques show promising results in analyzing and detecting low quality information in social networks.
Conference Paper
Full-text available
URL shortening services have become extremely popular. However, it is still unclear whether they are an effective and reliable tool that can be leveraged to hide malicious URLs, and to what extent these abuses can impact the end users. With these questions in mind, we first analyzed existing countermeasures adopted by popular shortening services. Surprisingly, we found such countermeasures to be ineffective and trivial to bypass. This first measurement motivated us to proceed further with a large-scale collection of the HTTP interactions that originate when web users access live pages that contain short URLs. To this end, we monitored 622 distinct URL shortening services between March 2010 and April 2012, and collected 24,953,881 distinct short URLs. With this large dataset, we studied the abuse of short URLs. Although short URLs are a significant new security risk, in accordance with reports from observations of the overall phishing and spamming activity, we found that only a relatively small fraction of users ever encountered malicious short URLs. Interestingly, during the second year of measurement, we noticed an increased percentage of short URLs being abused for drive-by download campaigns and a decreased percentage of short URLs being abused for spam campaigns. In addition to these security-related findings, our unique monitoring infrastructure and large dataset allowed us to complement previous research on short URLs and analyze these web services from the user's perspective.
Article
Twitter is prone to malicious tweets containing URLs for spam, phishing, and malware distribution. Conventional Twitter spam detection schemes utilize account features such as the ratio of tweets containing URLs and the account creation date, or relation features in the Twitter graph. These detection schemes are ineffective against feature fabrications or consume much time and resources. Conventional suspicious URL detection schemes utilize several features including lexical features of URLs, URL redirection, HTML content, and dynamic behavior. However, evading techniques such as time-based evasion and crawler evasion exist. In this paper, we propose WARNINGBIRD, a suspicious URL detection system for Twitter. Our system investigates correlations of URL redirect chains extracted from several tweets. Because attackers have limited resources and usually reuse them, their URL redirect chains frequently share the same URLs. We develop methods to discover correlated URL redirect chains using the frequently shared URLs and to determine their suspiciousness. We collect numerous tweets from the Twitter public timeline and build a statistical classifier using them. Evaluation results show that our classifier accurately and efficiently detects suspicious URLs. We also present WARNINGBIRD as a near real-time system for classifying suspicious URLs in the Twitter stream.
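The core intuition here, that attackers reuse redirection servers so the same URLs recur across many distinct redirect chains, can be sketched with a simple frequency count. The chains below are hypothetical, and this omits the classifier WARNINGBIRD builds on top of such signals:

```python
from collections import Counter

chains = [  # hypothetical redirect chains extracted from tweets
    ["http://sho.rt/1", "http://hop.example/r", "http://evil.example/p"],
    ["http://sho.rt/2", "http://hop.example/r", "http://evil.example/p"],
    ["http://sho.rt/3", "http://hop.example/r", "http://other.example/q"],
    ["http://sho.rt/4", "http://news.example/article"],
]

def frequently_shared(chains, min_chains=2):
    """Return URLs that appear in at least `min_chains` distinct chains."""
    counts = Counter(url for chain in chains for url in set(chain))
    return {url for url, c in counts.items() if c >= min_chains}

shared = frequently_shared(chains)
print("http://hop.example/r" in shared)         # True: reused hop
print("http://news.example/article" in shared)  # False: legitimate one-off
```

A reused intermediate hop like the one flagged above would be treated as a candidate entry point for a correlated, likely malicious, redirect infrastructure.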
Conference Paper
Full-text available
Short URLs have become ubiquitous. Especially popular within social networking services, short URLs have seen a significant increase in their usage over the past years, mostly due to Twitter's restriction of message length to 140 characters. In this paper, we provide a first characterization on the usage of short URLs. Specifically, our goal is to examine the content short URLs point to, how they are published, their popularity and activity over time, as well as their potential impact on the performance of the web. Our study is based on traces of short URLs as seen from two different perspectives: i) collected through a large-scale crawl of URL shortening services, and ii) collected by crawling Twitter messages. The former provides a general characterization on the usage of short URLs, while the latter provides a more focused view on how certain communities use shortening services. Our analysis highlights that domain and website popularity, as seen from short URLs, significantly differs from the distributions provided by well publicised services such as Alexa. The set of most popular websites pointed to by short URLs appears stable over time, despite the fact that short URLs have a limited high popularity lifetime. Surprisingly short URLs are not ephemeral, as a significant fraction, roughly 50%, appears active for more than three months. Overall, our study emphasizes the fact that short URLs reflect an "alternative" web and, hence, provide an additional view on web usage and content consumption complementing traditional measurement sources. Furthermore, our study reveals the need for alternative shortening architectures that will eliminate the non-negligible performance penalty imposed by today's shortening services.
Conference Paper
Full-text available
We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate on a planetary-scale the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. We find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
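The 6.6 average path length above comes from planetary-scale breadth-first search; the same computation on a small hypothetical undirected graph looks like this:

```python
from collections import deque

graph = {  # adjacency list of a tiny path graph a-b-c-d (assumption)
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
}

def bfs_distances(graph, start):
    """Hop counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for node in graph:
        for other, d in bfs_distances(graph, node).items():
            if other != node:
                total, pairs = total + d, pairs + 1
    return total / pairs

print(round(average_path_length(graph), 3))  # 1.667
```

At the Messenger graph's scale (180M nodes, 1.3B edges), exhaustive all-pairs BFS is infeasible, so such studies typically sample source nodes rather than iterate over all of them.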
Conference Paper
Full-text available
Size, accessibility, and rate of growth of Online Social Media (OSM) has attracted cyber crimes through them. One form of cyber crime that has been increasing steadily is phishing, where the goal (for the phishers) is to steal personal information from users which can be used for fraudulent purposes. Although the research community and industry has been developing techniques to identify phishing attacks through emails and instant messaging (IM), there is very little research done, that provides a deeper understanding of phishing in online social media. Due to constraints of limited text space in social systems like Twitter, phishers have begun to use URL shortener services. In this study, we provide an overview of phishing attacks for this new scenario. One of our main conclusions is that phishers are using URL shorteners not only for reducing space but also to hide their identity. We observe that social media websites like Facebook, Habbo, Orkut are competing with e-commerce services like PayPal, eBay in terms of traffic and focus of phishers. Orkut, Habbo, and Facebook are amongst the top 5 brands targeted by phishers. We study the referrals from Twitter to understand the evolving phishing strategy. A staggering 89% of references from Twitter (users) are inorganic accounts which are sparsely connected amongst themselves, but have large number of followers and followees. We observe that most of the phishing tweets spread by extensive use of attractive words and multiple hashtags. To the best of our knowledge, this is the first study to connect the phishing landscape using blacklisted phishing URLs from PhishTank, URL statistics from bit.ly and cues from Twitter to track the impact of phishing in online social media.
Article
On March 11th 2011, a great earthquake and tsunami hit eastern Japan. After that, several web sites, especially those providing helpful disaster-related information, were overloaded due to flash crowds caused by Twitter users. In order to mitigate the flash crowds, we develop a new URL shortener that redirects Twitter users to a CDN instead of original sites, since Twitter users rely on URL shorteners like bit.ly to shorten long URLs. In this paper, we describe our experience of developing and operating the URL shortener in the aftermath of the giant earthquake. Since the flash crowds were a serious problem in an emergency, we had to develop it as quickly as possible with a spirit of so-called agile software development. We then explain our HTTP request log collected at the URL shortener (it is now available online). To investigate the cause of flash crowds, the log is examined with tweets (Twitter messages) provided by another research project; this collaboration is realized by the encouragement of the workshop committee. We hope our experience will be helpful in tackling future disasters.
Article
Social bots are automatic or semi-automatic computer programs that mimic humans and/or human behavior in online social networks. Social bots can attack users (targets) in online social networks to pursue a variety of latent goals, such as to spread information or to influence targets. Without a deep understanding of the nature of such attacks or the susceptibility of users, the potential of social media as an instrument for facilitating discourse or democratic processes is in jeopardy. In this paper, we study data from the Social Bot Challenge 2011 - an experiment conducted by the WebEcologyProject during 2011 - in which three teams implemented a number of social bots that aimed to influence user behavior on Twitter. Using this data, we aim to develop models to (i) identify susceptible users among a set of targets and (ii) predict users' level of susceptibility. We explore the predictiveness of three different groups of features (network, behavioral and linguistic features) for these tasks. Our results suggest that susceptible users tend to use Twitter for a conversational purpose and tend to be more open and social since they communicate with many different users, use more social words and show more affection than non-susceptible users.
Conference Paper
In this work we present a characterization of spam on Twitter. We find that 8% of 25 million URLs posted to the site point to phishing, malware, and scams listed on popular blacklists. We analyze the accounts that send spam and find evidence that it originates from previously legitimate accounts that have been compromised and are now being puppeteered by spammers. Using clickthrough data, we analyze spammers' use of features unique to Twitter and the degree that they affect the success of spam. We find that Twitter is a highly successful platform for coercing users to visit spam pages, with a clickthrough rate of 0.13%, compared to much lower rates previously reported for email spam. We group spam URLs into campaigns and identify trends that uniquely distinguish phishing, malware, and spam, to gain an insight into the underlying techniques used to attract users. Given the absence of spam filtering on Twitter, we examine whether the use of URL blacklists would help to significantly stem the spread of Twitter spam. Our results indicate that blacklists are too slow at identifying new threats, allowing more than 90% of visitors to view a page before it becomes blacklisted. We also find that even if blacklist delays were reduced, the use by spammers of URL shortening services for obfuscation negates the potential gains unless tools that use blacklists develop more sophisticated spam filtering.
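The closing observation, that shorteners negate blacklist gains through obfuscation, reduces to the fact that a naive filter checks the short URL while only the final landing page is listed. A minimal sketch, where the redirect map stands in for live HTTP lookups and all URLs are hypothetical:

```python
BLACKLIST = {"http://malware.example/drop"}
REDIRECTS = {  # short/intermediate URL -> next hop (assumption)
    "http://sho.rt/x": "http://hop.example/r",
    "http://hop.example/r": "http://malware.example/drop",
}

def resolve(url, redirects, max_hops=10):
    """Follow the redirect chain to the final landing URL, capped at
    `max_hops` to avoid looping forever on cyclic chains."""
    for _ in range(max_hops):
        if url not in redirects:
            return url
        url = redirects[url]
    return url

print("http://sho.rt/x" in BLACKLIST)                      # False: naive check misses it
print(resolve("http://sho.rt/x", REDIRECTS) in BLACKLIST)  # True: resolved check catches it
```

This is why the abstract argues that blacklist-based tools need redirect-following (and lower listing delays) to be effective against shortened spam URLs.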
Article
The minimum Pearson chi-square estimator and test are defined for multinomial log-linear models with expected frequencies subject to linear constraints. The minimum Pearson chi-square estimate defined here yields predicted cell frequency estimates that, unlike the maximum likelihood estimates, minimize the popular Pearson chi-square test statistic.
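The Pearson chi-square statistic that the estimator above minimizes is the sum over cells of (O - E)^2 / E; computed for toy multinomial counts:

```python
def pearson_chi_square(observed, expected):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative example: 60 rolls of a fair six-sided die,
# expected frequency 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(pearson_chi_square(observed, expected))  # 1.0
```

The minimum chi-square estimator chooses the model parameters (and thus the expected frequencies, subject to the linear constraints) that make this statistic as small as possible.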
Qr.cx usage time analysis video. http://qr.cx/8Ctq or http://youtu.be/06Mhn0L23Tk, 2012.