Article

Telecom Fraud Detection with Big Data Analytics

Article
Today's highly connected world suffers from an increasing number and variety of cyber-attacks. To mitigate those threats, researchers have continuously explored different methods for intrusion detection over recent years. In this paper, we study the use of data mining techniques for intrusion detection. The research intends to compare the performance of classification techniques for intrusion detection. To reach this goal, we include 74 classification techniques in this comparative study. The study shows that no technique outperforms the others in all situations. However, some classification methods lead to promising results and give clues for further combinations.
Article
With the fast development of Smart Grid globally, security issues are rising sharply, and Non-Technical Loss (NTL) fraud is one of the major ones. Some NTL fraud detectors already exist; however, as big data security challenges emerge in Smart Grid, none of them can detect NTL fraud in big data. In this paper, we propose ENFD, an NTL detection scheme enabled by Edge Computing and big data analytic tools, to address the big data NTL fraud detection problem in Smart Grid. The research work provides us with experience of developing big data security solutions in Smart Grid. The experimental results show that ENFD can efficiently detect big data NTL frauds which cannot be detected by the state-of-the-art detectors. ENFD can detect small data NTL frauds as well, and its average detection speed is about six to seven times that of the fastest detector existing in the literature.
Article
In the United States, advances in technology and medical sciences continue to improve the general well-being of the population. With this continued progress, programs such as Medicare are needed to help manage the high costs associated with quality healthcare. Unfortunately, there are individuals who commit fraud for nefarious reasons and personal gain, limiting Medicare’s ability to effectively provide for the healthcare needs of the elderly and other qualifying people. To minimize fraudulent activities, the Centers for Medicare and Medicaid Services (CMS) released a number of “Big Data” datasets for different parts of the Medicare program. In this paper, we focus on the detection of Medicare fraud using the following CMS datasets: (1) Medicare Provider Utilization and Payment Data: Physician and Other Supplier (Part B), (2) Medicare Provider Utilization and Payment Data: Part D Prescriber (Part D), and (3) Medicare Provider Utilization and Payment Data: Referring Durable Medical Equipment, Prosthetics, Orthotics and Supplies (DMEPOS). Additionally, we create a fourth dataset which is a combination of the three primary datasets. We discuss data processing for all four datasets and the mapping of real-world provider fraud labels using the List of Excluded Individuals and Entities (LEIE) from the Office of the Inspector General. Our exploratory analysis on Medicare fraud detection involves building and assessing three learners on each dataset. Based on the Area under the Receiver Operating Characteristic (ROC) Curve performance metric, our results show that the Combined dataset with the Logistic Regression (LR) learner yielded the best overall score at 0.816, closely followed by the Part B dataset with LR at 0.805. Overall, the Combined and Part B datasets produced the best fraud detection performance with no statistical difference between these datasets, over all the learners. Therefore, based on our results and the assumption that there is no way to know within which part of Medicare a physician will commit fraud, we suggest using the Combined dataset for detecting fraudulent behavior when a physician has submitted payments through any or all Medicare parts evaluated in our study.
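The abstract above ranks learners by the area under the ROC curve. As a reminder of what that metric measures, here is a minimal sketch that computes it directly from its pairwise definition: the probability that a randomly chosen fraudulent case is scored higher than a randomly chosen legitimate one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1, where 1 marks the positive (fraud) class.
    scores: iterable of model scores, higher meaning "more likely fraud".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): three of the four
# positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop is fine for illustration; on datasets of the size described above, a rank-based or library implementation would be the practical choice.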
Article
Over the last decades, numerous security and privacy issues in all three active mobile network generations have been revealed that threaten users as well as network providers. In view of the newest generation (5G) currently under development, we now have the unique opportunity to identify research directions for the next generation based on existing security and privacy issues as well as already proposed defenses. This paper aims to unify security knowledge on mobile phone networks into a comprehensive overview and to derive pressing open research questions. To achieve this systematically, we develop a methodology that categorizes known attacks by their aim, proposed defenses, underlying causes, and root causes. Further, we assess the impact and the efficacy of each attack and defense. We then apply this methodology to existing literature on attacks and defenses in all three network generations. By doing so, we identify ten causes and four root causes of attacks. Mapping the attacks to proposed defenses and suggestions for the 5G specification enables us to uncover open research questions and challenges for the development of next-generation mobile networks. The problems of unsecured pre-authentication traffic and jamming attacks exist across all three mobile generations. They should be addressed in the future, in particular, to wipe out the class of downgrade attacks and, thereby, strengthen the users' privacy. Further advances are needed in the areas of inter-operator protocols as well as secure baseband implementations. Mitigations against signal denial-of-service attacks by smart protocol design also represent open research questions.
Article
With big data growing rapidly in importance over the past few years, academics and practitioners have been considering the means through which they can incorporate the shifts these technologies bring into their competitive strategies. To date, emphasis has been on the technical aspects of big data, with limited attention paid to the organizational changes they entail and how they should be leveraged strategically. As with any novel technology, it is important to understand the mechanisms and processes through which big data can add business value to companies, and to have a clear picture of the different elements and their interdependencies. To this end, the present paper aims to provide a systematic literature review that can help to explain the mechanisms through which big data analytics (BDA) lead to competitive performance gains. The research framework is grounded on past empirical work on IT business value research, and builds on the resource-based view and dynamic capabilities view of the firm. By identifying the main areas of focus for BDA and explaining the mechanisms through which they should be leveraged, this paper attempts to add to literature on how big data should be examined as a source of competitive advantage. To this end, we identify gaps in the extant literature and propose six future research themes.
Article
In this paper we present the design, implementation and experimental evaluation of Kerberos, an architecture for the detection of frauds in current generation Voice over IP (VoIP) networks. Kerberos is fed by an On-line Charging System (OCS) generating events associated with the setup, evolution and tear-down of end-user calls in a VoIP network compliant with the IP Multimedia Subsystem (IMS) specification. Such events are correlated in order to identify, in real time, patterns associated with a fraudulent utilization of the Operator's resources. The detection phase can in turn trigger the subsequent remediation actions. Communication between the OCS and Kerberos is based on an asynchronous paradigm, whereas event correlation and analysis are realized through a Complex Event Processing approach. The paper sheds light on both the design and the implementation of the system, whose performance is then evaluated on a real-world dataset of Call Detail Record (CDR) events provided by Tiscali, a well-known Italian Operator.
Conference Paper
Voice traffic termination fraud, often referred to as Subscriber Identity Module box (SIMbox) fraud, is a common illegal practice on mobile networks. As a result, cellular operators around the globe lose billions annually. Moreover, SIMboxes compromise the cellular network infrastructure by overloading local base stations serving these devices. This paper analyzes the fraudulent traffic from SIMboxes operating with a large number of SIM cards. It processes hundreds of millions of anonymized voice call detail records (CDRs) from one of the main cellular operators in the United States. In addition to overloading voice traffic, fraudulent SIMboxes are observed to have static physical locations and to generate a disproportionately large volume of outgoing calls. Based on these observations, novel classifiers for fraudulent SIMbox detection in mobility networks are proposed. Their outputs are optimally fused to increase the detection rate. The operator's fraud department confirmed that the algorithm succeeds in detecting new fraudulent SIMboxes.
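The two observations reported above (static physical location and a disproportionate share of outgoing calls) can be turned into simple per-SIM features. The sketch below is only an illustration of that idea, not the classifiers or the fusion step proposed in the paper. The CDR layout `(caller, callee, cell_id)`, the function names, and every threshold are hypothetical assumptions.

```python
from collections import defaultdict


def simbox_features(cdrs):
    """Derive per-SIM features from CDRs given as (caller, callee, cell_id).

    The tuple layout is a hypothetical simplification of a real CDR.
    """
    outgoing = defaultdict(int)   # calls placed by each SIM
    incoming = defaultdict(int)   # calls received by each SIM
    cells = defaultdict(set)      # cells seen when the SIM places calls

    for caller, callee, cell_id in cdrs:
        outgoing[caller] += 1
        incoming[callee] += 1
        cells[caller].add(cell_id)

    features = {}
    for sim in set(outgoing) | set(incoming):
        out_n, in_n = outgoing[sim], incoming[sim]
        features[sim] = {
            "outgoing": out_n,
            "outgoing_ratio": out_n / (out_n + in_n),
            "distinct_cells": len(cells[sim]),
        }
    return features


def flag_suspects(features, min_calls=50, min_ratio=0.95, max_cells=1):
    """Flag SIMs that call a lot, almost never receive, and never move.

    All three thresholds are illustrative assumptions, not published values.
    """
    return sorted(
        sim
        for sim, f in features.items()
        if f["outgoing"] >= min_calls
        and f["outgoing_ratio"] >= min_ratio
        and f["distinct_cells"] <= max_cells
    )


# Toy data: SIM "A" places 60 calls from a single cell and receives one.
cdrs = [("A", "B", "c1")] * 60 + [("B", "A", "c2"), ("B", "C", "c3")]
print(flag_suspects(simbox_features(cdrs)))  # ['A']
```

A fixed-threshold rule like this would produce false positives on legitimate stationary devices, which is one reason a learned classifier, as in the paper, is preferable in practice.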
Article
Electronic fraud is highly lucrative, with estimates suggesting these crimes to be worth millions of dollars annually. Because of its complex nature, electronic fraud detection is typically impractical to solve without automation. However, the creation of automated systems to detect fraud is very difficult as adversaries readily adapt and change their fraudulent activities which are often lost in the magnitude of legitimate transactions. This study reviews the most popular types of electronic fraud and the existing nature-inspired detection methods that are used for them. The common characteristics of electronic fraud are examined in detail along with the difficulties and challenges that these present to computational intelligence systems. Finally, open questions and opportunities for further work, including a discussion of emerging types of electronic fraud, are presented to provide a context for ongoing research.
Chapter
One of the most severe threats to revenue and quality of service for telecom providers is fraud. The advent of new technologies has given fraudsters new techniques to commit fraud. SIM box fraud is one such fraud that has emerged with the use of VoIP technologies. In this work, a total of nine features found to be useful in identifying SIM box fraud subscribers are derived from the attributes of the Customer Database Record (CDR). Artificial Neural Networks (ANNs) have shown promising results in classification problems due to their generalization capabilities. Therefore, a supervised learning method was applied using a multilayer perceptron (MLP) as the classifier. A dataset obtained from a real mobile communication company was used for the experiments. The ANN achieved a classification accuracy of 98.71%.
Article
Fraud detection is an increasingly important and difficult task in today's technological environment. As consumers are putting more of their personal information online and transacting much more business over computers, the potential for losses from fraud is in the billions of dollars, not to mention the damage done by identity theft. This paper reviews the history of fraud detection at AT&T, one of the first companies to address fraud in a systematic way to protect its revenue stream. We discuss some of the major fraud schemes and the techniques used to address them, leading to generic conclusions about fraud detection. Specifically, we advocate the use of simple, understandable models, heavy use of visualization, and a flexible environment and emphasize the importance of data management and the need to keep humans in the loop.
Conference Paper
Many fraud analysis systems have at their heart a rule-based engine for generating alerts about suspicious behaviors. The rules in the system are usually based on expert knowledge. Automatic rule discovery aims at using past examples of fraudulent and legitimate usage to find new patterns and rules to help distinguish between the two. Some aspects of the problem of finding rules suitable for fraud analysis make this problem unique. Among them are the following: the need to find rules combining both the properties of the customer (e.g., credit rating) and properties of the specific "behavior" which indicates fraud (e.g., number of international calls in one day); and the need for a new definition of accuracy: We need to find rules which do not necessarily classify correctly each individual "usage sample" as either fraudulent or not, but ensure the identification, with a minimum of wasted cost and effort, of most of the fraud "cases" (i.e., defrauded customers). These aspects require a special-purpose rule discovery system. We present as an example a two-stage system based on adaptation of the C4.5 rule generator, with an additional rule selection mechanism. Our experimental results indicate that this route is very promising.
Conference Paper
All over the world, we have been witnessing a significant increase in the use of telecommunication systems. People are faced day after day with strong marketing campaigns seeking their attention for new telecommunication products and services. Telecommunication companies struggle in a highly competitive business arena. It seems that their efforts have paid off, because customers are readily adopting the new trends and systematically use (and abuse) communication services in their daily lives. Although fraud situations are rare, they are increasing and they correspond to a large amount of money that telecommunication companies lose every year. In this work, we studied the problem of fraud detection in telecommunication systems, especially the cases of superimposed fraud, providing an anomaly detection technique, supported by a signature schema. Our main goal is to detect deviant behaviors in a timely manner, giving fraud analysts a better basis to be more accurate in their decisions when establishing potential fraud situations.
Article
Finding telecommunications fraud in masses of call records is more difficult than finding a needle in a haystack. In the haystack problem, there is only one needle that does not look like hay, the pieces of hay all look similar, and neither the needle nor the hay changes much over time. Fraudulent calls may be rare like needles in haystacks, but they are much more challenging to find. Callers are dissimilar, so calls that look like fraud for one account look like expected behavior for another, while all needles look the same. Moreover, fraud has to be found repeatedly, as fast as fraud calls are placed, the nature of fraud changes over time, the extent of fraud is unknown in advance, and fraud may be spread over more than one type of service. For example, calls placed on a stolen wireless telephone may be charged to a stolen credit card. Finding fraud is like finding a needle in a haystack only in the sense of sifting through masses of data to find something rare. This paper describ...
Conference Paper
Over the past decade, the number of mobile phones has increased dramatically, overtaking the world population in October 2014. In developing countries like India and China, mobile subscribers outnumber traditional landline users and account for over 90% of the active population. At the same time, the convergence of telephony with the Internet through technologies like VoIP makes it possible to reach a large number of telephone users at low or no cost via voice calls or SMS (short message service) messages. As a consequence, cybercriminals are abusing the telephony channel to launch attacks that have previously relied on the Internet, e.g., scams that offer fraudulent services and voice-based phishing (vishing). In this paper, we introduce and deploy the first mobile phone honeypot, called MobiPot, which allows us to collect fraudulent calls and SMS messages. We implement multiple ways of advertising mobile numbers (honeycards) on MobiPot to investigate how fraudsters collect the phone numbers they target. During a period of over seven months, MobiPot collected over two thousand voice calls and SMS messages, and we confirmed that over half of them were unsolicited. We found that seeding honeycards enables us to discover attacks on mobile phone numbers which were not known before.
Article
The increasing use of computer technology and the continued growth of companies have enabled most financial transactions to be performed through electronic commerce systems, such as credit card, telecommunication, and healthcare insurance systems. Unfortunately, these systems are used by both legitimate users and fraudsters. In addition, fraudsters utilize different approaches to breach electronic commerce systems. Fraud prevention systems (FPSs) alone are insufficient to provide adequate security to electronic commerce systems. However, combining fraud detection systems (FDSs) with FPSs might be effective in securing them. Nevertheless, there are issues and challenges that hinder the performance of FDSs, such as concept drift, support for real-time detection, skewed distributions, and large amounts of data. This survey paper aims to provide a systematic and comprehensive overview of these issues and challenges that obstruct the performance of FDSs. We have selected five electronic commerce systems, namely credit card, telecommunication, healthcare insurance, automobile insurance and online auction. The prevalent fraud types in those e-commerce systems are introduced in detail. Further, state-of-the-art FDS approaches in the selected e-commerce systems are systematically introduced. Then a brief discussion of potential research trends in the near future and a conclusion are presented.
Article
Information and communication technologies are widely used in healthcare. However, there is still no unified taxonomy for them. The lack of understanding of this phenomenon raises theoretical and ethical issues. This paper attempts to find the basis for a classification, starting from a new perspective: the structural elements are obtained from the etymologies of the commonly used lexicon, that is, words like telemedicine, telehealth, telecare and telecure. This will promote a better understanding of communication technologies; at the same time, it will allow some reflection on health, medicine and care, and their semantic and relational nature.
Article
Subscription fraud, i.e., customers signing up to a service with no intent to pay, causes significant losses in the telecommunication industry. Telecom operators have developed strategies to identify those fraudsters, but fraudsters tend to migrate from one carrier to another. Data sharing between telecoms would increase fraud detection rates, but phone records are protected by law and telecom operators might be reluctant to share information about fraudsters because they see it as giving a competitive advantage. We propose several protocols to enable fraud detection across multiple databases without revealing additional information. We also propose a model to generate phone records, with which we evaluate how the choice of parameters affects detection performance. We show feasibility, performance and costs with implementations of our protocols.
Conference Paper
The Voice over IP (VoIP) application utilizes the Internet to provide voice service; thus it is susceptible to various security issues common on IP networks, such as the flooding attack. Moreover, VoIP uses the Session Initiation Protocol (SIP) for session control and management. The transactional nature of SIP makes the flooding attack an even more severe threat, which can consequently lead to denial of service (DoS). In this paper, we develop an efficient online SIP flooding detection scheme by integrating the sketch technique with Hellinger distance (HD) based detection. The sketch data structure can summarize the SIP call generating process into a fixed set of data for developing a probability model. The HD technique, combined with on-line traffic estimation, can efficiently identify attacks by monitoring the distance between the current traffic distribution and the estimated distribution based on history information. Compared to the original HD detection system, our technique achieves the advantages of higher accuracy, flexibility to deal with multi-attribute attacks and DDoS attacks, and the ability to track the period of attack. Computer simulation results are presented to demonstrate the performance of the proposed technique.
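To make the two ingredients named above concrete, the following minimal sketch hashes attribute values into a fixed number of buckets (a one-row stand-in for a sketch data structure) and compares two bucket histograms with the standard Hellinger distance, H(P, Q) = sqrt(1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2). It is not the detection scheme of the paper: the function names, the CRC32 hash, the bucket count, and the example addresses are assumptions, and the paper's own normalization of the distance may differ.

```python
import math
import zlib


def sketch_counts(keys, num_buckets):
    """Summarize a stream of keys (e.g. SIP attribute values) into fixed buckets."""
    counts = [0] * num_buckets
    for key in keys:
        # CRC32 is chosen only because it is deterministic across runs
        # (an assumption, not the hash family used in the paper).
        counts[zlib.crc32(key.encode("utf-8")) % num_buckets] += 1
    return counts


def normalize(counts):
    """Turn bucket counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an empty histogram")
    return [c / total for c in counts]


def hellinger_distance(p, q):
    """Standard Hellinger distance between two discrete distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of buckets")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Compare a historical window with the current one (hypothetical addresses).
history = normalize(sketch_counts(["a@example.com", "b@example.com"] * 50, 16))
current = normalize(sketch_counts(["flood@example.com"] * 100, 16))
print(hellinger_distance(history, history))  # 0.0
print(hellinger_distance(history, current))
```

In a detector built along these lines, the distance for each new time window would be compared against a threshold estimated from past traffic; how that threshold is tracked is where the on-line estimation described in the abstract comes in.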
Article
This paper investigates the usefulness of applying different learning approaches to a problem of telecommunications fraud detection. Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering. One aim of the study is to identify the user model that best identifies fraud cases. The second task is to explore different views of the same problem and see what can be learned from the application of each technique. All data come from real defrauded user accounts in a telecommunications network. The models are compared in terms of their performance. Each technique’s outcome is evaluated with appropriate measures.
Article
We have been developing signature-based methods in the telecommunications industry for the past 5 years. In this paper, we describe our work as it evolved due to improvements in technology and our aggressive attitude toward scale. We discuss the types of features that our signatures contain, nuances of how these are updated through time, our treatment of outliers, and the trade-off between time-driven and event-driven processing. We provide a number of examples, all drawn from the application of signatures to toll fraud detection.
Blatt, M. and Kaufman, A. (2017) U.S. Patent No. 9,699,660, U.S. Patent and Trademark Office, Washington DC.
Dai, Y., Yan, J., Tang, X., Zhao, H. and Guo, M. (2016) 'Online credit card fraud detection: a hybrid framework with big data technologies', IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, pp.1644-1651.
Howell, J. (2019) Global Telecom Fraud Survey, Communications Fraud Control Association, New Jersey, pp.1-2.
Jiang, N.Y.J.A., Skudlark, W-L.H.G., Jacobson, S.P. and Zhang, Z-L. (2012) 'Isolating and analyzing fraud activities in a large cellular network via voice call graph analysis', ACM International Conference on Mobile Systems, Applications, and Services, Low Wood Bay Lake District, UK, pp.253-266.
Reaves, B., Blue, L. and Traynor, P. (2016) 'Authloop: end-to-end cryptographic authentication for telephony over voice channels', 25th USENIX Security Symposium, Austin, USA, pp.963-978.
Reaves, B., Shernan, E., Bates, A., Carter, H. and Traynor, P. (2015) 'Boxed out: blocking cellular interconnect bypass fraud at the network edge', 24th USENIX Security Symposium, Washington, USA, pp.833-848.
Wood, L. (2020) Intelligence Report - Key Global Telecom Industry Statistics, Paul Budde Communication, Australia, pp.1-39.