Discovering Telecom Fraud Situations through Mining
Anomalous Behavior Patterns
Ronnie Alves, Pedro Ferreira,
Orlando Belo, Joao Lopes, Joel
Ribeiro
University of Minho
Campus de Gualtar
4710-057 Braga
PORTUGAL
{ronnie, pedrogabriel,
obelo}@di.uminho.pt
Luís Cortesão
Portugal Telecom Inovação, SA,
Rua Eng. José Ferreira Pinto Basto
3810-106 Aveiro
PORTUGAL
lcorte@ptinovacao.pt
Filipe Martins
Telbit, Lda,
Rua Banda da Amizade, 38
3810-059 Aveiro
PORTUGAL
fmartins@telbit.pt
ABSTRACT
In this paper we tackle the problem of superimposed fraud
detection in telecommunication systems. We propose two
anomaly detection methods based on the concept of signatures.
The first method relies on a signature deviation-based approach,
while the second relies on dynamic clustering analysis. Experiments
carried out with real data, voice call records from an entire week
corresponding to approximately 2.5 million call detail records (CDRs)
and 700 thousand signatures processed per day, allowed us to detect
several anomalous situations. The fraud analysts provided us with a
small list of 12 customers for whom fraudulent behavior was
detected during this week; of these, 9 and 11 fraud situations were
discovered by the first and the second method, respectively. Preliminary
results and discussions with fraud analysts have already shown that our
methods are a valuable tool to assist them in fraud detection.
1. INTRODUCTION
In superimposed fraud situations, fraudsters make illegitimate
use of a legitimate account by different means. In this case, the
abnormal usage is blurred into the characteristic usage of the
account. This type of fraud is usually more difficult to detect and
poses a bigger challenge to telecommunications companies. Since
the 1990s, telecommunications companies have used several kinds
of approaches based on statistical analysis and heuristic methods
to assist them in the detection and categorization of fraud
situations. More recently, they have been adopting data mining
and knowledge discovery techniques for this task. In this paper we
tackle the problem of superimposed fraud detection in
telecommunication systems. We present two methods for
discovering fraud situations by mining anomalous customer
behavior patterns. These methods are based on the concept of
signature [3], which has already been used successfully for
anomaly detection in many areas, such as credit card usage [1],
network intrusion [2] and, in particular, telecommunications fraud
[3]. Our goal is to detect deviating behavior in a timely manner,
giving analysts a better basis for deciding accurately whether a
situation is potential fraud.
2. THE ROLE OF SIGNATURES IN DETECTING FRAUD
The core concept of our technique is the signature. We emphasize
the work of Cortes and Pregibon [3], since it was the main
inspiration for our use of signatures; however, we have redefined
their notion of signature. A signature of a user corresponds to a
vector of feature variables whose values are determined over a
certain period of time. A variable can be simple, if it consists of a
single atomic value (e.g., an integer or a real), or complex, if it
consists of two co-dependent statistical values, typically the
average and the standard deviation of a given feature.
Table 1. Feature variables (fv) used in signatures and summaries.
Description                                   Type
Duration of calls                             Complex
No. of calls – working days                   Complex
No. of calls – weekends and holidays          Complex
No. of calls – working time (8h-20h)          Complex
No. of calls – night time (20h-8h)            Complex
No. of calls to different national networks   Simple
No. of calls as caller (origin)               Simple
No. of calls as called (destination)          Simple
No. of international calls                    Simple
No. of calls as caller in roaming             Simple
No. of calls as called in roaming             Simple
DMBA'06, August 20, 2006, Philadelphia, Pennsylvania, USA.
Copyright 2006 ACM 1-59593-439-1...$5.00
The choice of variable type depends on several factors, such as
the complexity of the feature described or the data available to
perform the calculation. A feature like call duration shows
significant variability, which is much better expressed through an
average (µ) / standard deviation (σ) pair. A feature like the
number of international calls is typically much less frequent, so an
average value is sufficient to describe it. Table 1 lists the complete
set of feature variables (fv) used in this work. A signature S is
then obtained from a function φ over a given temporal window ω,
where S = φ(ω). We define a time unit as the amount of time
during which CDRs are accumulated before being processed at the
end of that period. A summary C has the same information
structure as a signature, but it is used to summarize the user's
behavior over a shorter time period. Typically, a signature reflects
the usage patterns of a week, a month or even half a year, whereas
a summary reflects a period of an hour, half a day or a complete
day. In this work, we consider a period of one day for a summary
and one week for a signature.
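As an illustration only, the signature and summary structure described above could be represented as follows. The class names, the feature keys and the use of the population standard deviation for complex variables are assumptions made for this sketch; they are not details taken from the system described in this paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Sequence, Union


@dataclass(frozen=True)
class SimpleFV:
    """Simple feature variable: a single atomic value (an average)."""
    mu: float


@dataclass(frozen=True)
class ComplexFV:
    """Complex feature variable: co-dependent average and standard deviation."""
    mu: float
    sigma: float


FeatureVariable = Union[SimpleFV, ComplexFV]
# A signature (e.g. one week) and a summary (e.g. one day) share this structure;
# the keys are hypothetical feature names modelled on Table 1.
Profile = Dict[str, FeatureVariable]


def complex_from(observations: Sequence[float]) -> ComplexFV:
    """Build a complex feature variable from raw observations, e.g. call durations.

    The population standard deviation is an assumption; the paper does not say
    which estimator is used.
    """
    return ComplexFV(mu=mean(observations), sigma=pstdev(observations))


# Hypothetical one-day summary for a single customer.
summary: Profile = {
    "call_duration": complex_from([60.0, 120.0, 180.0]),
    "international_calls": SimpleFV(mu=2.0),
}
print(summary["call_duration"].mu)  # 120.0
```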
3. DEVIATING PATTERNS
3.1 Evaluating Similarities among Signatures
3.1.1 Similarity of Simple Feature Variables
A simple feature is defined by a single variable, which
corresponds to the average value of the considered feature. To
compare simple feature variables we use a ratio-scaled function.
This type of function makes a positive measurement on a non-linear
scale, in this case the exponential scale. The function takes values
in the range [0, 1] and is defined as

$$ d(S_x, S_y) = e^{-\frac{B \times |S_x - S_y|}{Amp}} \qquad (1) $$

In equation 1, $S_x$ and $S_y$ are the two variables under comparison,
B is a constant, and Amp is the amplitude (the difference between
the maximum and minimum values) of the respective feature
variable over the whole signature space.
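A minimal Python sketch of Eq. 1 follows. The function name is hypothetical, the default B = 1.0 is an assumption (the paper does not give a value for B), and the guard for a zero amplitude is also our own addition to avoid a division by zero.

```python
import math


def simple_similarity(s_x: float, s_y: float, amp: float, b: float = 1.0) -> float:
    """Eq. 1: d(S_x, S_y) = exp(-B * |S_x - S_y| / Amp), a value in (0, 1].

    amp is the amplitude (max - min) of this feature over all signatures.
    b is the constant B; the default of 1.0 is an assumption, not a value from the paper.
    """
    if amp <= 0:
        # Assumption: every signature has the same value, so treat the pair as identical.
        return 1.0
    return math.exp(-b * abs(s_x - s_y) / amp)


# Hypothetical amplitude: the feature ranges from 0 to 10 across all signatures.
print(simple_similarity(3.0, 3.0, amp=10.0))            # 1.0
print(round(simple_similarity(2.0, 7.0, amp=10.0), 4))  # 0.6065
```

Larger differences relative to the amplitude push the similarity towards 0, while identical values give exactly 1.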
3.1.2 Similarity of Complex Feature Variables
Complex feature variables are defined by two co-dependent
variables, which correspond respectively to the average and the
standard deviation of the considered feature. For two complex
variables $C_x = (M_x, \sigma_x)$ and $C_y = (M_y, \sigma_y)$, the similarity
function is defined in equation 2 and lies in the range [0, 1]:

$$ d(C_x, C_y) = d(M_x, M_y) \times \frac{|C_x \cap C_y|}{|C_x \cup C_y|} \qquad (2) $$

Equation 2 combines two formulas: the similarity function for
simple variables (Eq. 1), applied to the averages, and the ratio
$|C_x \cap C_y| / |C_x \cup C_y|$. This ratio also lies in [0, 1] and gives
the degree of overlap of the two complex feature variables by
measuring the intersection of the intervals $[M_x - \sigma_x, M_x + \sigma_x]$
and $[M_y - \sigma_y, M_y + \sigma_y]$ relative to their union.
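Eq. 2 can be sketched in Python as below, again with hypothetical function names and B = 1.0 by default (an assumption). The ratio is computed as the length of the intersection over the length of the union of the two intervals; the handling of degenerate intervals (σ = 0) is our own choice, since the paper does not discuss it.

```python
import math


def interval_overlap(mu_x: float, sigma_x: float, mu_y: float, sigma_y: float) -> float:
    """|C_x ∩ C_y| / |C_x ∪ C_y| for the intervals [mu - sigma, mu + sigma]."""
    lo_x, hi_x = mu_x - sigma_x, mu_x + sigma_x
    lo_y, hi_y = mu_y - sigma_y, mu_y + sigma_y
    inter = max(0.0, min(hi_x, hi_y) - max(lo_x, lo_y))
    # Length of the union: both lengths minus the part counted twice.
    union = (hi_x - lo_x) + (hi_y - lo_y) - inter
    if union == 0.0:
        # Assumption: both intervals are single points, which overlap only if equal.
        return 1.0 if mu_x == mu_y else 0.0
    return inter / union


def complex_similarity(mu_x: float, sigma_x: float, mu_y: float, sigma_y: float,
                       amp: float, b: float = 1.0) -> float:
    """Eq. 2: d(C_x, C_y) = d(M_x, M_y) * |C_x ∩ C_y| / |C_x ∪ C_y|."""
    # Eq. 1 applied to the averages (b = 1.0 is an assumed default).
    mean_similarity = math.exp(-b * abs(mu_x - mu_y) / amp) if amp > 0 else 1.0
    return mean_similarity * interval_overlap(mu_x, sigma_x, mu_y, sigma_y)


print(round(interval_overlap(2.0, 1.0, 3.0, 1.0), 4))  # [1, 3] vs [2, 4] -> 0.3333
```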
3.2 Calculating the Distance among Signatures
Since the feature variables in a signature have different types,
each variable has to be evaluated by a distinct sub-function. The
dist function is thus composed of several sub-functions:
$dist = \theta(f_1, f_2, \dots, f_n)$. Consider as an example the
simplified signature $S = \{(\mu_a, \sigma_a); \mu_b; \mu_c; (\mu_d, \sigma_d)\}$,
where the first and the last feature variables are complex
(compared with Eq. 2) and the second and the third are simple
(compared with Eq. 1). Let $C = \{(\mu'_a, \sigma'_a); \mu'_b; \mu'_c; (\mu'_d, \sigma'_d)\}$
be a summary. Since we treat deviation detection from a
probabilistic point of view, the distance between a signature S
and a summary C corresponds to the probability of C being
different from S. The proposed distance function is:

$$ D(S, C) = \alpha_1 f_1(S_1, C_1) + \alpha_2 f_2(S_2, C_2) + \dots + \alpha_n f_n(S_n, C_n) \qquad (3) $$
Different distance functions can be provided by the fraud analyst
by setting the weighting factors $\alpha_i$ to different values. Using
different distance functions allows deviations to be detected in
different scenarios. The overall distance function can then be
redefined as in Eq. 4, where each $dist_j$ is an instance of Eq. 3
with its own set of weights:

$$ Dist(S, C) = \max\{dist_1(S, C), dist_2(S, C), \dots, dist_m(S, C)\} \qquad (4) $$
If the distance exceeds a threshold value ε defined by the analyst,
i.e. Dist(S, C) > ε, an alarm is raised for further examination of
the respective user. Otherwise, the user is considered to be within
their normal behavior.
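The sketch below strings Eqs. 3 and 4 and the threshold test together. It assumes the per-feature scores f_i have already been computed and oriented as deviations (larger means more different); deriving them from the similarities of Eqs. 1 and 2, for instance as 1 − d, is our assumption, as are the feature names and weights in the example.

```python
from typing import Dict, Sequence

# Hypothetical per-feature deviation scores f_i(S_i, C_i) for one user and one day.
Deviations = Dict[str, float]
# One weighting {feature: alpha_i} per analyst-defined distance function dist_j.
Weights = Dict[str, float]


def weighted_distance(deviations: Deviations, alphas: Weights) -> float:
    """Eq. 3: D(S, C) = sum_i alpha_i * f_i(S_i, C_i); features without a weight count as 0."""
    return sum(alphas.get(name, 0.0) * f for name, f in deviations.items())


def overall_distance(deviations: Deviations, weightings: Sequence[Weights]) -> float:
    """Eq. 4: Dist(S, C) = MAX over the distance functions dist_1 .. dist_m."""
    return max(weighted_distance(deviations, alphas) for alphas in weightings)


def raises_alarm(deviations: Deviations, weightings: Sequence[Weights], epsilon: float) -> bool:
    """An alarm is raised when Dist(S, C) > epsilon."""
    return overall_distance(deviations, weightings) > epsilon


deviations = {"call_duration": 0.2, "international_calls": 0.9, "night_calls": 0.5}
weightings = [
    {"call_duration": 1.0, "night_calls": 1.0},          # dist_1 = 0.7
    {"international_calls": 2.0, "call_duration": 0.5},  # dist_2 = 1.9
]
print(round(overall_distance(deviations, weightings), 2))  # 1.9
print(raises_alarm(deviations, weightings, epsilon=1.6))   # True
print(raises_alarm(deviations, weightings, epsilon=2.0))   # False
```

Taking the maximum means that a user is flagged as soon as any one of the analyst's scenarios considers the behavior deviant.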
3.3 Anomaly Detection Procedure
The anomaly detection procedure based on signature deviation
consists of several steps. It starts with a loading step, which
imports the signature and summary information of each user into
the local database of the system. The signatures are imported only
once, when the system is started. All the signatures of a user are
kept over time, since this information is also useful for later
analysis. A signature has one of two statuses, "Active" or
"Expired"; for each user only one signature, the most up-to-date
one, can be Active. The processing step is described by the
algorithm presented in [5] and follows the previous equations for
calculating the distances and similarities between signatures.
If an alarm is raised according to Eq. 4, the user is put on a
blacklist. This is done in the alarm-triggering step, which is based
on the calculation of all the distance functions over the signatures.
Finally, every raised alarm must be verified by the analyst to
determine whether or not it corresponds to a fraud situation. The
evaluation of the alarms is supported by the system interface,
which employs features of dashboard systems to provide a
complete set of relevant information [5].
3.4 Signature Updating
The signature update process follows the ideas presented in [3].
The update of a signature $S_t$ at instant t+1, $S_{t+1}$, from a set of
processed CDRs (a summary) C is given by:

$$ S_{t+1} = \beta \cdot S_t + (1 - \beta) \cdot C \qquad (5) $$

The constant β sets how much weight the previous signature
keeps, so that 1 − β is the weight of the new actions C in the new
signature. This constant can be adjusted depending on the size of
the time window ω [3]. In contrast to the system in [3], the
signature is always updated. If $Dist(S_t, C) \le \varepsilon$, the user is
considered to have normal behavior. If $Dist(S_t, C) > \varepsilon$, an alarm
is triggered, but the signature is still updated. The reason is that
the alarm still has to be reviewed by the company's fraud analyst,
who may consider it a false alarm. Continuously updating the
user's signature avoids losing the information gathered between
the moment the alarm was triggered and the moment the analyst
gives a verdict.
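A minimal sketch of Eq. 5 together with the Active/Expired bookkeeping of Section 3.3. Flattening each complex variable into separate mean and standard-deviation entries and applying Eq. 5 to each of them is our assumption, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SignatureRecord:
    # Flattened feature values, e.g. {"call_duration_mu": ..., "call_duration_sigma": ...}.
    values: Dict[str, float]
    status: str = "Active"


def update_signature(history: List[SignatureRecord], summary: Dict[str, float],
                     beta: float) -> SignatureRecord:
    """Eq. 5: S_{t+1} = beta * S_t + (1 - beta) * C, applied value by value.

    The update is performed whether or not an alarm was raised. The previous
    signature is kept for later analysis but marked Expired, so only the
    newest signature of the user is Active.
    """
    current = history[-1]
    new_values = {name: beta * v + (1 - beta) * summary[name]
                  for name, v in current.values.items()}
    current.status = "Expired"
    record = SignatureRecord(values=new_values)
    history.append(record)
    return record


history = [SignatureRecord({"call_duration_mu": 100.0})]
latest = update_signature(history, {"call_duration_mu": 200.0}, beta=0.75)
print(latest.values["call_duration_mu"], [r.status for r in history])
# 125.0 ['Expired', 'Active']
```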
4. CHANGING PATTERNS
4.1 Clustering Signatures
The analysis of changes in the cluster topology over a period of
time provides valuable information for a better understanding of
the usage patterns of telecommunications services. In particular,
abrupt changes in cluster membership may provide strong
evidence of a fraud situation. We propose applying dynamic
clustering analysis techniques to signature data, with the aim that
these changes will also give fraud analysts evidence for
establishing potential fraud situations.
4.1.1 Similarity of Signatures
Signatures are composed of simple and complex variables, so
traditional similarity measures, such as the Euclidean distance,
Pearson correlation or the Jaccard measure, are not directly
applicable to signature comparison. We therefore need a new
similarity measure to determine similarities between signatures.
We define the similarity between two signatures as a combination
of the variable similarity measures defined in Section 3.1. For two
signatures X and Y, where $X_i$ and $Y_i$ are respectively feature
variable i of X and Y, and for n variables, the similarity measure
is defined as:

$$ D(X, Y) = W_1 d(X_1, Y_1) + W_2 d(X_2, Y_2) + \dots + W_n d(X_n, Y_n) \qquad (6) $$

where $D(X, Y) \in [0, 1]$, $W_i$ is the weight of feature i, and
$\sum_{i=1}^{n} W_i = 1$. With this signature similarity measure we can
compare all signatures, producing an N × N matrix that
summarizes the similarities among the N signatures. The
clustering solution can then be obtained using this matrix as input.
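The computation of Eq. 6 and of the N × N similarity matrix can be sketched as follows. For brevity the sketch represents each signature as a flat list of numbers and takes the per-feature similarity d as a parameter; in the paper d would be Eq. 1 or Eq. 2 depending on the variable type. The toy function exp(−|a − b|) in the example is only a hypothetical stand-in.

```python
import math
from typing import Callable, List, Sequence

FeatureSim = Callable[[float, float], float]


def signature_similarity(x: Sequence[float], y: Sequence[float],
                         weights: Sequence[float], d: FeatureSim) -> float:
    """Eq. 6: D(X, Y) = sum_i W_i * d(X_i, Y_i), with the W_i summing to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("feature weights must sum to 1")
    return sum(w * d(a, b) for w, a, b in zip(weights, x, y))


def similarity_matrix(signatures: Sequence[Sequence[float]],
                      weights: Sequence[float], d: FeatureSim) -> List[List[float]]:
    """N x N matrix of pairwise similarities, used as input to the clustering step."""
    return [[signature_similarity(x, y, weights, d) for y in signatures]
            for x in signatures]


def toy_d(a: float, b: float) -> float:
    """Hypothetical stand-in for the per-feature similarities of Eqs. 1 and 2."""
    return math.exp(-abs(a - b))


signatures = [[1.0, 2.0], [1.0, 2.0], [5.0, 0.0]]
matrix = similarity_matrix(signatures, weights=[0.5, 0.5], d=toy_d)
print(matrix[0][1])  # identical signatures -> 1.0
```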
4.2 Clustering Migration Analysis
Different usage patterns can be found depending on the moment of
the week [6]. These usage profiles are obtained through signature
clustering analysis, following the method described in Section 4.1,
so a cluster topology is produced for each day of the week. This
topology describes the customers' usage patterns during that
period. Each cluster is described by the characteristics of its
centroid, which is itself defined as a signature; this allows direct
comparisons between signatures and cluster centroids using Eq. 6.
Each signature is assigned to a cluster by comparing it against
every cluster centroid and choosing the cluster with the
smallest distance.
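The assignment rule amounts to a nearest-centroid search. The sketch is generic in the distance function; using 1 − D(X, Y) from Eq. 6 as that distance would be our assumption, and the one-dimensional example below only illustrates the rule.

```python
from typing import Callable, Sequence, TypeVar

Sig = TypeVar("Sig")


def assign_cluster(signature: Sig, centroids: Sequence[Sig],
                   distance: Callable[[Sig, Sig], float]) -> int:
    """Return the index of the cluster whose centroid is at the smallest distance."""
    return min(range(len(centroids)), key=lambda j: distance(signature, centroids[j]))


# Toy example with one-dimensional "signatures" and absolute difference as distance.
print(assign_cluster(12.0, [0.0, 10.0, 20.0], lambda a, b: abs(a - b)))  # 1
```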
4.2.1 Absolute and Relative Similarity
To compare signatures against cluster centroids, two types of
similarity measures can be defined: absolute and relative
similarity. Absolute similarity is the similarity between the
signature and the centroid at a given time instant t, calculated
with Eq. 6. Relative similarity relates the absolute similarities at
instants t and t+1, giving the percentage variation of the signature
between two consecutive time instants. It is obtained through the
formula:

$$ \text{Relative similarity} = \left\{ 1 - \frac{D(S_i, SignCl[S_i])^{t+1}}{D(S_i, SignCl[S_i])^{t}} \right\} \times 100\% \qquad (7) $$

In Eq. 7, $S_i$ is a signature and $SignCl[S_i]$ is the cluster to which
$S_i$ belongs at instant t. Figure 1 shows a positive variation, where
signature $S_i$ is closer to centroid 0 at instant t than at t+1.
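Eq. 7 reduces to a one-line function once the two absolute similarities are known. The function name is hypothetical; its inputs are the Eq. 6 similarities of signature S_i to the centroid of SignCl[S_i] at instants t and t+1, and rejecting a non-positive similarity at t is our own guard against division by zero.

```python
def relative_similarity(sim_t: float, sim_t1: float) -> float:
    """Eq. 7: (1 - D(S_i, SignCl[S_i])^{t+1} / D(S_i, SignCl[S_i])^t) * 100%.

    Positive values mean the signature moved away from its centroid (Figure 1);
    negative values mean it moved closer.
    """
    if sim_t <= 0.0:
        raise ValueError("the similarity at instant t must be positive")
    return (1.0 - sim_t1 / sim_t) * 100.0


print(relative_similarity(0.5, 0.25))  # 50.0
print(relative_similarity(0.5, 0.75))  # -50.0
```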
Figure 1. Positive variation of the relative similarity of the
signature.
Figure 2. Negative variation of the relative similarity of the
signature and change of cluster membership.
A negative value of the relative similarity at instant t+1 indicates
that signature $S_i$ is now closer to the centroid of the cluster it
belonged to at instant t. Nevertheless, a cluster membership
change can still be detected, since $S_i$ is now even closer to
another cluster (cluster 1) (Figure 2).
We define a cluster membership change as follows: a signature S
that belongs to cluster $C_i$ at instant t changes its membership to
cluster $C_j$ at instant t+1 if, at instant t+1, the distance $D(S, C_j)$ is
minimal over all clusters and $D(S, C_j)^{t+1} < D(S, C_i)^{t}$. All the
data on the cluster membership of the signatures are kept for
later analysis. These data, which we call historical data, make it
possible to assess the evolution of customer behavior over time.
To give the analyst a tool for examining the changing behavior of
customers during a defined interval, analysis reports can be
generated [6]. These reports identify all the conditions used, as
well as the average and standard deviation of the signature
variations and the maximum, minimum and average values of all
the signature feature variables. The deviating signatures detected
are included in a blacklist for further analysis.
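The membership-change rule can be checked as below. Following the definition, D is treated here as a distance (smaller means closer); the function name and argument layout are our own.

```python
from typing import Optional, Sequence


def membership_change(old_cluster: int, dist_old_t: float,
                      dists_t1: Sequence[float]) -> Optional[int]:
    """Return the new cluster j if the signature changed membership at t+1, else None.

    old_cluster: index i of the cluster C_i the signature belonged to at instant t.
    dist_old_t: D(S, C_i) at instant t.
    dists_t1: D(S, C_k) at instant t+1 for every cluster k.
    """
    # Cluster that minimizes D at instant t+1.
    j = min(range(len(dists_t1)), key=lambda k: dists_t1[k])
    if j != old_cluster and dists_t1[j] < dist_old_t:
        return j
    return None


print(membership_change(0, 0.4, [0.5, 0.2, 0.9]))  # 1
print(membership_change(0, 0.1, [0.5, 0.2, 0.9]))  # None
```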
Figure 3. Example of fraud situations in the blacklist.
Figure 3 shows an example of a real fraud situation detected by
our methods in the evaluation study. The first line is a header with
the temporal reference, the analysis report description, and the
limits of the range [µ−2σ, µ+2σ]; any variation outside this range
is considered an abnormal situation. Each following line lists the
moment when the anomaly was detected, the signature
identification (phone number), the cluster to which the signature
belongs, a flag indicating a cluster membership change (1 if a
change occurred), the absolute similarity between the signature
and the cluster, the relative similarity (variation) and, in the last
column, the description of the respective analysis report. More
detailed information about the methods, as well as scalability
issues regarding their application, can be found in [5, 6].
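The [µ−2σ, µ+2σ] rule used in the report can be sketched as follows. We assume µ and σ are the mean and population standard deviation of the relative variations of all signatures in the analysed interval; the function name and the phone-number keys are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_abnormal(variations: Dict[str, float], k: float = 2.0) -> List[str]:
    """Return signature ids whose variation lies outside [mu - k*sigma, mu + k*sigma]."""
    values = list(variations.values())
    mu, sigma = mean(values), pstdev(values)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [sid for sid, v in variations.items() if v < lo or v > hi]


# Hypothetical relative variations (in %) keyed by phone number.
variations = {f"9100000{i}": 0.0 for i in range(9)}
variations["91000009"] = 100.0  # one strongly deviating signature
print(flag_abnormal(variations))  # ['91000009']
```

Here µ = 10 and σ = 30, so the accepted range is [−50, 70] and only the last signature falls outside it.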
5. EVALUATING REAL FRAUD SITUATIONS
To assess the quality of our methodology in detecting anomalous
behaviors, we examined the data corresponding to one week of
voice calls from a Portuguese mobile telecommunications
network. The complete set of CDRs corresponds to approximately
2.5 million records, with 700 thousand signatures processed per
day. Since no accurate database of previous fraud cases exists
yet, the settings of our methods were guided by a small list of 12
customers (fraudsters in the referenced week), provided by the
fraud analyst so that other similar behaviors could be detected. In
this first stage of detecting anomalous situations, we are interested
in the effectiveness of our methods. Therefore, we worked on a
sample of the data with approximately 5 thousand summaries per
day and their respective signatures for the whole week. The
detection process was carried out by applying the method
described in Section 3. Several thresholds (ε) were used, and four
main distance functions were designed by combining different
feature variables and weights. Table 2 illustrates the alarms
generated by the deviation-based approach. For the rightmost
(gray) column, further investigation of those alarms showed that
some of them were real fraud situations.
Table 2. Alarms generated for different thresholds (ε) on three
particular days of the week.
ε / day   0.8    1.0    1.2    1.6    2.0
Tue       2141   649    139    50     25
Wed       3029   1145   251    103    56
Sat       1006   560    150    39     23
To better understand the circumstances in which those alarms
were generated, one must investigate the impact of each variable
on the MAX distance function (Eq. 4). Figure 4 illustrates such an
evaluation by means of top-k queries over the complete set of
alarms. We also verified that the distances of the 10 most
significant anomalous situations ranged from 2.76 to 3.33. The
feature variable with the most impact on the distance calculation
is the number of (originated) international calls. On the other
hand, Figure 5 shows that the working-hours variable is highly
important to the distance calculation over the whole period.
Figure 4. The impact of each feature variable on the top-10
highest alarms.
Figure 5. An overall picture of feature variable distribution
over the max distance (ε 2).
It is important to mention that both methods only provide insights
that can be recognized as anomalous situations. In fact, the
characteristics of the data provided by the analysts do not allow us
to apply any classification technique, which makes it hard to
evaluate false positive and false negative rates precisely.
Nevertheless, given the small list provided by the analyst, we can
report a recall of 75%.
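For reference, this recall figure follows directly from the 12 fraud cases supplied by the analysts, 9 of which were found by the deviation-based method:

```latex
\text{recall} = \frac{TP}{TP + FN} = \frac{9}{9 + 3} = \frac{9}{12} = 0.75
```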
To complement these results, we also used a dynamic clustering
approach to detect suspicious changes in cluster membership over
the whole week. The identification of those changes triggers
alarms for later inspection. After several executions, cluster
quality was maximized with 8 clusters. The distribution of the
alarms raised by this method is shown in Table 3.
Table 3. Alarms raised per cluster for three particular days of
the week.
Cluster   Tue   Wed   Sat
1         3     9     1
2         9     7     123
3         3     12    71
4         5     17    16
5         23    21    22
6         20    31    40
7         8     11    26
8         52    72    0
The bottom (gray) row of Table 3 corresponds to the cluster with
the highest number of calls. Figure 6 shows an example of a
change in cluster membership that represents a real fraud
situation identified by this method. The first and second
customers move from clusters with a lower average number of
calls (1 and 2) to the cluster with the highest number of calls (8),
on days 4 and 3 respectively. The third customer in this example,
although always in the same cluster, registered a significant
variation between days 5 and 6.
Figure 6. Example of anomalous situations related to an
increase in the number of calls.
Using dynamic clustering, we can report a recall of 91%. This
method is thus somewhat more sensitive in detecting anomalous
situations than the previous one. This is explained by the relative
similarity measure (Eq. 7), which fine-tunes the cluster migration
method by exploiting the relative variation of the signatures over
time (the whole week). Finally, the overlap rate of both methods
is approximately 62% for the whole sample used, and 66% for the
blacklist provided by the fraud analyst. Meanwhile, the remaining
cases, other anomalous situations with the same behavior as the
previously detected cases, are under inspection by the company
analysts. Our next efforts will therefore be directed at developing
a database of fraud cases, as well as a rule induction engine to
help analysts evaluate the alarms.
Concerning scalability, preliminary results showed that the most
costly step is the calculation of the summaries and signatures. It
requires several aggregation functions over the CDRs in order to
group information by customer. Currently, this is done by several
SQL scripts on Microsoft SQL Server 2005. Once this information
is available, each method discussed in this work can be applied,
in no predefined order, to detect anomalies. When dealing with
such large volumes of data, we found that working with chunks of
information (summaries and signatures) plus clustered index
structures improves the processing time by at least one order of
magnitude without losing quality in the results. On the other
hand, this introduces a new problem: sliding the window from ω
to ω+1 requires rebuilding all the respective indexes. Finally, when
using dynamic clustering we divide the original chunk of data D
into a set of mutually exclusive partitions $D'_i$, so that processing
each partition is feasible. After all partitions have been processed,
the last step is to merge the clustering information obtained from
each processed chunk. The parameters describing the cluster
topology obtained for each block are gathered into a single set
$D'_f$. These parameters are treated as the data objects for further
processing to obtain the final K clusters. In future work, we intend
to report several scenarios of use and optimization of both
elements discussed in this work for detecting anomalous
situations.
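A minimal sketch of the partition-and-merge strategy, under strong simplifying assumptions: the paper does not name its clustering algorithm, so plain k-means (Lloyd's algorithm) on one-dimensional points stands in for it, partitions are formed by simple striding, and the centroids found in each partition play the role of the parameters gathered into D'_f. All names are hypothetical.

```python
import random
from typing import List, Sequence


def kmeans_1d(points: Sequence[float], k: int, iters: int = 20, seed: int = 0) -> List[float]:
    """Plain Lloyd's k-means on 1-D points; returns the k centroids in ascending order."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # k data points sampled without replacement
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
    return sorted(centroids)


def partitioned_clustering(data: Sequence[float], n_partitions: int, k: int) -> List[float]:
    """Cluster mutually exclusive partitions D'_i separately, gather their centroids
    into one set (standing in for D'_f), and cluster that set into the final K clusters."""
    partitions = [list(data[i::n_partitions]) for i in range(n_partitions)]
    gathered = [c for part in partitions for c in kmeans_1d(part, k)]
    return kmeans_1d(gathered, k)


data = [1.0, 10.0, 1.2, 10.2, 0.8, 9.8, 1.1, 10.1]
print(partitioned_clustering(data, n_partitions=2, k=2))
```

Each partition must contain at least k points for the sampling step; the final pass only sees the gathered centroids, which is what keeps its cost independent of the size of D.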
6. FINAL DISCUSSION
In this work we have presented two methods for detecting telecom
fraud situations. Both methods rely on the concept of signature to
summarize customer behavior over a certain period of time. In the
first approach, the user signature is used as a basis for
comparison: a difference between the user's current behavior and
their signature may reveal an abnormal situation. The second
approach uses dynamic clustering analysis to evaluate changes in
cluster membership over time. The key point of these detection
methods is that they complement each other in reporting
anomalous situations; for instance, Section 5 shows a 66% overlap
in the fraud situations raised by the two methods. The
experimental evaluation, performed with data from one week of
voice calls and compared with a list of previously detected fraud
cases, showed a high rate of true positives (91%) detected by the
proposed methods. Additionally, they discovered other fraud
situations that had not previously been reported by the analysts.
Preliminary discussions with fraud analysts gave us feedback on
the promising capabilities of the proposed methodologies.
7. REFERENCES
[1] Y. Kou, T. Lu, S. Sirwongwattana, and Y. Huang. Survey of
fraud detection techniques. In Proceedings of IEEE Intl
Conference on Networking, Sensing and Control, March
2004.
[2] T.F. Lunt. A survey of intrusion detection techniques.
Computer and Security, (53):405-418, 1999.
[3] Corinna Cortes and Daryl Pregibon. Signature-based methods
for data streams. Data Mining and Knowledge Discovery,
(5):167-182, 2001.
[4] Myers and Myers. Probability and Statistics for Engineers
and Scientists. Prentice Hall, 6th edition.
[5] Pedro Ferreira, Ronnie Alves, Orlando Belo and Luís
Cortesão. Establishing Fraud Detection Patterns Based on
Signatures. In Proceedings of Industrial Conference on Data
Mining 2006, July, 2006.
[6] Pedro Ferreira, Orlando Belo, Ronnie Alves, and Joel
Ribeiro. Fratelo - Fraud in Telecommunications: Technical
report. Tech Report 1, University of Minho, Department of
Informatics, May 2006.
... Profiling is a technique to identify behavioral patterns of users based on some properties available in a specific context. They are also referred to as signatures or patterns in the literature [1,2]. These profiles are usually constructed using data available from the past or current (direct or indirect) interactions with the system. ...
... However, the evalua-tion was carried out only on two users. Advancement of the above work was presented in [1], where after detecting the anomalous nodes, the signatures were clustered to observe the pattern of users. However, the feature set was poor while only considering the duration and number of calls with different granularity and not considering social network metrics that capture the impact of relationships. ...
Article
Full-text available
Fraud in telephony incurs huge revenue losses and causes a menace to both the service providers and legitimate users. This problem is growing alongside augmenting technologies. Yet, the works in this area are hindered by the availability of data and confidentiality of approaches. In this work, we deal with the problem of detecting different types of unsolicited users from spammers to fraudsters in a massive phone call network. Most of the malicious users in telecommunications have some of the characteristics in common. These characteristics can be defined by a set of features whose values are uncommon for normal users. We made use of graph-based metrics to detect profiles that are significantly far from the common user profiles in a real data log with millions of users. To achieve this, we looked for the high leverage points in the 99.99th percentile, which identified a substantial number of users as extreme anomalous points. Furthermore, clustering these points helped distinguish malicious users efficiently and minimized the problem space significantly. Convincingly, the learned profiles of these detected users coincided with fraudulent behaviors.
... Rather, we propose a unified ranking strategy which makes use of enhanced classical vertex measures combining attribute-based information and graph structural information, aggregating vertex measures into a unified measure for revealing abnormal patterns in call graphs. This network was also explored in other studies on telecom fraud detection by exploring customer behavior using signatures [6] and dynamic clustering [1]. The main contributions of the proposed strategy are: 1) a dynamic model for mining evolving call graph networks, so the model can be up-to-date when new information is available; b) a set of relevant vertex measures devised for allowing attribute-based and structural evaluation of call graphs; and 3 ) a vertex ranking function for mining abnormal K-vertices by applying a unified strategy that aggregates distinct vertex measures of interestingness. ...
... Further discussion about variables related to this dataset can be found in previous works [1,6]. In order to assess effectiveness, all relevance vertex measures are computed for all vertices of the 5% sample. ...
Conference Paper
Full-text available
Graphs are a very important abstraction to model complex structures and respective interactions, with a broad range of applications including web analysis, telecommunications, chemical informatics and bioinformatics. In this work we are interested in the application of graph mining to identify abnormal behavior patterns from telecom Call Detail Records (CDRs). Such behaviors could also be used to model essential business tasks in telecom, for example churning, fraud, or marketing strategies, where the number of customers is typically quite large. Therefore, it is important to rank the most interesting patterns for further analysis. We propose a vertex relevant ranking score as a unified measure for focusing the search of abnormal patterns in weighted call graphs based on CDRs. Classical graph-vertex measures usually expose a quantitative perspective of vertices in telecom call graphs. We aggregate wellknown vertex measures for handling attribute-based information usually provided by CDRs. Experimental evaluation carried out with real data streams, from a local mobile telecom company, showed us the feasibility of the proposed strategy.
... In the current section, we have discussed some of the established works related to the mobile fraud detection system. The authors of paper [4] proposed two anomaly detection methods which is dedicated fully to the knowledge of signatures in which the initial method is based on a signature deviation-based method whereas the another one is based on the method of dynamic cluster analysis. In the first method an abnormal condition is revealed when there is a distinction between the user's actual behavior and his signature while the second method evaluates changes on cluster membership values over time. ...
Article
Telecommunications fraud runs rampant recently around the world. Therefore, how to effectively detect fraudsters has become an increasingly challenging problem. However, previous studies either assume that the samples are independent of each other and use non-graph methods, or use local subgraphs with good connectivity for graph-based anomaly detection. Few prior works have performed graph-based fraud detection on real-world Call Detail Records (CDR) metadata sets with sparse connectivity. To solve this problem, we propose an end-to-end telecommunications fraud detection framework named Bridge To Graph (BTG). BTG leverages the subscriber synergy behavior to reconstruct connectivity, which bridges the gap between sparse connectivity data and graph machine learning. Concretely, we extract multi-model features from metadata and perform Box–Cox transformation first. Then, aiming at the sparse connectivity of real-world CDR metadata, the graph is reconstructed through dimensionally selectable link prediction of node similarity. Finally, the reconstructed graph and node features are input into the graph machine learning module for node embedding representation learning and fraud node classification. Comprehensive experiments on the real-world telecommunications network CDR data set show that our proposed method outperforms the classic methods in many metrics. Beyond telecom fraud detection, our method can also be extended to anomaly detection scenarios with no graph or sparse connectivity graph.
Conference Paper
Every year, the number of telecommunication fraud cases increases dramatically, and companies providing such services lose billions of euros worldwide. Mobile virtual network operators (MVNOs), which operate on top of the existing cellular infrastructures of the base operators while offering cheaper call plans, have lately been receiving more and more attention. This paper aims to identify suspicious customers whose unusual behaviour is typical of potential fraudsters in an MVNO. In this study, different univariate outlier detection methods are applied. Univariate outliers are obtained using call detail records (CDR) and payment records aggregated by user. Special emphasis is placed on the metrics designed for outlier detection in the context of suspicious-customer labelling, which may support fraud experts in evaluating customers and revealing fraud. In this research, we identified specific attributes that could be applied for fraud detection. Threshold values were found for the examined attributes, which could be used to compile lists of suspicious users.
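As an illustration of univariate outlier detection on per-user aggregates, the sketch below sums one CDR attribute per user and applies Tukey's fences (Q1 - k·IQR, Q3 + k·IQR). The abstract does not say which univariate methods or thresholds the authors used; Tukey's rule with `k = 1.5`, the record layout `(user, minutes)` and the function names are assumptions chosen only as a common example.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_user(cdrs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum one CDR attribute (here: call minutes) per user."""
    totals: Dict[str, float] = defaultdict(float)
    for user, minutes in cdrs:
        totals[user] += minutes
    return dict(totals)


def iqr_outliers(values_by_user: Dict[str, float], k: float = 1.5) -> List[str]:
    """Return users whose value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values_by_user.values(), n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sorted(u for u, v in values_by_user.items() if v < lower or v > upper)


# Hypothetical CDR extract: (user id, call minutes).
cdrs = [("u1", 4.0), ("u1", 6.0), ("u2", 12.0), ("u3", 11.0), ("u4", 13.0),
        ("u5", 9.0), ("u6", 12.0), ("u7", 250.0), ("u7", 50.0)]
totals = aggregate_by_user(cdrs)
suspicious = iqr_outliers(totals)
print(suspicious)
```

The fences play the role of the per-attribute threshold values mentioned in the abstract: users beyond them would be placed on the list of suspicious users for an analyst to review.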
Article
Recent years have witnessed the rapid growth of mobile virtual network operators (MVNOs), which operate on top of existing cellular infrastructures of base carriers, while offering cheaper or more flexible data plans compared to those of the base carriers. In this paper, we present a two-year measurement study towards understanding various fundamental aspects of today's MVNO ecosystem, including its architecture, customers, performance, economics, and the complex interplay with the base carrier. Our study focuses on a large commercial MVNO with one million customers, operating atop a nation-wide base carrier. Our measurements clarify several key concerns raised by MVNO customers, such as inaccurate billing and potential performance discrimination with the base carrier. We also leverage big data analytics, statistical modeling, and machine learning to address the MVNO's key concerns with regard to data usage prediction, data plan reselling, customer churn mitigation, and billing delay reduction. Our proposed techniques can help achieve higher revenues and improved services for commercial MVNOs.
Chapter
User profiling is the process of constructing a normal profile by accumulating the past calling behavior of a user. Clustering aims to discover a structure, or intrinsic grouping, in an unlabeled data collection. In this paper, our main aim is to build an appropriate user profile by applying the generalized possibilistic fuzzy c-means (GPFCM) clustering technique. All the call features required to build a user profile are collected from the call detail records of the individual users. The behavioral profile of users is modeled by applying clustering to two relevant calling features from the reality-mining dataset. Because the dataset contains no labels, we apply clustering, which is an unsupervised approach. Before applying the clustering algorithm, a cluster validity analysis is carried out to find the best number of clusters, and the resulting clusters are then evaluated using several performance parameters.
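To make the fuzzy-clustering step concrete, here is a minimal sketch of standard fuzzy c-means (FCM) on two calling features, together with the partition coefficient as one simple cluster-validity index. Note that this is plain FCM, not the generalized possibilistic variant (GPFCM) used in the chapter, which adds typicality terms not shown here. The fuzzifier `m = 2`, the random initialisation and the example feature values are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points: List[List[float]], c: int, m: float = 2.0, iters: int = 100,
                  tol: float = 1e-6, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard FCM: returns (cluster centers, membership matrix u[i][j])."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centers = [[0.0] * dim for _ in range(c)]
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers[j] = [sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)]
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [_dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: give it crisp membership.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                     for k in range(c)) for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


def partition_coefficient(u: Matrix) -> float:
    """Validity index in [1/c, 1]; values near 1 indicate a crisp, well-separated partition."""
    return sum(x * x for row in u for x in row) / len(u)


# Hypothetical user features: (calls per day, average call duration in minutes).
points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centers, u = fuzzy_c_means(points, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels, round(partition_coefficient(u), 3))
```

A cluster-validity analysis of the kind the abstract mentions could run this for several values of `c` and keep the one with the best index.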
Article
Anomaly detection is an important aspect of any security mechanism. We present an efficient anomaly detection algorithm named BANBAD. Using Belief Networks (BNs), the algorithm identifies abnormal behavior of a feature, such as inappropriate energy consumption of a node in a network. By applying structure learning techniques to a training dataset, BANBAD establishes a joint probability distribution among relevant features, such as average velocity, displacement, local computation and communication time, energy consumption, and response time of a network node. A directed acyclic graph (DAG) is used to represent the features and their dependencies. Through a training process, BANBAD maintains dynamic, updated profiles of network node behaviors and uses a specific Bayesian inference algorithm to distinguish abnormal behavior during testing. BANBAD works especially well in ad hoc networks. Extensive simulation results demonstrate that a centralized BANBAD achieves low false alarm rates, below 5%, and high detection rates, greater than 95%. We also show that BANBAD detects anomalies efficiently and accurately in two real datasets. The key to achieving such high performance is bounding the false alarm rate at a predefined threshold value. By fine-tuning the threshold, a high detection rate can also be achieved.
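The idea of bounding the false alarm rate at a predefined value can be sketched independently of the model: score normal training data, then pick the alarm threshold as an empirical quantile so that at most a fraction `alpha` of normal samples exceed it. BANBAD uses a learned Bayesian network for the likelihood; purely for illustration, this sketch swaps in independent per-feature Gaussians. The model, `alpha = 0.05` and the synthetic data are assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Params = List[Tuple[float, float]]


def fit_independent_gaussians(train: Sequence[Sequence[float]]) -> Params:
    """Per-feature mean and standard deviation (a stand-in for a learned BN)."""
    params = []
    for column in zip(*train):
        mu = statistics.fmean(column)
        sd = statistics.pstdev(column) or 1e-9  # avoid division by zero
        params.append((mu, sd))
    return params


def anomaly_score(x: Sequence[float], params: Params) -> float:
    """Negative log-likelihood of x under the model; larger means more anomalous."""
    return sum(
        0.5 * ((xi - mu) / sd) ** 2 + math.log(sd) + 0.5 * math.log(2 * math.pi)
        for xi, (mu, sd) in zip(x, params)
    )


def threshold_for_false_alarm_rate(scores: Sequence[float], alpha: float) -> float:
    """Threshold such that at most a fraction alpha of the training scores exceed it."""
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, max(0, math.ceil((1 - alpha) * len(ordered)) - 1))
    return ordered[idx]


# Synthetic "normal" node behaviour: two features per sample.
rng = random.Random(1)
train = [[rng.gauss(10.0, 2.0), rng.gauss(5.0, 1.0)] for _ in range(200)]
params = fit_independent_gaussians(train)
scores = [anomaly_score(x, params) for x in train]
thr = threshold_for_false_alarm_rate(scores, alpha=0.05)
false_alarm_rate = sum(s > thr for s in scores) / len(scores)
is_anomaly = anomaly_score([50.0, 50.0], params) > thr
print(round(false_alarm_rate, 3), is_anomaly)
```

Lowering `alpha` trades detection rate for fewer false alarms; this is the threshold tuning the abstract refers to.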
Book
The Fourteenth International Conference on Networks (ICN 2015), held April 19th-24th, 2015 in Barcelona, Spain, continued a series of events focusing on advances in the field of networks. ICN 2015 welcomed technical papers presenting research and practical results, position papers addressing the pros and cons of specific proposals (such as those being discussed in standards fora or industry consortia), survey papers addressing key problems and solutions, short papers on work in progress, and panel proposals. ICN 2015 also featured the SOFTNETWORKING 2015 symposium: The International Symposium on Advances in Software Defined Networks.
Conference Paper
All over the world, we have been witnessing a significant increase in telecommunication systems usage. Day after day, people are faced with strong marketing campaigns seeking their attention for new telecommunication products and services. Telecommunication companies struggle in a highly competitive business arena. Their efforts seem to have paid off, because customers are strongly adopting the new trends and systematically use (and abuse) communication services in their daily lives. Although fraud situations are rare, they are increasing, and they correspond to a large amount of money that telecommunication companies lose every year. In this work, we studied the problem of fraud detection in telecommunication systems, especially cases of superimposed fraud, providing an anomaly detection technique supported by a signature schema. Our main goal is to detect deviant behaviors in a timely manner, giving fraud analysts a better basis for making accurate decisions when establishing potential fraud situations.
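A signature-based deviation detector of the kind described above can be sketched as follows. Each customer's signature holds a running mean and variance of one usage feature. A new observation is flagged when it lies more than `threshold` signature standard deviations from the mean, and only non-flagged observations update the signature, so suspected fraud does not contaminate it. The exponentially weighted update, the weight `w = 0.1`, the threshold of 3 and the single-feature signature are assumptions for illustration, not the exact schema of the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Signature:
    """Running summary of one customer's behaviour for a single feature."""
    mean: float
    var: float


def deviation(sig: Signature, x: float, min_std: float = 1e-6) -> float:
    """Number of signature standard deviations between x and the signature mean."""
    return abs(x - sig.mean) / max(math.sqrt(sig.var), min_std)


def update(sig: Signature, x: float, w: float) -> Signature:
    """Exponentially weighted update of mean and variance with observation x."""
    diff = x - sig.mean
    return Signature(mean=sig.mean + w * diff, var=(1 - w) * sig.var + w * diff * diff)


def monitor(sig: Signature, observations: List[float], w: float = 0.1,
            threshold: float = 3.0) -> Tuple[List[bool], Signature]:
    """Flag strongly deviating observations; only normal ones update the signature."""
    flags = []
    for x in observations:
        anomalous = deviation(sig, x) > threshold
        flags.append(anomalous)
        if not anomalous:
            sig = update(sig, x, w)
    return flags, sig


# Hypothetical customer: about 30 call minutes per day, then a sudden burst.
flags, sig = monitor(Signature(mean=30.0, var=25.0), [32.0, 28.0, 31.0, 200.0])
print(flags, round(sig.mean, 2))
```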
Conference Paper
Due to the dramatic increase in fraud, which results in losses of billions of dollars worldwide each year, several modern fraud detection techniques are continually being developed and applied in many business fields. Fraud detection involves monitoring the behavior of populations of users in order to estimate, detect, or avoid undesirable behavior. Undesirable behavior is a broad term that includes delinquency, fraud, intrusion, and account defaulting. This paper presents a survey of current techniques used in credit card fraud detection, telecommunication fraud detection, and computer intrusion detection, with the goal of providing a comprehensive review of different fraud detection techniques.
Article
Today's computer systems are vulnerable both to abuse by insiders and to penetration by outsiders, as evidenced by the growing number of incidents reported in the press. Closing all security loopholes in today's systems is infeasible, and no combination of technologies can prevent legitimate users from abusing their authority in a system; thus, auditing is viewed as the last line of defense. Over the past several years, the computer security community has been developing automated tools to analyze computer system audit data for suspicious user behavior. This paper describes the use of such tools for detecting computer system intrusion and describes further technologies that may be of use for intrusion detection in the future.
Article
We have been developing signature-based methods in the telecommunications industry for the past 5 years. In this paper, we describe our work as it evolved due to improvements in technology and our aggressive attitude toward scale. We discuss the types of features that our signatures contain, nuances of how these are updated through time, our treatment of outliers, and the trade-off between time-driven and event-driven processing. We provide a number of examples, all drawn from the application of signatures to toll fraud detection.
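The trade-off between time-driven and event-driven processing mentioned above can be shown with a toy signature holding only an average call duration. In the event-driven variant the signature is updated after every call; in the time-driven variant calls are first aggregated per day and the signature is updated once per day. The update rule, the weight `w` and the day-based batching are hypothetical choices for illustration; the paper's actual features and update nuances are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Call = Tuple[int, float]  # (day index, call duration in minutes)


def event_driven_update(mean: float, calls: Iterable[Call], w: float) -> float:
    """Update the signature immediately after every call record."""
    for _, duration in calls:
        mean = (1 - w) * mean + w * duration
    return mean


def time_driven_update(mean: float, calls: Iterable[Call], w: float) -> float:
    """Batch calls per day, then update the signature once per day with the daily mean."""
    by_day: Dict[int, List[float]] = defaultdict(list)
    for day, duration in calls:
        by_day[day].append(duration)
    for day in sorted(by_day):
        daily_mean = sum(by_day[day]) / len(by_day[day])
        mean = (1 - w) * mean + w * daily_mean
    return mean


calls = [(0, 2.0), (0, 4.0), (1, 10.0)]
event_sig = event_driven_update(3.0, calls, w=0.5)
time_sig = time_driven_update(3.0, calls, w=0.5)
print(event_sig, time_sig)
```

The event-driven signature reacts to each call as it arrives (useful for catching fraud quickly) at the cost of one update per record, whereas the time-driven one is cheaper to maintain but only changes at period boundaries, and the two can end up with different values from the same calls.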
Fratelo - Fraud in Telecommunications
Pedro Ferreira, Orlando Belo, Ronnie Alves, and Joel Ribeiro. Fratelo - Fraud in Telecommunications: Technical report. Tech Report 1, University of Minho, Department of Informatics, May 2006.