Analysis and Detection of SIMbox Fraud in Mobility Networks
Ilona Murynets, Michael Zabarankin+, Roger Piqueras Jover, and Adam Panagia
AT&T Security Research Center, New York, NY
+Stevens Institute of Technology, Hoboken, NJ
AT&T Financial Billing Operations, Piscataway, NJ
{ilona,roger.jover}@att.com, +mzabaran@stevens.edu, adam.panagia@att.com
Abstract—Voice traffic termination fraud, often referred to as
Subscriber Identity Module box (SIMbox) fraud, is a common
illegal practice on mobile networks. As a result, cellular operators
around the globe lose billions annually. Moreover, SIMboxes
compromise the cellular network infrastructure by overloading
local base stations serving these devices. This paper analyzes the
fraudulent traffic from SIMboxes operating with a large number
of SIM cards. It processes hundreds of millions of anonymized
voice call detail records (CDRs) from one of the main cellular
operators in the United States. In addition to overloading voice
traffic, fraudulent SIMboxes are observed to have static physical
locations and to generate a disproportionately large volume of
outgoing calls. Based on these observations, novel classifiers for
fraudulent SIMbox detection in mobility networks are proposed.
Their outputs are optimally fused to increase the detection rate.
The operator’s fraud department confirmed that the algorithm
succeeds in detecting new fraudulent SIMboxes.
I. INTRODUCTION
Cellular network operators lose about 3% of their annual
revenue to fraudulent and illegal services. Juniper Research
estimated the total losses from the underground mobile
network industry to be $58 billion in 2011 [1], [2]. The
impact of voice traffic termination fraud, commonly known as
Subscriber Identity Module (SIM)-box fraud or bypass fraud,
on mobile networks is particularly severe in some parts of the
globe [2]. Recent highly publicized raids on fraudsters include
those in Mauritius, Haiti, and El Salvador [3].
Fraudulent SIMboxes hijack international voice calls and
transfer them over the Internet to a cellular device, which
injects them back into the cellular network. As a result, the
calls become local at the destination network [4], and the
cellular operators of the intermediate and destination networks
do not receive payments for the call routing and termination.
Fraudulent SIMboxes also hijack domestic traffic in certain
areas, e.g. in Alaska within the United States, where call
termination costs are high. In some cases, the traffic is injected
into a cellular network and is forwarded to the terminating
country. This increases the call routing cost for the operator
of the injected traffic.
Besides causing economic loss, SIMboxes degrade the
local service where they operate. Often, cells are overloaded,
and voice calls routed over a SIMbox have poor quality, which
results in customer dissatisfaction.
Although some vendors provide cellular anti-fraud services,
the large amount of daily cellular traffic and the number of
connected mobile devices make detecting call bypassing fraud
extremely challenging. Moreover, traffic patterns and charac-
teristics of fraudulent SIMboxes are very similar to those of
certain legitimate devices, such as cellular network probes. So,
detecting fraudulent SIMboxes resembles searching for a few
needles in a huge haystack full of small objects that look like
needles. While operators of the intermediate and destination
networks have high financial incentives to understand the
problem, they do not have the data to analyze the international
calls that were diverted. Also, the absence of publicly available
SIMbox-related data is a major obstacle to the emergence of
comprehensive studies on voice bypassing fraud analysis and
detection [5]. By contrast, most of the SIMbox traffic analyzed
in this paper is on the originating end of the communication,
giving us insight into SIMbox fraud from a different perspective
than that of most networks with a bypass problem.
In this work, we analyze fraudulent SIMbox traffic based on
anonymized communication data from one of the major tier-
1 network operators in the United States. SIMboxes operate
with a large number of SIM cards from foreign and national
operators. If an operator detects and shuts down a fraudulent
account, the fraudsters deploy a set of new SIM cards as in the
Short Message Service (SMS) spam fraud [6]. Also, fraudulent
SIMboxes have almost static physical locations and generate a
disproportionately large number of outgoing calls (100 times
as many as incoming calls). Based on these observations,
we introduce three classifiers for fraudulent SIMboxes and
combine their outputs into a classification rule, which has a
high detection rate and correctly filters out mobile network
probes with traffic patterns similar to those of SIMboxes.
This paper is organized into five sections. Section II
overviews voice termination fraud in mobility networks and
illustrates it with some basic examples. Section III analyzes
SIMbox-related traffic, compares it to the legitimate traffic,
and, based on the extracted features, presents a novel algo-
rithm for SIMbox detection in mobility networks. Section
IV overviews the related work, and Section V concludes the
paper.
II. VOICE FRAUD IN MOBILE NETWORKS
SIMbox voice fraud occurs when the cost of terminating
domestic or international calls exceeds the cost of a lo-
cal mobile-to-mobile call in a particular region or country.
Fig. 1. Two examples of one-hop SIMbox bypass fraud: (a) hijacking of an international call; (b) hijacking and re-injecting of an international call.
Fraudsters make a profit by offering low-cost international and
sometimes domestic voice calls to other operators. To bypass
call routing fees, they buy or hijack large numbers of SIM
cards and install them into off-the-shelf hardware,1 which
connects to the mobility network and essentially becomes a
SIMbox. Then the fraudsters transfer a call via the Internet to
a SIMbox in the area of the call recipient to deliver the call
as local. As a result, the operators serving the called party do
not receive the corresponding call termination fees. In other
cases, SIMboxes re-inject telecom voice traffic into the mobility
network masked as mobile customer calls, and the operator
pays for carrying the re-injected calls.
1 The hardware can be used for legitimate purposes, for example in
machine-to-machine (M2M) applications. It has recently been
reported to transmit SMS spam [7].
Figure 1 shows two examples of how SIMbox bypass fraud
occurs for international phone calls. For simplicity, the examples
assume that there is only one intermediate hop connection
between two pairs of countries: country A to country B and
country A to country C. The solid line marks a legitimate path
for a phone call, whereas the dotted line indicates a fraudulent
one when a SIMbox is in place. Actual SIMbox fraud is often
more complex, involving multiple intermediate steps.
Fig. 2. Legitimate international calling card scheme.
In Figure 1(a), Alice, who lives in country A, calls Bob, who
lives in country B. In the legitimate case, once she dials Bob’s
number, the call is routed through the cellular infrastructure
of operator A to a least cost route (LCR) carrier. Based
on an agreement between operator B and the LCR carrier,
the call is routed to operator B’s cellular core network. The
LCR carrier pays operator B a fee to have the call
terminated. Then the call is routed through operator B’s
cellular infrastructure and is delivered to Bob.
The fraud occurs when a fraudulent LCR carrier hijacks
Alice’s call and forwards it to country B over the Internet,
e.g. via VoIP. Then in country B, a SIMbox (an associate of
the fraudulent LCR carrier) transforms the incoming VoIP flow
into a local mobile call to Bob, and operator B loses the
termination fee for the hijacked call.
Figure 1(b) shows a more elaborate SIMbox fraud scheme,
which consists of three stages. In the first stage, the fraudsters
hijack legitimate SIM cards from customers of operator A
and put them into a SIMbox. For example, they call Alice and
trick her into providing her account information. Then they
impersonate Alice and link her wireless account to their own
SIM card. As a result, Alice’s phone will be unable to connect,
whereas the SIMbox traffic will be charged to her account.
In the second stage, suppose that Bob lives in country C
and has a SIM card from operator C. International roaming
agreements and standard industry practices specific to country
C prescribe passing call traffic from either other network
operators or the Public Switched Telephone Network (PSTN)
(low priority traffic) through an LCR carrier and passing the
traffic of operator A’s retail customers (high priority traffic)
via foreign operators (high cost routes). Suppose a low priority
call, originating in country A, is intended to reach Bob. In
the legitimate case, this call is routed over an LCR and is
terminated in operator C’s network. In this stage, the fraud
occurs when an illegitimate LCR routes the traffic over IP to
the SIMbox in country A.
Finally, in the third stage, the SIMbox injects the traffic
destined for Bob into the cellular network of operator A. At
this point, the communication becomes a wireless call from the
SIM card of retail customer Alice, so that the communication
is routed to country C over a high cost foreign provider. In
this case, operator C always receives the call termination fee
either from the legitimate LCR or from the foreign providers,
whereas when Alice reports her stolen identity, operator A
becomes liable for the cost of the call.
The following example illustrates the difference between
legitimate international calling cards (PennyTalk [8], ZapTel
[9], Vonage [10], etc.) and SIMbox fraud that reroutes voice
calls via VoIP. Suppose that Alice, who is in country A, pur-
chased an international calling card to call Bob in country B
(see Figure 2). She dials either a local or toll free number that
connects her to the calling card platform through operator A’s
cellular network. The platform requests and verifies Alice’s
card access code, and as soon as she enters Bob’s phone
number, it forwards her call over the PSTN to country B, and
Bob receives a local low cost call connecting him to Alice. A
portion of the call path might be routed over IP. When Alice
uses the international calling card, she is aware that her call
will be routed via VoIP, and so, she agrees to get a low call
quality. In this case, despite the absence of termination fees,
Alice’s and Bob’s operators make a profit from the legitimate
agreement between them and the calling card provider.
TABLE I
VOICE CALL DETAIL RECORD FIELDS
CDR Field                 Description
Time                      date and time of a call
Duration                  call duration
Originating number        phone number of a caller
Originating country code  country of a caller
Terminating number        phone number of a called party
Terminating country code  country of a called party
Call type                 mobile originated/terminated call
IMEI                      international mobile equipment identity (device identifier)
IMSI                      international mobile subscriber identity (user identifier)
LAC-CID                   location area code and cell ID (base station location identifier)
Account age               time since account activation
Customer segment          prepaid/postpaid/corporate account
Yet another form of SIMbox fraud bypasses interconnection
fees for high cost domestic traffic, using the schemes
described in the above examples.
III. SIMBOX FRAUD ANALYSIS AND DETECTION
A. Data feeds and anonymization
We analyze samples of fully anonymized call detail records
(CDRs) from a tier-1 cellular operator in the United States
between February 2012 and June 2013. CDRs are logs of all
phone calls, text messages, and data exchanges in the network.
If two communicating parties (caller and receiver) belong to
the same cellular provider, two records are stored. The mobile
originated (MO) and mobile terminated (MT) records store
data of the caller and receiver, respectively.
At a CDR pre-processing step, all individual identifiers are
removed: caller and receiver phone numbers are replaced by
integer hashes (anonymized), the international mobile sub-
scriber identity (IMSI) is parsed and hashed, and only the first
8 digits of the international mobile equipment identity (IMEI)
are preserved and anonymized.2 Table I summarizes the CDR
fields used in the analysis.
All results are normalized and aggregated to obfuscate total
counts of calls and the number of devices in the network.
2 This first segment of the IMEI, known as the Type Allocation Code (TAC),
determines the manufacturer and model of a wireless device. For a phone, the
TAC identifies the phone manufacturer and model (e.g. Samsung Galaxy S4),
whereas for an M2M connected device, it identifies the embedded cellular
modem (e.g. Sierra Wireless Q2687).
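The pre-processing step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the hash function, so the keyed HMAC-SHA256 truncated to a 64-bit integer, the key `SECRET_KEY`, and the record field names (`orig_number`, `term_number`, `imsi`, `imei`) are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; the paper does not describe how hashes are keyed.
SECRET_KEY = b"replace-with-a-private-key"


def hash_to_int(value: str, key: bytes = SECRET_KEY) -> int:
    """Map an identifier to a 64-bit integer with a keyed hash (assumed scheme)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def anonymize_cdr(record: dict) -> dict:
    """Return a copy of a CDR with individual identifiers replaced by hashes."""
    out = dict(record)
    out["orig_number"] = hash_to_int(record["orig_number"])
    out["term_number"] = hash_to_int(record["term_number"])
    out["imsi"] = hash_to_int(record["imsi"])
    # Only the first 8 digits of the IMEI (the TAC) are kept, then hashed.
    out["imei"] = hash_to_int(record["imei"][:8])
    return out
```

A keyed hash is used in this sketch because phone numbers come from a small, enumerable space, so an unkeyed hash could be reversed by brute force; whether the operator's pipeline does this is not stated in the text.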
B. Data sample selection and labeling
The data set contains CDRs of 500 IMEIs of fraudulent
SIMboxes and of about 93,000 legitimate accounts. The fraudulent
SIMbox accounts were investigated by the operator’s
fraud department and cancelled due to their malicious activity.
The legitimate accounts consist of fully anonymized post-paid
family plans, unlikely to be involved in fraudulent activities
[7], corporate accounts, and mobile network probing devices.
It is a common practice that local and foreign cellular
operators and device manufacturers probe the mobility net-
work to measure the quality of service in terms of latency,
to test upcoming new cellular devices, etc. [11]. Probing
devices generate a rather large number of voice calls, most
of which are addressed to different recipients. This contrasts
with the communication pattern of regular users, who make
fewer phone calls to fewer contacts [12]. Also, probing devices
reuse the same IMSI for multiple physical devices as part of
the probing infrastructure. This results in CDRs from each
IMSI originating in multiple locations.
The data set is split into two parts: 66% and 34% of
the labeled accounts are used for the training and testing,
respectively.
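A 66%/34% split of the labeled accounts could be drawn as in the sketch below. The text does not say how the split was made; stratifying by label (so that the few fraudulent IMEIs appear in both parts) and the fixed random seed are assumptions of this example, and `stratified_split` is a hypothetical helper.

```python
import random


def stratified_split(labels: dict, train_fraction: float = 0.66, seed: int = 0):
    """Split account ids into training and testing lists, class by class.

    labels maps an account id to 1 (fraudulent SIMbox) or 0 (legitimate).
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        ids = sorted(k for k, v in labels.items() if v == cls)
        rng.shuffle(ids)
        cut = round(train_fraction * len(ids))
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test
```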
C. Call traffic features
CDR fields in Table I (collected during one week in 2013)
are transformed into 48 features characterizing voice call
communication patterns of legitimate and fraudulent IMEIs.
Features such as average MO and MT call durations, account
age, and customer segment are obtained from the corresponding
CDR fields. The total number of outgoing and incoming
calls along with their corresponding destinations and origins
(international and domestic) are counted based on MO and MT
time stamps and based on originating and terminating country
codes. Since SIMboxes typically use multiple SIMs, one of the
features is the number of IMSIs operated per IMEI, counted
for both one week in 2013 and the period of February 2012
– June 2013.
The geo-location feature is the number of base stations (MO
and MT) that an IMEI connected to during that week; it is
obtained from the LAC-CID field. Some features are derived
from others, e.g. the ratio of the number of destinations to
the total number of calls, the ratio of international calls to the
total number of calls, etc.
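A few of the per-IMEI features described above can be computed as in the following sketch. It covers only a handful of the 48 features. The CDR field names are hypothetical, a call is treated as international when the originating and terminating country codes differ, and the two ratios are taken over outgoing (MO) calls, which is an assumption of this example.

```python
from collections import defaultdict


def extract_features(cdrs):
    """Aggregate per-IMEI features from an iterable of CDR dicts.

    Assumed keys: imei, imsi, call_type ("MO" or "MT"), cell (LAC-CID),
    duration, orig_country, term_country, term_number.
    """
    acc = defaultdict(lambda: {"mo": 0, "mt": 0, "intl": 0, "mo_dur": 0.0,
                               "mt_dur": 0.0, "imsis": set(), "cells": set(),
                               "dests": set()})
    for r in cdrs:
        a = acc[r["imei"]]
        a["imsis"].add(r["imsi"])
        a["cells"].add(r["cell"])
        if r["call_type"] == "MO":
            a["mo"] += 1
            a["mo_dur"] += r["duration"]
            a["dests"].add(r["term_number"])
            if r["orig_country"] != r["term_country"]:
                a["intl"] += 1
        else:
            a["mt"] += 1
            a["mt_dur"] += r["duration"]

    features = {}
    for imei, a in acc.items():
        mo, mt = a["mo"], a["mt"]
        features[imei] = {
            "mo_calls": mo,
            "mt_calls": mt,
            "num_imsis": len(a["imsis"]),   # SIM cards operated by the device
            "num_cells": len(a["cells"]),   # geo-location feature
            "avg_mo_duration": a["mo_dur"] / mo if mo else 0.0,
            "avg_mt_duration": a["mt_dur"] / mt if mt else 0.0,
            "intl_ratio": a["intl"] / mo if mo else 0.0,
            "dest_ratio": len(a["dests"]) / mo if mo else 0.0,
        }
    return features
```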
D. SIMbox data analysis
This sub-section analyzes the traffic characteristics of fraudulent
SIMboxes based on the features described in Section
III-C. Figure 3(a) plots the number of MO calls versus the
number of locations from which these calls originate. Dots and
triangles correspond to legitimate and fraudulent SIMbox accounts,
respectively. To avoid disclosing sensitive information,
the four plots are normalized with the same arbitrary positive
integer. Figure 3(a) shows that SIMboxes are physically static
and connect to a very small number of base stations.3 By
contrast, legitimate customers are highly mobile and connect to
multiple base stations. In particular, network probing devices
have a very large number of locations. Figure 3(b) plots the
number of MO calls versus the number of MT calls. SIMboxes
initiate a thousand times as many calls as they receive, and
their calls are of significantly longer duration. By contrast,
legitimate accounts have about the same number of initiated
and received calls, which are of about the same duration.
Fig. 3. Voice call traffic characteristics of SIMboxes (dots) and legitimate (triangles) accounts, panels (a) to (d).
Figure 3(c) plots the total number of MO international calls
versus the number of IMSIs. The majority of legitimate customers
are clustered around low values of both IMSIs and international
calls. Several network probing devices operate even more
SIMs than fraudulent SIMboxes; however, they make fewer
international calls.
Figure 3(d) shows the number of IMSIs for the period
February 2012 – June 2013 versus the average duration of
MT calls. MT call durations of fraudulent SIMboxes are much
shorter than those of legitimate customers. Also, in contrast to
some network probing devices, SIMboxes operate fewer IMSIs
over longer periods due to frequent account cancellation by
the operator.
3 This number is not necessarily one because many cellular devices connect
to the network by means of a third generation (3G) technology based on
wideband code division multiple access (WCDMA). In this technology, a
device can be physically connected to up to 6 sectors at the same time,
combining the signal at the receiver [13]. Depending on the channel conditions
and fading, the serving base station might fluctuate throughout the set of 6
base station IDs. As a result, CDRs from the same static device can
come from up to 6 different sectors.
Fig. 4. The distribution of the number of IMSIs per IMEI (horizontal axis: IMSIs per IMEI; vertical axis: frequency).
The distribution of the number of IMSIs per IMEI from
February 2012 to June 2013 is obtained by parsing CDRs
of all the operator’s active subscribers (see Figure 4, where
the actual values on each axis are omitted to hide sensitive
information). As expected, the majority of devices connected
to the mobile network operate with one or just a few SIM
cards (IMSIs). An IMEI operating two SIM cards can represent,
for example, a used device that was bought on eBay.
Also, Figure 4 shows two big spikes of IMEIs operating with
many SIM cards. Further analysis identifies those IMEIs as
legitimate cellular network probes.
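The distribution discussed above is a histogram over devices of the number of distinct IMSIs seen per IMEI. A minimal sketch, assuming each CDR is a dict with hypothetical `imei` and `imsi` keys:

```python
from collections import Counter, defaultdict


def imsi_per_imei_distribution(cdrs):
    """Return a Counter mapping k to the number of IMEIs with exactly k IMSIs."""
    imsis = defaultdict(set)
    for r in cdrs:
        imsis[r["imei"]].add(r["imsi"])
    return Counter(len(s) for s in imsis.values())
```

Spikes at large values of k in such a histogram are the devices worth inspecting by hand, which in the text turned out to be legitimate network probes.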
E. Classification algorithm: training, testing, and implementation
To detect fraudulent SIMboxes, we use three classifiers: (i)
alternating decision tree [14], [15], (ii) functional tree [16],
and (iii) random forest [17], each of which is, in fact, a
combination of “simple” classifiers. A classification rule is
then presented as a linear combination of the three classifiers
with weight coefficients found from minimization of the total
prediction error on the training dataset.
A boosting method combines “weak” classifiers into a
“strong” classifier by repeatedly re-weighting training data.
It can focus not only on the majority of the training data,
which is usually classified correctly, but also on the outlier
records, which are often misclassified. An alternating decision
tree is derived from a variant of the boosting method and
combines decision stumps, i.e. single question decision trees.
It has two types of nodes: test nodes and predictor nodes and
alternates between them starting from the root node, which
is a predictor node. In contrast to a standard decision tree,
the alternating one allows a data record to follow multiple
paths, whose weights (which can be positive or negative) are then
summed up, with the sign of that sum determining the predicted
value. The random forest uses random subsets of features to
generate multiple decision trees from training sets sampled
with replacement. The prediction is then determined by the
majority rule (common vote) of the generated decision trees.
At each node, a standard decision tree performs a simple value
test with a single feature, whereas to improve classification
accuracy, a functional tree linearly combines several features
at both decision nodes and leaves, which is advantageous for
large data sets. We use the Waikato environment for knowledge
analysis (WEKA) data mining tool [18], which implements
these decision trees.
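The way an alternating decision tree scores a record can be illustrated with a small sketch. It is not the WEKA implementation: the classes `Predictor` and `Splitter`, the function `adt_score`, and the toy tree with its thresholds and node values are all hypothetical. A record visits every test node under each predictor node it reaches, the values of the reached predictor nodes are summed, and the sign of the sum gives the class.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Predictor:
    """Predictor node: contributes `value` and may hold several test nodes."""
    value: float
    tests: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    """Test node: a single-question test with one predictor per outcome."""
    condition: Callable[[dict], bool]
    if_true: Predictor
    if_false: Predictor


def adt_score(node: Predictor, x: dict) -> float:
    """Sum predictor values along all paths that the record x follows."""
    total = node.value
    for t in node.tests:
        child = t.if_true if t.condition(x) else t.if_false
        total += adt_score(child, x)
    return total


# Toy tree with made-up values: the root is a predictor node with two tests.
root = Predictor(-0.5, [
    Splitter(lambda x: x["num_imsis"] > 10, Predictor(1.0), Predictor(-0.5)),
    Splitter(lambda x: x["num_cells"] <= 6, Predictor(0.25), Predictor(-0.75)),
])


def adt_predict(x: dict) -> int:
    """Return 1 (SIMbox) if the summed score is positive, else 0."""
    return 1 if adt_score(root, x) > 0 else 0
```

With this toy tree, a static device with many SIMs, e.g. `{"num_imsis": 40, "num_cells": 2}`, scores -0.5 + 1.0 + 0.25 = 0.75 and is labeled 1, whereas a mobile single-SIM device scores negative and is labeled 0.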
Let the alternating tree, functional tree, and random forest
be classifiers 1, 2, and 3, respectively, and let $x_{ki} \in \{0,1\}$ be
the prediction of classifier $k$ for testing record $i \in \{1, \dots, n\}$,
where $x_{ki} = 1$ if classifier $k$ predicts record $i$ as a SIMbox,
and $x_{ki} = 0$ if it predicts the record as a legitimate customer.
Let $y_i \in \{0,1\}$ be the actual label of data record $i$ ($y_i = 0$
and $y_i = 1$ if record $i$ is a legitimate account and a fraudulent
SIMbox, respectively). To improve classification accuracy, the
data labels can be regressed with respect to the predictions of
the three classifiers:
\[ \hat{y}_i = w_1 x_{1i} + w_2 x_{2i} + w_3 x_{3i}, \tag{1} \]
where $\hat{y}_i$ is the prediction of data label $y_i$ and $w_1 + w_2 + w_3 = 1$.
The absence of the intercept in (1) and the constraint
$w_1 + w_2 + w_3 = 1$ guarantee that the cases when all three classifiers
predict either 0 or 1 will be classified as a legitimate customer
and a SIMbox, respectively. The regression coefficients $w_1$,
$w_2$, and $w_3$ are found by the least squares method
\[ \min_{w_1, w_2, w_3} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \quad \text{s.t.} \quad w_1 + w_2 + w_3 = 1, \tag{2} \]
which yields a system of four linear equations with respect to
$w_1$, $w_2$, $w_3$, and the Lagrange multiplier corresponding to the
constraint $w_1 + w_2 + w_3 = 1$. Now, the classification rule is
given by
\[ y_i^{*}(x) = \begin{cases} 1, & w_1 x_{1i} + w_2 x_{2i} + w_3 x_{3i} > \alpha, \\ 0, & \text{otherwise,} \end{cases} \]
where the threshold $\alpha$ is found from minimizing the classification
error
\[ \min_{\alpha} \sum_{i=1}^{n} (y_i^{*} - y_i)^2. \tag{3} \]
Observe that the regression coefficients $w_1$, $w_2$, and $w_3$ and the
threshold $\alpha$ can be found in “one shot” from the optimization
problem
\[ \min_{w_1, w_2, w_3, \alpha} \sum_{i=1}^{n} \left( I\{\hat{y}_i > \alpha\} - y_i \right)^2 \quad \text{s.t.} \quad w_1 + w_2 + w_3 = 1, \tag{4} \]
where $I\{\cdot\}$ is the indicator function equal to 1 if the condition
in curly brackets is true and zero otherwise. However, (4) is a
nonlinear problem of four variables, whereas (2) has a closed-form
solution, and the problem (3), though being similar to
(4), has only a single variable.
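The two-step fit, problem (2) followed by problem (3), can be sketched in plain Python. This is an illustration rather than the authors' code. The helper names are hypothetical. The four linear equations are written as $G w + \mu \mathbf{1} = b$ and $w_1 + w_2 + w_3 = 1$, with $G = \sum_i x_i x_i^T$, $b = \sum_i y_i x_i$, and $\mu$ equal to half the Lagrange multiplier. The threshold is searched over midpoints between distinct fused scores, which is one possible way to solve (3).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: classifier outputs are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(preds, labels):
    """Problem (2): least squares subject to w1 + w2 + w3 = 1.

    preds is a list of (x1, x2, x3) triplets, labels a list of 0/1 values.
    """
    k = 3
    gram = [[sum(p[r] * p[c] for p in preds) for c in range(k)] for r in range(k)]
    rhs = [sum(p[r] * y for p, y in zip(preds, labels)) for r in range(k)]
    # Stationarity rows (gram w + mu 1 = rhs) plus the constraint row.
    kkt = [gram[r] + [1.0] for r in range(k)] + [[1.0] * k + [0.0]]
    return solve_linear(kkt, rhs + [1.0])[:k]


def fit_threshold(preds, labels, w):
    """Problem (3): choose alpha minimizing the number of misclassifications."""
    scores = [sum(wi * xi for wi, xi in zip(w, p)) for p in preds]
    levels = sorted(set(scores))
    mids = [(lo + hi) / 2 for lo, hi in zip(levels, levels[1:])]
    candidates = [levels[0] - 1.0] + mids + [levels[-1]]

    def errors(alpha):
        return sum((1 if s > alpha else 0) != y for s, y in zip(scores, labels))

    return min(candidates, key=errors)
```

If two classifiers agree on every record, the system becomes singular and the sketch raises an error; a production version would need to handle that case.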
Algorithm 1 summarizes the suggested approach.
Algorithm 1 Classification rule
1: τ = Time interval for collecting CDRs
2: S = Set of indices of IMEIs of all known fraudulent SIMboxes (investigated by the fraud department)
3: L = Set of indices of IMEIs of known legitimate devices
4: for all j ∈ S ∪ L do
5:   Generate features for the period τ
6:   Generate alternating decision tree, functional tree, and random forest and compute their optimal weights w1, w2, and w3 along with the corresponding threshold α
7: end for
Alternatively, given the predictions $(x_{1i}, x_{2i}, x_{3i})$,
$i = 1, \dots, n$, the data labels can be classified by logistic
regression or a support vector machine (SVM) [19]. However,
the numerical experiments with the training data show that
both these methods perform worse than the regression problem
(2) with (3). This could be partially explained by the fact
that the overwhelming majority of triplets $(x_{1i}, x_{2i}, x_{3i})$, being
either (0,0,0) or (1,1,1), correspond to correct training data
labels (0 or 1, respectively), so that they do not affect the
objective function in (2), whereas they skew the logistic
regression. The training data are not separable, so the hard
margin SVM is infeasible, whereas the soft margin SVM requires
specifying the extent of misclassifications (through an additional
parameter) and involves n + 4 variables, which makes it less
attractive from the computational perspective.
TABLE II
CLASSIFICATION RESULTS
Method                     False Positive  False Negative  Accuracy (%)
Alternating decision tree  0.0005          0.1             99.91
Functional tree            0.0007          0.07            99.90
Random forest              0.0001          0.16            99.93
Classification rule        0.0001          0.09            99.95
The three classifiers were trained on the training dataset,
for which their optimal weight coefficients are $w_1 = 0.31$,
$w_2 = 0.26$, and $w_3 = 0.43$ with the corresponding threshold
$\alpha = 0.39$. Table II shows the results of the three
classifiers and of their optimal linear combination on the testing
dataset. The false positive is the proportion of legitimate
customers classified as SIMboxes, whereas the false negative is
the proportion of SIMboxes classified as legitimate accounts.
The “accuracy” column in Table II shows the proportions of
the total correct classifications. Among the three classifiers,
the random forest has the lowest false positive and the highest
false negative, whereas the functional tree has the lowest false
negative and the highest false positive. The optimal linear
combination of the three classifiers matches the lowest false
positive (0.0001), has a lower false negative (0.09) than the
alternating decision tree and the random forest, and has the
highest accuracy of 99.95%.
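With the reported weights and threshold, the fused rule can be enumerated over all eight possible prediction triplets. The sketch below only applies the arithmetic of the classification rule to the reported numbers; the names `W`, `ALPHA`, `fused_label`, and `table` are hypothetical.

```python
from itertools import product

W = (0.31, 0.26, 0.43)  # reported weights: alternating tree, functional tree, random forest
ALPHA = 0.39            # reported threshold


def fused_label(x1, x2, x3, w=W, alpha=ALPHA):
    """Apply the classification rule: 1 if the weighted sum exceeds alpha."""
    score = w[0] * x1 + w[1] * x2 + w[2] * x3
    return 1 if score > alpha else 0


# Fused decision for every combination of the three binary predictions.
table = {x: fused_label(*x) for x in product((0, 1), repeat=3)}
```

Since 0.43 > 0.39 and 0.31 + 0.26 = 0.57 > 0.39, while neither 0.31 nor 0.26 alone exceeds 0.39, the enumeration shows that a device is flagged exactly when the random forest flags it or when both the alternating decision tree and the functional tree do.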
For a real mobility network, generating the set of features
for each account requires processing hundreds of millions
of CDRs. To reduce this load, accounts with fewer than 10 IMSIs
per IMEI, which are unlikely to be very active SIMboxes, are
filtered out.4 Additionally, all known legitimate accounts such
as network probing and corporate accounts are also filtered
out. Thus, after the pre-processing, only 0.02% of all active
accounts remain for feature extraction and classification.
Algorithm 2 incorporates Algorithm 1 and presents a scheme
for detecting new fraudulent SIMboxes in the real mobility
network. The operator’s fraud department confirmed that Algorithm
2 successfully identifies new SIMboxes.
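The pre-filtering and scoring steps described above can be sketched as follows. The function names, the dictionary of per-IMEI IMSI counts, and the representation of the trained classifiers as callables returning 0 or 1 are assumptions of this example. The strict comparison with the filtering threshold follows Algorithm 2.

```python
def candidate_imeis(imsi_counts, known_legitimate, h=10):
    """Keep IMEIs that are not known-legitimate and operate more than h IMSIs."""
    return {imei for imei, m in imsi_counts.items()
            if imei not in known_legitimate and m > h}


def detect_simboxes(imsi_counts, known_legitimate, features_for,
                    classifiers, weights, alpha, h=10):
    """Score the remaining IMEIs with the fused rule and return those flagged.

    features_for maps an IMEI to its feature record; classifiers is a
    sequence of three callables, each returning 0 or 1 for a record.
    """
    flagged = []
    for imei in sorted(candidate_imeis(imsi_counts, known_legitimate, h)):
        x = features_for(imei)
        score = sum(w * clf(x) for w, clf in zip(weights, classifiers))
        if score > alpha:
            flagged.append(imei)
    return flagged
```

Filtering before feature extraction keeps the expensive per-account aggregation off the vast majority of devices, which is the scalability argument made in the text.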
4 Section III shows that fraudulent SIMboxes use a large number of SIM
cards. Thus, first we calculate the total number of SIMs used by each active
IMEI in the network within a week. The threshold of 10 IMSIs is arbitrary.
Algorithm 2 SIMbox detection
1: loop
2:   Run Algorithm 1 for the period τ
3:   N = Set of indices of all IMEIs in the network
4:   Ω = ∅
5:   Θ = Set of indices of IMEIs corresponding to known network probing devices and corporate accounts
6:   h = Filtering threshold
7:   for all i ∈ N \ Θ do
8:     m_i = number of IMSIs operated by IMEI i
9:     if m_i > h then
10:      Ω = Ω ∪ {i}
11:    end if
12:  end for
13:  for all i ∈ Ω do
14:    Generate features, then apply the alternating tree, functional tree, and random forest obtained at step 2 to the features and compute the prediction y*_i
15:  end for
16:  return Detected SIMboxes to the fraud department
17:  update S and L based on the feedback from the fraud department
18: end loop
IV. RELATED WORK
The rise of mobile communication technology over the
past decade is mirrored by the range and sophistication of
illegal activities on mobile networks. SMS spam is by far the
most prevalent illegal activity that has attracted considerable
attention from both industry and academia [20]. In this fraud,
mobile users are tricked into visiting phishing URLs and providing
sensitive information. Like SIMbox fraudsters, spammers
operate large numbers of SIM cards from each IMEI [6]. The
volume of SMS spam is expected to grow at the annual rate of
500% [21]. For a detailed analysis of SMS spam on mobility
networks, see [7].
The smartphone malware GGTracker [22] is yet another example
of recent fraudulent activities. Embedded into
a legitimate-looking app, it silently subscribes users to
premium number services such as a horoscope for $10 per
month and hides all communications with those premium
numbers. The users learn that they are victims of the fraud
only at the end of a billing cycle when they see excessive
charges on their bills. For an overview of other malware on
Android platforms, see [23].
Subscription fraud occurs when fraudsters steal a customer’s
identification and use it to subscribe to a mobility network
[24], [25]. With a low cost technique, a fraudster can sniff
traffic from a GSM (Global System for Mobile Communications)
mobility network and break its encryption [26]. As a
result, the fraudster can obtain the IMSI and the secret key of
any victim in the vicinity and then use wireless service at the
victim’s expense. As with the GGTracker malware, victims learn about
the fraud only when their bills arrive.
Several security firms offer services for detection and
prevention of SIMbox fraud [27], [28]; however, details of
their detection techniques are not disclosed. To the best of
our knowledge, there is only one publicly available work
[5] on this subject. It uses artificial neural networks (a
multilayer perceptron) to detect fraudulent SIMboxes based
on 9 voice call communication features for 6415 subscribers
from one Cell ID (234,324 calls in total). The method detects
SIMboxes with 98.71% accuracy.
In contrast to [5], our classification rule is trained and tested
on a larger data sample of accounts distributed nationwide, and
our features are computed per IMEI (device identifier) rather
than per subscriber identifier. Since SIMboxes operate with
multiple SIMs, 500 IMEIs of fraudulent SIMboxes correspond
to thousands of fraudulent SIMbox subscriber identifiers. Only
a few of our features coincide with those in [5]. For example,
the number of locations, which we consider, is quite important
but is not relevant for [5], since all subscribers in [5] were in
one Cell ID. Other important new features include the number
of SIM cards, the total number of international calls and its
ratio to the total number of calls. Also, we show that network
probing devices have communication patterns similar to those
of SIMboxes.
V. CONCLUSIONS
We have analyzed voice call communication features of
fraudulent SIMboxes in the mobility network of a major tier-
1 network operator in the United States and have identified
call traffic patterns distinguishing fraudulent SIMboxes from
legitimate devices. Those patterns include a high number of
IMSIs per IMEI, a large number of international phone calls,
an imbalance between MO and MT traffic (international and
domestic), and a static physical location. Based on the features,
we have proposed three classifiers of fraudulent SIMboxes in
mobility networks: alternating decision tree, functional tree,
and random forest. The random forest and functional decision
tree provide the lowest false positive and the lowest false
negative, respectively. The false positive of the alternating
decision tree is lower than that of the functional tree, and
its false negative is lower than that of the random forest. The
predictions of the three classifiers have been linearly combined
into a classification rule, where classifiers’ weight coefficients
have been found from minimization of the total classification
error on the training dataset. The random forest has the largest
weight coefficient followed by that of the alternating decision
tree. The accuracy of the classification rule is 99.95%. For
large data sets, the scalability of the algorithm can be improved
by filtering out accounts with fewer than 10 IMSIs (99.98% of
all active subscribers). The operator’s fraud department has
confirmed that the proposed algorithm detects new fraudulent
SIMboxes with a low false positive.
ACKNOWLEDGEMENTS
We are grateful to Angus MacLellan and Richard Becker
for their help, comments, and valuable suggestions.
REFERENCES
[1] H. Windsor, “Mobile Revenue Assurance & Fraud Management,”
Juniper Research, http://goo.gl/GX7G4.
[2] M. Yelland, “Fraud in mobile networks,” Computer Fraud & Security,
vol. 2013, no. 3, pp. 5–9, 2013.
[3] “Raids on SIM Box/GSM Gateway Fraudsters Save Mobile Operators
Millions,” Reuters, http://goo.gl/pHCpK.
[4] “Fraud in the Mobile World,” Revector, http://goo.gl/Uobx6.
[5] A. H. Elmi, S. Ibrahim, and R. Sallehuddin, “Detecting sim box fraud
using neural network,” in IT Convergence and Security 2012. Springer,
2013, pp. 575–582.
[6] I. Murynets and R. Piqueras Jover, “Analysis of SMS Spam in Mobility
Networks,” in International Journal of Advanced Computer Science
(IJACSci), vol. 3, no. 7, July 2013.
[7] I. Murynets and R. Piqueras Jover, “Crime scene investigation: SMS
spam data analysis,” in Proceedings of the 2012 ACM conference on
Internet measurement. ACM, 2012, pp. 441–452.
[8] “PennyTalk,” http://www.pennytalk.com/.
[9] “ZapTel Calling Cards,” http://www.zaptel.com/.
[10] “Vonage Calling Cards,” http://lp.vonage.com/callingcard.
[11] “RCATS - Remote Cellular Active Test System,” JDSU, http://goo.gl/
VEbMA.
[12] A.-L. Barabási and R. Albert, “Emergence of scaling in random
networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
[13] Universal Mobile Telecommunications System (UMTS), “Physical layer
procedures (FDD). 3GPP TS 25.214,” vol. v3.17.0, 1999.
[14] Y. Freund and L. Mason, “The alternating Decision Tree Learning
Algorithm,” 1999.
[15] G. Holmes, B. Pfahringer, R. Kirkby, E. Frank, and M. Hall, “Multiclass
Alternating Decision Trees,” 2002.
[16] J. Gama, “Functional trees,” Machine Learning, vol. 55, 2004.
[17] L. Breiman, “Random Forests,” Machine Learning, vol. 45, 2001.
[18] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and
I. Witten, “The WEKA Data Mining Software: An Update,” in SIGKDD
Explorations, vol. 11.
[19] M. Zabarankin and S. Uryasev, Statistical Decision Problems: Selected
Concepts and Portfolio Safeguard Case Studies. Springer, 2013, to
appear.
[20] N. Perlroth, “Spam Invades a Last Refuge, the Cellphone,” The New
York Times, April 2012, http://preview.tinyurl.com/7nwvm3g.
[21] A. Bobotek, “Threat of Mobile Malware and Abuse,” Messaging Anti-
Abuse Working Group (MAAWG), October 2010, http://goo.gl/Ay57e.
[22] T. Strazzere, “GGTracker Technical Tear Down,” Lookout Mobile Se-
curity, http://goo.gl/IuVfm.
[23] Y. Zhou and X. Jiang, “Dissecting android malware: Characterization
and evolution,” in Security and Privacy (SP), 2012 IEEE Symposium
on. IEEE, 2012, pp. 95–109.
[24] Y. Moreau, H. Verrelst, and J. Vandewalle, “Detection of mobile phone
fraud using supervised neural networks: A first prototype,” in Artificial
Neural Networks – ICANN’97. Springer, 1997, pp. 1065–1070.
[25] P. Barson, S. Field, N. Davey, G. McAskie, and R. Frank, “The detection
of fraud in mobile phone networks,” Neural Network World, vol. 6, no. 4,
pp. 477–484, 1996.
[26] K. Nohl and S. Munaut, “Wideband GSM sniffing,” in 27th Chaos
Communication Congress, 2010, http://goo.gl/wT5tz.
[27] “SIM Box Detection Service,” Telekom Austria, http://goo.gl/Ac12d.
[28] “SIMbox detector,” Xintec, http://goo.gl/AUZbe.