Applied Computing Review 14
Fraud Detection in Reputation Systems in e-Markets using
Logistic Regression and Stepwise Optimization
Rafael Maranzato
Universo Online Inc.
Dept. of R & D
São Paulo, SP, Brazil
rmaranzato@uolinc.com
Adriano Pereira
Federal Center for
Technological Education of
Minas Gerais (CEFET-MG)
Dept. of Computer Science
Belo Horizonte, MG, Brazil
adriano@decom.cefetmg.br
Marden Neubert
Universo Online Inc.
Dept. of R & D
São Paulo, SP, Brazil
mneubert@uolinc.com
Alair Pereira do Lago
Univ. of São Paulo - USP
Dept. de Computação
São Paulo, SP, Brazil
alair@ime.usp.br
ABSTRACT
Reputation is the opinion of the public toward a person, a
group of people, or an organization. Reputation systems are
particularly important in e-markets, where they help buyers
to decide whether to purchase a product or not. Since a
higher reputation means more profit, some users try to de-
ceive such systems to increase their reputation. E-markets
should protect their reputation systems from attacks in or-
der to maintain a sound environment. This work addresses
the task of finding attempts to deceive reputation systems
in e-markets. Our goal is to generate a list of users (sellers)
ranked by the probability of fraud. First, we describe
characteristics of transactions that may indicate evidence of
fraud, and we extend them to the sellers. We describe the
results of a simple approach that ranks sellers by counting
characteristics of fraud. Then we incorporate characteristics
that cannot be used by the counting approach, and we apply
logistic regression to both the original and the improved
sets of characteristics. We use real data from a large
Brazilian e-market to train and evaluate our methods. The
improved set with logistic regression performs better,
especially when we apply stepwise optimization. We validate
our results with fraud detection specialists of this
marketplace. In the end, we increase by 112% the number of
identified fraudsters against the reputation system. In terms
of ranking, we reach 93% average precision after the
specialists’ review of the list that uses logistic regression
and stepwise optimization. We also detect 55% of fraudsters
with a precision of 100%.
Categories and Subject Descriptors
K.4.4 [Computers and Society]: Electronic Commerce—
e-markets; H.3.5 [Online Information Services]: Web-
based services—online reputation systems
General Terms
Experimentation, Management, Security
Keywords
e-markets, reputation systems, trust management, fraud ev-
idence, fraud detection, logistic regression
1. INTRODUCTION
In the past few years, there has been a huge development
of online commercial activity enabled by the Internet and
World Wide Web (WWW). Electronic marketplaces, or just
e-markets, such as Amazon¹ and eBay², have reached great
popularity and revenue, emerging as a very relevant model in
the Business-to-Consumer (B2C) and Consumer-to-Consu-
mer (C2C) e-commerce scenario. Amazon revenues reached
US$ 19.17 billion in 2008, including a fast-growing income
from selling Web Services to other companies. At eBay, sales
reached US$15.7 billion in the second quarter of the year,
with 84.5 million active users [5].
An e-market can be defined as a multi-party e-commerce
platform intermediating buyers and sellers [13]. E-markets
are therefore information systems intended to provide their
users with online services that will facilitate information ex-
change and transactions. The development of online auc-
tion sites and other forms of electronic markets has created
a new kind of online community, where people trade with
each other on a potentially large scale. In this scenario,
reputation plays an important role.
¹ http://www.amazon.com
² http://www.ebay.com
Reputation is the opinion (more technically, a social evalua-
tion) of the public toward a person, a group of people, or an
organization. Reputation can also be defined as the amount
of trust inspired by a particular person or company in a
specific domain of interest [20]. The widespread adoption of
e-markets has highlighted several problems regarding trust
and deception that must be addressed to keep these envi-
ronments sound [6].
Major marketplace providers try to tackle the problem by
introducing simple reputation mechanisms [30], which give
an indication of how trustworthy a user is, based on his/her
performance in previous transactions.
In this work, we focus on frauds against reputation systems.
Starting from a characteristic³ extraction based on expert
knowledge [29], we use a recently identified set of
characteristics for fraud detection in e-markets [17]. We also
describe a simple approach for identifying frauds by counting
characteristics. Next, we enhance the set of characteristics
and adopt a Logistic Regression Model that can deal with the
enhanced set. We also apply an optimization technique to the
model generated by Logistic Regression. These approaches
analyze characteristics of both the user and the
negotiation processes that happen in the marketplace. We
compare them by using actual data from a large Brazilian
e-market and then checking the results with specialists to
validate the outcomes. We found that the performance
of both methods is very promising, and that they are useful for
discovering a large number of fraudsters that had not been
identified before.
The remainder of this article is organized as follows. Sec-
tion 2 discusses related work. Section 3 briefly describes the
TodaOferta marketplace, used in our study. In Section 4 we
explain the characteristic extraction procedure for data used
in this work. Section 5 shows the characteristic counting ap-
proach with some results. Section 6 presents a new approach
that applies Logistic Regression with our case study and re-
sults, including Stepwise optimization. Finally, Section 7
shows our conclusions.
2. RELATED WORK
Electronic markets are getting more popular each day. One
of the most common e-market applications is online auctions,
which have been extensively studied lately. Several
studies have focused on reputation systems and trust in on-
line auctions. Some of them have analyzed the importance of
reputation in auction outputs, mainly in final prices. Ba and
Pavlou [2] investigate the effectiveness of reputation systems
and how reputation correlates to auction results. They con-
clude that reputation plays an important role in trust and
leads to higher ending prices.
Klos et al. [12] analyze the effect of trust and reputation over
the profits obtained by intermediaries in electronic commercial
connections. Different trust and distrust propagation
schemes in e-commerce negotiations are studied and evaluated
by Guha et al. [8]. Resnick et al. [26] show that sellers with
high reputation are more capable of selling their products,
but the gains in final prices are reduced. Using a controlled
experiment, Resnick et al. [27] study more accurately the
impact of reputation on the auction outputs. The results
show that, in general, bidders pay higher prices to sellers
with higher reputation.
³ We use the term characteristic instead of feature, which is
the term used in some references, because we consider the
term characteristic more appropriate in a statistical context.
Several works investigate reputation systems and how they
induce cooperative behavior in strategic settings. Dellarocas [3]
has done a thorough review on this topic. While providing
incentives for good behavior, reputation systems may
also elicit deceptive behavior. In fact, some fraud-related
studies rely on reputation information as a source of
evidence of fraud [7].
This subject has long interested economists, since sellers with
a good reputation can increase their prices because buyers pay
for such reputation [11]. In the real world, reputation is built
with time after some transactions, and sellers build a concept
about themselves that becomes a reference to consumers.
This historical record is used by future buyers when making
a new transaction [25].
Reputation mechanisms are based on virtual opinions, given
by people who generally do not know each other personally.
Therefore, electronic trust is more difficult to establish
than trust in the real world. Taking a broad view, in these
marketplaces a buyer’s reputation represents the probability
of payment and a seller’s reputation represents the estimated
probability of delivering the advertised item (product that
has been bought) after the payment [11]. These probabilities
are related to trust [21].
Resnick et al. [25] say that these reputation systems have
three main problems: (i) buyers have little motivation to
provide feedback to sellers; (ii) it is difficult to elicit negative
feedback because it is common that people negotiate and
solve problems before filling the evaluation in the system;
(iii) it is difficult to assure honest reports. Since it is very
easy to register in such systems, it is very easy to create a
false identity that can be used to trade with other users and
distort the reputation system.
As the feedback system is the basis of reputation in these
marketplaces and gives information that is used before the
moment the transaction happens, it is easy for fraudsters
to make artificial transactions so that they can have a good
reputation score. Basically, this artificial score can be used
to deceive buyers who pay and do not receive the right prod-
uct or it can be used to sell more goods because the seller
will have favorable reputation [25]. Considering this situa-
tion, marketplaces should have tools to identify fraudsters,
in order to protect honest users. Users who interact with
fraudsters may have their reputation affected too [21]. Gavish
and Tucci [6] show that buyers who are victims of fraud
decrease their volume of transactions, which is not
profitable for the marketplaces.
One of the pioneers of empirical and statistical approaches
to bankruptcy prediction, Ohlson [22] was probably the
first academic researcher to apply Logistic Regression in the
field. In his work, he was able not only to detect the
most important evidence for bankruptcy prediction, but
also to find evidence of fraud and/or error, since he observed that
“the reports of the misclassified bankrupt firms seem to lack
any warning signals of impending bankruptcy.” Since then,
many other works have applied the method in fraud detec-
tion [14, 15, 16, 28]. In particular, Viaene et al. applied
the method, among a pool of many other methods, to automobile
insurance fraud data and concluded that “noteworthy
is the good overall performance of the relatively simple and
efficient techniques such as logit...” (Logistic Regression).
3. MARKETPLACE DESCRIPTION
This section describes TodaOferta¹, a marketplace
developed by the largest Latin American Internet Service
Provider, Universo Online (UOL)². It also defines
some basic concepts related to the marketplace.
TodaOferta [23] is a website for buying and selling products
and services on the Web. Table 1 shows a short summary
of the TodaOferta dataset. It embeds a significant sample
of users, listings, and negotiations. Due to a confidentiality
agreement, the quantitative information of this dataset
cannot be presented.
Coverage (time)              Jun/2007 to Jul/2008
#categories (top-level)      32
#sub-categories              2,189
Average listings per user    4.63
Average listings per seller  42.48
Negotiation options          Fixed Price and Auction
Table 1: TodaOferta Dataset - Summary
Users correspond to buyers and sellers interested in making
transactions in the marketplace. Listings are created by sell-
ers to advertise products or services at a fixed-price or in an
auction. When a buyer is interested in a listing he/she starts
a negotiation. With a fixed-price listing, the negotiation au-
tomatically starts a transaction, indicating that buyer and
seller should transact the good at the advertised price. With
an auction, the winning bid will become a transaction when
the auction finishes. Unlike eBay, where auctions represent
almost 50% of all transactions [9], in TodaOferta auctions
account for less than 2% of all transactions, since the vast
majority of listings are fixed-price.
There are 32 top-level categories in TodaOferta, which in-
clude 2,189 sub-categories providing a variety of distinct
products and services, from collectibles to electronics and
vehicles. The current top sales sub-categories are cell phones,
MP3 players and pen drives.
¹ http://www.todaoferta.com.br
² http://www.uol.com.br
The TodaOferta marketplace employs a quite simple reputation
mechanism. After each negotiation, buyers and sellers
rate each other with a value of 1 (positive), 0 (neutral),
or -1 (negative). A user’s reputation is defined as the
sum of all ratings received by him/her. Feedback
from the same user is considered only once when computing
the reputation score. Reputation systems are useful to
communicate trust in electronic commerce applications. As
TodaOferta does not charge sellers after transactions, it is quite
simple to generate artificial transactions just to improve this
score. However, TodaOferta provides other information
about sellers and buyers that can also be used to identify
trustworthy and untrustworthy users (e.g., time since the
user registered, comments left by users who negotiated
with him/her).
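As an illustration of the scoring rule described above, the following minimal sketch sums the ratings a user received while counting each rater only once. The function name and the input format are hypothetical, and keeping the first rating from a repeated rater is an assumption, since the text does not say which one is kept.

```python
def reputation_score(feedbacks):
    """Sum the ratings received by one user, counting each rater once.

    feedbacks: iterable of (rater_id, rating) pairs in chronological
    order, with rating in {1, 0, -1} (positive, neutral, negative).
    """
    counted = {}
    for rater_id, rating in feedbacks:
        if rating not in (1, 0, -1):
            raise ValueError("rating must be 1, 0 or -1")
        # Assumption: only the first rating from each rater is kept.
        counted.setdefault(rater_id, rating)
    return sum(counted.values())


# Two positive ratings from "a" count once; "b" is negative; "c" is positive.
score = reputation_score([("a", 1), ("a", 1), ("b", -1), ("c", 1)])
```

Under this rule, repeated positive ratings from one counterpart add a single point, so inflating the score requires transactions with many distinct accounts.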
Basically, sellers try to cheat the reputation system for
two different purposes. The first one is to improve their sales
and, consequently, their profits, because their reputation
in the marketplace seems better than it actually is. The
second purpose is to exploit their good reputation
to commit other types of fraud, in general related
to financial damage to buyers.
Another relevant feature of TodaOferta is the integration
with a payment system which includes an escrow mechanism.
This escrow system is named PagSeguro⁴, and it is
very similar to PayPal⁵. If a seller uses the payment system,
he can configure his listings so that he will receive payments
through it. However, with the escrow feature, the seller will
receive the payment only after the buyer has received the
product or after 15 days. So, if a seller has enabled the pay-
ment system, he is more likely to be trustworthy, since he
allows the buyer to block the money if the product was not
delivered properly.
4. CHARACTERISTIC EXTRACTION
In this section we describe the procedures related to characteristic
extraction, which is the first part of the methodology
described by Maranzato [19].
Given the importance of reputation systems, we decided to
focus our experiments on identifying and evaluating charac-
teristics that can be indicative of fraud in such systems. We
present here a method for extracting fraud evidence from
transactions and the reputation systems.
First we gathered a real dataset from TodaOferta (see Ta-
ble 1) and a list of all users that were blocked for infringing
the rules of the marketplace. Each item of this list contains
a label describing the reason why the user was blocked. As
our goal is to identify users that defraud the reputation
system, we define a set FRS that contains the users blocked
specifically for this reason, and a set FRST of the transactions
with this fraud evidence. We also define a set AFr containing all
users blocked for any kind of fraud, and AFrT, the corresponding
set of transactions. We consider the remaining users in the system
as “not fraud” and put them in a set NFr, with NFrT being the set
of transactions with no fraud evidence. Considering this, we can
represent:
$$FRS \subset AFr$$
$$AFr \cup NFr = \text{All Users}.$$
⁴ http://www.pagseguro.com.br
⁵ http://www.paypal.com
A user is considered fraudulent if he/she participates in at
least one fraudulent transaction, as a buyer or a seller. Thus,
the transactions in which a given user is involved determine
which set the user will be in. We organize the transactions
in sets analogous to the ones defined for users. A user is
in FRS if he/she is the buyer or the seller in at least one
transaction in FRST. If this is not the case, but the user
is involved in at least one transaction in AFrT, he/she is in
AFr. Users in NFr are those who only have transactions in
NFrT.
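The assignment rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the tuple layout of a transaction, and the identifiers are hypothetical, and the second returned set holds the users that are in AFr without being in FRS.

```python
def classify_users(transactions, frst_ids, afrt_ids):
    """Assign users to sets according to the transactions they take part in.

    transactions: iterable of (transaction_id, buyer, seller) tuples.
    frst_ids: ids of transactions with reputation-fraud evidence (FRST).
    afrt_ids: ids of transactions with any kind of fraud evidence (AFrT).
    Returns (frs, afr_only, nfr), where afr_only holds the users involved
    in an AFrT transaction but in no FRST transaction.
    """
    users, frs, afr_only = set(), set(), set()
    for tx_id, buyer, seller in transactions:
        for user in (buyer, seller):
            users.add(user)
            if tx_id in frst_ids:
                frs.add(user)
            elif tx_id in afrt_ids:
                afr_only.add(user)
    afr_only -= frs                  # FRS membership takes precedence
    nfr = users - frs - afr_only     # only transactions without fraud evidence
    return frs, afr_only, nfr


# Hypothetical data: t1 is a reputation fraud, t3 another kind of fraud.
txs = [("t1", "b1", "s1"), ("t2", "b1", "s2"),
       ("t3", "b2", "s2"), ("t4", "b3", "s3")]
frs, afr_only, nfr = classify_users(txs, {"t1"}, {"t1", "t3"})
```

A single transaction is enough to move a user out of NFr, which is why a buyer such as b1 ends up in FRS even though the second transaction carries no fraud evidence.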
With these sets defined, we start the activities related to
gathering information for the process of characteristic extraction.
We interviewed specialists in fraud detection in this
marketplace to understand their procedures and to identify
which evidence we should consider when looking for users
that were trying to cheat the reputation system. Most of
their work is a reaction to complaints. The specialists
listed a set of characteristics that they analyze to identify
fraudulent transactions, but also pointed out that all
these characteristics can also occur in honest transactions.
During the interviews we suggested new characteristics that
could be used in fraud detection and we included them in
our tests; some of them were not useful. We also used our
experience to try some characteristics. This kind of approach
is defined by Duda and Hart [4] as prior knowledge, which
is one of the sub-problems of pattern recognition. We also
faced some other sub-problems, such as feature extraction,
overfitting and noise.
It is important to emphasize that we decided to consider only
transactions with positive feedback from buyers, since
positive feedback is the main goal of frauds against the reputation
system and it is the type of feedback that affects the
score of the feedback system. We plan to consider
other types of feedback in future work.
After analyzing the dataset, the mechanics of this market-
place and the information collected during the interviews,
we considered five main events to be taken into account in
a fraud detection process:
1. Seller’s registration;
2. Buyer’s registration;
3. Listing publication;
4. Transaction;
5. Feedback from Buyer to Seller⁶.
A timeline of these events can be seen in Figure 1. As you
can see, we use not only the transaction information but all
interactions between buyer and seller in the marketplace.
Another important concept is that in electronic marketplaces,
transactions between users can be represented as a
graph (see Figure 2), with a node for each user and an edge
for one (or more) transactions between two users.
⁶ In this work, we are not considering feedback from sellers
because it does not benefit sellers.
Figure 2: Graph of negotiations
Using connection information available about the buyer and
the seller, and considering the events previously described,
we found twelve characteristics that are indicative of
fraud. We started from two connection attributes of each
event: workstation identifier⁷ and IP address. Then we took
the three events related to the buyer (Buyer’s registration,
Transaction and Buyer’s feedback) and combined them with the
two events related to the seller (Seller’s registration and Listing
publication), obtaining six combinations to be verified for each
attribute. We show these characteristics in Table 2, presenting an
explanation of why each one can be considered a good
characteristic of fraud and a warning about occurrences in
legitimate transactions. As an example, characteristic SWLB
is detected when we observe the same workstation identifier
when the seller created the listing and when the buyer
registered. A similar comparison is done for the IP address in SILB.
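The combination of two attributes, two seller events and three buyer events can be sketched as below. The event and field names are hypothetical, chosen only for illustration; the four-letter codes are built to match those used in the text (for example SWLB and SILB).

```python
from itertools import product

# Hypothetical field and event names; only the code letters come from the text.
ATTRIBUTES = {"W": "workstation_id", "I": "ip"}
SELLER_EVENTS = {"L": "listing", "S": "seller_registration"}
BUYER_EVENTS = {"B": "buyer_registration", "T": "transaction", "F": "feedback"}


def connection_characteristics(events):
    """Return the twelve Boolean characteristics for one transaction.

    events: dict mapping an event name to a dict holding the connection
    attributes ('workstation_id', 'ip') observed at that event.
    """
    result = {}
    combos = product(ATTRIBUTES.items(), SELLER_EVENTS.items(), BUYER_EVENTS.items())
    for (a, attr), (s, seller_event), (b, buyer_event) in combos:
        seller_value = events[seller_event].get(attr)
        buyer_value = events[buyer_event].get(attr)
        # A missing value never counts as a match.
        result["S" + a + s + b] = seller_value is not None and seller_value == buyer_value
    return result


example = connection_characteristics({
    "listing": {"workstation_id": "w1", "ip": "1.1.1.1"},
    "seller_registration": {"workstation_id": "w2", "ip": "2.2.2.2"},
    "buyer_registration": {"workstation_id": "w1", "ip": "3.3.3.3"},
    "transaction": {"workstation_id": "w9", "ip": "2.2.2.2"},
    "feedback": {"workstation_id": "w9", "ip": "9.9.9.9"},
})
```

In this made-up example only SWLB (same workstation for listing and buyer registration) and SIST (same IP for seller registration and transaction) are set.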
We also extracted five other characteristics that cannot be
described by Boolean values like the ones presented in Table 2,
which we list below:
• Quick Feedbacks from Buyers, in less than N hours
after transaction (QFB);
• Small Rate of Visits per Transactions, smaller than N
(SRVT);
• Short Interval for Transactions in the same Listing during
N hours (SITL);
• Same domain in e-mails from buyers in the same listing
considering N transactions (UDTB);
• E-mails with the same domain between sellers and buyers
considering N transactions (SDBS).
In all the situations listed above, we can convert these
characteristics into Boolean values by establishing a threshold
for the value of N in each case.
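As a small illustration of this thresholding, the sketch below turns the time between a transaction and its feedback into the Boolean characteristic QFB. The function name is hypothetical, and the value of N used in the example is arbitrary, since the thresholds actually adopted are not given here.

```python
from datetime import datetime, timedelta


def quick_feedback(transaction_time, feedback_time, n_hours):
    """QFB: True when the buyer's feedback arrives less than n_hours
    after the transaction."""
    return feedback_time - transaction_time < timedelta(hours=n_hours)


# Arbitrary threshold of N = 1 hour, for illustration only.
bought = datetime(2008, 1, 1, 10, 0)
fast = quick_feedback(bought, datetime(2008, 1, 1, 10, 30), n_hours=1)
slow = quick_feedback(bought, datetime(2008, 1, 2, 10, 0), n_hours=1)
```

The same pattern applies to the other thresholded characteristics: a raw count, rate, or interval is compared against N and only the Boolean outcome is kept.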
⁷ Due to confidentiality, we cannot give more details about
how this identifier is determined.
Figure 1: Timeline of Events
Characteristic               Suspicion              Warning               Code  Situations
Same workstation identifier  Seller and buyer used  They could have used  SWLB  Listing and Buyer
                             the same computer      a public computer     SWSB  Seller and Buyer
                                                                          SWLT  Listing and Transaction
                                                                          SWLF  Listing and Feedback
                                                                          SWSF  Seller and Feedback
                                                                          SWST  Seller and Transaction
Same IP Address              Seller and buyer used  They could have used  SILB  Listing and Buyer
                             the same computer      a proxy or a public   SISB  Seller and Buyer
                                                    computer              SILT  Listing and Transaction
                                                                          SILF  Listing and Feedback
                                                                          SISF  Seller and Feedback
                                                                          SIST  Seller and Transaction
Table 2: List of Characteristics
These characteristics are named positive characteristics
because they indicate evidence of fraud. On the other hand, we
found two characteristics that decrease the chance of fraud.
We name them negative characteristics, and they are:
• Seller Recognition (REC1). This recognition can be
done by editorial analysis from the marketplace, for
example;
• Transaction paid through the integrated escrow
system (TCPS), that is, PagSeguro.
All positive and negative characteristics are listed in Table 3.
Our next step is to expand this evidence from transactions
and events to the sellers because the seller is the target of
fraud detection. As we mentioned before, specialists con-
sider that a user is fraudulent if he/she participates in at
least one fraudulent transaction, as a seller or a buyer. With
this approach, which requires just one positive characteristic, we
reach 96.8% of sellers in FRS. We also reach 78.5%
of users in AFr − FRS. Unfortunately, we also hit 54.3%
of the users that were not flagged as fraudsters (users in NFr),
which shows that a single fraud characteristic (one
characteristic among the seventeen positive ones we obtained) is
weak evidence of fraudulent behavior.
Consider a characteristic and let F be the set of all transactions
that have this characteristic. We count how many
transactions in F are also in FRST and in NFrT, and compute
their respective probabilities:
$$p_1 = |F \cap FRST| / |FRST|$$
$$p_2 = |F \cap NFrT| / |NFrT|.$$
In order to evaluate the discriminating power of this
characteristic, we compute the odds ratio between the classes
FRST and NFrT. The odds ratio is a measure that compares
the odds of an event occurring in one group with
the odds of it occurring in another group. If the
probabilities of the event occurring in each of the groups are $p_1$
and $p_2$, then the odds ratio is:
$$\frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{p_1(1-p_2)}{p_2(1-p_1)}.$$
In this work, we only consider positive characteristics with
an odds ratio of at least 3. For negative ones, we just invert the
ratio and use the same threshold. A simple approach based on
counting the number of positive characteristics of a transaction
is useful to validate the characteristics; we introduce it
in the next section.
Characteristic  Description                                                                         Type
QFB             Quick Feedbacks from Buyers, in less than N hours after transaction                 Positive
REC1            Seller Recognition                                                                  Negative
SDBS            E-mails with the same domain between sellers and buyers considering N transactions  Positive
SILB            Same IP Address – Listing and Buyer                                                 Positive
SILF            Same IP Address – Listing and Feedback                                              Positive
SILT            Same IP Address – Listing and Transaction                                           Positive
SISB            Same IP Address – Seller and Buyer                                                  Positive
SISF            Same IP Address – Seller and Feedback                                               Positive
SIST            Same IP Address – Seller and Transaction                                            Positive
SITL            Short Interval for Transactions in the same Listing during N hours                  Positive
SRVT            Small Rate of Visits per Transactions, smaller than N                               Positive
SWLB            Same workstation identifier – Listing and Buyer                                     Positive
SWLF            Same workstation identifier – Listing and Feedback                                  Positive
SWLT            Same workstation identifier – Listing and Transaction                               Positive
SWSB            Same workstation identifier – Seller and Buyer                                      Positive
SWSF            Same workstation identifier – Seller and Feedback                                   Positive
SWST            Same workstation identifier – Seller and Transaction                                Positive
TCPS            Transaction paid through integrated escrow system                                   Negative
UDTB            Same domain in e-mails from buyers in the same listing considering N transactions   Positive
Table 3: Complete list of Characteristics
5. CHARACTERISTICS COUNTING
This section explains a characteristic counting approach to
validate characteristics in a fraud detection process. We can
take the positive characteristics introduced in Section 4 and
check whether the presence of only one of them is enough to
determine that a transaction is fraudulent.
Now we can find out what minimal number of characteristics
k can be used as strong evidence of fraud. First we
rank the characteristics in decreasing order of their corresponding
odds ratios. Iterating k up to the 17 characteristics,
we compute the set K of sellers that participate in
transactions with at least k characteristics. These characteristics
are the positive ones listed in Table 3, and they are
natural candidates for investigation.
Using this simple composite characteristic as a classification/ranking
criterion, we apply the usual measures of precision,
recall and F-measure, used for classifiers. The percentage
of sellers in FRS that are in K is the recall. The
percentage of sellers in K that are in FRS is the precision.
The harmonic mean of recall and precision is the F-measure,
which evaluates the usual trade-off between precision and
recall, providing a better measure for comparison. These results
are in Table 4, and they will be compared to the Logistic
Regression Model approach in Section 6.
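The measures above can be sketched for a given k as follows. The names and data layout are hypothetical; in particular, we assume that each seller is summarized by the largest number of positive characteristics found in any single transaction of that seller, which is one reading of "transactions with at least k characteristics".

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def counting_metrics(seller_counts, frs, k):
    """Precision, recall and F-measure of the set K for a given k.

    seller_counts: dict seller -> largest number of positive
        characteristics seen in one of the seller's transactions
        (an assumption about how sellers are summarized).
    frs: set of sellers known to defraud the reputation system.
    """
    selected = {s for s, count in seller_counts.items() if count >= k}  # set K
    hits = len(selected & frs)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(frs) if frs else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy data: K = {a, b, d} for k = 3, of which only "a" is in FRS.
precision, recall, f = counting_metrics(
    {"a": 5, "b": 3, "c": 1, "d": 4}, frs={"a", "c"}, k=3)
```

As a consistency check, a precision of 33.0% and a recall of 60.3% give f_measure(0.330, 0.603), which rounds to 0.427.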
We saw that the best value of F-measure (0.427) occurs at k = 5,
when we reach a recall of 60.3% of the sellers in FRS with a
precision of only 33.0%. In terms of a global metric for this
ranking, we reached an average precision⁸ of 38.9%.
k   Recall  Precision  F-measure
17  1.3%    80.0%      0.026
16  2.7%    66.7%      0.051
15  6.7%    66.7%      0.121
14  10.3%   59.6%      0.176
13  15.3%   53.5%      0.238
12  19.0%   49.1%      0.274
11  24.0%   49.0%      0.322
10  30.0%   45.9%      0.363
9   33.7%   42.6%      0.376
8   41.0%   38.7%      0.398
7   47.0%   35.2%      0.402
6   52.0%   34.0%      0.411
5   60.3%   33.0%      0.427
4   70.7%   28.4%      0.405
3   79.0%   22.3%      0.348
2   89.3%   18.0%      0.300
1   100.0%  12.4%      0.220
Table 4: Sellers with at least k characteristics
6. LOGISTIC REGRESSION
This section presents the Logistic Regression Model and its
application. First we describe some concepts about it (Sec-
tion 6.1) and then we describe our case study applying it to
actual data from TodaOferta to rank all sellers considering
their estimated fraud probability (Section 6.2). We treat
the problem of identifying fraudsters as a ranking problem
rather than a classification problem, which typically assigns
the results to predefined sets. Our intention is to generate
a list of all sellers ordered by their estimated probability
of being fraudsters.
6.1 Definition
One of the greatest advantages of the Logistic Regression
method over typical classification methods is that it provides
a ranking of the classified data, since it estimates
the fraud probability. Another important
reason why we used this method rather than others is
that it is a quite natural extension of the odds ratio we
computed in the characteristic extraction analysis reported
in Section 4. Applied to our data, the estimated
probability of fraud ($p$) is represented by:
$$p = \frac{1}{1 + e^{-z}}$$
where, for constants $\beta_i$ and variables $x_i$, with $1 \le i \le 17$,
we have:
$$z = \beta_0 + \beta_1 x_1 + \cdots + \beta_{17} x_{17}.$$
That is, we have 17 variables (the fraud characteristics), and
the constants $\beta_i$ are the ones that best fit the model to
the data. The optimization procedures used to obtain them
are out of the scope of this work,
and we refer the reader to the work of Hosmer and
Lemeshow [10] for a better understanding of this method.
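The model above, and its link to the odds ratio, can be illustrated with a few lines of code. The function name and the coefficient values are hypothetical; the coefficients are chosen by hand here, not fitted to any data.

```python
import math


def fraud_probability(beta0, betas, xs):
    """Estimated probability p = 1 / (1 + e^(-z)), with
    z = beta0 + sum(beta_i * x_i)."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # equivalent form that avoids overflow for very negative z
    return ez / (1.0 + ez)


def odds(p):
    return p / (1.0 - p)


# With a single binary characteristic, the odds ratio between x = 1 and
# x = 0 equals e^(beta_1). Hand-picked coefficients: beta_1 = ln(3).
beta0, beta1 = -2.0, math.log(3)
p_with = fraud_probability(beta0, [beta1], [1])
p_without = fraud_probability(beta0, [beta1], [0])
ratio = odds(p_with) / odds(p_without)  # approximately 3
```

This is the sense in which the model extends the odds ratio analysis: each coefficient is the logarithm of an odds ratio, adjusted for the presence of the other characteristics.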
6.2 Case Study and Results
This section describes our case study using a real dataset
from TodaOferta (described in Section 3) and discusses
the results of the experiments. In this section, we apply
part of the methodology proposed by Maranzato [19] to detect
fraudsters against reputation systems in e-markets. We
compare and optimize some applications of the Logistic
Regression Model.
It is important to say that the approach of sorting sellers
by the number of positive characteristics of fraud has limitations
when compared to the Logistic Regression Model
that we apply in this work and in a previous work [17].
⁸ Average precision (AP) is defined by:
$$AP = \frac{\sum_{k=1}^{|K|} \left(P(k) \times rel(k)\right)}{|K|}$$
where k is the rank, K the number of sellers, rel() a binary
function that returns whether the seller at rank k is a fraudster
or not, and P(k) the precision at a given cut-off rank.
The first limitation is that it is difficult to consider negative
characteristics, that is, those whose percentage in NFrT is
significantly higher than in FRST, because the counting approach
only sums the characteristics without considering whether each
one is positive or negative evidence of fraud.
Another limitation is related to characteristics defined by
thresholds. The characteristic counting method cannot deal
with continuous values but only with binary ones, because it
considers only a binary response for a given characteristic of
fraud. One example is the period between the transaction
and the evaluation (feedback). We saw that the shorter this
period is, the greater the chance of fraud. Once we define a
threshold, we cannot use these variations of time as an input
value to the analysis.
It is also possible to apply the Logistic Regression Model to
consider the percentage of a seller’s transactions in which a
characteristic occurs, instead of just considering whether the
characteristic occurs for the seller at all. Recall that the
specialists’ investigation process at TodaOferta treats a seller as
fraudulent after a single fraudulent transaction, but considering
as a fraudster a seller that has only one suspicious transaction
among many can generate a lot of false positives. Afterwards, we
will see that this information is used to improve the ranking
performance.
In order to apply the Logistic Regression Model, the first task
is to prepare the dataset. We make a list of all sellers and,
for each one, we check whether there is at least one transaction that
contains each of the characteristics described in Section 4. We use
1 to indicate this existence and 0 otherwise. Afterwards, we
improve this list with two non-fraud characteristics: an
indication of whether the seller passed an editorial analysis⁹ and
the percentage of transactions that are actually paid through
PagSeguro. To complete the list, we compute, for each seller, the
percentage of transactions that have each characteristic.
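The per-seller preparation described above can be sketched as follows. The function name, the dictionary layout of a transaction, and the `_share` suffix are hypothetical choices made for this illustration; shares are kept as fractions between 0 and 1 rather than percentages.

```python
def seller_features(transactions, codes):
    """Build one row of characteristics per seller.

    transactions: list of dicts, each with a 'seller' key and one Boolean
        entry per characteristic code (missing codes count as False).
    codes: characteristic codes to aggregate, e.g. ["SWLB", "QFB"].
    For each code the row holds a 0/1 indicator (present in at least one
    transaction) and the share of the seller's transactions that have it.
    """
    by_seller = {}
    for tx in transactions:
        by_seller.setdefault(tx["seller"], []).append(tx)

    rows = {}
    for seller, txs in by_seller.items():
        row = {}
        for code in codes:
            hits = sum(1 for tx in txs if tx.get(code, False))
            row[code] = 1 if hits > 0 else 0
            row[code + "_share"] = hits / len(txs)
        rows[seller] = row
    return rows


# Toy data with two sellers and two characteristics.
rows = seller_features(
    [
        {"seller": "s1", "SWLB": True, "QFB": False},
        {"seller": "s1", "SWLB": False, "QFB": False},
        {"seller": "s2", "SWLB": False, "QFB": True},
    ],
    codes=["SWLB", "QFB"],
)
```

The indicator and the share carry different information: a seller with one suspicious transaction out of many has an indicator of 1 but a share close to 0.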
The next step is to build the dataset that is going to be used
as input to the Logistic Regression Model. We decided to
use the full data both for training and for testing. As a
consequence, the performance numbers obtained this way are not
reliable, because the generated model may suffer from overfitting.
This is not a problem for our purposes, since the non-fraud
annotations are not trustworthy either, as we discovered in our
previous work [17, 18]. We are aware of the problem, but we are
interested in ranking sellers by estimated fraud probability, and
using the full data is better for this purpose. The real performance
numbers of the ranking are those obtained through manual
verification by specialists.
It is important to say that, for both the training and testing
datasets, we excluded all transactions in AFrT that are not in
FRST, since our focus is to discover fraudsters in reputation
systems and these transactions can be considered neither NFrT
nor FRST. We believe that improvements in our work
could handle AFrT too, but here our focus is just FRST.
That is the reason to discard frauds that are not related to
reputation systems.
⁹ TodaOferta has a program in which sellers can send their
documentation for analysis to get a certificate. It indicates
that the seller really exists and follows the rules of the
marketplace.
To apply the Logistic Regression Model to rank sellers, we
use R, the free software environment for statistical
computing (for more details see http://www.r-project.org).
Using its glm command, we take FRS as the dependent variable
and the selected characteristics as independent variables,
with the same dataset for training and testing. We thus
obtain the estimated probability that each seller is in FRS,
and we sort the list of sellers from highest to lowest
estimated fraud probability. In this first application we use
only the same 17 characteristics that can be used in
Characteristic Counting, described in Section 5.
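The fit-then-rank procedure can be illustrated with a small, self-contained sketch. The paper uses R's glm; the code below is only a stand-in that fits a logistic model by plain gradient ascent so that it runs without external libraries. The function names, the learning rate, the number of epochs and the toy sellers are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w[0] is the intercept; w[1:] pairs with the features in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = label - predict(w, x)
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Toy sellers with two binary characteristics each (values are made up).
sellers = {"s1": [1, 1], "s2": [1, 0], "s3": [0, 1],
           "s4": [0, 0], "s5": [1, 1], "s6": [0, 0]}
in_frs = {"s1": 1, "s2": 1, "s3": 0, "s4": 0, "s5": 1, "s6": 0}

ids = list(sellers)
w = fit_logistic([sellers[s] for s in ids], [in_frs[s] for s in ids])

# The same data is used for fitting and for ranking, as in the text.
ranking = sorted(ids, key=lambda s: predict(w, sellers[s]), reverse=True)
print(ranking)
```

The ranking places the sellers annotated as fraudsters at the top because the first toy characteristic separates the two groups; real data would not be this clean.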
With this sorted list of sellers, we compute precision, recall
and F-measure for each percentage of the list. A short
summary of these results is given in Table 5. By checking
between 8% and 10% of the sellers sorted by estimated fraud
probability, we achieve the best value of F-measure (0.458),
covering around 50% of the fraudsters against the reputation
system in this marketplace. We name this first application of
the Logistic Regression Model “APP1” for further comparison.
% of sellers   Recall    Precision   F-measure
1%             10.7%     82.1%       0.189
2%             17.7%     67.9%       0.280
3%             23.3%     59.8%       0.336
4%             29.3%     56.4%       0.386
5%             33.3%     51.3%       0.404
6%             38.0%     48.7%       0.427
7%             41.0%     45.1%       0.429
8%             46.7%     44.9%       0.458
9%             49.7%     42.5%       0.458
10%            52.7%     40.5%       0.458
15%            65.0%     33.3%       0.441
20%            76.0%     29.2%       0.422
30%            83.7%     21.5%       0.341
40%            93.7%     18.0%       0.302
50%            97.7%     14.7%       0.256
51%            97.7%     14.4%       0.252
60%            98.7%     12.4%       0.221
70%            98.7%     10.7%       0.193
80%            98.7%     9.4%        0.171
90%            98.7%     8.3%        0.154
100%           100.0%    7.6%        0.141
Table 5: APP1 - Sellers ordered by probability of
fraud (Recall, Precision and F-measure)
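The three metrics reported per percentage of the list can be computed as in the sketch below. The function names and the toy label list are assumptions made for the example; the F-measure is taken to be the usual harmonic mean of precision and recall, which reproduces the 8% row of Table 5 from its rounded precision and recall.

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def metrics_at(ranked_labels, fraction):
    """Precision, recall and F-measure for the top `fraction` of the list.

    ranked_labels: 0/1 labels (1 = fraudster) ordered by decreasing
    estimated fraud probability.
    """
    k = max(1, round(len(ranked_labels) * fraction))
    hits = sum(ranked_labels[:k])
    total_positive = sum(ranked_labels)
    precision = hits / k
    recall = hits / total_positive if total_positive else 0.0
    return precision, recall, f_measure(precision, recall)

# Toy ranking of ten sellers (made up), three of them fraudsters.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
p, r, f = metrics_at(ranked, 0.2)
print(round(p, 3), round(r, 3), round(f, 3))

# Consistency check against the 8% row of Table 5.
print(round(f_measure(0.449, 0.467), 3))
```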
Analyzing the results of APP1, we can see that the average
precision is 45.5%, an increment of 16.1% compared with the
Characteristic Counting approach (CC).
Next, we compare the results of the Characteristic Counting
approach (see Section 5) with the Logistic Regression Model
in three different applications. The first (APP1) uses the
same 17 characteristics from Section 4, as done above. The
second (APP2) uses percentages, negative characteristics and
characteristics with continuous distribution, for a total of
37 variables. The third (APP3) optimizes the second model
using Stepwise Regression.
For a given dataset, this optimization selects the variables
upon which the best model is based. In this work, we use
backward regression, which starts with all candidate
variables, tests them one by one for statistical
significance, and deletes any that are not significant to the
model. In this case, the selection criterion is Akaike’s
Information Criterion (AIC)¹⁰. We refer the reader to the
work of Hosmer and Lemeshow [10] for a better understanding
of this optimization method.
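A minimal sketch of backward elimination driven by AIC is given below. It is not the stepAIC implementation: the callable `aic_of` is hypothetical and stands for refitting the logistic model on a subset of variables and returning its AIC, and the toy scoring function with fixed per-variable gains in log-likelihood is invented so that the example runs on its own. The variables "noise1" and "noise2" are made up; AIC is taken as 2k minus twice the log-likelihood, with k the number of fitted parameters.

```python
def aic(log_likelihood, n_params):
    # Akaike's Information Criterion: lower is better.
    return 2 * n_params - 2 * log_likelihood

def backward_select(variables, aic_of):
    """Greedy backward elimination.

    aic_of: callable mapping a tuple of variable names to the AIC of the
    model fitted on exactly those variables (hypothetical helper).
    """
    current = tuple(variables)
    best = aic_of(current)
    while current:
        # Try dropping each remaining variable and keep the best candidate.
        candidates = [tuple(v for v in current if v != drop) for drop in current]
        cand_aic, cand = min(((aic_of(c), c) for c in candidates),
                             key=lambda t: t[0])
        if cand_aic >= best:
            break  # no single removal lowers the AIC
        current, best = cand, cand_aic
    return current, best

# Toy scoring: each variable adds a fixed gain to the log-likelihood.
gains = {"SWLB": 10.0, "QBF": 5.0, "noise1": 0.3, "noise2": 0.1}

def toy_aic(subset):
    log_likelihood = -50.0 + sum(gains[v] for v in subset)
    return aic(log_likelihood, len(subset) + 1)  # +1 for the intercept

selected, selected_aic = backward_select(list(gains), toy_aic)
print(selected, selected_aic)
```

In this toy setting a variable is dropped whenever its gain in log-likelihood is smaller than the penalty of one extra parameter, so only the two informative variables survive.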
In these three applications of Logistic Regression, we
compute the F-measure for each portion of sellers, in the
same way as in the first application (APP1). We also compute
the average precision over the entire ranking to have a
single metric for each situation.
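The text does not spell out how average precision is computed. The sketch below assumes the definition commonly used in information retrieval, the mean of the precision values measured at each rank where a fraudster appears; the function name and the toy ranking are illustrative only.

```python
def average_precision(ranked_labels):
    """Mean of precision@k over the ranks k at which a fraudster appears.

    ranked_labels: 0/1 labels (1 = fraudster) ordered by decreasing
    estimated fraud probability.
    """
    hits = 0
    precisions = []
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy ranking (made up): fraudsters at ranks 1 and 3.
ap = average_precision([1, 0, 1, 0])
print(round(ap, 4))
```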
Figure 3 compares Characteristic Counting with these 3
applications for the sellers sorted by estimated probability.
As we can see, the Logistic Regression Model performs better
than Characteristic Counting, and the results improve further
when the model considers variables that Characteristic
Counting cannot deal with. We ran the experiments with all
sellers in the dataset, but we display only the results for
the first 25%, because beyond that limit the values are quite
similar and the restriction simplifies our analysis.
In Figure 3 we can also see that using more variables (APP2
and APP3) gives better precision than the first application,
which used only positive and binary characteristics (APP1).
Table 6 shows the average precision for these 4 ways of
ranking: one with Characteristic Counting and 3 with the
estimated probability provided by Logistic Regression. When
we apply Stepwise Regression, the average precision decreases
slightly (from 53.0% to 52.3%), which shows that this kind of
optimization, with the original annotation in this dataset,
does not substantially affect the global performance of the
ranking.
Approach                                     Average Precision
CC - Characteristic Counting                 38.9%
APP1 - Logistic Regression - 17 variables    45.1%
APP2 - Logistic Regression - 37 variables    53.0%
APP3 - Logistic Regression - Optimized       52.3%
Table 6: Average Precision in each ranking.
Figure 3: Comparison between Characteristic Counting approach
and Logistic Regression Model in different applications
¹⁰ For more details about Akaike’s information criterion,
please check the reference from Akaike [1]. The command in
software R that performs this model selection is stepAIC.
APP3 resulted in 17 variables, and they are different from
those of APP1. With these results, which are in theory the
best values we can achieve with the characteristics extracted
from this dataset, we returned to the specialists and asked
them to check the actual performance of the output of these
applications of the Logistic Regression Model, especially at
the top of the list. In our previous work [17, 18] we
discovered more fraudsters than the original annotation
contained, so we knew in advance that the performance metrics
could be better. Ideally the revision would have used only
the list generated by APP3, but the specialists had also
checked earlier lists sorted by characteristic counting while
we were validating those characteristics.
After that revision, we reach an average precision of 78.0%
(we had around 53% before revision in APP2 and APP3).
As the number of elements in the predefined sets changed
because new fraudsters were detected, we decided to build a
new model using the same variables as the second approach,
which considers 37 characteristics, but using the values of
the reference variable after the specialists’ revision (FRS
Reviewed). We then optimized the model by applying Stepwise
Regression to find the model that best fits this dataset with
the new set of fraudsters. This optimization (APP4) generates
a model with 25 of the 37 variables, and its performance is
shown in Table 7. In the final model, only one extracted
characteristic was not used (SILF). For every other
characteristic described in Section 4, either the binary
variable or its respective percentage in the transactions¹¹
is present. This indicates that the process of characteristic
extraction was sound. These variables are:
x1 = pctSITL, x2 = SWLB, x3 = pctSWLB, x4 = SWLT, x5 = pctSWLF,
x6 = SWSB, x7 = SWST, x8 = SWSF, x9 = SDBS, x10 = pctSDBS,
x11 = pctSILB, x12 = SILT, x13 = SISB, x14 = pctSISB, x15 = SIST,
x16 = pctSIST, x17 = SISF, x18 = QBF, x19 = pctQBF, x20 = UDTB,
x21 = pctUDTB, x22 = SRVT, x23 = TCPS, x24 = SITL, x25 = REC1.
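For reference, a logistic regression model over these 25 variables has the standard form shown below. The intercept and the coefficients are whatever the fit estimates; their values are not reported in this section, so the symbols are left generic.

```latex
\[
P(\mathrm{FRS} = 1 \mid x_1, \dots, x_{25})
  = \frac{1}{1 + \exp\!\Bigl(-\bigl(\beta_0 + \sum_{i=1}^{25} \beta_i x_i\bigr)\Bigr)}
\]
```

Sorting sellers by this probability is equivalent to sorting them by the linear score inside the exponential, since the logistic function is increasing.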
¹¹ The prefix “pct” shows that the variable refers to the
percentage of the characteristic in transactions.
% of sellers   Recall    Precision   F-measure
1%             6.1%      100.0%      0.115
2%             12.2%     100.0%      0.218
3%             18.4%     100.0%      0.310
4%             24.5%     100.0%      0.393
5%             30.6%     100.0%      0.469
6%             36.7%     100.0%      0.537
7%             42.9%     100.0%      0.600
8%             49.0%     100.0%      0.658
9%             55.1%     100.0%      0.711
10%            60.9%     99.5%       0.756
11%            66.9%     99.3%       0.799
12%            72.7%     98.9%       0.838
13%            76.9%     96.6%       0.857
14%            80.4%     93.8%       0.866
15%            82.9%     90.3%       0.864
20%            89.0%     72.7%       0.800
30%            94.5%     51.5%       0.666
40%            98.0%     40.0%       0.568
50%            99.1%     31.7%       0.481
60%            100.0%    26.8%       0.422
70%            100.0%    23.0%       0.374
80%            100.0%    20.2%       0.336
90%            100.0%    17.9%       0.304
100%           100.0%    16.1%       0.278
Table 7: APP4 - Sellers ordered by probability of
fraud (Recall, Precision and F-measure)
Analyzing the results in Table 7, we see that the precision
of this model is 100% when we consider up to 9% of the
sellers in the sorted list. It is also possible to see that
the best value of F-measure is reached when we check 14% of
the sellers. At this point, we reach more than 80% of the
fraudsters with 93.8% precision.
We have seen in Figure 3 that precision improves with each
application, especially at the top of the sellers’ list.
After the specialists’ revision, the performance is much
better, as we can see in Figure 4. This gain can also be
verified against Table 6, which lists the average precision
of each earlier application: in the last application (APP4),
the average precision is 93%.
Figure 4: Comparison between Logistic Regression
Model before and after specialists’ review
Another important way to compare the different applications
of the Logistic Regression Model, and the optimization
performed in our experiments, is to analyze precision as a
function of recall.
Figure 5: Precision x Recall
In Figure 5 we see that, with a model optimized for the
annotation after the specialists’ revision, precision is
better than with the original annotation. This confirms that
our procedure improves the fraud detection process, because
it generates a list of fraudsters sorted by estimated
probability that has high precision at the top. In this
scenario, it does not generate many false positives and so
reduces the cost of fraud investigation.
Another benefit of using the Logistic Regression Model is
that it is possible to feed the model with the dataset of
reviewed analyses and rebuild it. Furthermore, it is possible
to optimize the model again with the new training dataset. We
believe that this can improve the accuracy of the fraudsters’
list and allow the model to adapt to new transactions.
The results also show a relative increase of 75% in average
precision for the reviewed list compared with the original
one (from 53% to 93%). This indicates that our optimization
approach significantly improves the fraud detection process.
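The 75% figure is a relative change rather than a difference in percentage points, as the arithmetic below makes explicit using the rounded values quoted in the text.

```latex
\[
\frac{93 - 53}{53} = \frac{40}{53} \approx 0.755
\]
```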
Considering the overfitting problem explained at the
beginning of this section, we are conducting experiments that
split the dataset into training and testing data. We sort the
transactions chronologically and use the first (older) ones
to train and the last ones to test, simulating the actual
scenario of the marketplace, where new transactions keep
occurring and need to be investigated. The results obtained
so far show that precision remains at the same level as in
this work, with average precision higher than 90%.
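A chronological split of this kind can be sketched as follows. The function name, the record layout (a dictionary with a "date" field), the toy dates and the split fraction are assumptions made for the example; the text does not state which proportion of transactions is used for training.

```python
from datetime import date

def chronological_split(transactions, train_fraction=0.7):
    """Split transactions into (older, newer) parts by date.

    The older part is meant for training and the newer part for testing.
    """
    ordered = sorted(transactions, key=lambda t: t["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Toy transactions with made-up dates, deliberately out of order.
toy = [
    {"id": 3, "date": date(2009, 3, 1)},
    {"id": 1, "date": date(2009, 1, 15)},
    {"id": 4, "date": date(2009, 4, 20)},
    {"id": 2, "date": date(2009, 2, 10)},
]
train_part, test_part = chronological_split(toy, train_fraction=0.5)
print([t["id"] for t in train_part], [t["id"] for t in test_part])
```

Splitting by date rather than at random keeps information from later transactions out of the training data, which matches the deployment scenario described above.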
One important benefit of using Logistic Regression and
optimization is that we can set a threshold on F-measure (or
on precision or recall) to automatically disable some
sellers. While the specialists were checking our list, we
observed that they simply disabled the top-ranked sellers as
fraudsters, but further down the list they sometimes
preferred to warn the seller, judging that the seller
defrauds the reputation system but also has some legitimate
transactions (determining this threshold can be the subject
of future work). This situation frequently happens when the
seller has negative characteristics of fraud. It offers an
opportunity to punish those sellers, to register the fraud,
and to give them another chance. The warning can be recorded
and used in future events. It also allows the specialists to
reconsider any editorial recognition granted to the seller
and the relationship that was established.
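One possible way to turn a precision threshold into a cutoff on the ranked list is sketched below. This is only an illustration of the idea, since the text leaves threshold determination to future work: the function name, the use of reviewed labels to measure precision, and the toy ranking are all assumptions.

```python
def largest_cutoff(ranked_labels, min_precision):
    """Largest k such that the top-k sellers reach at least `min_precision`.

    ranked_labels: 0/1 labels (1 = confirmed fraudster) ordered by
    decreasing estimated fraud probability.
    """
    best_k, hits = 0, 0
    for k, label in enumerate(ranked_labels, start=1):
        hits += label
        if hits / k >= min_precision:
            best_k = k
    return best_k

# Toy reviewed ranking (made up).
ranked = [1, 1, 1, 0, 1, 0, 0]
strict = largest_cutoff(ranked, 1.0)   # sellers that could be disabled outright
relaxed = largest_cutoff(ranked, 0.8)  # sellers that might only be warned
print(strict, relaxed)
```

Sellers above the strict cutoff could be candidates for automatic disabling, while those between the two cutoffs could be routed to a warning or to manual review.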
7. CONCLUSIONS
E-markets constitute an important research scenario due to
their popularity and revenues in recent years. In this
scenario, reputation plays an important role, mainly in
protecting buyers from fraudulent sellers. A reputation
mechanism tries to indicate how trustworthy a user is, based
on his/her performance in previous transactions. In online
marketplaces, reputation is based on feedback systems that
use past transactions as a reference to show user
performance, with the intention of providing more information
for future transactions. Fraud detection is mostly done
through reactive procedures in which fraud experts conduct an
investigation after a user complaint. This work focuses on
decision support for detecting fraud against reputation
systems, as a complement to the decisions of fraud experts.
Here, we apply a set of characteristics for fraud detection
to e-market reputation systems. We describe and evaluate two
approaches for identifying frauds, one based on counting
fraud characteristics and another based on Logistic
Regression modeling. Moreover, we enhance the characteristic
set with non-fraud characteristics that Logistic Regression
can deal with but Characteristic Counting cannot. We compare
both approaches using actual data from a large Brazilian
e-market (TodaOferta).
We apply the Logistic Regression Model to this set of
characteristics. The method naturally ranks all sellers by
their estimated probability of fraud against the reputation
system. We also see that it is possible to optimize the model
using stepwise regression to improve the quality of the
ranking. The output of this application is a list of sellers
sorted by estimated fraud probability.
This list allows experts to prevent fraud instead of just
reacting to it. In the end, using the ranking provided by
Logistic Regression, we increased the number of identified
fraudsters in the TodaOferta marketplace by 112%, and when we
optimize the model we reach an average precision of 93%.
As future work, we want to apply the same methodology to
identify other types of fraud along with the ones in
reputation systems. In particular, we are interested in
finding correlations between frauds in reputation systems and
other types of fraud. The idea of using network-based
metrics [24] to complement the current characteristics of
fraud also seems promising. We are planning a deeper analysis
of the final logistic model, checking which characteristics
are stronger than others and analyzing their coefficients
βi [10], before trying other classification and/or ranking
methods.
8. ACKNOWLEDGMENTS
This work was partially sponsored by Universo Online S.A.
- UOL (http://www.uol.com.br) and partially supported by
the Brazilian National Institute of Science and Technology
for the Web (CNPq grant no. 573871/2008-6), CAPES,
CNPq, Finep, and Fapemig. We also thank Aline Pereira
and Rodnei Lozano, from UOL, for their support on the
analysis and validation of our results.
9. REFERENCES
[1] H. Akaike. A new look at the statistical model
identification. IEEE Transactions on Automatic
Control, 19:716–723, 1974.
[2] S. Ba and P. A. Pavlou. Evidence of the effect of trust
building technology in electronic markets: price
premiums and buyer behavior. MIS Quarterly,
26(3):243–268, 2002.
[3] C. Dellarocas. Reputation mechanisms. In Handbook
on Economics and Information Systems, page 2006.
Elsevier Publishing, 2006.
[4] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern
Classification (2nd Edition). Wiley-Interscience, 2
edition, November 2000.
[5] J. Feigenbaum, D. C. Parkes, and D. M. Pennock.
Computational challenges in e-commerce. Commun.
ACM, 52(1):70–74, 2009.
[6] B. Gavish and C. L. Tucci. Reducing internet auction
fraud. Commun. ACM, 51(5):89–97, 2008.
[7] D. G. Gregg and J. E. Scott. The role of reputation
systems in reducing on-line auction fraud. Int. J.
Electron. Commerce, 10(3):95–120, 2006.
[8] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins.
Propagation of trust and distrust. In WWW ’04:
Proceedings of the 13th international conference on
World Wide Web, pages 403–412, New York, NY,
USA, 2004. ACM.
[9] C. Holahan. Auctions on ebay: A dying breed.
BusinessWeek online, jun 2008.
[10] D. W. Hosmer and S. Lemeshow. Applied logistic
regression. Wiley-Interscience Publication, September
2000. (Wiley Series in probability and statistics).
[11] D. Houser and J. Wooders. Reputation in auctions:
Theory, and evidence from ebay. Journal of Economics
& Management Strategy, 15(2):353–369, 06 2006.
[12] T. B. Klos and F. Alkemade. Trusted intermediating
agents in electronic trade networks. In AAMAS ’05:
Proceedings of the fourth international joint conference
on Autonomous agents and multiagent systems, pages
1249–1250, New York, NY, USA, 2005. ACM.
[13] T. T. Le. Pathways to leadership for
business-to-business electronic marketplaces.
Electronic Markets, 12(2), 2002.
[14] M. J. Lenard and P. Alam. Organizational Data
Mining: Leveraging Enterprise Data Resources for
Optimal Performance, chapter The Use of Fuzzy Logic
and Expert Reasoning for Knowledge Management
and Discovery of Financial Reporting Fraud. Idea
Group Publishing, Hershey, PA, 2004.
[15] M. J. Lenard and P. Alam. An historical perspective
on fraud detection: From bankruptcy models to most
effective indicators of fraud in recent incidents. Journal
of Forensic & Investigative Accounting, 1(1), 2009.
[16] Y.-I. Lou and M.-L. Wang. Fraud risk factor of the
fraud triangle assessing the likelihood of fraudulent
financial reporting. Journal of Business & Economics
Research, 7(2):61–78, February 2009.
[17] R. Maranzato, M. Neubert, A. Pereira, and A. P.
do Lago. Feature extraction for fraud detection in
electronic marketplaces. In LA-WEB 2009: 7th Latin
American Web Congress, Mérida, México, 2009. IEEE
Computer Society.
[18] R. Maranzato, M. Neubert, A. Pereira, and A. P.
do Lago. Fraud detection in reputation systems in
e-markets using logistic regression. In SAC ’10:
Proceedings of the 2010 ACM symposium on Applied
computing, Sierre, Switzerland, 2010. ACM.
[19] R. P. Maranzato. Identificação de fraude contra
sistemas de reputação em mercados eletrônicos.
Master’s thesis, Instituto de Matemática e Estatística
- Universidade de São Paulo, São Paulo, Brasil, 2010.
[20] S. P. Marsh. Formalising Trust as a Computational
Concept. PhD thesis, Department of Mathematics and
Computer Science, University of Stirling, 1994.
[21] M. I. Melnik and J. Alm. Does a seller’s ecommerce
reputation matter? evidence from ebay auctions.
Journal of Industrial Economics, 50(3):337–49,
September 2002.
[22] J. A. Ohlson. Financial ratios and the probabilistic
prediction of bankruptcy. Journal of Accounting
Research, 18:109–131, 1980.
[23] A. M. Pereira, D. Duarte, W. M. Jr., V. Almeida, and
P. Góes. Analyzing seller practices in a brazilian
marketplace. In 18th International World Wide Web
Conference, pages 1031–1041, April 2009.
[24] A. M. Pereira, A. Silva, W. Meira, Jr., and
V. Almeida. Seller’s credibility in electronic markets:
a complex network based approach. In WICOW ’09:
Proceedings of the 3rd workshop on Information
credibility on the web - WWW’09 workshop, pages
59–66, New York, NY, USA, 2009. ACM.
[25] P. Resnick, K. Kuwabara, R. Zeckhauser, and
E. Friedman. Reputation systems. Commun. ACM,
43(12):45–48, 2000.
[26] P. Resnick and R. Zeckhauser. Trust among strangers
in internet transactions: Empirical analysis of ebay’s
reputation system. The Economics of the Internet and
E-Commerce, edited by M.R. Baye. Amsterdam:
Elsevier Science B.V.:127–157, 2002.
[27] P. Resnick, R. Zeckhauser, J. Swanson, and
K. Lockwood. The value of reputation on ebay: A
controlled experiment. School of Information,
University of Michigan, Ann Arbor, Michigan,
USA:34, 2003.
[28] S. Viaene, R. A. Derrig, B. Baesens, and G. Dedene. A
comparison of state-of-the-art classification techniques
for expert automobile insurance fraud detection.
Journal of Risk and Insurance, 69(3):373–421, 2002.
[29] S. M. Weiss and C. A. Kulikowski. Computer Systems
That Learn: Classification and Prediction Methods
from Statistics, Neural Nets, Machine Learning, and
Expert Systems. Morgan Kaufmann, 1991.
[30] G. Zacharia, A. Moukas, and P. Maes. Collaborative
reputation mechanisms for electronic marketplaces.
Decision Support Systems, 29(4):371 – 388, 2000.
About the authors:
Rafael Plana Maranzato is a project manager in the department of research and
development of ecommerce products at Universo Online (UOL), the largest
Brazilian Internet portal. He holds a bachelor degree in mathematics at Fundação
Santo André-SP (FSA) in 1999 and received his master degree in 2010 at the
Mathematics & Statistics Institute (IME) of University of São Paulo (USP) - his
research was about fraud detection in reputation systems in marketplaces. He is
also interested in agile development and project management.
Adriano C. Machado Pereira is an Adjunct Professor at Federal Center of
Technological Education of Minas Gerais (CEFET-MG), Brazil. He received his
bachelor degree in Computer Science at UFMG in 2000, his MSc. in 2002, and
his Ph.D. in 2007. His research interests include e-Business, Information and
System’s Credibility, Distributed Systems, Web 2.0, Social Networks,
Performance of Computer Systems, Web Technologies, Workload
Characterization, Formal Methods, and Business Intelligence. He has worked as a
consultant for the Brazilian government and United Nations Development
Programme (UNDP) in a project of an Open Source Cooperation Network for e-
governance for Latin America. He is also a member of the Brazilian National
Institute of Science and Technology for the Web (www.inweb.org.br). He is also
a Post-Doc researcher in Computer Science in a cooperated project with UOL, the
largest Latin American Internet Service Provider.
Marden Neubert holds a bachelor degree and a Master's degree in Computer
Science from Federal University of Minas Gerais (UFMG), Brazil. He has
researched topics in Information Retrieval, Software Engineering and Fraud
Detection. Currently he works as Research and Development Director at Universo
Online (UOL), the largest Brazilian Internet portal.
A. Pereira do Lago, an assistant professor in the Computer Science Department of
the University of São Paulo, was a student of Imre Simon, a prize-winning
Hungarian mathematician who helped to found Computer Science in Brazil. Imre
strengthened the mathematical rigor that was already present in the student, who
had obtained a third prize at the International Mathematical Olympiad in 1983.
Together they helped to solve a 20-year-old conjecture in Automata Theory and
Formal Languages, and their work was cited by important researchers such as
Mark Sapir and Nachum Dershowitz. Supervising other students in their Master's
and/or PhD work, the former student has also developed rigorous research in
areas as different as Operating Systems, Computational Biology, Advanced Data
Structures, Information Retrieval, Formal Concept Analysis and Fraud Detection.