Spam Filter Optimality Based on Signal Detection Theory
University Graduate Center
University Graduate Center
University of Oslo, Norway
Md. Sadek Ferdous
University Graduate Center
University of Tartu, Estonia
University Graduate Center
Unsolicited bulk email, commonly known as spam, represents
a signiﬁcant problem on the Internet. The seriousness of the
situation is reﬂected by the fact that approximately 97% of
the total e-mail traﬃc currently (2009) is spam. To ﬁght
this problem, various anti-spam methods have been proposed
and are implemented to ﬁlter out spam before it gets deliv-
ered to recipients, but none of these methods are entirely
satisfactory. In this paper we analyze the properties of spam
ﬁlters from the viewpoint of Signal Detection Theory (SDT).
The Bayesian approach of Signal Detection Theory provides
a basis for determining the optimality of spam ﬁlters, i.e.
whether they provide positive utility to users. In the process
of decision making by a spam ﬁlter various tradeoﬀs are con-
sidered as a function of the costs of incorrect decisions and
the beneﬁts of correct decisions.
Categories and Subject Descriptors
D.m [Software]: Miscellaneous; D.m [Software]: Miscellaneous
General Terms
performance, security, measurement
Keywords
Spam, e-mail, filters, tradeoffs, Optimality, Signal Detection
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIN'09, October 6–10, 2009, North Cyprus, Turkey.
Copyright 2009 ACM 978-1-60558-412-6/09/10 ...$10.00.
1. INTRODUCTION
Spam in the form of unwanted email is a huge and growing problem. The amount of spam that circulates through the Internet is increasing day by day, and is affecting everyone on the Internet, ranging from network providers to Internet Service Providers (ISPs), companies and end users.
Manually deleting spam from the inbox every day is annoying and time consuming for all Internet users. In  it has been found that approximately 97% of the total email traffic these days consists of spam. The problem gets even worse when spam is used to actively harm recipients through attacks such as phishing and 419 scams [11, 7]. Apart from these threats, spam wastes time and money. For example, a survey conducted in 2006 among employees of 500 large companies in the US and Finland found that, on average, an employee spends 13 minutes of his daily working time reading, deleting or replying to spam messages.
The increasing amount of spam has attracted the attention of Internet and security experts. As a result, many anti-spam strategies have been proposed and implemented, and current work also investigates methods to block spam completely. The main reason behind the increasing amount of spam lies in the cost imbalance between senders and recipients. Sending large amounts of spam has a very small cost compared to the relatively high cost of viewing and deleting a single spam message. Millions of emails can be sent per hour with just 56 kbps of bandwidth. According to, if even one among 500,000 spam messages of direct-mail print campaigns attracts a recipient to buy the product, then the whole cost incurred in sending the 500,000 spams is covered.
On the other hand, the recipients and the ISPs have to carry significant costs. The most obvious cost is the bandwidth consumed by spam. In large organizations, Internet connections are charged based on traffic, so because of spam traffic these firms end up paying significant amounts for non-productive traffic. On the ISP side, the cost comes from wasted bandwidth and CPU time.
It is important to understand, analyze and measure the effectiveness and efficiency of spam filters in order to improve their quality. In the context of spam filters, "effectiveness" means the degree to which genuine spam is detected and removed. On the other hand, "efficiency" means the degree to which genuine email messages are correctly delivered. A filter that removes most spam messages will have high effectiveness, but if it removes many genuine email messages together with spam messages it will have poor efficiency.
SDT (Signal Detection Theory) [10, 2] is a mathematical model that is suitable for analyzing the effectiveness and efficiency of spam filters. SDT provides a rational basis for decision making under conditions of uncertainty. For example, the question "Is this my dog barking, or is it just the television?" is a typical situation where SDT can be applied to guide the dog owner to the optimal action, i.e. to ignore the sound, or to go and check on the dog. Visualization techniques used in SDT can provide additional decision support in situations of uncertainty.
Section 2 brieﬂy describes related studies on analyzing
spam ﬁlter performance. Section 3 presents the background
of SDT, Section 4 describes how SDT can be applied to spam
ﬁlter analysis, Section 5 discusses the presented technique,
and Section 6 concludes this paper.
2. RELATED WORK
In the context of spam filtering, genuine (non-spam) email messages are commonly called "ham". Since spam filters are trying to identify spam, a message identified as spam is called a "positive". A ham message incorrectly classified as spam therefore represents a false positive (FP), and a spam message incorrectly classified as ham represents a false negative (FN).
Various analyses of the performance of spam filters have been carried out in previous studies. The effectiveness of a spam filter is affected by the domain in which it is used. For example, the cost of losing a genuine email message that is incorrectly detected as spam will depend on the recipient's (and sender's) business area, as well as on the recipient's (and sender's) perception, attitude and level of frustration.
A method for analyzing spam filters was proposed by Garcia et al. in 2004 . Garcia's analysis was restricted to open source filters, and only considered content based filters, i.e. not, for example, black/white lists. In  apart from computing false positive and false negative rates, a function was proposed for calculating a single measure of a filter's error rate as a function of its false positive and false negative rates.
Another approach to analyzing spam filter performance is through the Precision and Recall metrics. This method was used extensively for spam filter classification in . Precision is the ratio of spam messages classified as spam relative to the total number of messages classified as spam, and Recall is the ratio of spam messages classified as spam relative to the total number of spam messages. For example, if 5 out of 10 spam messages are correctly identified as spam, then the Recall is 0.5. As long as no ham messages are classified as spam the Precision will be 1, but as soon as some ham messages are incorrectly classified as spam the Precision falls below 1. For spam filters, an instance of FP is normally considered more problematic than an instance of FN. Precision, which reflects a filter's FP property, is therefore considered to be a more important measure than Recall, which reflects the filter's FN property. The Precision value therefore needs to be higher than the Recall value, but at the same time there should be a proper balance between the two.
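A minimal Python sketch of the two metrics, computed from message counts; the function names are ours and the counts reproduce the example in the text:

```python
def precision(tp: int, fp: int) -> float:
    """Share of messages classified as spam that really are spam."""
    if tp + fp == 0:
        raise ValueError("no message was classified as spam")
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of spam messages that were classified as spam."""
    if tp + fn == 0:
        raise ValueError("the sample contains no spam")
    return tp / (tp + fn)


# The example from the text: 5 of 10 spam messages caught, no ham flagged.
print(recall(tp=5, fn=5))     # 0.5
print(precision(tp=5, fp=0))  # 1.0
# A single ham message flagged as spam pulls Precision below 1.
print(precision(tp=5, fp=1))
```

Precision only looks at the flagged messages, which is why it is the metric that reacts to false positives.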
Another proposed method for measuring the effectiveness of spam filters is Weighted Accuracy, which uses the accuracy and error rate as measures . A relative weight λ is assigned to the error types FP (False Positive) and FN (False Negative), and the same weight is assigned to the correct classification types: an instance of FP counts λ times an instance of FN, and an instance of TN (true negative), i.e. a correct classification of a genuine email message, counts λ times an instance of TP (true positive), i.e. a correct classification of spam. This method reflects that an instance of FP is λ times more costly than an instance of FN.
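The weighting can be sketched as follows. The exact formula is our assumption (the usual form of weighted accuracy, in which every ham message, whether it ends up as TN or FP, is counted λ times); the counts in the usage lines are hypothetical:

```python
def weighted_accuracy(tp: int, fn: int, tn: int, fp: int, lam: float) -> float:
    """Weighted accuracy where each ham message counts lam times a spam message.

    Assumed form: (lam * TN + TP) / (lam * (TN + FP) + TP + FN).
    """
    correct = lam * tn + tp            # weighted correct classifications
    total = lam * (tn + fp) + tp + fn  # weighted number of messages
    return correct / total


def weighted_error(tp: int, fn: int, tn: int, fp: int, lam: float) -> float:
    """Complement of weighted accuracy, i.e. (lam * FP + FN) / weighted total."""
    return 1.0 - weighted_accuracy(tp, fn, tn, fp, lam)


# Hypothetical counts: 100 spam (90 caught) and 100 ham (5 wrongly flagged).
for lam in (1, 9, 999):
    print(lam, weighted_accuracy(tp=90, fn=10, tn=95, fp=5, lam=lam))
```

With λ = 1 the measure reduces to plain accuracy; larger λ lets the ham column dominate the score.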
In , 10-fold cross validation is used as an evaluation method to estimate how well the filter works after training. In this method the corpus is split into 10 mutually exclusive parts; the filter is trained on nine of them and tested on the remaining one, and this is repeated so that each part is used once as the test set. The final estimate is the mean of all ten test results.
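A sketch of the procedure, assuming a hypothetical `train_and_score(train, test)` callback that stands in for training a real filter and returning its score on the held-out part. Folds are taken in order here; a real evaluation would normally shuffle the corpus first:

```python
from statistics import mean


def k_fold_splits(n: int, k: int = 10):
    """Yield (train_indices, test_indices) for k mutually exclusive folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # Spread n items over k folds; the first n % k folds get one extra item.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices = list(range(n))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(messages, labels, train_and_score, k: int = 10) -> float:
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(messages), k):
        train = [(messages[i], labels[i]) for i in train_idx]
        test = [(messages[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return mean(scores)


def majority_baseline(train, test) -> float:
    """Hypothetical stand-in for a real filter: predict the majority training label."""
    n_spam = sum(1 for _, label in train if label == "spam")
    guess = "spam" if n_spam >= len(train) / 2 else "ham"
    return sum(1 for _, label in test if label == guess) / len(test)


messages = [f"msg{i}" for i in range(20)]
labels = ["spam"] * 15 + ["ham"] * 5
print(cross_validate(messages, labels, majority_baseline))
```

Because every message appears in exactly one test fold, each one contributes to the final mean exactly once.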
The ROC (Receiver Operating Characteristics) curve is another method for spam filter evaluation, suggested by Hidalgo in . The curve is produced by varying a discrimination threshold, which traces out the trade-off between the FP and TP rates. From a visualization viewpoint, if the ROC curve of one filter lies uniformly above that of another filter, then it can be inferred that the performance of the first filter is superior to that of the other.
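The threshold sweep behind an ROC curve can be sketched like this. The scores and labels are hypothetical, and treating "score ≥ threshold" as a spam decision is our assumed convention:

```python
def roc_points(scores, is_spam):
    """(FP rate, TP rate) pairs obtained by sweeping a spam threshold over the scores.

    A message is classified as spam when its score is >= the threshold.
    """
    n_spam = sum(1 for y in is_spam if y)
    n_ham = len(is_spam) - n_spam
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, is_spam) if s >= threshold]
        tp = sum(1 for y in flagged if y)
        fp = len(flagged) - tp
        points.append((fp / n_ham, tp / n_spam))
    return points


# Hypothetical filter scores (higher = more spam-like) and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_spam = [True, True, False, True, False, False]
for fpr, tpr in roc_points(scores, is_spam):
    print(f"FP rate={fpr:.2f}  TP rate={tpr:.2f}")
```

Plotting the points of two filters on the same axes gives the visual comparison described above.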
3. SIGNAL DETECTION THEORY
This section presents a model for analyzing spam filters based on SDT (Signal Detection Theory) [10, 2, 9, 15]. SDT is based on probability theory and is an effective means to analyze ambiguous data. In the SDT framework each event is assumed to be either 1) signal (from a known process) or 2) noise (from an unknown process). SDT provides a formal framework for setting optimal thresholds for distinguishing between signal and noise. For example, in a radar system the operator tries to determine from the radar display whether a blip is signal (an aircraft) or noise (a bird or something else), and setting the optimal decision threshold is important for the success of military operations.
SDT assumes that the signal and noise distributions overlap each other and that an observed stimulus may come from either distribution. In addition, SDT assumes that the signal is added to the noise and that the decision maker tries to achieve optimal performance by balancing costs and benefits.
Fig. 1 shows the SDT model with the two distributions (signal and noise), assuming that both distributions are normal with equal standard deviations. The horizontal axis represents the strength of the internal response (also called hidden variable, decision variable or internal variable), which is a function of the external observed stimulus. The vertical axis represents the probability density of the internal response. These distributions are used in deciding whether the stimulus represents signal or noise. The vertical line between the two distributions is the criterion threshold for the internal response that is used to make the decision.
In the process of decision making, any internal response with a value less than the criterion is determined to come from the noise distribution, while an internal response with a value greater than the criterion is determined to come from the signal distribution. After receiving the stimulus, the decision maker has to decide whether to accept or reject it as signal.
The overlap between the noise and signal distributions results in four possible decisions, which are shown in Fig. 2:
Figure 1: SDT model showing overlap between sig-
nal and noise distribution
• False Negative (FN): Stimulus coming from the signal distribution incorrectly detected as noise.¹
• True Positive (TP): Stimulus coming from the signal distribution correctly detected as signal.²
• False Positive (FP): Stimulus coming from the noise distribution incorrectly detected as signal.³
• True Negative (TN): Stimulus coming from the noise distribution correctly detected as noise.⁴
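The decision rule and the four outcomes can be written down directly. In this sketch the criterion value is hypothetical, and a response exactly on the criterion is treated as noise, a case the text leaves open:

```python
def decide_signal(internal_response: float, criterion: float) -> bool:
    """Decide 'signal' only when the internal response exceeds the criterion."""
    return internal_response > criterion


def outcome(is_signal: bool, decided_signal: bool) -> str:
    """Map the true source of a stimulus and the decision onto the four SDT outcomes."""
    if is_signal:
        return "TP" if decided_signal else "FN"  # Hit or Miss
    return "FP" if decided_signal else "TN"      # False Alarm or Correct Rejection


CRITERION = 0.5  # hypothetical criterion value
print(outcome(True, decide_signal(1.3, CRITERION)))    # TP
print(outcome(True, decide_signal(0.2, CRITERION)))    # FN
print(outcome(False, decide_signal(0.7, CRITERION)))   # FP
print(outcome(False, decide_signal(-0.4, CRITERION)))  # TN
```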
FP and FN are also known as Type I and Type II errors, respectively, in statistics. The SDT decision making method is based on the concepts of TP Rate and FP Rate.
The TP Rate is the number of times a genuine signal is detected as signal divided by the total number of genuine signals. Hence, it can be calculated as follows:
$$\text{TP Rate} = \frac{TP}{TP + FN} \tag{1}$$
The FP Rate is the number of times genuine noise is detected as signal divided by the total number of genuine noise instances. Hence the FP Rate can be calculated using the following formula:
$$\text{FP Rate} = \frac{FP}{FP + TN} \tag{2}$$
It can be noted that the sum of the TP and FN Rates, as well as the sum of the FP and TN Rates, is equal to 1. This can be expressed as:
$$\text{FN Rate} = 1 - \text{TP Rate}, \qquad \text{TN Rate} = 1 - \text{FP Rate} \tag{3}$$
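Eqs. (1) to (3) computed from outcome counts, as a small sketch; the `Counts` class and the example counts are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Outcome counts collected for one decision maker (e.g. a filter)."""
    tp: int
    fn: int
    fp: int
    tn: int

    @property
    def tp_rate(self) -> float:
        return self.tp / (self.tp + self.fn)  # Eq. (1)

    @property
    def fp_rate(self) -> float:
        return self.fp / (self.fp + self.tn)  # Eq. (2)

    @property
    def fn_rate(self) -> float:
        return 1.0 - self.tp_rate  # complement of the TP Rate

    @property
    def tn_rate(self) -> float:
        return 1.0 - self.fp_rate  # complement of the FP Rate


c = Counts(tp=80, fn=20, fp=5, tn=95)  # hypothetical counts
print(c.tp_rate, c.fp_rate, c.fn_rate, c.tn_rate)
```

Only the TP and FP rates need to be stored; the other two always follow from them.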
¹ Called "Miss" in SDT terminology.
² Called "Hit" in SDT terminology.
³ Called "False Alarm" or "FA" in SDT terminology.
⁴ Called "Correct Identification" or "CI" or "Correct Rejection" or "CR" in SDT terminology.
Figure 2: The model of SDT showing TP, FN, FP and TN
Fig. 3 illustrates the analysis of TP and FP rates. The lower half of Fig. 3 sets the criterion at the left-most edge of the signal distribution, which statistically means that the TP Rate is 100%. Consider the example of a doctor who decides whether there is a tumor in the brain based on the internal response to a brain scan. If the criterion is lowered such that the TP Rate is 100%, then the FP Rate also increases, as shown in the lower half of Fig. 3. The doctor will therefore never miss a real tumor, but a negative side-effect of increasing the TP Rate is a corresponding increase in the FP Rate. If the criterion is instead raised to the rightmost edge of the noise distribution, as shown in the upper half of Fig. 3, then the FP Rate becomes 0%, but at the same time the TP Rate becomes very low. This means that the doctor gets no false alarms, but will miss many real tumors. The optimal criterion value will depend on the costs of FPs and FNs.
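The effect of moving the criterion can be sketched numerically, assuming as in Fig. 1 two normal distributions with equal standard deviations: noise ~ N(0, 1) and signal ~ N(d′, 1). The separation `D_PRIME = 1.5` and the criterion values are hypothetical; `math.erfc` gives the upper tail of the normal distribution:

```python
import math
from typing import Tuple


def upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def rates(criterion: float, d_prime: float) -> Tuple[float, float]:
    """(TP Rate, FP Rate) when noise ~ N(0, 1) and signal ~ N(d_prime, 1)."""
    tp_rate = upper_tail(criterion - d_prime)  # signal responses above the criterion
    fp_rate = upper_tail(criterion)            # noise responses above the criterion
    return tp_rate, fp_rate


D_PRIME = 1.5  # hypothetical distance between the two means, in standard deviations
# Move the criterion from far left (lenient) to far right (strict).
for c in (-2.0, 0.0, 0.75, 1.5, 3.5):
    tpr, fpr = rates(c, D_PRIME)
    print(f"criterion={c:5.2f}  TP Rate={tpr:.3f}  FP Rate={fpr:.3f}")
```

A lenient criterion drives both rates towards 100%, a strict one drives both towards 0%, which is the trade-off shown in the two halves of Fig. 3.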
SDT assumes that it is practically impossible to simultaneously have a 100% TP Rate and a 0% FP Rate because of the overlap between the signal and noise distributions. SDT offers a method for defining the criterion value that results in optimal decision making, so the choice of the criterion value is important. In this paper we use SDT and Bayesian methods for analyzing spam filters with regard to their inherent criterion values.
SDT based decision making is mainly influenced by two factors:
1. The Likelihood Ratio (LR), which can be called the Actual LR.
2. The Optimal LR (LR'), which is compared with the actual LR to find out whether the decision maker is optimal.
The Actual LR is calculated using the following formula:
$$LR = \frac{\text{TP Rate}}{\text{FP Rate}} \tag{4}$$
Figure 3: SDT model showing the criterion at two different places: FP Rate = 0% and TP Rate = 100%
The Optimal LR value depends on the base rate probabilities of the stimulus being signal or noise, and also on the costs of incorrect detections and the benefits of correct detections. It is calculated by multiplying the ratio of the base rate probability of noise P(noise) to the base rate probability of signal P(signal) by a constant K that incorporates the costs of errors and the benefits of correct identifications:
$$LR' = \frac{P(\text{noise})}{P(\text{signal})} \cdot K \tag{5}$$
Note that for every stimulus P(noise) + P(signal) = 1. With all costs and benefits expressed as positive magnitudes, the constant K is calculated as follows:
$$K = \frac{\text{Benefits of TN} + \text{Costs of FP}}{\text{Benefits of TP} + \text{Costs of FN}} \tag{6}$$
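Eqs. (4) to (6) can be put together in a few lines. This sketch assumes that all costs and benefits are entered as positive magnitudes, so that each cost adds to the matching benefit (the convention also used in Eq. (17)); the payoff numbers and rates in the usage lines are hypothetical:

```python
import math


def likelihood_ratio(tp_rate: float, fp_rate: float) -> float:
    """Actual LR, Eq. (4)."""
    return math.inf if fp_rate == 0 else tp_rate / fp_rate


def cost_benefit_factor(benefit_tn: float, cost_fp: float,
                        benefit_tp: float, cost_fn: float) -> float:
    """Constant K of Eq. (6), with every cost and benefit a positive magnitude."""
    return (benefit_tn + cost_fp) / (benefit_tp + cost_fn)


def optimal_lr(p_signal: float, k: float) -> float:
    """Optimal LR': P(noise) / P(signal) * K."""
    p_noise = 1.0 - p_signal  # P(noise) + P(signal) = 1
    return p_noise / p_signal * k


k = cost_benefit_factor(benefit_tn=1, cost_fp=4, benefit_tp=1, cost_fn=1)  # K = 2.5
print(optimal_lr(p_signal=0.2, k=k))               # LR' for a rare signal
print(likelihood_ratio(tp_rate=0.9, fp_rate=0.1))  # actual LR of some decision maker
```

A rare signal (small P(signal)) and an expensive false alarm both push LR' upwards, so the decision maker must be more selective to be optimal.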
In the process of decision making in SDT the four possible
outcomes are TP, FN, FP and TN. The decision matrix of
the spam detector is shown in Fig.4.
Eq. (6) is useful in deciding whether or not the decision maker is behaving optimally. All four values in Eq. (6) can be different, and the differences can be significantly large. For example, in the case of tsunami detection the cost of a FN is very high, while in the case of spam detection the cost of a FP is relatively high in comparison with the cost of a FN. The Bayesian approach used in this paper for decision making considers all the costs and benefits and various
4. SIGNAL DETECTION THEORY USED FOR
SPAM FILTER ANALYSIS
Spam filters are used to separate spam from ham, and they carry out this separation in different ways. For example, content based filtering  is done by analyzing the body of the message, while origin based filtering is done by judging the source of the message. SDT can be used to analyze spam filters based on a single method as well as filters based on multiple methods, such as those used by email service providers like Gmail, Yahoo Mail and Hotmail.
When applying SDT to spam ﬁlter analysis, we will use
the terminology convention that an instance of spam is con-
sidered as a signal, and an instance of ham is considered as
noise. Within the SDT framework, the diﬃculty of distin-
guishing between spam and ham increases with the degree
of overlap between the two distributions, as would be ex-
pected. The overlap between spam and ham distributions
results in two types of incorrect and two types of correct
decisions, deﬁned as:
1. Ham classiﬁed as ham (TN)
2. Spam classiﬁed as ham (FN)
3. Spam classiﬁed as spam (TP)
4. Ham classiﬁed as spam (FP)
The 3rd and 4th outcomes are important from the SDT point of view, as they are used in the mathematical expressions. In the following, S denotes a genuine spam message and S′ denotes an assumed spam message. Similarly, H denotes a genuine ham message and H′ denotes an assumed ham message. The four possible outcomes of the spam filter are shown in Fig. 4, where P(S′|S), P(H′|S), P(S′|H) and P(H′|H) represent the four conditional probabilities.
Figure 4: Decision Matrix for a spam ﬁlter showing
four possible cases
All four possible cases are dependent on each other. For example, when the message really is spam (1st row), the proportions of TP and FN add up to 1, because the filter can only respond in one of two ways: either yes or no. Likewise, when the message really is ham (2nd row), the proportions of FP and TN add up to 1. Thus all the information in the decision matrix can be obtained from TP and FP. Therefore we have
$$P(H'|S) = 1 - P(S'|S) \tag{7}$$
$$P(H'|H) = 1 - P(S'|H) \tag{8}$$
The conditional probabilities P(S′|S) and P(S′|H) repre-
sent the TP and FP rates respectively. The TP rate in-
dicates the successful ﬁltering of spam messages, and can
therefore be used to analyze the eﬀectiveness of the spam
ﬁlter. The FP rate on the other hand shows errors which
can be used to determine the eﬃciency of spam ﬁlters. Ef-
ﬁciency can be increased by reducing the FP rate. The ef-
fectiveness of the spam ﬁlter increases as the TP rate gets
closer to 1 and the eﬃciency increases as the FP rate gets
closer to 0.
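A short sketch of how the whole decision matrix of Fig. 4 follows from the two rates via Eqs. (7) and (8). Reading the definitions from Section 1, we take effectiveness as P(S′|S) and efficiency as P(H′|H) = 1 − P(S′|H); that mapping, the dictionary layout and the example rates are our assumptions:

```python
def decision_matrix(tp_rate: float, fp_rate: float) -> dict:
    """Conditional probabilities P(decision | truth) for the matrix of Fig. 4."""
    return {
        ("S", "S'"): tp_rate,        # spam classified as spam (TP)
        ("S", "H'"): 1.0 - tp_rate,  # spam classified as ham (FN), Eq. (7)
        ("H", "S'"): fp_rate,        # ham classified as spam (FP)
        ("H", "H'"): 1.0 - fp_rate,  # ham classified as ham (TN), Eq. (8)
    }


def effectiveness(m: dict) -> float:
    """Degree to which spam is caught: P(S'|S)."""
    return m[("S", "S'")]


def efficiency(m: dict) -> float:
    """Degree to which ham is delivered: P(H'|H)."""
    return m[("H", "H'")]


m = decision_matrix(tp_rate=0.95, fp_rate=0.01)  # hypothetical filter
for (truth, decision), p in m.items():
    print(f"P({decision}|{truth}) = {p:.2f}")
print("effectiveness", effectiveness(m), "efficiency", efficiency(m))
```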
It can easily be concluded that spam filters will behave best when the TP rate is maximal and the FP rate is minimal. In practice, no automated spam filter can be both 100% effective and 100% efficient at the same time. The reason is that cleverly composed spam messages have characteristics similar to those of ham messages. For automated filters, which do not have the same cognitive and semantic capabilities as humans, separation between ham and spam is not always possible.
Spam filters make use of the TP rate and the FP rate to calculate the LR (Likelihood Ratio). The formula to calculate the LR is as follows:
$$LR = \frac{P(S'|S)}{P(S'|H)} \tag{9}$$
After the Actual LR has been calculated, it is compared with the Optimal LR (LR'). LR' is calculated using the base rate probabilities of occurrence of spam messages in a representative set of messages. In addition, LR' is also based on the costs associated with incorrect decisions and the benefits associated with correct decisions. With the goal of maximizing the gains and minimizing the losses, the LR' value can be calculated as follows:
$$LR' = \frac{P(H)}{P(S)} \cdot \frac{B_{H'|H} + C_{S'|H}}{B_{S'|S} + C_{H'|S}} \tag{10}$$
where P(H) and P(S) represent the base rate probabilities of ham and spam in the message set. The additivity P(H) + P(S) = 1 always holds.
In the above equation B_{H′|H} denotes the benefit associated with TN, and B_{S′|S} denotes the benefit associated with TP. Similarly, C_{S′|H} denotes the cost associated with FP, and C_{H′|S} denotes the cost associated with FN. Eq. (10) shows that the optimal LR' value depends on two factors:
1. Base rates of spam and ham
2. Relative costs of errors and benefits of correct identifications
In Eq. (10), if the costs of errors are the same as the benefits of correct responses, then the value of LR' becomes equal to the ratio of the base rate probabilities of ham and spam, i.e.
$$LR' = \frac{P(H)}{P(S)}$$
From empirical research [13, 5, 3] it has been found that the base rate probability of spam affects the detection of spam. The base rate probability will therefore influence the value of LR'.
The cost of a FP is normally significantly higher than the cost of a FN: people are normally more concerned about losing a ham message than about receiving a spam message. With the help of Eq.(11), different aspects of the spam filter can be evaluated.
When comparing LR and LR', the spam filter is optimally tuned when the following equation holds:
$$LR = LR'$$
If LR is equal to LR', it can be concluded that the spam filter is optimal for the particular user; otherwise it is not.
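A sketch of the comparison, combining the LR of Eq. (9) with the LR' of Eq. (10). It assumes costs and benefits are positive magnitudes, and since measured rates never match exactly, the relative tolerance used for "LR = LR'" is a hypothetical choice, as are the example filter and users:

```python
import math


def filter_lr(p_spam_given_spam: float, p_spam_given_ham: float) -> float:
    """Actual LR of a spam filter, Eq. (9): P(S'|S) / P(S'|H)."""
    return math.inf if p_spam_given_ham == 0 else p_spam_given_spam / p_spam_given_ham


def filter_lr_prime(p_spam: float, b_tn: float, c_fp: float,
                    b_tp: float, c_fn: float) -> float:
    """Optimal LR' of Eq. (10); costs and benefits are positive magnitudes."""
    p_ham = 1.0 - p_spam  # P(H) + P(S) = 1
    return (p_ham / p_spam) * (b_tn + c_fp) / (b_tp + c_fn)


def is_optimal(lr: float, lr_prime: float, rel_tol: float = 0.05) -> bool:
    """True when LR matches LR' within a (hypothetical) relative tolerance."""
    return math.isclose(lr, lr_prime, rel_tol=rel_tol)


lr = filter_lr(0.9, 0.3)  # hypothetical filter with LR = 3
# Hypothetical user: 25% spam, benefits of 1 and no costs, so LR' = 0.75 / 0.25 = 3.
print(is_optimal(lr, filter_lr_prime(0.25, b_tn=1, c_fp=0, b_tp=1, c_fn=0)))  # True
# Same filter, but a FP now costs 4 units: LR' rises to 15.
print(is_optimal(lr, filter_lr_prime(0.25, b_tn=1, c_fp=4, b_tp=1, c_fn=0)))  # False
```

The same filter can thus be optimal for one user and not for another, because LR' encodes the user's own base rates and payoffs.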
Eq.(11) represents the equation for a filter equipped with just one technique to distinguish between ham and spam, meaning that it will maximize the utility for the user. When a spam filter has more than one filtering technique, which is generally the case, additional considerations must be taken into account.
All the filtering techniques are assumed to be applied in sequence, and the inherent characteristics of each filtering technique are assumed to be statistically independent of each other. If the filtering techniques are not statistically independent, then the sequential set of filters is treated as consisting of just one filtering technique, and this filter would be relatively less effective. A filtering technique at one point in the chain will change the base rate probabilities for the next filtering technique in the chain. If the base rate probabilities are changed by the stimulus emanating from the 1st filtering technique, it should result in an LR equal to that of Eq.(9). This new value will be denoted as LR_1.
In the above equation the left hand side determines the new base rate probability for the 2nd filtering technique. The base rate probability and the LR change every time an e-mail passes through a new filtering technique, and LR_1 indicates the LR after the 1st filtering technique. If the filter incorporates n filtering techniques, then Eq.13 changes to:
$$\prod_{i=1}^{n} LR_i \geq \frac{P(H)}{P(S)} \cdot \frac{B_{H'|H} + C_{S'|H}}{B_{S'|S} + C_{H'|S}} \tag{15}$$
Eq.(15) can be used for analyzing multiple technique spam filters.
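Under the independence assumption above, the chain can be sketched as follows. The specific reading is ours: a message counts as spam only if every stage flags it, so the combined TP and FP rates are products and the combined LR is the product of the stage LRs; after each flagging stage, Bayes' rule divides the ham-to-spam odds by that stage's LR. All stage rates are hypothetical:

```python
import math


def chain_rates(stages):
    """Combined (TP rate, FP rate) when a message must be flagged by every stage.

    stages: list of (tp_rate, fp_rate) pairs, assumed statistically independent.
    """
    tp = math.prod(t for t, _ in stages)
    fp = math.prod(f for _, f in stages)
    return tp, fp


def spam_base_rates(p_spam, stage_lrs):
    """Spam base rate seen by each successive stage, given earlier stages said spam."""
    odds_ham = (1.0 - p_spam) / p_spam  # P(H) / P(S) before the first stage
    seen = [p_spam]
    for lr in stage_lrs:
        odds_ham /= lr  # posterior odds = prior odds / LR_i
        seen.append(1.0 / (1.0 + odds_ham))
    return seen


def satisfies_eq15(stage_lrs, p_ham_over_p_spam, payoff_ratio):
    """Check Eq. (15): product of stage LRs >= (P(H)/P(S)) * payoff ratio."""
    return math.prod(stage_lrs) >= p_ham_over_p_spam * payoff_ratio


stages = [(0.8, 0.2), (0.8, 0.2)]  # two hypothetical, identical techniques
tp, fp = chain_rates(stages)
print(tp, fp, tp / fp)  # the combined LR equals 4 * 4
print(spam_base_rates(0.5, [t / f for t, f in stages]))
```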
It can be concluded from Eq.(11) that if the base rate probabilities of spam and ham are equal, then
$$\frac{P(H)}{P(S)} = 1. \tag{16}$$
If, in addition, the cost-benefit terms are balanced,
$$B_{H'|H} + C_{S'|H} = B_{S'|S} + C_{H'|S}, \tag{17}$$
then LR' becomes equal to 1. For a filter tuned so that LR = LR', this means that the TP rate is equal to the FP rate, which is not good at all from the point of view of the filter's efficiency and effectiveness.
Considering the scenario from the point of view of the FP and FN rates, we can easily conclude that either the FP rate or the FN rate can be minimized, but not both at the same time. Eq.(11) therefore helps in finding the optimal criterion value.
In the case of e-mail, one would normally prefer receiving a spam message over losing a ham message, because the cost of a FP is significantly higher than the cost of a FN. Therefore, in order to be more efficient, spam filters should use a stricter criterion when classifying e-mails. Since a spam message represents a signal for the spam filter, a stricter filter would classify an incoming message as ham even when there is a certain likelihood that it is spam. As a result, ham messages end up in the normal inbox, and fewer FPs occur.
Eq.(15), derived from the perspective of multi-technique spam filters, leads to interesting results. Assume that the benefits of correct responses are approximately equal, so that the major difference lies between the costs associated with FNs and FPs (generally the main concern is with the FP and FN rates). The payoff term can then be taken as the ratio of the cost of a FP to the cost of a FN. Moreover, considering modern day spam and spam filters, we assume that the base rate probability of spam is 97%, i.e. P(H)/P(S) = 3/97, that the TP and FP rates of each filtering technique are 80% and 20% respectively, and that the payoff ratio on the right hand side of Eq.(15) is 1000/1. Then a filter needs to incorporate three filtering techniques to satisfy the needs of the user and provide positive utility, as shown by the calculation (80/20)·(80/20)·(80/20) = 64, which is greater than (3/97)·1000/1 ≈ 30.9, whereas two techniques would only give (80/20)·(80/20) = 16. The combined likelihood ratio of three techniques therefore exceeds LR', i.e. it gives a more favourable ratio of TP rate to FP rate.
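The arithmetic of this example can be reproduced with a short sketch; `stages_needed` and `max_stages` are hypothetical names, and the search assumes identical, independent techniques as in the text:

```python
def stages_needed(tp_rate, fp_rate, p_ham_over_p_spam, payoff_ratio, max_stages=50):
    """Smallest n with (tp_rate / fp_rate) ** n > LR' for n identical, independent stages."""
    lr_prime = p_ham_over_p_spam * payoff_ratio  # right hand side of Eq. (15)
    stage_lr = tp_rate / fp_rate                 # LR of a single technique, Eq. (9)
    for n in range(1, max_stages + 1):
        if stage_lr ** n > lr_prime:
            return n
    return None  # no sufficient chain within max_stages


# Values from the example: 97% spam, TP rate 80%, FP rate 20%, payoff 1000/1.
print(3 / 97 * 1000)                            # LR', about 30.9
print(stages_needed(0.80, 0.20, 3 / 97, 1000))  # 3
```

With a payoff ratio of 10000/1 the same search returns five stages, since 4^4 = 256 is still below LR' ≈ 309.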
The smaller the difference between LR and LR', the less tuning the spam filter will need to behave optimally for the particular user.
This paper describes the analysis of spam ﬁlters within
the framework of signal detection theory.
The criterion value plays an important part in decision
making. It represents the environment in which the spam
ﬁlter operates as well as the user’s subjective view of the
cost and beneﬁts of false and correct ﬁltering.
For a spam ﬁlter that is perfect, the cost and beneﬁts of
false and correct ﬁltering are less important. The spam ﬁlter
will normally make optimal ﬁltering decisions and provide
positive utility for the user.
However, if the spam filter characteristics are not close to optimal, the values that the user assigns to the costs of incorrect filtering and the benefits of correct filtering do matter for determining whether the filter behaves optimally, i.e. whether it provides positive utility. If not, the user would be better off not using the spam filter, because that would provide better utility.
7. REFERENCES
[1] Security intelligence. Technical report, Microsoft.
[2] H. Abdi. Signal detection theory (SDT) overview.
[3] A. R. Agustin Orfila, Javier Carbo. Decision model analysis for spam. Information and Security: An International Journal, 15(2):151–161, 2004.
[4] G. P. F. R. Andre Bergholz, Jeong-Ho Chang and S. Strobel. Improved phishing detection using model-based features. In Fifth Conference on Email and Anti-Spam. CEAS, August 2008.
[5] I. Androutsopoulos, J. Koutsias, K. V. Ch, G. Paliouras, and C. D. Spyropoulos. An evaluation of naive Bayesian anti-spam filtering. In Proceedings of the workshop on Machine Learning in the New Information Age, G. Potamias, V. Moustakis and M. van Someren (eds.), 11th European Conference on Machine Learning, pages 9–17, 2000.
[6] A. Cournane and R. Hunt. An analysis of the tools used for the generation and prevention of spam.
[7] M. A. Dyrud. "I brought you a good news": An analysis of Nigerian 419 letters. In Proc. of the 2005 Association for Business Communication Annual
[8] F. D. Garcia, J.-H. Hoepman, and J. V. Nieuwenhuizen. Spam filter analysis. In Proceedings of 19th IFIP International Information Security Conference, WCC2004-SEC, pages 395–410. Kluwer Academic Publishers, 2004.
[9] D. M. Green and J. A. Swets. Signal Detection Theory and Psychophysics. Peninsula Publishing, 1966.
[10] D. Heeger. Signal detection theory. Technical report, November 1997. Available
[11] A. Jøsang and S. Pope. User centric identity management. In Asia Pacific Information Technology Security Conference, AusCERT2005, Australia, pages 77–89, 2005.
[12] S. Mikko and S. Carl. Effective anti-spam strategies in companies: An international study. In HICSS '06: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, page 127.3, Washington, DC, USA, 2006. IEEE Computer Society.
[13] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. In AAAI Workshop on Learning for Text Categorization, July 1998.
[14] S. Shirali-Shahreza and A. Movaghar. A new anti-spam protocol using CAPTCHA. pages 234–238,
[15] T. D. Wickens. Elementary Signal Detection Theory. Oxford University Press (OUP), 2001.