A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters.

Journal of Machine Learning Research (Impact Factor: 2.85). 01/2008; 9:1115-1146. DOI: 10.1145/1390681.1390719
Source: DBLP

ABSTRACT: Statistical spam filters are known to be vulnerable to adversarial attacks. One of the more common adversarial attacks, known as the good word attack, thwarts spam filters by appending to spam messages sets of "good" words, i.e., words that are common in legitimate email but rare in spam. We present a counterattack strategy that attempts to differentiate spam from legitimate email in the input space by transforming each email into a bag of multiple segments and then applying multiple instance logistic regression to the bags. Each segment in a bag is treated as an instance. An email is classified as spam if at least one instance in the corresponding bag is spam, and as legitimate if all the instances in it are legitimate. We show that a classifier using our multiple instance counterattack strategy is more robust to good word attacks than its single instance counterpart and other single instance learners commonly used in the spam filtering domain.
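The bag-level rule described in the abstract (spam if at least one segment is spam, legitimate only if all segments are legitimate) is commonly modeled as a noisy-OR over per-instance logistic scores. A minimal sketch with invented weights follows; the paper's actual MILR training procedure is not reproduced here, only the combination rule:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def instance_spam_prob(w, b, x):
    # Per-instance (segment) spam probability under a logistic model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bag_spam_prob(w, b, bag):
    # Noisy-OR combination: the bag (email) is legitimate only if every
    # instance (segment) is legitimate, so it is spam otherwise.
    p_all_legit = 1.0
    for x in bag:
        p_all_legit *= 1.0 - instance_spam_prob(w, b, x)
    return 1.0 - p_all_legit

# Illustration with hypothetical weights: one spammy segment plus one
# "good word" padding segment. Appending padding segments can never pull
# the bag score below the spammiest segment's own score.
w, b = [2.0, -1.0], 0.0      # weight on a spam token, weight on a good token
spam_seg = [3.0, 0.0]        # segment dominated by spam tokens
good_seg = [0.0, 4.0]        # appended "good word" segment
print(bag_spam_prob(w, b, [spam_seg, good_seg])
      >= instance_spam_prob(w, b, spam_seg))   # True
```

This monotonicity is the intuition behind the counterattack: good words appended as extra segments dilute only their own instances, not the spammy segment that triggers the bag-level spam label.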

  • ABSTRACT: An increasing number of machine learning applications involve detecting the malicious behavior of an attacker who wishes to avoid detection. In such domains, attackers modify their behavior to evade the classifier while accomplishing their goals as efficiently as possible. The attackers typically do not know the exact classifier parameters, but they may be able to evade it by observing the classifier's behavior on test instances that they construct. For example, spammers may learn the most effective ways to modify their spam messages by sending test emails to accounts they control. This problem setting has been formally analyzed for linear classifiers with discrete features and for convex-inducing classifiers with continuous features, but never for non-linear classifiers with discrete features. In this paper, we extend previous ACRE learning results to convex polytopes representing unions or intersections of linear classifiers. We prove that exponentially many queries are required in the worst case, but that when the features used by the component classifiers are disjoint, previous attacks on linear classifiers can be adapted to attack them efficiently. In experiments, we further analyze the cost and number of queries required to attack different types of classifiers. These results move us closer to a comprehensive understanding of the relative vulnerability of different types of classifiers to malicious adversaries.
    Proceedings of the 2013 ACM workshop on Artificial intelligence and security; 11/2013
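A toy illustration of the query model this abstract analyzes (not the ACRE algorithm itself): an attacker who sees only the filter's labels appends candidate good words, spending one test query per step, until the message evades a hypothetical linear filter. All weights and word lists below are invented for the sketch:

```python
def query_attack(classify_as_spam, spam_words, good_word_candidates):
    # The attacker queries the black-box filter with test messages and
    # keeps appending candidate words until a query comes back non-spam.
    msg, queries = set(spam_words), 0
    for word in good_word_candidates:
        queries += 1                     # each test message costs one query
        if not classify_as_spam(msg):
            return msg, queries          # evasion succeeded
        msg.add(word)
    return msg, queries

# Hypothetical linear filter the attacker cannot inspect directly:
WEIGHTS = {"viagra": 3.0, "meeting": -1.0, "agenda": -1.0,
           "lunch": -1.0, "report": -1.0}

def classify_as_spam(words):
    return sum(WEIGHTS.get(w, 0.0) for w in words) > 0.0

evading, n_queries = query_attack(classify_as_spam, {"viagra"},
                                  ["meeting", "agenda", "lunch", "report"])
```

The paper's contribution concerns how the required number of such queries scales when the target is a union or intersection of linear classifiers rather than a single one.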
  • ABSTRACT: This paper presents an agent-based model for analysing behaviour produced under noisy and deceptive information conditions. A simple yet powerful simulation environment is developed in which adaptive agents act on, and adapt to, varying levels of information quality that they sense about their environment. The simulation environment consists of two types of agents moving in a bounded two-dimensional continuous plane: a neuro-evolutionary learning agent that adapts its manoeuvring strategies to escape a pre-programmed deceptive agent; and a pre-programmed agent, whose goal is to capture the adaptive agent, acting on noisy information about the adaptive agent's manoeuvres sensed from the environment. The pre-programmed agent can also produce deceptive actions to confuse the adaptive agent. Behaviour is represented in terms of the manoeuvring strategies the agents adopt in response to environmental changes. A behaviour analysis methodology is developed to compare agent actions under different information conditions, which elicits interesting relationships between behaviour and the studied information conditions. The framework is easily extended to the analysis of human behaviour in similar environments by replacing the adaptive agent with an interactive human–machine interface.
    Adaptive Behavior 04/2013; 21(2):96-117. · 1.15 Impact Factor
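The two information-quality knobs in that simulation, sensing noise and injected deception, can be sketched minimally as follows; the function and its parameters are illustrative, not taken from the paper:

```python
import random

def sensed_position(true_pos, noise_std, deception=None):
    # An agent's observation of an opponent's position, corrupted by
    # Gaussian noise; noise_std models the information quality being
    # varied. An optional deceptive offset models misleading signals.
    noisy = [c + random.gauss(0.0, noise_std) for c in true_pos]
    if deception is not None:
        noisy = [c + d for c, d in zip(noisy, deception)]
    return tuple(noisy)

# With zero noise and no deception, sensing is perfect.
print(sensed_position((1.0, 2.0), 0.0))   # (1.0, 2.0)
```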
  • ABSTRACT: This paper addresses the challenge of large margin classification for spam filtering in the presence of an adversary who disguises spam messages to avoid detection. In practice, the adversary may strategically add good words indicative of a legitimate message or remove bad words indicative of spam. We assume that the adversary can afford to modify a spam message only to a certain extent, without damaging its utility to the spammer. Under this assumption, we present a large margin approach for classifying spam messages that may be disguised. The proposed classifier is formulated as a second-order cone programming optimization. We performed a set of experiments on the TREC 2006 Spam Corpus. Results show that the performance of the standard support vector machine (SVM) degrades rapidly as more words are injected or removed by the adversary, while the proposed approach remains more stable under the disguise attack.
    Journal of Zhejiang University: Science C 03/2012; 13(3). · 0.38 Impact Factor
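The degradation pattern those experiments report can be seen directly for a plain linear (SVM-style) filter. A sketch with invented weights, showing each injected good word lowering the score until the spam is misclassified (the paper's SOCP-based robust classifier is not reproduced here):

```python
def score(w, x, b):
    # Linear filter score: the message is labeled spam iff score > 0.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical weights: the first two features are spam tokens, the
# last three are "good" tokens the adversary can inject.
w, b = [2.5, 1.8, -0.9, -1.6, -1.5], -0.5
x = [1.0, 1.0, 0.0, 0.0, 0.0]        # original spam message
scores = [score(w, x, b)]            # positive: correctly flagged as spam
for j in (2, 3, 4):                  # inject good words one at a time
    x[j] = 1.0
    scores.append(score(w, x, b))
# scores decrease with each injection; the last one drives the score
# below the threshold and the spam is misclassified as legitimate.
```

A robust formulation like the paper's constrains the margin against all such bounded word modifications instead of only the observed feature vector.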

