May 2007
Phishing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. In this paper, we present the design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm. We also discuss the design and evaluation of several heuristics we developed to reduce false positives. Our experiments show that CANTINA is good at detecting phishing sites, correctly labeling approximately 95% of phishing sites.
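
The abstract names TF-IDF as the basis of the content-based approach but does not describe how it is applied. As a rough illustration of TF-IDF term scoring only, and not of CANTINA's actual pipeline, the following minimal sketch ranks the terms of a page against a small reference corpus. The function names, the tokenizer, the smoothed IDF formula, and the cutoff `k` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def top_tfidf_terms(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    corpus is a list of reference documents (strings) used to estimate
    document frequency. This is an illustrative sketch, not the method
    evaluated in the paper.
    """
    tokens = tokenize(page_text)
    if not tokens:
        return []
    term_counts = Counter(tokens)
    doc_term_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_term_sets)

    scores = {}
    for term, count in term_counts.items():
        # Term frequency: share of the page's tokens taken by this term.
        tf = count / len(tokens)
        # Document frequency: number of reference documents containing it.
        df = sum(1 for terms in doc_term_sets if term in terms)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = tf * idf

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Terms that are frequent on the page but rare in the reference corpus score highest, which is what makes them usable as a compact signature of the page's content.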