Generalized BadRank with Graduated Trust


BadRank is a method for detecting spam web sites, based on the premise that a page is likely spam if it points to other spam pages; accordingly, the BadRank score of a page is the weighted sum of the BadRank scores of the pages it links to. BadRank is an important tool in spam detection. We consider the mathematical structure of BadRank and show how it can be modified to guarantee that the iterates converge. Additionally, we consider methods for incorporating knowledge about trusted (known non-spam) sites into the BadRank calculation by changing the underlying iteration matrix. The effectiveness of BadRank in web spam detection is demonstrated in a statistically significant evaluation on the WEBSPAM-UK2007 data set.
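The weighted-sum definition above can be illustrated with a minimal Python sketch. It assumes a damped, PageRank-style formulation that is not taken from the report. Each page inherits badness from the pages it links to, each target's score is split evenly among the pages that link to it, and a seed vector concentrated on known spam pages is mixed in with weight `1 - alpha`. Because every column of the resulting link matrix sums to at most one, a damping factor `alpha < 1` makes the update a contraction, so the iterates converge. The optional `trust` argument is a hypothetical illustration of graduated trust (it scales down the badness a page inherits by a value in [0, 1]) and is not the report's modified iteration matrix. The names `badrank`, `spam_seeds`, `alpha`, and `trust` are assumptions.

```python
def badrank(links, spam_seeds, alpha=0.85, trust=None, tol=1e-10, max_iter=1000):
    """Damped BadRank-style power iteration (illustrative sketch only).

    links:      dict mapping each page to the pages it links to.
    spam_seeds: collection of pages assumed to be known spam.
    alpha:      damping factor in [0, 1); values < 1 guarantee convergence.
    trust:      optional dict page -> value in [0, 1] (hypothetical);
                a page's inherited badness is scaled by (1 - trust).
    """
    trust = trust or {}
    seeds = set(spam_seeds)
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # In-degree of each page, counting each distinct linking page once.
    indeg = {p: 0 for p in pages}
    for targets in links.values():
        for t in set(targets):
            indeg[t] += 1
    # Seed vector: uniform over the known spam pages.
    s = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    b = dict(s)
    for _ in range(max_iter):
        new = {}
        for p in pages:
            # Badness flows backward along links: p inherits from each page
            # it points to, and each target's score is split evenly among
            # all pages linking to it (indeg[t] >= 1 because p links to t).
            inherited = sum(b[t] / indeg[t] for t in set(links.get(p, ())))
            damp = 1.0 - trust.get(p, 0.0)
            new[p] = alpha * damp * inherited + (1.0 - alpha) * s[p]
        # Stop once the L1 change between successive iterates is below tol.
        diff = sum(abs(new[p] - b[p]) for p in pages)
        b = new
        if diff < tol:
            break
    return b
```

For example, with `links = {"a": ["spam"], "b": ["a"], "c": ["b"], "d": []}` and `spam_seeds = {"spam"}`, the scores decrease along the chain spam, a, b, c, and the isolated page d scores zero. Marking `a` as fully trusted (`trust={"a": 1.0}`) stops badness from propagating back through it, so `a`, `b`, and `c` all score zero.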

Tech. Rep., Sandia National Laboratories
T. G. Kolda, M. J. Procopio. Generalized BadRank with Graduated Trust. Tech. Rep. No. SAND2009-6670, Sandia National Laboratories, 2009.


Keywords: PageRank, Web Spam, Distrust


