Generalized BadRank with Graduated Trust

Abstract

BadRank is a method for detecting web spam, based on the premise that a page is spam if it points to another spam page; i.e., the BadRank score of a page is the weighted sum of the BadRank scores of the pages that it links to. BadRank is an important tool in spam detection. We consider the mathematical structure of BadRank, showing how it can be modified to guarantee that the iterates converge. Additionally, we consider methods for incorporating knowledge about trusted (known non-spam) sites into the BadRank calculation by changing the underlying iteration matrix. The effectiveness of BadRank in web spam detection is demonstrated in an evaluation, with statistically significant results, on the WEBSPAM-UK2007 data set.
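
The premise stated in the abstract can be illustrated with a small sketch. The code below is a damped BadRank-style iteration in the form commonly described informally, not the generalized formulation developed in the report. The function name `badrank`, the damping value 0.85, the 0/1 seed vector, and the choice to divide a target's score by its in-link count are all assumptions made for illustration.

```python
def badrank(out_links, seeds, damping=0.85, iterations=50):
    """Illustrative damped BadRank-style iteration (hypothetical helper).

    out_links: dict mapping each page to the list of pages it links to.
    seeds: set of pages assumed to be known spam.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    pages = sorted(pages)

    # In-link counts: a target's score is shared among the pages linking to it.
    in_degree = {p: 0 for p in pages}
    for targets in out_links.values():
        for t in targets:
            in_degree[t] += 1

    # Assumed seed vector: 1 for known spam pages, 0 otherwise.
    seed_score = {p: (1.0 if p in seeds else 0.0) for p in pages}
    score = dict(seed_score)

    for _ in range(iterations):
        new_score = {}
        for p in pages:
            # Weighted sum of the scores of the pages that p links to.
            inherited = sum(score[t] / in_degree[t] for t in out_links.get(p, []))
            # Damping below 1 makes the update a contraction, so it converges.
            new_score[p] = (1 - damping) * seed_score[p] + damping * inherited
        score = new_score
    return score


# Toy graph: "b" links to "a", "a" links to a known spam page, "c" links nowhere.
toy_scores = badrank({"a": ["spam"], "b": ["a"], "c": []}, seeds={"spam"})
```

In this toy graph the score decays with link distance from the seed: `spam` scores highest, then `a`, then `b`, while `c`, which has no path to the seed, stays at zero.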

Publication
Tech. Rep., Sandia National Laboratories
Date
October 2009
Citation
T. G. Kolda, M. J. Procopio. Generalized BadRank with Graduated Trust. Tech. Rep. No. SAND2009-6670, Sandia National Laboratories, 2009.

Keywords

PageRank, Web Spam, Distrust

BibTeX

@techreport{SAND2009-6670,
  author = {Tamara G. Kolda and Michael J. Procopio},
  title = {Generalized BadRank with Graduated Trust},
  number = {SAND2009-6670},
  institution = {Sandia National Laboratories},
  month = {October},
  year = {2009},
}