Link-based characterization and detection of web spam / Becchetti, Luca; Castillo, Carlos; Donato, D.; Leonardi, Stefano; Baeza-Yates, Ricardo A. - (2006), pp. 1-8. (Paper presented at the 2nd International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2006 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2006, held in Seattle, United States, on 10 August 2006).
Link-based characterization and detection of web spam
BECCHETTI, Luca; LEONARDI, Stefano
2006
Abstract
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics, such as degree correlations, number of neighbors, rank propagation through links, TrustRank, and others, to build several automatic Web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach we are able to detect 80.4% of the Web spam in our sample, with only 1.1% false positives.