Link-based characterization and detection of web spam / Becchetti, Luca; Castillo, Carlos; Donato, D.; Leonardi, Stefano; Baeza-Yates, Ricardo A. - (2006), pp. 1-8. (Paper presented at the 2nd International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2006 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2006, held in Seattle, United States, on 10 August 2006).

Link-based characterization and detection of web spam

BECCHETTI, Luca; LEONARDI, Stefano
2006

Abstract

We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics, such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others, to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach, we are able to detect 80.4% of the Web spam in our sample, with only 1.1% false positives.
2nd International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2006 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2006
Degree correlation; False positive; Link-based; Spam detection; TrustRank; Web spam
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/60794