Link analysis for Web spam detection / Becchetti, Luca; Castillo, Carlos; Donato, Debora; Baeza-Yates, Ricardo; Leonardi, Stefano. - In: ACM TRANSACTIONS ON THE WEB. - ISSN 1559-1131. - 2:1(2008), pp. 1-42. [10.1145/1326561.1326563]

Link analysis for Web spam detection

Becchetti, Luca; Castillo, Carlos; Donato, Debora; Baeza-Yates, Ricardo; Leonardi, Stefano

2008

Abstract

We propose link-based techniques for the automatic detection of Web spam, a term referring to pages that use deceptive techniques to obtain undeservedly high scores in search engines. Web spam is widespread and difficult to combat, mostly because the sheer size of the Web makes many algorithms infeasible in practice. We perform a statistical analysis of a large collection of Web pages. In particular, we compute statistics of the links in the vicinity of every Web page, applying rank propagation and probabilistic counting over the entire Web graph in a scalable way. These statistical features are used to build Web spam classifiers that consider only the link structure of the Web, regardless of page contents. We then study the performance of each classifier alone, as well as their combined performance, by testing them over a large collection of Web link spam. Under tenfold cross-validation, our best classifiers perform comparably to state-of-the-art spam classifiers that use content attributes, while remaining orthogonal to content-based methods.
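
The rank propagation mentioned in the abstract can be illustrated with a PageRank-style power iteration. This is a minimal sketch under stated assumptions, not the authors' feature pipeline: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy graph are all hypothetical choices made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Propagate rank along hyperlinks by power iteration.

    out_links maps each page to the list of pages it links to.
    All parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page receives a baseline share from the random jump.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling = 0.0
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages without out-links spread their rank uniformly.
                dangling += damping * rank[node]
        for node in nodes:
            new_rank[node] += dangling / n
        rank = new_rank
    return rank


# Hypothetical toy graph: several pages all point at "target".
toy_graph = {
    "farm1": ["target"],
    "farm2": ["target"],
    "farm3": ["target"],
    "target": ["farm1"],
    "honest": ["target", "farm1"],
}
scores = pagerank(toy_graph)
```

On this toy graph, `target` ends up with the highest score because every other page links to it. Scores of this kind depend only on link structure, which is the sort of signal a link-based classifier can consume without inspecting page content.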

	Publication year
	
				2008
			
	Keywords
	
				adversarial information retrieval; link analysis
			
	Type
	
				01 Journal publication::01a Journal article
			
	Belongs to type:
	
				01a Journal article

Files attached to this product

File	Type	License	Access	Size	Format
VE_2008_11573-115157.pdf	Publisher's version (published with the publisher's layout)	All rights reserved	Archive managers only	2.09 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/115157

Warning

The data shown have not been validated by the university.
