Catalogo dei prodotti della ricerca

We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.

We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.

A Comparison of Sampling Techniques for Web Graph Characterization / Becchetti, Luca; Carlos, Castillo; Donato, Debora; Fazzone, Adriano. - (2006). ( LinkKDD’06 Philadelphia, Pennsylvania, USA August 20 2006).

A Comparison of Sampling Techniques for Web Graph Characterization

BECCHETTI, Luca;Carlos Castillo;DONATO, DEBORA;FAZZONE, ADRIANO

2006

Abstract

We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.

Scheda breve

Scheda completa

	Anno
	
				2006
			
	Abstract
	
				We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.
			
	Appartiene alla tipologia:
	
				04a Atto di comunicazione a congresso

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/516130

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

ND

ND

social impact