We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.
We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.
A Comparison of Sampling Techniques for Web Graph Characterization / Becchetti, Luca; Carlos, Castillo; Donato, Debora; Fazzone, Adriano. - (2006). (Intervento presentato al convegno LinkKDD’06 tenutosi a Philadelphia, Pennsylvania, USA nel August 20 2006).
A Comparison of Sampling Techniques for Web Graph Characterization
BECCHETTI, Luca;DONATO, DEBORA;FAZZONE, ADRIANO
2006
Abstract
We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.