Catalogo dei prodotti della ricerca

In this paper we address the problem of estimating the index size needed by web search engines to answer as many queries as possible by exploiting the marked difference between query and click frequencies. We provide a possible formal definition for the notion of essential web pages as those that cover a large fraction of distinct queries --- i.e., we look at the problem as a version of MaxCover. Although in general MaxCover is approximable to within a factor of 1-1/e ~0.632 from the optimum, we provide a condition under which the greedy algorithm does find the actual best cover (or remains at a known bounded factor from it). The extra check for optimality (or for bounding the ratio from the optimum) comes at a negligible algorithmic cost. Moreover, in most practical instances of this problem, the algorithm is able to provide solutions that are provably optimal, or close to optimal. We relate this observed phenomenon to some properties of the queries' click graph. Our experimental results confirm that a small number of web pages can respond to a large fraction of the queries (e.g., 0.4% of the pages answers 20% of the queries). Our approach can be used in several related search applications, and has in fact an even more general appeal --- as a first example, our preliminary experimental study confirms that our algorithm has extremely good performances on other (social network based) MaxCover instances

Essential web pages are easy to find / Baeza Yates, R., Boldi, P., Chierichetti, F.. - (2015), pp. 97-107. (24th International Conference on World Wide Web, WWW 2015 Firenze 2015) [10.1145/2736277.2741100].

Essential web pages are easy to find

Baeza Yates, Ricardo;Boldi, Paolo;CHIERICHETTI, FLAVIO

2015

Abstract

In this paper we address the problem of estimating the index size needed by web search engines to answer as many queries as possible by exploiting the marked difference between query and click frequencies. We provide a possible formal definition for the notion of essential web pages as those that cover a large fraction of distinct queries --- i.e., we look at the problem as a version of MaxCover. Although in general MaxCover is approximable to within a factor of 1-1/e ~0.632 from the optimum, we provide a condition under which the greedy algorithm does find the actual best cover (or remains at a known bounded factor from it). The extra check for optimality (or for bounding the ratio from the optimum) comes at a negligible algorithmic cost. Moreover, in most practical instances of this problem, the algorithm is able to provide solutions that are provably optimal, or close to optimal. We relate this observed phenomenon to some properties of the queries' click graph. Our experimental results confirm that a small number of web pages can respond to a large fraction of the queries (e.g., 0.4% of the pages answers 20% of the queries). Our approach can be used in several related search applications, and has in fact an even more general appeal --- as a first example, our preliminary experimental study confirms that our algorithm has extremely good performances on other (social network based) MaxCover instances

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2015
			
	Nome convegno
	
				24th International Conference on World Wide Web, WWW 2015
			
	Parole chiave
	
				Approximation algorithms; Click graph; Greedy algorithms.; Layered indices; Max cover; Query log analysis; Tiering; Web search; Computer Networks and Communications; Software
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Essential web pages are easy to find / Baeza Yates, R., Boldi, P., Chierichetti, F.. - (2015), pp. 97-107. (24th International Conference on World Wide Web, WWW 2015 Firenze 2015) [10.1145/2736277.2741100].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
Chierichetti_Essential_2015.pdf solo gestori archivio Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 713.45 kB Formato Adobe PDF Contatta l'autore	713.45 kB	Adobe PDF	Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/877846

Citazioni

ND

6

4

social impact