Load-balancing and caching for collection selection architectures

Puppin, D.; Silvestri, F.; Perego, R.; Baeza-Yates, R.

doi:10.4108/infoscale.2007.892

To address the rapid growth of the Internet, modern Web search engines have to adopt a distributed organization, where the collection of indexed documents is partitioned among several servers, and query answering is performed as a parallel and distributed task. Collection selection can be a way to reduce the overall computing load, by finding a trade-off between the quality of results retrieved and the cost of solving queries. In this paper, we analyze the relationship between the collection selection strategy and its effect on load balancing and on the caching subsystem, by exploring the design space of a distributed search engine based on collection selection. In particular, we propose a strategy to perform collection selection in a load-driven way, and a novel caching policy able to incrementally refine the effectiveness of the results returned for each subsequent cache hit. The combination of load-driven collection selection and incremental caching strategies allows our system to retrieve two thirds of the top-ranked results returned by a baseline centralized index, with only one fifth of the computing workload.

Load-balancing and caching for collection selection architectures / Puppin, D., Silvestri, F., Perego, R., Baeza-Yates, R.. - 06-08-:(2007). (2nd International Conference on Scalable Information Systems, InfoScale 2007 chn ) [10.4108/infoscale.2007.892].



Year of publication: 2007
Conference name: 2nd International Conference on Scalable Information Systems, InfoScale 2007
Keywords: Collection selection
Type: 04 Publication in conference proceedings::04b Conference paper in volume


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1572800
