Load-balancing and caching for collection selection architectures / Puppin, D.; Silvestri, F.; Perego, R.; Baeza-Yates, R. - 06-08-:(2007). (Paper presented at the 2nd International Conference on Scalable Information Systems, InfoScale 2007, held in chn) [10.4108/infoscale.2007.892].
Load-balancing and caching for collection selection architectures
Silvestri F.;
2007
Abstract
To address the rapid growth of the Internet, modern Web search engines have to adopt a distributed organization, where the collection of indexed documents is partitioned among several servers and query answering is performed as a parallel and distributed task. Collection selection can be a way to reduce the overall computing load by finding a trade-off between the quality of the results retrieved and the cost of solving queries. In this paper, we analyze the relationship between the collection selection strategy and its effect on load balancing and on the caching subsystem, by exploring the design space of a distributed search engine based on collection selection. In particular, we propose a strategy to perform collection selection in a load-driven way, and a novel caching policy able to incrementally refine the effectiveness of the results returned for each subsequent cache hit. The combination of load-driven collection selection and incremental caching strategies allows our system to retrieve two thirds of the top-ranked results returned by a baseline centralized index, with only one fifth of the computing workload.
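As an illustration only, the idea of a cache that refines its answer on each subsequent hit can be sketched as follows. This is a minimal sketch under stated assumptions, not the policy defined in the paper: the names `IncrementalCache`, `rank_servers`, and `per_request` are hypothetical, servers are modelled as plain callables returning `(score, doc_id)` pairs, the fixed `per_request` budget is a simple stand-in for a load-driven limit, and no eviction is modelled.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Cached state for one query (hypothetical structure)."""
    results: list = field(default_factory=list)  # (score, doc_id), best first
    polled: set = field(default_factory=set)     # servers already queried


class IncrementalCache:
    """Toy cache: every lookup polls a few not-yet-queried servers and
    merges their results into the cached answer (assumed behaviour)."""

    def __init__(self, servers, rank_servers, top_k=10, per_request=1):
        self.servers = servers            # server_id -> callable(query)
        self.rank_servers = rank_servers  # callable(query) -> ordered ids
        self.top_k = top_k
        self.per_request = per_request    # assumed stand-in for a load budget
        self.entries = {}

    def lookup(self, query):
        entry = self.entries.setdefault(query, CacheEntry())
        # Servers ranked for this query that have not been polled yet.
        remaining = [s for s in self.rank_servers(query)
                     if s not in entry.polled]
        for server_id in remaining[: self.per_request]:
            entry.results.extend(self.servers[server_id](query))
            entry.polled.add(server_id)
        # Keep only the best top_k distinct results seen so far.
        entry.results = heapq.nlargest(self.top_k, set(entry.results))
        return list(entry.results)


# Hypothetical servers, each returning fixed (score, doc_id) pairs.
servers = {
    "a": lambda q: [(0.9, "d1"), (0.5, "d2")],
    "b": lambda q: [(0.8, "d3")],
    "c": lambda q: [(0.95, "d4")],
}
cache = IncrementalCache(servers, lambda q: ["a", "b", "c"], top_k=2)
first = cache.lookup("web search")   # miss: polls server "a" only
second = cache.lookup("web search")  # hit: additionally polls server "b"
third = cache.lookup("web search")   # hit: additionally polls server "c"
```

In this toy setting, each repeated lookup for the same query touches one more server, so the cached top results can only improve or stay the same, while the cost of any single request stays bounded by `per_request`.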