Representing document lengths with identifiers / Perego, R.; Silvestri, F.; Tonellotto, N. - 6611 (2011), pp. 665-669. (Paper presented at the 33rd European Conference on Information Retrieval, ECIR 2011, held in Ireland) [10.1007/978-3-642-20161-5_66].
Representing document lengths with identifiers
Silvestri F.
2011
Abstract
Most common text retrieval scoring functions need the length of each indexed document to rank it with respect to the current query. For efficiency, information retrieval systems keep this information in main memory. This paper proposes a novel strategy that encodes the length of each document directly in its document identifier, thus reducing main memory demand. The technique is based on a simple document identifier assignment method and a function that allows the approximate length of each indexed document to be computed analytically.