Identifying task-based sessions in search engine query logs / Lucchese, Claudio; Orlando, Salvatore; Perego, R.; Silvestri, F.; Tolomei, Gabriele. - (2011), pp. 277-286. (Paper presented at the Proceedings of the Fourth International Conference on Web Search and Web Data Mining, WSDM 2011, held in Hong Kong, China).

Identifying task-based sessions in search engine query logs

Silvestri, F.; Tolomei, Gabriele
2011

Abstract

The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by the user of a Web search engine to carry out a given task. To evaluate and compare different approaches, we built, through a manual labeling process, a ground truth in which the queries of a given query log are grouped into tasks. Our analysis of this ground truth shows that users tend to perform more than one task at the same time: about 75% of the submitted queries involve a multi-tasking activity. We formally define the Task-based Session Discovery Problem (TSDP) as the problem of best approximating the manually annotated tasks, and we propose several variants of well-known clustering algorithms, as well as a novel, efficient heuristic algorithm, specifically tuned for solving the TSDP. These algorithms also exploit the collaborative knowledge collected by Wiktionary and Wikipedia to detect query pairs that are lexically dissimilar but semantically related. The proposed algorithms have been evaluated on the above ground truth and are shown to perform better than state-of-the-art approaches, because they effectively take into account the multi-tasking behavior of users.
Proceedings of the Fourth International Conference on Web Search and Web Data Mining, WSDM 2011
query log analysis; query log session detection; task-based session; query clustering; user search intent
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1382704
Citations
  • Scopus 107