In recent years, research in text summarization has mainly focused on the news domain, where texts are typically short and have strong layout features. The task of full-book summarization presents additional challenges which are hard to tackle with current resources, due to their limited size and availability in English only. To overcome these limitations, we present “Echoes from Alexandria”, or in shortened form, “Echoes”, a large resource for multilingual book summarization. Echoes features three novel datasets: i) Echo-Wiki, for multilingual book summarization, ii) Echo-XSum, for extremely-compressive multilingual book summarization, and iii) Echo-FairySum, for extractive book summarization. To the best of our knowledge, Echoes – with its thousands of books and summaries – is the largest resource, and the first to be multilingual, featuring 5 languages and 25 language pairs. In addition to Echoes, we also introduce a new extractive-then-abstractive baseline, and, supported by our experimental results and manual analysis of the summaries generated, we argue that this baseline is more suitable for book summarization than purely-abstractive approaches. We release our resource and software at https://github.com/Babelscape/echoes-from-alexandria in the hope of fostering innovative research in multilingual book summarization.

Echoes from Alexandria: A Large Resource for Multilingual Book Summarization / Scirè, Alessandro; Conia, Simone; Ciciliano, Simone; Navigli, Roberto. - (2023), pp. 853-867. (Paper presented at the Association for Computational Linguistics conference held in Toronto, Canada) [10.18653/v1/2023.findings-acl.54].

Echoes from Alexandria: A Large Resource for Multilingual Book Summarization

Alessandro Scirè; Simone Conia; Simone Ciciliano; Roberto Navigli
2023

Association for Computational Linguistics
echoes; multilingual; summarization; book summarization; long document summarization
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item

File: Sciré_Echoes_2023.pdf
Access: open access
Note: DOI: 10.18653/v1/2023.findings-acl.54
Type: Pre-print document (manuscript submitted to the publisher, prior to peer review)
License: All rights reserved
Size: 895.04 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1685072