
FENICE: Factuality Evaluation of Summarization Based on Natural Language Inference and Claim Extraction / Scire', Alessandro; Ghonim, K.; Navigli, R. - (2024), pp. 14148-14161. (62nd Annual Meeting of the Association for Computational Linguistics (ACL), Bangkok, Thailand) [10.18653/v1/2024.findings-acl.841].

FENICE: Factuality Evaluation of Summarization Based on Natural Language Inference and Claim Extraction

Scire', A.; Ghonim, K.; Navigli, R.
2024

Abstract

Recent advancements in text summarization, particularly with the advent of Large Language Models (LLMs), have shown remarkable performance. However, a notable challenge persists, as a substantial number of automatically-generated summaries exhibit factual inconsistencies, such as hallucinations. In response to this issue, various approaches for the evaluation of consistency for summarization have emerged. Yet, these newly-introduced metrics face several limitations, including lack of interpretability, focus on short document summaries (e.g., news articles), and computational impracticality, especially for LLM-based metrics. To address these shortcomings, we propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE), a more interpretable and efficient factuality-oriented metric. FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary. Our metric sets a new state of the art on AGGREFACT, the de facto benchmark for factuality evaluation. Moreover, we extend our evaluation to a more challenging setting by conducting a human annotation process of long-form summarization. In the hope of fostering research in summarization factuality evaluation, we release the code of our metric and our factuality annotations of long-form summarization at https://github.com/Babelscape/FENICE.
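The abstract describes FENICE's core mechanism: extracting atomic claims from the summary and aligning each claim with the source document via NLI. As a rough illustration of that idea only (not the authors' implementation, whose actual models and aggregation are available in the linked repository), the sketch below scores pre-extracted claims with an off-the-shelf MNLI model; the model choice, passage splitting, max-over-passages scoring, and averaging are assumptions made purely for illustration.

    # Illustrative sketch of NLI-based claim-document alignment in the spirit of FENICE.
    # Not the authors' implementation; model choice and aggregation are assumptions.
    from transformers import pipeline

    # Hypothetical off-the-shelf NLI model (labels: CONTRADICTION / NEUTRAL / ENTAILMENT).
    nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

    def claim_score(passages, claim):
        """Score one claim as its best entailment probability over document passages."""
        best = 0.0
        for passage in passages:
            preds = nli({"text": passage, "text_pair": claim}, top_k=None)
            entail = next(p["score"] for p in preds if p["label"] == "ENTAILMENT")
            best = max(best, entail)
        return best

    def factuality_score(passages, claims):
        """Average claim-level alignment scores into a summary-level estimate."""
        return sum(claim_score(passages, c) for c in claims) / max(len(claims), 1)

    # Toy usage: claims that the document supports score higher than unsupported ones.
    document = ["The mayor opened the new bridge on Monday.", "Construction lasted two years."]
    claims = ["A new bridge was opened on Monday.", "Construction took five years."]
    print(factuality_score(document, claims))

Claim-level scores are what make this family of metrics interpretable: each claim can be traced back to the passage that supports (or fails to support) it.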
2024
62nd Annual Meeting of the Association for Computational Linguistics (ACL)
Summarization; Factuality Evaluation; Interpretability
04 Conference proceedings publication::04b Conference paper in volume
Files attached to this record
File: Scire_FENICE_2024.pdf
Access: open access
Note: DOI: 10.18653/v1/2024.findings-acl.841
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 558.15 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726537
Citazioni
  • PMC: ND
  • Scopus: 5
  • Web of Science: 4