Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs

Servedio, Giovanni; De Bellis, Alessandro; Di Palma, Dario; Anelli, Vito Walter; Di Noia, Tommaso

doi:10.18653/v1/2025.acl-long.304

Factual hallucinations are a major challenge for Large Language Models (LLMs). They undermine reliability and user trust by generating inaccurate or fabricated content. Recent studies suggest that when generating false statements, the internal states of LLMs encode information about truthfulness. However, these studies often rely on synthetic datasets that lack realism, which limits generalization when evaluating the factual accuracy of text generated by the model itself. In this paper, we challenge the findings of previous work by investigating truthfulness encoding capabilities, leading to the generation of a more realistic and challenging dataset. Specifically, we extend previous work by introducing: (1) a strategy for sampling plausible true-false factoid sentences from tabular data and (2) a procedure for generating realistic, LLM-dependent true-false datasets from Question Answering collections. Our analysis of two open-source LLMs reveals that while the findings from previous studies are partially validated, generalization to LLM-generated datasets remains challenging. This study lays the groundwork for future research on factuality in LLMs and offers practical guidelines for more effective evaluation.

Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs / Servedio, Giovanni; De Bellis, Alessandro; Di Palma, Dario; Anelli, Vito Walter; Di Noia, Tommaso. - (2025), pp. 6089-6104. (Paper presented at the Association for Computational Linguistics conference, held in Vienna) [10.18653/v1/2025.acl-long.304].

Year of publication: 2025
Conference name: Association for Computational Linguistics
Keywords: large language models; probing; factuality
Type: 04 Conference proceedings publication::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1754132
