Research Products Catalogue

Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines

Luca Moroni;Gianmarco Pappacoda;Edoardo Barba;Simone Conia;Andrea Galassi;Bernardo Magnini;Roberto Navigli;Paolo Torroni;Roberto Zanoli

2025

Abstract

The evaluation of large language models (LLMs) for Italian faces distinctive challenges due to morphosyntactic complexity, dialectal variation, culture-specific knowledge, and the limited availability of computational resources. This position paper presents a comprehensive framework for Italian LLM benchmarking: we identify key dimensions for LLM evaluation, including linguistic capabilities, knowledge domains, task types, and prompt variations, and propose high-level methodological guidelines for current and future initiatives. We advocate a community-driven, sustainable benchmarking initiative that incorporates dynamic dataset management, open model prioritization, and collaborative infrastructure utilization. Our framework aims to establish a coordinated effort within the Italian NLP community to ensure rigorous, scientifically sound evaluation practices that can adapt to the evolving landscape of Italian LLMs.

Year of publication: 2025
Conference name: Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
Keywords: LLM; Evaluation; Guidelines
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Citation: Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines / Moroni, L., Pappacoda, G., Barba, E., Conia, S., Galassi, A., Magnini, B., Navigli, R., Torroni, P., Zanoli, R. - 4112 (2025). (Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), Cagliari, Italy).

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1768942

Warning: the data displayed have not been validated by the university.
