The evaluation of large language models (LLMs) for Italian faces unique challenges due to morphosyntactic complexity, dialectal variation, culture-specific knowledge, and the limited availability of computational resources. This position paper presents a comprehensive framework for Italian LLM benchmarking. We identify key dimensions for LLM evaluation, including linguistic capabilities, knowledge domains, task types, and prompt variations, and propose high-level methodological guidelines for current and future initiatives. We advocate a community-driven, sustainable benchmarking initiative that incorporates dynamic dataset management, open model prioritization, and collaborative infrastructure utilization. Our framework aims to establish a coordinated effort within the Italian NLP community to ensure rigorous, scientifically sound evaluation practices that can adapt to the evolving landscape of Italian LLMs.

Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines / Moroni, Luca; Pappacoda, Gianmarco; Barba, Edoardo; Conia, Simone; Galassi, Andrea; Magnini, Bernardo; Navigli, Roberto; Torroni, Paolo; Zanoli, Roberto. - (2025), pp. 747-759. (Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), Cagliari, Italy).

Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines

Luca Moroni; Edoardo Barba; Simone Conia; Andrea Galassi; Roberto Navigli
2025

the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
LLM, Evaluation, Guidelines
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1768942