We study whether a large language model can reliably evaluate human creativity in constrained, innovation-like tasks. Using expert-generated creative outputs from a validated experiment with workers in cultural and creative industries, we embed ChatGPT as an evaluator and benchmark its assessments against expert human judgments obtained through the Consensual Assessment Technique. Study 1 supports AI reliability by showing that AI-based creativity evaluations exhibit internal consistency comparable to that of expert judges across repeated and independent runs, even under conservative scenarios. Replacing a human judge with an AI evaluator does not reduce inter-rater reliability across drawing, mathematical, and verbal tasks. Beyond reliability, AI evaluations display three additional features that are difficult to achieve with human-only panels: lower evaluative variability, systematically higher scores consistent with a potentially more inclusive evaluative stance, and task-independence of evaluative standards. Study 2 further supports task-independence by showing that AI evaluations are structured along fluency, flexibility, originality, and elaboration, with dimension weights that adapt to task-specific constraints.

Evaluating creative work with artificial intelligence. Evidence from constrained innovation tasks / Addis, Valerio Fedele; Attanasi, Giuseppe; Di Bartolomeo, Giovanni; Mariella, Michele; Peruzzi, Valentina. - In: TECHNOVATION. - ISSN 0166-4972. - 155:(2026), pp. -1. [10.1016/j.technovation.2026.103571]

Evaluating creative work with artificial intelligence. Evidence from constrained innovation tasks

Valerio Fedele Addis; Giuseppe Attanasi; Giovanni Di Bartolomeo; Michele Mariella; Valentina Peruzzi
2026

artificial intelligence; creativity evaluation; constrained creativity tasks; consensual assessment technique; cultural and creative industry professionals; innovation-like tasks
01 Journal publication::01a Journal article
Files attached to this item
File: Addis_Evaluating_2026.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1767943