
2025: A GPT Odyssey. Deconstructing Intelligence by Gradual Dissolution of a Transformer / De Santis, Enrico; Martino, Alessio; Bruno, Edoardo; Rizzi, Antonello. - (2025). (Paper presented at the 2025 International Joint Conference on Neural Networks (IJCNN), held in Rome, Italy).

2025: A GPT Odyssey. Deconstructing Intelligence by Gradual Dissolution of a Transformer

Enrico De Santis; Antonello Rizzi
2025

Abstract

Hierarchical processing and information granulation have proven essential for intelligent systems, as exemplified by Large Language Models (LLMs) built on Transformer architectures, which leverage stacked attention modules to learn progressively richer semantic features. In this work, we investigate the role of attention layers in this hierarchy through a GPT-2 layer-ablation methodology, which recalls the deactivation of the HAL 9000 computer modules in the iconic scene of the film “2001: A Space Odyssey”. The adopted methodology is based on measuring appropriate indices (Dale-Chall Readability, BLEU, and Text Flow, a measure of coherence within the flow of sentences) on the text produced after removing a combination of layers, assisted by a single-way analysis to characterize these combinations. Subsequently, through a “machine-in-the-loop” procedure, we let GPT-4 judge the texts produced by GPT-2. The obtained results are in line with the basic hypothesis that the hierarchical organization of Transformers underpins their high semantic performance, opening the path to further insights and application hypotheses such as Explainable AI and the analytical characterization of texts produced by Generative AI models.
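The ablation step lends itself to a compact illustration. The following is a minimal sketch, not the authors' code: it assumes the ablation is realized by silencing the self-attention sublayer of selected GPT-2 blocks via PyTorch forward hooks in the HuggingFace transformers library, so that only the residual path carries the signal; the layer indices, prompt, and decoding settings are illustrative assumptions.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical combination of blocks whose attention sublayer is silenced;
# the paper explores many such combinations.
LAYERS_TO_ABLATE = [4, 7, 10]

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def silence_attention(module, inputs, output):
    # Zero the attention sublayer's contribution: thanks to the residual
    # connection, the block then passes its hidden states through unchanged.
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

hooks = [
    model.transformer.h[i].attn.register_forward_hook(silence_attention)
    for i in LAYERS_TO_ABLATE
]

prompt = "Open the pod bay doors, HAL."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

for h in hooks:
    h.remove()  # restore the intact model

The resulting continuations could then be scored with off-the-shelf implementations of the indices mentioned above (e.g., Dale-Chall via textstat, BLEU via nltk) and submitted to GPT-4 for judgment, mirroring the machine-in-the-loop step described in the abstract.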
2025
2025 International Joint Conference on Neural Networks (IJCNN)
large language models; explainable ai; text modeling; natural language processing; text embedding
04 Publication in conference proceedings::04b Conference paper in volume


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1749682
