
An Analysis on How Pre-Trained Language Models Learn Different Aspects

Gjinika, Ejdis; Arici, Nicola; Putelli, Luca; Gerevini, Alfonso Emilio; Serina, Ivan
2024

Abstract

By now, it is widely known that pre-trained Neural Language Models (NLMs) and Large Language Models (LLMs) possess remarkable capabilities and are able to solve many Natural Language Processing tasks. However, much less is understood about how Transformer-based models acquire this ability during their complex training process. In this context, an interesting line of work has surfaced in the last few years: the study of so-called learning trajectories. Several studies tested the knowledge acquired by a model not only when it was fully trained, but also at its checkpoints, i.e. intermediate versions of the model at different stages of its training. Nonetheless, most of these works focused on simple tasks, often analysing single grammatical aspects (such as part-of-speech tags, transitive verbs, etc.) without a proper comparison with more complex tasks or with semantics-based aspects. In this paper, we consider two additional tasks to study the learning trajectory of NLMs and to compare different aspects. The first consists of classifying a sentence as grammatically correct or incorrect, using a novel dataset whose sentences can contain several types of errors. The second is a purely semantic task revolving around understanding whether a sentence is funny or not. In our experimental evaluation, we compare the learning trajectories on these two tasks with those of three simpler grammatical aspects. We thus highlight the most important similarities and divergences in how these types of knowledge are learned by three GPT-NeoX models. Moreover, we analyse the behaviour of each layer of the models considered, verifying whether there are significant differences among them.
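
To make the checkpoint-probing setup described in the abstract concrete, below is a minimal Python sketch, assuming the EleutherAI Pythia suite (GPT-NeoX-family models whose intermediate training checkpoints are published as Hugging Face revisions such as step1000). The model name, checkpoint steps, toy acceptability data, and linear probe are illustrative assumptions, not the paper's actual configuration.

import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Hypothetical setup: Pythia exposes GPT-NeoX checkpoints as HF revisions.
MODEL = "EleutherAI/pythia-160m"
STEPS = ["step1000", "step10000", "step143000"]

# Toy grammatical-acceptability data (0 = wrong, 1 = correct).
sentences = ["The cats sleeps on the mat.", "The cat sleeps on the mat.",
             "She have seen the film.", "She has seen the film."]
labels = [0, 1, 0, 1]

tokenizer = AutoTokenizer.from_pretrained(MODEL)

for step in STEPS:
    model = AutoModel.from_pretrained(MODEL, revision=step).eval()
    feats = []
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # Mean-pool the final layer into one sentence vector; a per-layer
        # analysis would instead loop over all of out.hidden_states.
        feats.append(out.hidden_states[-1].mean(dim=1).squeeze(0).numpy())
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(step, "probe training accuracy:", probe.score(feats, labels))

Tracking the probe's accuracy across revisions traces a rough learning trajectory for the chosen aspect, and repeating the fit per layer shows where in the network that knowledge emerges.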
2024
5th Italian Workshop on Explainable Artificial Intelligence, XAI.it 2024
Natural Language Processing; Explainability; Interpretability; Learning Trajectory
04 Publication in conference proceedings::04b Conference paper in a volume
An Analysis on How Pre-Trained Language Models Learn Different Aspects / Gjinika, Ejdis; Arici, Nicola; Putelli, Luca; Gerevini, Alfonso Emilio; Serina, Ivan. - 3839:(2024), pp. 28-41. (5th Italian Workshop on Explainable Artificial Intelligence, XAI.it 2024, Bolzano, Italy).
Files attached to this item
File: Gjinika_Analysis_2024.pdf
Access: open access
Note: https://ceur-ws.org/Vol-3839/paper2.pdf
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 1.23 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1733086
Citazioni
  • Scopus 0