The Dark Side of the Language: Syntax-based Neural Networks rivaling Transformers in Definitely Unseen Sentences / Onorati, Dario; Ranaldi, Leonardo; Nourbakhsh, Aria; Patrizi, Arianna; Ruzzetti, Elena Sofia; Mastromattei, Michele; Fallucchi, Francesca; Zanzotto, Fabio Massimo. - (2023), pp. 111-118. (Paper presented at the 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), held in Venice, Italy) [10.1109/WI-IAT59888.2023.00021].
The Dark Side of the Language: Syntax-based Neural Networks rivaling Transformers in Definitely Unseen Sentences
Dario Onorati
Member of the Collaboration Group
2023
Abstract
Syntax-based methods have long been used as key components of Natural Language Processing systems for solving a variety of tasks. Yet, pre-trained Transformers are now challenging all these pre-existing methods, and even humans, on nearly all tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we show that syntax-based neural networks rival Transformer models on tasks over definitely unseen sentences, even after fine-tuning and domain adaptation. Experiments on classification tasks over a DarkNet corpus, which provides definitely unseen sentences, show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning and domain adaptation. Only after what we call extreme domain adaptation, that is, allowing BERT to retrain on the test set with the masked language model task, do pre-trained Transformers reach their standard high results. Hence, in normal conditions where sentences are really unseen, syntax-based models are a viable alternative that is more transparent and has fewer parameters than Transformer-based approaches.
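The "extreme domain adaptation" setting mentioned in the abstract — continuing BERT's masked-language-model training on the raw, unlabeled text of the test split before the classifier is applied — can be sketched roughly as follows. This is a minimal illustration only, assuming the Hugging Face transformers and datasets libraries; the checkpoint name, sentence list, and hyperparameters are placeholders, not the authors' actual setup.

```python
# Minimal sketch (assumption): continue BERT's masked-language-model (MLM)
# training on raw, unlabeled test-set sentences -- the "extreme domain
# adaptation" described in the abstract -- before any task fine-tuning.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical list of unlabeled DarkNet test sentences (placeholder data).
test_sentences = ["example sentence one ...", "example sentence two ..."]

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Tokenize the raw test text; MLM labels are created on the fly by the collator.
dataset = Dataset.from_dict({"text": test_sentences}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-darknet-mlm",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()  # the adapted encoder would then be fine-tuned on the classification task
```

The point of the sketch is only to make the setting concrete: the model sees the test sentences as unlabeled text through the MLM objective, which is the condition under which, per the abstract, pre-trained Transformers recover their usual performance.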
File | Access | Type | License | Size | Format | Notes
---|---|---|---|---|---|---
Onorati_The-Dark_2023.pdf | Restricted (archive managers only) | Publisher's version (published with the publisher's layout) | All rights reserved | 481.28 kB | Adobe PDF | Contact the author
Onorati_preprint_The-Dark_2023.pdf | Open access | Pre-print (manuscript submitted to the publisher, before peer review) | All rights reserved | 707.21 kB | Adobe PDF | DOI: 10.1109/WI-IAT59888.2023.00021