The Dark Side of the Language: Syntax-based Neural Networks rivaling Transformers in Definitely Unseen Sentences / Onorati, Dario; Ranaldi, Leonardo; Nourbakhsh, Aria; Patrizi, Arianna; Ruzzetti, Elena Sofia; Mastromattei, Michele; Fallucchi, Francesca; Zanzotto, Fabio Massimo. - (2023), pp. 111-118. (Paper presented at the 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), held in Venice, Italy) [10.1109/WI-IAT59888.2023.00021].
The Dark Side of the Language: Syntax-based Neural Networks rivaling Transformers in Definitely Unseen Sentences
Dario Onorati
Member of the Collaboration Group
2023
Abstract
Syntax-based methods have long been used as key components of Natural Language Processing systems for solving a variety of tasks. Yet, pre-trained Transformers are now challenging all these pre-existing methods, and even humans, on nearly all tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we show that syntax-based neural networks rival Transformer models on tasks over definitely unseen sentences, even after fine-tuning and domain adaptation. Experiments on classification tasks over a DarkNet corpus, which provides definitely unseen sentences, show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning and domain adaptation. Only after what we call extreme domain adaptation, that is, allowing BERT to retrain on the test set with the masked language model task, do pre-trained Transformers reach their standard high results. Hence, in normal conditions where sentences are really unseen, syntax-based models are a viable alternative that is more transparent and has fewer parameters than Transformer-based approaches.
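The "extreme domain adaptation" setting mentioned in the abstract — continuing BERT's masked-language-model training on the raw, unlabeled text of the test split before the classifier is applied — can be sketched roughly as follows. This is a minimal illustration only, assuming the Hugging Face transformers and datasets libraries; the checkpoint name, sentence list, and hyperparameters are placeholders, not the authors' actual setup.

```python
# Minimal sketch (assumption): continue BERT's masked-language-model (MLM)
# training on raw, unlabeled test-set sentences -- the "extreme domain
# adaptation" described in the abstract -- before any task fine-tuning.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical list of unlabeled DarkNet test sentences (placeholder data).
test_sentences = ["example sentence one ...", "example sentence two ..."]

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Tokenize the raw test text; MLM labels are created on the fly by the collator.
dataset = Dataset.from_dict({"text": test_sentences}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-darknet-mlm",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()  # the adapted encoder would then be fine-tuned on the classification task
```

The point of the sketch is only to make the setting concrete: the model sees the test sentences as unlabeled text through the MLM objective, which is the condition under which, per the abstract, pre-trained Transformers recover their usual performance.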
File | Access | Type | License | Size | Format | Notes
---|---|---|---|---|---|---
Onorati_The-Dark_2023.pdf | Restricted (archive managers only) | Publisher's version (published with the publisher's layout) | All rights reserved | 481.28 kB | Adobe PDF | Contact the author
Onorati_preprint_The-Dark_2023.pdf | Open access | Pre-print (manuscript submitted to the publisher, before peer review) | All rights reserved | 707.21 kB | Adobe PDF | DOI: 10.1109/WI-IAT59888.2023.00021