

A non-parametric softmax for improving neural attention in time-series forecasting

Totaro S.; Scardapane S.
2020

Abstract

Neural attention has become a key component in many deep learning applications, ranging from machine translation to time series forecasting. While many variations of attention have been developed in recent years, all share a common component: a softmax function that normalizes the attention weights, transforming them into valid mixing coefficients. In this paper, we aim to improve the modeling flexibility of a generic attention module by replacing this softmax operation with a learnable softmax, in which the normalizing functions are themselves adapted from the data. Specifically, our generalized softmax builds upon recent work on learning activation functions for deep networks, in particular the kernel activation function and its extensions. We describe the application of the proposed technique to the challenging case of time series forecasting with the dual-stage attention-based recurrent neural network (DA-RNN), a model for time series prediction that employs two different attention modules to handle exogenous factors and long-term dependencies. A series of real-world benchmarks shows that simply plugging in our generalized attention model improves results on all datasets, even when the number of trainable parameters in the model is kept constant. To further evaluate the algorithm, we collect a novel dataset for predicting the Bitcoin closing exchange rate, a problem of considerable practical significance. Finally, to foster research on the topic, we release both the dataset and our model as an open-source, extensible library. Over a baseline DA-RNN, our proposed model delivers an improvement in MAR ranging from 6% to 15% on our newly released dataset.
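For readers who want a concrete sense of the idea summarized above, the sketch below (in PyTorch) illustrates one plausible way a learnable, KAF-based softmax could replace the standard softmax inside an attention module. It is not the authors' released implementation: the class name KAFSoftmax, the dictionary size, the bandwidth rule of thumb, and the softplus step used to keep the normalizer positive are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class KAFSoftmax(nn.Module):
    """Hypothetical 'learnable softmax': a kernel activation function (KAF),
    i.e. a Gaussian kernel expansion with trainable mixing coefficients,
    replaces the fixed exponential before the attention weights are normalized."""

    def __init__(self, dict_size: int = 20, boundary: float = 3.0):
        super().__init__()
        # Fixed dictionary of kernel centres, uniformly spread over [-boundary, boundary].
        centres = torch.linspace(-boundary, boundary, dict_size)
        self.register_buffer("centres", centres)
        # Common rule of thumb for the kernel bandwidth given the centre spacing.
        step = float(centres[1] - centres[0])
        self.gamma = 1.0 / (2.0 * step ** 2)
        # Trainable mixing coefficients: this is what makes the softmax "learnable".
        self.alpha = nn.Parameter(0.1 * torch.randn(dict_size))

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (..., n) unnormalized attention scores.
        diff = scores.unsqueeze(-1) - self.centres       # (..., n, dict_size)
        kernels = torch.exp(-self.gamma * diff ** 2)     # Gaussian kernel expansion
        g = kernels @ self.alpha                         # learned transform of each score
        # Assumption: softplus keeps the normalizer positive so the outputs
        # are valid mixing coefficients summing to one.
        g = F.softplus(g)
        return g / g.sum(dim=-1, keepdim=True)

# Minimal usage check: one weight vector per sequence, each summing to 1.
attn = KAFSoftmax()
weights = attn(torch.randn(4, 10))
print(weights.sum(dim=-1))   # tensor of ones (up to numerical precision)

In this sketch the only trainable parameters are the mixing coefficients of the kernel expansion, so swapping it in for a standard softmax adds only a handful of parameters per attention module.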
2020
activation function; attention; softmax; time series forecasting
01 Journal publication::01a Journal article
A non-parametric softmax for improving neural attention in time-series forecasting / Totaro, S.; Hussain, A.; Scardapane, S.. - In: NEUROCOMPUTING. - ISSN 0925-2312. - 381:(2020), pp. 177-185. [10.1016/j.neucom.2019.10.084]
Files attached to this item:

Totaro_Non-parametric_2020.pdf
  Access: restricted (archive managers only); contact the author
  Type: publisher's version (published with the publisher's layout)
  License: all rights reserved
  Size: 1.27 MB
  Format: Adobe PDF

Totaro_pre-print_Non-parametric _2019.pdf
  Access: open access
  Type: pre-print (manuscript submitted to the publisher, before peer review)
  License: all rights reserved
  Size: 8.8 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1503640
Citations
  • PMC: not available
  • Scopus: 22
  • Web of Science (ISI): 18