Bianchi, Filippo Maria; Maiorino, Enrico; Kampffmeyer, Michael C.; Rizzi, Antonello; Jenssen, Robert. Properties and training in recurrent neural networks. SpringerBriefs in Computer Science, Springer, 2017, pp. 9-21. DOI: 10.1007/978-3-319-70338-1_2.
Properties and training in recurrent neural networks
Bianchi, Filippo Maria; Maiorino, Enrico; Kampffmeyer, Michael C.; Rizzi, Antonello; Jenssen, Robert
2017
Abstract
In this chapter, we describe the basic concepts behind the functioning of recurrent neural networks and explain the general properties that are common to several existing architectures. We introduce the basis of their training procedure, backpropagation through time, as a general way to propagate and distribute the prediction error to previous states of the network. The learning procedure consists of updating the model parameters by minimizing a suitable loss function, which includes the error achieved on the target task and, usually, one or more regularization terms. We then discuss several ways of regularizing the system, highlighting their advantages and drawbacks. Besides the standard stochastic gradient descent procedure, we also present several additional optimization strategies proposed in the literature for updating the network weights. Finally, we illustrate the vanishing gradient effect, an inherent problem of gradient-based optimization techniques that occurs in several situations when training neural networks. We conclude by discussing the most recent and successful approaches proposed in the literature to limit the vanishing of the gradients.
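To make the quantities named in the abstract concrete, the following is a minimal sketch in assumed notation (hidden state h_t, per-step error term, regularizer Omega, learning rate eta; these symbols are illustrative and not taken from the chapter itself): the loss sums the task error over time and adds a weighted regularization term, stochastic gradient descent updates the parameters along the negative gradient, and the backpropagation-through-time gradient factors into products of state-to-state Jacobians.

\[
  L(\theta) \;=\; \sum_{t=1}^{T} \ell_t\!\left(y_t, \hat{y}_t(\theta)\right) \;+\; \lambda\,\Omega(\theta),
  \qquad
  \theta \;\leftarrow\; \theta - \eta\,\nabla_{\theta} L(\theta),
\]
\[
  \frac{\partial L}{\partial \theta}
  \;=\;
  \sum_{t=1}^{T} \sum_{k=1}^{t}
  \frac{\partial \ell_t}{\partial h_t}
  \left( \prod_{i=k}^{t-1} \frac{\partial h_{i+1}}{\partial h_i} \right)
  \frac{\partial h_k}{\partial \theta}.
\]

The product of Jacobians is the source of the vanishing gradient effect: when their norms are consistently below one, the contribution of distant time steps decays exponentially with t - k, while norms above one can instead make the gradient explode.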