A novel optimization perspective to the problem of designing sequences of tasks in a reinforcement learning framework / Seccia, R.; Foglino, F.; Leonetti, M.; Sagratella, S. - In: OPTIMIZATION AND ENGINEERING. - ISSN 1389-4420. - 24:2(2022), pp. 831-846. [10.1007/s11081-021-09708-x]
A novel optimization perspective to the problem of designing sequences of tasks in a reinforcement learning framework
Seccia R.; Sagratella S.
2022
Abstract
Training agents over sequences of tasks is often employed in deep reinforcement learning to let the agents progress more quickly towards better behaviours. The problem of designing such sequences, known as curriculum learning, has mainly been tackled in the literature by numerical methods based on enumeration strategies, which, however, can only handle small problems. In this work, we define a new optimization perspective on the curriculum learning problem, with the aim of developing efficient solution methods for complex reinforcement learning tasks. Specifically, we show how the curriculum learning problem can be viewed as an optimization problem with a nonsmooth, nonconvex objective function and an integer feasible region. We reformulate it by defining a grey-box function that includes a suitable scheduling problem. Numerical results on a benchmark environment from the reinforcement learning community show that the proposed approaches are effective in reaching better performance, including on large problems.

File | Size | Format | Access
---|---|---|---
Seccia_ANovel_2022.pdf (type: publisher's version, published with the publisher's layout; license: all rights reserved) | 1.32 MB | Adobe PDF | Archive managers only (contact the author)
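The abstract's point that enumeration strategies can only handle small problems can be made concrete by counting the candidate curricula. The sketch below assumes, for illustration only, that a curriculum is an ordered sequence of distinct intermediate tasks (possibly empty) drawn from a pool of n tasks; this is not necessarily the formulation used in the paper, and the function names `count_curricula` and `enumerate_curricula` are hypothetical.

```python
from itertools import permutations
from math import perm  # requires Python 3.8+


def count_curricula(n_tasks: int) -> int:
    """Count ordered sequences of distinct tasks of length 0..n_tasks.

    Assumption (illustrative): a curriculum is an ordered sequence of
    distinct intermediate tasks, and the empty sequence (no curriculum)
    is included. perm(n, k) = n! / (n - k)! sequences have length k.
    """
    return sum(perm(n_tasks, k) for k in range(n_tasks + 1))


def enumerate_curricula(tasks):
    """Yield every candidate curriculum explicitly, as a brute-force
    enumeration strategy would have to before evaluating each one."""
    tasks = list(tasks)
    for k in range(len(tasks) + 1):
        yield from permutations(tasks, k)


if __name__ == "__main__":
    # Growth of the search space with the number of available tasks.
    for n in (3, 5, 8, 10, 12):
        print(n, count_curricula(n))
```

Under this assumption the count is the sum over k of n!/(n-k)!, which gives 16 curricula for 3 tasks, 326 for 5, 109,601 for 8 and 9,864,101 for 10. In a real setting each candidate would have to be scored by training an agent, which is why exhaustive enumeration quickly becomes impractical.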