
Bellman's principle of optimality and deep reinforcement learning for time-varying tasks

Giuseppi A. (co-first author); Pietrabissa A. (co-first author)
2022

Abstract

This paper presents, to the best of the authors' knowledge, the first framework to address time-varying objectives in finite-horizon Deep Reinforcement Learning (DeepRL), based on a switching control solution grounded in Bellman's principle of optimality. By augmenting the state space of the system with information on the visit time, the DeepRL agent is able to solve problems in which its task changes dynamically within the same episode. To address the scalability problems caused by the state-space augmentation, we propose a procedure that partitions the episode horizon into separate sub-problems, which are then solved by specialised DeepRL agents. Contrary to standard solutions, with the proposed approach the DeepRL agents correctly estimate the value function at each time step and are hence able to solve time-varying tasks. Numerical simulations validate the approach in a classic RL environment.
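The abstract describes two mechanisms: augmenting the state with the visit time, and partitioning the episode horizon into sub-problems handled by specialised agents. The Python sketch below is only an illustration of those two ideas under simple assumptions, not the authors' implementation; the agent interface, the boundary values and the RandomAgent placeholder are hypothetical.

import numpy as np


def augment_state(state, t, horizon):
    """Append the normalised visit time t/H to the state vector."""
    return np.concatenate([np.asarray(state, dtype=np.float32),
                           np.array([t / horizon], dtype=np.float32)])


class SwitchingPolicy:
    """Dispatch each time step to the agent responsible for its sub-horizon."""

    def __init__(self, agents, boundaries, horizon):
        # boundaries: sorted switching times; e.g. [50, 120] splits a
        # 200-step episode into the sub-horizons [0,50), [50,120), [120,200).
        assert len(agents) == len(boundaries) + 1
        self.agents = agents
        self.boundaries = boundaries
        self.horizon = horizon

    def act(self, state, t):
        # Find the sub-horizon the current time step belongs to and query
        # the corresponding specialised agent on the time-augmented state.
        idx = int(np.searchsorted(self.boundaries, t, side="right"))
        return self.agents[idx].act(augment_state(state, t, self.horizon))


class RandomAgent:
    """Stand-in for a trained DeepRL agent (e.g. a DQN) exposing an act() method."""

    def __init__(self, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_actions = n_actions

    def act(self, augmented_state):
        return int(self.rng.integers(self.n_actions))


if __name__ == "__main__":
    policy = SwitchingPolicy(agents=[RandomAgent(2, s) for s in range(3)],
                             boundaries=[50, 120], horizon=200)
    state = np.zeros(4)               # e.g. a CartPole-like observation
    print(policy.act(state, t=10))    # served by the first specialised agent
    print(policy.act(state, t=130))   # served by the third specialised agent

In this sketch each specialised agent only ever sees states from its own sub-horizon, which mirrors the paper's motivation for splitting the episode rather than training a single agent on the full augmented state space.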
Bellman's principle; deep reinforcement learning; finite-horizon optimal control; model-free control
01 Journal publication::01a Journal article
Bellman's principle of optimality and deep reinforcement learning for time-varying tasks / Giuseppi, A.; Pietrabissa, A. - In: INTERNATIONAL JOURNAL OF CONTROL. - ISSN 0020-7179. - 95:9(2022), pp. 2448-2459. [10.1080/00207179.2021.1913516]
Files attached to this item

Giuseppi_preprint_Bellmans_2021.pdf
Access: open access
Note: https://doi.org/10.1080/00207179.2021.1913516
Type: Pre-print (manuscript submitted to the publisher, prior to peer review)
Licence: All rights reserved
Size: 530.59 kB
Format: Adobe PDF

Giuseppi_Bellman's_2021.pdf
Access: archive administrators only (contact the author)
Type: Publisher's version (published with the publisher's layout)
Licence: All rights reserved
Size: 2.13 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1545947
Citations
  • PMC: not available
  • Scopus: 5
  • Web of Science: 3