Giuseppi, A.; Pietrabissa, A. (2022). Bellman's principle of optimality and deep reinforcement learning for time-varying tasks. International Journal of Control, 95(9), pp. 2448-2459. ISSN 0020-7179. DOI: 10.1080/00207179.2021.1913516
Bellman's principle of optimality and deep reinforcement learning for time-varying tasks
Giuseppi A. (co-first author); Pietrabissa A. (co-first author)
2022
Abstract
This paper presents, to the best of the authors' knowledge, the first framework to address time-varying objectives in finite-horizon Deep Reinforcement Learning (DeepRL), based on a switching control solution grounded in Bellman's principle of optimality. By augmenting the state space of the system with information on its visit time, the DeepRL agent is able to solve problems in which its task changes dynamically within the same episode. To address the scalability problems caused by the state space augmentation, we propose a procedure that partitions the episode length into separate sub-problems, which are then solved by specialised DeepRL agents. Contrary to standard solutions, with the proposed approach the DeepRL agents correctly estimate the value function at each time step and are hence able to solve time-varying tasks. Numerical simulations validate the approach in a classic RL environment.
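The abstract describes two mechanisms: augmenting the state with the visit time, and partitioning the episode into sub-problems handled by specialised agents under a switching rule. The following is a minimal Python sketch of those two ideas only, not the authors' implementation; `TimeAugmentedEnv`, `run_episode`, the `boundaries` convention, and the `agents[k].act()` interface are all hypothetical, and a classic Gym-style `step()` returning `(obs, reward, done, info)` is assumed.

```python
# Hypothetical sketch of the two ideas in the abstract (not the paper's code).
import numpy as np

class TimeAugmentedEnv:
    """Wraps an episodic environment so observations also carry the
    normalized visit time, making time-varying tasks Markovian."""

    def __init__(self, env, horizon):
        self.env = env
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        obs = self.env.reset()
        return np.append(obs, self.t / self.horizon)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.t += 1
        done = done or self.t >= self.horizon
        return np.append(obs, self.t / self.horizon), reward, done, info


def run_episode(env, agents, boundaries):
    """Switching control: sub-problem k starts at time boundaries[k] and is
    handled by its own specialised agent, e.g. boundaries = [0, 50] for two
    agents over a 100-step horizon. Each agent exposes a (hypothetical)
    act(obs) method."""
    obs, total, t, done = env.reset(), 0.0, 0, False
    while not done:
        k = int(np.searchsorted(boundaries, t, side="right")) - 1
        obs, reward, done, _ = env.step(agents[k].act(obs))
        total += reward
        t += 1
    return total
```

Under these assumptions, each agent only ever sees time indices from its own partition, which is what keeps the augmented state space tractable for training.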
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Giuseppi_preprint_Bellmans_2021.pdf (note: https://doi.org/10.1080/00207179.2021.1913516) | Open access | Preprint (manuscript submitted to the publisher, prior to peer review) | All rights reserved | 530.59 kB | Adobe PDF |
| Giuseppi_Bellman's_2021.pdf | Archive administrators only (contact the author) | Publisher's version (published version with the publisher's layout) | All rights reserved | 2.13 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.