An optimal control approach to Reinforcement Learning / Pesare, Andrea. - (2022 Jan 25).

An optimal control approach to Reinforcement Learning

PESARE, ANDREA
25/01/2022

Abstract

Optimal control and Reinforcement Learning (RL) both deal with sequential decision-making problems, although they use different tools. In this thesis, we investigate the connection between these two research areas; our contributions are twofold. In the first part of the thesis, we present and study an optimal control problem with uncertain dynamics. As a modeling assumption, we suppose that the agent's knowledge of the system is represented by a probability distribution π on the space of possible dynamics functions. The goal is to minimize an average cost functional, where the average is computed with respect to the probability distribution π. This framework captures the behavior of a class of model-based RL algorithms, which build a probabilistic model of the dynamics (here represented by π) and then compute the control by minimizing the expectation of the cost functional with respect to π. In this context, we establish convergence results for the value function and the optimal control; these results constitute an important step in the convergence analysis of this class of RL algorithms. In the second part, we propose a new online algorithm for Linear-Quadratic Regulator (LQR) problems in which the state matrix A is unknown. Our algorithm builds an approximation of the dynamics and finds a suitable control at the same time, during a single simulation. It integrates RL and optimal control techniques: a probabilistic model is updated at each iteration using Bayesian linear regression formulas, and the control is obtained in feedback form by solving a Riccati differential equation. Numerical tests show that the algorithm efficiently drives the system to the origin despite not having full knowledge of the dynamics at the beginning of the simulation.
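To make the setting of the first part concrete, one plausible way to formalize the averaged problem (the notation below is illustrative and not necessarily the one adopted in the thesis) is

\[
\dot{x}(t) = f\bigl(x(t), u(t)\bigr), \quad t \in [0, T], \qquad f \sim \pi,
\]
\[
J_\pi(u) = \mathbb{E}_{f \sim \pi}\!\left[\int_0^T \ell\bigl(x_f^u(t), u(t)\bigr)\, dt + g\bigl(x_f^u(T)\bigr)\right],
\qquad v_\pi = \inf_{u} J_\pi(u),
\]

where x_f^u denotes the trajectory produced by the control u under the dynamics f, ℓ and g are running and terminal costs, and v_π is the value function whose convergence properties are studied in the first part.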
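The online LQR scheme of the second part can be sketched in a few lines of code. The snippet below is a minimal illustration, under assumptions not stated in the abstract (the input matrix B is known, the dynamics are discretized with a simple Euler step, and all matrices, hyperparameters and noise levels are invented for the example), of the two ingredients mentioned above: a Bayesian linear regression update of the unknown state matrix A and a feedback control obtained from a Riccati differential equation. It is a sketch of the idea, not the thesis algorithm.

import numpy as np

def posterior_mean_A(X, Y, noise_var=1e-2, prior_var=10.0):
    # Bayesian linear regression with an independent Gaussian prior on each row
    # of A: the posterior mean solves a regularized least-squares problem.
    # X: (N, n) visited states; Y: (N, n) observed drifts x_dot - B u.
    n = X.shape[1]
    lam = noise_var / prior_var
    S = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)   # shape (n, n)
    return S.T

def riccati_P(A, B, Q, R, QT, horizon, dt):
    # Integrate the matrix Riccati differential equation backward in time
    # (explicit Euler) and return P at the start of the remaining horizon.
    P = QT.copy()
    Rinv = np.linalg.inv(R)
    for _ in range(max(int(horizon / dt), 1)):
        P = P + dt * (A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)
    return P

# Illustrative closed-loop run: the controller never sees A_true directly.
rng = np.random.default_rng(0)
n, m, T, dt = 2, 1, 5.0, 0.01
A_true = np.array([[0.0, 1.0], [1.5, -0.5]])   # unknown state matrix
B = np.array([[0.0], [1.0]])                   # input matrix, assumed known
Q, R, QT = np.eye(n), 0.1 * np.eye(m), np.eye(n)

x = np.array([1.0, 0.0])
A_hat = np.zeros((n, n))                       # prior mean for A
states, drifts = [], []

for k in range(int(T / dt)):
    t = k * dt
    P = riccati_P(A_hat, B, Q, R, QT, T - t, dt)
    u = -np.linalg.solve(R, B.T @ P @ x)       # feedback u = -R^{-1} B^T P x
    x_dot = A_true @ x + B @ u + 0.05 * rng.standard_normal(n)
    states.append(x.copy())
    drifts.append(x_dot - B @ u)               # noisy measurement of A x
    x = x + dt * x_dot
    A_hat = posterior_mean_A(np.array(states), np.array(drifts))

print("final state:", x)
print("estimated A:\n", A_hat)

Each observed drift x_dot - B u is a noisy measurement of A x, so the posterior mean of A is a regularized least-squares estimate, and the feedback gain is recomputed from the Riccati equation on the remaining horizon as the estimate improves.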
Files attached to this item

Tesi_dottorato_Pesare.pdf (open access)
Type: Doctoral thesis
License: Creative Commons
Size: 2.49 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1634795