
Exploiting robot abstractions in episodic RL via reward shaping and heuristics

Cipollone, Roberto; Favorito, Marco; Maiorana, Flavio; De Giacomo, Giuseppe; Iocchi, Luca; Patrizi, Fabio
2025

Abstract

One major limitation to the applicability of Reinforcement Learning (RL) in many domains of practical relevance, in particular robotic applications, is the large number of samples required to learn an optimal policy. To address this problem and improve learning efficiency, we consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain. Each layer is an MDP representing a coarser model of the one immediately below it in the hierarchy. In this work, we propose novel techniques to automatically define Reward Shaping and Reward Heuristic functions that are based on the solution obtained at a higher level of abstraction and provide rewards to the finer (possibly the concrete) MDP at the lower level, thus inducing an exploration heuristic that can effectively guide the learning process in the more complex domain. In contrast with other works in Hierarchical RL, our technique imposes fewer requirements on the design of the abstract models and is tolerant to modeling errors, which makes the proposed approach practical. We formally analyze the relationship between the abstract models and the exploration heuristic induced in the lower-level domain, prove that the method guarantees convergence to an optimal policy, and finally demonstrate its effectiveness experimentally in several complex robotic domains.
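
For illustration only (not the paper's implementation), the reward-shaping idea described above can be sketched as standard potential-based shaping in which the potential of a concrete state is the value of its abstract counterpart; the names alpha (abstraction map) and v_abstract (value function solved on the abstract MDP) are assumptions introduced here for the example.

from typing import Callable, Dict, Hashable

def make_shaped_reward(
    v_abstract: Dict[Hashable, float],      # value function solved on the abstract MDP (assumed given)
    alpha: Callable[[Hashable], Hashable],  # abstraction map: concrete state -> abstract state
    gamma: float,                           # discount factor of the concrete MDP
) -> Callable[[Hashable, float, Hashable], float]:
    """Return a function that augments the concrete reward with the
    potential-based term gamma * Phi(s') - Phi(s), where Phi(s) = V_abstract(alpha(s))."""
    def phi(s: Hashable) -> float:
        # Potential of a concrete state = value of its abstract state (0 if unseen).
        return v_abstract.get(alpha(s), 0.0)

    def shaped(s: Hashable, r: float, s_next: Hashable) -> float:
        # Shaped reward observed by the low-level learner.
        return r + gamma * phi(s_next) - phi(s)

    return shaped

Because the shaping term is potential-based, it biases exploration toward states whose abstract counterparts are valuable without changing the set of optimal policies of the concrete MDP.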
Abstraction; Exploration heuristics; Hierarchical reinforcement learning; Planning; Reward shaping; Robotics
01 Journal publication::01a Journal article
Exploiting robot abstractions in episodic RL via reward shaping and heuristics / Cipollone, Roberto; Favorito, Marco; Maiorana, Flavio; De Giacomo, Giuseppe; Iocchi, Luca; Patrizi, Fabio. - In: ROBOTICS AND AUTONOMOUS SYSTEMS. - ISSN 0921-8890. - 193:(2025). [10.1016/j.robot.2025.105116]
Files attached to this product
File: Cipollone_Exploiting-robot_2025.pdf (open access)
Note: https://doi.org/10.1016/j.robot.2025.105116
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 3.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1748989
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 0