Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation

Roberto Capobianco (first); Francesco Riccio (second); Daniele Nardi (last)

Abstract

Task decomposition is effective in many applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictability and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce Hi-Val, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, Hi-Val simultaneously learns and exploits action values, which formalize constraints within the search space and reduce the dimensionality of the problem. We evaluate the algorithm both on a fetching task, using a simulated 7-DOF KUKA lightweight arm, and on a pick-and-delivery task with a Pioneer robot.
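The abstract refers to UCT search guided by learned action values. As a minimal illustrative sketch only, and assuming hypothetical names throughout (Node, uct_select, value_fn), the Python fragment below shows one common way a learned action-value estimate can warm-start UCB1 action selection; it is not the authors' implementation and omits the HTN decomposition entirely.

import math

class Node:
    """A search-tree node holding per-action visit counts and mean returns."""
    def __init__(self, state, actions):
        self.state = state
        self.actions = list(actions)
        self.visits = 0                           # total visits to this node
        self.q = {a: 0.0 for a in self.actions}   # running mean return per action
        self.n = {a: 0 for a in self.actions}     # visit count per action

def uct_select(node, value_fn, c=1.4):
    """Pick the action maximizing UCB1, warm-started by a learned value.

    value_fn(state, action) stands in for a learned action-value estimate;
    unvisited actions fall back on it, so the search tries actions that the
    learned values deem promising first (an 'optimistic' initialization).
    """
    best_action, best_score = None, -math.inf
    for a in node.actions:
        if node.n[a] == 0:
            score = value_fn(node.state, a) + c
        else:
            exploit = node.q[a]
            explore = c * math.sqrt(math.log(node.visits) / node.n[a])
            score = exploit + explore
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# Example usage with a placeholder value function:
# node = Node(state="pick_pose", actions=["approach", "grasp", "retract"])
# action = uct_select(node, value_fn=lambda s, a: 0.0)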
Year: 2019
Conference: 15th International Conference on Intelligent Autonomous Systems, IAS 2018
Keywords: Robot Planning; Robot Learning; Hierarchical Value Function Learning
Type: 04 Conference proceedings publication::04b Conference paper in volume
Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation / Capobianco, Roberto; Riccio, Francesco; Nardi, Daniele. - 867:(2019), pp. 414-427. (Paper presented at the 15th International Conference on Intelligent Autonomous Systems, IAS 2018, held in Baden-Baden, Germany) [10.1007/978-3-030-01370-7_33].
Files attached to this record

File: Capobianco_Preprint_HI-VAL_2019.pdf
Access: open access
Type: Pre-print (manuscript submitted to the publisher, prior to peer review)
License: Creative Commons
Size: 857.69 kB
Format: Adobe PDF

File: Capobianco_HI-VAL_2019.pdf
Access: repository administrators only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.13 MB
Format: Adobe PDF (contact the author)

File: Capobianco_Frontespizio-indice_HI-VAL_2019.pdf
Access: repository administrators only
Type: Other attached material
License: All rights reserved
Size: 193.98 kB
Format: Adobe PDF (contact the author)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1132271
Citations
  • PMC: not available
  • Scopus: 1
  • Web of Science: 0