Task decomposition is effective in manifold applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictabilities and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA light weight arm and, on a pick and delivery task with a Pioneer robot.
Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation / Capobianco, Roberto; Riccio, Francesco; Nardi, Daniele. - 867:(2019), pp. 414-427. (Intervento presentato al convegno 15th International Conference on Intelligent Autonomous Systems, IAS 2018 tenutosi a Baden-Baden; Germany) [10.1007/978-3-030-01370-7_33].
Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation
Roberto Capobianco
Primo
;Francesco RiccioSecondo
;Daniele NardiUltimo
2019
Abstract
Task decomposition is effective in manifold applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictabilities and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA light weight arm and, on a pick and delivery task with a Pioneer robot.File | Dimensione | Formato | |
---|---|---|---|
Capobianco_Preprint_HI-VAL_2019.pdf
accesso aperto
Tipologia:
Documento in Pre-print (manoscritto inviato all'editore, precedente alla peer review)
Licenza:
Creative commons
Dimensione
857.69 kB
Formato
Adobe PDF
|
857.69 kB | Adobe PDF | |
Capobianco_HI-VAL_2019.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.13 MB
Formato
Adobe PDF
|
1.13 MB | Adobe PDF | Contatta l'autore |
Capobianco_Frontespizio-indice_HI-VAL_2019.pdf
solo gestori archivio
Tipologia:
Altro materiale allegato
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
193.98 kB
Formato
Adobe PDF
|
193.98 kB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.