Policy gradient learning for a humanoid soccer robot

Cherubini, A.; Giannone, F.; Iocchi, Luca; Lombardo, M.; Oriolo, Giuseppe

doi:10.1016/j.robot.2009.03.006

In humanoid robotic soccer, many factors, at both the low level (e.g., vision and motion control) and the high level (e.g., behaviors and game strategies), determine the quality of robot performance. In particular, the speed of individual robots, the precision of their trajectories, and the stability of their walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially for the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is the time they consume: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account allows for better solutions when only a few experiments are available. The results of our experimental work show the effectiveness of the policy gradient learning method, as well as its higher convergence rate, when the relevance of parameters is taken into account during learning. © 2009 Elsevier B.V. All rights reserved.

Policy gradient learning for a humanoid soccer robot / Cherubini, A.; Giannone, F.; Iocchi, Luca; Lombardo, M.; Oriolo, Giuseppe. - In: ROBOTICS AND AUTONOMOUS SYSTEMS. - ISSN 0921-8890. - 57:8(2009), pp. 808-818. [10.1016/j.robot.2009.03.006]

	Publication year
	
			2009
		
	Keywords
	
			humanoid robotics; machine learning; motion control
		
	Type
	
			01 Journal publication::01a Journal article
		
	Belongs to type:
	
			01a Journal article

Files attached to this record

File	Size	Format
VE_2009_11573-227891.pdf (archive managers only). Type: Publisher's version (published version with the publisher's layout). License: All rights reserved	3.18 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/227891

Warning: the data shown have not been validated by the university.
