
Imitation learning over heterogeneous agents with restraining bolts / De Giacomo, Giuseppe; Iocchi, Luca; Favorito, Marco; Patrizi, Fabio. - Vol. 30 (2020), pp. 517-521. (Contribution presented at the International Conference on Automated Planning and Scheduling, held in Nancy, France).

Imitation learning over heterogeneous agents with restraining bolts

De Giacomo, Giuseppe; Iocchi, Luca; Favorito, Marco; Patrizi, Fabio
2020

Abstract

A common problem in Reinforcement Learning (RL) is that the reward function is hard to express. This can be overcome by resorting to Inverse Reinforcement Learning (IRL), which consists of first obtaining a reward function from a set of execution traces generated by the expert agent, and then having the learning agent learn the expert's behavior; this is known as Transfer Learning (TL). Typical IRL solutions rely on a numerical representation of the reward function, which raises problems related to the adopted optimization procedures. We describe a TL method in which the execution traces generated by the expert agent, possibly via planning, are used to produce a logical (as opposed to numerical) specification of the reward function, to be incorporated into a device known as a Restraining Bolt (RB). The RB can be attached to the learning agent to drive the learning process and ultimately make it imitate the expert. We show that TL can be applied to heterogeneous agents, with the expert, the learner, and the RB using different representations of the environment's actions and states, and without requiring mappings among their representations.
Year: 2020
Conference: International Conference on Automated Planning and Scheduling
Keywords: Restraining bolts; non-Markovian rewards; imitation learning
Type: 04 Publication in conference proceedings::04b Conference paper in volume
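To make the idea in the abstract concrete, the sketch below attaches an automaton-based reward monitor (a stand-in for a restraining bolt) to a tabular Q-learning agent: the automaton tracks a toy temporal goal over high-level fluents, and the learner is rewarded for driving it to its accepting state while learning over the product of environment state and automaton state. This is an illustrative sketch only, not the paper's implementation: in the paper the specification is given in LTLf/LDLf and compiled into an automaton, whereas here the DFA, the corridor environment, and the fluents "at_A"/"at_B" are hand-written hypothetical examples.

```python
# Minimal sketch of a restraining-bolt-style reward monitor attached to a
# Q-learning agent. Environment, fluents, and reward values are made up.
import random
from collections import defaultdict

# Reward automaton over high-level fluents: "visit A, then visit B".
# State 2 is accepting; reaching it yields the monitor's reward.
DFA = {(0, "at_A"): 1, (1, "at_B"): 2}

def bolt_step(q, fluents):
    """Advance the automaton on the observed fluents; reward on acceptance."""
    for f in fluents:
        q = DFA.get((q, f), q)
    return q, (1.0 if q == 2 else 0.0)

def env_step(pos, action):
    """Toy 1-D corridor: position 0 holds fluent at_A, position 4 holds at_B."""
    pos = max(0, min(4, pos + (1 if action == "right" else -1)))
    return pos, {0: {"at_A"}, 4: {"at_B"}}.get(pos, set())

# Tabular Q-learning over the product of environment state and automaton state.
ACTIONS = ["left", "right"]
Q = defaultdict(float)
random.seed(0)

for episode in range(2000):
    pos, q = 2, 0                      # start mid-corridor, monitor in initial state
    for _ in range(30):
        s = (pos, q)
        a = (random.choice(ACTIONS) if random.random() < 0.3
             else max(ACTIONS, key=lambda x: Q[(s, x)]))
        pos, fluents = env_step(pos, a)
        q, r = bolt_step(q, fluents)
        s2 = (pos, q)
        target = r if q == 2 else r + 0.9 * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += 0.1 * (target - Q[(s, a)])
        if q == 2:                     # temporal goal satisfied, episode ends
            break

# Greedy rollout after training: the agent should first reach A, then B.
pos, q = 2, 0
for _ in range(30):
    a = max(ACTIONS, key=lambda x: Q[((pos, q), x)])
    pos, fluents = env_step(pos, a)
    q, _ = bolt_step(q, fluents)
    if q == 2:
        break
print("temporal goal satisfied by greedy policy:", q == 2)
```

Because the learner only ever sees its own low-level state plus the automaton state, the monitor's fluents need not match the expert's representation, which is the heterogeneity point made in the abstract.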
Files attached to this record

DeGiacomo_Imitation-Learning_2020.pdf

Open access

Type: Publisher's version (published with the publisher's layout)
License: All rights reserved
Size: 582.53 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1434370
Citations
  • Scopus: 9