Reinforcement learning through global stochastic search in N-MDPs / Leonetti, Matteo; Iocchi, Luca; Ramamoorthy, Subramanian. - LNAI 6912, Part 2 (2011), pp. 326-340. (Paper presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2011, held in Athens, Greece) [10.1007/978-3-642-23783-6_21].

Reinforcement learning through global stochastic search in N-MDPs

Leonetti, Matteo; Iocchi, Luca; Ramamoorthy, Subramanian
2011

Abstract

Reinforcement Learning (RL) in fully or partially observable domains is usually sound only under a condition on the knowledge representation: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a way that a Markovian representation would be computationally very expensive. An alternative formulation of the decision problem relies on partially specified behaviors with choice points. While this reduces the size of the policy space that must be explored, which is crucial for realistic autonomous agents that must bound their search time, it renders the domain non-Markovian. In this paper, we present a novel algorithm for reinforcement learning in non-Markovian domains. Our algorithm, Stochastic Search Monte Carlo, performs a global stochastic search in policy space, shaping the distribution from which the next policy is sampled by estimating an upper bound on the value of each action. We show experimentally that, in domains that are challenging for RL, high-level decisions in non-Markovian processes can yield behavior at least as good as that learned by traditional algorithms, with significantly fewer samples. © 2011 Springer-Verlag.
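The abstract describes the mechanism only at a high level: complete policies over the choice points are sampled, evaluated by whole-episode Monte Carlo rollouts, and the sampling distribution is biased toward actions with high estimated upper bounds on value. The Python sketch below illustrates one way such a loop could be structured; it is a minimal interpretation under assumed details, not the published algorithm. The names (ssmc_sketch, rollout), the UCB-style bound with exploration constant c, and the proportional weighting are all assumptions introduced for illustration; see the paper via the DOI below for the actual method.

    import math
    import random
    from collections import defaultdict

    def ssmc_sketch(choice_points, actions, rollout, episodes=1000, c=1.0):
        """Monte Carlo policy search over the choice points of a partially
        specified behavior, with sampling shaped by optimistic value bounds.
        A hypothetical illustration, not the published SSMC algorithm."""
        value_sum = defaultdict(float)  # cumulative return per (choice point, action)
        count = defaultdict(int)        # visits per (choice point, action)
        total = 0                       # total episodes run so far

        def upper_bound(cp, a):
            # UCB-style optimistic estimate (an assumed form of the bound);
            # untried actions are treated as maximally optimistic.
            if count[(cp, a)] == 0:
                return float("inf")
            mean = value_sum[(cp, a)] / count[(cp, a)]
            return mean + c * math.sqrt(math.log(total) / count[(cp, a)])

        def sample_policy():
            # Shape the distribution over the next policy: at each choice
            # point, pick actions with probability proportional to bounds.
            policy = {}
            for cp in choice_points:
                bounds = [upper_bound(cp, a) for a in actions[cp]]
                untried = [a for a, b in zip(actions[cp], bounds) if math.isinf(b)]
                if untried:
                    policy[cp] = random.choice(untried)
                else:
                    lo = min(bounds)
                    weights = [b - lo + 1e-9 for b in bounds]
                    policy[cp] = random.choices(actions[cp], weights=weights)[0]
            return policy

        best_policy, best_return = None, float("-inf")
        for _ in range(episodes):
            policy = sample_policy()
            ret = rollout(policy)  # return of one whole episode under this policy
            total += 1
            # Whole-episode credit assignment: the return is credited to every
            # choice the policy made, not to individual states, which is why
            # the estimates do not rely on the Markov property.
            for cp in policy:
                value_sum[(cp, policy[cp])] += ret
                count[(cp, policy[cp])] += 1
            if ret > best_return:
                best_policy, best_return = policy, ret
        return best_policy, best_return

In a toy usage, choice_points could be ['cp1', 'cp2'] with actions = {'cp1': ['a', 'b'], 'cp2': ['x', 'y']} and rollout a simulator of the partially specified behavior; because the full episode return is credited to every choice the policy made, the estimates remain meaningful even when the process at the choice points is non-Markovian.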
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2011
reinforcement learning; decision problems; Markovian
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record

Leonetti_Reinforcement_2011.pdf
  Type: Publisher's version (published with the publisher's layout)
  License: All rights reserved
  Size: 396.39 kB
  Format: Adobe PDF
  Access: repository administrators only (contact the author)

VE_2011_11573-436436.pdf
  Type: Publisher's version (published with the publisher's layout)
  License: All rights reserved
  Size: 396.39 kB
  Format: Adobe PDF
  Access: repository administrators only (contact the author)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/436436
Warning: the displayed data have not been validated by the university.

Citations
  • PMC: N/A
  • Scopus: 1
  • Web of Science: 0