LTLf/LDLf Non-Markovian Rewards / Brafman, R.I.; De Giacomo, G.; Patrizi, F. - (2018), pp. 1771-1778. (32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA).

LTLf/LDLf Non-Markovian Rewards

Ronen Israel Brafman; Giuseppe De Giacomo; Fabio Patrizi

2018

Abstract

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
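To illustrate the compilation idea described in the abstract, here is a minimal sketch, assuming a hand-written three-state automaton for the behavior "provide coffee only following a request". It is not the construction from the paper, which derives automata from LTLf/LDLf formulas. The names `dfa_step`, `extended_step`, `rewards_along`, the proposition names `request` and `coffee`, and the state names are all hypothetical. The point it shows is that the reward depends on the history of the original process, yet becomes Markovian once the automaton state is paired with the MDP state.

```python
from typing import FrozenSet, Iterable, List, Tuple

Label = FrozenSet[str]  # propositions that hold at one step of the trace


def dfa_step(q: str, label: Label) -> str:
    """Hand-written (hypothetical) DFA with states 'idle', 'pending', 'served'."""
    if q == "pending" and "coffee" in label:
        return "served"   # coffee delivered while a request was outstanding
    if "request" in label:
        return "pending"  # a request is now outstanding
    if q == "pending":
        return "pending"  # still waiting for coffee
    return "idle"         # nothing outstanding (also the successor of 'served')


def extended_step(state: Tuple[str, str], next_s: str, label: Label,
                  reward: float = 1.0) -> Tuple[Tuple[str, str], float]:
    """One step of the extended model, whose states are (MDP state, DFA state).

    The reward depends only on the extended state reached, so it is Markovian
    in the extended model even though it depends on history in the original one.
    """
    _, q = state
    q_next = dfa_step(q, label)
    r = reward if q_next == "served" else 0.0
    return (next_s, q_next), r


def rewards_along(trace: Iterable[Tuple[str, Label]]) -> List[float]:
    """Replay a trace of (next MDP state, label) pairs and collect the rewards."""
    state = ("s0", "idle")  # assumed initial MDP state and DFA state
    out = []
    for next_s, label in trace:
        state, r = extended_step(state, next_s, label)
        out.append(r)
    return out


if __name__ == "__main__":
    trace = [
        ("s1", frozenset({"coffee"})),   # unrequested coffee: no reward
        ("s2", frozenset({"request"})),
        ("s3", frozenset({"coffee"})),   # coffee following a request: rewarded
    ]
    print(rewards_along(trace))
```

In this sketch, coffee delivered without an outstanding request earns nothing, while the same delivery after a request earns the reward, which is exactly the distinction a reward defined only on the last state and action cannot express.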

Year of publication: 2018
Conference name: 32nd AAAI Conference on Artificial Intelligence (AAAI-18)
Keywords: Artificial Intelligence; Linear Time Logic; Markov Decision Processes
Type: 04 Publication in conference proceedings::04b Conference paper in a volume
Files attached to this product

File	Access	Type	License	Size	Format
Brafman_Postprint_LTLf_2018.pdf	Open access	Post-print (version following peer review and accepted for publication)	All rights reserved	394.13 kB	Adobe PDF
Brafman_LTLf_2018.pdf	Archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	546.48 kB	Adobe PDF

Note (Brafman_Postprint_LTLf_2018.pdf): https://www.google.com/search?client=firefox-b-d&q=LTLf%2FLDLfNon-Markovian+Rewards

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1182865
