In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.

LTLf/LDLf Non-Markovian Rewards / Brafman, Ronen Israel; De Giacomo, Giuseppe; Patrizi, Fabio. - (2018), pp. 1771-1778. (Paper presented at the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), held in New Orleans, Louisiana, USA).

LTLf/LDLf Non-Markovian Rewards

Ronen Israel Brafman; Giuseppe De Giacomo; Fabio Patrizi
2018

32nd AAAI Conference on Artificial Intelligence (AAAI-18)
Artificial Intelligence; Linear Time Logic; Markov Decision Processes
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item

Brafman_Postprint_LTLf_2018.pdf (open access)
Note: https://www.google.com/search?client=firefox-b-d&q=LTLf%2FLDLfNon-Markovian+Rewards
Type: Post-print document (version following peer review and accepted for publication)
License: All rights reserved
Size: 394.13 kB
Format: Adobe PDF

Brafman_LTLf_2018.pdf (archive managers only)
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 546.48 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1182865
Citations
  • PMC: N/A
  • Scopus: 44
  • ISI (Web of Science): 14