Temporal logic monitoring rewards via transducers / De Giacomo, Giuseppe; Favorito, Marco; Iocchi, Luca; Patrizi, Fabio; Ronca, Alessandro. - (2020), pp. 860-870. (Paper presented at the International Conference on the Principles of Knowledge Representation and Reasoning, held in Rhodes, Greece) [10.24963/kr.2020/89].

Temporal logic monitoring rewards via transducers

De Giacomo, Giuseppe; Favorito, Marco; Iocchi, Luca; Patrizi, Fabio; Ronca, Alessandro
2020

Abstract

In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting when the considered domain is not naturally Markovian, but becomes so only after careful engineering of an extended state space. The extended states record information from the past that is sufficient to assign rewards by looking just at the last state and action. Non-Markovian Reward Decision Processes (NMRDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logics on finite traces such as LTLf/LDLf, with the great advantage of a higher level of abstraction and succinctness; they can then be automatically compiled into an MDP with an extended state space. We contribute to the techniques for handling temporal rewards and to the solutions for engineering them. We first present an approach to compiling temporal rewards which merges the formula automata into a single transducer, sometimes saving up to an exponential number of states. We then define monitoring rewards, which add a further level of abstraction to temporal rewards by adopting the four-valued conditions of runtime monitoring; we argue that our compilation technique allows for efficient handling of monitoring rewards. Finally, we discuss the application to reinforcement learning.
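For illustration only (not taken from the paper): the sketch below shows, under assumed interfaces, the general idea of compiling a non-Markovian reward away by tracking it with a finite-state automaton and extending each MDP state with the automaton state, so that the reward again depends only on the last (extended) state and action. All class and method names, the environment interface, and the reward convention are hypothetical; the paper's actual construction merges the automata of several temporal formulas into a single transducer.

```python
# Minimal illustrative sketch (assumed interfaces, not the paper's construction).

class RewardAutomaton:
    """A DFA over state labels; a reward is granted whenever an accepting
    automaton state is reached (a simplifying assumption for illustration)."""
    def __init__(self, initial, transitions, accepting, reward):
        self.initial = initial          # initial automaton state
        self.transitions = transitions  # dict: (aut_state, label) -> aut_state
        self.accepting = accepting      # set of accepting automaton states
        self.reward = reward            # reward granted on acceptance

    def step(self, aut_state, label):
        next_state = self.transitions.get((aut_state, label), aut_state)
        r = self.reward if next_state in self.accepting else 0.0
        return next_state, r


class ExtendedStateMDP:
    """Wraps an environment so that states become pairs (mdp_state, aut_state);
    the temporal reward then depends only on the last extended state and action."""
    def __init__(self, mdp, automaton, labeler):
        self.mdp = mdp          # assumed environment with reset() and step(action)
        self.aut = automaton
        self.labeler = labeler  # maps an MDP state to a propositional label

    def reset(self):
        s = self.mdp.reset()
        self.q = self.aut.initial
        return (s, self.q)

    def step(self, action):
        s, env_reward, done = self.mdp.step(action)
        self.q, temporal_reward = self.aut.step(self.q, self.labeler(s))
        return (s, self.q), env_reward + temporal_reward, done
```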
2020
International Conference on the Principles of Knowledge Representation and Reasoning
Symbolic reinforcement learning; general reasoning about actions and change; action languages
04 Publication in conference proceedings::04b Conference paper in a volume
Files attached to this record

DeGiacomo_Temporal_2020.pdf
Access: open access
Note: https://doi.org/10.24963/kr.2020/89
Type: Editorial version (published version with the publisher's layout)
License: All rights reserved
Size: 326.71 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1471641
Citations
  • PMC: not available
  • Scopus: 6
  • Web of Science (ISI): 4