
Concurrent Multiagent Reinforcement Learning with Reward Machines / Trapasso, Alessandro; Jonsson, Anders. - 413:(2025), pp. 3735-3742. (Paper presented at the 28th European Conference on Artificial Intelligence (ECAI 2025), held in Bologna, Italy) [10.3233/FAIA251253].

Concurrent Multiagent Reinforcement Learning with Reward Machines

Alessandro Trapasso; Anders Jonsson
2025

Abstract

Coordinating and synchronizing multiple agents in reinforcement learning (RL) presents significant challenges, particularly when concurrent actions and shared objectives are required. We propose a novel framework that integrates Reward Machines (RMs) with Partial-Order Planning (POP) to enhance coordination in multiagent reinforcement learning (MARL). By transforming high-level POP strategies into individual RMs for each agent, our approach explicitly captures action dependencies and concurrency requirements, enabling agents to learn and execute coordinated plans effectively in complex environments. We validate our approach in a grid-based multiagent domain in which agents have to synchronize actions such as jointly accessing limited pathways or collaboratively manipulating objects. The explicit representation of action dependencies and synchronization points in RMs provides a scalable and flexible mechanism to model concurrent actions, enabling agents to focus on relevant tasks and reducing exploration.
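To make the abstract's central idea concrete: a reward machine is a finite-state machine whose transitions fire on high-level events and emit rewards, so ordering and synchronization constraints from a partial-order plan can be encoded as states that unlock only after the required joint events occur. The sketch below is purely illustrative, with hypothetical state and event names; it is not the authors' implementation.

```python
# Illustrative reward machine (RM) sketch: a finite-state machine whose
# transitions are triggered by high-level events and emit a reward.
# All names (states "u0".."u2", events "both_at_door", "door_opened")
# are hypothetical, chosen to mirror the door-synchronization example
# in the abstract; they are not taken from the paper.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions maps (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance the RM on an observed event; unknown events self-loop."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

# Two agents must jointly reach a door (a synchronization point encoded
# as the "both_at_door" event) before opening it yields any reward.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "both_at_door"): ("u1", 0.0),  # synchronization point
        ("u1", "door_opened"): ("u2", 1.0),   # task complete
    },
)
print(rm.step("door_opened"))   # 0.0 — opening before syncing gives nothing
print(rm.step("both_at_door"))  # 0.0 — sync reached, RM advances to u1
print(rm.step("door_opened"))   # 1.0 — ordering constraint satisfied
```

Because each agent carries its own RM, the machine's current state acts as extra observable memory during learning, letting standard RL methods condition on which plan steps are already satisfied.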
28th European Conference on Artificial Intelligence (ECAI 2025)
reinforcement learning; automata; multi-agent reinforcement learning; reward machines; concurrent actions; partial-order planning; multi-agent planning; planning; concurrent planning; coordination
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
No files are associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/11573/1756085
Note: the data displayed have not been validated by the university.
