An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents / Monaci, Marta; Agasucci, Valerio; Grani, Giorgio. - In: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH. - ISSN 0377-2217. - 312:3(2024), pp. 910-926. [10.1016/j.ejor.2023.07.037]
An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents
Marta Monaci; Valerio Agasucci; Giorgio Grani
2024
Abstract
In this work, we present a method that applies Deep Reinforcement Learning, an approximate dynamic programming procedure using deep neural networks, to the job shop scheduling problem (JSSP). The aim is to show that a greedy-like heuristic trained on a subset of problems can effectively generalize, to some extent, to unseen instances, and be competitive with other methods. We model the JSSP as a Markov Decision Process and exploit the efficacy of reinforcement learning to solve the problem. We adopt an actor-critic scheme based on policy gradients, specifically the Proximal Policy Gradient method, where the action taken by the agent is influenced by policy considerations on the state-value function. The procedure takes into account the challenging nature of the JSSP, where the state and the action space change for every instance and after each decision. To tackle this variability, we introduce a novel model based on two incident Long Short-Term Memory networks, followed by an encoding model with a different structure for the actor and the critic. Experiments show that the algorithm reaches good solutions in a short time, proving that it is possible to generate new greedy heuristics purely from learning-based methodologies. We compared our algorithms against several established heuristics, an adaptive method, a commercial solver based on branch and cut, and another approach based on Deep Reinforcement Learning, proving the validity of the proposed method in terms of time and makespan. The model can generalize, to some extent, to larger problems originating from a different distribution.
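As a rough illustration of the MDP view the abstract describes, the sketch below simulates JSSP dispatching: at each step the agent picks one job whose next operation to schedule, so the action space shrinks as jobs finish. The function name, the state layout, and the default SPT (shortest processing time) dispatching rule are illustrative stand-ins for the paper's learned policy, not the authors' implementation.

```python
# Hypothetical sketch: JSSP dispatching as a sequential decision process.
# A trained actor network would replace the simple SPT rule used here.

def jssp_greedy_makespan(jobs, rule=None):
    """jobs[j] is a list of (machine, duration) operations in fixed order."""
    # Default greedy rule: shortest processing time of the job's next operation.
    rule = rule or (lambda state, j: jobs[j][state["next_op"][j]][1])
    state = {
        "next_op": [0] * len(jobs),      # index of each job's next operation
        "job_ready": [0.0] * len(jobs),  # time at which each job becomes free
        "mach_ready": {},                # time at which each machine becomes free
    }
    makespan = 0.0
    while True:
        # Current action space: jobs that still have operations left.
        actions = [j for j in range(len(jobs))
                   if state["next_op"][j] < len(jobs[j])]
        if not actions:
            return makespan
        j = min(actions, key=lambda a: rule(state, a))  # greedy action choice
        machine, duration = jobs[j][state["next_op"][j]]
        start = max(state["job_ready"][j], state["mach_ready"].get(machine, 0.0))
        finish = start + duration
        state["job_ready"][j] = finish
        state["mach_ready"][machine] = finish
        state["next_op"][j] += 1
        makespan = max(makespan, finish)
```

Swapping the `rule` argument for a neural policy's scores turns this loop into the environment of an actor-critic agent; the state dictionary plays the role of the observation that the paper's recurrent encoders would consume.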
File: Monaci_An-actor-critic_2024.pdf (open access)
Note: https://doi.org/10.1016/j.ejor.2023.07.037
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 1.77 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.