Applying Artificial Intelligence principles is the next research challenge in supporting future network applications in the upcoming 6G era. In this work, we propose a novel approach that exploits the principles of Reinforcement Learning (RL) and the availability of programmable switches to implement a new forwarding mechanism in the data plane of the 6G core network. In more detail, we define a Q-learning-based forwarding mechanism that acts at the packet level and selects the minimum-latency path at line rate. Our solution, referred to as Q-Learning-based Queue Length Routing in DAta Plane ((QL)2-RODAP), is fully decentralized and exploits in-band network telemetry to distribute network state among the network nodes. We show that, in both random and real network topologies, the (QL)2-RODAP algorithm reacts promptly to sudden traffic bursts and reduces the peak queuing delay by about 65-85% with respect to other RL-based approaches, thus cutting off the long tail of end-to-end latency that is critical for delay-sensitive applications.
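
To make the idea of per-packet, Q-learning-based next-hop selection concrete, the following is a minimal Python sketch of a classic Q-routing-style table: each node keeps, per destination, a delay estimate for every neighbor and forwards to the neighbor with the lowest estimate. It is not the authors' (QL)2-RODAP implementation, which runs in the switch data plane. The class `QRouter`, its method names, the learning rate `alpha`, and the way the neighbor's estimate is passed in (standing in for a value carried by in-band telemetry) are all assumptions made for illustration.

```python
from collections import defaultdict


class QRouter:
    """Hypothetical per-node table: q[dest][next_hop] = estimated delay to dest."""

    def __init__(self, neighbors, alpha=0.5, init=0.0):
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value, not from the paper)
        # Lazily create one row of estimates per destination.
        self.q = defaultdict(lambda: {n: init for n in self.neighbors})

    def best_estimate(self, dest):
        """Lowest estimated delay to dest over all neighbors."""
        return min(self.q[dest].values())

    def next_hop(self, dest):
        """Greedy choice: the neighbor with the lowest estimated delay."""
        row = self.q[dest]
        return min(self.neighbors, key=lambda n: row[n])

    def update(self, dest, via, queue_delay, neighbor_estimate):
        """Move the estimate for (dest, via) toward the observed target.

        queue_delay: delay observed on the link/queue toward `via`.
        neighbor_estimate: `via`'s own best estimate to dest (assumed to be
        reported back, e.g. through telemetry).
        """
        target = queue_delay + neighbor_estimate
        row = self.q[dest]
        row[via] += self.alpha * (target - row[via])
        return row[via]


router = QRouter(["B", "C"], alpha=0.5)
router.update("D", via="B", queue_delay=4.0, neighbor_estimate=2.0)
router.update("D", via="C", queue_delay=10.0, neighbor_estimate=0.0)
chosen = router.next_hop("D")  # neighbor with the smaller estimate
```

In this toy run the path through B ends up with the lower estimate, so packets for D would be forwarded to B until a later update raises that estimate, for example after a queue builds up during a traffic burst.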

In-network Q-learning-based packet forwarding for delay sensitive applications / Polverini, M.; Cianfrani, A.; Listanti, M.; Caiazzi, T.; Scazzariello, M.. - In: IEEE NETWORK. - ISSN 0890-8044. - 39:3(2025), pp. 127-133. [10.1109/MNET.2025.3552929]

In-network Q-learning-based packet forwarding for delay sensitive applications

Polverini M.; Listanti M.
2025

in-band network telemetry; in-network computing; low latency communications; network programmability
01 Journal publication::01a Journal article
Files attached to this item
File: Polverini_In-network_2025.pdf
Access: archive administrators only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 496.9 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1767928