In-network Q-learning-based packet forwarding for delay sensitive applications / Polverini, M.; Cianfrani, A.; Listanti, M.; Caiazzi, T.; Scazzariello, M. - In: IEEE NETWORK. - ISSN 0890-8044. - 39:3(2025), pp. 127-133. [10.1109/MNET.2025.3552929]
In-network Q-learning-based packet forwarding for delay sensitive applications
Polverini M.; Cianfrani A.; Listanti M.; Caiazzi T.; Scazzariello M.
2025
Abstract
The use of Artificial Intelligence principles is the next research challenge in supporting future network applications in the upcoming 6G era. In this work, we propose a novel approach: exploiting the principles of Reinforcement Learning (RL) and the availability of programmable switches to implement a new forwarding mechanism in the data plane of the 6G core network. More specifically, we define a Q-learning-based forwarding mechanism that acts at the packet level and selects the minimum-latency path at line rate. Our solution, referred to as Q-Learning-based Queue Length Routing in DAta Plane ((QL)2-RODAP), is fully decentralized and exploits in-band network telemetry to distribute network state among network nodes. We show that, in both random and real network topologies, (QL)2-RODAP promptly reacts to sudden traffic bursts and reduces peak queuing delays by about 65-85% with respect to other RL-based approaches, thus cutting off the long tail of end-to-end latency that is critical for delay-sensitive applications.
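The abstract only summarizes the forwarding logic. As a rough illustration of the general family of techniques, the sketch below implements the classic Q-routing scheme of Boyan and Littman: each node keeps, per destination and per neighbor, an estimate of the remaining delay, forwards greedily to the neighbor with the lowest estimate, and refines that estimate using the local queuing delay plus the best estimate advertised by the neighbor (here standing in for values carried by in-band telemetry). This is an assumption, not the (QL)2-RODAP update rule: the learning rate, the queue-length-to-delay conversion and every name (`QRoutingNode`, `update`, `choose_next_hop`, `queue_delay`) are hypothetical, and the sketch ignores the constraints of a programmable-switch pipeline and how telemetry headers are encoded.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class QRoutingNode:
    """Per-node Q-table for delay-based next-hop selection (classic Q-routing).

    q[dest][neighbor] estimates the remaining delay (arbitrary time units)
    to reach `dest` if the packet is forwarded to `neighbor`; lower is better.
    This is a hypothetical illustration, not the (QL)2-RODAP algorithm.
    """

    name: str
    neighbors: Tuple[str, ...]
    alpha: float = 0.5  # learning rate (hypothetical value)
    # Optimistic initial estimate: unexplored neighbors look attractive,
    # so each one tends to be tried at least once.
    initial_estimate: float = 0.0
    q: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def _row(self, dest: str) -> Dict[str, float]:
        # Lazily create the row for a destination, one entry per neighbor.
        if dest not in self.q:
            self.q[dest] = {n: self.initial_estimate for n in self.neighbors}
        return self.q[dest]

    def choose_next_hop(self, dest: str) -> str:
        # Greedy forwarding: the neighbor with the lowest estimated delay
        # (ties go to the first neighbor listed in `neighbors`).
        row = self._row(dest)
        return min(row, key=row.get)

    def best_estimate(self, dest: str) -> float:
        # Value this node would advertise upstream (e.g. carried in telemetry).
        if dest == self.name:
            return 0.0
        return min(self._row(dest).values())

    def update(self, dest: str, neighbor: str,
               local_delay: float, neighbor_estimate: float) -> float:
        # Q(d, y) <- Q(d, y) + alpha * (local_delay + min_z Q_y(d, z) - Q(d, y))
        # local_delay: delay spent at this node toward `neighbor`, assumed
        #   here to be derived from the egress queue length.
        # neighbor_estimate: the neighbor's advertised best_estimate(dest).
        row = self._row(dest)
        target = local_delay + neighbor_estimate
        row[neighbor] += self.alpha * (target - row[neighbor])
        return row[neighbor]


def queue_delay(queue_len_pkts: int, service_time: float) -> float:
    # Hypothetical conversion: packets ahead in the queue x per-packet service time.
    return queue_len_pkts * service_time


if __name__ == "__main__":
    a = QRoutingNode("A", neighbors=("B", "C"))
    # Telemetry from B and C about destination D.
    a.update("D", "B", queue_delay(5, 0.1), neighbor_estimate=2.0)
    a.update("D", "C", queue_delay(1, 0.1), neighbor_estimate=1.0)
    print(a.choose_next_hop("D"))  # C
    # A burst builds up 40 packets on the egress queue toward C.
    a.update("D", "C", queue_delay(40, 0.1), neighbor_estimate=1.0)
    print(a.choose_next_hop("D"))  # B
```

In this toy run a single congested sample is enough to move traffic from C to B. Because each update moves the estimate only a fraction `alpha` toward the new sample, a smaller `alpha` would react to bursts more slowly but oscillate less; the value used here is purely illustrative.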