DQN-Routing: a novel adaptive routing algorithm for torus networks based on deep reinforcement learning

Lonardo, Alessandro

Torus networks are widely adopted as custom interconnects in High-Performance Computing (HPC) systems because of several attractive features: a regular physical arrangement, short cabling at low dimensions, good path diversity, and good performance for the rather wide class of workloads characterized by local communication patterns. One of their main disadvantages is a larger diameter than other network topologies, which increases communication latency at large sizes. Using a relatively small torus network as the lowest tier of a multi-tiered hybrid interconnect makes it possible to exploit the advantages of this class of networks while circumventing their inherent limitations, as demonstrated by recent works. A large number of routing algorithms for this class of networks have been proposed over the years, ranging from deterministic to fully adaptive ones, with the aim of improving fault tolerance and performance, especially under non-uniform traffic conditions. This thesis describes DQN-Routing: a novel, distributed, unicast, fully adaptive, non-minimal routing algorithm for torus networks. The idea behind the algorithm is to leverage the ever-increasing availability of ubiquitous computing power to delegate the routing decision to an agent trained by reinforcement learning. The agent is implemented, following the Deep Reinforcement Learning approach, as a convolutional neural network trained with a variant of Q-learning (DQN). Its inputs are the states of the local and first-neighbour routers together with the packet source and destination coordinates; its outputs are the value functions estimating future rewards for all possible routing actions. Each agent calculates the reward corresponding to its routing action using the receive timestamp contained in the acknowledgement message sent along the reverse path from the destination to the source node.
These rewards guide the training process toward better performance, i.e. toward routing actions that minimize the communication latency of the routed packets given the experienced network state. In our experimental setup, the routing problem is represented as an independent multi-agent reinforcement learning problem, where the environment is provided by the OMNeT++ discrete event simulator framework modeling a torus network under different traffic conditions. The reference network architectures for our investigation are APEnet and its latest incarnation, the ExaNet multi-tier hybrid network dedicated to HPC. In this context, we focused on the configuration with sixteen nodes in the sub-torus tiers, arranged as a 4x4 two-dimensional torus, which allowed us to simulate the network effectively on a single, albeit powerful, GPU-accelerated workstation. We compare the performance of DQN-Routing measured on this experimental setup under different traffic conditions with that obtained by state-of-the-art routing algorithms, using traffic patterns generated both synthetically and by our reference application, the Distributed Polychronous Spiking Neural Network simulator.
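
The pieces named in the abstract (wrap-around torus links, a latency-based reward, and a Q-learning target) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the reward shape (negative latency), the action names, the discount factor, and every function name below are assumptions made for illustration, and the convolutional network that would produce the Q-values is omitted.

```python
import random

SIZE = 4  # 4x4 two-dimensional torus, the configuration named in the abstract

# Hypothetical action names: one output port per direction.
ACTIONS = {"X+": (1, 0), "X-": (-1, 0), "Y+": (0, 1), "Y-": (0, -1)}


def neighbour(node, action):
    """Coordinates of the next hop, following wrap-around links."""
    dx, dy = ACTIONS[action]
    return ((node[0] + dx) % SIZE, (node[1] + dy) % SIZE)


def torus_distance(a, b):
    """Minimal hop count between two nodes of the torus."""
    return sum(min((p - q) % SIZE, (q - p) % SIZE) for p, q in zip(a, b))


def reward(send_ts, receive_ts):
    """Assumed reward shape: the negative of the measured packet latency.

    receive_ts stands for the receive timestamp carried back by the
    acknowledgement message; send_ts is a hypothetical local send time.
    """
    return -(receive_ts - send_ts)


def td_target(r, next_q_values, gamma=0.99, terminal=False):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    return r if terminal else r + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)
```

Under these assumptions, a shorter measured latency yields a larger (less negative) reward, so maximizing the estimated return pushes each agent toward lower-latency routing actions. Because the algorithm is non-minimal, an agent may pick a hop that does not reduce `torus_distance` to the destination, for instance to route around a congested link.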

DQN-Routing: a novel adaptive routing algorithm for torus networks based on deep reinforcement learning / Lonardo, Alessandro. - (2020 Feb 18).