SPARQ: An Optimization Framework for the Distribution of AI-Intensive Applications Under Non-Linear Delay Constraints

Spadaccino, Pietro; Di Lorenzo, Paolo; Barbarossa, Sergio; Tulino, Antonia Maria; Llorca, Jaime
2026

Abstract

Next-generation real-time compute-intensive applications, such as extended reality, multi-user gaming, and autonomous transportation, are increasingly composed of heterogeneous AI-intensive functions with diverse resource requirements and stringent latency constraints. While recent advances have enabled efficient algorithms for joint service placement, routing, and resource allocation for increasingly complex applications, current models fail to capture the non-linear relationship between delay and resource usage that becomes especially relevant in AI-intensive workloads. In this paper, we extend the cloud network flow optimization framework to support queueing-delay-aware orchestration of distributed AI applications over edge-cloud infrastructures. We introduce two execution models, Guaranteed-Resource (GR) and Shared-Resource (SR), that more accurately capture how computation and communication delays emerge from system-level resource constraints. These models incorporate M/M/1 and M/G/1 queue dynamics to represent dedicated and shared resource usage, respectively. The resulting optimization problem is non-convex due to the non-linear delay terms. To overcome this, we develop SPARQ, an iterative approximation algorithm that decomposes the problem into two convex sub-problems, enabling joint optimization of service placement, routing, and resource allocation under non-linear delay constraints. The modeling approach is validated against real-world data. Simulation results demonstrate that SPARQ not only offers a more faithful representation of system delays, but also substantially improves resource efficiency and the overall cost-delay tradeoff compared to state-of-the-art methods.
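The non-linear relationship between delay and resource usage mentioned in the abstract can be illustrated with the textbook mean-delay formulas for the two queue types it names. The sketch below is only an illustration under stated assumptions: it uses the standard M/M/1 sojourn time and the Pollaczek-Khinchine formula for M/G/1, and the function names, parameters, and example rates are hypothetical. It is not the delay model or the SPARQ algorithm from the paper.

```python
import math


def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue.

    Textbook result: T = 1 / (mu - lambda), valid only when lambda < mu.
    Returns infinity when the queue is unstable.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (service rate positive)")
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def mg1_mean_delay(arrival_rate: float, mean_service: float,
                   second_moment_service: float) -> float:
    """Mean time in system for an M/G/1 queue (Pollaczek-Khinchine).

    T = E[S] + lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S].
    Returns infinity when rho >= 1.
    """
    if arrival_rate < 0 or mean_service <= 0 or second_moment_service < 0:
        raise ValueError("invalid queue parameters")
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        return math.inf
    waiting = arrival_rate * second_moment_service / (2.0 * (1.0 - rho))
    return mean_service + waiting


if __name__ == "__main__":
    # Hypothetical numbers: a resource serving 10 jobs/s at increasing load.
    mu = 10.0
    for lam in (5.0, 9.0, 9.9):
        print(f"load={lam / mu:.2f}  M/M/1 delay={mm1_mean_delay(lam, mu):.3f} s")
```

As the load approaches capacity the delay grows without bound rather than linearly, which is why delay terms of this kind are not captured by linear models. With exponentially distributed service times (E[S] = 1/mu, E[S^2] = 2/mu^2) the M/G/1 expression reduces to the M/M/1 one.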
Edge computing; service function chain; service graph; service placement; resource allocation; cloud network flow
01 Journal publication::01a Journal article
SPARQ: An Optimization Framework for the Distribution of AI-Intensive Applications Under Non-Linear Delay Constraints / Spadaccino, Pietro; Di Lorenzo, Paolo; Barbarossa, Sergio; Tulino, Antonia Maria; Llorca, Jaime. - In: IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT. - ISSN 1932-4537. - 23:(2026), pp. 3479-3496. [10.1109/tnsm.2026.3673194]

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1762456