Adaptive Semantic Token Selection for AI-native Goal-oriented Communications / Devoto, Alessio; Petruzzi, Simone; Pomponi, Jary; Di Lorenzo, Paolo; Scardapane, Simone. - (2024). (IEEE Globecom 2024, Cape Town).
Adaptive Semantic Token Selection for AI-native Goal-oriented Communications
Devoto, Alessio; Petruzzi, Simone; Pomponi, Jary; Di Lorenzo, Paolo; Scardapane, Simone
2024
Abstract
In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on communication load and computation. Transformers have become the standard architecture for pre-training large-scale vision and text models, and preliminary results have shown promising performance in deep joint source-channel coding (JSCC) as well. Here, we consider a dynamic model in which communication happens over a channel with variable constraints. Leveraging recent work on conditional computation, we exploit the structure of the transformer blocks and the multi-head attention operator to design a trainable semantic token selection mechanism that learns to select relevant tokens (e.g., image patches) from the input signal. This is done dynamically, on a per-input basis, with a rate that can be chosen as an additional input by the user. We show that our model improves over state-of-the-art token selection mechanisms, exhibiting high accuracy across a wide range of latency and communication bottleneck constraints, without the need to deploy multiple architectures tailored to each constraint. Additionally, the proposed token selection mechanism extracts powerful semantics that are easy to understand and explain, paving the way for interpretable-by-design models for the next generation of AI-native communication systems.
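The abstract describes transmitting only a user-chosen fraction of the input tokens, selected per input by a learned relevance gate. As a rough illustrative sketch of that idea only — the scoring gate, function name, and toy scores below are hypothetical, not the paper's trained mechanism — rate-controlled token selection might look like:

```python
import math

def select_tokens(tokens, scores, rate):
    """Keep the ceil(rate * N) highest-scoring tokens (illustrative sketch).

    tokens: list of token payloads (e.g., image-patch embeddings)
    scores: per-token relevance scores from a (hypothetical) learned gate
    rate:   user-chosen fraction of tokens to transmit, in (0, 1]
    """
    assert len(tokens) == len(scores) and 0 < rate <= 1
    k = math.ceil(rate * len(tokens))
    # Rank token indices by descending score, keep the top k, then
    # restore the original order so positional information survives.
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in keep]

# Example: 8 "tokens" with toy relevance scores, transmitting 50% of them.
tokens = list("ABCDEFGH")
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
print(select_tokens(tokens, scores, 0.5))  # -> ['A', 'C', 'E', 'G']
```

In the paper's setting the scores would come from a trainable module inside the transformer and the rate would act as a conditioning input to the network, so that a single architecture covers the whole range of communication constraints.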


