
Beyond Position: the emergence of wavelet-like properties in Transformers / Ruscio, Valeria; Nanni, Umberto; Silvestri, Fabrizio. - (2025), pp. 6074-6088. (Paper presented at the 63rd Annual Meeting of the Association for Computational Linguistics, held in Vienna, Austria) [10.48448/nmnx-th58].

Beyond Position: the emergence of wavelet-like properties in Transformers

Valeria Ruscio; Umberto Nanni; Fabrizio Silvestri
2025

Abstract

This paper studies how Transformer models with Rotary Position Embeddings (RoPE) develop emergent, wavelet-like properties that compensate for the positional encoding’s theoretical limitations. Through an analysis spanning model scales, architectures, and training checkpoints, we show that attention heads evolve to implement multi-resolution processing analogous to wavelet transforms. We demonstrate that this scale-invariant behavior is unique to RoPE, emerges through distinct evolutionary phases during training, and statistically adheres to the fundamental uncertainty principle. Our findings suggest that the effectiveness of modern Transformers stems from their remarkable ability to spontaneously develop optimal, multi-resolution decompositions to address inherent architectural constraints.
63rd Annual Meeting of the Association for Computational Linguistics
transformers, transformer architectures, rotary position embeddings
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1749255
Warning: the data displayed here have not been validated by the university.
