
Thermodynamic Binding: Freezing Chimeric States in Multi-Modal Associative Memories

Elena Agliari; Adriano Barra; Andrea Ladiana; Andrea Lepre
2026

Abstract

Multi-modal inference requires heterogeneous perceptual streams to converge to a single, internally consistent interpretation. Standard cross-attention does not enforce this consistency: each modality maintains an independent posterior over a shared memory bank, which admits chimeric states, namely stable configurations in which different modalities retrieve different prototypes from the shared memory. We introduce the Multi-modal Transformer Associative Memory (mTAM), an energy-based architecture that precludes chimeric states by construction. Its core mechanism, Consensus Split-Bank Attention (CSA), aggregates query-key evidence across modalities into a single global score, produces one shared distribution over memory, and broadcasts it synchronously to every modality. The resulting dynamics correspond to the Concave-Convex Procedure applied to a Difference-of-Convex energy, which guarantees monotonic descent and convergence of each trajectory to a stationary point. A graph-lifting construction maps the model to a Modern Hopfield Network and yields a topology-dependent critical load through an extreme-value capacity analysis in the spirit of the Random Energy Model. Synthetic experiments show retrieval transitions, one-step chimera resolution where standard baselines fail, and topology-dependent capacity scaling consistent with the theory.
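The abstract's core mechanism, Consensus Split-Bank Attention (CSA), can be illustrated with a minimal sketch. This is a hypothetical reconstruction from the abstract's description only, not the authors' implementation: all names (`consensus_split_bank_attention`, the per-modality key/value banks, `beta`) are assumptions. The key point it demonstrates is that per-modality query-key scores are summed into one global score per memory slot, a single softmax is taken over that score, and the resulting distribution is broadcast to every modality, so no two modalities can retrieve different prototypes.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def consensus_split_bank_attention(queries, keys, values, beta=1.0):
    """Hypothetical sketch of Consensus Split-Bank Attention (CSA).

    queries: dict modality -> (d,)   query vector
    keys:    dict modality -> (M, d) per-modality key bank over M memories
    values:  dict modality -> (M, d) per-modality value bank

    Evidence from every modality is aggregated into ONE global score
    per memory slot; the single softmax over that score is broadcast
    synchronously, so all modalities read memory with the same weights.
    """
    modalities = list(queries)
    M = next(iter(keys.values())).shape[0]
    global_score = np.zeros(M)
    for m in modalities:
        global_score += keys[m] @ queries[m]   # cross-modal aggregation
    p = softmax(beta * global_score)           # one shared distribution
    # synchronous broadcast: every modality retrieves with the same p,
    # ruling out chimeric states by construction
    return {m: p @ values[m] for m in modalities}, p
```

By contrast, standard cross-attention would compute a separate softmax per modality (one per key bank), which is exactly what admits the chimeric fixed points the paper describes.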
International Conference on Learning Representations (ICLR 2026) – Workshop “New Frontiers in Associative Memories (NFAM 2026)”
associative memory; multi-modal learning; binding problem; energy-based models; transformer attention
04 Conference proceedings contribution::04b Conference paper in volume
Thermodynamic Binding: Freezing Chimeric States in Multi-Modal Associative Memories / Agliari, Elena; Barra, Adriano; Ladiana, Andrea; Lepre, Andrea. - (2026). (International Conference on Learning Representations (ICLR 2026) – Workshop “New Frontiers in Associative Memories (NFAM 2026)”, Rio de Janeiro, Brazil).
Files attached to this item
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1766723
Warning! The data shown have not been validated by the university.
