Agliari, Elena; Barra, Adriano; Ladiana, Andrea; Lepre, Andrea (2026). Thermodynamic Binding: Freezing Chimeric States in Multi-Modal Associative Memories. In: International Conference on Learning Representations (ICLR 2026), Workshop “New Frontiers in Associative Memories” (NFAM 2026), Rio de Janeiro, Brazil.
Thermodynamic Binding: Freezing Chimeric States in Multi-Modal Associative Memories
Elena Agliari; Adriano Barra; Andrea Ladiana; Andrea Lepre
2026
Abstract
Multi-modal inference requires heterogeneous perceptual streams to converge to a single, internally consistent interpretation. Standard cross-attention does not enforce this consistency: each modality maintains an independent posterior over a shared memory bank, which admits chimeric states, namely stable configurations in which different modalities retrieve different prototypes from the shared memory. We introduce the Multi-modal Transformer Associative Memory (mTAM), an energy-based architecture that precludes chimeric states by construction. Its core mechanism, Consensus Split-Bank Attention (CSA), aggregates query-key evidence across modalities into a single global score, produces one shared distribution over memory, and broadcasts it synchronously to every modality. The resulting dynamics correspond to the Concave-Convex Procedure applied to a Difference-of-Convex energy, which guarantees monotonic descent and convergence of each trajectory to a stationary point. A graph-lifting construction maps the model to a Modern Hopfield Network and yields a topology-dependent critical load through an extreme-value capacity analysis in the spirit of the Random Energy Model. Synthetic experiments show retrieval transitions, one-step chimera resolution where standard baselines fail, and topology-dependent capacity scaling consistent with the theory.
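
For readers who want a concrete picture of the mechanism described above, the following is a minimal sketch of Consensus Split-Bank Attention as the abstract describes it: per-modality query-key scores against a shared bank of memory slots are summed into one global logit per slot, a single softmax yields one shared distribution, and every modality reads out with that same distribution. All function names, shapes, and the additive aggregation rule are illustrative assumptions, not the paper's reference implementation.

```python
# Hedged sketch of Consensus Split-Bank Attention (CSA), reconstructed from
# the abstract; names, shapes, and the sum-aggregation of logits are assumed.
import numpy as np

def consensus_split_bank_attention(queries, keys, values, beta=1.0):
    """
    queries: dict modality -> (d_m,) query vector
    keys:    dict modality -> (N, d_m) modality-specific key bank (N shared slots)
    values:  dict modality -> (N, d_m) modality-specific value bank
    Returns one retrieved vector per modality, all driven by the SAME
    distribution over the N shared slots.
    """
    modalities = list(queries)
    N = next(iter(keys.values())).shape[0]

    # Aggregate query-key evidence across modalities into a single global
    # score per memory slot.
    global_logits = np.zeros(N)
    for m in modalities:
        global_logits += beta * keys[m] @ queries[m]

    # One shared softmax distribution over the memory bank (numerically stable).
    p = np.exp(global_logits - global_logits.max())
    p /= p.sum()

    # Broadcast the shared distribution synchronously to every modality:
    # each modality reads its own value bank with the common weights p, so no
    # two modalities can settle on different prototypes (no chimeric states).
    return {m: p @ values[m] for m in modalities}

# Toy usage: two modalities, 5 shared slots, per-modality dims 4 and 3.
rng = np.random.default_rng(0)
q = {"vision": rng.normal(size=4), "audio": rng.normal(size=3)}
K = {"vision": rng.normal(size=(5, 4)), "audio": rng.normal(size=(5, 3))}
V = {"vision": rng.normal(size=(5, 4)), "audio": rng.normal(size=(5, 3))}
out = consensus_split_bank_attention(q, K, V, beta=2.0)
```

The key design point, as the abstract states it, is that the softmax is computed once, globally, rather than once per modality as in standard cross-attention.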
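As a worked illustration of the descent guarantee: in the single-bank Modern Hopfield setting (Ramsauer et al., 2020), the energy is a difference of convex functions and the Concave-Convex Procedure update is exactly the softmax attention step. A plausible multi-modal consensus analogue, with per-modality states \(\xi_m\) and key banks sharing \(N\) slots, is sketched below; the exact energy used by mTAM is an assumption based on the abstract, not a quotation from the paper.

```latex
% Hedged reconstruction of a consensus Difference-of-Convex energy.
E(\xi) \;=\; \underbrace{\tfrac{1}{2}\sum_{m}\|\xi_m\|^2}_{\text{convex}}
\;-\; \underbrace{\tfrac{1}{\beta}\log\sum_{\mu=1}^{N}
      \exp\!\Big(\beta\sum_{m} {k^{(m)}_{\mu}}^{\!\top}\xi_m\Big)}_{\text{log-sum-exp (convex), entering with a minus sign}}

% CCCP linearizes the concave part at \xi^{t} and minimizes the convex part,
% giving one shared softmax over slots, broadcast to every modality:
p^{t}_{\mu} \;=\; \operatorname{softmax}_{\mu}\!\Big(\beta\sum_{m}{k^{(m)}_{\mu}}^{\!\top}\xi^{t}_{m}\Big),
\qquad
\xi^{t+1}_{m} \;=\; \sum_{\mu=1}^{N} p^{t}_{\mu}\, k^{(m)}_{\mu}.
```

Each CCCP iteration cannot increase \(E\), which is the monotonic-descent property the abstract attributes to the CSA dynamics; in the single-modality limit the update reduces to the standard Modern Hopfield retrieval step.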


