Head Pursuit: Probing Attention Specialization in Multimodal Transformers / Basile, Lorenzo; Maiorca, Valentino; Doimo, Diego; Locatello, Francesco; Cazzaniga, Alberto. - (2025). (Paper presented at the conference Advances in Neural Information Processing Systems, held in San Diego, California, USA).

Head Pursuit: Probing Attention Specialization in Multimodal Transformers

Valentino Maiorca
2025

Abstract

Language and vision-language models have shown impressive performance across a wide range of tasks, but their internal mechanisms remain only partly understood. In this work, we study how individual attention heads in text-generative models specialize in specific semantic or visual attributes. Building on an established interpretability method, we reinterpret the practice of probing intermediate activations with the final decoding layer through the lens of signal processing. This lets us analyze multiple samples in a principled way and rank attention heads based on their relevance to target concepts. Our results show consistent patterns of specialization at the head level across both unimodal and multimodal transformers. Remarkably, we find that editing as few as 1% of the heads, selected using our method, can reliably suppress or enhance targeted concepts in the model output. We validate our approach on language tasks such as question answering and toxicity mitigation, as well as vision-language tasks including image classification and captioning. Our findings highlight an interpretable and controllable structure within attention layers, offering simple tools for understanding and editing large-scale generative models.
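No full text is attached to this record, so the sketch below is only an illustration of the idea the abstract and keywords describe, not the authors' released code: each head's contribution to the residual stream is greedily decomposed over the unit-normalized unembedding dictionary, as in matching pursuit; a head is credited whenever pursuit selects a token belonging to the target concept; and the top-ranked heads are then scaled or zeroed. All names here (score_heads, edit_heads, concept_token_ids, the (n_heads, d_model) layout) are hypothetical.

    # Minimal sketch, assuming head outputs are probed in unembedding space
    # via matching pursuit; not the paper's actual implementation.
    import torch

    def score_heads(head_outputs: torch.Tensor,   # (n_heads, d_model): each head's
                                                  # mean residual-stream contribution
                    W_U: torch.Tensor,            # (d_model, vocab): unembedding matrix
                    concept_token_ids: set[int],  # token ids for the target concept
                    n_iters: int = 32) -> torch.Tensor:
        """Rank heads by how much of their output matching pursuit explains
        with concept-token directions from the unembedding dictionary."""
        # Unit-normalize token directions so they form a proper dictionary.
        dictionary = W_U / W_U.norm(dim=0, keepdim=True)
        scores = torch.zeros(head_outputs.shape[0])
        for h in range(head_outputs.shape[0]):
            residual = head_outputs[h].clone()
            for _ in range(n_iters):
                corr = dictionary.T @ residual            # correlation with every atom
                atom = int(corr.abs().argmax())           # greedily pick best atom
                coeff = corr[atom]
                residual -= coeff * dictionary[:, atom]   # standard pursuit update
                if atom in concept_token_ids:
                    scores[h] += coeff.abs().item()       # credit concept-aligned mass
        return scores

    def edit_heads(head_outputs: torch.Tensor, scores: torch.Tensor,
                   frac: float = 0.01, alpha: float = 0.0) -> torch.Tensor:
        """Scale the top-scoring fraction of heads: alpha=0 suppresses the
        concept, alpha>1 amplifies it."""
        k = max(1, int(frac * scores.numel()))
        top = scores.topk(k).indices
        edited = head_outputs.clone()
        edited[top] *= alpha
        return edited

Under these assumptions, suppressing a concept would amount to edit_heads(head_outputs, score_heads(head_outputs, W_U, concept_token_ids), frac=0.01, alpha=0.0), matching the abstract's claim that editing about 1% of heads suffices; the paper's actual scoring and intervention details may differ.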
2025
Advances in Neural Information Processing Systems
attention head specialization; sparse decomposition; matching pursuit; unembedding dictionary; generative transformer; head-level intervention; vision-language model; interpretability control
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1755638
Warning: the displayed data have not been validated by the university.
