The cross‐linguistic coordination of overt attention and speech production as evidence for a language of vision / Coco, M.I., Fernandes, E.G., Arai, M., Keller, F. - In: COGNITIVE SCIENCE. - ISSN 0364-0213. - 50:2(2026). [10.1111/cogs.70185]
The cross‐linguistic coordination of overt attention and speech production as evidence for a language of vision
Coco, Moreno I. (first author)
2026
Abstract
A central question in cognition is how representations are integrated across modalities such as language and vision. One prominent hypothesis posits the existence of an abstract, prelinguistic “language of vision”: a representational system that organizes meaning compositionally and thereby enables cross-modal integration. This hypothesis predicts that the language of vision operates universally, independently of linguistic surface features such as word order. We conducted eye-tracking experiments in which participants described visual scenes in English, Portuguese, and Japanese. By analyzing spoken descriptions alongside eye-movement sequences divided into planning and articulation phases, we demonstrate that semantic similarity between sentences strongly predicts the similarity of the associated scan patterns in all three languages, even across scenes and between sentences in different languages. In contrast, the effect of syntactic constraints was secondary and transient: it was restricted to within-language and within-scene comparisons, and temporally confined to the early planning phase of the utterance. Our findings support an interactive account of cross-modal coordination in which a universal language of vision provides stable semantic scaffolding, while syntax acts as a local constraint that is primarily active during message linearization.

| File | Size | Format |
|---|---|---|
| Coco_ Cross-Linguistic_coordination_2026.pdf (open access; type: publisher's version, with the publisher's layout; license: Creative Commons) | 1.85 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


