Research Products Catalogue

EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings / Calabrese, Agostina; Bevilacqua, Michele; Navigli, Roberto. - In: IJCAI. - ISSN 1045-0823. - (2020), pp. 481-487. (Paper presented at the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-PRICAI 2020, held in Yokohama) [10.24963/ijcai.2020/67].

EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings

Calabrese, Agostina; Bevilacqua, Michele; Navigli, Roberto

2020

Abstract

The problem of grounding language in vision is attracting increasing scholarly attention. So far, however, most approaches have been limited to word embeddings, which cannot handle polysemous words. This is mainly due to the limited coverage of the available semantically annotated datasets, which forces research to rely on alternative technologies (i.e., image search engines). To address this issue, we introduce EViLBERT, an approach that performs image classification over an open set of concepts, both concrete and non-concrete. Our approach is based on the recently introduced Vision-Language Pretraining (VLP) model and builds upon a manually annotated dataset of concept-image pairs. We use our technique to clean up the image-to-concept mapping provided within a multilingual knowledge base, resulting in over 258,000 images associated with 42,500 concepts. We show that our VLP-based model can be used to create multimodal sense embeddings starting from our automatically created dataset. In turn, we show that these multimodal embeddings improve the performance of a Word Sense Disambiguation architecture over a strong unimodal baseline. We release code, dataset and embeddings at http://babelpic.org.

Full record

	Year of publication
			2020
	Conference name
			Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-PRICAI 2020
	Keywords
			multimodality; natural language processing; computer vision
	Type
			04 Publication in conference proceedings::04c Conference paper in journal
	Citation
			EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings / Calabrese, Agostina; Bevilacqua, Michele; Navigli, Roberto. - In: IJCAI. - ISSN 1045-0823. - (2020), pp. 481-487. (Paper presented at the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-PRICAI 2020, held in Yokohama) [10.24963/ijcai.2020/67].
	Belongs to type:
			04c Conference paper in journal

Files attached to this product

File	Size	Format
Calabrese_EViLBERT_2020.pdf (open access). Note: https://www.ijcai.org/Proceedings/2020/67. Type: Publisher's version (published version with the publisher's layout). License: All rights reserved	520.14 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1431898
