Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processig (NLP) have recently met. In image captioning and video summarization, the semantic information is completely contained in still images or video dynamics, and it has only to be mined and expressed in a human-consistent way. Differently from this, in VQA semantic information in the same media must be compared with the semantics implied by a question expressed in natural language, doubling the artificial intelligence-related effort. Some recent surveys about VQA approaches have focused on methods underlying either the image-related processing or the verbal-related one, or on the way to consistently fuse the conveyed information. Possible applications are only suggested, and, in fact, most cited works rely on general-purpose datasets that are used to assess the building blocks of a VQA system. This paper rather considers the proposals that focus on real-world applications, possibly using as benchmarks suitable data bound to the application domain. The paper also reports about some recent challenges in VQA research.

Visual question answering: Which investigated applications? / Barra, S.; Bisogni, C.; De Marsico, M.; Ricciardi, S.. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 151:(2021), pp. 325-331. [10.1016/j.patrec.2021.09.008]

Visual question answering: Which investigated applications?

De Marsico M.;
2021

Abstract

Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processig (NLP) have recently met. In image captioning and video summarization, the semantic information is completely contained in still images or video dynamics, and it has only to be mined and expressed in a human-consistent way. Differently from this, in VQA semantic information in the same media must be compared with the semantics implied by a question expressed in natural language, doubling the artificial intelligence-related effort. Some recent surveys about VQA approaches have focused on methods underlying either the image-related processing or the verbal-related one, or on the way to consistently fuse the conveyed information. Possible applications are only suggested, and, in fact, most cited works rely on general-purpose datasets that are used to assess the building blocks of a VQA system. This paper rather considers the proposals that focus on real-world applications, possibly using as benchmarks suitable data bound to the application domain. The paper also reports about some recent challenges in VQA research.
2021
Real-world VQA; Visual question answering; VQA for assistive applications; VQA for context awareness; VQA for medical applicatons; VQA in cultural heritage and education
01 Pubblicazione su rivista::01a Articolo in rivista
Visual question answering: Which investigated applications? / Barra, S.; Bisogni, C.; De Marsico, M.; Ricciardi, S.. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 151:(2021), pp. 325-331. [10.1016/j.patrec.2021.09.008]
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1598577
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 37
  • ???jsp.display-item.citation.isi??? 20
social impact