
VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures / De Marsico, Maria; Giacanelli, Chiara; Manganaro, Clizia Giorgia; Palma, Alessio; Santoro, Davide. - (2024), pp. 1-5. (Paper presented at the Advanced Visual Interfaces conference held in Arenzano, Italy) [10.1145/3656650.3656677].

VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures

De Marsico, Maria; Giacanelli, Chiara; Manganaro, Clizia Giorgia; Palma, Alessio; Santoro, Davide
2024

Abstract

VQAsk is an Android application that helps visually impaired users get information about images framed by their smartphones. It enables them to interact with their photographs or the surrounding visual environment through a question-and-answer interface integrating three modalities: speech interaction, haptic feedback that facilitates navigation and interaction, and sight. VQAsk is primarily designed to help visually impaired users mentally visualize what they cannot see, but it can also accommodate users with varying levels of visual ability. To this aim, it embeds advanced NLP and Computer Vision techniques to answer user questions about the image on the phone screen. Image processing is enhanced by background removal through advanced segmentation models that identify important image elements. The outcomes of a testing phase confirmed the value of this project as a first attempt at using AI-supported multimodality to enhance visually impaired users' experience.
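The abstract mentions background removal driven by segmentation models. The paper's actual pipeline is not detailed in this record, but the general idea can be sketched as applying a binary segmentation mask to zero out background pixels; the function name, the toy data, and the assumption of a boolean mask below are illustrative, not taken from the paper.

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only pixels the segmentation model marked as salient.

    image: H x W x 3 uint8 array (RGB).
    mask:  H x W boolean array, True where a segmentation model
           detected an important image element.
    """
    # Broadcast the 2-D mask over the 3 color channels; False -> 0
    # zeroes out background pixels, True -> 1 keeps foreground ones.
    return image * mask[:, :, np.newaxis].astype(image.dtype)

# Toy example: a 2x2 gray image where only the top-left pixel is foreground.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
result = remove_background(image, mask)
```

In practice the mask would come from a pretrained segmentation network rather than being hand-written, but the masking step itself is this simple element-wise product.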
2024
Advanced Visual Interfaces
visual interfaces; mobile application; generative ai; visually impaired users assistance; visual question answering
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this product

DeMarsico_VQAsk-multimodal-Android_2024.pdf

Open access

Note: https://dl.acm.org/doi/pdf/10.1145/3656650.3656677
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 794.24 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726579
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 0