VQAsk is an Android application that helps visually impaired users get information about images framed by their smartphones. It lets them interact with their own photographs or the surrounding visual environment through a question-and-answer interface that integrates three modalities: speech interaction, haptic feedback to ease navigation and interaction, and sight. VQAsk is primarily designed to help visually impaired users mentally visualize what they cannot see, but it can also accommodate users with varying levels of visual ability. To this aim, it embeds advanced NLP and Computer Vision techniques to answer user questions about the image on the phone screen. Image processing is enhanced by background removal, using advanced segmentation models that identify the important image elements. The outcomes of a testing phase confirmed the value of this project as a first attempt at using AI-supported multimodality to enhance visually impaired users’ experience.
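The record itself contains no code; purely as a rough illustration of the mask-application step behind the background removal the abstract mentions (the segmentation model is assumed to have already produced a boolean foreground mask — the model itself is not sketched here), a minimal NumPy example:

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the pixels covered by the foreground mask.

    image: H x W x 3 RGB array
    mask:  H x W boolean array (True = foreground), e.g. the output of a
           semantic-segmentation model thresholded to the classes of interest
    """
    result = np.zeros_like(image)     # background pixels become black
    result[mask] = image[mask]        # copy foreground pixels unchanged
    return result

# Toy example: a uniform 4x4 image where only the top-left 2x2 block
# is marked as foreground by the (hypothetical) segmentation step.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
clean = remove_background(image, mask)
```

In a pipeline like the one the abstract describes, suppressing the background this way lets the question-answering stage focus on the segmented salient elements rather than on visual clutter.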
VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures / De Marsico, Maria; Giacanelli, Chiara; Manganaro, Clizia Giorgia; Palma, Alessio; Santoro, Davide. - (2024), pp. 1-5. (Paper presented at the Advanced Visual Interfaces conference, held in Arenzano, Italy) [10.1145/3656650.3656677].
File attachment:

| File | Type | License | Size | Format |
|---|---|---|---|---|
| DeMarsico_VQAsk-multimodal-Android_2024.pdf (open access: https://dl.acm.org/doi/pdf/10.1145/3656650.3656677) | Publisher's version (published with the publisher's layout) | Creative Commons | 794.24 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.