VQAsk is an Android application that helps visually impaired users get information about images framed by their smartphones. It lets them interact with their own photographs or the surrounding visual environment through a question-and-answer interface that integrates three modalities: speech interaction, haptic feedback to ease navigation and interaction, and sight. VQAsk is primarily designed to help visually impaired users mentally visualize what they cannot see, but it can also accommodate users with varying levels of visual ability. To this aim, it embeds advanced NLP and Computer Vision techniques to answer user questions about the image on the phone screen. Image processing is enhanced by background removal, using advanced segmentation models that identify the important image elements. The outcomes of a testing phase confirmed the value of this project as a first attempt at using AI-supported multimodality to enhance visually impaired users’ experience.
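The record itself contains no code; purely as a rough illustration of the mask-application step behind the background removal the abstract mentions (the segmentation model is assumed to have already produced a boolean foreground mask — the model itself is not sketched here), a minimal NumPy example:

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the pixels covered by the foreground mask.

    image: H x W x 3 RGB array
    mask:  H x W boolean array (True = foreground), e.g. the output of a
           semantic-segmentation model thresholded to the classes of interest
    """
    result = np.zeros_like(image)     # background pixels become black
    result[mask] = image[mask]        # copy foreground pixels unchanged
    return result

# Toy example: a uniform 4x4 image where only the top-left 2x2 block
# is marked as foreground by the (hypothetical) segmentation step.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
clean = remove_background(image, mask)
```

In a pipeline like the one the abstract describes, suppressing the background this way lets the question-answering stage focus on the segmented salient elements rather than on visual clutter.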
VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures / De Marsico, Maria; Giacanelli, Chiara; Manganaro, Clizia Giorgia; Palma, Alessio; Santoro, Davide. - (2024), pp. 1-5. (Paper presented at the Advanced Visual Interfaces conference, held in Arenzano, Italy) [10.1145/3656650.3656677].
File attachment:

| File | Type | License | Size | Format |
|---|---|---|---|---|
| DeMarsico_VQAsk-multimodal-Android_2024.pdf (open access: https://dl.acm.org/doi/pdf/10.1145/3656650.3656677) | Publisher's version (published with the publisher's layout) | Creative Commons | 794.24 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.