Unsupervised Pose Estimation by Means of an Innovative Vision Transformer / Brandizzi, N.; Fanti, A.; Gallotta, R.; Russo, S.; Iocchi, L.; Nardi, D.; Napoli, C. - 13589:(2023), pp. 3-20. (Paper presented at the International Conference on Artificial Intelligence and Soft Computing, held in Zakopane, Poland) [10.1007/978-3-031-23480-4_1].
Unsupervised Pose Estimation by Means of an Innovative Vision Transformer
Brandizzi N. (co-first author; Methodology)
Fanti A. (co-first author; Software)
Russo S. (co-first author; Conceptualization)
Iocchi L. (Validation)
Nardi D. (Funding Acquisition)
Napoli C. (last author; Supervision)
2023
Abstract
Attention-only Transformers [34] have been applied to both Natural Language Processing (NLP) and Computer Vision (CV) tasks. One Transformer architecture developed for CV is the Vision Transformer (ViT) [15], which has been used for numerous CV tasks. One such task is human pose estimation. We present Un-TraPEs (UNsupervised TRAnsformer for Pose Estimation), a modified ViT model that reconstructs a subject's pose from a monocular image and its estimated depth. We compare its results against a ResNet [17] trained from scratch and a ViT fine-tuned on the task, and show promising results.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
Brandizzi_Unsupervised_2023.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 1.32 MB | Adobe PDF