Attention-only Transformers [34] have been applied to both Natural Language Processing (NLP) and Computer Vision (CV) tasks. One Transformer architecture developed for CV is the Vision Transformer (ViT) [15], which has been used for numerous CV tasks, among them human pose estimation. We present Un-TraPEs (UNsupervised TRAnsformer for Pose Estimation), a modified ViT model that reconstructs a subject’s pose from a monocular image and its estimated depth. We compare its results against a ResNet [17] trained from scratch and a ViT fine-tuned for the task, and report promising results.

Unsupervised Pose Estimation by Means of an Innovative Vision Transformer / Brandizzi, N.; Fanti, A.; Gallotta, R.; Russo, S.; Iocchi, L.; Nardi, D.; Napoli, C. - 13589:(2023), pp. 3-20. (Paper presented at the International Conference on Artificial Intelligence and Soft Computing, held in Zakopane, Poland) [10.1007/978-3-031-23480-4_1].

Unsupervised Pose Estimation by Means of an Innovative Vision Transformer

Brandizzi N. (co-first author; Methodology)
Fanti A. (co-first author; Software)
Russo S. (co-first author; Conceptualization)
Iocchi L. (Validation)
Nardi D. (Funding Acquisition)
Napoli C. (last author; Supervision)
2023

International Conference on Artificial Intelligence and Soft Computing
artificial intelligence and applications; computer vision; image understanding; pose estimation; visual transformers
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Brandizzi_Unsupervised_2023.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.32 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1683239
Citations
  • PMC: n/a
  • Scopus: 2
  • ISI: 1