Attention-only Transformers [34] have been applied to both Natural Language Processing (NLP) and Computer Vision (CV) tasks. One Transformer architecture developed specifically for CV is the Vision Transformer (ViT) [15], which has been used for numerous CV tasks. One such task is estimating the pose of a human subject. We present our modified ViT model, Un-TraPEs (UNsupervised TRAnsformer for Pose Estimation), which can reconstruct a subject’s pose from a monocular image and its estimated depth. We compare its results against a ResNet [17] trained from scratch and a ViT fine-tuned for the task, and show promising results.

Unsupervised Pose Estimation by Means of an Innovative Vision Transformer / Brandizzi, N.; Fanti, A.; Gallotta, R.; Russo, S.; Iocchi, L.; Nardi, D.; Napoli, C. - 13589:(2023), pp. 3-20. (Paper presented at the 21st International Conference on Artificial Intelligence and Soft Computing, ICAISC 2022, held in Zakopane, Poland) [10.1007/978-3-031-23480-4_1].

Unsupervised Pose Estimation by Means of an Innovative Vision Transformer

Brandizzi N. (co-first author; Methodology);
Fanti A. (co-first author; Software);
Russo S. (co-first author; Conceptualization);
Iocchi L. (Validation);
Nardi D. (Funding Acquisition);
Napoli C. (last author; Supervision)
2023

21st International Conference on Artificial Intelligence and Soft Computing, ICAISC 2022
Artificial intelligence and applications; Computer vision; Image understanding; Pose estimation; Visual transformers
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1683239