Unsupervised Pose Estimation by Means of an Innovative Vision Transformer

Brandizzi, N.; Fanti, A.; Gallotta, R.; Russo, S.; Iocchi, L.; Nardi, D.; Napoli, C.

doi:10.1007/978-3-031-23480-4_1

Attention-only Transformers [34] have been applied to solve Natural Language Processing (NLP) tasks and Computer Vision (CV) tasks. One particular Transformer architecture developed for CV is the Vision Transformer (ViT) [15]. ViT models have been used to solve numerous tasks in the CV area. One interesting task is the pose estimation of a human subject. We present our modified ViT model, Un-TraPEs (UNsupervised TRAnsformer for Pose Estimation), that can reconstruct a subject’s pose from its monocular image and estimated depth. We compare the results obtained with such a model against a ResNet [17] trained from scratch and a ViT finetuned to the task and show promising results.

Unsupervised Pose Estimation by Means of an Innovative Vision Transformer / Brandizzi, N., Fanti, A., Gallotta, R., Russo, S., Iocchi, L., Nardi, D., Napoli, C. - 13589:(2023), pp. 3-20. (International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland) [10.1007/978-3-031-23480-4_1].

Author contributions

Brandizzi N. (co-first author): Methodology
Fanti A. (co-first author): Software
Gallotta R. (co-first author): Software
Russo S. (co-first author): Conceptualization
Iocchi L.: Validation
Nardi D.: Funding Acquisition
Napoli C. (last author): Supervision


Year of publication	2023
Conference name	International Conference on Artificial Intelligence and Soft Computing
Keywords	artificial intelligence and applications; computer vision; image understanding; pose estimation; visual transformers
Type	04 Conference proceedings publication::04b Conference paper in volume

Files attached to this record

File	Size	Format
Brandizzi_Unsupervised_2023.pdf (access: archive managers only; type: publisher's version, published with the publisher's layout; license: all rights reserved)	1.32 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1683239

