
On the robustness of vision transformers for in-flight monocular depth estimation / Ercolino, Simone; Devoto, Alessio; Monorchio, Luca; Santini, Matteo; Mazzaro, Silvio; Scardapane, Simone. - 1:1(2023). [10.1007/s44244-023-00005-3]

On the robustness of vision transformers for in-flight monocular depth estimation

Ercolino, Simone; Devoto, Alessio; Monorchio, Luca; Santini, Matteo; Mazzaro, Silvio; Scardapane, Simone
2023

Abstract

Monocular depth estimation (MDE) has recently shown impressive performance, even in zero-shot and few-shot scenarios. In this paper, we consider the use of MDE on board drones during low-altitude flights, as required in a number of safety-critical and monitoring operations. In particular, we evaluate a state-of-the-art vision transformer (ViT) variant, pre-trained on a massive MDE dataset. We test it both in a zero-shot scenario and after fine-tuning on a dataset of flight records, and compare its performance to that of a classical fully convolutional network. In addition, we evaluate for the first time whether these models are susceptible to adversarial attacks, by optimizing a small adversarial patch that generalizes across scenarios. We investigate several variants of losses for this task, including weighted error losses in which we can customize the design of the patch to selectively decrease the performance of the model on a desired depth range. Overall, our results highlight that (a) ViTs can outperform convolutional models in this context after proper fine-tuning, and (b) they appear to be more robust to adversarial attacks designed in the form of patches, which is a crucial property for this family of tasks.
depth estimation; monocular; drone
01 Journal publication::01a Journal article
Files attached to this record
File: s44244-023-00005-3 (1).pdf

Open access

Note: Ercolino_On the robustness of vision transformers_2023
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 2.41 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1693645
Citations
  • PMC: ND
  • Scopus: ND
  • Web of Science: ND