Research products catalogue

STLight: A Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal Joint Processing / Alfarano, A., Alfarano, A., Friso, L., Bacciu, A., Amerini, I., Silvestri, F. - (2025), pp. 1090-1100. (2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, USA) [10.1109/wacv61041.2025.00114].

STLight: A Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal Joint Processing

Alfarano, Andrea; Alfarano, Alberto; Friso, Linda; Bacciu, Andrea; Amerini, Irene; Silvestri, Fabrizio

2025

Abstract

Spatio-temporal predictive learning (STL) is a self-supervised learning paradigm that enables models to identify spatial and temporal patterns by predicting future frames based on past frames. Traditional methods, which use recurrent neural networks to capture temporal patterns, have proven effective but come with high system complexity and computational demand. Convolutions could offer a more efficient alternative, but they are limited in two ways: they treat all previous frames equally, resulting in poor temporal characterization, and their local receptive field limits the capacity to capture distant correlations among frames. In this paper, we propose STLight, a novel method for spatiotemporal learning that relies solely on channel-wise and depth-wise convolutions as learnable layers. STLight overcomes the limitations of traditional convolutional approaches by rearranging spatial and temporal dimensions together, using a single convolution to mix both types of features into a comprehensive spatiotemporal patch representation. This representation is then processed in a purely convolutional framework, capable of focusing simultaneously on the interaction among near and distant patches, and subsequently allowing for efficient reconstruction of the predicted frames. Our architecture achieves state-of-the-art performance on STL benchmarks across different datasets and settings, while significantly improving computational efficiency in terms of parameter count and FLOPs. The code is publicly available at https://github.com/AlfaranoAndrea/STLight/.
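The joint spatio-temporal patch embedding described in the abstract can be illustrated with a minimal NumPy sketch. This is only one plausible reading of "rearranging spatial and temporal dimensions together, using a single convolution", not the authors' implementation (see the linked repository for that). It assumes that the time axis is folded into the channel axis and that the single convolution is a non-overlapping patch convolution (kernel size equal to stride). The function names, tensor shapes, and patch size are hypothetical and chosen for illustration.

```python
import numpy as np


def fold_time_into_channels(frames):
    """Reshape (B, T, C, H, W) into (B, T*C, H, W).

    Assumption: time is merged with the channel axis so that one
    convolution can mix temporal and spatial features together.
    """
    b, t, c, h, w = frames.shape
    return frames.reshape(b, t * c, h, w)


def patch_embed(x, weight, patch):
    """Non-overlapping patch convolution (kernel = stride = patch), as a matmul.

    x:      (B, Cin, H, W), with H and W divisible by `patch`
    weight: (D, Cin, patch, patch)
    returns (B, D, H // patch, W // patch)
    """
    b, cin, h, w = x.shape
    d = weight.shape[0]
    gh, gw = h // patch, w // patch
    # Split each spatial axis into (grid index, offset inside the patch).
    x = x.reshape(b, cin, gh, patch, gw, patch)
    # Bring the grid axes forward: (B, gh, gw, Cin, patch, patch).
    x = x.transpose(0, 2, 4, 1, 3, 5)
    # Flatten every patch, across all folded frames, into one vector.
    x = x.reshape(b, gh, gw, cin * patch * patch)
    # One linear map per patch is equivalent to the strided convolution.
    out = x @ weight.reshape(d, -1).T  # (B, gh, gw, D)
    return out.transpose(0, 3, 1, 2)


# Illustrative shapes only: 2 clips, 10 past frames, 1 channel, 64x64 pixels.
rng = np.random.default_rng(0)
frames = rng.standard_normal((2, 10, 1, 64, 64))
weight = rng.standard_normal((32, 10 * 1, 2, 2))

tokens = patch_embed(fold_time_into_channels(frames), weight, patch=2)
print(tokens.shape)  # (2, 32, 32, 32)
```

Under these assumptions, every output feature at a grid location depends on the same patch in all past frames at once, which is what lets a purely convolutional stack treat time and space jointly rather than frame by frame.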

Full record

	Publication year
				2025
	Conference name
				2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
	Keywords
				spatio-temporal learning; video prediction
	Type
				04 Publication in conference proceedings::04b Conference paper in volume
	Citation
				STLight: A Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal Joint Processing / Alfarano, A., Alfarano, A., Friso, L., Bacciu, A., Amerini, I., Silvestri, F. - (2025), pp. 1090-1100. (2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, USA) [10.1109/wacv61041.2025.00114].
	Belongs to type:
				04b Conference paper in volume

Files attached to this product

File	Size	Format
Alfarano_postprint_STLight_2025.pdf (open access; note: DOI 10.1109/WACV61041.2025.00114; type: post-print, i.e. the version following peer review and accepted for publication; license: all rights reserved)	1.02 MB	Adobe PDF
Alfarano_STLight_2025.pdf (repository managers only; type: publisher's version, i.e. the version published with the publisher's layout; license: all rights reserved)	839.65 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1739450
