CNNViT. A robust deep neural network for video anomaly detection

Garuda, Nikhil; Prasad, Gokul; Prasad Dev, Prabhu; Das, Pranesh; Ghaderpour, Ebrahim

doi:10.1049/icp.2024.0461

Detecting anomalies in videos poses a significant challenge due to the unbounded, infrequent, ambiguous, and irregular nature of abnormal events in real-world scenes. Recently, transformers have shown remarkable modeling capabilities for sequential data. As a result, we endeavor to leverage transformers for video anomaly detection. This paper presents a novel prediction-based method for video anomaly detection called CNNViT by integrating the architectural elements of Convolutional Neural Network (CNN) and Vision Transformer (ViT). The purpose of this fusion is to effectively capture enhanced spatial-temporal information and global features. The effectiveness of the proposed method is evaluated on UCSD Ped2 and CUHK Avenue benchmark datasets. Experimental results demonstrate that the proposed method attains considerably superior performance compared to state-of-the-art techniques.
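The abstract describes CNNViT as a prediction-based method but does not say how prediction quality is turned into an anomaly score. The sketch below illustrates one common convention in prediction-based video anomaly detection: compare each predicted frame with the frame actually observed, compute PSNR, and min-max normalize within a video so that poorly predicted frames score close to 1. The use of PSNR, the normalization, and the function names (mse, psnr, anomaly_scores) are assumptions made for illustration and are not confirmed by this record. The predictor itself (the CNN and ViT network) is not modeled here.

```python
import math

def mse(pred, actual):
    """Mean squared error between two equally sized grayscale frames.

    Frames are lists of rows with pixel values in [0, 1].
    """
    total = 0.0
    count = 0
    for row_p, row_a in zip(pred, actual):
        for p, a in zip(row_p, row_a):
            total += (p - a) ** 2
            count += 1
    return total / count

def psnr(pred, actual, max_val=1.0, eps=1e-8):
    """Peak signal-to-noise ratio in dB; eps avoids division by zero."""
    return 10.0 * math.log10(max_val ** 2 / (mse(pred, actual) + eps))

def anomaly_scores(predicted_frames, actual_frames):
    """Per-frame anomaly scores in [0, 1] for one video.

    Assumption: low PSNR (a poor prediction) indicates an abnormal frame,
    so the score is one minus the min-max normalized PSNR.
    """
    psnrs = [psnr(p, a) for p, a in zip(predicted_frames, actual_frames)]
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        # All frames are predicted equally well; nothing stands out.
        return [0.0 for _ in psnrs]
    return [1.0 - (v - lo) / (hi - lo) for v in psnrs]

# Toy 2x2 frames: the third prediction is far from the observed frame.
actual = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]
predicted = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.6, 0.5], [0.5, 0.5]],
    [[1.0, 0.0], [1.0, 0.0]],
]
scores = anomaly_scores(predicted, actual)
```

In this toy input the worst-predicted frame receives the highest score and the best-predicted frame the lowest. A real evaluation would threshold such scores and compare them against frame-level ground truth on the benchmark datasets.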

CNNViT. A robust deep neural network for video anomaly detection / Garuda, Nikhil; Prasad, Gokul; Prasad Dev, Prabhu; Das, Pranesh; Ghaderpour, Ebrahim. - 2023:39(2023), pp. 13-22. (Paper presented at the 4th International Conference on Distributed Sensing and Intelligent Systems (ICDSIS 2023), held in Dubai, UAE) [10.1049/icp.2024.0461].

Year of publication: 2023
Conference name: 4th International Conference on Distributed Sensing and Intelligent Systems (ICDSIS 2023)
Keywords: Anomaly Detection; Deep learning; Convolutional Neural Network; Transformer
Type: 04 Publication in conference proceedings::04c Conference paper in journal
			

Files attached to this record

File	Size	Format
Garuda_CNNViT_2023.pdf (archive managers only; type: publisher's version, published with the publisher's layout; license: all rights reserved)	2.72 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1723771
