Nowadays most streets, squares, and buildings are monitored by large numbers of surveillance cameras. Nevertheless, these cameras are used only to record scenes for analysis after crimes or thefts, not to prevent violent actions automatically. In a few cases a guard may check the videos manually in real time, but this is a very inefficient and expensive process. In this paper we propose a novel approach to the violence detection task using a recent architecture named ConvMixer, a simple CNN that uses patch-based embeddings to obtain superior performance with fewer parameters and computational resources. We also use a technique that arranges frames into super images to encode temporal information in the spatial dimensions. Our tests on the popular “Real Life Violence Situations” dataset show an accuracy of 0.95, placing our proposed model in second position on the leaderboard for that dataset.
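
The super-image idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes frames are tiled in row-major order on a fixed grid, so that a 2D CNN sees the clip as one image. The function name `make_super_image`, the grid size, and the zero padding of unused cells are illustrative assumptions; the abstract does not specify the authors' sampling or layout.

```python
import numpy as np


def make_super_image(frames: np.ndarray, grid: tuple[int, int]) -> np.ndarray:
    """Tile frames of shape (T, H, W, C) into one (rows*H, cols*W, C) image.

    Frames are placed in row-major order, so temporal order runs left to
    right, then top to bottom. Unused grid cells stay zero (an assumption).
    """
    t, h, w, c = frames.shape
    rows, cols = grid
    if t > rows * cols:
        raise ValueError("grid too small for the number of frames")

    # Pad the clip with blank frames so it fills the grid exactly.
    padded = np.zeros((rows * cols, h, w, c), dtype=frames.dtype)
    padded[:t] = frames

    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    tiled = padded.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return tiled.reshape(rows * h, cols * w, c)


# Hypothetical clip: 9 frames of 4x4 RGB, frame i filled with the value i.
clip = np.stack([np.full((4, 4, 3), i, dtype=np.uint8) for i in range(9)])
super_image = make_super_image(clip, grid=(3, 3))
print(super_image.shape)
```

In this layout, the frame at time index `i` occupies grid cell `(i // cols, i % cols)`, so neighbouring cells along a row hold consecutive frames and spatial convolutions can pick up short-range motion cues.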

A Novel ConvMixer Transformer Based Architecture for Violent Behavior Detection / Alfarano, A.; De Magistris, G.; Mongelli, L.; Russo, S.; Starczewski, J.; Napoli, C. - 14126:(2023), pp. 3-16. (Paper presented at the International Conference on Artificial Intelligence and Soft Computing, held in Zakopane, Poland) [10.1007/978-3-031-42508-0_1].

A Novel ConvMixer Transformer Based Architecture for Violent Behavior Detection

De Magistris G. (co-first author; Investigation);
Russo S. (co-first author; Conceptualization);
Napoli C. (last author; Supervision)
2023

International Conference on Artificial Intelligence and Soft Computing
action recognition; ConvMixer; SuperImage; violence detection
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
Alfarano_A-novel_2023.pdf
Access: repository managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 624.55 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1691853
Citations
  • PMC: n/a
  • Scopus: 17
  • ISI: 0