Nowadays most streets, squares and buildings are monitored by a large number of surveillance cameras. Nevertheless, these cameras are used only to record scenes to be analyzed after crimes or thefts, and not to prevent violent actions automatically. In a few cases a guard may check the videos manually in real time, but this is a very inefficient and expensive process. In this paper we propose a novel approach to the Violence Detection task using a recent architecture named ConvMixer, a simple CNN that uses patch-based embeddings to obtain superior performance with fewer parameters and computational resources. We also use a technique that arranges frames into super images to encode the temporal information in the spatial dimensions. Our tests on the popular “Real Life Violence Situations” dataset yield an accuracy of 0.95, placing our proposed model second on the leaderboard for that dataset.
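
The super-image idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the grid layout, the number of frames, or how incomplete grids are handled, so the square grid, the row-by-row temporal ordering, the black padding, and the name `make_super_image` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Optional

import numpy as np


def make_super_image(frames: np.ndarray, grid: Optional[int] = None) -> np.ndarray:
    """Tile T frames of shape (T, H, W, C) into one (grid*H, grid*W, C) image.

    Assumed layout: frames are placed row by row in temporal order,
    and unused grid cells are left black (zeros).
    """
    t, h, w, c = frames.shape
    if grid is None:
        # Smallest square grid that can hold all T frames.
        grid = math.ceil(math.sqrt(t))
    if grid * grid < t:
        raise ValueError("grid too small for the number of frames")

    # Pad with black frames so the clip fills the grid exactly.
    padded = np.zeros((grid * grid, h, w, c), dtype=frames.dtype)
    padded[:t] = frames

    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    tiled = padded.reshape(grid, grid, h, w, c).transpose(0, 2, 1, 3, 4)
    return tiled.reshape(grid * h, grid * w, c)


# Hypothetical clip: 9 RGB frames of 4x4 pixels, frame i filled with value i.
clip = np.stack([np.full((4, 4, 3), i, dtype=np.uint8) for i in range(9)])
super_image = make_super_image(clip)
```

In this sketch the resulting array can be passed to any 2D image classifier, so temporal order becomes spatial position inside the grid; whether the paper uses the same ordering or grid size is not stated in the abstract.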

A Novel ConvMixer Transformer Based Architecture for Violent Behavior Detection / Alfarano, A.; De Magistris, G.; Mongelli, L.; Russo, S.; Starczewski, J.; Napoli, C. - 14126:(2023), pp. 3-16. (Paper presented at the 22nd International Conference on Artificial Intelligence and Soft Computing, ICAISC 2023, held in pol) [10.1007/978-3-031-42508-0_1].

A Novel ConvMixer Transformer Based Architecture for Violent Behavior Detection

De Magistris G. (co-first author; Investigation)
Russo S. (co-first author; Conceptualization)
Napoli C. (last author; Supervision)
2023

22nd International Conference on Artificial Intelligence and Soft Computing, ICAISC 2023
Action Recognition; ConvMixer; SuperImage; Violence Detection
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1691853