A Novel ConvMixer Transformer Based Architecture for Violent Behavior Detection / Alfarano, A.; De Magistris, G.; Mongelli, L.; Russo, S.; Starczewski, J.; Napoli, C. - 14126:(2023), pp. 3-16. (Paper presented at the International Conference on Artificial Intelligence and Soft Computing, held in Zakopane, Poland) [10.1007/978-3-031-42508-0_1].
A Novel ConvMixer Transformer Based Architecture for Violent Behavior Detection
De Magistris G. (co-first author; Investigation); Russo S. (co-first author; Conceptualization); Napoli C. (last author; Supervision)
2023
Abstract
Nowadays most streets, squares and buildings are monitored by a large number of surveillance cameras. Nevertheless, these cameras are used only to record scenes to be analyzed after crimes or thefts, and not to prevent violent actions automatically. In a few cases there may be a guard who checks the videos manually in real time, but this is a very inefficient and expensive process. In this paper we propose a novel approach to the violence detection task using a recent architecture named ConvMixer, a simple CNN which uses patch-based embeddings in order to obtain superior performance with fewer parameters and computational resources. We also use a technique that consists of arranging frames into super images to encode the temporal information into the spatial dimensions. Our tests on the popular “Real Life Violence Situations” dataset show an accuracy of 0.95, placing our proposed model in second place on the leaderboard for that dataset.

File | Size | Format |
---|---|---|
Alfarano_A-novel_2023.pdf (access: archive managers only; type: publisher's version, published with the publisher's layout; license: all rights reserved) | 624.55 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.