
2D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs / Avola, Danilo; Cascio, Marco; Cinque, Luigi; Foresti, Gian Luca; Massaroni, Cristiano; Rodola, Emanuele. - In: IEEE TRANSACTIONS ON MULTIMEDIA. - ISSN 1520-9210. - (2020), pp. 1-1. [10.1109/TMM.2019.2960588]

2D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs

Avola, Danilo; Cascio, Marco; Cinque, Luigi; Foresti, Gian Luca; Massaroni, Cristiano; Rodola, Emanuele
2020

Abstract

Action recognition in video sequences is an interesting field for many computer vision applications, including behaviour analysis, event recognition, and video surveillance. In this work, a method based on 2D skeletons and two-branch stacked Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) cells is proposed. Unlike 3D skeletons, usually generated by RGB-D cameras, the 2D skeletons adopted in this work are reconstructed starting from RGB video streams, therefore allowing the use of the proposed approach in both indoor and outdoor environments. Moreover, any case of missing skeletal data is managed by exploiting 3D Convolutional Neural Networks (3D-CNNs). Comparative experiments with several key works on the KTH and Weizmann datasets show that the method described in this paper outperforms the current state of the art. Additional experiments on the UCF Sports and IXMAS datasets demonstrate the effectiveness of our method in the presence of noisy data and perspective changes, respectively. Further investigations on UCF Sports, HMDB51, UCF101, and Kinetics400 highlight how the combination of the proposed two-branch stacked LSTM and the 3D-CNN-based network can manage missing skeleton information, greatly improving the overall accuracy. Moreover, additional tests on the KTH and UCF Sports datasets also show the robustness of our approach in the presence of partial body occlusions. Finally, comparisons on the UT-Kinect and NTU RGB+D datasets show that the accuracy of the proposed method is fully comparable to that of works based on 3D skeletons.
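The abstract describes a skeleton-based classifier whose predictions are complemented by a 3D-CNN branch whenever 2D skeleton data cannot be reconstructed. A minimal illustrative sketch of that idea follows, assuming a simple weighted late fusion of per-class scores with a hard fallback to the 3D-CNN branch; the function name, the weight `alpha`, and the fallback rule are assumptions for illustration, not the paper's actual fusion mechanism.

```python
# Hypothetical sketch (not the authors' code): fusing class scores from a
# 2D-skeleton model (e.g. the two-branch stacked LSTM) with scores from a
# 3D-CNN, falling back to the 3D-CNN alone when no skeleton was recovered.

def fuse_scores(skeleton_scores, cnn_scores, skeleton_available, alpha=0.5):
    """Weighted late fusion of two per-class score vectors.

    When skeleton_available is False (no 2D skeleton could be reconstructed
    for the clip), only the 3D-CNN scores are used.
    """
    if not skeleton_available:
        return list(cnn_scores)
    # Element-wise convex combination of the two branches' scores.
    return [alpha * s + (1 - alpha) * c
            for s, c in zip(skeleton_scores, cnn_scores)]

# Toy example with three action classes.
skel = [0.7, 0.2, 0.1]
cnn = [0.5, 0.3, 0.2]
print(fuse_scores(skel, cnn, skeleton_available=True))
print(fuse_scores(skel, cnn, skeleton_available=False))
```

With `alpha=0.5` this is a plain average of the two branches; in practice such a weight would be tuned on validation data.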
action recognition; 2D skeleton; recurrent neural networks (RNNs); long short-term memory (LSTM)
01 Journal publication::01a Journal article
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1350658
Citations
  • PMC: ND
  • Scopus: 54
  • Web of Science: 41