Action recognition in video sequences is an interesting field for many computer vision applications, including behaviour analysis, event recognition, and video surveillance. In this work, a method based on 2D skeletons and two-branch stacked Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) cells is proposed. Unlike 3D skeletons, usually generated by RGB-D cameras, the 2D skeletons adopted in this work are reconstructed starting from RGB video streams, therefore allowing the use of the proposed approach in both indoor and outdoor environments. Moreover, any case of missing skeletal data is managed by exploiting 3D-Convolutional Neural Networks (3D-CNNs). Comparative experiments with several key works on KTH and Weizmann datasets show that the method described in this paper outperforms the current state of the art. Additional experiments on UCF Sports and IXMAS datasets demonstrate the effectiveness of our method in the presence of noisy data and perspective changes, respectively. Further investigations on UCF Sports, HMDB51, UCF101, and Kinetics400 highlight how the combination of the proposed two-branch stacked LSTM and the 3D-CNN-based network can manage missing skeleton information, greatly improving the overall accuracy. Moreover, additional tests on KTH and UCF Sports datasets also show the robustness of our approach in the presence of partial body occlusions. Finally, comparisons on UT-Kinect and NTU-RGB+D datasets show that the accuracy of the proposed method is fully comparable to that of works based on 3D skeletons.
2D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs / Avola, Danilo; Cascio, Marco; Cinque, Luigi; Foresti, Gian Luca; Massaroni, Cristiano; Rodola, Emanuele. - In: IEEE TRANSACTIONS ON MULTIMEDIA. - ISSN 1520-9210. - (2020), pp. 1-1. [10.1109/TMM.2019.2960588]