Machine learning for video event recognition / Avola, D.; Cascio, M.; Cinque, L.; Foresti, G. L.; Pannone, D. - In: INTEGRATED COMPUTER-AIDED ENGINEERING. - ISSN 1069-2509. - 28:3(2021), pp. 309-332. [10.3233/ica-210652]
Machine learning for video event recognition
Avola D. (first author); Cascio M.; Cinque L.; Foresti G. L.; Pannone D.
2021
Abstract
In recent years, the spread of video sensor networks in both public and private areas has grown considerably. Smart algorithms for understanding the semantic content of video are increasingly developed to support human operators in monitoring different activities by recognizing events that occur in the observed scene. By event, we mean one or more actions performed by one or more subjects (e.g., people or vehicles) acting within the same observed area. When these actions are performed by subjects that do not interact with each other, the events are usually classified as simple. When any kind of interaction occurs among subjects, the events involved are typically classified as complex. This survey starts by providing formal definitions of both scene and event, together with the logical architecture of a generic event recognition system. It then presents two taxonomies, based on features and on machine learning algorithms, respectively, which are used to describe the different approaches to recognizing events within a video sequence. The paper also discusses key works in the current state of the art of event recognition and lists the datasets used to evaluate the performance of the reported methods for video content understanding.