On the Evaluation of Video-Based Crowd Counting Models / Ledda, E.; Putzu, L.; Delussu, R.; Fumera, G.; Roli, F. - vol. 13233 (2022), pp. 301-311. (21st International Conference on Image Analysis and Processing, Lecce, Italy) [10.1007/978-3-031-06433-3_26].

On the Evaluation of Video-Based Crowd Counting Models

Ledda E.; 2022

Abstract

Crowd counting is a challenging and relevant computer vision task. Most existing methods are image-based, i.e., they exploit only the spatial information of a single image to estimate the corresponding people count. Recently, video-based methods have been proposed to improve counting accuracy by also exploiting temporal information arising from the correlation between adjacent frames. In this work, we point out the need to properly evaluate the specific contribution of temporal information over spatial information. This issue has not been discussed in existing work, and in some cases the evaluation has been carried out in a way that may overestimate the contribution of temporal information. To address this issue, we propose a categorisation of existing video-based models, discuss how existing work has evaluated the contribution of temporal information, and propose an evaluation approach that provides a more complete assessment for two different categories of video-based methods. Finally, we illustrate our approach for a specific category through experiments on several benchmark video data sets.
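The abstract does not spell out the proposed evaluation protocol. As a generic illustration only, assuming the common density-map formulation (a frame's estimated count is the sum of its predicted density map) and the usual MAE/RMSE count metrics, the following sketch compares a video-based model against an image-based counterpart evaluated on the same frames. The function names (`count_from_density`, `mae_rmse`, `temporal_gain`) are hypothetical, and this is not the authors' procedure.

```python
import math
from typing import Dict, Sequence, Tuple


def count_from_density(density_map: Sequence[Sequence[float]]) -> float:
    """Estimated people count of one frame: the sum of its density map.

    Assumes the density-map formulation of crowd counting."""
    return float(sum(sum(row) for row in density_map))


def mae_rmse(pred: Sequence[float], true: Sequence[float]) -> Tuple[float, float]:
    """Mean absolute error and root mean squared error of per-frame counts."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    errors = [p - t for p, t in zip(pred, true)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def temporal_gain(video_pred: Sequence[float],
                  image_pred: Sequence[float],
                  true: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: error reduction of a video-based model over an
    image-based counterpart on the same frames (positive = video model better)."""
    v_mae, v_rmse = mae_rmse(video_pred, true)
    i_mae, i_rmse = mae_rmse(image_pred, true)
    return {"mae_gain": i_mae - v_mae, "rmse_gain": i_rmse - v_rmse}
```

A positive gain suggests a temporal contribution only if both models share the same spatial component and training setup; otherwise part of the difference may stem from spatial modelling rather than from the use of adjacent frames.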
Year: 2022
Conference: 21st International Conference on Image Analysis and Processing
Keywords: Video-based crowd counting and density estimation; Spatial-temporal information
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
Ledda_On-The-Evaluation_2022.pdf (access restricted to archive managers)
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 894.91 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1671729
Citations
  • PubMed Central: N/A
  • Scopus: 1
  • Web of Science: 2