Tracking objects across multiple video frames is a challenging task due to issues such as occlusions, background clutter, lighting changes, and variations in object and camera viewpoint, all of which directly affect object detection. These issues are even more pronounced in images acquired by unmanned aerial vehicles (UAVs), where the vehicle movement can also degrade image quality. A common strategy for addressing them is to analyze the input images at different scales, gathering as much information as possible to correctly detect and track objects across video sequences. Following this rationale, in this paper we introduce a simple yet effective novel multi-stream (MS) architecture, in which a different kernel size is applied in each stream to simulate multi-scale image analysis. The proposed architecture is then used as the backbone of the well-known Faster R-CNN pipeline, defining an MS-Faster R-CNN object detector that consistently detects objects in video sequences. Subsequently, this detector is used jointly with the Simple Online and Real-time Tracking with a Deep Association Metric (Deep SORT) algorithm to achieve real-time tracking on UAV images. To assess the presented architecture, extensive experiments were performed on the UMCD, UAVDT, UAV20L, and UAV123 datasets. The presented pipeline achieved state-of-the-art performance, confirming that the proposed multi-stream method can correctly emulate the robust multi-scale image analysis paradigm.
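The multi-stream idea described in the abstract, where each stream filters the same input with a different kernel size, can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration: the kernel sizes (3, 5, 7), the fixed averaging kernels used in place of learned weights, the zero padding, and the helper names `filter2d_same`, `box_kernel`, and `multi_stream_features`. It is not the configuration or the code used in the paper.

```python
# Illustrative sketch only: kernel sizes, averaging kernels, zero padding and
# channel-wise stacking are assumptions, not the paper's actual backbone.

def filter2d_same(image, kernel):
    """Slide a square, odd-sized kernel over a 2D list (cross-correlation,
    which deep learning frameworks call convolution), using zero padding so
    the output keeps the input's height and width."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2  # kernel radius
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    # Positions outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def box_kernel(size):
    """Averaging kernel standing in for learned weights (hypothetical)."""
    v = 1.0 / (size * size)
    return [[v] * size for _ in range(size)]


def multi_stream_features(image, kernel_sizes=(3, 5, 7)):
    """Run one stream per kernel size on the same input and stack the
    resulting maps as channels, so each channel reflects a different
    receptive field."""
    return [filter2d_same(image, box_kernel(k)) for k in kernel_sizes]


if __name__ == "__main__":
    image = [[1.0] * 8 for _ in range(8)]
    features = multi_stream_features(image)
    print(len(features), len(features[0]), len(features[0][0]))
```

In a trained detector the kernels would be learned and the stacked maps would feed the region proposal stage; larger kernels aggregate a wider neighbourhood per output pixel, which is how the streams approximate analysis at several scales without resizing the image.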

MS-Faster R-CNN: multi-stream backbone for improved Faster R-CNN object detection and aerial tracking from UAV images / Avola, D.; Cinque, L.; Diko, A.; Fagioli, A.; Foresti, G. L.; Mecca, A.; Pannone, D.; Piciarelli, C. - In: REMOTE SENSING. - ISSN 2072-4292. - 13:9(2021), pp. 1-18. [10.3390/rs13091670]

MS-Faster R-CNN: multi-stream backbone for improved Faster R-CNN object detection and aerial tracking from UAV images

Avola D. (first author); Cinque L.; Diko A.; Fagioli A.; Foresti G. L.; Mecca A.; Pannone D.; Piciarelli C.
2021

Keywords: aerial images; deep learning; object detection; tracking; UAV
01 Journal publication::01a Journal article
Files attached to this record

File: Avola_MS-faster__2021.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.22 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1553510
Citations
  • PubMed Central: not available
  • Scopus: 48
  • Web of Science: 40