Spatio-temporal perception in field robotics: robust tracking and mapping under data scarcity / Saraceni, L. - (2026 May 21).

Spatio-temporal perception in field robotics: robust tracking and mapping under data scarcity

SARACENI, LEONARDO
21/05/2026

Abstract

Field robots operating in agricultural and urban environments must interpret the world from noisy and partial observations acquired under changing appearance, occlusions, and platform motion. This thesis investigates how to design spatio-temporal perception pipelines that extend beyond frame-wise recognition, with particular emphasis on multi-object tracking and on representations that can be directly consumed by downstream decision-making modules. The dissertation is organized around three fundamental gaps in spatio-temporal perception for robotics: the Data Gap, arising from limited supervision and covariate shift across domains; the Association Gap, concerning identity preservation under ambiguity and prolonged occlusion; and the Decision Gap, referring to the difficulty of transforming perception outputs into action-ready state representations. To address the Data Gap, the thesis develops target-domain adaptation strategies that exploit geometric and temporal consistency to reduce manual annotation requirements. These include automatic label propagation across time and geometry-aware synthetic data generation pipelines for agricultural imagery. To address the Association Gap, two complementary directions are pursued. First, motion-centric and layout-aware 2D tracking methods are introduced for visually homogeneous crop environments. Second, a probabilistic 3D multi-object tracker is formulated as a factor-graph optimization problem, where geometric constraints and appearance embeddings are integrated within a unified inference objective, enabling globally consistent data association across time. To address the Decision Gap and bridge perception and action, the thesis proposes a semantic mapping and decision-making solution for selective table grape harvesting. Tracked observations are fused into persistent object hypotheses enriched with geometric and quality attributes. A reachability-graph formulation supports non-greedy action selection under accessibility constraints and time-varying quality estimates. The proposed methodologies are validated through extensive field campaigns in precision agriculture and through cross-domain evaluation in urban mapping scenarios. Together, they enhance the spatio-temporal perception capabilities of robotic systems operating in unstructured environments.
Files attached to this record

File: Tesi_dottorato_Saraceni.pdf
Access: open access
Note: full thesis
Type: Doctoral thesis
License: Creative Commons
Size: 28.97 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1769985