A deep understanding of pedestrian intention and crossing behavior is crucial in applications such as pedestrian attribute recognition and autonomous driving. Vehicles must predict pedestrian movements accurately for safety, while recognition and re-identification systems rely on behavioral cues to improve identity tracking and attribute analysis. Traditional trajectory-based methods for pedestrian intention estimation predict the future positions of pedestrians from their past movements but may fail to capture their true intentions. A more effective approach would anticipate actions by analyzing the underlying intent, improving the precision of both pedestrian recognition and motion prediction. Current research on estimating pedestrian intentions depends primarily on supervised learning. In contrast, this work takes an unsupervised approach to learning intention representations, based on the idea that similar intentions lead to comparable behaviors among pedestrians and can therefore be clustered. To this end, the paper introduces UnPIE, an unsupervised method for predicting pedestrian intentions. UnPIE uses Spatio-Temporal Graph Convolutional Networks to encode intentions from videos and map them into a D-dimensional latent space. Training combines Instance Recognition, which increases the separation between embeddings from different classes, with Local Aggregation, which forms soft clusters of related embeddings. A supervised non-parametric classifier is used to evaluate the method. The results show that UnPIE performs comparably to supervised approaches and even surpasses them in Precision, by about 7%, on the Pedestrian Intention Estimation dataset.
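
To make the two ideas at either end of this pipeline more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that Instance Recognition takes the common softmax-over-a-memory-bank form and that the non-parametric classifier is a similarity-weighted k-nearest-neighbour vote; the record states neither detail. The function names, the temperature of 0.07, k = 3, the class labels, and the toy 2-dimensional embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def instance_recognition_prob(i, embedding, bank, temperature=0.07):
    """Probability that `embedding` is recognized as stored instance `i`.

    Assumed form: a softmax over similarities to every entry of a memory
    bank. Training would maximize this probability for the true instance.
    """
    logits = [cosine(embedding, e) / temperature for e in bank]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[i] / sum(exps)


def knn_predict(query, bank, labels, k=3, temperature=0.07):
    """Similarity-weighted k-NN vote over labeled embeddings (assumed evaluator)."""
    sims = [(cosine(query, e), y) for e, y in zip(bank, labels)]
    top = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    votes = defaultdict(float)
    for sim, y in top:
        votes[y] += math.exp(sim / temperature)
    return max(votes, key=votes.get)


# Hypothetical toy embeddings standing in for encoder outputs (D = 2).
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["crossing", "crossing", "not_crossing", "not_crossing"]

if __name__ == "__main__":
    print(knn_predict([0.8, 0.2], bank, labels))
    print(instance_recognition_prob(0, bank[0], bank))
```

In this sketch the labels are used only at evaluation time, inside `knn_predict`, which matches the abstract's description of an unsupervised representation assessed with a supervised classifier.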

Unsupervised pedestrian intention estimation through deep neural embeddings and spatio-temporal graph convolutional networks / Scaccia, S.; Pro, F.; Amerini, I.. - In: PATTERN ANALYSIS AND APPLICATIONS. - ISSN 1433-7541. - 28:2(2025). [10.1007/s10044-025-01483-0]

Unsupervised pedestrian intention estimation through deep neural embeddings and spatio-temporal graph convolutional networks

Scaccia S.; Pro F.; Amerini I.
2025

Graph convolutional networks; Pedestrian intention estimation; Unsupervised learning; Video classification
01 Journal publication::01a Journal article
Files attached to this item
File: Scaccia_Unsupervised_2025.pdf
Access: open access
Note: DOI 10.1007/s10044-025-01483-0
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 2.46 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1741900
Citations
  • PMC ND
  • Scopus 1
  • ISI 1