Trajectory estimation from stereo image sequences remains a fundamental challenge in Visual Simultaneous Localization and Mapping (V-SLAM). To address this, we propose a novel approach that focuses on the identification and matching of keypoints within a transformed domain that emphasizes visually significant features. Specifically, we propose to perform V-SLAM in a VIsual Localization Domain (VILD), i.e., a domain where visually relevant feature are suitably represented for analysis and tracking. This transformed domain adheres to information-theoretic principles, enabling a maximum likelihood estimation of rotation, translation, and scaling parameters by minimizing the distance between the coefficients of the observed image and those of a reference template. The transformed coefficients are obtained from the output of specialized Circular Harmonic Function (CHF) filters of varying orders. Leveraging this property, we employ a first-order approximation of the image-series representation, directly computing the first-order coefficients through the application of first-order CHF filters. The proposed VILD provides a theoretically grounded and visually relevant representation of the image. We utilize VILD for point matching and tracking across the stereo video sequence. The experimental results on real-world video datasets demonstrate that integrating visually-driven filtering significantly improves trajectory estimation accuracy compared to traditional tracking performed in the spatial domain.
Visual Localization Domain for Accurate V-SLAM from Stereo Cameras / Di Salvo, E.; Bellucci, S.; Celidonio, V.; Rossini, I.; Colonnese, S.; Cattai, T.. - In: SENSORS. - ISSN 1424-8220. - 25:3(2025). [10.3390/s25030739]
Visual Localization Domain for Accurate V-SLAM from Stereo Cameras
Di Salvo E.;Colonnese S.;Cattai T.
2025
Abstract
Trajectory estimation from stereo image sequences remains a fundamental challenge in Visual Simultaneous Localization and Mapping (V-SLAM). To address this, we propose a novel approach that focuses on the identification and matching of keypoints within a transformed domain that emphasizes visually significant features. Specifically, we propose to perform V-SLAM in a VIsual Localization Domain (VILD), i.e., a domain where visually relevant feature are suitably represented for analysis and tracking. This transformed domain adheres to information-theoretic principles, enabling a maximum likelihood estimation of rotation, translation, and scaling parameters by minimizing the distance between the coefficients of the observed image and those of a reference template. The transformed coefficients are obtained from the output of specialized Circular Harmonic Function (CHF) filters of varying orders. Leveraging this property, we employ a first-order approximation of the image-series representation, directly computing the first-order coefficients through the application of first-order CHF filters. The proposed VILD provides a theoretically grounded and visually relevant representation of the image. We utilize VILD for point matching and tracking across the stereo video sequence. The experimental results on real-world video datasets demonstrate that integrating visually-driven filtering significantly improves trajectory estimation accuracy compared to traditional tracking performed in the spatial domain.| File | Dimensione | Formato | |
|---|---|---|---|
|
Di Salvo_Visual Localization Domain_2025.pdf
accesso aperto
Note: Manuscript
Tipologia:
Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione)
Licenza:
Creative commons
Dimensione
6.43 MB
Formato
Adobe PDF
|
6.43 MB | Adobe PDF |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


