
Towards Multi-View Hand Pose Recognition Using a Fusion of Image Embeddings and Leap 2 Landmarks / Esteban-Romero, Sergio; Lanzino, Romeo; Marini, Marco; Gil-Martín, Manuel. - 3:(2025), pp. 918-925. (17th International Conference on Agents and Artificial Intelligence, ICAART 2025, Porto, Portugal) [10.5220/0013234300003890].

Towards Multi-View Hand Pose Recognition Using a Fusion of Image Embeddings and Leap 2 Landmarks

Lanzino, Romeo; Marini, Marco
2025

Abstract

This paper presents a novel approach to multi-view hand pose recognition based on image embeddings and hand landmarks. The method integrates raw image data with structural hand landmarks obtained from the Leap Motion Controller 2. A pretrained Vision Transformer (ViT) was used to extract visual features from dual-view grayscale images, and these features were fused with the corresponding Leap 2 hand landmarks to create a multimodal representation that captures both visual and landmark information for each sample. The fused embeddings were then classified with a multi-layer perceptron to distinguish among 17 hand poses from the Multi-view Leap2 Hand Pose Dataset, which includes data from 21 subjects. Using a Leave-One-Subject-Out Cross-Validation (LOSO-CV) strategy, we show that this fusion approach achieves robust recognition performance (F1 score of 79.33 ± 0.09%), particularly in scenarios where hand occlusions or challenging viewing angles may limit the utility of single-modality data.
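The data flow described in the abstract (per-view image embeddings fused with landmarks, a classifier, and subject-wise evaluation) can be sketched as below. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: fusion is assumed to be plain concatenation of the two per-view ViT embeddings with the Leap 2 landmark vector (the abstract does not name the fusion operator); a nearest-centroid classifier is swapped in for the paper's multi-layer perceptron so the example has no dependencies; the F1 score is assumed to be macro-averaged over poses and summarized as the mean and population standard deviation across folds. The helper names (`fuse`, `loso_folds`, `macro_f1`, `evaluate_loso`, `toy_dataset`) and the toy feature sizes are hypothetical.

```python
# Hypothetical sketch: early fusion + Leave-One-Subject-Out evaluation.
# Assumptions: concatenation fusion, nearest-centroid stand-in for the MLP, macro F1.
from statistics import mean, pstdev
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Vector = List[float]
FitPredict = Callable[[List[Vector], List[str], List[Vector]], List[str]]


def fuse(view_a: Sequence[float], view_b: Sequence[float],
         landmarks: Sequence[float]) -> Vector:
    # Assumed early fusion: [ViT(view A) | ViT(view B) | Leap 2 landmarks].
    return list(view_a) + list(view_b) + list(landmarks)


def loso_folds(subjects: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    # One fold per subject: that subject's samples form the test set,
    # all other subjects' samples form the training set.
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def nearest_centroid_fit_predict(x_train: List[Vector], y_train: List[str],
                                 x_test: List[Vector]) -> List[str]:
    # Stand-in classifier, swapped in for the paper's MLP to stay stdlib-only.
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(x_train, y_train):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Predict the class whose centroid is closest in squared Euclidean distance.
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in x_test]


def evaluate_loso(x: List[Vector], y: List[str], subjects: List[str],
                  fit_predict: FitPredict) -> Tuple[float, float, Dict[str, float]]:
    # Train on all subjects but one, score on the held-out subject, repeat.
    per_fold: Dict[str, float] = {}
    for held_out, train, test in loso_folds(subjects):
        preds = fit_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in test])
        per_fold[held_out] = macro_f1([y[i] for i in test], preds)
    scores = list(per_fold.values())
    # Assumed aggregation: mean and population std of per-fold scores.
    return mean(scores), pstdev(scores), per_fold


def toy_dataset() -> Tuple[List[Vector], List[str], List[str]]:
    # Hypothetical toy data: 3 subjects x 2 poses, 2-D "ViT" embeddings per view
    # and 3 "landmark" values; real features are far larger.
    x, y, subjects = [], [], []
    for subject, offset in [("s1", 0.0), ("s2", 0.1), ("s3", -0.1)]:
        for pose, base in [("fist", 0.0), ("open", 1.0)]:
            v = base + offset
            x.append(fuse([v, v], [v, v], [v, v, v]))
            y.append(pose)
            subjects.append(subject)
    return x, y, subjects


if __name__ == "__main__":
    f1_mean, f1_std, folds = evaluate_loso(*toy_dataset(), nearest_centroid_fit_predict)
    print(f"LOSO macro-F1: {f1_mean:.2f} +/- {f1_std:.2f}", folds)
```

Replacing `nearest_centroid_fit_predict` with a trained MLP leaves the LOSO harness unchanged. The property that matters is that each fold's test set contains only the held-out subject, so no subject's samples appear in both training and test data, which is what makes the reported score a measure of generalization to unseen users.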
Year: 2025
Conference: 17th International Conference on Agents and Artificial Intelligence, ICAART 2025
Keywords: Deep Learning; Leap Motion Controller 2; Multi-View Hand Pose Recognition; Multimodal Data; Multimodal Fusion
Type: 04 Publication in conference proceedings::04b Conference paper in volume


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1760643
