
Length-Aware Motion Synthesis via Latent Diffusion / Sampieri, Alessio; Palma, Alessio; Spinelli, Indro; Galasso, Fabio. - 15111 LNCS:(2025), pp. 107-124. ( 18th European Conference on Computer Vision, ECCV 2024 Milan; Italy ) [10.1007/978-3-031-73668-1_7].

Length-Aware Motion Synthesis via Latent Diffusion

Alessio Sampieri; Alessio Palma; Indro Spinelli; Fabio Galasso
2025

Abstract

The target duration of a synthesized human motion is a critical attribute that requires modeling control over the motion dynamics and style. Speeding up an action performance is not merely fast-forwarding it. However, state-of-the-art techniques for human behavior synthesis have limited control over the target sequence length. We introduce the problem of generating length-aware 3D human motion sequences from textual descriptors, and we propose a novel model to synthesize motions of variable target lengths, which we dub "Length-Aware Latent Diffusion" (LADiff). LADiff consists of two new modules: 1) a length-aware variational auto-encoder to learn motion representations with length-dependent latent codes; 2) a length-conforming latent diffusion model to generate motions with a richness of details that increases with the required target sequence length. LADiff significantly improves over the state-of-the-art across most of the existing motion synthesis metrics on the two established benchmarks of HumanML3D and KIT-ML. The code will be open-sourced.
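The abstract describes a latent space whose effective capacity grows with the requested motion length, so that longer sequences receive richer latent detail. The snippet below is a minimal, purely illustrative sketch of that idea and not the paper's actual mechanism; the function name, the token budget, and the proportional activation rule are all assumptions made for illustration.

```python
import numpy as np

def length_dependent_mask(max_tokens: int, target_len: int, max_len: int) -> np.ndarray:
    """Activate a number of latent tokens proportional to the target length.

    Hypothetical illustration of a length-dependent latent code: a short
    target motion uses few latent tokens, a long one uses many, so the
    representational budget scales with the requested duration.
    """
    n_active = max(1, int(np.ceil(max_tokens * target_len / max_len)))
    mask = np.zeros(max_tokens, dtype=bool)
    mask[:n_active] = True
    return mask

# A short target activates few latent tokens; a long one activates many.
short_mask = length_dependent_mask(max_tokens=8, target_len=40, max_len=196)
long_mask = length_dependent_mask(max_tokens=8, target_len=180, max_len=196)
```

A diffusion model operating in such a space would then denoise only the active tokens, which is one conceivable way a generator's output detail could conform to the target sequence length.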
18th European Conference on Computer Vision, ECCV 2024
computer vision, machine learning, generative artificial intelligence, motion synthesis
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726561
Warning: the displayed data have not been validated by the university.

Citations
  • PMC: n/a
  • Scopus: 2
  • Web of Science (ISI): 0