In this paper, we present LJ-TTS, a large-scale single-speaker dataset of real and synthetic speech designed to support research in text-to-speech (TTS) synthesis and analysis. The dataset builds upon high-quality recordings of a single English speaker, alongside outputs generated by 11 state-of-the-art TTS models, including both autoregressive and non-autoregressive architectures. By maintaining a controlled single-speaker setting, LJ-TTS enables precise comparison of speech characteristics across different generative models, isolating the effects of synthesis methods from speaker variability. Unlike multi-speaker datasets lacking alignment between real and synthetic samples, LJ-TTS provides exact utterance-level correspondence, allowing fine-grained analyses that are otherwise impractical. The dataset supports systematic evaluation of synthetic speech across multiple dimensions, including deepfake detection, source tracing, and phoneme-level analyses. LJ-TTS provides a standardized resource for benchmarking generative models, assessing the limits of current TTS systems, and developing robust detection and evaluation methods. The dataset is publicly available to the research community to foster reproducible and controlled studies in speech synthesis and synthetic speech detection.

LJ-TTS: A Paired Real and Synthetic Speech Dataset for Single-Speaker TTS Analysis / Negroni, Viola; Salvi, Davide; Comanducci, Luca; Wani, Taiba Majid; Uecker, Madleen; Amerini, Irene; Tubaro, Stefano; Bestagini, Paolo. - In: ELECTRONICS. - ISSN 2079-9292. - 15:1(2026). [10.3390/electronics15010169]

LJ-TTS: A Paired Real and Synthetic Speech Dataset for Single-Speaker TTS Analysis

Wani, Taiba Majid; Amerini, Irene
2026

audio forensics; deepfake detection; generative AI; speech processing; synthetic speech; text-to-speech
01 Journal publication::01a Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1765781