The distinction between synthetic and human voice uses the techniques of the current biometric voice recognition systems, which prevent that a person’s voice, no matter if with good or bad intentions, can be confused with someone else’s. Steganography gives the possibility to hide in a file without a particular value (usually audio, video or image files) a hidden message in such a way as to not rise suspicion to any external observer. This article suggests two methods, applicable in a VoIP hypothetical scenario, which allow us to distinguish a synthetic speech from a human voice, and to insert within the Comfort Noise a text message generated in the pauses of a voice conversation. The first method takes up the studies already carried out for the Modulation Features related to the temporal analysis of the speech signals, while the second one proposes a technique that derives from the Direct Sequence Spread Spectrum, which consists in distributing the signal energy to hide on a wider band transmission. Due to space limits, this paper is only an extended abstract. The full version will contain further details on our research.
Synthetic speech detection and audio steganography in VoIP scenarios / Capolupo, Daniele; D'Amore, Fabrizio. - STAMPA. - 9569:(2016), pp. 145-159. (Intervento presentato al convegno 14th International Workshop on Digital-Forensics and Watermarking, IWDW 2015 tenutosi a Tokyo; Japan nel 2015) [10.1007/978-3-319-31960-5_13].
Synthetic speech detection and audio steganography in VoIP scenarios
D'AMORE, Fabrizio
2016
Abstract
The distinction between synthetic and human voice uses the techniques of the current biometric voice recognition systems, which prevent that a person’s voice, no matter if with good or bad intentions, can be confused with someone else’s. Steganography gives the possibility to hide in a file without a particular value (usually audio, video or image files) a hidden message in such a way as to not rise suspicion to any external observer. This article suggests two methods, applicable in a VoIP hypothetical scenario, which allow us to distinguish a synthetic speech from a human voice, and to insert within the Comfort Noise a text message generated in the pauses of a voice conversation. The first method takes up the studies already carried out for the Modulation Features related to the temporal analysis of the speech signals, while the second one proposes a technique that derives from the Direct Sequence Spread Spectrum, which consists in distributing the signal energy to hide on a wider band transmission. Due to space limits, this paper is only an extended abstract. The full version will contain further details on our research.File | Dimensione | Formato | |
---|---|---|---|
Capolupo_Postprint_Synthetic speech_2016.pdf
accesso aperto
Note: https://link.springer.com/chapter/10.1007/978-3-319-31960-5_13
Tipologia:
Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
3.78 MB
Formato
Adobe PDF
|
3.78 MB | Adobe PDF | |
Capolupo_Synthetic speech_2016.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
5.56 MB
Formato
Adobe PDF
|
5.56 MB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.