Voice Spoofing Detection Corpus (VSDC)

Dataset Description

Created and maintained by Roland Baumann, Khalid Malik, Ali Javed, Andersen Ball, Brandon Kujawa, and Hafiz Malik.

This dataset is intended as a standard corpus containing original audio files recorded in varying environments and with varying microphones, as well as spoofed audio files generated in diverse controlled environments that mimic realistic scenarios. The data can be used to analyze replay-type audio spoofing attacks and to develop countermeasures against them. The rapid growth of Internet of Things (IoT) devices has made audio replay attacks more relevant, because one IoT device can serve as a point of replay to another device, which creates an ideal environment for such attacks. The dataset was created with a focus on imitating these attacks in a controlled setup. While its focus is on detecting multihop scenarios, where audio is replayed through one device to another, it is not limited to this purpose. For example, it can be used to study traditional replay attacks, the effect that different microphones and environments have on an audio recording, or how an individual's vocal range affects the accuracy of a voice control system.
Download Dataset