Data Augmentation with ECAPA-TDNN Architecture for Automatic Speaker Recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper studies seven data augmentation methods applied to the Emphasized Channel Attention Propagation and Aggregation-Time Delay Neural Network (ECAPA-TDNN) model, with the aim of increasing the diversity of the training data and thereby improving model accuracy and true positive rate (TPR, also called recall). We propose a method to improve classification performance by replacing and reducing the datasets. We also examine how depth affects classification performance by varying the number of layers of the SE-Res2Block in the ECAPA-TDNN model. The proposed method is validated on the ZhVoice and VoxCeleb datasets. The best model accuracy and classification performance are obtained by using ZhVoice with seven data augmentations on a 3-layer SE-Res2Block: the accuracy reached 0.9477, the TPR reached 0.8945, and the equal error rate (EER) was 0.1278. We also used the diagonal cosine algorithm to determine the similarity between two speakers, validating the classification performance of the model.
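
The similarity check and the TPR metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the "diagonal cosine algorithm" behaves like the standard cosine similarity between two speaker embedding vectors, which the abstract does not confirm. The function names, the toy vectors, and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard cosine similarity between two embedding vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept the pair as one speaker if the score reaches a hypothetical threshold."""
    return cosine_similarity(a, b) >= threshold


def true_positive_rate(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """TPR (recall) = TP / (TP + FN), computed over the positive trials only."""
    tp = sum(1 for l, p in zip(labels, predictions) if l and p)
    fn = sum(1 for l, p in zip(labels, predictions) if l and not p)
    if tp + fn == 0:
        raise ValueError("no positive trials")
    return tp / (tp + fn)


if __name__ == "__main__":
    # Toy embeddings, not real ECAPA-TDNN outputs.
    enrolled = [0.2, 0.9, 0.1]
    test = [0.25, 0.85, 0.05]
    print(cosine_similarity(enrolled, test), same_speaker(enrolled, test))
```

In practice the embeddings would come from the trained model, and the threshold would be tuned on held-out trials, for example at the operating point where the false acceptance and false rejection rates are equal (the EER).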

Original language: English
Title of host publication: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 414-420
Number of pages: 7
ISBN (Electronic): 9798350337938
Publication status: Published - 2023
Event: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023 - Oshawa, Canada
Duration: 29 Aug 2023 to 1 Sept 2023

Publication series

Name: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023

Conference

Conference: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023
Country/Territory: Canada
City: Oshawa
Period: 29/08/23 to 1/09/23

Keywords

  • ECAPA-TDNN
  • automatic speaker recognition
  • data augmentation
