Data Augmentation with ECAPA-TDNN Architecture for Automatic Speaker Recognition

Research output: Conference contribution, peer-reviewed

Abstract

This paper focuses on seven data augmentation methods based on the Emphasized Channel Attention Propagation and Aggregation-Time Delay Neural Network (ECAPA-TDNN) model for increasing the diversity of training data to improve model accuracy and true positive rate (TPR/recall). We propose a method to improve classification performance by replacing and reducing the datasets. We also verified the effect of the number of layers on the classification performance by modifying the number of layers of the SE-Res2Block in the ECAPA-TDNN model. The proposed method is validated with the ZhVoice and VoxCeleb datasets, and the results show that the best model accuracy and classification performance can be obtained by using ZhVoice with seven data augmentations on a 3-layer SE-Res2Block. The accuracy reached 0.9477, the TPR reached 0.8945, and the EER was 0.1278. We also used the diagonal cosine algorithm to determine the similarity between two speakers, validating the classification performance of the model.
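As an illustration of the speaker-similarity step mentioned above, the following is a minimal sketch of plain cosine similarity between two speaker embedding vectors. It is not taken from the paper: the abstract does not define its "diagonal cosine algorithm", so this standard formulation may differ from it. The function names `cosine_similarity` and `same_speaker`, the default threshold of 0.5, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Hypothetical decision rule: accept when similarity meets the threshold.

    The threshold value is an assumption, not a figure from the paper.
    """
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for embeddings produced by a speaker model.
    enrolled = [0.2, 0.7, 0.1]
    probe = [0.25, 0.65, 0.05]
    print(cosine_similarity(enrolled, probe))
    print(same_speaker(enrolled, probe))
```

In practice the threshold would be tuned on held-out trials, for example at the operating point where the false acceptance and false rejection rates are equal, which is how an EER is obtained.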

Original language: English
Title of host publication: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 414-420
Number of pages: 7
ISBN (Electronic): 9798350337938
Publication status: Published - 2023
Event: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023 - Oshawa, Canada
Duration: 29 Aug 2023 to 1 Sept 2023

Publication series

Name: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023

Conference

Conference: 12th IEEE International Conference on Renewable Energy Research and Applications, ICRERA 2023
Country/Territory: Canada
City: Oshawa
Period: 29/08/23 to 1/09/23
