Text-Independent Speaker Identification using Mel-Frequency Energy Coefficients and Convolutional Neural Networks

Date

2020

Publisher

IEEE

Abstract

Automatic Speaker Identification (ASI) is a biometric technique that has achieved reliability in real applications using standard feature extraction methods such as Linear Predictive Cepstral Coefficients (LPCC) and Perceptual Linear Prediction (PLP), together with modeling methods such as the Gaussian mixture model (GMM). However, the success of these manual approaches was quickly hampered by the emergence of big data and the inability of scientists to handle large amounts of data, which led researchers to move towards automatic methods such as deep neural networks. In this work, a Convolutional Neural Network (CNN) is proposed for text-independent speaker identification. The Mel-Frequency Energy Coefficients (MFEC) method was used to extract features from the audio signals, and the resulting coefficients were fed into the CNN model for classification (identification). In addition, the proposed method was compared with existing traditional methods. Experimental results show that the proposed structure achieves a speaker identification rate of 97.89%, which is much higher than the rates obtained with earlier state-of-the-art methods.
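
The abstract does not give the details of the MFEC front end, so the following is only a minimal sketch of how such features are commonly computed, assuming MFEC means log mel filterbank energies (the MFCC pipeline without the final DCT). The function names and all parameters (16 kHz sampling, 25 ms frames, 10 ms hop, 512-point FFT, 40 filters) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfec(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log mel filterbank energies, one row per frame (assumed settings, not the paper's).

    Assumes len(signal) >= frame_len.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)            # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)                          # epsilon avoids log(0)

# One second of synthetic noise stands in for a speech recording.
features = mfec(np.random.default_rng(0).standard_normal(16000))
print(features.shape)
```

The resulting frames-by-filters matrix can be treated as a single-channel image and passed to a CNN classifier with one output per enrolled speaker; the network architecture itself is not described in the abstract and is therefore not sketched here.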

Keywords

Automatic Speaker Identification (ASI), Mel-Frequency Energy Coefficients (MFEC), Convolutional Neural Network (CNN)
