Text-Independent Speaker Identification using Mel-Frequency Energy Coefficients and Convolutional Neural Networks
Date
2020
Publisher
IEEE
Abstract
Automatic Speaker Identification (ASI) is a biometric technique that has achieved reliability in real applications using standard feature extraction methods, such as Linear Predictive Cepstral Coefficients (LPCC) and Perceptual Linear Prediction (PLP), and modeling methods, such as the Gaussian Mixture Model (GMM). However, the success of these hand-crafted approaches was soon limited by the emergence of big data and the difficulty of handling large amounts of data with such pipelines, which led researchers toward automatic methods such as deep neural networks. In this work, a Convolutional Neural Network (CNN) is proposed for text-independent speaker identification. The Mel-Frequency Energy Coefficients (MFEC) method is used to extract features from the audio signals, and the resulting coefficients are fed into the CNN model for classification (identification). The proposed method is also compared with existing traditional methods. Experimental results show that the proposed architecture achieves a speaker identification rate of 97.89%, substantially higher than the rates obtained with earlier state-of-the-art methods.
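MFEC features are commonly described as log mel-filterbank energies, that is, the MFCC pipeline stopped before the final DCT, so the time-frequency structure is kept for the CNN. The NumPy sketch below computes such features under assumed parameters (16 kHz audio, 0.97 pre-emphasis, 25 ms Hamming frames with a 10 ms hop, 512-point FFT, 40 HTK-style triangular filters). These values and the names `mfec` and `mel_filterbank` are hypothetical illustrations, not the configuration reported in this work.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        # Rising slope; the loop is empty if left == center, so no division by zero
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / (center - left)
        # Falling slope; likewise empty if center == right
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfec(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512, n_filters=40):
    """Log mel-filterbank energies, shape (n_frames, n_filters).

    All default parameters are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies (assumed coefficient 0.97)
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix: row t holds the sample indices of frame t
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each zero-padded frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Energy in each mel band, then log (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(np.maximum(energies, 1e-10))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = mfec(rng.standard_normal(16000))  # one second of noise at 16 kHz
    print(feats.shape)  # (98, 40)
```

Each row of the returned matrix describes one frame, so the whole matrix is a time-by-frequency map that a 2-D CNN can take as a single-channel image. The abstract does not describe the network's input size or layers, so no CNN is sketched here.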
Keywords
Automatic Speaker Identification (ASI), Mel-Frequency Energy Coefficients (MFEC), Convolutional Neural Network (CNN)
