Improved voice-based biometrics using multi-channel transfer learning
Date
2020
Publisher
Digital Library
Abstract
Speaker identification has become increasingly important, since most personal and professional
appliances now rely on voice commands, or on speech more generally, to operate. To be both smart and
safe, these systems need to discern the identity of the speaker rather than just the words that were
said, especially given the numerous advanced methods that have been developed to generate fake speech
segments. The objective of this paper is to improve existing voice-based biometrics so that they keep
up with these synthesizers.
The proposed method focuses on defining novel, more speaker-adapted features by employing artificial
neural networks and transfer learning. The approach uses pre-trained networks to define a mapping from
two complementary acoustic features to speaker-adapted phonetic features. The complementary acoustic
features are paired to provide information about both how the speech segments are perceived
(type 1 feature) and how they are produced (type 2 feature). The approach was evaluated using both a
small and a large closed-speaker data set. Preliminary results are encouraging and confirm the
usefulness of such an approach for extracting speaker-adapted features, whether for classical machine
learning algorithms or for advanced neural structures such as LSTM or CNN.
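The two-channel mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the function names, and the use of fixed random weights as a stand-in for pre-trained networks are all assumptions made for illustration, and the input frames are synthetic placeholders rather than real acoustic features. It shows only the general idea of passing each feature type through its own frozen network and concatenating the outputs into a single feature vector.

```python
import math
import random


def make_frozen_layer(n_in, n_out, seed):
    """Hypothetical stand-in for a pre-trained layer: fixed weights, never updated."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, x):
    """Apply one dense layer followed by a tanh activation."""
    weights, biases = layer
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def speaker_adapted_features(type1_frame, type2_frame, net1, net2):
    """Map each channel through its own frozen network, then concatenate."""
    return forward(net1, type1_frame) + forward(net2, type2_frame)


# Assumed dimensions: 13 values for the perception-related (type 1) frame,
# 10 values for the production-related (type 2) frame, 8 outputs per channel.
net1 = make_frozen_layer(13, 8, seed=0)
net2 = make_frozen_layer(10, 8, seed=1)

# Synthetic placeholder frames; real input would come from an acoustic front end.
rng = random.Random(42)
type1_frame = [rng.gauss(0, 1) for _ in range(13)]
type2_frame = [rng.gauss(0, 1) for _ in range(10)]

features = speaker_adapted_features(type1_frame, type2_frame, net1, net2)
print(len(features))  # 16: eight values from each channel
```

The concatenated vector could then be passed to any downstream classifier, which matches the abstract's claim that the features suit both classical algorithms and neural structures.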
Keywords
Speech Analysis, Transfer Learning, Pattern Recognition, Speaker Recognition, Feature Extraction
