Single Channel Speech Denoising
Date
2014
Abstract
Speech enhancement and noise reduction have wide applications in speech processing and are often employed as a pre-processing stage in various applications. The work in this project has two parts:
1. Denoising a single-channel speech signal in the presence of highly non-stationary background noise in order to improve the perceptual quality and intelligibility of the speech.
2. Blind multi-speaker speech separation of an arbitrary number of speakers (without knowledge of the actual speakers as speech sources) given just two anechoic mixtures, under an assumption which we call approximate W-disjoint orthogonality.
Two points usually have to be considered in signal denoising applications: eliminating the undesired noise to improve the signal-to-noise ratio (SNR), and preserving the shape and characteristics of the original signal. Real-world noise is mostly highly non-stationary and does not affect the speech signal uniformly over the spectrum. This project explores the following set of DFT-based algorithms as single-channel preprocessing techniques:
· Spectral Subtraction using over-subtraction and a spectral floor.
· Multi-Band Spectral Subtraction (MBSS).
· Wiener Filter.
· MMSE Short-Time Spectral Amplitude (MMSE-STSA) estimator, with and without the speech presence uncertainty (SPU) modifier.
· MMSE Log-Spectral Amplitude estimator, with and without the SPU modifier.
· Optimally Modified Log-Spectral Amplitude estimator (OM-LSA).
All the implemented algorithms provide considerable, though differing, degrees of flexibility and control over the level of noise elimination, which reduces artifacts in the enhanced speech and results in improved quality and intelligibility. The comparison study, based on subjective and objective tests, showed that the OM-LSA method outperforms all the other implemented DFT-based single-channel speech enhancement algorithms.
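
As an illustration of the first method listed above, the following is a minimal sketch of power spectral subtraction with an over-subtraction factor and a spectral floor, applied to a single frame. It is not the implementation used in this project: the function names, the default values of `alpha` and `beta`, and the assumption that a noise power estimate `noise_power` is already available (for example from speech-free frames) are all hypothetical choices made for the example.

```python
import numpy as np

def spectral_subtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Subtract alpha times the noise power from the noisy power spectrum,
    but never let a bin fall below the spectral floor beta * noise_power.
    alpha (over-subtraction) and beta (floor) are illustrative defaults."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    subtracted = noisy_power - alpha * noise_power
    floor = beta * noise_power
    return np.maximum(subtracted, floor)

def enhance_frame(frame, noise_power, alpha=4.0, beta=0.01):
    """Enhance one time-domain frame (hypothetical helper).
    noise_power must have len(frame) // 2 + 1 bins to match np.fft.rfft."""
    spectrum = np.fft.rfft(frame)
    clean_power = spectral_subtract(np.abs(spectrum) ** 2, noise_power, alpha, beta)
    # Keep the noisy phase; only the magnitude is modified.
    enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(enhanced, n=len(frame))
```

A larger `alpha` removes more noise at the cost of more speech distortion, while a non-zero `beta` fills the spectral valleys that would otherwise produce isolated residual peaks. A complete system would also window the signal into overlapping frames and overlap-add the enhanced frames, which this sketch leaves out.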
The technique used for blind multi-speaker speech separation is based on the Degenerate Unmixing Estimation Technique (DUET), which constructs estimates of the relative mixing parameters associated with each source by taking the ratio of the time-frequency representations of the two mixtures. This works if the sources are W-disjoint orthogonal, meaning that at any given time-frequency point at most one source is active. We have implemented the DUET technique and tested its behavior on artificial instantaneous speech mixtures of different numbers of speakers; according to the objective and subjective evaluation tests performed, the speakers could be separated perfectly.
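
The ratio idea behind DUET can be sketched for the instantaneous case as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the function name `duet_masks` is hypothetical, the inputs `X1` and `X2` are assumed to be time-frequency representations of the two mixtures that have already been computed, and the candidate relative attenuations are passed in directly, whereas DUET itself estimates them from the peaks of a weighted histogram. Relative delays, which matter for anechoic mixtures, are ignored here.

```python
import numpy as np

def duet_masks(X1, X2, attenuations, eps=1e-12):
    """Return one binary time-frequency mask per source.
    Each point is assigned to the candidate attenuation closest to the
    magnitude ratio |X2| / |X1| observed at that point."""
    ratio = np.abs(X2) / (np.abs(X1) + eps)
    att = np.asarray(attenuations, dtype=float)
    # Distance from every time-frequency point to every candidate attenuation.
    dist = np.abs(ratio[..., None] - att)
    labels = np.argmin(dist, axis=-1)
    return [labels == j for j in range(len(att))]

# Toy example: two sources with disjoint time-frequency supports.
S1 = np.array([[1 + 1j, 0], [0, 2]])
S2 = np.array([[0, 3j], [0, 0]])
X1 = S1 + S2              # first mixture
X2 = 0.5 * S1 + 2.0 * S2  # second mixture, relative attenuations 0.5 and 2.0
masks = duet_masks(X1, X2, [0.5, 2.0])
estimates = [m * X1 for m in masks]  # masked first mixture, one per source
```

Because only one source is active at each point in this toy example, the ratio takes exactly one of the two attenuation values wherever a source is present, and masking the first mixture recovers each source. With real speech the disjointness holds only approximately, so the ratios cluster around the mixing parameters rather than coinciding with them.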
Keywords
Bruit électrique, Démence présénile, Presenile dementia, Electric noise
