TY - JOUR
T1 - An Auditory Motivated Asymmetric Compression Technique for Speech Recognition
AU - Haque, Serajul
AU - Togneri, Roberto
AU - Zaknich, Anthony
PY - 2011
Y1 - 2011
N2 - The Mel-frequency cepstral coefficient (MFCC) parameterization for automatic speech recognition (ASR) utilizes several perceptual features of the human auditory system, one of which is the static compression. Motivated by the human auditory system, the conventional static logarithmic compression applied in the MFCC is analyzed using psychophysical loudness perception curves. Following the property of the auditory system that the dynamic range compression is higher in the basal regions than the apical regions of the basilar membrane, we propose a method of unequal (asymmetric) compression, i.e., higher compression applied in the higher frequency regions than the lower frequency regions. The method is applied and tested in the MFCC and the PLP parameterizations in the spectral domain, and the ZCPA auditory model used as an ASR front-end in the temporal domain. The extent of the asymmetric compression is applied as a multiplicative gain to the existing static compression, and is determined from the gradient of the piece-wise linear segment of the perceptual compression curve. The proposed method has the advantage of adjusting compression parametrically for improved ASR performance and audibility in noise conditions by low-frequency spectral enhancement, particularly of vowels with lower F1 and F2 formants. Continuous-density HMM recognition using the Aurora 2 corpus and the TIdigits shows performance improvements in additive noise conditions.
AB - The Mel-frequency cepstral coefficient (MFCC) parameterization for automatic speech recognition (ASR) utilizes several perceptual features of the human auditory system, one of which is the static compression. Motivated by the human auditory system, the conventional static logarithmic compression applied in the MFCC is analyzed using psychophysical loudness perception curves. Following the property of the auditory system that the dynamic range compression is higher in the basal regions than the apical regions of the basilar membrane, we propose a method of unequal (asymmetric) compression, i.e., higher compression applied in the higher frequency regions than the lower frequency regions. The method is applied and tested in the MFCC and the PLP parameterizations in the spectral domain, and the ZCPA auditory model used as an ASR front-end in the temporal domain. The extent of the asymmetric compression is applied as a multiplicative gain to the existing static compression, and is determined from the gradient of the piece-wise linear segment of the perceptual compression curve. The proposed method has the advantage of adjusting compression parametrically for improved ASR performance and audibility in noise conditions by low-frequency spectral enhancement, particularly of vowels with lower F1 and F2 formants. Continuous-density HMM recognition using the Aurora 2 corpus and the TIdigits shows performance improvements in additive noise conditions.
U2 - 10.1109/TASL.2011.2112646
DO - 10.1109/TASL.2011.2112646
M3 - Article
SN - 1558-7916
VL - 19
SP - 2111
EP - 2124
JO - IEEE Transactions on Audio, Speech, and Language Processing
JF - IEEE Transactions on Audio, Speech, and Language Processing
IS - 7
ER -