An Auditory Motivated Asymmetric Compression Technique for Speech Recognition

Serajul Haque, Roberto Togneri, Anthony Zaknich

    Research output: Contribution to journal › Article › peer-review

    3 Citations (Scopus)

    Abstract

    The Mel-frequency cepstral coefficient (MFCC) parameterization for automatic speech recognition (ASR) uses several perceptual features of the human auditory system, one of which is static compression. Motivated by the human auditory system, we analyze the conventional static logarithmic compression used in the MFCC with psychophysical loudness perception curves. Based on the property of the auditory system that dynamic range compression is greater in the basal regions of the basilar membrane than in the apical regions, we propose an unequal (asymmetric) compression method, i.e., one that applies greater compression in the higher-frequency regions than in the lower-frequency regions. The method is applied to and tested in the MFCC and PLP parameterizations in the spectral domain, and in the ZCPA auditory model, used as an ASR front-end, in the temporal domain. The extent of the asymmetric compression is applied as a multiplicative gain on the existing static compression and is determined from the gradient of the piece-wise linear segment of the perceptual compression curve. The proposed method has the advantage of adjusting compression parametrically to improve ASR performance and audibility in noise through low-frequency spectral enhancement, particularly of vowels with lower F1 and F2 formants. Continuous-density HMM recognition experiments on the Aurora 2 and TIDIGITS corpora show performance improvements in additive noise conditions.
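The asymmetric compression described above can be sketched as a per-channel gain applied to the log filterbank energies before the DCT of a standard MFCC pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper derives the gains from the gradient of a piece-wise linear segment of a perceptual loudness curve, whereas here the hypothetical parameters `g_low` and `g_high` simply define a linear ramp across channels, and all function names are illustrative. Because g·log E = log(E^g), a gain below 1 shrinks a channel's log-domain dynamic range, so letting the gain fall with frequency compresses high-frequency channels more than low-frequency ones.

```python
import numpy as np


def asymmetric_log_compression(fbank_energies, g_low=1.0, g_high=0.6, floor=1e-10):
    """Log compression scaled by a per-channel gain (hypothetical linear ramp).

    fbank_energies: array of shape (frames, channels), channels ordered from
    low to high centre frequency. g_low and g_high are illustrative gains for
    the lowest and highest channels, not values taken from the paper.
    """
    # Floor the energies so the logarithm is always defined.
    energies = np.maximum(np.asarray(fbank_energies, dtype=float), floor)
    n_channels = energies.shape[-1]
    # Assumed linear ramp: the gain falls with frequency, so high-frequency
    # channels get a smaller log-domain range (g * log(E) == log(E ** g)).
    # g_low == g_high == 1 reproduces the conventional static log compression.
    gains = np.linspace(g_low, g_high, n_channels)
    return gains * np.log(energies)


def dct_matrix(n_channels, n_ceps):
    """Orthonormal DCT-II basis; row k holds the k-th cepstral basis vector."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_channels)[None, :]
    basis = np.sqrt(2.0 / n_channels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_channels))
    basis[0] /= np.sqrt(2.0)  # scale the DC row to sqrt(1/N) for orthonormality
    return basis


def asymmetric_mfcc(fbank_energies, n_ceps=13, **gain_kwargs):
    """Cepstral coefficients from asymmetrically compressed filterbank energies."""
    compressed = asymmetric_log_compression(fbank_energies, **gain_kwargs)
    return compressed @ dct_matrix(compressed.shape[-1], n_ceps).T


# Usage: random stand-in for 100 frames of 23 mel filterbank energies.
rng = np.random.default_rng(0)
fbank = rng.uniform(1e-3, 1e3, size=(100, 23))
ceps = asymmetric_mfcc(fbank, n_ceps=13, g_low=1.0, g_high=0.6)
print(ceps.shape)  # (100, 13)
```

Setting `g_low = g_high = 1` recovers the conventional static log compression, which makes the sketch easy to compare against a baseline MFCC front-end.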
    Original language: English
    Pages (from-to): 2111-2124
    Journal: IEEE Transactions on Audio, Speech, and Language Processing
    Volume: 19
    Issue number: 7
    Publication status: Published - 2011
