Perceptual features for automatic speech recognition in noisy environments

Serajul Haque, Roberto Togneri, Anthony Zaknich

    Research output: Contribution to journal › Article › peer-review

    32 Citations (Scopus)

    Abstract

    The performance of two perceptual properties of the peripheral auditory system, synaptic adaptation and two-tone suppression, is compared for automatic speech recognition (ASR) in an additive-noise environment. A simple method of synaptic adaptation, as determined by psychoacoustic observations, was implemented with temporal processing of speech, using a zero-crossing auditory model as a pre-processing front end. The concept is similar to RASTA processing, but a high-pass infinite impulse response (IIR) filter is used instead of bandpass filters. It is shown that rapid synaptic adaptation can be implemented by temporal processing using the zero-crossing algorithm, which is not otherwise possible in a spectral-domain implementation. Two-tone suppression was implemented in the zero-crossing auditory model using a companding strategy. Recognition performance with the two perceptual features was evaluated on the isolated-digit TIDIGITS corpus using a continuous-density HMM recognizer in white, factory, babble and Volvo noise. Synaptic adaptation is observed to perform better in stationary white Gaussian noise. In the presence of non-stationary, non-Gaussian noise, however, no improvement, or a degradation, is observed. Two-tone suppression shows the reciprocal effect: it performs better in non-Gaussian real-world noise and degrades performance in stationary white Gaussian noise.
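    To make the comparison with RASTA concrete, here is a minimal sketch of temporal high-pass filtering. It is not the authors' implementation. Each frequency channel's output trajectory over successive frames is passed through a first-order IIR high-pass filter, H(z) = (1 - z^-1) / (1 - p z^-1). Several details are assumptions for illustration: the first-order DC-blocking form, the pole value p = 0.95, and the function names `highpass_trajectory` and `filter_channels`. The abstract does not give the filter order or coefficients. The zero-crossing front end that would produce the channel outputs is also not shown.

```python
from typing import List, Sequence


def highpass_trajectory(x: Sequence[float], pole: float = 0.95) -> List[float]:
    """First-order DC-blocking high-pass IIR filter applied along time.

    Computes y[n] = x[n] - x[n-1] + pole * y[n-1], i.e.
    H(z) = (1 - z^-1) / (1 - pole * z^-1).
    ASSUMPTION: the filter form and the default pole are hypothetical,
    not parameters taken from the paper.
    """
    if not 0.0 <= pole < 1.0:
        raise ValueError("pole must be in [0, 1) for a stable filter")
    y: List[float] = []
    prev_x = 0.0  # assume the trajectory is zero before the first frame
    prev_y = 0.0
    for xn in x:
        yn = xn - prev_x + pole * prev_y
        y.append(yn)
        prev_x, prev_y = xn, yn
    return y


def filter_channels(frames: Sequence[Sequence[float]],
                    pole: float = 0.95) -> List[List[float]]:
    """Filter each channel's trajectory independently.

    `frames` is indexed [time][channel] (hypothetical layout for the
    per-frame outputs of an auditory front end); the result keeps it.
    """
    if not frames:
        return []
    n_ch = len(frames[0])
    per_channel = [highpass_trajectory([f[c] for f in frames], pole)
                   for c in range(n_ch)]
    return [[per_channel[c][t] for c in range(n_ch)]
            for t in range(len(frames))]
```

    With a unit step such as `highpass_trajectory([0, 0, 1, 1, 1, 1])`, the output jumps to 1 at the onset and then decays by a factor of 0.95 per frame. Sustained components of a channel are therefore suppressed while onsets are emphasized, loosely mirroring the onset-then-decay pattern of synaptic adaptation. RASTA's bandpass filter would additionally smooth fast frame-to-frame changes, and this high-pass-only variant omits that step.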
    Original language: English
    Pages (from-to): 58-75
    Journal: Speech Communication
    Volume: 51
    Issue number: 1
    Publication status: Published - 2009
