Phoneme-based vector quantization in a discrete HMM speech recognizer

Y. Zhang, Roberto Togneri, Michael Alder

    Research output: Contribution to journal › Article › peer-review

    7 Citations (Scopus)

    Abstract

    The quantization distortion of vector quantization (VQ) is a key element that affects the performance of a discrete hidden Markov modeling (DHMM) system. Many researchers have recognized this problem and tried to use integrated features or multiple codebooks in their systems to offset the disadvantage of conventional VQ. However, the computational complexity of those systems is then increased. Investigations have shown that the speech signal space consists of finite clusters that represent phoneme data sets from male and female speakers and reveal Gaussian distributions. In this paper we propose an alternative VQ method in which the phoneme is treated as a cluster in the speech space and a Gaussian model is estimated for each phoneme. A Gaussian mixture model (GMM) is generated by the expectation-maximization (EM) algorithm for the whole speech space and used as a codebook in which each code word is a Gaussian model and represents a certain cluster. An input utterance is classified as a certain phoneme or a set of phonemes only when that phoneme or those phonemes give the highest likelihood. A typical discrete HMM system was used for both phoneme and isolated word recognition. The results show that the phoneme-based Gaussian modeling vector quantization classifies the speech space more effectively, and significant improvements in the performance of the DHMM system have been achieved.
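
    The idea of a codebook whose code words are Gaussians, with each frame assigned to the code word of highest likelihood, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_gaussian_codebook` and `quantize`, the diagonal covariances, the variance floor, and the use of labeled frames without EM refinement or mixture weights are all assumptions made for illustration.

```python
import numpy as np


def fit_gaussian_codebook(frames, labels, var_floor=1e-6):
    """Estimate one diagonal Gaussian (mean, variance) per phoneme label.

    Assumption: frames is a (T, D) array of feature vectors and labels is a
    (T,) array of phoneme labels. EM refinement is deliberately omitted.
    """
    classes = np.unique(labels)
    means = np.stack([frames[labels == c].mean(axis=0) for c in classes])
    variances = np.stack([frames[labels == c].var(axis=0) for c in classes])
    # Floor the variances so the log-likelihood stays finite.
    return classes, means, variances + var_floor


def quantize(frames, means, variances):
    """Return, for each frame, the index of the most likely Gaussian code word."""
    diff = frames[:, None, :] - means[None, :, :]  # shape (T, K, D)
    # Diagonal-covariance Gaussian log-likelihood of every frame under
    # every code word, shape (T, K).
    log_lik = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    return np.argmax(log_lik, axis=1)
```

    The resulting index sequence plays the role of the discrete observation symbols that a DHMM would consume. A conventional VQ would instead pick the nearest centroid by Euclidean distance, which ignores the per-cluster variance that the Gaussian code words take into account.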
    Original language: English
    Pages (from-to): 26-32
    Journal: IEEE Transactions on Speech and Audio Processing
    Volume: 5
    Issue number: 1
    Publication status: Published - 1997
