The quantization distortion of vector quantization (VQ) is a key factor affecting the performance of a discrete hidden Markov model (DHMM) system. Many researchers have recognized this problem and used integrated features or multiple codebooks in their systems to offset the disadvantage of conventional VQ; however, this increases the computational complexity of those systems. Investigations have shown that the speech signal space consists of finite clusters that represent phoneme data sets from male and female speakers and exhibit Gaussian distributions. In this paper we propose an alternative VQ method in which each phoneme is treated as a cluster in the speech space and a Gaussian model is estimated for each phoneme. A Gaussian mixture model (GMM) is generated by the expectation-maximization (EM) algorithm for the whole speech space and used as a codebook, in which each code word is a Gaussian model representing a certain cluster. An input utterance is classified as a certain phoneme, or a set of phonemes, only when that phoneme or those phonemes give the highest likelihood. A typical discrete HMM system was used for both phoneme and isolated word recognition. The results show that phoneme-based Gaussian modeling vector quantization partitions the speech space more effectively, and significant improvements in the performance of the DHMM system have been achieved.
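The core idea described above can be illustrated with a minimal sketch: fit a GMM to the feature space with EM, then "quantize" each input frame to the index of the Gaussian code word that gives it the highest likelihood. This is not the authors' implementation; diagonal covariances and the function names `fit_gmm_em` and `quantize` are assumptions made for illustration.

```python
import numpy as np

def fit_gmm_em(X, K, iters=50):
    """Fit a diagonal-covariance GMM with EM (illustrative sketch).

    X: (n, d) array of feature frames; K: number of Gaussian code words.
    Returns (weights, means, variances)."""
    n, d = X.shape
    # Init means at spread-out data points (quantiles along dim 0),
    # a simple stand-in for a proper k-means initialization.
    order = np.argsort(X[:, 0])
    means = X[order[np.linspace(0, n - 1, K).astype(int)]].copy()
    vars_ = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: log-likelihood of each frame under each Gaussian
        log_p = (-0.5 * (np.log(2 * np.pi * vars_).sum(axis=1)
                         + ((X[:, None, :] - means) ** 2 / vars_).sum(axis=2))
                 + np.log(weights))                       # shape (n, K)
        log_p -= log_p.max(axis=1, keepdims=True)          # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)            # responsibilities
        # M-step: re-estimate mixture parameters
        Nk = resp.sum(axis=0) + 1e-10
        weights = Nk / n
        means = resp.T @ X / Nk[:, None]
        vars_ = resp.T @ (X ** 2) / Nk[:, None] - means ** 2 + 1e-6
    return weights, means, vars_

def quantize(X, weights, means, vars_):
    """Map each frame to the highest-likelihood Gaussian code word index."""
    log_p = (-0.5 * (np.log(2 * np.pi * vars_).sum(axis=1)
                     + ((X[:, None, :] - means) ** 2 / vars_).sum(axis=2))
             + np.log(weights))
    return log_p.argmax(axis=1)
```

The emitted indices play the role of the discrete VQ symbols consumed by the DHMM; unlike distance-based VQ, each symbol here corresponds to a full Gaussian model of a cluster rather than a single centroid.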
Journal: IEEE Transactions on Speech and Audio Processing
Publication status: Published - 1997