Model-based time-frequency mask enhancement for robust missing data speaker identification

Daniel Pullella

    Research output: Thesis › Doctoral Thesis



    [Truncated abstract] The performance of speaker recognition systems degrades significantly when distortions affect the input speech signals. Missing data compensation has been demonstrated as an effective technique for increasing the robustness of speaker identification systems to environmental noise. The robustness provided by missing data strategies is critically dependent on the accuracy of the time-frequency reliability mask, which labels each time-frequency component as speech-dominant or noise-dominant. Conventional approaches to missing data identification have focused on accurately estimating the a priori ‘oracle’ reliability mask by using the received speech signal. The weakness of these source-based approaches is the difficulty of producing accurate mask estimates in non-stationary environments, and the lack of protection offered to the recognizer when the mask estimate is poor. In this thesis, a new approach to missing data speaker identification is presented, where noise robustness is improved by combining information from the received signal and the speaker models. Binary reliability masking is used to integrate source and model-based processing, with the goal of increasing the identification rate. A model-based mask enhancement framework is proposed to achieve this, with implementations according to two different sub-paradigms. In the first approach, conventional mask estimates are enhanced using model information extracted prior to evaluation. The Feature Selection Mask Enhancement (FSME) algorithm implements this framework by using discriminative analysis to produce feature subsets, allowing the removal of mask errors in non-discriminative spectral bands. FSME significantly outperforms conventional missing data identification for digit-based input speech.
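
As a rough illustration of the binary reliability mask the abstract describes, the sketch below labels each time-frequency cell as speech-dominant (1) or noise-dominant (0) from its local SNR, then discards mask entries outside a chosen subset of spectral bands. It is not the thesis's FSME algorithm: the function names `oracle_mask` and `restrict_to_bands`, the 0 dB threshold, the toy power values, and the choice to treat unselected bands as unreliable are all assumptions made for illustration.

```python
import math


def oracle_mask(speech_power, noise_power, threshold_db=0.0):
    """Binary mask: 1 where the local SNR (dB) exceeds threshold_db, else 0.

    Inputs are lists of frames, each frame a list of per-band powers.
    The 0 dB default threshold is an assumption, not taken from the thesis.
    """
    mask = []
    for s_frame, n_frame in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_frame, n_frame):
            # Floor the powers to avoid log(0) and division by zero.
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def restrict_to_bands(mask, selected_bands):
    """Mark every band outside selected_bands as unreliable (0).

    Hypothetical stand-in for model-based enhancement: the band subset
    would come from a discriminative analysis, which is not shown here.
    """
    keep = set(selected_bands)
    return [[m if f in keep else 0 for f, m in enumerate(row)] for row in mask]


# Toy example: 2 frames x 3 frequency bands (made-up values).
speech = [[4.0, 1.0, 9.0], [0.5, 2.0, 1.0]]
noise = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0]]

mask = oracle_mask(speech, noise)
enhanced = restrict_to_bands(mask, selected_bands=[0, 1])
```

In practice the oracle mask is unavailable at test time because the clean speech and noise are not observed separately, which is why the abstract distinguishes it from masks estimated from the received signal.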
    Original language: English
    Qualification: Doctor of Philosophy
    Publication status: Unpublished - 2011

