Traditionally, voice activity detection algorithms are based on some combination of general speech properties such as temporal energy variations, periodicity, and spectrum. This paper describes a novel statistical method for voice activity detection using a signal-to-noise ratio measure. The method employs a low-variance spectrum estimate and determines an optimal threshold based on the estimated noise statistics. A possible implementation is presented, evaluated over a large test set, and compared to current standardized algorithms. The evaluations indicate promising results, with the proposed scheme performing comparably or favorably over the whole test set.
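The abstract outlines, at a high level, an SNR-based detector built on a low-variance spectrum estimate and a threshold derived from noise statistics. The sketch below is a generic illustration of that idea, not the authors' algorithm. It assumes Welch-style averaging of Hann-windowed periodograms as the low-variance estimate, assumes the first frames of the signal are noise-only, uses a hypothetical mean per-bin SNR in dB as the measure, and replaces the paper's optimal threshold with a simple mean-plus-k-standard-deviations rule. All function names and parameters (`welch_spectrum`, `snr_measure`, `simple_vad`, `noise_frames`, `k`) are hypothetical.

```python
import cmath
import math
import random
import statistics


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def periodogram(seg, win):
    """Squared-magnitude DFT (bins 0..n/2) of a windowed segment, via a naive O(n^2) DFT."""
    n = len(seg)
    x = [s * w for s, w in zip(seg, win)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def welch_spectrum(frame, seg_len=64, hop=32):
    """Average the periodograms of overlapping Hann-windowed segments of one frame.

    Averaging reduces the variance relative to a single periodogram
    (roughly by the number of segments when they do not overlap).
    """
    win = hann(seg_len)
    specs = [periodogram(frame[s:s + seg_len], win)
             for s in range(0, len(frame) - seg_len + 1, hop)]
    return [sum(col) / len(specs) for col in zip(*specs)]


def snr_measure(spec, noise_spec, eps=1e-12):
    """Hypothetical measure: mean over bins of the a-posteriori SNR in dB."""
    return statistics.fmean(10 * math.log10((p + eps) / (q + eps))
                            for p, q in zip(spec, noise_spec))


def simple_vad(signal, frame_len=256, noise_frames=10, k=3.0):
    """Return one True/False speech decision per non-overlapping frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    specs = [welch_spectrum(f) for f in frames]
    # Assumption: the first `noise_frames` frames contain noise only.
    noise_spec = [statistics.fmean(col) for col in zip(*specs[:noise_frames])]
    measures = [snr_measure(s, noise_spec) for s in specs]
    noise_m = measures[:noise_frames]
    # Stand-in threshold (not the paper's optimal one): mean + k * std of noise-only measures.
    threshold = statistics.fmean(noise_m) + k * statistics.stdev(noise_m)
    return [m > threshold for m in measures]


if __name__ == "__main__":
    rng = random.Random(0)
    sig = [rng.gauss(0, 0.1) for _ in range(256 * 20)]
    # Crude stand-in for speech: a louder broadband burst in the last 6 frames.
    for i in range(256 * 14, 256 * 20):
        sig[i] += rng.gauss(0, 1.0)
    print(simple_vad(sig))
```

The naive DFT only keeps the sketch dependency-free; a practical version would use an FFT, for example `scipy.signal.welch` for the averaged periodogram. Averaging periodograms trades frequency resolution for lower variance, which keeps the SNR measure tightly clustered on noise-only frames so that a threshold derived from their statistics separates them cleanly from active frames.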
|Journal||IEEE ACM Transactions on Audio, Speech, and Language Processing|
|Publication status||Published - 2006|
Davis, A., Nordholm, S. E., & Togneri, R. (2006). Statistical Voice Activity Detection Using Low-Variance Spectrum Estimation and an Adaptive Threshold. IEEE ACM Transactions on Audio, Speech, and Language Processing, 14(2), 412-424. https://doi.org/10.1109/TSA.2005.855842