Abstract
Traditionally, voice activity detection algorithms are based on some combination of general speech properties such as temporal energy variations, periodicity, and spectrum. This paper describes a novel statistical method for voice activity detection using a signal-to-noise ratio measure. The method employs a low-variance spectrum estimate and determines an optimal threshold based on the estimated noise statistics. A possible implementation is presented, evaluated over a large test set, and compared with current standardized algorithms. The evaluations indicate promising results, with the proposed scheme performing comparably to or better than these algorithms over the whole test set.
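
To make the general idea concrete, the following is a minimal illustrative sketch of an SNR-based detector with a threshold set from noise statistics. It is not the paper's algorithm: the abstract does not specify the spectrum estimator or the threshold rule, so everything here is an assumption. The recursive averaging of periodograms stands in for a low-variance spectrum estimate, and the mean-plus-`k`-standard-deviations rule stands in for the optimal threshold. The function names, the parameters `alpha`, `noise_frames`, and `k`, and the toy input are all hypothetical. The first `noise_frames` frames are assumed to contain noise only.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0..N/2)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def smoothed_spectra(frames, alpha=0.7):
    """Recursively average periodograms over time to reduce their variance."""
    out, prev = [], None
    for frame in frames:
        p = power_spectrum(frame)
        if prev is not None:
            p = [alpha * a + (1 - alpha) * b for a, b in zip(prev, p)]
        prev = p
        out.append(p)
    return out


def detect_voice(frames, noise_frames=10, k=3.0):
    """Flag frames whose SNR measure exceeds a noise-derived threshold.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    spectra = smoothed_spectra(frames)
    bins = len(spectra[0])
    # Per-bin noise power, estimated from the leading noise-only frames.
    noise = [sum(s[b] for s in spectra[:noise_frames]) / noise_frames
             for b in range(bins)]
    # SNR measure: mean over bins of the spectrum-to-noise power ratio.
    snr = [sum(s[b] / (noise[b] + 1e-12) for b in range(bins)) / bins
           for s in spectra]
    # Threshold from the mean and spread of the measure on noise-only frames
    # (a simple stand-in, not the optimal threshold derived in the paper).
    ref = snr[:noise_frames]
    mean = sum(ref) / noise_frames
    std = math.sqrt(sum((v - mean) ** 2 for v in ref) / noise_frames)
    threshold = mean + k * std
    return [v > threshold for v in snr]


# Toy input: 20 noise-only frames followed by 10 frames of a tone in noise.
rng = random.Random(0)
N = 64
noise_only = [[rng.gauss(0, 0.05) for _ in range(N)] for _ in range(20)]
tone = [[math.sin(2 * math.pi * 8 * t / N) + rng.gauss(0, 0.05)
         for t in range(N)] for _ in range(10)]
flags = detect_voice(noise_only + tone)
```
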
Original language | English |
---|---|
Pages (from-to) | 412-424 |
Journal | IEEE ACM Transactions on Audio, Speech, and Language Processing |
Volume | 14 |
Issue number | 2 |
Publication status | Published - 2006 |