Abstract
In this paper, we present an approach to acoustic scene classification that aggregates spectral and temporal features. To this end, we propose the first use of the variable-Q transform (VQT) to generate the time-frequency representation for acoustic scene classification. The VQT provides finer control over resolution than the constant-Q transform (CQT) or the short-time Fourier transform (STFT) and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time-frequency images. Our approach yields a 5.2% improvement on the DCASE 2016 dataset over standard CQT with LBP. Fusing our proposed AECLBP with histogram of oriented gradients (HOG) features, we achieve a classification accuracy of 85.5%, which outperforms one of the top-performing systems.
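To illustrate the kind of texture descriptor the abstract builds on, the following is a minimal sketch of the basic 8-neighbour LBP histogram computed over a 2-D time-frequency image. It is not the AECLBP variant proposed in the paper, and it does not compute a VQT; the function name `lbp_histogram` is hypothetical, and the random array is only a stand-in for a real time-frequency image.

```python
import numpy as np


def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    Illustrative only: this is plain LBP, not the paper's AECLBP.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]  # border pixels have no full neighbourhood

    # Neighbour offsets (row, col), clockwise from top-left; one bit each.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]

    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        # Set the bit when the neighbour is at least as large as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit

    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Stand-in for a time-frequency image (frequency bins x time frames).
rng = np.random.default_rng(0)
fake_tf_image = rng.random((64, 128))
features = lbp_histogram(fake_tf_image)
print(features.shape)
```

The resulting fixed-length histogram could then be passed to any standard classifier. In a real pipeline the input would be a log-magnitude time-frequency image rather than random values.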
Original language | English |
---|---|
Article number | 8410481 |
Pages (from-to) | 2112-2121 |
Number of pages | 10 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 26 |
Issue number | 11 |
Publication status | Published - 1 Nov 2018 |