Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification

Shamsiah Abidin, Roberto Togneri, Ferdous Sohel

Research output: Contribution to journal › Article › peer-review

33 Citations (Scopus)
371 Downloads (Pure)

Abstract

In this paper, we present an approach for acoustic scene classification that aggregates spectral and temporal features. We propose the first use of the variable-Q transform (VQT) to generate the time-frequency representation for acoustic scene classification. The VQT provides finer control over resolution than the constant-Q transform (CQT) or the short-time Fourier transform (STFT) and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time-frequency images. Our results yield a 5.2% improvement on the DCASE 2016 dataset compared to the application of standard CQT with LBP. Fusing our proposed AECLBP with HOG features, we achieve a classification accuracy of 85.5%, which outperforms one of the top-performing systems.
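As a rough illustration of the texture-descriptor idea mentioned in the abstract, the sketch below computes the basic 8-neighbour LBP code and a normalized 256-bin histogram over a small 2D array. It is not the AECLBP variant the authors propose, and it does not compute a VQT; the input is assumed to be a time-frequency image (for example, log-magnitude values) supplied as a list of rows. The function names `lbp_codes` and `lbp_histogram` and the neighbour ordering are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence

# Clockwise 8-neighbour offsets (row, col), starting at the top-left pixel.
# This ordering is an assumption; any fixed ordering gives a valid LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return the basic LBP code for every interior pixel of a 2D image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as large as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Return a normalized 256-bin histogram of basic LBP codes."""
    hist = [0.0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

In a pipeline like the one the abstract outlines, a histogram of this kind would be computed over the time-frequency image and passed to a classifier; the fusion with HOG features is not shown here.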

Original language: English
Article number: 8410481
Pages (from-to): 2112-2121
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Issue number: 11
Publication status: Published - 1 Nov 2018
