Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification

Shamsiah Abidin, Roberto Togneri, Ferdous Sohel

Research output: Contribution to journal › Article

1 Citation (Scopus)
69 Downloads (Pure)

Abstract

In this paper, we present an approach for acoustic scene classification that aggregates spectral and temporal features. We do this by proposing the first use of the variable-Q transform (VQT) to generate the time-frequency representation for acoustic scene classification. The VQT provides finer control over the resolution than the constant-Q transform (CQT) or the short-time Fourier transform and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time-frequency images. Our results yield a 5.2% improvement on the DCASE 2016 dataset compared to the application of the standard CQT with LBP. Fusing our proposed AECLBP features with histogram of oriented gradients (HOG) features, we achieve a classification accuracy of 85.5%, which outperforms one of the top-performing systems.
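
For readers who want a concrete picture of the pipeline the abstract describes, the sketch below computes a variable-Q transform image with librosa and extracts texture features from it with scikit-image. It is only an illustration under assumptions: standard uniform LBP stands in for the paper's AECLBP (which is not available in common libraries), the file name and all parameter values (sample rate, bins per octave, gamma, LBP/HOG settings) are hypothetical, and the final classifier stage is omitted.

# Minimal sketch (not the authors' implementation): VQT time-frequency image
# plus LBP and HOG texture features, fused by concatenation. Standard uniform
# LBP stands in for the paper's AECLBP; parameter values are illustrative.
import numpy as np
import librosa
from skimage.feature import hog, local_binary_pattern


def vqt_image(path, sr=44100, n_bins=168, bins_per_octave=24, gamma=20.0):
    """Variable-Q magnitude spectrogram in dB, scaled to an 8-bit image."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    V = np.abs(librosa.vqt(y, sr=sr, n_bins=n_bins,
                           bins_per_octave=bins_per_octave, gamma=gamma))
    db = librosa.amplitude_to_db(V, ref=np.max)
    return np.uint8(255 * (db - db.min()) / (db.max() - db.min() + 1e-9))


def lbp_histogram(img, P=8, R=1):
    """Normalised histogram of uniform LBP codes over the whole image."""
    codes = local_binary_pattern(img, P, R, method="uniform")
    n_codes = P + 2  # uniform patterns plus the catch-all non-uniform bin
    hist, _ = np.histogram(codes, bins=n_codes, range=(0, n_codes), density=True)
    return hist


def fused_features(path):
    """Concatenated LBP and HOG descriptors of the VQT image, for a classifier."""
    img = vqt_image(path)
    lbp_feat = lbp_histogram(img)
    # In practice the image is resized or segmented to a fixed size so the
    # HOG feature length is constant across recordings.
    hog_feat = hog(img, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2), feature_vector=True)
    return np.concatenate([lbp_feat, hog_feat])


# Hypothetical usage:
# feats = fused_features("scene_example.wav")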

Original language: English
Article number: 8410481
Pages (from-to): 2112-2121
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 26
Issue number: 11
DOIs: 10.1109/TASLP.2018.2854861
Publication status: Published - 1 Nov 2018

Cite this

@article{01728dc4abba418da376e5033a706e3e,
title = "Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification",
abstract = "In this paper, we present an approach for acoustic scene classification, which aggregates spectral and temporal features. We do this by proposing the first use of the variable-Q transform (VQT) to generate the time-frequency representation for acoustic scene classification. The VQT provides finer control over the resolution compared to the constant-Q transform (CQT) or short time fourier transform and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time-frequency images. Our results yield a 5.2{\%} improvement on the DCASE 2016 dataset compared to the application of standard CQT with LBP. Fusing our proposed AECLBP with HOG features, we achieve a classification accuracy of 85.5{\%}, which outperforms one of the top performing systems.",
keywords = "Acoustic scene, feature extraction, fusion, local binary patterns, time-frequency analysis",
author = "Shamsiah Abidin and Roberto Togneri and Ferdous Sohel",
year = "2018",
month = "11",
day = "1",
doi = "10.1109/TASLP.2018.2854861",
language = "English",
volume = "26",
pages = "2112--2121",
journal = "IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING",
issn = "1558-7916",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "11",

}

Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification. / Abidin, Shamsiah; Togneri, Roberto; Sohel, Ferdous.

In: IEEE/ACM Transactions on Audio Speech and Language Processing, Vol. 26, No. 11, 8410481, 01.11.2018, p. 2112-2121.

Research output: Contribution to journal › Article

TY - JOUR
T1 - Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification
AU - Abidin, Shamsiah
AU - Togneri, Roberto
AU - Sohel, Ferdous
PY - 2018/11/1
Y1 - 2018/11/1
N2 - In this paper, we present an approach for acoustic scene classification that aggregates spectral and temporal features. We do this by proposing the first use of the variable-Q transform (VQT) to generate the time-frequency representation for acoustic scene classification. The VQT provides finer control over the resolution than the constant-Q transform (CQT) or the short-time Fourier transform and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time-frequency images. Our results yield a 5.2% improvement on the DCASE 2016 dataset compared to the application of the standard CQT with LBP. Fusing our proposed AECLBP features with histogram of oriented gradients (HOG) features, we achieve a classification accuracy of 85.5%, which outperforms one of the top-performing systems.
AB - In this paper, we present an approach for acoustic scene classification that aggregates spectral and temporal features. We do this by proposing the first use of the variable-Q transform (VQT) to generate the time-frequency representation for acoustic scene classification. The VQT provides finer control over the resolution than the constant-Q transform (CQT) or the short-time Fourier transform and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time-frequency images. Our results yield a 5.2% improvement on the DCASE 2016 dataset compared to the application of the standard CQT with LBP. Fusing our proposed AECLBP features with histogram of oriented gradients (HOG) features, we achieve a classification accuracy of 85.5%, which outperforms one of the top-performing systems.
KW - Acoustic scene
KW - feature extraction
KW - fusion
KW - local binary patterns
KW - time-frequency analysis
UR - http://www.scopus.com/inward/record.url?scp=85049785767&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2018.2854861
DO - 10.1109/TASLP.2018.2854861
M3 - Article
VL - 26
SP - 2112
EP - 2121
JO - IEEE/ACM Transactions on Audio Speech and Language Processing
JF - IEEE/ACM Transactions on Audio Speech and Language Processing
SN - 1558-7916
IS - 11
M1 - 8410481
ER -