Acoustic Scene Classification Using Joint Time-Frequency Image-Based Feature Representations

Shamsiah Abidin, Roberto Togneri, Ferdous Sohel

Research output: Chapter in Book/Conference paper › Conference paper

Abstract

The classification of acoustic scenes is important in emerging applications such as automatic audio surveillance, machine listening and multimedia content analysis. In this paper, we present an approach to acoustic scene classification using joint time-frequency image-based feature representations. For acoustic scene classification, a joint time-frequency representation (TFR) is shown to better capture important information across a wide range of low and middle frequencies in the audio signal. The audio signal is converted to Constant-Q Transform (CQT) and Mel-spectrum TFRs, and local binary patterns (LBP) are used to extract features from these TFRs. To ensure localized spectral information is not lost, each TFR is divided into a number of zones. We then perform score-level fusion to further improve classification accuracy. Our technique achieves a competitive classification accuracy of 83.4% on the DCASE 2016 development dataset compared with the current state of the art.
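
The processing chain described in the abstract (CQT and Mel-spectrum TFRs, zoned LBP histograms, score-level fusion of per-TFR classifiers) can be sketched as follows. This is a minimal illustration assuming librosa, scikit-image and scikit-learn; the LBP parameters, the number of zones, the equal fusion weights and the SVM back-end are assumptions for illustration, not the configuration reported in the paper.

# Minimal sketch of a CQT/Mel + zoned-LBP + score-fusion pipeline
# (illustrative parameters only; not the authors' exact configuration).
import numpy as np
import librosa
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

P, R = 8, 1      # LBP neighbours and radius (assumed)
N_ZONES = 4      # number of frequency zones per TFR (assumed)

def tfr_images(y, sr):
    """Compute the two time-frequency representations as dB-scaled images."""
    cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr, n_bins=84)))
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
    return cqt, mel

def zoned_lbp(tfr, n_zones=N_ZONES):
    """Uniform-LBP histogram per frequency zone, concatenated into one feature vector."""
    # Rescale the TFR to an 8-bit grey-level image so LBP sees a conventional image.
    img = np.uint8(255 * (tfr - tfr.min()) / (np.ptp(tfr) + 1e-9))
    codes = local_binary_pattern(img, P, R, method="uniform")
    zones = np.array_split(codes, n_zones, axis=0)  # split along the frequency axis
    hists = [np.histogram(z, bins=np.arange(P + 3), density=True)[0] for z in zones]
    return np.concatenate(hists)

def features(y, sr):
    """Zoned LBP features from both TFRs of one audio clip."""
    cqt, mel = tfr_images(y, sr)
    return zoned_lbp(cqt), zoned_lbp(mel)

def fuse_and_predict(clf_cqt, clf_mel, X_cqt, X_mel):
    """Score-level fusion: average the class posteriors of the two classifiers
    with equal weights (an assumption) and pick the highest fused score."""
    scores = 0.5 * clf_cqt.predict_proba(X_cqt) + 0.5 * clf_mel.predict_proba(X_mel)
    return scores.argmax(axis=1)

# Example back-end: clf_cqt = SVC(probability=True).fit(X_cqt_train, labels), and
# likewise for clf_mel, then fuse_and_predict on held-out features.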

Original language: English
Title of host publication: Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9781538692943
DOIs: https://doi.org/10.1109/AVSS.2018.8639164
Publication status: Published - 11 Feb 2019
Event: 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2018 - Auckland, New Zealand
Duration: 27 Nov 2018 - 30 Nov 2018

Publication series

Name: Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance

Conference

Conference: 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2018
Country: New Zealand
City: Auckland
Period: 27/11/18 - 30/11/18

Cite this

Abidin, S., Togneri, R., & Sohel, F. (2019). Acoustic Scene Classification Using Joint Time-Frequency Image-Based Feature Representations. In Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance [8639164] (Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance). IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/AVSS.2018.8639164