Auxiliary Classifier Generative Adversarial Network with Soft Labels in Imbalanced Acoustic Event Detection

Xianjun Xia, Roberto Togneri, Ferdous Sohel, David Huang

Research output: Contribution to journal, Article

1 Citation (Scopus)

Abstract

In acoustic event detection, the training data for some acoustic events is often small and imbalanced. To address this, this paper proposes generating virtual training data categorically using auxiliary classifier generative adversarial networks. Soft labels of acoustic events are first calculated to represent the acoustic event localization information. The closer the current frame is to the middle of the manually labeled acoustic event, the higher the soft label, which makes the soft labels positively correlated with the acoustic event localization. The acoustic event class and the quantized soft labels are then used as the input condition to the auxiliary classifier generative adversarial networks to generate an arbitrary number of training samples. Experimental results on TUT Sound Event 2016 under the home environment and TUT Sound Event 2017 under the street environment demonstrate the improved performance of the proposed technique compared to existing acoustic event detection systems.
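The soft-label idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula or the number of quantization levels, so the triangular shape, the function names `soft_labels` and `quantize`, and the default of 4 levels are all assumptions made here for illustration, not the authors' method.

```python
def soft_labels(num_frames, onset, offset):
    """Assumed triangular soft label per frame.

    Returns 1.0 at the midpoint of the labeled event [onset, offset],
    decreasing linearly to 0.0 at the labeled boundaries, and 0.0 for
    frames outside the event. The shape is hypothetical.
    """
    mid = (onset + offset) / 2.0
    half = (offset - onset) / 2.0
    labels = []
    for t in range(num_frames):
        if t < onset or t > offset:
            labels.append(0.0)          # frame lies outside the event
        elif half == 0:
            labels.append(1.0)          # single-frame event
        else:
            labels.append(1.0 - abs(t - mid) / half)
    return labels


def quantize(labels, levels=4):
    """Map each soft label in [0, 1] to an integer bin 0..levels-1."""
    return [min(int(v * levels), levels - 1) for v in labels]


# Event labeled on frames 2..6 of a 10-frame clip.
labels = soft_labels(10, onset=2, offset=6)
bins = quantize(labels)
```

In this sketch the quantized bin, paired with the event class, stands in for the conditioning input that the abstract describes feeding to the generator.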

Original language: English
Pages (from-to): 1359-1371
Journal: IEEE Transactions on Multimedia
Volume: 21
Issue number: 6
Early online date: 1 Jan 2018
DOIs: 10.1109/TMM.2018.2879750
Publication status: Published - Jun 2019

Fingerprint

Labels
Classifiers
Acoustics
Acoustic waves

Cite this

@article{47f98bbfd4bd4512b75990d9a8ffb5d3,
title = "Auxiliary Classifier Generative Adversarial Network with Soft Labels in Imbalanced Acoustic Event Detection",
abstract = "In acoustic event detection, the training data for some acoustic events is often small and imbalanced. To address this, this paper proposes generating virtual training data categorically using auxiliary classifier generative adversarial networks. Soft labels of acoustic events are first calculated to represent the acoustic event localization information. The closer the current frame is to the middle of the manually labeled acoustic event, the higher the soft label, which makes the soft labels positively correlated with the acoustic event localization. The acoustic event class and the quantized soft labels are then used as the input condition to the auxiliary classifier generative adversarial networks to generate an arbitrary number of training samples. Experimental results on TUT Sound Event 2016 under the home environment and TUT Sound Event 2017 under the street environment demonstrate the improved performance of the proposed technique compared to existing acoustic event detection systems.",
keywords = "Acoustic event detection, auxiliary classifier generative adversarial networks, quantized confidence measure",
author = "Xianjun Xia and Roberto Togneri and Ferdous Sohel and David Huang",
year = "2019",
month = "6",
doi = "10.1109/TMM.2018.2879750",
language = "English",
volume = "21",
pages = "1359--1371",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "6",

}

Auxiliary Classifier Generative Adversarial Network with Soft Labels in Imbalanced Acoustic Event Detection. / Xia, Xianjun; Togneri, Roberto; Sohel, Ferdous; Huang, David.

In: IEEE Transactions on Multimedia, Vol. 21, No. 6, 06.2019, p. 1359-1371.

TY  - JOUR
T1  - Auxiliary Classifier Generative Adversarial Network with Soft Labels in Imbalanced Acoustic Event Detection
AU  - Xia, Xianjun
AU  - Togneri, Roberto
AU  - Sohel, Ferdous
AU  - Huang, David
PY  - 2019/6
Y1  - 2019/6
N2  - In acoustic event detection, the training data for some acoustic events is often small and imbalanced. To address this, this paper proposes generating virtual training data categorically using auxiliary classifier generative adversarial networks. Soft labels of acoustic events are first calculated to represent the acoustic event localization information. The closer the current frame is to the middle of the manually labeled acoustic event, the higher the soft label, which makes the soft labels positively correlated with the acoustic event localization. The acoustic event class and the quantized soft labels are then used as the input condition to the auxiliary classifier generative adversarial networks to generate an arbitrary number of training samples. Experimental results on TUT Sound Event 2016 under the home environment and TUT Sound Event 2017 under the street environment demonstrate the improved performance of the proposed technique compared to existing acoustic event detection systems.
AB  - In acoustic event detection, the training data for some acoustic events is often small and imbalanced. To address this, this paper proposes generating virtual training data categorically using auxiliary classifier generative adversarial networks. Soft labels of acoustic events are first calculated to represent the acoustic event localization information. The closer the current frame is to the middle of the manually labeled acoustic event, the higher the soft label, which makes the soft labels positively correlated with the acoustic event localization. The acoustic event class and the quantized soft labels are then used as the input condition to the auxiliary classifier generative adversarial networks to generate an arbitrary number of training samples. Experimental results on TUT Sound Event 2016 under the home environment and TUT Sound Event 2017 under the street environment demonstrate the improved performance of the proposed technique compared to existing acoustic event detection systems.
KW  - Acoustic event detection
KW  - auxiliary classifier generative adversarial networks
KW  - quantized confidence measure
UR  - http://www.scopus.com/inward/record.url?scp=85056147978&partnerID=8YFLogxK
U2  - 10.1109/TMM.2018.2879750
DO  - 10.1109/TMM.2018.2879750
M3  - Article
VL  - 21
SP  - 1359
EP  - 1371
JO  - IEEE Transactions on Multimedia
JF  - IEEE Transactions on Multimedia
SN  - 1520-9210
IS  - 6
ER  - 