TY - JOUR
T1 - Sound Event Detection Using Multiple Optimized Kernels
AU - Xia, Xianjun
AU - Togneri, Roberto
AU - Sohel, Ferdous
AU - Zhao, Yuanjun
AU - Huang, Defeng
PY - 2020/5/28
Y1 - 2020/5/28
N2 - Sound event detection (SED) has been widely applied in real-world applications. Convolutional recurrent neural network-based SED approaches have achieved state-of-the-art performance. However, the convolution is typically performed with a fixed-size kernel, which adversely affects detection accuracy, especially when the acoustic features of different event classes vary widely. To address this, this article proposes a sound event detection technique using a convolutional recurrent neural network framework with multiple convolutional kernels of different sizes. The top-performing kernels are selected from a kernel pool based on the unsupervised clustering errors and the accuracies of the temporarily trained models. The selected kernels are then fed to multiple convolution layers to handle the acoustic feature variations. Experimental results on different subsets of AudioSet, namely the DCASE Challenge 2017 Task 4 and DCASE Challenge 2018 Task 4, demonstrate the performance of the proposed approach compared to state-of-the-art systems.
AB - Sound event detection (SED) has been widely applied in real-world applications. Convolutional recurrent neural network-based SED approaches have achieved state-of-the-art performance. However, the convolution is typically performed with a fixed-size kernel, which adversely affects detection accuracy, especially when the acoustic features of different event classes vary widely. To address this, this article proposes a sound event detection technique using a convolutional recurrent neural network framework with multiple convolutional kernels of different sizes. The top-performing kernels are selected from a kernel pool based on the unsupervised clustering errors and the accuracies of the temporarily trained models. The selected kernels are then fed to multiple convolution layers to handle the acoustic feature variations. Experimental results on different subsets of AudioSet, namely the DCASE Challenge 2017 Task 4 and DCASE Challenge 2018 Task 4, demonstrate the performance of the proposed approach compared to state-of-the-art systems.
KW - clustering errors
KW - kernel optimization
KW - multiple convolution layers
KW - sound event detection
UR - http://www.scopus.com/inward/record.url?scp=85088013982&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.2998298
DO - 10.1109/TASLP.2020.2998298
M3 - Article
AN - SCOPUS:85088013982
SN - 2329-9290
VL - 28
SP - 1745
EP - 1754
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 9103031
ER -