Applications of Deep Learning to Audio Generation

Yuanjun Zhao, Xianjun Xia, Roberto Togneri

Research output: Contribution to journal > Article

Abstract

In recent years, deep learning-based machine learning systems have demonstrated remarkable success on a wide range of learning tasks in multiple domains, such as computer vision, speech recognition and other pattern recognition-based applications. The purpose of this article is to contribute a timely review of, and introduction to, state-of-the-art deep learning techniques and their effectiveness in speech/acoustic signal processing. Thorough investigations of various deep learning architectures are provided under the categories of discriminative and generative algorithms, including the recent Generative Adversarial Networks (GANs) as an integrated model. A comprehensive overview of applications in audio generation is given. Drawing on these approaches, we discuss how deep learning methods can benefit the field of speech/acoustic signal synthesis and the potential issues that need to be addressed for prospective real-world scenarios. We hope this survey provides a valuable reference for practitioners seeking to innovate in the use of deep learning approaches for speech/acoustic signal generation.

Original language: English
Article number: 8873418
Pages (from-to): 19-38
Number of pages: 20
Journal: IEEE Circuits and Systems Magazine
Volume: 19
Issue number: 4
DOIs: 10.1109/MCAS.2019.2945210
Publication status: Published - 1 Oct 2019

Fingerprint

Learning systems
Acoustic signal processing
Acoustics
Speech recognition
Computer vision
Pattern recognition
Deep learning

Cite this

Zhao, Yuanjun; Xia, Xianjun; Togneri, Roberto. / Applications of Deep Learning to Audio Generation. In: IEEE Circuits and Systems Magazine. 2019; Vol. 19, No. 4. pp. 19-38.
@article{92922fcda7c94f1298dfca6a980c66f1,
title = "Applications of Deep Learning to Audio Generation",
abstract = "In the recent past years, deep learning based machine learning systems have demonstrated remarkable success for a wide range of learning tasks in multiple domains such as computer vision, speech recognition and other pattern recognition based applications. The purpose of this article is to contribute a timely review and introduction of state-of-the-art deep learning techniques and their effectiveness in speech/acoustic signal processing. Thorough investigations of various deep learning architectures are provided under the categories of discriminative and generative algorithms, including the up-to-date Generative Adversarial Networks (GANs) as an integrated model. A comprehensive overview of applications in audio generation is highlighted. Based on understandings from these approaches, we discuss how deep learning methods can benefit the field of speech/acoustic signal synthesis and the potential issues that need to be addressed for prospective real-world scenarios. We hope this survey provides a valuable reference for practitioners seeking to innovate in the usage of deep learning approaches for speech/acoustic signal generation.",
author = "Yuanjun Zhao and Xianjun Xia and Roberto Togneri",
year = "2019",
month = "10",
day = "1",
doi = "10.1109/MCAS.2019.2945210",
language = "English",
volume = "19",
pages = "19--38",
journal = "IEEE Circuits and Systems Magazine",
issn = "1531-636X",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "4",
}

TY - JOUR
T1 - Applications of Deep Learning to Audio Generation
AU - Zhao, Yuanjun
AU - Xia, Xianjun
AU - Togneri, Roberto
PY - 2019/10/1
Y1 - 2019/10/1
AB - In the recent past years, deep learning based machine learning systems have demonstrated remarkable success for a wide range of learning tasks in multiple domains such as computer vision, speech recognition and other pattern recognition based applications. The purpose of this article is to contribute a timely review and introduction of state-of-the-art deep learning techniques and their effectiveness in speech/acoustic signal processing. Thorough investigations of various deep learning architectures are provided under the categories of discriminative and generative algorithms, including the up-to-date Generative Adversarial Networks (GANs) as an integrated model. A comprehensive overview of applications in audio generation is highlighted. Based on understandings from these approaches, we discuss how deep learning methods can benefit the field of speech/acoustic signal synthesis and the potential issues that need to be addressed for prospective real-world scenarios. We hope this survey provides a valuable reference for practitioners seeking to innovate in the usage of deep learning approaches for speech/acoustic signal generation.

UR - http://www.scopus.com/inward/record.url?scp=85077801909&partnerID=8YFLogxK
U2 - 10.1109/MCAS.2019.2945210
DO - 10.1109/MCAS.2019.2945210
M3 - Article
VL - 19
SP - 19
EP - 38
JO - IEEE Circuits and Systems Magazine
JF - IEEE Circuits and Systems Magazine
SN - 1531-636X
IS - 4
M1 - 8873418
ER -