Applications of Deep Learning to Audio Generation

Yuanjun Zhao, Xianjun Xia, Roberto Togneri

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


In recent years, deep learning based machine learning systems have demonstrated remarkable success across a wide range of learning tasks in domains such as computer vision, speech recognition, and other pattern-recognition applications. This article contributes a timely review of, and introduction to, state-of-the-art deep learning techniques and their effectiveness in speech/acoustic signal processing. Various deep learning architectures are investigated thoroughly under the categories of discriminative and generative algorithms, including the recent Generative Adversarial Networks (GANs) as an integrated model. A comprehensive overview of applications in audio generation is provided. Drawing on these approaches, we discuss how deep learning methods can benefit the field of speech/acoustic signal synthesis and the potential issues that need to be addressed for prospective real-world scenarios. We hope this survey provides a valuable reference for practitioners seeking to innovate in the use of deep learning approaches for speech/acoustic signal generation.

Original language: English
Article number: 8873418
Pages (from-to): 19-38
Number of pages: 20
Journal: IEEE Circuits and Systems Magazine
Issue number: 4
Publication status: Published - 1 Oct 2019

