Multimodal Soundscape Synthesis

Research output: Contribution to conference › Conference presentation/ephemera › peer-review


In this work in progress, we study various generative probabilistic models as critical media for producing artificial soundscape elements from multimodal input, mainly natural language. This is motivated by the lack of generative environmental audio models in the deep learning literature and by their potential in sound synthesis frameworks. On a technical level, we use off-the-shelf models such as multimodal autoencoders to find semantically adequate sound vectors in the latent space of generative adversarial networks. By controlling raw-audio adversarial synthesis engines with multimodal interfaces, we flesh out the connections between abstract semantic manifolds and latent sound design spaces. At this point our results lack the quality and resolution of natural soundscapes, but we propose technical improvements. Ultimately, the models will be evaluated in terms of the degree of conceptual resemblance between generated sounds and the semantic contents of the conditioning inputs. As such, this work is not concerned with reconstructing the causal or physical processes underlying soundscape generation but seeks to leverage crossmodal correlates in human-annotated audio distributions for creative purposes. More broadly, by interweaving creative practices in soundscape composition with multimodal learning techniques, we contribute to the discussion on the effects of the automation of creative labour.
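The core technique described above — searching the latent space of a generative audio model for vectors whose outputs semantically match a text prompt via a shared multimodal embedding — can be sketched in miniature. The abstract names no specific encoders or generator, so every component below (the linear "generator", "audio encoder", prompt, dimensions, and random-search loop) is a hypothetical stand-in; a real system would use a pretrained raw-audio GAN and a trained text–audio embedding model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: GAN latent size, shared embedding size, audio length.
D_LAT, D_EMB, N_AUDIO = 64, 32, 256

# Toy linear stand-ins for a pretrained GAN generator and an audio encoder
# from a multimodal autoencoder (neither is specified in the abstract).
W_gen = rng.normal(size=(N_AUDIO, D_LAT)) / np.sqrt(D_LAT)
W_aud = rng.normal(size=(D_EMB, N_AUDIO)) / np.sqrt(N_AUDIO)

def generate(z):
    # GAN generator: latent vector -> raw audio (here, a bounded linear map).
    return np.tanh(W_gen @ z)

def embed_audio(x):
    # Audio encoder: raw audio -> unit vector in the shared semantic space.
    e = W_aud @ x
    return e / np.linalg.norm(e)

# A fixed unit vector standing in for the text embedding of a prompt
# such as "birdsong at dawn" (purely illustrative).
text_emb = rng.normal(size=D_EMB)
text_emb /= np.linalg.norm(text_emb)

# Gradient-free random search for the "semantically adequate sound vector":
# the latent z whose generated audio embeds closest to the text embedding.
best_z, best_score = None, -np.inf
for _ in range(2000):
    z = rng.normal(size=D_LAT)
    score = float(embed_audio(generate(z)) @ text_emb)
    if score > best_score:
        best_z, best_score = z, score

print(f"best cosine similarity after search: {best_score:.3f}")
```

In practice such a search would more likely use gradient ascent through a differentiable generator and encoder, but the random-search form keeps the latent-space steering idea visible without assuming any particular architecture.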
Original language: English
Publication status: Published - 26 Aug 2021
Event: Australasian Computer Music Conference 2021 - Sydney and Melbourne, Australia
Duration: 26 Aug 2021 - 27 Aug 2021


Conference: Australasian Computer Music Conference 2021
Abbreviated title: ACMC2021
City: Sydney and Melbourne

