In this work in progress, we study generative probabilistic models as a critical medium for producing artificial soundscape elements from multimodal input, chiefly natural language. This is motivated by the scarcity of generative environmental audio models in the deep learning literature and by their potential in sound synthesis frameworks. On a technical level, we use off-the-shelf models such as multimodal autoencoders to find semantically adequate sound vectors in the latent space of generative adversarial networks. By controlling raw-audio adversarial synthesis engines through multimodal interfaces, we flesh out the connections between abstract semantic manifolds and latent sound design spaces. At this stage our results lack the quality and resolution of natural soundscapes, but we propose technical improvements. Ultimately, the models will be evaluated by the degree of conceptual resemblance between the generated sounds and the semantic content of the conditioning inputs. As such, this work is not concerned with reconstructing the causal or physical processes underlying soundscape generation; rather, it seeks to leverage cross-modal correlates in human-annotated audio distributions for creative purposes. More broadly, by interweaving creative practices in soundscape composition with multimodal learning techniques, we contribute to the discussion on the effects of automating creative labour.
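The core technical idea of searching a GAN's latent space for a vector whose generated sound matches a text prompt's embedding can be sketched abstractly. The following is a minimal toy illustration, not the paper's implementation: the generator, audio encoder, and text embedding are random stand-ins (in practice these would be a pretrained raw-audio GAN and a multimodal autoencoder), and the search is plain hill climbing on cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the paper's models):
# a "generator" mapping latents to audio features, and an
# "audio encoder" mapping those features into a shared
# semantic embedding space.
G = rng.standard_normal((256, 64))   # latent (64-d) -> audio features (256-d)
E = rng.standard_normal((32, 256))   # audio features -> embedding (32-d)

def embed_audio(z):
    # Embed the "sound" generated from latent z.
    return E @ np.tanh(G @ z)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for the embedding of a natural-language prompt.
text_embedding = rng.standard_normal(32)

# Hill-climb in latent space toward a sound vector that is
# semantically adequate, i.e. whose embedding is close to the prompt's.
z = rng.standard_normal(64)
init_score = cosine(embed_audio(z), text_embedding)
best = init_score
for _ in range(2000):
    candidate = z + 0.1 * rng.standard_normal(64)
    score = cosine(embed_audio(candidate), text_embedding)
    if score > best:
        z, best = candidate, score

print(f"similarity: {init_score:.2f} -> {best:.2f}")
```

Gradient-based search through a differentiable generator would be the more realistic choice; the derivative-free loop above only makes the control flow of latent-space retrieval concrete.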
Journal: Proceedings of the Australasian Computer Music Conference 2021
Publication status: Published - 26 Aug 2021