Publications
2024
- A Deep Learning Framework for Musical Acoustics Simulations. Jiafeng Chen, Kıvanç Tatar, and Victor Zappi. In AIMC 2024, Aug 2024
The acoustic modeling of musical instruments is a heavy computational process, often bound to the solution of complex systems of partial differential equations (PDEs). Numerical models can achieve a high level of accuracy, but they may take up to several hours to complete a full simulation, especially in the case of intricate musical mechanisms. The application of deep learning, and in particular of neural operators that learn mappings between function spaces, has the potential to revolutionize how acoustics PDEs are solved and noticeably speed up musical simulations. However, extensive research is necessary to understand the applicability of such operators in musical acoustics; this requires large datasets capable of exemplifying the relationship between input parameters (excitation) and output solutions (acoustic wave propagation) for each target musical instrument/configuration. With this work, we present an open-access, open-source framework designed for the generation of numerical musical acoustics datasets and for the training/benchmarking of acoustics neural operators. We first describe the overall structure of the framework and the proposed data generation workflow. Then, we detail the first numerical models that were ported to the framework. This work is a first step towards gathering a research community focused on deep learning applied to musical acoustics and on sharing workflows and benchmarking tools.
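Conceptually, each dataset entry in such a framework pairs an excitation (the input function) with the PDE solver's acoustic output (the target function), and a neural operator is trained as a fast surrogate for the solver. The sketch below illustrates that training setup in PyTorch; the shapes, names, and the toy stand-in model are illustrative assumptions, not the framework's actual API.

```python
# Minimal sketch of fitting a surrogate on (excitation -> wave propagation)
# pairs, as such a dataset might encode them. All names, shapes, and the toy
# model are illustrative assumptions, not the framework's actual API.
import torch
import torch.nn as nn

T_IN, T_OUT = 128, 1024  # assumed lengths: excitation vs. simulated output

class ToyOperator(nn.Module):
    """Stand-in for a neural operator (an FNO, say, would replace this MLP)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(T_IN, 512), nn.GELU(),
            nn.Linear(512, 512), nn.GELU(),
            nn.Linear(512, T_OUT),
        )

    def forward(self, excitation):
        return self.net(excitation)

model = ToyOperator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Fake batch standing in for one dataset shard: excitation parameters in,
# PDE-solver output (sampled acoustic pressure) as the regression target.
excitation = torch.randn(16, T_IN)
wave_field = torch.randn(16, T_OUT)

pred = model(excitation)
loss = loss_fn(pred, wave_field)
opt.zero_grad()
loss.backward()
opt.step()
print(f"surrogate loss: {loss.item():.4f}")
```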
- Sounding out extra-normal AI voice: Non-normative musical engagements with normative AI voice and speech technologies. Kelsey Cotton and Kıvanç Tatar. In AIMC 2024, Aug 2024
How do we challenge the norms of AI voice technologies? What would a non-normative approach look like for finding novel artistic possibilities in speech synthesis and text-to-speech with Deep Learning? This paper delves into SpeechBrain, OpenAI, and CoquiTTS voice and speech models from the perspective of an experimental vocal practitioner. Exploratory Research-through-Design guided an engagement with pre-trained speech synthesis models to reveal their musical affordances in an experimental vocal practice. We recorded this engagement with voice and speech Deep Learning technologies using auto-ethnography, a methodology recently adopted in Human-Computer Interaction. Our position in this paper actively subverts the normative function of these models, provoking nonsensical AI-mediation of human vocality. Emerging from a sense-making process of poetic AI nonsense, we uncover the generative potential of non-normative usage of normative speech recognition and synthesis models. We contribute with insights about the affordances of Research-through-Design to inform artistic working processes with AI models; how AI-mediations reform understandings of human vocality; and artistic perspectives and practice as knowledge-creation mechanisms for working with technology.
- Singing for the Missing: Bringing the Body Back to AI Voice and Speech Technologies. Kelsey Cotton, Katja de Vries, and Kıvanç Tatar. In Proceedings of the 9th International Conference on Movement and Computing, Utrecht, Netherlands, May 2024
Technological advancements in deep learning for speech and voice have contributed to a recent expansion in applications for voice cloning, synthesis and generation. Invisibilised stakeholders in this expansion are numerous absent bodies, whose voices and voice data have been integral to the development and refinement of these speech technologies. This position paper probes current working practices for voice and speech in machine learning and AI, in which the bodies of voices are “invisibilised”. We examine the facts and concerns about the voice-Body in applications of AI-voice technology. We do this through probing the wider connections between voice data and Schaefferian listening; speculating on the consequences of missing Bodies in AI-Voice; and by examining how vocalists and artists working with synthetic Bodies and AI-voices are ‘bringing the Body back’ in their own practices. We contribute with a series of considerations for how practitioners and researchers may help to ‘bring the Body back’ into AI-voice technologies.
- Interfacing ErgoJr with Creative Coding Platforms. Matteo Caravati and Kıvanç Tatar. In Proceedings of the 9th International Conference on Movement and Computing, Utrecht, Netherlands, May 2024
This paper introduces a project enabling non-coders to control a Poppy Ergo Jr. robotic arm with Dynamixel servomotors. The arm originally used a Raspberry Pi and a Pixl board, but import constraints led to the adoption of a ROBOTIS OpenCM9.04 board. A client-server architecture was implemented for remote control, with creative coding platforms (p5.js, Processing, Pure Data, Python) acting as clients. The server, built on a two-layer architecture, manages communication and interfaces with the ROBOTIS OpenCM9.04 board. The OSC and WebSocket protocols were chosen for communication due to their flexibility and ease of use. Clients were developed for each platform, leveraging compatibility layers.
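To give a concrete flavour of the client side, the sketch below sends a single motor command over OSC from Python using the python-osc package. The address pattern, host, port, and angle convention are assumptions for illustration; the project's actual message schema may differ.

```python
# Minimal sketch of an OSC client driving one servo, in the spirit of the
# client-server setup described above. Address pattern, port, and angle
# convention are assumptions, not the project's documented protocol.
# Requires: pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 8000)  # assumed server host/port

# Ask the server to rotate motor 1 of the Ergo Jr to 45 degrees.
client.send_message("/ergo/motor/1", 45.0)
```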
- A Shift in Artistic Practices through Artificial Intelligence. Kıvanç Tatar, Petter Ericson, Kelsey Cotton, and 6 more authors. Leonardo, Apr 2024
The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globally, how does this new paradigm shift challenge the status quo in artistic practices? What kind of changes will AI technology bring to music, arts, and new media?
2023
- Caring Trouble and Musical AI: Considerations towards a Feminist Musical AI. Kelsey Cotton and Kıvanç Tatar. In AIMC 2023, Aug 2023
The ethics of AI as both material and medium for interaction remains in murky waters within the context of musical and artistic practice. The interdisciplinarity of the field is revealing matters of concern and care, which necessitate interdisciplinary methodologies for evaluation to trouble and critique the inheritance of ‘residue-laden’ AI-tools in musical applications. Seeking to unsettle these murky waters, this paper critically examines the example of Holly+, a deep neural network that generates raw audio in the likeness of its creator Holly Herndon. Drawing from theoretical concerns and considerations from speculative feminism and care ethics, we care-fully trouble the structures, frameworks and assumptions that oscillate within and around Holly+. We contribute with several considerations and contemplate future directions for integrating speculative feminism and care into musical-AI agent and system design, derived from our critical feminist examination.
- Sound Design Strategies for Latent Audio Space Explorations Using Deep Learning Architectures. Kıvanç Tatar, Kelsey Cotton, and Daniel Bisig. In AIMC 2023, Aug 2023
Research on Deep Learning applications in sound and music computing has gathered interest in recent years; however, there is still a missing link between these new technologies and how they can be incorporated into real-world artistic practices. In this work, we explore a well-known Deep Learning architecture, the Variational Autoencoder (VAE). These architectures have been used in many areas to generate latent spaces in which similar data points are located closer to each other. Previously, VAEs have been used for generating latent timbre spaces or latent spaces of symbolic music excerpts. Applying VAEs to audio features of timbre requires a vocoder to transform the timbre generated by the network into an audio signal, which is computationally expensive. In this work, we apply VAEs to raw audio data directly, bypassing audio feature extraction. This approach allows practitioners to use any audio recording while giving flexibility and control over the aesthetics through dataset curation. The lower computation time in audio signal generation allows the raw audio approach to be incorporated into real-time applications. In this work, we propose three strategies to explore latent spaces of audio and timbre for sound design applications. By doing so, our aim is to initiate a conversation on artistic approaches and strategies to utilize latent audio spaces in sound and music practices.
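As a rough illustration of the approach, the sketch below shows the moving parts of a VAE over raw audio frames: an encoder producing a mean and log-variance, a reparameterised latent sample, a decoder emitting a frame in [-1, 1], and a reconstruction-plus-KL loss. Frame length, latent size, layer widths, and the KL weight are illustrative assumptions, not the architecture from the paper.

```python
# Minimal sketch of a VAE over raw audio frames (no spectral features, no
# vocoder). All sizes and the KL weight are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FRAME, LATENT = 1024, 64  # assumed raw-audio frame length and latent dim

class RawAudioVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(FRAME, 512), nn.ReLU())
        self.mu = nn.Linear(512, LATENT)
        self.logvar = nn.Linear(512, LATENT)
        self.dec = nn.Sequential(
            nn.Linear(LATENT, 512), nn.ReLU(),
            nn.Linear(512, FRAME), nn.Tanh(),  # audio samples in [-1, 1]
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = F.mse_loss(recon, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + 1e-3 * kld  # assumed KL weight

model = RawAudioVAE()
frames = torch.rand(8, FRAME) * 2 - 1  # stand-in batch of audio frames
recon, mu, logvar = model(frames)
print(vae_loss(recon, frames, mu, logvar).item())
```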
- On the importance of AI research beyond disciplines. Virginia Dignum, Donal Casey, Teresa Cerratto-Pargman, and 17 more authors. Feb 2023. arXiv:2302.06655 [cs]
As the impact of AI on various scientific fields increases, it is crucial to embrace interdisciplinary knowledge to understand the impact of technology on society. The goal is to foster a research environment beyond disciplines that values diversity and creates, critiques and develops new conceptual and theoretical frameworks. Even though research beyond disciplines is essential for understanding complex societal issues and creating positive impact, it is notoriously difficult to evaluate and is often not recognized by current academic career progression. The motivation for this paper is to engage in broad discussion across disciplines and to identify guiding principles for AI research beyond disciplines in a structured and inclusive way, revealing new perspectives and contributing to societal and human wellbeing and sustainability.
2021
- Raw Music from Free Movements: Early Experiments in Using Machine Learning to Create Raw Audio from Dance Movements. Daniel Bisig and Kıvanç Tatar. In Proceedings of the 2nd AI Music Creativity Conference (AIMC 2021), Feb 2021
Raw Music from Free Movements is a deep learning architecture that translates pose sequences into audio waveforms. The architecture combines a sequence-to-sequence model that generates audio encodings and an adversarial autoencoder that generates raw audio from those encodings. Experiments have been conducted with two datasets: a dancer improvising freely to a given piece of music, and music created through simple movement sonification. The paper presents preliminary results. These will hopefully lead closer towards a model that can learn from the creative decisions a dancer makes when translating music into movement, and then follow these decisions in reverse to generate music from movement.
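The two-stage idea can be illustrated schematically: a recurrent sequence-to-sequence stage maps a pose sequence to an audio encoding, and a decoder stage (standing in for the adversarial autoencoder's decoder) maps that encoding to raw audio. All dimensions and modules below are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of the two-stage pipeline described above. Dimensions and
# modules are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

POSE_DIM, ENC_DIM, AUDIO_LEN = 51, 32, 2048  # assumed sizes

class Pose2Encoding(nn.Module):
    """Stage 1: pose sequence -> audio encoding."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(POSE_DIM, 128, batch_first=True)
        self.out = nn.Linear(128, ENC_DIM)

    def forward(self, poses):       # (batch, time, POSE_DIM)
        h, _ = self.rnn(poses)
        return self.out(h[:, -1])   # one encoding per sequence

# Stage 2: stand-in for the adversarial autoencoder's decoder.
decoder = nn.Sequential(
    nn.Linear(ENC_DIM, 512), nn.ReLU(),
    nn.Linear(512, AUDIO_LEN), nn.Tanh(),
)

poses = torch.randn(4, 60, POSE_DIM)      # 60 pose frames per example
audio = decoder(Pose2Encoding()(poses))   # (4, AUDIO_LEN) raw audio
print(audio.shape)
```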
2020
- Latent Timbre Synthesis. Kıvanç Tatar, Daniel Bisig, and Philippe Pasquier. Neural Computing and Applications, Oct 2020
We present the Latent Timbre Synthesis, a new audio synthesis method using deep learning. The synthesis method allows composers and sound designers to interpolate and extrapolate between the timbre of multiple sounds using the latent space of audio frames. We provide the details of two Variational Autoencoder architectures for the Latent Timbre Synthesis and compare their advantages and drawbacks. The implementation includes a fully working application with a graphical user interface, called interpolate_two, which enables practitioners to generate timbres between two audio excerpts of their selection using interpolation and extrapolation in the latent space of audio frames. Our implementation is open source, and we aim to improve the accessibility of this technology by providing a guide for users with any technical background. Our study includes a qualitative analysis where nine composers evaluated the Latent Timbre Synthesis and the interpolate_two application within their practices.
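The core latent operation behind an application like interpolate_two can be stated in a few lines: blend two latent vectors with a mixing coefficient, where values inside [0, 1] interpolate and values outside extrapolate. The encode/decode machinery of a trained VAE is assumed and omitted here; all names are illustrative.

```python
# Minimal sketch of interpolation/extrapolation between two sounds in a
# latent space of audio frames. A trained VAE's encode/decode is assumed
# and omitted; all names and sizes are illustrative.
import numpy as np

def mix_latents(z_a: np.ndarray, z_b: np.ndarray, alpha: float) -> np.ndarray:
    """alpha in [0, 1] interpolates; alpha < 0 or > 1 extrapolates."""
    return (1.0 - alpha) * z_a + alpha * z_b

z_a, z_b = np.random.randn(64), np.random.randn(64)  # latents of two frames
midpoint = mix_latents(z_a, z_b, 0.5)    # timbre "between" the two sounds
beyond_b = mix_latents(z_a, z_b, 1.25)   # extrapolated past the second sound
print(midpoint[:4], beyond_b[:4])
```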