AudioCraft: A Guide to text-to-music AI

Lukas Görög

Introduction

AudioCraft is an open-source collection of generative AI tools from Meta AI that produce music and audio from text prompts. It simplifies the overall design of generative models for audio, so users can create music and sounds entirely through generative AI. Meta AI recently released AudioCraft as part of its effort to explore the potential of AI in music and audio generation. The toolkit opens up new possibilities for musicians, producers, and creators to experiment with AI-generated music and explore unique compositions.

Components of AudioCraft:

MusicGen:

MusicGen generates music from text prompts. It was trained on a Meta-owned dataset of 10K high-quality music tracks, together with ShutterStock and Pond5 music data. Users provide a text prompt, and MusicGen uses generative AI techniques to create music that matches it. Because MusicGen is the main subject of this guide, the following sections cover it in detail.

AudioGen:

AudioGen focuses on generating general audio, such as environmental sounds, rather than music. It takes text prompts and generates realistic audio from them. AudioGen uses a single-stage autoregressive Transformer model to produce high-quality audio. Example prompts for AudioGen include “Sirens and a humming engine approach and pass”, “A duck quacking as birds chirp and a pigeon cooing”, and “Whistling with wind blowing”.

EnCodec:

EnCodec handles audio encoding and decoding. It is a neural audio codec that compresses audio signals into a compact discrete representation that the AI models can process, and then reconstructs the waveform from that representation. The released code also allows anyone to train their own compression model tailored to their application. EnCodec promises high-fidelity (hi-fi) neural audio compression, meaning a high level of accuracy and realism in the reproduced audio. AudioCraft has also introduced a diffusion-based EnCodec decoder, which produces higher-quality audio than the standard EnCodec output across different bitrates.
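To make the idea of a bitrate more concrete, here is a small sketch of how the bitrate of a token-based codec follows from its frame rate, its number of codebooks, and the size of each codebook. The figures used (50 frames per second, 4 codebooks, 2048 entries per codebook) are assumptions based on the published description of the 32 kHz EnCodec tokenizer used by MusicGen; they are not stated in this guide, so check the model card before relying on them. The function names are illustrative and not part of AudioCraft.

```python
import math


def codec_bitrate_bps(frame_rate_hz: float, n_codebooks: int, codebook_size: int) -> float:
    """Bits per second needed to store the discrete tokens of a neural codec."""
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * n_codebooks * bits_per_token


def tokens_for_clip(duration_s: float, frame_rate_hz: float, n_codebooks: int) -> int:
    """Total number of tokens that represent a clip of the given length."""
    n_frames = int(round(duration_s * frame_rate_hz))
    return n_frames * n_codebooks


# Assumed values (not taken from this guide).
FRAME_RATE_HZ = 50
N_CODEBOOKS = 4
CODEBOOK_SIZE = 2048

bitrate = codec_bitrate_bps(FRAME_RATE_HZ, N_CODEBOOKS, CODEBOOK_SIZE)
print(f"Token bitrate: {bitrate / 1000:.1f} kbps")
print(f"Tokens in a 10 s clip: {tokens_for_clip(10, FRAME_RATE_HZ, N_CODEBOOKS)}")
```

Under these assumptions, a 10-second clip is represented by a few thousand tokens, which is why a language-model-style generator can work on the compressed representation rather than on raw audio samples.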

AudioCraft has published all of this information in its GitHub repository and in a YouTube video. The steps below follow that material.

How to deploy MusicGen in Google Colab?

It is a simple process consisting of these steps:

Visit the AudioCraft public GitHub repository.

Scroll down to the Models section and select the MusicGen model.

You will find all the information about MusicGen, including an introduction, the model card, installation instructions, usage instructions, API information and examples, and custom model training instructions. Click the “Open in Colab” icon in the introduction.

If you already have Google Colab set up in your account, it will open in a new notebook named “MusicGen Gradio Demo v1.0.0.ipynb”.

Then, following the instructions given, run the code block in the cell, and do not restart the runtime when asked to do so.

Colab will then show a warning message; select the “Run Anyway” option to deploy MusicGen. It will start cloning the code from GitHub and installing all the requirements into the Colab environment.

After successful installation, you will be provided with a “Public URL”.

When you click the public URL, which will probably look like “https://bb542b5b3f26abb89d.gradio.live/”, you will be directed to a page like the one below.

You are now one step away from generating a music clip. Some knowledge of music helps, because the final output depends on how you write the prompt. Be sure to include the musical genre, the instruments, how they should be played, and the impression the audience should get. If you have a particular melody in mind, upload the audio file in the melody section, or select the mic option and sing it.

Once you are happy with the prompt (and the melody, if you uploaded one), click Submit and wait for the audio clip to be created. By default, your music clip will be 10 seconds long. If you need a longer clip, change the length with the “duration” bar.

You can also adjust the other available options to suit your requirements.

Finally, you can listen to the music clip you created and download it to your computer in MP3, MP4, or WAV format.
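The same kind of generation can also be scripted directly in a Colab cell instead of going through the demo page. The sketch below follows the usage pattern shown in the AudioCraft repository (MusicGen.get_pretrained, set_generation_params, generate, and audio_write). The checkpoint name facebook/musicgen-small, the default duration, and the helpers output_stem and generate_clips are choices made here for illustration. It assumes the audiocraft package is already installed in the runtime and that a GPU is available; treat it as a starting point rather than the official workflow.

```python
import re


def output_stem(prompt: str, index: int) -> str:
    """Build a file-name stem such as '00_upbeat_rock' from a prompt (hypothetical helper)."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:4]
    return f"{index:02d}_" + "_".join(words)


def generate_clips(prompts, duration_s=10, checkpoint="facebook/musicgen-small"):
    """Generate one WAV file per prompt and return the file-name stems."""
    # Imported lazily so that output_stem works even without audiocraft installed.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    prompts = list(prompts)
    model = MusicGen.get_pretrained(checkpoint)
    model.set_generation_params(duration=duration_s)  # clip length in seconds
    wavs = model.generate(prompts)  # one waveform tensor per prompt

    stems = []
    for i, (prompt, wav) in enumerate(zip(prompts, wavs)):
        stem = output_stem(prompt, i)
        # audio_write adds the file extension and normalizes the loudness.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        stems.append(stem)
    return stems


# Example call (downloads the checkpoint on first use):
# generate_clips(["upbeat rock with gritty electric guitar and driving drums"])
```

Smaller checkpoints load faster and fit more easily on a free Colab GPU, which is why the small model is used as the default in this sketch.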

Why Prompt Engineering is important:

Prompt engineering is important in MusicGen because it directly affects the quality of the generated music. A well-crafted prompt can help the model understand the desired style, genre, and mood of the music to be generated.

To write a good MusicGen prompt, be specific: use descriptive words to convey the desired style, genre, and mood of the music. Use the provided examples as a starting point. Avoid ambiguous terms that can be interpreted in different ways. For example, instead of “happy music,” write “upbeat music with major chords and fast tempo.” Keep it simple and experiment.
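One way to keep prompts specific is to assemble them from explicit fields, so that genre, instruments, and mood are never left out. The helper below is purely illustrative: build_prompt and its parameters are hypothetical names, not part of MusicGen, which simply accepts a free-text description.

```python
def build_prompt(genre, instruments, mood, tempo=None, extra=None):
    """Assemble a specific text prompt from explicit musical attributes."""
    if not genre or not instruments or not mood:
        raise ValueError("genre, instruments and mood are all required")
    parts = [f"{mood} {genre} track", "featuring " + ", ".join(instruments)]
    if tempo:
        parts.append(f"{tempo} tempo")
    if extra:
        parts.append(extra)
    return ", ".join(parts)


vague = "happy music"  # leaves style, instruments, and tempo open to interpretation
specific = build_prompt(
    genre="pop",
    instruments=["acoustic guitar", "hand claps", "bright piano"],
    mood="upbeat",
    tempo="fast",
    extra="major chords",
)
print(specific)
```

The resulting string can be pasted into the prompt box of the demo page, and changing one field at a time makes it easier to hear what each part of the prompt contributes.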

Example MusicGen prompts:

Rock and Roll Revival:

Imagine a high-energy rock and roll anthem that kicks off with a gritty guitar riff, followed by a catchy vocal melody that embodies rebellious spirit. The drums drive the rhythm with relentless force, while the bass adds a pulsating groove. The chorus erupts into a powerful anthem, with raw vocals and electric guitar solos that capture the essence of youthful vigor and defiance.

Ethereal Ambient Journey:

Visualize an ambient masterpiece that transports listeners to an otherworldly realm. Soft, atmospheric synths blend seamlessly with delicate piano notes, creating a soothing soundscape. Gentle chimes and distant echoes evoke a sense of tranquility, while subtle electronic textures add depth. The music unfolds like a serene journey through celestial landscapes, inviting introspection and relaxation.

Funky Urban Groove:

Envision a funky urban track infused with groove and rhythm. The bassline is funky and infectious, driving the foundation of the composition. Tight and crisp drums provide a rhythmic backdrop for syncopated guitar riffs that encourage head-nodding. Horns punctuate the arrangement with bursts of energy, while a charismatic vocal line adds an element of swagger and streetwise cool.

Classical Elegance and Grace:

Picture an exquisite classical composition performed by a full orchestra. Delicate strings open the piece with a graceful melody, while woodwinds and brass add depth and richness. The music flows seamlessly through different sections, with moments of crescendo and decrescendo that convey emotions ranging from melancholy to triumph. The composition culminates in a breathtaking symphonic climax.

Upbeat Electronic Fusion:

Envision a dynamic electronic fusion piece that seamlessly blends elements of electronic dance music with world influences. Upbeat synthesizers introduce a catchy melody, while pulsating electronic beats drive the rhythm. Ethnic percussion and sampled instruments add a global flavor, creating a diverse sonic palette. The music builds to euphoric drops, inviting listeners to dance and celebrate.

Wrapping up

And that wraps up this guide to deploying Meta AI’s MusicGen text-to-music model on Google Colab! With just a few simple steps, you can now start generating original AI-powered music compositions from text prompts.

The key is getting the model installed and running in your Colab environment. Once set up, the intuitive interface makes it easy to experiment with different genres, instruments, melodies, and more to produce unique musical creations. While writing compelling prompts is an art that takes practice, MusicGen opens up endless possibilities. You can create anything from 10-second clips to full instrumental tracks with just a few clicks and lines of text.

This transformative technology allows anyone to tap into advanced generative audio, no musical skills required. All you need is some creativity and inspiration to fuel MusicGen. We’ve only scratched the surface of what this text-to-music AI can produce. So go ahead, explore your inner musical genius and see what captivating compositions you can generate with MusicGen! The world of AI-assisted music creation awaits.