Meta open-sources an AI-powered music generator

Not to be outdone by Google, Meta has released its own AI-powered music generator — and, unlike Google, open-sourced it.

Called MusicGen, Meta’s music-generating tool, a demo of which can be found here, can turn a text description (e.g. “An ’80s driving pop song with heavy drums and synth pads in the background”) into roughly 12 seconds of audio. MusicGen can optionally be “steered” with reference audio, like an existing song, in which case it’ll try to follow both the description and the melody.

Meta says that MusicGen was trained on 20,000 hours of music, including 10,000 “high-quality” licensed music tracks and 390,000 instrument-only tracks from Shutterstock and Pond5, two large stock media libraries. The company hasn’t provided the code it used to train the model, but it has made available pre-trained models that anyone with the right hardware — chiefly a GPU with around 16GB of memory — can run.
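For the curious, here’s roughly what running one of those pre-trained models looks like in practice. This is a minimal sketch built on Meta’s audiocraft library, not an official tutorial, and the checkpoint name, prompt and reference-audio filename are all placeholder assumptions:

```python
# A minimal sketch (not Meta's official example) of generating audio with one of the
# released MusicGen checkpoints via the audiocraft library. The checkpoint name, the
# prompt and the reference-audio path are illustrative assumptions.
import torchaudio

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pre-trained checkpoint; the larger variants need roughly 16GB of GPU memory.
model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=12)  # ~12 seconds of output, as in the demo

# Plain text-to-music generation from a description.
descriptions = ["An '80s driving pop song with heavy drums and synth pads in the background"]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Optional "steering": condition on the melody of a reference recording as well.
melody, sr = torchaudio.load("reference_song.wav")  # hypothetical local file
steered = model.generate_with_chroma(descriptions, melody[None], sr)

# Save both results as loudness-normalized WAV files.
audio_write("text_only", wav[0].cpu(), model.sample_rate, strategy="loudness")
audio_write("melody_steered", steered[0].cpu(), model.sample_rate, strategy="loudness")
```

If you only want text conditioning, a plain generate call is enough; the melody-conditioned checkpoint is the one that accepts reference audio.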

So how does MusicGen perform? Well, I’d say — though certainly not well enough to put human musicians out of a job. Its songs are reasonably melodic, at least for basic prompts like “ambient chiptunes music,” and — to my ears — on par with (if not slightly better than) the results from Google’s AI music generator, MusicLM. But they won’t win any awards.

Here’s the output from MusicGen for “jazzy elevator music”:

And here’s MusicLM’s take:

Next, I gave a more complicated prompt in an attempt to throw MusicGen for a loop: “Lo-fi slow BPM electro chill with organic samples.” MusicGen surprisingly outshone MusicLM in terms of musical coherence, producing something that’d easily find a home on Lofi Girl.

Here’s MusicGen’s sample:

And here’s MusicLM’s:

To switch things up a bit, I tried using both tools to generate a piano ditty in the style of George Gershwin. I say “tried” because, in an effort to forestall the copyright issues around generative music tools, Google implemented a filter in the public version of MusicLM that blocks prompts mentioning specific artists.

MusicGen has no such filter. But the results for “Background piano music in the style of Gershwin” left something to be desired, I must say:

Generative music is improving, clearly (see Riffusion, Dance Diffusion and OpenAI’s Jukebox). But major ethical and legal issues have yet to be ironed out. AI like MusicGen “learns” from existing music to produce similar effects, a fact with which not all artists — or generative AI users — are comfortable.

Increasingly, homemade tracks that use generative AI to conjure familiar sounds, close enough to be passed off as authentic, have been going viral. Music labels have been quick to flag them to streaming partners, citing intellectual property concerns — and they’ve generally been victorious. But there’s still a lack of clarity on whether “deepfake” music violates the copyright of artists, labels and other rights holders.

It might not be long before there’s guidance on the matter. Several lawsuits making their way through the courts will likely have a bearing on music-generating AI, including one pertaining to the rights of artists whose work is used to train AI systems without their knowledge or consent.

For its part, Meta, which isn’t imposing restrictions on how MusicGen can be used, says that all the music MusicGen was trained on was “covered by legal agreements with the right holders,” including a deal with Shutterstock.