
MusicLM, Google's AI for music generation

Cross-posting from a neighboring discussion. Some are still not taking in the eyeful that is right in front of them.

--------

There’s some self-deception going on here in the citations of Gutenberg, calculators, printers and other bouts of automation in our past. There is either self-deception or dissembling in the exhortation to just embrace the new tools and soar atop them. One had better do it, or try, for certain. But that optimism belies the depth of the problem.

E.g., no printer driver in history ever took as input how its final output was received by the user in order to recode itself to better anticipate user acceptance of its future output. This is categorically different from human enablement by reducing drudgery, because drudgery and enablement themselves will, going forward, be continually redefined at a faster rate than ever before. Rate matters. Human minds have fixed clock speeds. We are already members of a society where most don’t have the capacity to “keep up.” Let’s make that still worse with no foresight or oversight, shall we?

Appeals to the idea of ML being “mere” interpolation and recombination - which it is, in very great quantities - assume an isotropy in the material progression of cognitive/performance capacity that is everywhere refuted by the evolutionary record. Thresholds are real things and geometric coefficients matter. Adding enough to a system eventually yields greater-than-additive complexity in its output. Our minds themselves are nothing more than interpolation/recombination engines — evolved, and at a rate fixed by selective pressure and our own biology.

Do not be fooled by ML’s present clunkiness into thinking that it is under any such similar constraints. We lack the memory space and clock speed for rapid, iterative mimesis at arbitrary complexity that neural networks are designed for. Human beings for decades sought to solve the highly complex protein folding problem - how primary 1d sequences of amino acids yield the 3d structures of proteins. This is now realistically within reach due to ML and it’s merely one example of many. Think there’s some qualitative difference between that and your creativity? Make sure you at least challenge that assumption.

Your originality - past and future - is mere grist for this new mill, whether you recognize it or not. Fantasies about craft, intuition, creativity, classicism etc. ignore the fact that complex and time-dependent patterns will eventually be quantifiable and mimicked, even if they are not yet today. Your output will set the weights and biases of future ML systems. You will structure and build your own replacement.

I’m going full-bore scaremonger on this for a very specific reason. While I have no illusions that development of the tech can be halted or its deployment limited, the socioeconomic expectations around it can be altered, either with or without foresight, and that matters tremendously.

The preemptive surrender I see here in some cases - it’s the future! Embrace it! You can’t stop it! Just adapt! - is analogous to letting ourselves be Napstered all over again before the fight even happens.

The systems we are discussing are new because the relationship between output and input is new. Training, backpropagation and the architectures that employ and are altered by them mean that your output and feedback literally restructure the model's parameters. The ones optimizing the networks are 1) their developers AND 2) the aggregate you, the arts community. You are stakeholders and code authors in a way that has never before been true in the case of mere digital duplication. That should be reflected in the way money changes hands. The Getty lawsuits should be just the beginning.
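The "your output restructures the model" point can be caricatured in a few lines. This is a toy gradient-descent sketch of my own (a one-variable linear model, nothing to do with MusicLM's actual training code): the parameters the "program" ends up running with are determined entirely by the examples it was shown.

```python
# Toy illustration (not any real system's code): training data
# directly sets the model's parameters via gradient descent.
def train(samples, lr=0.1, epochs=200):
    """samples: list of (x, y) pairs; returns learned weight and bias."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x + b
            err = pred - y         # the feedback signal
            w -= lr * err * x      # parameters move toward the data
            b -= lr * err
    return w, b

# The learned "code" (w, b) is authored by whoever supplied the examples:
w, b = train([(0, 1), (1, 3), (2, 5)])  # data drawn from y = 2x + 1
```

In that narrow sense the people whose work fills the training set really are co-authors of the resulting system's behavior.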
 
Library music = dead within 3 years. A hunt in which I personally have zero dogs, but keep in mind that the stuff in MusicLM is early days and the lowest-hanging fruit.


And the copyright holders of every track in the training set are owed royalties. If the law doesn't find that soon, then the law can't safeguard anyone's intellectual property over any timeframe of historical interest.
 
Library music = dead within 3 years.
3 years? I think that's generous. And it won't just be production music. These things will make cues of every type better and faster than any of us... including the musical "geniuses" we all adore. And it will do it instantly. Lots of folks seem to think that we people aren't based on algos and that we're something "special". Our brains are most certainly algorithmic, easily hackable and easily imitated. We're seeing the baby-steps towards imitation happening now (marketing, religions, conspiracies, etc prove the hackable part). AI will be good at those things, too. :)

You can bet that all the tech we're seeing right now from Google, OpenAI and the like (even the latest "best" models) is nothing compared to what they almost certainly have in the lab right now that we aren't privy to. From competition to public acceptance... they have to drib and drab this stuff out a little at a time. And based on how we're all already freaking out, I can't imagine how people are going to cope with what's next. Facebook has been working on a prompted AI video creator like DALL-E does with images. Soon you'll be able to create motion pictures with prompted AI and not need to pay for actors, directors, CGI, studios, etc. It's just mind blowing.

My wife and I have been using ChatGPT this past week and it's just unreal. For my wife, for whom English is a second language, this thing is a dream for cleaning up emails, writing reviews of peers and simple daily messages. She now feels more empowered, like she can communicate without feeling any stigma from her imperfect English. There's a lot of pressure when you're trying to do a complex job, and the daily interoffice crap is often the most challenging and distracting part (especially for foreign folks). AI is definitely going to help with all that drudgery. But at what cost to the rest?


And on the subject of royalties... since these things are learning from datasets, is it actually infringement? I can also learn from datasets, and as long as I am not duplicating someone else's IP exactly, is that infringement? Are influences infringement, or at least some type of infringement that we all accept somehow?

As a guitarist pretty much every lick I play came from someone else, or was influenced thusly. Do I owe Eric Johnson a bunch of royalties if I ever have a successful commercial release because I was influenced by him? (Eric and his lawyer can rest assured, that will never happen. lol).

I think people assume that these things are using little fragments of shredded paper to reconstruct things seemingly from scratch, but that is not how this works. This is a form of learning, not a series of xeroxed carbon copies being cleverly stitched together. That would be infringement, but this is not the case. It's sticky legal territory because these things are actually employing a form of thinking and learning in a very real sense, and are not simply copying and pasting the way most people seem to think.
 
Library music = dead within 3 years
Eh, I don't know. That seems like FUD to me.

MusicLM is like 60% there, but getting to 100% is going to prove very difficult. We've seen this already in areas like self-driving. Companies have invested billions into that and we're still not there. Google started working on self-driving in 2009.

In its current state, MusicLM is really just a Google research project being used for marketing purposes amidst all the ChatGPT buzz. It's really all a party trick to impress the audience. I seriously doubt they will decide to invest the billions needed to turn it into a business like they did with Waymo.

Self-driving seems like a worthwhile pursuit for a number of reasons, but what's the point of generating music with AI? There's already an overabundance of library music available and AI (in its current state) can't really generate new trends. There's no intelligence in AI. It's all a regurgitation of the data used to train the models.

At some point we will reach AGI (artificial general intelligence) which is really what people imagine when thinking of AI. I have no doubt this will happen but I doubt it will be during our lifetime. And whenever it happens the amount of CPU power and electricity needed to run it will be staggering. We won't use it to ask it to generate "tropical techno beat with a happy japanese singer".

My wife and I have been using ChatGPT this past week and it's just unreal.
Yeah I'm sure we will see AI flourish for very focused tools that will help humans do their job.

Stuff like AI mask selections for Photoshop, assisted translation, word suggestions like Gmail, Spectralayers, etc.

My wife is a translator and probably half her job these days is correcting AI translations which are still notoriously bad. We're still very far away from having an AI be as good as a human for translations. Again, we're like 60% there, but getting to 100% is going to be very difficult.
 
Music may return to the days before recording, when musicians' only way to earn a living was live performance. It pretty much already happened for most bands; the days of a Steely Dan making it on records alone are long gone. It is ticket and merch sales that pay the bills now.
 
At some point we will reach AGI (artificial general intelligence) which is really what people imagine when thinking of AI. I have no doubt this will happen but I doubt it will be during our lifetime. And whenever it happens the amount of CPU power and electricity needed to run it will be staggering. We won't use it to ask it to generate "tropical techno beat with a happy japanese singer".
Putting all compute and energy considerations aside, AGI requires being able to interact with, change and measure the environment, which in turn requires a coupling between the AGI and the environment (so that it can be sampled). At least with the current forms of compute known to humanity, this won't happen.
 
Putting all compute and energy considerations aside, AGI requires being able to interact with, change and measure the environment, which in turn requires a coupling between the AGI and the environment (so that it can be sampled). At least with the current forms of compute known to humanity, this won't happen.
What about the internet?

Hoomans are very diligent in filling it up with everything that's happening.
 
And on the subject of royalties... since these things are learning from datasets, is it actually infringement?

I think people assume that these things are using little fragments of shredded paper to reconstruct things seemingly from scratch, but that is not how this works. This is a form of learning, not a series of xeroxed carbon copies being cleverly stitched together.
It is a mechanical process to produce a result. It is using your intellectual property (potentially) to specify the rules of that mechanical process.

So on the one hand, you are here saying it's really learning, like we do, and on the other, @Olympum is invoking a metaphysical difference between us and the machines to say it's not. New stuff befuddles our categories.
 
Not that impressive to me, tbh. Maybe the random selections I made of the examples just yielded not so great exemplars and there are better ones in the set. But in the ones I did listen to, I don't hear a qualitative leap from the rather mediocre examples available with other music generation AIs.
 
My wife is a translator and probably half her job these days is correcting AI translations which are still notoriously bad. We're still very far away from having an AI be as good as a human for translations. Again, we're like 60% there, but getting to 100% is going to be very difficult.

I encourage you (or your wife) to try ChatGPT for translations. I speak fluent Spanish (as does my wife) and ChatGPT is flawless. And I mean utterly FLAWLESS in translating anything to Spanish. But not only that... it can take an entire blog post in English, summarize it in Spanish (at any length you desire!) and nail all the important bits in the summary, in perfect Spanish.

If you haven't used this stuff yet and you're just postulating based on old experience and/or what you read months or years ago (and I was in the same camp until last week), then you owe it to yourself to try it out and really understand just how over it all is. Over. Microsoft just put billions more into OpenAI, and the next version of Word will have ChatGPT built in. That's going to upend a lot of people's livelihoods.

As I was using it the first night, I asked it to write music in MIDI. It responded that it can't yet do that, but kindly offered me a summary of what MIDI music is. As amazing as it was that night when I was throwing all kinds of stuff at it, I could easily see how simple it would be for it to write cues for me based on my given template with the particular libraries all loaded (having been trained on the libraries I have ahead of time, of course). If it can write and debug nearly any kind of coded language (and it does that already!) then it can easily write MIDI cues. It was like looking through the looking glass!
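For what it's worth, a MIDI cue really is just a short run of bytes, which is why a code-writing model could plausibly emit one. Here's a minimal sketch of my own (a toy format-0 Standard MIDI File writer, not anything produced by ChatGPT) showing how little it takes:

```python
import struct

def vlq(n):
    """Encode n as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def midi_bytes(notes, ticks_per_beat=480):
    """notes: MIDI note numbers, each held for one beat, channel 1."""
    track = b""
    for note in notes:
        track += vlq(0) + bytes([0x90, note, 64])              # note on
        track += vlq(ticks_per_beat) + bytes([0x80, note, 0])  # note off
    track += vlq(0) + bytes([0xFF, 0x2F, 0x00])                # end of track
    # header chunk: length 6, format 0, one track, time division
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + track

data = midi_bytes([60, 62, 64, 65])  # C D E F
# open("cue.mid", "wb").write(data)  # playable in any DAW
```

Turning "write me a cue" into note numbers is of course the hard part; the file format itself is trivial for these models.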

Makes the concept of look ahead and modeling libraries seem quaint and cute.
 
Music may return to the days before recording, when musicians' only way to earn a living was live performance. It pretty much already happened for most bands; the days of a Steely Dan making it on records alone are long gone. It is ticket and merch sales that pay the bills now.
Yup! "Live events" is where the monies at!
Even event photography, such as weddings, can be quite lucrative.
 
This is so weird. It's like it's trying to figure out music but doesn't know what music is yet.
I'd be curious to find out how it's created. So far it seems like it's just drawing from a database of vocal loops, instrument loops, drum loops and single-shot drum machines, throwing it all together and time-stretching and pitch-shifting the loops so it all fits in tempo and key.

I wonder if this will become kind of an art form in its own right.
 
I think one of the key game changers is not the writing of the music, it's the method of audio synthesis. It's not using "instruments" as synths or samples that it is triggering. It's literally creating the sounds from scratch in real time.

Think what this means... Instead of a massive inflexible sample library, the model knows, maybe, every recording of a solo violin ever made. And we give the model some notation of the melody we want, and say: please play this in the style of X violinist playing pop music, on Y violin, at Abbey Road Studio 2. And it will conjure up a performance of the melody, as good as a session violinist at your beck and call.
 
I think one of the key game changers is not the writing of the music, it's the method of audio synthesis. It's not using "instruments" as synths or samples that it is triggering. It's literally creating the sounds from scratch in real time.
Is it really? Sounds like time stretched samples to me. I wondered if some of the vocals were synth created. They sound like Vocaloid.
 
I know these are rather early experiments and such, but major tech companies are waking up and starting to invest in AI for all kinds of purposes, which means faster and faster improvements finding daylight and applications...

so....


I think.. we can guess this:

a few years later from now.....

Every professional musician/composer/producer loses projects and thus money because of AI music.

another few years later....

just a tiny handful remain working.. the rest changed jobs.. (live performances will be done with AI characters and music, through holographics/VR.)


if nobody regulates this.. and keeps human effort the most important part (AI should only help humans so far, and not any further.. aka not being as good as us).. the above scenario is reality in say 5 to 10 years from now, mark my words.
 
Is it really? Sounds like time stretched samples to me. I wondered if some of the vocals were synth created. They sound like Vocaloid.
Yep. It's like the image AI tools. They're not cutting and pasting other images together to make a collage; they start from Gaussian noise, and the picture coalesces over iterations to eventually become the final high-fidelity image. Audio and images, it's all just 1s and 0s in patterns. The model just has to be able to associate the request with the patterns; it needs to be trained in that particular way.
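That coalescing-from-noise idea can be sketched in a few lines. This is a deliberately fake toy of mine, not a real diffusion model: where a real model would predict and subtract learned noise at each step, this "denoiser" just blends toward a known target pattern, but the shape of the loop (start from Gaussian noise, refine iteratively) is the same.

```python
import random

# Toy caricature of iterative refinement (NOT a real diffusion model):
# begin with pure noise and repeatedly apply a "denoiser".
def denoise_step(sample, pattern, strength=0.2):
    # A real model predicts the noise to remove; here we cheat by
    # nudging every value toward a fixed target pattern.
    return [s + strength * (p - s) for s, p in zip(sample, pattern)]

def generate(pattern, steps=50, seed=0):
    rng = random.Random(seed)
    sample = [rng.gauss(0, 1) for _ in pattern]  # start from Gaussian noise
    for _ in range(steps):
        sample = denoise_step(sample, pattern)   # iterate toward structure
    return sample

target = [0.0, 1.0, 0.0, -1.0]  # stand-in for an image/audio pattern
out = generate(target)          # noise has coalesced onto the pattern
```

The point is only that nothing is collaged: the output emerges from noise under the model's learned pull, which is why the copy-paste intuition misleads people.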

So there could be models that specialise in melody writing, another in orchestral arranging that spits out MIDI/notation ("orchestrate and arrange this melody and harmony sketch for a 60-piece string section in the style of John Williams adventure film scores"),

And models that specialise in producing the audio.
 