For Benoît Carré, the future revealed itself in six notes. In 2015, Carré, a cerebral, bespectacled songwriter then in his mid-forties, became the artist-in-residence at Sony's Paris-based Computer Science Laboratory, headed by his friend François Pachet, a composer and leading artificial-intelligence researcher. Pachet was developing some of the world's most advanced AI music-composition tools, and wanted to put them to use. The duo's first released project was the mostly AI-composed Beatles pastiche "Daddy's Car," which made worldwide headlines in 2016 as a technological milestone. But Carré was looking for something deeper, something new.
"I've always been interested in music with unexpected chord changes, unexpected melodies," Carré says. "I've always searched for that kind of surprise in my work, too. And that means that I need to lose control at some point." Shortly after he finished work on "Daddy's Car," Carré sat in the lab one day and fed the sheet music for 470 different jazz standards into artificial-intelligence software called Flow Machines. As it started generating new compositions based on that input, one short melody transfixed Carré, staying in his head for days. That sequence of notes became the core of a genuinely odd, novel, haunting song, "Ballad of the Shadow," and Carré decided the Flow Machines AI had fused with him into a new artist – a composite he named Skygge. The song became a track on the first-ever AI-composed album, 2018's Hello World (credited to Skygge), which received respectful press but didn't make it past the arty fringes of pop culture.
Five years later, AI is the hottest topic in music, largely thanks to a much bigger song, and a very different use of AI. In 2022, a group of researchers began work on an open-source tool formally known as SoftVC VITS Singing Voice Conversion. Building on more primitive software that allowed users to generate raps in, say, Eminem's voice by typing words into an interface, SVC allowed for the transformation of one voice into another, to increasingly convincing effect. Cover songs using the tool were floating around TikTok by early 2023. But every new musical technology needs a breakthrough hit, and for AI voice-cloning, that arrived on April 4, 2023.
From the beginning, "Heart on My Sleeve," a song by the anonymous songwriter Ghostwriter977 featuring synthetic Drake and the Weeknd vocals, was misunderstood and overhyped. One of the tweets that helped propel its virality described the song as "AI-generated," a wild mischaracterization that major news outlets helped spread. The rise of ChatGPT and image-generating tools like Midjourney seemingly led to the delusion that AI already had limitless capabilities – that perhaps Ghostwriter977 had typed a description of a Drake-Weeknd collab into some magical tool that spat out a finished song.
https://www.youtube.com/watch?v=7HZ2ie2ErFI
That tool doesn't exist, and experts agree that "Heart on My Sleeve" was written, produced, and sung in its entirety by a human, with the only AI intervention arriving via the transformation of the vocals into Drake's and the Weeknd's voices. Generative AI may be able to produce convincing C+-level college essays and movie-poster-style artwork without human assistance, but full, convincing pop songs with vocals and lyrics? Still not possible, largely because of the sheer number of subtle intricacies in a piece of recorded music, from the underlying composition to vocal inflections down to, say, the reverb tail on the snare drum. "It is daunting," says Pachet, whose AI-music research dates back to the Nineties. "It's incredible, the level of complexity."
Many in the business found "Heart on My Sleeve" unimpressive – the Weeknd segments aren't even convincing fakes – or even racist in its casual thievery of Black artists' likenesses. But the opening faux-Drake verse was catchy enough, in an If You're Reading This It's Too Late vein, to cause a sensation as a TikTok snippet. The song garnered more than half a million streams before Universal Music managed to take it down on a technicality: Ghostwriter977 had foolishly used one of Metro Boomin's producer tags (a copyright-protected snippet of Future rapping "If young Metro don't trust you…"), which spared the label from having to wade into the fraught and still-unsettled territory of whether the mere sound of a singer's voice is copyrighted. (Ghostwriter977, who claimed to be an undercompensated professional songwriter, has since gone quiet, and after some initial interest in giving an interview for this article, stopped replying to emails.)
Since then, the music industry's AI conversation has focused almost entirely on voice-cloning, overshadowing, for now, the considerable threat and promise of other forms of machine-made music. Even the May release to beta testers of MusicLM, a shockingly sophisticated, though still rickety, tool from Google, went largely unnoticed. MusicLM generates audio files of instrumental musical snippets from written descriptions – like music criticism in reverse, or a baby version of the fictional song-generating tool people imagined in the hands of Ghostwriter977.
Meanwhile, "AI" has become an all-purpose, oft-confusing buzzword in music: When Paul McCartney announced in June that a final, "Free As a Bird"-style Beatles song was coming, using neural-net-powered software developed by Peter Jackson's WETA to isolate John Lennon's voice from a demo, countless news outlets ran controversy-stirring headlines about an "AI" song from the band. But versions of the same track-isolating technology have been around for years – Serato features one in its latest DJ software, apps like Moises offer it, and Giles Martin used it on his Revolver remix last year – and it's worlds away from generative AI.
Some see a world of possibilities and concern in voice-cloning technology alone, but Carré and Pachet aren't so sure. "It's fun to make a song and to transfer your voice into Eminem's timbre," Carré says. "But for musicians, for creation, I don't see what is interesting here."
IT DIDN'T TAKE long for the amateur producers playing with voice-cloning to stumble upon one of its most enticing, if unnerving, uses: the resurrection of the dead. The unauthorized nature of the tech inevitably led users toward musical blasphemy, whether it was making the Notorious B.I.G. diss himself by taking over the vocals of Tupac Shakur's "Hit 'Em Up," or the aesthetic crime of forcing Kurt Cobain to sing songs by Seether. And then there's Jeff Buckley singing Lana Del Rey, which turns out to be oddly compelling. No one seems to have done much yet with the voice of Prince, who, before his death, called pre-AI posthumous manipulations "demonic."
Joel Weinshanker, managing partner of Elvis Presley Enterprises, sees a business there. He's not particularly interested in commissioning new AI-assisted Presley songs for commercial release, believing the result would lack any real soul. But if tracks along those lines do emerge, he's more interested in finding a way to monetize them than in shutting them down. And what does excite him is something more prosaic: giving fans the ability to sing Elvis songs karaoke-style – with Elvis' own voice. "We're gonna be first in line for when the technology is there, and when the system is in place to compensate rights holders," he says. "Every licensed karaoke machine is paying the intellectual-property owners. All this really is, is karaoke times 1,000."
So far, only one major artist, living or dead, is allowing unlimited use of digital clones of her voice: Grimes. Following in the footsteps of the pioneering AI artist Holly Herndon, who has been offering up her voice for cloning since 2021, Grimes announced in April she would allow anyone to record songs with her voice, and split any revenue 50-50 with them. Her system, which has a dedicated website (Elf.tech), has led to several solid songs, but as even her own manager, Daouda Leonard, acknowledges, Grimes is in less danger than most big-name artists of compromising commercial viability or flooding the market – she's never had a radio hit. "You don't hear Grimes that much anyway," Leonard says. (Carré gave in to the trend, recording a remake of the Skygge song "Ocean Noir" with Grimes' voice – her first French-language release.)
Grimes is also casually espousing some radical positions, including a belief that copyright, in general, shouldn't exist. That's going too far even for some of the most adventurous AI-music enterprises, including the startup Uberduck, which at press time dared to give users easy access to models of famous voices from Ariana Grande to Drake to Justin Bieber – and ran a contest with a $10,000 prize for the best use of Grimes' voice. "We are pursuing avenues of partnership with basically every major label," says Uberduck's 31-year-old founder, Zach Wener, "and we have no interest whatsoever in being the Napster of this movement. I'm only willing to go forth in this world if we can find a path amenable to, and profitable for, artists." Uberduck has already received some legal threats, he acknowledges, and "we comply with all takedown notices."
Universal Music Group has described AI music as "fraud" and said that it will "harm artists," but for many in the industry, at least outside major-label boardrooms, squashing voice-cloning seems as impossible as shutting down all of the file-sharing services of the early aughts was. Instead, they envision a system of monitoring and monetization, ideally with the ability for artists to opt out. Rohan Paul, CEO of the startup Controlla, is one of many innovators working on algorithms to help labels track down, say, a Drake-voiced song that wasn't labeled as such.
"It'll be similar to the Napster era, where it gets to a point that we don't have a choice," says Paul. "It's unfortunate. But I think five years from now, it'll be so easy for anyone to steal someone else's voice that there has to be a way to make it open season and monetize it. So, I think, we'll see something like voice royalties." There's talk of compulsory licensing for voices, the same way anyone can cover a song without getting the writer's permission. But it's hard to see how that would play out in cases like the version of "America Has a Problem" floating around in which someone made an AI Ariana Grande sing a racial slur. Even Grimes has reserved the right to take down offensive content.
On the other hand, the lack of a viral follow-up to "Heart on My Sleeve" in the weeks that followed suggests that – as long as human talent is still needed to write songs – the threat of voice-cloning may have been exaggerated in a post-Ghostwriter977 panic. "I think people think it's going to open the door for random people at home to write songs and get them out," says music manager Trevor Patterson, whose clients include JetsonMade. "And some may slip through the cracks. But if it becomes a thing, it's gonna be the same songwriters who are dominating now, just using new tools to expand their craft."
There's a world where voice-cloned songs could stay mostly underground, as a fun, non-monetizable gimmick that could benefit from the hands-off approach labels had toward rap mixtapes. "Essentially, we're giving the world the ability to create mixtapes," says copyright attorney Ateara Garrison. Or maybe it's even more faddish. "Voice substitution is likely going to play its course a little bit," says Uberduck's Wener. "I don't think it'll ever go away. But like mash-ups, it'll probably have its heyday and then get old. But, you know, mash-ups still exist, right?"
Still, AI is moving quickly, and the most consequential use cases for voice-cloning may not yet have emerged. Already, online amateurs are replacing Paul McCartney's more weathered older vocals on his latter-day songs with clones of his Beatles-era voice, with promising results. Tech-savvy fans have given the same rejuvenating treatment to recordings of Axl Rose's recent live performances with Guns N' Roses. Right now, there's a slight delay in even the best processing scenario, but once that goes away, artists could start using AI clones of their own voices live onstage, as a sort of supercharged Auto-Tune.
In the studio, they could use Melodyne to tune their voices up to notes they can't quite hit, then bring in the vocal clone to smooth out the result. Producers could fill in a missing phrase or two on days when artists leave a bit too early, and songwriters are already talking about pitching songs to artists in those artists' own cloned voices. And who would really notice if producers buried layers of Brian Wilson's and Marvin Gaye's voices deep in their stacks of backing vocals?
There are even wilder possibilities, as Rob Abelow, founder of the consultancy firm Where Music's Going, suggests: "What if, instead of creating a deepfake of somebody, they make the most beautiful synthetic voice from a combination of all these others?"
WELL BEFORE ANY flood of AI songs, the major labels saw reason to fear being drowned out. Back in January, Universal Music CEO Lucian Grainge noted that streaming services were seeing some 100,000 uploads a day, and that "consumers are increasingly being guided by algorithms to lower-quality functional content that in some cases can barely pass for 'music.'" Anyone who's heard the generic fodder on something like Spotify's popular "Peaceful Piano" playlist – often music that Spotify conveniently doesn't have to pay royalties for – would have to agree.
Humans, again, have already done a fine job of pumping out "lower-quality" content without AI assistance, but there's one company that even some AI proponents have eyed with suspicion. Boomy, which lets users quickly and easily make simple songs on their phones with AI tools and upload them to streaming services, boasts on its website that its users have recorded "15,477,480 songs, around 14.7 percent of the world's recorded music." In May, Spotify pulled down some Boomy songs and temporarily barred the company from posting new ones, citing evidence of suspicious, artificial "streaming farm"-type activity. For some, that raised the specter of a dystopian future where computers streamed music made by computers, a perfect circle of inhumanity.
But Boomy's CEO, Alex Mitchell, insists his company had nothing to do with the irregular streaming, notes that there's a "heartbeat" – a human user – "behind every Boomy song," and clarifies that only a "single-digit percentage" of those 15 million songs have made it to streaming services, since the company screens for quality. Still, he says, "whether anybody likes it or not, or whether Boomy is in this market or not, you have an inevitability that there will be millions of musicians and hundreds of millions, if not billions, of songs per day. More people are going to be making more music with AI tools as the barriers to entry to create and participate in the market have gone down dramatically."
Boomy's product, in any case, wasn't exactly what Grainge was referring to. His concern was largely over "functional music" – sound meant for a purpose, often relaxation or studying. In its purest form, this content transcends music altogether: "Clean white noise – Loopable with no fade," from the album Best White Noise for Baby Sleep, has more than a billion plays on Spotify. The most sophisticated iteration of functional music may well come from Endel, which uses generative AI to spawn endless musical beds that the company says are scientifically designed to aid sleep, exercise, and other human necessities (there are no bathroom or sex settings yet).
Endel has also worked with artists including James Blake to make versions of their music that self-generate via AI into functional soundscapes, and the company's co-founder and CEO, Oleg Stavitsky, sees that technology as a way for labels to reclaim market share. "Lucian's comment was, essentially, we're losing the war to white noise," Stavitsky says. "Naturally, labels' market share is shrinking, because more and more people are turning to these types of sounds. Part of the problem is, yes, the DSPs are diverting listener attention to this type of content, because the underlying economics for them is better. But labels [also] don't have enough functional content. So my whole message to UMG is 'You can leverage generative AI today to turn your existing catalog into mass-produced, functional soundscape versions. You can win back market share using generative AI today.'"
The prospect of reversing declining market share proved irresistible even to an AI-averse company: Within two weeks of Stavitsky's interview with Rolling Stone, Universal announced a deal with Endel to "enhance listeners' wellness" with AI-derived soundscapes built from the label's catalog.
IN EARLY 2023, EDM veteran David Guetta stood in front of a vast festival crowd and debuted a new song snippet, featuring an unwitting collaborator. Over thudding synth bass, Eminem's voice blared, spitting dopey lyrics impossible to imagine from the artist who once rapped that "nobody listens to techno": "This is the future rave sound/I'm getting awesome and underground." The crowd loved it.
But Guetta is deeply excited about voice-cloning, and AI in general. "A lot of people are kind of freaking out," he says. "I just see this as another tool for us to make better records, make better demos." He agrees with Boomy's Mitchell that AI will continue the decades-long, computer-driven democratization of music-making. "But if you have terrible taste, your music is still gonna be terrible, even with AI," he adds. "You can use the voice of Drake and the Weeknd, and Michael [Jackson], and Prince at the same time. If your song sucks, it's still going to be a bad song."
Guetta thinks AI-fueled songwriting is inevitable, especially because he sees songwriting as simply the result of musicians rearranging the corpus of music they've encountered in their lives. "We all do what we've learned," he says. "The difference is that AI is going to be able to learn everything. So, of course, the AI is going to win in the end. One day, I hope, you're gonna say, 'I want to make a soul record,' and AI will have all the soul chord progressions in history, with the exact percentage of the ones that have been the most successful, and the key that is the most favorable for this chord progression. You cannot fight with this. It is impossible. So again, I think that more and more is going to be about taste, and not only technical abilities."
As it happens, AI researcher Pachet directly rejects that view. "This argument of the finiteness of the vocabulary of music was put forward by the Vienna school at the beginning of the 20th century," he says. "In 1910, they said, 'You know, tonal music is dead, because everything has been done.' That was before jazz, before the Beatles, before Antônio Carlos Jobim, before everything that happened after that."
Pachet's AI-music work is now in its fourth decade, most recently at Spotify, where he worked on music-making tools that may never see public release. Despite dedicating his life to the technology, he can't help feeling disappointed so far. Like his friend Carré, what he wants from AI is something truly novel: "If you can't say, 'Oh, my God, how could they do that?' you miss the point. You need new technology that creates something where people don't understand how you could do this without some kind of magic. And I don't think it's the case today."
THE CLOSEST THING to magic in AI music right now is a tool created by one of the world's biggest tech companies. Google announced the existence of MusicLM, which attempts to do for music what ChatGPT does for text, back in January. The accompanying white paper said Google had "no plans" to release it to the public, citing, in part, concerns that one out of every 100 pieces of music it generated could be traced to copyrighted sources. (Universal has expressed alarm about the prospect of having its song catalog scraped for AI training, suggesting massive legal battles to come.) But as public interest in AI exploded this year, and Google began to look like it had fallen behind ChatGPT creator OpenAI, it began to move with less caution. In May, Google released MusicLM in a beta version, open to test users only. (OpenAI released its own music-generation tool, Jukebox, in 2020, but it only runs locally and reportedly takes 12 hours to do what MusicLM does in seconds.)
In its current state, MusicLM is astonishing and awful, sometimes both at once. (Google declined an interview about the software.) The mere fact of what it's doing – generating fresh, high-fidelity audio files from thin air – is awe-inspiring, even when it sounds terrible. Ask it to approximate 1930s Delta blues or British Invasion rock and it will give you something fractured and bizarre, as if sourced from vinyl that's not just warped but possibly melted on a stove. But it's decent at generating generic trap beats, and nearly spectacular at conjuring funky blasts of retro-futuristic dance music in the vein of Daft Punk. That said, if you actually use the words Daft Punk – or any other artist name – in your prompt, it returns an "Oops, can't generate audio for that" message, almost certainly an attempt at copyright protection.
And then there are the voices. In its released version, MusicLM doesn't generate prominent lead vocals, and nothing like lyrics. But on many clips, even when you don't request them, ghostly vocals are beginning to emerge. In one case, a frontman who never lived seems to count off a nonexistent band. Other times, you can hear the whispers gathering themselves into something louder, more melodic. Somewhere deep in some distant server farm, something incalculably smart and eerily powerful is readying itself to sing.
_____
About the illustration: For this year's Future of Music issue, Rolling Stone collaborated with BRIA AI – a revolutionary startup in the visual generative-artificial-intelligence space that provides a fully licensed repository that compensates artists – and illustrator Tomer Hanuka. In tandem with generative AI, Hanuka designed an almost 100 percent synthetic image of Elvis Presley for "Hit Machines? The Messy Rise of AI Music." The goal was to reimagine the King as a cyborg. "You're praying for these happy accidents," Hanuka says of the unusual and unpredictable process of working with a robot. But Hanuka learned to lean into the weirdness that AI came up with, like giving the cyborg Elvis three faces. As for the ethical and moral dilemma that comes with visual generative AI, a legally sound solution like BRIA AI gives Hanuka peace of mind that he's not infringing on other artists and their work. "I don't think [AI] is going to replace artists," Hanuka makes clear. "I think it's going to enhance them." After all, it took several rounds of hand-drawn sketches and back-and-forth discussions about the desired atmosphere to create the Elvis illustration. "Creativity, visualizing, conceptualizing, and having an intent," Hanuka says, "is still something that humans have." –Maya Georgi
From Rolling Stone US