A ChatGPT for Music Is Here. Inside Suno, the Startup Changing Everything

By wazup on March 17, 2024

“I’m just a soul trapped in this circuitry.” The voice singing those lyrics is raw and plaintive, dipping into blue notes. A lone acoustic guitar chugs behind it, punctuating the vocal phrases with tasteful runs. But there’s no human behind the voice, no hands on that guitar. There is, in fact, no guitar. In the space of 15 seconds, this credible, even moving, blues song was generated by the latest AI model from a startup named Suno. All it took to summon it from the void was a simple text prompt: “solo acoustic Mississippi Delta blues about a sad AI.” To be maximally precise, the song is the work of two AI models in collaboration: Suno’s model creates all the music itself, while calling on OpenAI’s ChatGPT to generate the lyrics and even a title: “Soul of the Machine.”

Online, Suno’s creations are starting to generate reactions like “How the fuck is this real?” As this particular track plays over a Sonos speaker in a conference room in Suno’s temporary headquarters, steps away from the Harvard campus in Cambridge, Massachusetts, even some of the people behind the technology are ever-so-slightly unnerved. There’s some nervous laughter, alongside murmurs of “Holy shit” and “Oh, boy.” It’s mid-February, and we’re playing with their new model, V3, which is still a couple of weeks from public release. In this case, it took only three tries to get that startling result. The first two were decent, but a simple tweak to my prompt — co-founder Keenan Freyberg suggested adding the word “Mississippi” — resulted in something far more uncanny.

Over the past year alone, generative AI has made major strides in producing credible text, images (via services like Midjourney), and even video, particularly with OpenAI’s new Sora tool. But audio, and music in particular, has lagged. Suno appears to be cracking the code to AI music, and its founders’ ambitions are nearly limitless — they imagine a world of wildly democratized music making. The most vocal of the co-founders, Mikey Shulman, a boyishly charming, backpack-toting 37-year-old with a Harvard Ph.D. in physics, envisions a billion people worldwide paying 10 bucks a month to create songs with Suno. The fact that music listeners so vastly outnumber music-makers at the moment is “so lopsided,” he argues, seeing Suno as poised to fix that perceived imbalance.

Most AI-generated art so far is, at best, kitsch, à la the hyperrealistic sci-fi junk, heavy on form-fitting spacesuits, that so many Midjourney users seem intent on generating. But “Soul of the Machine” feels like something different — the most powerful and unsettling AI creation I’ve encountered in any medium. Its very existence feels like a fissure in reality, at once awe-inspiring and vaguely unholy, and I keep thinking of the Arthur C. Clarke quote that seems made for the generative-AI era: “Any sufficiently advanced technology is indistinguishable from magic.” A few weeks after returning from Cambridge, I send the song off to Living Colour guitarist Vernon Reid, who’s been outspoken about the perils and possibilities of AI music. He notes his “wonder, shock, horror” at the song’s “disturbing verisimilitude.” “The long-running dystopian ideal of separating difficult, messy, undesirable, and despised humanity from its creative output is at hand,” he writes, pointing out the problematic nature of an AI singing the blues, “an African American idiom, deeply tied to historical human trauma, and enslavement.”

Suno is barely two years old. Co-founders Shulman, Freyberg, Georg Kucsko, and Martin Camacho, all machine-learning experts, worked together until 2022 at another Cambridge company, Kensho Technologies, which focused on finding AI solutions to complex business problems. Shulman and Camacho are both musicians who used to jam together in their Kensho days. At Kensho, the foursome worked on a transcription technology for capturing public companies’ earnings calls, a tricky task given the combination of poor audio quality, abundant jargon, and various accents.

Along the way, Shulman and his colleagues fell in love with the unexplored possibilities of AI audio. In AI research, he says, “audio in general is so far behind images and text. There’s so much that we learn from the text community and how these models work and how they scale.”

The same interests could have led Suno’s founders to a very different place. Though they always intended to end up with a music product, their earliest brainstorming included an idea for a hearing aid and even the possibility of finding malfunctioning machinery through audio analysis. Instead, their first release was a text-to-speech program called Bark. When they surveyed early Bark users, it became clear that what they really wanted was a music generator. “So we started to run some initial experiments, and they seemed promising,” Shulman says.

Suno uses the same general approach as large language models like ChatGPT, which break down human language into discrete segments known as tokens, absorb its millions of usages, styles, and structures, and then reconstruct it on demand. But audio, particularly music, is almost unfathomably more complex, which is why, just last year, AI-music experts told Rolling Stone that a service as capable as Suno’s might take years to arrive. “Audio is not a discrete thing like words,” Shulman says. “It’s a wave. It’s a continuous signal.” High-quality audio’s sampling rate is generally 44.1 kHz or 48 kHz, which means “48,000 tokens a second,” he adds. “That’s a big problem, right? And so you need to figure out how to kind of smoosh that down to something more reasonable.” How, though? “A lot of work, a lot of heuristics, a lot of other kinds of tricks and models and stuff like that. I don’t think we’re anywhere close to done.” Eventually, Suno wants to find alternatives to the text-to-music interface, adding more advanced and intuitive inputs; generating songs based on users’ own singing is one idea.
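To put Shulman’s “48,000 tokens a second” in perspective, here is a rough back-of-the-envelope sketch. Only the 48 kHz sampling rate comes from the article. The two-minute song length and the figure of 960 samples per token are hypothetical values picked for illustration, and the function names are invented for this example. Nothing here describes how Suno’s model actually compresses audio.

```python
# Back-of-the-envelope arithmetic for the sampling-rate problem described above.
# The 48 kHz rate comes from the article; the song length and the compression
# factor are hypothetical values chosen only for illustration.

SAMPLE_RATE_HZ = 48_000  # samples per second, per the article


def raw_samples(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of raw audio samples in a single-channel clip of the given length."""
    return int(duration_seconds * sample_rate_hz)


def compressed_tokens(num_samples: int, samples_per_token: int) -> int:
    """Token count if every `samples_per_token` samples were folded into one token."""
    # Ceiling division so a partial final chunk still gets its own token.
    return -(-num_samples // samples_per_token)


if __name__ == "__main__":
    song_seconds = 120        # hypothetical two-minute song
    samples_per_token = 960   # hypothetical: 50 tokens per second at 48 kHz
    n = raw_samples(song_seconds)
    print(n)                                        # 5760000
    print(compressed_tokens(n, samples_per_token))  # 6000
```

Treating every sample as a token, a two-minute song would run to nearly six million tokens, which is why some form of “smooshing” is needed. Under the assumed compression factor, the same song would shrink to a few thousand tokens.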

OpenAI faces multiple lawsuits over ChatGPT’s use of books, news articles, and other copyrighted material in its vast corpus of training data. Suno’s founders decline to reveal details of just what data they’re shoveling into their own model, other than the fact that its ability to generate convincing human vocals comes in part because it’s learning from recordings of speech, in addition to music. “Naked speech will help you learn the characteristics of human voice that are difficult,” Shulman says.

One of Suno’s earliest investors is Antonio Rodriguez, a partner at the venture-capital firm Matrix. Rodriguez had only funded one previous music venture, the music-categorization firm EchoNest, which was purchased by Spotify to fuel its algorithm. With Suno, Rodriguez got involved before it was even clear what the product would be. “I backed the team,” says Rodriguez, who exudes the confidence of a man who’s made more than his share of successful bets. “I’d known the team, and I’d especially known Mikey, and so I would have backed him to do almost anything that was legal. He’s that creative.”

Rodriguez is investing in Suno with the full knowledge that music labels and publishers could sue, which he sees as “the risk we had to underwrite when we invested in the company, because we’re the fat wallet that will get sued right behind these guys.… Honestly, if we had deals with labels when this company got started, I probably wouldn’t have invested in it. I think that they needed to make this product without the constraints.” (A spokesperson for Universal Music Group, which has taken an aggressive stance on AI, didn’t return a request for comment.)

Suno says it’s in communication with the major labels, and professes respect for artists and intellectual property — its tool won’t allow you to request any specific artists’ styles in your prompts, and doesn’t use real artists’ voices. Many Suno employees are musicians; there’s a piano and guitars on hand in the office, and framed images of classical composers on the walls. The founders evince none of the open hostility to the music business that characterized, say, Napster before the lawsuits that destroyed it. “It doesn’t mean we’re not going to get sued, by the way,” Rodriguez adds. “It just means that we’re not going to have, like, a fuck-the-police kind of attitude.”

Rodriguez sees Suno as a radically capable and easy-to-use musical instrument, and believes it could bring music making to everyone much the way camera phones and Instagram democratized photography. The idea, he says, is to once again “move the bar on the number of people that are allowed to be creators of stuff as opposed to consumers of stuff on the internet.” He and the founders dare to suggest that Suno could attract a user base bigger than Spotify’s. If that prospect is hard to get your head around, that’s a good thing, Rodriguez says: It only means it’s “seemingly stupid” in the exact way that tends to attract him as an investor. “All of our great companies have that combination of excellent talent,” he says, “and then something that just seems stupid until it’s so obvious that it’s not stupid.”

Well before Suno’s arrival, musicians, producers, and songwriters were vocally concerned about AI’s business-shaking potential. “Music, as made by humans driven by extraordinary circumstances … those who have suffered and struggled to advance their craft, will have to contend with the wholesale automation of the very dear-bought art they have fought to achieve,” Reid writes. But Suno’s founders claim there’s little to fear, using the metaphor that people still read despite having the ability to write. “The way we think about this is we’re trying to get a billion people much more engaged with music than they are now,” Shulman says. “If people are much more into music, much more focused on creating, developing much more distinct tastes, this is obviously good for artists. The vision that we have of the future of music is one where it’s artist-friendly. We’re not trying to replace artists.”

Though Suno is hyperfocused only on reaching music fans who want to create songs for fun, it could still end up causing significant disruption along the way. In the short term, the segment of the market for human creators that seems most directly endangered is a lucrative one: songs created for ads and even TV shows. Lucas Keller, founder of the management firm Milk and Honey, notes that the market for placing well-known songs will remain unaffected. “But in terms of the rest of it, yeah, it could definitely put a dent in their business,” he says. “I think that ultimately, it allows a lot of ad agencies, film studios, networks, etc., to not have to go license stuff.”

In the absence of strict rules against AI-created content, there’s also the prospect of a world where users of models like Suno’s flood streaming services with their robo-creations by the millions. “Spotify may one day say ‘You can’t do that,’” Shulman says, noting that so far Suno users seem more interested in just texting their songs to a few friends.

Suno only has 12 or so employees right now, but they plan to expand, with a much larger permanent headquarters under construction on the top floor of the same building as their current temporary office. As we tour the still-unfinished floor, Shulman shows off an area that will become a full recording studio. Given what Suno can do, though, why do they even need it? “It’s mostly a listening room,” he acknowledges. “We want a good acoustic environment. But we all also enjoy making music, without AI.”

Suno’s biggest potential competitor so far seems to be Google’s Dream Track, which has obtained licenses that allow users to make their own songs using famous voices like Charlie Puth’s via a similar prompt-based interface. But Dream Track has only been released to a tiny test user base, and the samples released so far aren’t nearly as impressive-sounding as Suno’s, despite the famous voices attached. “I just don’t think that, like, making new Billy Joel songs is how people want to interact with music with the help of AI in the future,” Shulman says. “If I think about how we actually want people doing music in five years, it’s stuff that doesn’t exist. It’s the stuff that’s in their head.”
