Voicemod tools up with $14.5M to ride the generative AI (sonic)boom


woman speaking into a microphone in a recording studio
Image Credits: Nicola Katie / Getty Images

The first thing we ask Voicemod’s CEO and co-founder, Jaime Bosch, when he picks up the phone to talk about a new funding round is not something we’re accustomed to asking, but our question may become the norm in the generative AI future that’s fast-flying at us: Is this your real voice?

Bosch’s startup has been fiddling with audio effects for almost a decade, playing in the field of digital signal processing (DSP) — where its early focus was on creating fun ‘sound emoji’ effects and reactions for gamers to spice up their voice chats. And gamers do remain its main user-base (for now). But the audio field is being charged by developments in AI — which Voicemod’s team is hoping will lead to whole new use-cases and many more users for its tools.

So where DSP technology was about applying effects to a person’s (real) voice, developments in artificial intelligence are enabling startups like Voicemod to offer tools to create entirely synthesized (unreal) voices. And even the ability for users to ‘wear’ these voices in real-time — so they can speak with a voice that isn’t theirs. Think of it as the audio equivalent of a Snapchat lens or TikTok’s viral teenage filter or Reface’s celebrity face-swaps.

AI voice can even enable voice-shifting into another person’s (real) voice. And not just for talking about the weather or shooting the shit. But for what’s known as sing-to-sing voice conversion. Meaning you could get to sing in someone else’s voice — supercharging your karaoke game, say, by singing Bohemian Rhapsody as literally the voice of Freddie Mercury. And even switching between Mercury, May and Taylor, for the full mock opera effect if you have enough trained AI models (and microphones) on hand. Mamma-mia! 

Artificial intelligence makes all this possible — even if legal and ethical questions may create pause for thought about rushing to unleash real-time voice-shifting upon a world that still relies plenty upon fixed identities. (Banks pushing customers to record ‘a unique voiceprint’ to use as a password definitely need to sit up and start listening.)

Voicemod acquired another audio effects startup last year, called Voctro Labs, whose technology Bosch says it’s working to blend with its own to create an amped up hybrid platform. The combo has already allowed it to expand what it offers — launching a text-to-song feature last December which lets you turn your own lyrics into a vocal composition using generative AI. He tells us more is on the way — including the aforementioned sing-to-sing feature.

Voctro’s tech may be familiar as it was involved in the development of a voice clone of musician Holly Herndon which appeared in a viral TED Talk last year, in which her AI voice could be heard duetting with another musician (Pher)’s real voice in real-time. Which, well, if you haven’t already seen it is quite the visual-audio spectacle, as well as being a mouthful to explain. It’s also a taster of what Voicemod has coming to a keyboard near you.

“We’re definitely going to launch more products and more ways for people to express themselves with the generative AI technology,” Bosch tells us. “Not all Voctro Labs’ technologies are related to music — but they have a lot of technology related to singing, from this text-to-song technology to sing-to-sing technology in real time. So we have a lot of new projects and new products of upcoming.

“We are going to strengthen our speech-to-speech AI real-time technology, because we are basically merging our technology with their technology. We’re basically creating an hybrid technology that will be better than ours — or there’s a mix of both… [So their sing-to-sing technology will be] combined with our DSP technology — that we could use to do autotune. So we could potentially help artists with their voice and on the tone. And so this is, this is gonna be really, really interesting.”

As well as providing direct-to-consumer/creator audio tools, it offers its technologies via SDK and APIs for third parties to integrate into their own products, from games and apps to hardware. So it’s set up to distribute its tech across the gamer-creator ecosystem and have demand come find it.

Generative AI-powered disruption in audio of course mirrors (in a non-exact fairground ‘crazy mirror’ kind of a way) developments we’re seeing happen elsewhere: Visually, to graphics and illustration, as a result of deep learning and the advent of prompt-based image generation interfaces (such as DALL-E and Stable Diffusion). Also to the written word, through the large language models that underpin generative AI chatbots like ChatGPT that can produce song lyrics or a whole essay on demand. And, indeed, in the case of musical composition — where Google recently showed off a prompt-based generative AI song composer which can apparently produce arrangements that match the musical vibe you describe (although it said it’s not releasing that particular generative AI model — but surely someone else will).

It’s clear that AI is bending the rules of what it’s possible for a single person to create. And, well, as with freedom, the open concept, this is both thrilling and terrifying. Because, it’s what you do with it that counts.

The coming years are going to be all about finding out what people do with such powerful AI tools at their fingertips.

Voicemod team photo
Image Credits: Voicemod

Voicemod is positioning itself to ride this wave by building a toolbox for creators to survive and thrive in a reality-bending future and across a range of use-cases — hence it’s talking in terms of sonic identity and voice avatars for the social metaverse (at the future-gaze-y end) but also just helping you sound your sparkling best on a work Zoom call. So a sort of audio make-up as it were. Apply as needed.

“Now suddenly everyone can become a creator,” predicts Bosch of the generative AI boon. “Everyone can come, basically, with no skill set. Or with no learnings on how to really craft those audios. They will be able to actually create those pieces of music. Songs. And this eventually evolves into, probably, even voices. So the ability to create voices.”

“This could potentially be something really viral for platforms like TikTok, or YouTube Shorts or Instagram… And this could eventually evolve into things like karaoke, for example. And be, I don’t know, part of game consoles, or things like that, for people to use this to entertain. And, if we go a step further — and it’s the technology getting better and better as we think it will be — this could potentially be a professional tool for people who want to create music. Or for people who want to create voices for movies or voices for games characters.

“We have a strong belief in user-generated content, and we are building tools for our users to start creating sounds and creating voices. And we will be putting technology in the hands of the users to create those [sounds]. And, eventually in the future, hopefully, they will go even to a professional level.”

So while synthesizing a whole voice currently still requires a team of sound engineers and designers at the startup, Bosch suggests generative AI will put that power in the hands of the individual, and that it’ll happen soon: “in the near future”.

“I don’t know if we’ll be prompting (now we’re in this wave of everything is done through prompts). I’m not sure if that will be the way or it will be more tools that will have AI technology embedded and we have user experiences that will make things a lot easier,” he adds. “But definitely what I see from generative AI in the audience but also in the management phase is that suddenly everyone can become a creator, which I think is really interesting.”

The birth of AI voice may not sound like amazing news for the employment prospects of sound engineers and designers (albeit, tech advances may simply create new requirements that just shift where their expertise is needed). But Bosch reckons that voice actors, at least, will still have a key role to play — emoting for AI. Since robot voices aren’t good at getting the pitch and intonation, or indeed emotion, right. It’s a voice clone without a soul, basically. (Or as Nick Cave might put it, AI voice lacks ‘its own blood, its own struggle, its own suffering’ — it lacks humanness.)

“I think that you will always need a human factor in your sample with these voices,” suggests Bosch. “You could have the best voice — of even a famous person — but what really comes is the impression. You still need a human to do the cadence on the words. You still need a human to do the rhythm, the tone. So [it’s not just that] I can speak normally and I will sound like a famous person — no, you don’t — you still need to act a little bit. So… I think human factor for expression is key.”

Might generative AI not be able to learn to emote as well, with the right human data-sets, and further dial up its mimicry so as to make us laugh or cry or love or hate on-demand too?

“Yeah. Well, we will see,” responds Bosch. “I’m not sure. I mean, as of today, for me AI is a tool to be used by humans. But yeah, we don’t know where this is going to evolve.”

Voicemod for Desktop. Image Credits: Voicemod

Voicemod is gearing up for whatever phonic craziness lies ahead with a fresh tranche of funding. The 2014-founded startup has been revenue generating for years, via pro versions of its tools: its main product, Voicemod for Desktop, has had more than 40 million downloads to date, while Bosch says it has 3.3 million monthly active users. But it’s just closed $14.5 million in expansion funding, following an $8M Series A back in summer 2020. Leadwind, the growth fund of Madrid-based Kfund, led the round, with participation from Minifund (Eros Resmini, former CMO at Discord) and Bitkraft Ventures.

“We’re super excited by what generative AI can do to all creative industries and more specifically audio, especially when it comes to enhancing and augmenting the job that creative people already do,” Jaime Novoa, partner at Kfund, tells TechCrunch. “In the past few months there’s been an explosion in generative AI in general and more specifically in audio but we think this is a phenomenon that’s just starting.

“What many of the cool technologies being launched to market lack are concrete and scalable business models attached to them, and Voicemod differentiates itself from the pack by having built a product used by millions of people on a daily basis and with significant revenue traction. We’re super excited about what Jaime and the rest of the Voicemod team have in the pipeline and what’s to come.”

Voicemod says the extra funds will be used to enhance the development of its real-time AI voice identity capabilities — and dial up its proposition for Gen Z, gamers, content creators, and professionals of all skill levels wanting tools to help them express themselves vocally in digital spaces.

Per Bosch, part of the reason it’s taking more funding now relates to the acquisition of Voctro Labs. Beyond that, he says it’s about making the most of the opportunities sparking off the Cambrian explosion in generative AI tools.

“We are in the middle of tremendous revolution in AI,” he says. “We want to be well funding in order to be able to develop technology but also to be able to deliver technology to users. So I think one of our competitive advantages is that we already have the market and the traction and we basically are able to put this in the hands of the users. And I want to make sure to have enough runway, also due to market conditions, to be able to put all of this in place. So it will be mainly focused… on building the next generation AI technology and putting it in the hands of the users and also building these creation tools for the users to create content.”

The first new tool will be landing next month — with a launch of Voicemod’s desktop product on macOS (currently it’s PC only). The goal is to evolve into a multi-platform product spanning all devices. “We’re also working on a creation tool mobile app that hopefully will see the light towards the beginning of next quarter. And, and yeah, some more stuff to come, hopefully,” Bosch adds.

He also tells us the startup is working on a watermarking technology which it hopes to launch in Q2 this year — to give platforms a way to be able to spot AI-generated voices in the wild.

Such a feature is likely to be a vital tool to counter all the possible negative use-cases (scams, fraud, manipulation, abuse, bullying, trolling etc etc) one could imagine humans coming up with for voice-shifting tools that let you sound exactly like someone you’re not.

“It’s an algorithm to watermark the audio,” explains Bosch. “Moderation is complicated because it really changes depending on the space… on which are the platforms where the audio is used, so we believe that the channel is the one that should own that moderation and what we are doing is we will be providing this watermarking system in order for them to be able to know if the audio is created via synthetic voice or is created by a real voice.”

“Every single new technology can be used for the good or for the bad,” he adds. “So we are of course putting some technology, some tools, in place to be able to have more control around a misuse of this technology.”

On questions of licensing for training data, IP issues here are currently a grey area — as the law hasn’t caught up with developments in AI (let alone generative AI). That means startups operating in the space have to consider whether to make the most of total legal freedom to do whatever they want (and hope expensive consequences don’t come clanging down on them in short order), or tread more carefully and thoughtfully. (Other startups in the space include the likes of Voice AI, Koe and ElevenLabs.)

Bosch claims Voicemod is taking the latter approach — using (paid) voice actors to build up data-sets to train and hone its AI models. If it wants to make use of some original content he says the team will go to the IP provider and negotiate — and figure out what kind of licensing terms they’d be up for. (The generative AI boom is also a crazy-thrilling time to be an IP lawyer, clearly.)

“We are basically pioneering here,” he adds. “So a lot of things are without laws yet so we were trying to stick to our values, basically, and try to do the right thing. That’s our approach on the data [side]. But yeah, you’re completely right: there’s no ‘legal attachment’ to your voice, as of today… We own our fingerprint. You don’t own, like, whatever the fingerprint of your voice [is]. As of today.

“It sounds a little bit like science fiction but maybe, in the future, we will ‘own’ something related to our voice.”

For the record, Bosch was talking to me with his actual voice. The company’s real-time voice-shifting technology doesn’t yet work over mobile. But he says that’s coming too. So buckle up: The synthesized future is gonna be a screaming wild ride.
