Descript, Andrew Mason’s platform to edit audio by editing text, now lets you edit video, too

Descript, the latest startup from Groupon co-founder Andrew Mason, made a splash in the world of audio last year with a platform for easy audio editing based on how you edit written documents, adding features like an AI-based tool that uses a recording of you to let you create audio of any written text in your own voice.

Today the startup is moving into the next phase of its growth. It is launching Descript Video, a set of tools to take screen recordings or videos and then add titles, transitions, images, video overlays or edits to them with no more effort than it takes to edit a Word document. It also features live collaboration, similar to a Google Doc: you can share a link to the file itself so that multiple people can work on it at the same time.

You work with video on Descript in the same way you do audio: you upload the raw material onto the Descript platform, which then turns it into text. From there you remove sections by cutting out written words, add new parts by typing in words, and add other elements by dropping in widgets.
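
The article does not describe Descript's internals, so what follows is only a rough sketch of the general idea behind text-based editing. It assumes a transcript in which every word carries start and end timestamps, so that deleting a word in the text maps to cutting a time range from the media. The `Word` class, the `keep_ranges` function and the sample timings are all hypothetical and are not Descript's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording (assumed transcript format)
    end: float


def keep_ranges(words, deleted):
    """Return merged (start, end) time ranges covering the words that remain.

    `deleted` is a set of word indices the editor removed from the text.
    """
    ranges = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # The previous word was kept, so extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first kept word after a cut: start a new range.
            ranges.append((word.start, word.end))
    return ranges


# Hypothetical transcript with made-up timings.
words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.6),
    Word("welcome", 0.6, 1.1),
    Word("back", 1.1, 1.5),
]

# Deleting the word "um" in the text leaves two media segments to keep.
print(keep_ranges(words, {1}))
```

An export step would then stitch the kept ranges together in order. The join between two ranges is where a cut happens, which on video shows up as a visible jump.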

The video tools are launching today as part of Descript’s freemium service, with basic price tiers of free, $12 and $24 per month, depending on which features you take.

Descript’s launch comes at a key moment in the world of tech. Even before the COVID-19 pandemic, video was already king of the content hill, thanks to advances in streaming, broadband speeds and device processors, a proliferation of services, and people’s inclination to lean back and watch things in their leisure time.

Yes, some people still read. And podcasts, recorded books and other formats have definitely led to a kind of renaissance for audio. But video cuts through all of that when it comes to time spent online and consumer engagement. Like cats, it seems we’re just attracted by moving objects.

Now we have an added twist. The pandemic has ushered in an age of video in the worlds of work, learning and play, with platforms like Zoom, Meet, Teams and WebEx taking on the role of conference room, quick coffee, dinner party, pub and whatever other place you might have chosen to meet people before COVID-19 came along.

“We are increasingly living in a video-first world,” Mason said the other week from his house in the Bay Area, over a Zoom call. All of that means not just a ton of video, but a ton of video creators, counting not just the 50 million or so making content for Twitch, YouTube, Instagram, Snapchat and the rest, but also any one of us who snaps a moving picture and posts it somewhere, either for fun or for pay.

Video was always on the cards for Descript, Mason added, but it made sense first to focus on audio tools. That was in part because Descript itself was a spin-off from Detour (a detour from Detour, as it happens), an audio-guide business that was sold to Bose, and so sound was the focus.

“There is so much to build, so we wanted to start with some version of the product, and then add features in concentric circles of addressable markets,” said Mason.

And that essentially is how the company sees the opportunity for selling a video editing product as an extension of an audio-editing tool. People who produce content for podcasts also often produce videos, and those who got their start on a platform like YouTube are now expanding their footprints with recorded word. Sometimes there is distinct material created for one platform or the other, but oftentimes there are excerpts repurposed, or full versions of audio from video turned into podcasts.

YouTubers and podcasters, meanwhile, have something in common with the average person: Everyone is using technology now to produce content, but not everyone knows how to work with it on a technical level when they need to cut, edit or manipulate it in any way.

Descript is aimed at professionals and prosumers, but it also follows in the vein of tools that let people build websites without needing to know HTML or have special design experience, or use a piece of software without having to build its functionality first. With all of the advances in the underlying tech, that idea has come a long way.

“Before I got into tech I was a music major. I got a degree in music tech and worked in a recording studio. I’ve been using these tools since I was a kid and know them super well,” Mason said. “But our approach has been to think of us like Airtable. We want to be part of that modern class of SaaS products that don’t mean you need to make a trade-off between power and ease of use.”

Tools in this first build of the video product include not just the ability to import video from anywhere for editing, but also a screen recorder that you can use to record excerpts from other places, or indeed your whole screen, which can then be edited either as standalone items or as part of larger works. Features like this seem particularly aimed at the new class of “video producers” who are actually knowledge workers creating material to share with colleagues or customers.

Overdub, the feature that uses natural language processing to let you create a “deepfake” of your own voice and overlay new audio into a recording by typing something out, works very smoothly on an audio recording, where you would be hard-pressed to notice where the changes have been made. On video, however, cuts show up as small jumps, and Overdubs simply come out as added audio in the video. Audio and video jumps are pretty commonplace in videos these days, but I imagine that the company is working on a way to smooth that out to match the audio experience as it is today.

Descript today is used by a number of big-name content publishers, including NPR, Pushkin Industries, VICE, The Washington Post and The New York Times, although Mason declined to disclose how many users it has in total.

At some point, however, numbers will tell another kind of story: just how much traction Descript is getting amid the mass of competition in the field. Platforms like Zoom and Google are also adding more editing tools, and there is a plethora of others building easy-to-use software to better work with audio and video, including Scribe, Vimeo, Adobe, Biteable and more.

In the meantime, Descript has caught the eye of some important backers, raising some $20 million to date from investors, including Andreessen Horowitz and Redpoint.