Convert audio to text

Descript’s audio-to-text capabilities transcribe audio with up to 95% accuracy to create transcripts, captions, subtitles, and text files. The best part? You can edit your audio by editing the text—just like a doc—to remove filler words and make cuts with just a few keystrokes.

Get started →

The Easiest Speech-to-Text Has Ever Been

Descript’s speech-to-text transcription tool uses advanced speech recognition technology to turn audio files into transcripts. Edit the transcript in real time, just like a Google Doc, and the underlying audio changes with it. All you have to do is drag and drop your audio or video file, and Descript will immediately begin transcribing.

Download the app →

How to transcribe audio files to text

Experience the magic of Studio Sound on your audio clip. You just need an audio recording that’s no longer than 5 minutes and no larger than 25 MB.
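
If you want to check a clip against those limits before uploading, here is a minimal sketch in Python. It assumes the clip is a WAV file and that 25 MB means 25,000,000 bytes. The function name `wav_within_limits` is hypothetical and is not part of Descript.

```python
import os
import wave

MAX_SECONDS = 5 * 60          # 5-minute limit stated above
MAX_BYTES = 25 * 1000 * 1000  # assumption: "25 MB" read as decimal megabytes


def wav_within_limits(path):
    """Return True if the WAV file at `path` fits both limits."""
    size_ok = os.path.getsize(path) <= MAX_BYTES
    with wave.open(path, "rb") as wav:
        # Duration in seconds = number of frames / frames per second
        duration = wav.getnframes() / wav.getframerate()
    return size_ok and duration <= MAX_SECONDS
```

Calling `wav_within_limits("clip.wav")` returns `True` only when the file passes both the size check and the duration check.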

Step 1

Upload your audio file to transcribe

Drag and drop an audio or video file into a new Descript project to upload it. A transcript will automatically generate and sync to your audio, including dialogue and even “wordless media” like sounds and pauses. If there are multiple speakers in your audio, Descript will automatically identify and label them for you.

Step 2

Edit your transcript

By default, your new transcript will be synced to your editing timeline. You can delete or rearrange the text to edit your audio, letting you do stuff like remove filler words in one click. If you want to fix any transcription errors, like a misspelled name, highlight the text and enter Correct mode by pressing 'C' to fix your transcript without affecting the audio.

Step 3

Export in your desired format

Once your transcript is polished, head over to Publish > Export and choose an export option. You can export your transcript as plain text, rich text, markdown, HTML, Word doc, or even an SRT or VTT subtitle file. You can also publish it as a web link to share or embed your transcript alongside the audio with Descript's media player.
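
For a sense of what an exported SRT subtitle file contains, here is a minimal sketch in Python of the general SubRip layout: a cue number, a `start --> end` timestamp line, then the caption text. It illustrates the file format only and is not Descript's exporter. The helper names `srt_timestamp` and `to_srt` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, "Welcome to the show."), (1.5, 4.0, "Let's get started.")]))
```

A VTT file looks similar, but it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.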

Try Studio Sound now →

A text converter that is as easy as drag and drop

Descript makes it easy to transcribe audio files into text. Simply create a project, select the audio file you want to transcribe, and wait a few seconds for your accurate transcription. Descript also makes it easy to correct any inaccuracies, so you can quickly take your transcript from highly accurate to perfect.

Whether you're a YouTuber, vlogger, podcaster, or simply want to transcribe an audio file, Descript’s advanced speech recognition technology ensures precise and accurate transcriptions every time, and our simple, intuitive user interface makes it easy to get started.

Sign up for free today and see how easy it is to create searchable transcripts of your audio files.

Digital Community Manager

“This tool saves literally thousands of hours. I can't believe how far this technology has come in such a short span of time.”

Freelance Writer & Content Developer

“Descript is an amazing timeline and text-based editor with some incredible features. I love the timeline editor and the ability to do multi-track editing very quickly.”

Chief Operating Officer

“I am new to video content creation and found narration and editing challenging. It cut my work time by more than half and significantly improved quality.”

Descript Audio Transcription is Better Than Ever

With our most recent updates, Descript’s transcription is better than ever.

Automatic transcription will save you a step when you’re importing media; rather than confirming that you want to transcribe, Descript just starts transcribing.

How does Descript’s speech-to-text tool work?

Descript uses state-of-the-art artificial intelligence and machine learning to take your audio files and give you a highly accurate transcription of that audio in minutes.

Can I use Descript to make captions?

Yes, you can use Descript to create captions for videos. Simply select the video file you want to add text to, transcribe the audio, and then use Descript’s Fancy Captions feature to add the text to your video in a few clicks.

Is Descript just a transcription tool?

Far from it. With tools like automated Filler Word Removal, Overdub voice synthesis, Studio Sound voice enhancement, and text-to-speech editing, Descript uses AI and other advanced technological stuff to streamline your entire production workflow — so you spend more time creating content, and less on the technical drudgery.

Can Descript transcribe in different languages?

Yes! Descript supports transcription for 22 languages in addition to English: Spanish, German, French, Italian, Portuguese, Romanian, Malay, Turkish, Polish, Dutch, Hungarian, Czech, Swedish, Croatian, Finnish, Danish, Norwegian, Slovak, Catalan, Lithuanian, Slovenian, and Latvian.

What audio file formats does Descript transcribe?

Descript can read WAV audio files from nearly every popular source. Whether you recorded on an Android device, an iOS device like an iPad or iPhone, or directly on Windows or Mac, Descript’s transcription software can take that audio and turn it into editable text for your project.