9 Transcription Tools for Podcasters and Video Creators

I create a lot of content with audio: interviews that become final podcasts, videos, and more. In almost all cases, a transcript of that audio is useful. Often, I use it to create a blog post based on that audio, and I am also in the process of creating interactive transcripts for my backlog of podcasts. Beyond that, transcripts that accompany shows are good for SEO and accessibility.

Transcription tools have existed for some time, but the recent wave of generative AI tools has vastly increased the options available from both existing and new services, including offline desktop applications.

On top of this, I have a lot of older shows. How can I automate transcribing that backlog?

With several hours of old shows to transcribe and a growing backlog of episodes to release, I thought it was time to test the services that had piled up in my “to check out” pile.


Sample recording

I needed some sample audio to work with, so I used an interview I recently recorded with Alex from Respeecher. Alex is from Ukraine and speaks English with an accent, which makes a good test case, and given the company’s name, it seemed appropriate. The company name is also a portmanteau of real words, perfect for tripping up an AI. I used the separate recordings of each of our inputs, and in a couple of cases, I had to combine them into one file. I didn’t apply any processing or do any editing.

What I was looking for

The ability to recognise common technical terms and jargon, ideally with the option of adding frequently used custom terms to a dictionary.
Something that works offline would be ideal, but if a tool is online-only anyway (for example, the transcription offered by the service I use for recording remote interviews), I wouldn’t expect it to work offline.
A batch or automation option would be fantastic for coping with the backlog of podcasts I need to transcribe.

What I currently use

At the moment, I use a mixture of tools, which prompted this post: I wanted to weigh up the options available to me.

The tools I switch between for different use cases are Descript, Adobe Premiere, Google Recorder, Otter, and MacWhisper. I use Otter less and less, but my wife has an account, so I occasionally still use it.

Google Recorder

When recording interviews at events, I use an iRig Mic HD 2 plugged into the USB port of my Pixel 6. The built-in Pixel Recorder app has live transcription, and you can download a text transcript and audio recording.

The transcription is quite good, and the live aspect often impresses my guests. But it only works on Pixel phones, and you can’t upload other audio files, so it isn’t much use for editing purposes.

Price: Free with a Pixel phone
Transcription quality: Good
Features: Low

Otter

Otter is aimed mostly at business meetings, and its Zoom, Google Workspace, and Office 365 integrations reflect this. It does a reasonable job at transcription but could make it easier to correct transcription errors. If a recording doesn’t come from one of those integrations, you have to upload each file manually. It only accepts a single combined track, so you must merge separate tracks before uploading.

Otter has many new AI features (of course), such as asking questions about an interview, plus review and comment features, again showing that it’s mostly a tool for business users. Finally, if you weren’t already convinced, it stores audio at very low quality, and there are few export options for the transcript. It’s a good tool, but it’s not suited to my use case.
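Because Otter needs one combined file, separate interview tracks have to be mixed first. The following is a minimal sketch that uses only Python’s standard `wave` and `array` modules. It assumes every track is a 16-bit PCM WAV file with the same sample rate and channel count. The function names `read_pcm16` and `mix_wav_tracks` are hypothetical, and an audio editor is the more practical route for real work.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Return (params, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        if params.sampwidth != 2:
            raise ValueError(f"{path}: expected 16-bit samples")
        samples = array.array("h", wav.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def mix_wav_tracks(paths, out_path):
    """Sum several 16-bit PCM WAV tracks into one file, clipping on overflow."""
    tracks = [read_pcm16(p) for p in paths]
    first = tracks[0][0]
    for params, _ in tracks[1:]:
        if (params.nchannels, params.framerate) != (first.nchannels, first.framerate):
            raise ValueError("all tracks must share channel count and sample rate")
    length = max(len(samples) for _, samples in tracks)
    mixed = array.array("h", [0] * length)
    for i in range(length):
        # Samples past the end of a shorter track count as silence.
        total = sum(s[i] for _, s in tracks if i < len(s))
        mixed[i] = max(-32768, min(32767, total))  # clip to the 16-bit range
    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setnchannels(first.nchannels)
        out.setsampwidth(2)
        out.setframerate(first.framerate)
        out.writeframes(mixed.tobytes())
```

For example, `mix_wav_tracks(["chris.wav", "alex.wav"], "combined.wav")` would do the job, though the file names here are placeholders. The per-sample Python loop is slow on hour-long recordings, so treat this as an illustration of the idea rather than a tool.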

Price: $0-20 per month
Transcription quality: Good
Features: High, but low for my use case

Descript

Descript is optimised for text-based audio and video editing, and its workflow is designed for content creators, with integrations for recording platforms and Dropbox folders, and export options for editing tools.

Transcript quality is high, though the correction tools aren’t as comprehensive as I expected. Correction works more like search and replace, and when the error differs in each case, search and replace doesn’t save much time. I haven’t yet found a way to build a library of common errors. For example, “Chris Chinchilla” is wrong in almost every transcription. But it’s a big application with many features, so I might have missed it.
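Until a tool offers a proper custom dictionary, one workaround is to keep your own list of recurring errors and apply it to exported plain-text transcripts. The following is a minimal sketch of that approach. The entries in `CORRECTIONS` are hypothetical guesses at how “Chris Chinchilla” and “Respeecher” might be misheard. They are not errors taken from a real transcript.

```python
import re

# Hypothetical examples: each recurring mis-transcription mapped to the correct term.
CORRECTIONS = {
    "chris chin chilla": "Chris Chinchilla",
    "re-speecher": "Respeecher",
    "re speecher": "Respeecher",
}


def build_pattern(corrections):
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(corrections, key=len, reverse=True)
    # Allow any whitespace between words, since exports may wrap lines.
    alternatives = [r"\s+".join(map(re.escape, p.split())) for p in phrases]
    # Word boundaries stop a phrase from matching inside a longer word.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def apply_corrections(text, corrections=CORRECTIONS):
    pattern = build_pattern(corrections)
    # Normalise case and whitespace of each match to look up its fix.
    lookup = {" ".join(k.lower().split()): v for k, v in corrections.items()}
    return pattern.sub(lambda m: lookup[" ".join(m.group(0).lower().split())], text)
```

Keeping the dictionary outside any one tool means the same list could be reused whichever service produced the transcript.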

Price: $0-24 per month
Transcription quality: Good
Features: High

Riverside FM

I used to use Riverside FM for recording remote interviews until Descript acquired SquadCast, and the combined offering was too good to resist.

In the months before I left, Riverside FM added a handful of browser-based transcription and text-based editing tools that were good enough to make me pause before cancelling my membership. The interface is similar to Descript and Audiate, with the text above and a timeline of waveforms below.

It automatically transcribes anything you record through Riverside, but you can also upload files recorded elsewhere to start the transcription process.

It’s the only service I tried that could correct every instance of a mistranscribed word at once. Or at least, it wasn’t obvious to me how to do that with other services, whereas with Riverside, “it just worked”. It offers standard export options for the transcript and automatic chapter generation, which some other tools also offer, but I have never found that feature very reliable.

Price: $0-26 per month
Transcription quality: High
Features: Medium

TechSmith Audiate

TechSmith is better known for its visual tools for screenshots and video creation, and Audiate feels like a Descript competitor that hasn’t quite caught up. It doesn’t detect multiple speakers, but it allows text-based editing, removing filler words, and so on. You can export the transcript as text or SRT, or send it to Camtasia.

Like Adobe Premiere, it makes transcription mistakes, but different mistakes from other services.
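SRT is a simple enough format that you can generate it yourself from timestamped segments. The following is a minimal sketch. It assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` field, which is the shape Whisper-based tools commonly produce. `to_srt` and `srt_timestamp` are hypothetical helpers, not code from Audiate or any other tool here.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)
```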

Price: €225 / year
Transcription quality: Good
Features: Low for the price

MacWhisper

A macOS-native wrapper around OpenAI’s Whisper, MacWhisper has many features in the free version and even more if you pay. It works offline, is quite fast (depending on your machine), is accurate, and is affordable.

MacWhisper offers several different ways of wrapping Whisper for specific use cases, and the one most relevant to this post is the podcast transcription feature. It’s still in beta but works well. In a basic way, it works similarly to Descript: you can upload multiple files, and it recognises each file as a different speaker and treats them as parts of the same conversation.

However, it presents the transcription differently and, of course, has far fewer features for working with the transcript. You can edit the transcript by clicking into it. The main limitations are that the player and transcript don’t stay fully in sync, and the way it displays pauses takes up a lot of space. There might be ways to change this behaviour that I missed. There are no audio export options, but there are many options for exporting the transcript, and even more in the paid version.
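MacWhisper treats each file as a different speaker. You can approximate that by transcribing each track separately and interleaving the results by time. The following is a minimal sketch. It assumes each track has already been transcribed into segments with `start`, `end`, and `text` fields, for example the `segments` list that the open-source `openai-whisper` package returns from `model.transcribe()`. `merge_speaker_tracks` is a hypothetical illustration, not how MacWhisper works internally.

```python
import heapq


def merge_speaker_tracks(tracks):
    """Interleave per-speaker segments into one conversation.

    tracks maps a speaker name to that speaker's segments, each a dict with
    "start", "end" (seconds) and "text", already sorted by start time.
    """
    labelled = (
        [(seg["start"], speaker, seg) for seg in segments]
        for speaker, segments in tracks.items()
    )
    merged = []
    # heapq.merge combines the already-sorted lists in start-time order.
    for start, speaker, seg in heapq.merge(*labelled, key=lambda item: item[0]):
        text = seg["text"].strip()
        if merged and merged[-1]["speaker"] == speaker:
            # Consecutive segments from the same speaker become one turn.
            merged[-1]["text"] += " " + text
            merged[-1]["end"] = seg["end"]
        else:
            merged.append({"speaker": speaker, "start": start, "end": seg["end"], "text": text})
    return merged
```

When both people talk at once, their segments are simply ordered by start time, so crosstalk appears as short alternating turns.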

Price: €0-29
Transcription quality: High
Features: High

Audio Hijack

Better known as a long-running tool for capturing audio, Audio Hijack recently added transcription powered by OpenAI’s Whisper. Its use is limited: it can optionally write a transcript of the input audio to a text file. It’s as accurate as MacWhisper but offers little in the way of post-transcription tools.

Price: $77
Transcription quality: High
Features: Medium generally, but low for transcription

Adobe Premiere

I love the recent addition of text-based editing in Adobe Premiere. It saves a lot of time on rough edits, stays up to date with the current state of the timeline, and offers export options for different caption and transcription formats. I wish the feature would come to Adobe Audition. Editing audio in Premiere is possible, but Premiere isn’t optimised for it, which makes complete sense.

It shows multiple speakers, and you can remove silences and filler words by selecting and deleting them. Transcription accuracy is reasonable, but it makes mistakes in almost all the same places as the other options here, just with different incorrect words, so I wonder if Adobe uses a different model from the other tools. You can then export the transcription in different formats or add it as captions for video.
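Conceptually, text-based editing depends on each transcribed word carrying timestamps. Deleting words in the transcript then amounts to cutting those time ranges from the timeline. The following is a minimal sketch of finding filler words and long pauses. It assumes word-level timings given as `(word, start, end)` tuples in seconds. The `FILLERS` set and the one-second `max_gap` are arbitrary placeholder values, and this is not Adobe’s implementation.

```python
FILLERS = {"um", "uh", "erm"}  # placeholder filler list


def cut_ranges(words, fillers=FILLERS, max_gap=1.0):
    """Return merged (start, end) ranges to delete: filler words and long pauses.

    words is a list of (word, start, end) tuples sorted by start time.
    """
    cuts = []
    previous_end = None
    for word, start, end in words:
        # Shorten any pause longer than max_gap, keeping max_gap seconds of it.
        if previous_end is not None and start - previous_end > max_gap:
            cuts.append((previous_end + max_gap, start))
        if word.strip(".,!?").lower() in fillers:
            cuts.append((start, end))
        previous_end = end
    # Merge touching or overlapping ranges so each becomes a single cut.
    merged = []
    for start, end in sorted(cuts):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```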

Price: $20-60 per month
Transcription quality: Medium
Features: Medium (for transcription)

macOS dictation

There’s dictation built right into macOS, but it only works with a microphone input. If you have an application such as Loopback, or a free alternative such as BlackHole, you can reroute application audio into the “microphone input” and dump a transcript into any text field. The downside is that this happens in real time, so transcribing a recording takes as long as the recording itself, but it is fun to watch. The transcription is probably the least accurate of any tool in this roundup. Still, if you don’t want to spend any money (unless you buy or already have Loopback) or install anything else on your Mac, it’s a… possibility.

Price: Free with a Mac
Transcription quality: Poor
Features: Low

Bulk transcription

I opened this post with bulk transcription of older content as one of my requirements but haven’t mentioned it for each tool. None of the tools helped with it beyond letting me manually upload files, export the transcript, and so on. In theory, automating this might be possible with Otter, but I stopped looking as it’s not generally suited to my workflow.

So far, Descript works fairly well, as you can upload a batch of files and leave it processing. I am also using it to generate the interactive transcripts on my website, so it feeds into my whole workflow. However, it limits transcription hours based on your payment tier, so I need to work through the backlog in stages.
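Working through a backlog under a monthly transcription allowance is simple arithmetic. The following is a minimal sketch. It assumes a fixed number of transcription hours per month and a list of episode durations in minutes, and it fills each month in order, oldest episode first. The 10-hour allowance in the test is a placeholder, not Descript’s actual tier limit, and `schedule_backlog` is a hypothetical helper.

```python
def schedule_backlog(episodes, hours_per_month):
    """Split (title, minutes) episodes into monthly batches within an hours budget.

    Episodes keep their original order; a new month starts when the next
    episode would exceed the remaining allowance.
    """
    budget = hours_per_month * 60
    months, current, used = [], [], 0
    for title, minutes in episodes:
        if minutes > budget:
            raise ValueError(f"{title} is longer than a whole month's allowance")
        if used + minutes > budget:
            months.append(current)  # this month is full; start the next one
            current, used = [], 0
        current.append(title)
        used += minutes
    if current:
        months.append(current)
    return months
```

Keeping strict episode order wastes some allowance at the end of each month. Sorting episodes by length first would pack months more tightly, at the cost of transcribing the backlog out of order.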

I thought of experimenting with the Whisper API myself, either directly or via a service such as make.com, but by default, the API limits uploads to 25 MB, so you need to plan how to chunk and handle larger files. For now, I have decided to leave that topic for future research. Let me know if you have any ideas!
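If you know the audio’s bitrate, the 25 MB limit translates into a maximum chunk length. The following is a minimal sketch of planning chunks. It assumes a constant bitrate (for example, a 128 kbps MP3) and reads “25 MB” as 25,000,000 bytes. The 95% safety margin for container overhead and the two-second overlap between chunks are assumptions, and the helper names are hypothetical. Actually cutting the audio at these times would still need a separate tool.

```python
import math

# Assumption: treat "25 MB" as 25,000,000 bytes, the smaller interpretation.
UPLOAD_LIMIT_BYTES = 25_000_000


def max_chunk_seconds(bitrate_kbps, limit_bytes=UPLOAD_LIMIT_BYTES, margin_percent=95):
    """Longest chunk, in whole seconds, that fits under the limit at this bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    usable_bytes = limit_bytes * margin_percent // 100  # leave room for overhead
    return math.floor(usable_bytes / bytes_per_second)


def plan_chunks(duration_s, bitrate_kbps, overlap_s=2):
    """Return (start, end) times in seconds covering the whole recording.

    Consecutive chunks overlap by overlap_s seconds so that a word split at
    one boundary is likely to appear whole in the neighbouring chunk.
    """
    length = max_chunk_seconds(bitrate_kbps)
    if length <= overlap_s:
        raise ValueError("bitrate too high for the chosen overlap")
    chunks, start = [], 0
    while True:
        end = min(start + length, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            return chunks
        start = end - overlap_s  # step back so chunks overlap
```

At 128 kbps, each chunk can hold about 24 minutes of audio, so an hour-long episode needs three uploads. The overlapping text would then need de-duplicating when the transcripts are stitched back together.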

The future

I won’t change my current selection of tools and services: I mainly use Descript for podcasts, Premiere for video, and MacWhisper for everything else. I intend to try using Descript more for video editing, which may change things, and I’m eagerly waiting for text-based editing in Audition, which may also change things. However, this does mean I lack a cohesive, centralised dictionary of custom words. Then again, none of the tools I looked at offer particularly good features for this anyway, so while it’s sometimes tedious to endlessly correct words, I am not missing anything… Yet.