Podcast Transcription

AI transcription for podcast episodes in 99 languages, with automatic speaker labels for hosts and guests. Generate show notes, blog posts, and SRT subtitles from one upload — all included on every paid plan.

A typical 60-minute podcast episode contains approximately 9,000 words of speech that exists nowhere as searchable text — until you transcribe it. VexaScribe (formerly NovaScribe) transcribes podcast episodes using OpenAI's Whisper Large-v3 with automatic speaker diarization (host vs guests vs co-hosts), word-level timestamps, and exports to TXT, DOCX, SRT, and VTT. A 60-minute episode typically completes in 5–10 minutes with ~95% accuracy on clear podcast audio. Apple Podcasts auto-generated transcripts launched in March 2024 (English, French, Spanish, German only); Spotify added optional creator transcripts in 2023–24. But platform transcripts don't help you build show notes, repurpose episodes into blog posts, generate SRT subtitles for video versions, or feed AI search engines that increasingly cite indexable text. VexaScribe's free tier includes 30 minutes; paid plans start at $2/month. A weekly 1-hour podcast costs less than $3/month to transcribe.

Speaker labels · AI show notes · 99 languages · SRT for video
~9,000: words per hour of podcast (average conversational speaking rate, 150 WPM × 60 min)
55%: share of the US population aged 12+ who listened in the last month (Edison Research Infinite Dial 2025)
Mar 2024: Apple Podcasts auto-transcripts launched (EN/FR/ES/DE only, iOS 17.4+)
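
The words-per-hour figure is simple arithmetic. A minimal sketch, assuming the 150 WPM conversational rate quoted above (the function name is illustrative only; real shows vary with pacing, pauses, and music beds):

```python
def estimated_words(minutes: float, words_per_minute: int = 150) -> int:
    """Estimate transcript length from episode duration and speaking rate."""
    return round(minutes * words_per_minute)


print(estimated_words(60))  # 9000
print(estimated_words(45))  # 6750
```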

How VexaScribe Transcribes Your Podcast

Three steps from upload to finished transcript. No software to install.

  1. Upload your episode

    MP3 (typically 28-56 MB/hr at 64-128 kbps), WAV, M4A, or MP4 video. Up to 5 GB and 10 hours per file. Most podcast episodes are well under this.

  2. AI transcribes with speaker labels

    Whisper Large-v3 transcribes; diarization automatically separates speakers (Speaker 1, Speaker 2, Speaker 3). Rename them to "Host", "Guest Name", or actual names; labels carry through every export.

  3. Export everything you need

    Plain text for show notes, DOCX for editing, SRT/VTT for video versions, JSON for developers, AI summary with chapters and key topics.
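
The MP3 sizes quoted in step 1 follow from the bitrate. A rough sketch, assuming constant-bitrate audio and decimal megabytes (the function name is illustrative); the results land close to the 28-56 MB/hr range above:

```python
def mp3_megabytes_per_hour(bitrate_kbps: int) -> float:
    """Size of one hour of constant-bitrate audio, in MB (1 MB = 10**6 bytes)."""
    bits_per_hour = bitrate_kbps * 1000 * 3600  # kilobits/s -> bits per hour
    return bits_per_hour / 8 / 1_000_000        # bits -> bytes -> megabytes


for kbps in (64, 128):
    print(f"{kbps} kbps: about {mp3_megabytes_per_hour(kbps):.1f} MB per hour")
```

At these bitrates, the 10-hour duration limit is reached long before the 5 GB size limit.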

What You Get From One Upload

Everything you need to ship an episode — show notes, subtitles, blog content, multilingual versions.

Speaker-labeled transcript

Auto-diarization separates host, co-host, and guests. Useful for interview shows and panel formats. Rename speakers to actual names in the editor.

AI-generated show notes draft

Chapter markers, key topics, episode summary. Use as a starting point for your show notes; it can cut the 30-60 min of manual writing per episode roughly in half.

SRT/VTT subtitles

For YouTube, Vimeo, or video podcast versions. Word-level timestamps keep captions aligned with the audio. Upload directly to YouTube using "Add language → Upload file with timing".

Translated transcripts

Optional: translate to any of 133 languages using built-in translation. Reach international audiences without re-recording — Spanish podcast → English subtitles in one workflow.

Why Podcasters Transcribe

Four specific business reasons. Most podcasters end up with all four, not just one.

1. SEO

Search engines can't index audio. A 60-minute episode adds ~9,000 words of indexable content per upload. Compounds across your back catalog. This American Life reportedly saw a 4.36% inbound traffic lift after transcribing their archive (2014, single datapoint — directional only).

2. AI search citation

ChatGPT, Perplexity, Claude, and Google AI Overviews can only cite text. If your episode discusses a topic but exists only as audio, AI engines won't surface it. Transcripts are the only path for podcast content to appear in AI answers.

3. Accessibility

WCAG 2.1 SC 1.2.1 (Level A) requires text alternatives for prerecorded audio. Podcasts published by public entities will fall under ADA Title II, with the deadline extended to April 26, 2027 for entities serving 50,000+ people and April 26, 2028 for smaller ones. Private podcasters aren't legally required to provide transcripts, but deaf and hard-of-hearing listeners still expect them.

4. Content repurposing

One episode → show notes (typically 1,500–2,500 words from a 45–60 min episode) → blog post → social clips → newsletter → quote graphics. Most repurposing tools (Castmagic, Riverside Magic Clips, Opus Clip) start from a transcript.

Speaker Identification: Hosts, Co-hosts, Guests

VexaScribe automatically detects different speakers in your podcast and labels them Speaker 1, Speaker 2, Speaker 3. You can rename them in the editor — “Host”, “Guest: Jane Smith”, “Co-host: Dan” — and the labels carry through every export.
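
If you work with an exported transcript programmatically, the same renaming can be applied in a few lines. This is only a sketch: the segment structure (a list of dicts with "speaker" and "text" keys) is a hypothetical shape chosen for illustration, not VexaScribe's documented JSON format.

```python
from typing import Dict, List


def rename_speakers(segments: List[dict], names: Dict[str, str]) -> List[dict]:
    """Return copies of the segments with generic speaker labels replaced.

    Labels missing from `names` are left unchanged.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


# Hypothetical segment data, for illustration only.
segments = [
    {"speaker": "Speaker 1", "text": "Welcome back to the show."},
    {"speaker": "Speaker 2", "text": "Thanks for having me."},
]
renamed = rename_speakers(
    segments, {"Speaker 1": "Host", "Speaker 2": "Guest: Jane Smith"}
)
for seg in renamed:
    print(f'{seg["speaker"]}: {seg["text"]}')
```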

AI diarization accuracy depends on conditions

  • 2–3 speakers, separate microphones: very high accuracy (~90%+)
  • 4+ speakers: accuracy drops; occasional mislabeling
  • Overlapping speech: degrades significantly when people talk over each other
  • Similar voices: two male hosts close in pitch can occasionally be confused

State-of-the-art research benchmarks show pyannote 3.1 at 11–19% Diarization Error Rate (DER) on real-world audio, and the commercial pyannoteAI model at 11.2% DER. (Sources: arXiv 2509.26177, picovoice 2025 benchmark.)
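
For context, DER is conventionally the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. A minimal sketch of that standard definition; the durations in the example are made up for illustration and are not measurements of any tool:

```python
def diarization_error_rate(
    missed: float, false_alarm: float, confusion: float, total_speech: float
) -> float:
    """DER = (missed speech + false alarm + speaker confusion) / total speech.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Illustrative numbers: 3,000 s of speech in an episode, with 90 s missed,
# 60 s of false alarms, and 180 s credited to the wrong speaker.
der = diarization_error_rate(90, 60, 180, 3000)
print(f"DER: {der:.1%}")
```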

Recording on Riverside.fm or similar? Their separate-track recording (each speaker as their own uncompressed 24-bit/48 kHz WAV file) gives you near-perfect speaker ID — diarization isn't even needed because you've already separated speakers at the recording stage. Source: riverside.com.

Cost: AI vs Human vs Self-Transcribing

For most podcasters, AI accuracy at ~95% is good enough — podcasts are SEO and accessibility content, not legal records.

Method | Cost per hour | Time to deliver | Accuracy
VexaScribe (AI) | $0.20–$0.60/hr | 5–10 min | ~95% on clear audio
Otter.ai (AI) | From $16.99/mo | Real-time | ~95%
Sonix (AI) | ~$10/hr pay-as-you-go | 5–15 min | ~95%
Rev AI (AI) | $15/hr ($0.25/min) | 5–10 min | ~95%
Rev (human) | ~$90/hr ($1.50/min) | 12–24 hours | 99%+
Happy Scribe (human) | ~$102/hr ($1.70/min) | 4–24 hours | 99%+
Self-transcribe | Your time (~4 hrs typing per 1 hr audio) | Days | Variable

The cost difference between AI ($0.20–$0.60/hr on VexaScribe) and human transcription (~$90/hr at Rev) is 150–450×; against Happy Scribe's ~$102/hr the gap is wider still. Most podcasters use AI and edit any errors manually (proper nouns, brand names, technical jargon).
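
The 150–450× range corresponds to dividing a ~$90/hr human rate (Rev, from the table) by the two ends of the VexaScribe range. A quick check, with an illustrative function name:

```python
def cost_multiple(human_per_hour: float, ai_per_hour: float) -> int:
    """How many times more expensive human transcription is, rounded."""
    return round(human_per_hour / ai_per_hour)


print(cost_multiple(90, 0.60))  # upper end of the AI price range
print(cost_multiple(90, 0.20))  # lower end of the AI price range
```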

From Transcript to Show Notes in 5 Minutes

A typical 45–60 minute interview produces around 7,000–9,000 words of transcript. Manually writing show notes from that takes 30–60 minutes. VexaScribe's AI summary feature does it automatically — you get:

  • Episode summary (2-3 paragraphs)
  • Chapter markers with timestamps (great for YouTube and Apple Podcasts chapters)
  • Key topics discussed
  • Notable quotes with timestamps

The AI output is a starting point, not final copy. Most podcasters edit for brand voice, add their own takes, and pull specific quotes manually. But it cuts the writing time roughly in half.
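
Chapter markers are easy to turn into the timestamp list YouTube reads from a video description. The sketch below assumes chapters are available as (start_seconds, title) pairs, which is a hypothetical shape rather than a documented VexaScribe export. It enforces two of YouTube's chapter requirements: the first chapter starts at 0:00 and there are at least three chapters.

```python
from typing import List, Tuple


def format_timestamp(seconds: int) -> str:
    """Format seconds as M:SS, or H:MM:SS once past the hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def youtube_chapters(chapters: List[Tuple[int, str]]) -> str:
    """Build description lines such as '1:35 Guest background'."""
    ordered = sorted(chapters)
    if len(ordered) < 3:
        raise ValueError("need at least three chapters")
    if ordered[0][0] != 0:
        raise ValueError("first chapter must start at 0:00")
    return "\n".join(f"{format_timestamp(start)} {title}" for start, title in ordered)


print(youtube_chapters([(0, "Intro"), (95, "Guest background"), (3725, "Wrap-up")]))
```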

If You Have a Video Version: SRT Subtitles

YouTube version of your podcast? Spotify Video? VexaScribe exports SRT and VTT subtitle files with word-level timestamps from the same transcription. Upload directly to YouTube using Add language → Upload file with timing → select your .srt file.
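
For reference, an SRT file is plain text: a cue number, a start and end time in HH:MM:SS,mmm form joined by "-->", the caption text, then a blank line. A minimal sketch that builds one from (start, end, text) tuples; the input shape and function names are assumptions for illustration, not VexaScribe's exporter:

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render cues as SRT text, numbering them from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Welcome to the show."), (2.5, 5.0, "Today's guest is Jane.")]))
```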

Platforms that accept SRT/VTT for podcasts

  • Buzzsprout: TXT, SRT, VTT
  • Transistor: SRT, VTT, TXT, JSON
  • Spotify for Creators: VTT, SRT
  • Apple Podcasts: SRT/VTT via RSS <podcast:transcript> tag
  • YouTube: SRT
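
If your host lets you edit the feed yourself, the <podcast:transcript> tag mentioned above is a single element inside each episode's <item>. A sketch using Python's standard library, assuming a VTT file at a placeholder URL; the namespace URI is the Podcasting 2.0 namespace, and most hosting platforms generate this tag for you:

```python
import xml.etree.ElementTree as ET

# Podcasting 2.0 namespace, which defines the <podcast:transcript> element.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def transcript_tag(url: str, mime_type: str, language: str = "en") -> str:
    """Serialize a <podcast:transcript> element for an RSS <item>."""
    element = ET.Element(
        f"{{{PODCAST_NS}}}transcript",
        {"url": url, "type": mime_type, "language": language},
    )
    return ET.tostring(element, encoding="unicode")


# The URL below is a placeholder, not a real transcript location.
print(transcript_tag("https://example.com/episodes/42/transcript.vtt", "text/vtt"))
```

In a real feed the namespace is normally declared once on the root <rss> element rather than on each tag.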

For the full SRT workflow, see the SRT generator page.

Transcribing Non-English and Multilingual Podcasts

VexaScribe transcribes in 99 languages. Important caveat about Apple Podcasts: their auto-transcripts only support English, French, Spanish, and German (as of the iOS 17.4 launch in March 2024). Podcasters publishing in those four languages can rely on Apple's auto-transcripts; everyone else needs to provide their own.

Transcripts can also be translated to any of 133 languages using VexaScribe's built-in translation. Useful for international episodes — record in Spanish, publish English-translated transcript and SRT for English-speaking audiences.

See the transcribe and translate workflow →

Less than $3/month for a Weekly Podcast

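A weekly 1-hour podcast is ~4 hours (about 240 minutes) of audio per month, slightly more than the Starter plan's 200 minutes. Weekly episodes of up to about 45 minutes fit within Starter; hour-long weekly shows fit the Basic plan with room to spare.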

Free trial

$0

30 min total

No credit card

Starter

$2/month

200 min/month

Weekly podcast, episodes up to ~45 min

Basic

$5/month

1,000 min/month

2-3 episodes per week
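
To see which plan fits your schedule, compare your monthly audio minutes against each allowance. A small sketch using the plan figures above; it assumes minutes do not roll over and ignores any overage or top-up options, which this page does not describe:

```python
from typing import Optional

# (name, price in USD per month, included minutes), cheapest first.
PLANS = [("Starter", 2, 200), ("Basic", 5, 1000)]


def monthly_minutes(episodes_per_month: int, minutes_per_episode: int) -> int:
    """Total audio minutes published in a month."""
    return episodes_per_month * minutes_per_episode


def cheapest_plan(minutes_needed: int) -> Optional[str]:
    """Name of the cheapest listed plan covering the minutes, else None."""
    for name, _price, allowance in PLANS:
        if minutes_needed <= allowance:
            return name
    return None


print(cheapest_plan(monthly_minutes(4, 45)))  # Starter (180 min)
print(cheapest_plan(monthly_minutes(4, 60)))  # Basic (240 min)
```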

Frequently Asked Questions

How do I transcribe a podcast episode?

Upload your podcast episode file (MP3, WAV, M4A, or other supported formats) to VexaScribe (formerly NovaScribe). The AI automatically transcribes the audio using OpenAI's Whisper Large-v3, detects different speakers (host, co-host, guests), and generates word-level timestamps. A typical 60-minute episode completes in 5-10 minutes.

Does podcast transcription identify different speakers?

Yes. VexaScribe includes automatic speaker diarization. When your podcast has multiple speakers, the system labels each one separately (Speaker 1, Speaker 2, Speaker 3...). You can rename them in the editor ("Host", "Guest: Jane Smith", "Co-host: Dan") and the labels carry through every export. Accuracy is highest for 2-3 speakers on separate microphones; degrades with overlapping speech and 4+ speakers.

What can I do with my podcast transcript?

Turn one episode into multiple pieces of content: create show notes for your website, repurpose as blog posts (1,500-2,500 words from a 45-60 min episode), extract quotes for social media, generate YouTube captions (SRT/VTT export), improve SEO with searchable text, and make your content accessible to deaf and hard-of-hearing listeners.

How long does it take to transcribe a 1-hour podcast?

A typical 1-hour podcast episode is transcribed in about 5-10 minutes. Processing time depends on audio length, quality, and current load. You can close your browser while processing — the transcript will be waiting in your dashboard when it's ready.

What audio formats work for podcast transcription?

VexaScribe supports all common podcast formats: MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio. Video formats (MP4, MOV, AVI, MKV, WebM) work too — we extract the audio automatically. Files can be up to 5 GB and 10 hours per upload, which covers virtually any podcast episode.

Can I transcribe podcasts in different languages?

Yes. VexaScribe supports 99 languages for transcription with automatic language detection. You can also translate transcripts to any of 133 languages using the built-in translation feature — useful for international podcasts that want English subtitles, or English podcasts reaching non-English audiences.

How much does podcast transcription cost?

VexaScribe plans start at $2/month for 200 minutes. Weekly episodes of up to about 45 minutes fit within that Starter allowance for less than $3/month; a weekly 1-hour podcast (~4 hours, or about 240 minutes, per month) exceeds it and fits the $5/month Basic plan. Compared to human transcription services like Rev ($1.50/min, ~$90/hr) or Happy Scribe ($1.70/min), AI is 150-450× cheaper. For most podcasters, AI accuracy at ~95% on clear audio is more than enough.

Can I export subtitles for YouTube from my podcast transcript?

Yes. VexaScribe exports SRT and VTT subtitle files with word-level timestamps. Upload directly to YouTube using "Add language → Upload file with timing" → select your .srt file. Same files work in Vimeo, Spotify Video, or any video editor (Premiere Pro, Final Cut Pro, DaVinci Resolve).

Does Apple Podcasts already transcribe my show automatically?

Apple Podcasts launched auto-generated transcripts in March 2024 with iOS 17.4 — but only for English, French, Spanish, and German. Auto-transcription rolled out to new episodes first, with the back catalog filling in over time. If you publish in another language, or want to control the exact transcript shown, you'll still need to provide your own. Spotify also added optional creator-uploaded transcripts (VTT/SRT) starting in 2023-24, but it's opt-in, not automatic for all podcasts.

How accurate is AI transcription for podcasts with multiple speakers?

On clear podcast audio with 2-3 speakers on separate microphones, AI transcription accuracy is around 95% (5% Word Error Rate). Speaker diarization is accurate when speakers have distinct voices and don't talk over each other. State-of-the-art research shows pyannote 3.1 achieves 11-19% Diarization Error Rate on real-world audio. Conditions that hurt accuracy: overlapping speech, similar voices, 4+ speakers, low-bitrate audio.
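
Word Error Rate is the number of word substitutions, deletions, and insertions needed to turn the AI transcript into a reference transcript, divided by the number of words in the reference. If you want to spot-check a transcript against a hand-corrected excerpt, a minimal sketch (lowercasing and whitespace splitting only; real evaluations usually also normalize punctuation and numbers):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution ("the" -> "this") and one deletion ("everyone"): 2 errors in 6 words.
wer = word_error_rate("welcome back to the show everyone", "welcome back to this show")
print(f"WER: {wer:.1%}")
```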

Transcribe Your Next Episode in 10 Minutes

30 min free, no credit card. Upload one episode and see speaker labels, AI show notes, and SRT export.
