Formerly NovaScribe: same team, same product, refreshed name.
Podcast Transcription
AI transcription for podcast episodes in 99 languages, with automatic speaker labels for hosts and guests. Generate show notes, blog posts, and SRT subtitles from one upload — all included on every paid plan.
A typical 60-minute podcast episode contains approximately 9,000 words of speech that exists nowhere as searchable text — until you transcribe it. VexaScribe (formerly NovaScribe) transcribes podcast episodes using OpenAI's Whisper Large-v3 with automatic speaker diarization (host vs guests vs co-hosts), word-level timestamps, and exports to TXT, DOCX, SRT, and VTT. A 60-minute episode typically completes in 5–10 minutes with ~95% accuracy on clear podcast audio. Apple Podcasts auto-generated transcripts launched in March 2024 (English, French, Spanish, German only); Spotify added optional creator transcripts in 2023–24. But platform transcripts don't help you build show notes, repurpose episodes into blog posts, generate SRT subtitles for video versions, or feed AI search engines that increasingly cite indexable text. VexaScribe's free tier includes 30 minutes; paid plans start at $2/month. A weekly 1-hour podcast costs less than $3/month to transcribe.
How VexaScribe Transcribes Your Podcast
Three steps from upload to finished transcript. No software to install.
1. Upload your episode
MP3 (typically 28–56 MB/hr at 64–128 kbps), WAV, M4A, or MP4 video. Up to 5 GB and 10 hours per file; most podcast episodes are well under this.
2. AI transcribes with speaker labels
Whisper Large-v3 transcribes; diarization automatically separates speakers (Speaker 1, Speaker 2, Speaker 3). Rename them to "Host", "Guest Name", or actual names; the labels carry through every export.
3. Export everything you need
Plain text for show notes, DOCX for editing, SRT/VTT for video versions, JSON for developers, and an AI summary with chapters and key topics.
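The MP3 size range quoted in step 1 follows from simple bitrate arithmetic. The sketch below assumes constant-bitrate audio, decimal megabytes, and no metadata overhead; the function names are illustrative and not part of any VexaScribe API. The results land close to the quoted range, with the exact figure depending on whether megabytes are counted in decimal or binary units.

```python
def mp3_size_mb_per_hour(bitrate_kbps: float) -> float:
    """Approximate size (decimal MB) of one hour of constant-bitrate audio."""
    bits_per_hour = bitrate_kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


def fits_upload_limit(bitrate_kbps: float, hours: float,
                      max_gb: float = 5.0, max_hours: float = 10.0) -> bool:
    """Check an episode against the 5 GB / 10 hour per-file limits stated above."""
    size_gb = mp3_size_mb_per_hour(bitrate_kbps) * hours / 1000
    return hours <= max_hours and size_gb <= max_gb


for kbps in (64, 128):
    print(f"{kbps} kbps is about {mp3_size_mb_per_hour(kbps):.1f} MB/hr")

# A 3-hour episode at 128 kbps is roughly 0.17 GB, far below the limit.
print(fits_upload_limit(128, 3))
```

Even a long-form episode at a high podcast bitrate reaches the 10-hour duration limit long before it reaches the 5 GB size limit.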
What You Get From One Upload
Everything you need to ship an episode — show notes, subtitles, blog content, multilingual versions.
Speaker-labeled transcript
Auto-diarization separates host, co-host, and guests. Useful for interview shows and panel formats. Rename speakers to actual names in the editor.
AI-generated show notes draft
Chapter markers, key topics, episode summary. Use as a starting point for your show notes, which otherwise take 30–60 min of manual writing per episode.
SRT/VTT subtitles
For YouTube, Vimeo, or video podcast versions. Word-level timestamps; perfectly synced. Upload directly to YouTube using "Add language → Upload file with timing".
Translated transcripts
Optional: translate to any of 133 languages using built-in translation. Reach international audiences without re-recording — Spanish podcast → English subtitles in one workflow.
Why Podcasters Transcribe
Four specific business reasons. Most podcasters end up with all four, not just one.
1. SEO
Search engines can't index audio. A 60-minute episode adds ~9,000 words of indexable content per upload. Compounds across your back catalog. This American Life reportedly saw a 4.36% inbound traffic lift after transcribing their archive (2014, single datapoint — directional only).
2. AI search citation
ChatGPT, Perplexity, Claude, and Google AI Overviews can only cite text. If your episode discusses a topic but exists only as audio, AI engines won't surface it. Transcripts are the only path for podcast content to appear in AI answers.
3. Accessibility
WCAG 2.1 SC 1.2.1 (Level A) requires text alternatives for prerecorded audio. Podcasts published by public entities fall under ADA Title II, with the compliance deadline extended to April 26, 2027 for entities serving 50,000+ people and April 26, 2028 for smaller ones. Private podcasts aren't legally required to provide transcripts, but deaf and hard-of-hearing listeners still expect them.
4. Content repurposing
One episode → show notes (typically 1,500–2,500 words from a 45–60 min episode) → blog post → social clips → newsletter → quote graphics. Most repurposing tools (Castmagic, Riverside Magic Clips, Opus Clip) start from a transcript.
Speaker Identification: Hosts, Co-hosts, Guests
VexaScribe automatically detects different speakers in your podcast and labels them Speaker 1, Speaker 2, Speaker 3. You can rename them in the editor — “Host”, “Guest: Jane Smith”, “Co-host: Dan” — and the labels carry through every export.
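To illustrate how renamed labels can carry through an export, here is a minimal sketch. The `Segment` shape and both helper functions are hypothetical, chosen for illustration only; they are not VexaScribe's actual JSON schema or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    speaker: str  # diarization label, e.g. "Speaker 1" (hypothetical shape)
    start: float  # seconds from the start of the episode
    end: float
    text: str


def rename_speakers(segments, names):
    """Return new segments with labels mapped; unmapped labels stay unchanged."""
    return [replace(s, speaker=names.get(s.speaker, s.speaker)) for s in segments]


def to_plain_text(segments):
    """Render a simple speaker-labeled transcript, one turn per line."""
    return "\n".join(f"{s.speaker}: {s.text}" for s in segments)


segments = [
    Segment("Speaker 1", 0.0, 3.2, "Welcome back to the show."),
    Segment("Speaker 2", 3.4, 5.1, "Thanks for having me."),
]
renamed = rename_speakers(segments, {"Speaker 1": "Host", "Speaker 2": "Guest: Jane Smith"})
print(to_plain_text(renamed))
```

Because the rename happens on the segment data rather than on one rendered file, any export built from the same segments picks up the new names.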
AI diarization accuracy depends on conditions
- ✓ 2–3 speakers, separate microphones: very high accuracy (~90%+)
- ● 4+ speakers: accuracy drops; occasional mislabeling
- ✗ Overlapping speech: degrades significantly when people talk over each other
- ✗ Similar voices: two male hosts close in pitch can occasionally be confused
State-of-the-art research benchmarks show pyannote 3.1 at 11–19% Diarization Error Rate (DER) on real-world audio. PyannoteAI commercial: 11.2% DER. (Sources: arXiv 2509.26177, picovoice 2025 benchmark.)
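For readers unfamiliar with the metric, Diarization Error Rate is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below shows that arithmetic only; the durations are made-up illustration values, not measurements from any benchmark.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER = (missed + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Illustrative numbers: an episode with 3,000 s of actual speech.
der = diarization_error_rate(missed=90, false_alarm=60, confusion=180, total_speech=3000)
print(f"DER = {der:.1%}")
```

In this example, speaker confusion (speech attributed to the wrong person) is the largest term, which is the kind of error that overlapping speech and similar voices tend to cause.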
Recording on Riverside.fm or similar? Their separate-track recording (each speaker as their own uncompressed 24-bit/48 kHz WAV file) gives you near-perfect speaker ID — diarization isn't even needed because you've already separated speakers at the recording stage. Source: riverside.com.
Cost: AI vs Human vs Self-Transcribing
For most podcasters, AI accuracy at ~95% is good enough — podcasts are SEO and accessibility content, not legal records.
| Method | Cost per hour | Time to deliver | Accuracy |
|---|---|---|---|
| VexaScribe (AI) | $0.20–$0.60/hr | 5–10 min | ~95% on clear audio |
| Otter.ai (AI) | From $16.99/mo | Real-time | ~95% |
| Sonix (AI) | ~$10/hr pay-as-you-go | 5–15 min | ~95% |
| Rev AI (AI) | $15/hr ($0.25/min) | 5–10 min | ~95% |
| Rev (human) | ~$90/hr ($1.50/min) | 12–24 hours | 99%+ |
| Happy Scribe (human) | ~$102/hr ($1.70/min) | 4–24 hours | 99%+ |
| Self-transcribe | Your time (~4 hrs typing per 1 hr audio) | Days | Variable |
The cost difference between AI ($0.20–$0.60/hr on VexaScribe) and human transcription (~$90/hr at Rev) is roughly 150–450×. Most podcasters use AI and manually correct the remaining errors (proper nouns, brand names, technical jargon).
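The 150–450× figure can be reproduced from the table's own numbers. This is a small arithmetic sketch using Rev's per-minute human rate and VexaScribe's per-hour range as listed above; the function names are illustrative.

```python
def per_minute_to_per_hour(rate_per_min: float) -> float:
    """Convert a per-minute price to a per-hour price."""
    return rate_per_min * 60


def cost_ratio(human_per_hr: float, ai_per_hr: float) -> float:
    """How many times more expensive human transcription is per audio hour."""
    return human_per_hr / ai_per_hr


rev_human = per_minute_to_per_hour(1.50)  # $90/hr, from the table
for ai_rate in (0.60, 0.20):              # VexaScribe range, from the table
    print(f"${ai_rate:.2f}/hr AI vs ${rev_human:.0f}/hr human: "
          f"about {cost_ratio(rev_human, ai_rate):.0f}x")
```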
From Transcript to Show Notes in 5 Minutes
A typical 45–60 minute interview produces around 7,000–9,000 words of transcript. Manually writing show notes from that takes 30–60 minutes. VexaScribe's AI summary feature does it automatically — you get:
- ✓ Episode summary (2–3 paragraphs)
- ✓ Chapter markers with timestamps (great for YouTube and Apple Podcasts chapters)
- ✓ Key topics discussed
- ✓ Notable quotes with timestamps
The AI output is a starting point, not final copy. Most podcasters edit for brand voice, add their own takes, and pull specific quotes manually. But it cuts the writing time roughly in half.
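Chapter markers with timestamps can be turned into a YouTube description block with a few lines of code. The sketch below assumes chapters arrive as `(start_seconds, title)` pairs, which is a hypothetical input shape rather than a documented VexaScribe export format. It enforces one rule, that the list starts at 00:00, which YouTube expects for chapters; check YouTube's current help pages for any further requirements.

```python
def format_timestamp(seconds: int) -> str:
    """Format seconds as MM:SS, or H:MM:SS once the episode passes one hour."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def youtube_chapters(chapters) -> str:
    """Build description lines like '01:35 Guest background' in time order."""
    ordered = sorted(chapters)
    if not ordered or ordered[0][0] != 0:
        raise ValueError("the first chapter must start at 00:00")
    return "\n".join(f"{format_timestamp(t)} {title}" for t, title in ordered)


print(youtube_chapters([(0, "Intro"), (95, "Guest background"), (3725, "Wrap-up")]))
```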
If You Have a Video Version: SRT Subtitles
YouTube version of your podcast? Spotify Video? VexaScribe exports SRT and VTT subtitle files with word-level timestamps from the same transcription. Upload directly to YouTube using Add language → Upload file with timing → select your .srt file.
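To show what word-level timestamps make possible, here is a minimal sketch that groups timed words into SRT cues. The `Word` shape and the grouping limits (42 characters, 5 seconds per cue) are assumptions chosen for illustration; this is not VexaScribe's actual segmentation logic.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars: int = 42, max_duration: float = 5.0) -> str:
    """Group words into cues, starting a new cue when text or duration grows too large."""
    cues, current = [], []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        too_long = len(candidate) > max_chars
        too_slow = bool(current) and (word.end - current[0].start) > max_duration
        if current and (too_long or too_slow):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n{text}\n")
    return "\n".join(blocks)


print(words_to_srt([Word("Welcome", 0.0, 0.4), Word("back", 0.45, 0.7)]))
```

Each cue starts at its first word's start time and ends at its last word's end time, which is why word-level timing keeps subtitles aligned with speech.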
Platforms that accept SRT/VTT for podcasts
- ● Buzzsprout: TXT, SRT, VTT
- ● Transistor: SRT, VTT, TXT, JSON
- ● Spotify for Creators: VTT, SRT
- ● Apple Podcasts: SRT/VTT via RSS <podcast:transcript> tag
- ● YouTube: SRT
For the full SRT workflow, see the SRT generator page.
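The Apple Podcasts entry in the list above relies on the `<podcast:transcript>` RSS tag from the Podcasting 2.0 namespace. Most hosting platforms add this tag for you, but if you maintain your own feed, the sketch below shows one way to generate it with Python's standard library. The URL is a placeholder, and the example uses a VTT file with the `text/vtt` MIME type; confirm the correct `type` value for SRT files against the namespace documentation before relying on it.

```python
import xml.etree.ElementTree as ET

# Podcasting 2.0 namespace URI, registered so output uses the "podcast:" prefix.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def transcript_tag(url: str, mime_type: str = "text/vtt", language: str = "en") -> str:
    """Return a <podcast:transcript> element as a string for an RSS <item>."""
    element = ET.Element(
        f"{{{PODCAST_NS}}}transcript",
        {"url": url, "type": mime_type, "language": language},
    )
    return ET.tostring(element, encoding="unicode")


# Placeholder URL for illustration only.
print(transcript_tag("https://example.com/episodes/42/transcript.vtt"))
```

When the element is serialized on its own, the namespace declaration appears on the tag itself; in a full feed it normally sits once on the root `<rss>` element.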
Transcribing Non-English and Multilingual Podcasts
VexaScribe transcribes in 99 languages. Important caveat about Apple Podcasts: its auto-transcripts only support English, French, Spanish, and German (as of the iOS 17.4 launch in March 2024). Podcasters publishing in one of those four languages can rely on Apple's auto-transcripts; everyone else needs to provide their own.
Transcripts can also be translated to any of 133 languages using VexaScribe's built-in translation. Useful for international episodes — record in Spanish, publish English-translated transcript and SRT for English-speaking audiences.
See the transcribe and translate workflow.
Less than $3/month for a Weekly Podcast
A weekly 1-hour podcast is ~4 hours (about 240 minutes) of audio per month, so compare that against each plan's included minutes below.
| Plan | Included minutes | Notes |
|---|---|---|
| Free trial | 30 min total | No credit card |
| Starter | 200 min/month | Weekly 1-hr podcast |
| Basic | 1,000 min/month | 2-3 episodes per week |
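A quick way to match a release schedule to a plan is to total the minutes you publish per month and compare them with the included minutes listed above. The sketch below treats included minutes as a hard monthly cap with no rollover or overage, which is an assumption; this page does not say how minutes beyond the allowance are handled. Function names are illustrative.

```python
# Included monthly minutes from the pricing listed above.
PLANS = {"Starter": 200, "Basic": 1000}


def monthly_minutes(episodes_per_month: int, minutes_per_episode: int) -> int:
    """Total audio minutes published in a month."""
    return episodes_per_month * minutes_per_episode


def smallest_plan(minutes_needed: int, plans: dict):
    """Return the smallest plan whose allowance covers the need, or None."""
    for name, included in sorted(plans.items(), key=lambda item: item[1]):
        if included >= minutes_needed:
            return name
    return None


for episodes, length in ((4, 45), (4, 60), (12, 60)):
    need = monthly_minutes(episodes, length)
    print(f"{episodes} x {length} min = {need} min/month -> {smallest_plan(need, PLANS)}")
```

Under the hard-cap assumption, four 45-minute episodes (180 minutes) fit within 200 minutes, while four full 60-minute episodes (240 minutes) exceed it.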
Frequently Asked Questions
How do I transcribe a podcast episode?
Upload your podcast episode file (MP3, WAV, M4A, or other supported formats) to VexaScribe (formerly NovaScribe). The AI automatically transcribes the audio using OpenAI's Whisper Large-v3, detects different speakers (host, co-host, guests), and generates word-level timestamps. A typical 60-minute episode completes in 5-10 minutes.
Does podcast transcription identify different speakers?
Yes. VexaScribe includes automatic speaker diarization. When your podcast has multiple speakers, the system labels each one separately (Speaker 1, Speaker 2, Speaker 3...). You can rename them in the editor ("Host", "Guest: Jane Smith", "Co-host: Dan") and the labels carry through every export. Accuracy is highest for 2-3 speakers on separate microphones; degrades with overlapping speech and 4+ speakers.
What can I do with my podcast transcript?
Turn one episode into multiple pieces of content: create show notes for your website, repurpose as blog posts (1,500-2,500 words from a 45-60 min episode), extract quotes for social media, generate YouTube captions (SRT/VTT export), improve SEO with searchable text, and make your content accessible to deaf and hard-of-hearing listeners.
How long does it take to transcribe a 1-hour podcast?
A typical 1-hour podcast episode is transcribed in about 5-10 minutes. Processing time depends on audio length, quality, and current load. You can close your browser while processing — the transcript will be waiting in your dashboard when it's ready.
What audio formats work for podcast transcription?
VexaScribe supports all common podcast formats: MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio. Video formats (MP4, MOV, AVI, MKV, WebM) work too — we extract the audio automatically. Files can be up to 5 GB and 10 hours per upload, which covers virtually any podcast episode.
Can I transcribe podcasts in different languages?
Yes. VexaScribe supports 99 languages for transcription with automatic language detection. You can also translate transcripts to any of 133 languages using the built-in translation feature — useful for international podcasts that want English subtitles, or English podcasts reaching non-English audiences.
How much does podcast transcription cost?
VexaScribe plans start at $2/month for 200 minutes. A weekly 1-hour podcast (~4 hours/month) costs less than $3/month on the Starter plan. Compared to human transcription services like Rev ($1.50/min, ~$90/hr) or Happy Scribe ($1.70/min), AI is 150-450× cheaper. For most podcasters, AI accuracy at ~95% on clear audio is more than enough.
Can I export subtitles for YouTube from my podcast transcript?
Yes. VexaScribe exports SRT and VTT subtitle files with word-level timestamps. Upload directly to YouTube using "Add language → Upload file with timing" → select your .srt file. Same files work in Vimeo, Spotify Video, or any video editor (Premiere Pro, Final Cut Pro, DaVinci Resolve).
Does Apple Podcasts already transcribe my show automatically?
Apple Podcasts launched auto-generated transcripts in March 2024 with iOS 17.4 — but only for English, French, Spanish, and German. Auto-transcription rolled out to new episodes first, with the back catalog filling in over time. If you publish in another language, or want to control the exact transcript shown, you'll still need to provide your own. Spotify also added optional creator-uploaded transcripts (VTT/SRT) starting in 2023-24, but it's opt-in, not automatic for all podcasts.
How accurate is AI transcription for podcasts with multiple speakers?
On clear podcast audio with 2-3 speakers on separate microphones, AI transcription accuracy is around 95% (5% Word Error Rate). Speaker diarization is accurate when speakers have distinct voices and don't talk over each other. State-of-the-art research shows pyannote 3.1 achieves 11-19% Diarization Error Rate on real-world audio. Conditions that hurt accuracy: overlapping speech, similar voices, 4+ speakers, low-bitrate audio.
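Word Error Rate is the number of word substitutions, deletions, and insertions needed to turn the AI transcript into a correct reference, divided by the number of words in the reference. If you want to spot-check accuracy on your own show, the sketch below computes it with a standard word-level edit distance. It lowercases and splits on whitespace only; real evaluations usually also normalize punctuation and numbers, which this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "welcome back to the show with our guest jane smith"
hypothesis = "welcome back to the show with our guest jane smyth"
print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

The example has one misspelled proper noun in ten words. Names and jargon are the errors most worth checking by hand, since a transcript at 5% WER can still get every guest name wrong.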
Related
Transcribe audio to text
All formats, 99 languages, 95% accuracy
MP3 to text
Most podcasts are MP3 — convert to text in 2 minutes
WAV to text
Studio masters from your DAW — straight to transcript
Best podcast transcription tools
Honest comparison of 10 tools — Descript, Castmagic, Otter, Rev, more
Bulk transcription
50 episodes per batch — for podcast agencies and back-catalog projects
Best subtitle generators 2026
12 tools compared — for audiogram subtitle production
Speaker labels — how they work
Multi-host podcast labeling — pipeline, accuracy, cross-file rename
Transcript to summary
Generate show notes & chapters from any transcript
SRT generator
Subtitle files for video podcasts
Video to SRT
Workflow for video podcast SRT generation
Video to text
Plain-text transcript from any video podcast file
M4A to text
For guest interviews recorded on iPhone Voice Memos
OGG to text
For guest interviews via WhatsApp voice notes (Android)
Transcribe Spanish audio
Spanish-language podcast workflows + EN translation
Otter.ai alternatives
8 alternatives for podcasters considering Otter
Captions vs subtitles
Picking the right caption type for video podcasts
How to add subtitles to a video
Step-by-step for YouTube, Premiere, CapCut, iPhone
Interview transcription
Workflow for guest-interview podcasts + speaker labels
Transcribe and translate
Reach international audiences in 133 languages
How accurate is Whisper?
WER benchmarks by language and condition
Whisper transcription
Hosted Whisper Large-v3 — the engine behind podcast transcripts