Formerly NovaScribe — same team, same product, refreshed name. Read the announcement →

Podcast Transcription

Transcripts, AI show notes, chapter markers for Spotify/Apple, and SRT for video versions — from one podcast episode upload. 99 languages, host and guest speaker labels, bulk uploads for backlogs. Free 30-min trial, plans from $2/month.

A typical 60-minute podcast episode contains approximately 9,000 words of speech that exist nowhere as searchable text until you transcribe it. VexaScribe (formerly NovaScribe) transcribes podcast episodes using OpenAI's Whisper Large-v3 with automatic speaker diarization (host vs guests vs co-hosts), word-level timestamps, and exports to TXT, DOCX, SRT, and VTT. A 60-minute episode typically completes in 5–10 minutes with ~95% accuracy on clear podcast audio.

Apple Podcasts auto-generated transcripts launched in March 2024 (English, French, Spanish, German only); Spotify added optional creator transcripts in 2023–24. But platform transcripts don't help you build show notes, repurpose episodes into blog posts, generate SRT subtitles for video versions, or feed AI search engines that increasingly cite indexable text.

VexaScribe's free tier includes 30 minutes; paid plans start at $2/month. A weekly 1-hour podcast costs less than $3/month to transcribe.

Speaker labels · AI show notes · 99 languages · SRT for video

~9,000 words per hour of podcast. Average conversational speaking rate (150 WPM × 60 min).

55% of US adults 12+ listened in the last month. Edison Research Infinite Dial 2025.

Mar 2024: Apple Podcasts auto-transcripts launched. EN/FR/ES/DE only, iOS 17.4+.

How VexaScribe Transcribes Your Podcast

Three steps from upload to finished transcript. No software to install.

1. Upload your episode
MP3 (typically 28-56 MB/hr at 64-128 kbps), WAV, M4A, MP4 video. Up to 5 GB and 10 hours per file. Most podcast episodes are well under this.
2. AI transcribes with speaker labels
Whisper Large-v3 transcribes; diarization automatically separates speakers (Speaker 1, Speaker 2, Speaker 3). Rename them to "Host", "Guest Name", or actual names; the labels carry through every export.
3. Export everything you need
Plain text for show notes, DOCX for editing, SRT/VTT for video versions, JSON for developers, AI summary with chapters and key topics.
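
The MP3 size range quoted in step 1 follows from the bitrate. A minimal sketch of that arithmetic, assuming constant-bitrate files and decimal megabytes; the function name is illustrative and not part of any VexaScribe API:

```python
def mp3_megabytes(bitrate_kbps: float, minutes: float) -> float:
    """Approximate size of a constant-bitrate MP3 in MB (1 MB = 1,000,000 bytes)."""
    bits = bitrate_kbps * 1000 * minutes * 60  # kilobits/s -> total bits
    return bits / 8 / 1_000_000                # bits -> bytes -> megabytes


def hours_that_fit(bitrate_kbps: float, limit_gb: float = 5.0) -> float:
    """How many hours of audio at this bitrate fit under a size limit."""
    return limit_gb * 1000 / mp3_megabytes(bitrate_kbps, 60)


for kbps in (64, 128):
    size = mp3_megabytes(kbps, 60)
    print(f"{kbps} kbps: {size:.1f} MB per hour, "
          f"{hours_that_fit(kbps):.0f} hours fit in 5 GB")
```

At 64 kbps an hour comes to about 28.8 MB and at 128 kbps about 57.6 MB, so for typical podcast MP3s the 10-hour limit is reached long before the 5 GB one.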

What You Get From One Upload

Everything you need to ship an episode — show notes, subtitles, blog content, multilingual versions.

Speaker-labeled transcript

Auto-diarization separates host, co-host, and guests. Useful for interview shows and panel formats. Rename speakers to actual names in the editor.

AI-generated show notes draft

Chapter markers, key topics, episode summary. Use as a starting point for your show notes — saves 30-60 min of manual writing per episode.

SRT/VTT subtitles

For YouTube, Vimeo, or video podcast versions. Word-level timestamps; perfectly synced. Upload directly to YouTube using "Add language → Upload file with timing".

Translated transcripts

Optional: translate to any of 133 languages using built-in translation. Reach international audiences without re-recording — Spanish podcast → English subtitles in one workflow.

Why Podcasters Transcribe

Four specific business reasons. Most podcasters end up with all four, not just one.

1. SEO

Search engines can't index audio. A 60-minute episode adds ~9,000 words of indexable content per upload. Compounds across your back catalog. This American Life reportedly saw a 4.36% inbound traffic lift after transcribing their archive (2014, single datapoint — directional only).

2. AI search citation

ChatGPT, Perplexity, Claude, and Google AI Overviews cite text. If your episode discusses a topic but exists only as audio, AI engines are unlikely to surface it. A published transcript is the most direct way for podcast content to appear in AI answers.

3. Accessibility

WCAG 2.1 SC 1.2.1 (Level A) requires text alternatives for prerecorded audio. Public-entity podcasts will fall under ADA Title II, with the deadline extended to April 26, 2027 for entities serving 50,000+ people and April 26, 2028 for smaller ones. Private podcasts aren't legally required to provide transcripts, but deaf and hard-of-hearing listeners still expect them.

4. Content repurposing

One episode → show notes (typically 1,500–2,500 words from a 45–60 min episode) → blog post → social clips → newsletter → quote graphics. Most repurposing tools (Castmagic, Riverside Magic Clips, OpusClip) start from a transcript.

Speaker Identification: Hosts, Co-hosts, Guests

VexaScribe automatically detects different speakers in your podcast and labels them Speaker 1, Speaker 2, Speaker 3. You can rename them in the editor — “Host”, “Guest: Jane Smith”, “Co-host: Dan” — and the labels carry through every export.

AI diarization accuracy depends on conditions

✓ 2–3 speakers, separate microphones: very high accuracy (~90%+)
● 4+ speakers: accuracy drops; occasional mislabeling
✗ Overlapping speech: degrades significantly when people talk over each other
✗ Similar voices: two male hosts close in pitch can occasionally be confused

State-of-the-art research benchmarks show pyannote 3.1 at 11–19% Diarization Error Rate (DER) on real-world audio. PyannoteAI commercial: 11.2% DER. (Sources: arXiv 2509.26177, picovoice 2025 benchmark.)
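
For readers unfamiliar with the metric: Diarization Error Rate is the share of reference speech time that is attributed wrongly, summing false alarms, missed speech, and speaker confusion. A minimal sketch of the standard formula; the durations in the example are made up for illustration and are not measurements from VexaScribe or pyannote:

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / total speech.

    All arguments are durations in seconds, measured against a
    hand-labeled reference.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


# Hypothetical episode: 3,000 s of reference speech, of which 60 s is
# non-speech labeled as speech, 90 s is speech that was missed, and
# 180 s is assigned to the wrong speaker.
der = diarization_error_rate(60, 90, 180, 3000)
print(f"DER: {der:.1%}")
```

In this made-up case the result is 11%, the low end of the benchmark range cited above.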

Recording on Riverside.fm or similar? Their separate-track recording (each speaker as their own uncompressed 24-bit/48 kHz WAV file) gives you near-perfect speaker ID — diarization isn't even needed because you've already separated speakers at the recording stage. Source: riverside.com.

Cost: AI vs Human vs Self-Transcribing

For most podcasters, AI accuracy at ~95% is good enough — podcasts are SEO and accessibility content, not legal records.

Method	Cost per hour	Time to deliver	Accuracy
VexaScribe (AI)	$0.20–$0.60/hr	5–10 min	~95% on clear audio
Otter.ai (AI)	From $16.99/mo	Real-time	~95%
Sonix (AI)	~$10/hr pay-as-you-go	5–15 min	~95%
Rev AI (AI)	$15/hr ($0.25/min)	5–10 min	~95%
Rev (human)	~$90/hr ($1.50/min)	12–24 hours	99%+
Happy Scribe (human)	~$102/hr ($1.70/min)	4–24 hours	99%+
Self-transcribe	Your time (~4 hrs typing per 1 hr audio)	Days	Variable

The cost difference between AI ($0.20–$0.60/hr on VexaScribe) and human transcription (~$90–$102/hr in the table above) is large: at Rev's ~$90/hr, human transcription costs 150–450× more. Most podcasters use AI and edit any errors manually (proper nouns, brand names, technical jargon).
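
The per-hour figures can be reproduced from the plan prices quoted on this page. A small sketch of that arithmetic; it assumes every included minute is used and treats Studio's "~100 hours" as 6,000 minutes, which is an approximation:

```python
# Plan name -> (monthly price in USD, included minutes), as quoted on this page.
# Studio's allowance is an assumption derived from "~100 hours".
PLANS = {
    "Starter": (2.00, 200),
    "Basic": (5.00, 1000),
    "Studio": (20.00, 6000),
}
HUMAN_RATE_PER_HOUR = 90.00  # Rev human transcription at $1.50/min


def cost_per_hour(price: float, minutes: int) -> float:
    """Effective price per hour of audio if the whole allowance is used."""
    return price / (minutes / 60)


for name, (price, minutes) in PLANS.items():
    rate = cost_per_hour(price, minutes)
    ratio = HUMAN_RATE_PER_HOUR / rate
    print(f"{name}: ${rate:.2f}/hr, human transcription is about {ratio:.0f}x more")
```

The effective rates come out at $0.60, $0.30, and $0.20 per hour, which is where the 150× to 450× range against a $90/hr human service comes from.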

From Transcript to Show Notes in 5 Minutes

A typical 45–60 minute interview produces around 7,000–9,000 words of transcript. Manually writing show notes from that takes 30–60 minutes. VexaScribe's AI summary feature does it automatically — you get:

✓ Episode summary (2-3 paragraphs)
✓ Chapter markers with timestamps (great for YouTube and Apple Podcasts chapters)
✓ Key topics discussed
✓ Notable quotes with timestamps

The AI output is a starting point, not final copy. Most podcasters edit for brand voice, add their own takes, and pull specific quotes manually. But it cuts the writing time roughly in half.

Beyond summaries: AI Chat lets you ask the transcript questions directly — “pull three punchy quotes I can use for social clips,” “what did the guest say about pricing?”, “find the strongest soundbite about productivity.” Answers come back with clickable timestamps that jump the audio player to the exact moment. Available on paid plans.

Chapter Markers for Spotify, Apple Podcasts & YouTube

Chapters help listeners jump to segments they care about (interview start, main topic, Q&A, sponsor break) and reduce drop-off. Most podcasters write chapters by hand from listening back — 10–20 minutes per episode. VexaScribe generates chapter markers with timestamps from the transcript in a single click.

The workflow

1. Upload episode → get transcript (5–10 min)
2. Generate AI summary with the Podcast template; chapter markers are included in the output
3. Copy timestamps and titles into your podcast host's chapter editor

Where chapters go, by platform

Platform	How chapters get in
Apple Podcasts	ID3 chapter tags embedded in episode MP3 (via your host's chapter editor) or Apple Podcasts Connect chapter fields
Spotify	Timestamps in episode description auto-parse into clickable jump points; for full chapters, use RSS with `<podcast:chapters>` tag (Podcasting 2.0)
YouTube (video version)	Timestamps in video description auto-create chapter cards on the progress bar; must start with 00:00 and have ≥3 chapters
Buzzsprout / Transistor / Simplecast / Captivate	Paste chapters into the episode edit page; the host writes ID3 tags on your behalf

VexaScribe outputs chapter timestamps in a copy-paste-friendly plain-text format that works with all the destinations above. The Podcasting 2.0 <podcast:chapters> JSON format is on the roadmap.
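
Before pasting a chapter list, it can help to check it against the YouTube rules in the table above and, if your host supports it, convert it to Podcasting 2.0 JSON chapters yourself. The sketch below assumes a plain-text layout of one `MM:SS Title` or `H:MM:SS Title` entry per line; that layout and all function names are assumptions for illustration, not VexaScribe's documented output format.

```python
import json
import re

# Optional hours, then minutes and seconds, then the chapter title.
CHAPTER_LINE = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.+)$")


def parse_chapters(text: str) -> list[tuple[int, str]]:
    """Turn 'MM:SS Title' lines into (start_seconds, title) pairs."""
    chapters = []
    for raw in text.strip().splitlines():
        match = CHAPTER_LINE.match(raw.strip())
        if not match:
            raise ValueError(f"not a chapter line: {raw!r}")
        hours, minutes, seconds, title = match.groups()
        start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        chapters.append((start, title))
    return chapters


def youtube_problems(chapters: list[tuple[int, str]]) -> list[str]:
    """Check the YouTube description rules: 3+ chapters, first at 00:00, ascending."""
    problems = []
    if len(chapters) < 3:
        problems.append("need at least 3 chapters")
    if chapters and chapters[0][0] != 0:
        problems.append("first chapter must start at 00:00")
    starts = [start for start, _ in chapters]
    if starts != sorted(starts):
        problems.append("timestamps must be in ascending order")
    return problems


def to_podcast_chapters_json(chapters: list[tuple[int, str]]) -> str:
    """Serialize to the Podcasting 2.0 JSON chapters layout (startTime in seconds)."""
    doc = {"version": "1.2.0",
           "chapters": [{"startTime": s, "title": t} for s, t in chapters]}
    return json.dumps(doc, indent=2)


chapters = parse_chapters("00:00 Intro\n12:30 Interview\n1:02:30 Q&A")
print(youtube_problems(chapters) or "OK for YouTube")
print(to_podcast_chapters_json(chapters))
```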

Bulk Transcription for Backlogs and Agencies

Prolific podcasters with an un-transcribed back catalog, and agencies servicing multiple shows, need to process dozens of episodes at once. VexaScribe supports up to 50 episodes per batch — upload once, walk away, come back to a ZIP archive with every transcript plus AI summaries.

✓ 50 files per batch upload, mixed audio and video formats
✓ ZIP export of all transcripts + AI summaries in one download
✓ Same plan minutes, no bulk premium; the Studio plan at $20/mo covers ~100 hours (roughly 100 one-hour episodes)
✓ Independent per-file processing: one failing file doesn't block the batch

For very large backlogs (100+ episodes), split into multiple 50-file batches back-to-back. See the bulk transcription feature page for the detailed workflow.

If You Have a Video Version: SRT Subtitles

YouTube version of your podcast? Spotify Video? VexaScribe exports SRT and VTT subtitle files with word-level timestamps from the same transcription. Upload directly to YouTube using Add language → Upload file with timing → select your .srt file.
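
To show how word-level timestamps become subtitle cues, here is a minimal sketch that groups timed words into fixed-size cues and writes them in SRT syntax. The `(word, start, end)` tuple layout and the seven-words-per-cue rule are assumptions for illustration; they are not VexaScribe's JSON schema or its actual cue-splitting logic.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT requires."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_words(words, max_words=7):
    """Chunk (word, start, end) tuples into (start, end, text) cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        cues.append((chunk[0][1], chunk[-1][2], text))
    return cues


def to_srt(cues) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


words = [("Welcome", 0.0, 0.4), ("to", 0.4, 0.5), ("the", 0.5, 0.6), ("show", 0.6, 1.0)]
srt_text = to_srt(group_words(words, max_words=2))
print(srt_text)
```

Real subtitle tools usually also break cues at sentence boundaries and cap line length; the fixed word count here only keeps the example short.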

Platforms that accept SRT/VTT for podcasts

● Buzzsprout: TXT, SRT, VTT
● Transistor: SRT, VTT, TXT, JSON
● Spotify for Creators: VTT, SRT
● Apple Podcasts: SRT/VTT via RSS <podcast:transcript> tag
● YouTube: SRT
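
Most hosts add the RSS <podcast:transcript> tag for you once a transcript file is uploaded. If you maintain your own feed, the sketch below shows roughly what that element looks like, built with Python's standard library. The episode title and URL are placeholders, and only the VTT MIME type is shown; check the Podcasting 2.0 namespace documentation for the type value to use with SRT.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Podcasting 2.0 "podcast" RSS extension.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def add_transcript(item: ET.Element, url: str,
                   mime_type: str = "text/vtt", language: str = "en") -> ET.Element:
    """Append a <podcast:transcript> element to an RSS <item>."""
    return ET.SubElement(
        item,
        f"{{{PODCAST_NS}}}transcript",
        {"url": url, "type": mime_type, "language": language},
    )


# Placeholder episode; in a real feed this <item> already exists.
item = ET.Element("item")
ET.SubElement(item, "title").text = "Episode 12"
add_transcript(item, "https://example.com/transcripts/ep12.vtt")

xml_out = ET.tostring(item, encoding="unicode")
print(xml_out)
```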

For the full SRT workflow, see the SRT generator page.

Transcribing Non-English and Multilingual Podcasts

VexaScribe transcribes in 99 languages. Important caveat about Apple Podcasts: their auto-transcripts only support English, French, Spanish, and German (as of the iOS 17.4 launch in March 2024). Podcasters publishing in those four languages can rely on Apple's auto-transcripts; everyone else needs to provide their own.

Transcripts can also be translated to any of 133 languages using VexaScribe's built-in translation. Useful for international episodes — record in Spanish, publish English-translated transcript and SRT for English-speaking audiences.

See the transcribe and translate workflow →

Less than $3/month for a Weekly Podcast

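A weekly 1-hour podcast is ~4 hours (about 240 minutes) of audio per month, or about $2.40 at the Starter rate of $0.60 per hour. Starter includes 200 minutes/month, so check your monthly total against that allowance; Basic (1,000 min/month) covers a weekly hour-long show with room to spare.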

Free trial

30 min total

No credit card

Starter

$2/month

200 min/month

Weekly 1-hr podcast

Basic

$5/month

1,000 min/month

2-3 episodes per week

See all plans, including Pro and Studio →
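
Choosing a plan comes down to comparing your monthly minutes with each allowance. A rough sketch using only the Starter and Basic numbers listed above; averaging a month as 52/12 weeks is an assumption, and the larger plans are left out because their allowances are not listed here:

```python
# (plan name, monthly price in USD, included minutes), cheapest first.
PLANS = [("Starter", 2, 200), ("Basic", 5, 1000)]


def monthly_minutes(episodes_per_week: float, minutes_per_episode: float) -> float:
    """Average audio minutes per month, treating a month as 52/12 weeks."""
    return episodes_per_week * minutes_per_episode * 52 / 12


def cheapest_plan(minutes_needed: float):
    """Return the cheapest listed plan whose allowance covers the need, else None."""
    for name, _price, allowance in PLANS:
        if minutes_needed <= allowance:
            return name
    return None  # beyond Basic: compare the larger plans


for per_week, length in [(1, 45), (1, 60), (3, 60)]:
    need = monthly_minutes(per_week, length)
    print(f"{per_week}x {length} min/week = {need:.0f} min/month -> {cheapest_plan(need)}")
```

Under this averaging, a weekly 45-minute show needs about 195 minutes a month and a weekly 60-minute show about 260.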

Frequently Asked Questions

How do I transcribe a podcast episode?

Upload your podcast episode file (MP3, WAV, M4A, or other supported formats) to VexaScribe (formerly NovaScribe). The AI automatically transcribes the audio using OpenAI's Whisper Large-v3, detects different speakers (host, co-host, guests), and generates word-level timestamps. A typical 60-minute episode completes in 5-10 minutes.

Does podcast transcription identify different speakers?

Yes. VexaScribe includes automatic speaker diarization. When your podcast has multiple speakers, the system labels each one separately (Speaker 1, Speaker 2, Speaker 3...). You can rename them in the editor ("Host", "Guest: Jane Smith", "Co-host: Dan") and the labels carry through every export. Accuracy is highest for 2-3 speakers on separate microphones; degrades with overlapping speech and 4+ speakers.

What can I do with my podcast transcript?

Turn one episode into multiple pieces of content: create show notes for your website, repurpose as blog posts (1,500-2,500 words from a 45-60 min episode), extract quotes for social media, generate YouTube captions (SRT/VTT export), improve SEO with searchable text, and make your content accessible to deaf and hard-of-hearing listeners.

How long does it take to transcribe a 1-hour podcast?

A typical 1-hour podcast episode is transcribed in about 5-10 minutes. Processing time depends on audio length, quality, and current load. You can close your browser while processing — the transcript will be waiting in your dashboard when it's ready.

What audio formats work for podcast transcription?

VexaScribe supports all common podcast formats: MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio. Video formats (MP4, MOV, AVI, MKV, WebM) work too — we extract the audio automatically. Files can be up to 5 GB and 10 hours per upload, which covers virtually any podcast episode.

Can I transcribe podcasts in different languages?

Yes. VexaScribe supports 99 languages for transcription with automatic language detection. You can also translate transcripts to any of 133 languages using the built-in translation feature — useful for international podcasts that want English subtitles, or English podcasts reaching non-English audiences.

How much does podcast transcription cost?

VexaScribe plans start at $2/month for 200 minutes. A weekly 1-hour podcast (~4 hours/month) works out to less than $3/month at the Starter rate of $0.60 per hour. Compared to human transcription services like Rev ($1.50/min, ~$90/hr) or Happy Scribe ($1.70/min, ~$102/hr), AI is roughly 150-450× cheaper. For most podcasters, AI accuracy at ~95% on clear audio is more than enough.

Can I export subtitles for YouTube from my podcast transcript?

Yes. VexaScribe exports SRT and VTT subtitle files with word-level timestamps. Upload directly to YouTube using "Add language → Upload file with timing" → select your .srt file. Same files work in Vimeo, Spotify Video, or any video editor (Premiere Pro, Final Cut Pro, DaVinci Resolve).

Does Apple Podcasts already transcribe my show automatically?

Apple Podcasts launched auto-generated transcripts in March 2024 with iOS 17.4 — but only for English, French, Spanish, and German. Auto-transcription rolled out to new episodes first, with the back catalog filling in over time. If you publish in another language, or want to control the exact transcript shown, you'll still need to provide your own. Spotify also added optional creator-uploaded transcripts (VTT/SRT) starting in 2023-24, but it's opt-in, not automatic for all podcasts.

How accurate is AI transcription for podcasts with multiple speakers?

On clear podcast audio with 2-3 speakers on separate microphones, AI transcription accuracy is around 95% (5% Word Error Rate). Speaker diarization is accurate when speakers have distinct voices and don't talk over each other. State-of-the-art research shows pyannote 3.1 achieves 11-19% Diarization Error Rate on real-world audio. Conditions that hurt accuracy: overlapping speech, similar voices, 4+ speakers, low-bitrate audio.
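
Word Error Rate is the number of word substitutions, deletions, and insertions needed to turn the AI transcript into a reference transcript, divided by the number of reference words. A minimal sketch of that calculation using word-level edit distance; the lowercase-and-split normalization is a simplifying assumption, and real evaluations also strip punctuation and normalize numbers:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "today we talk about pricing with our guest jane smith"
hypothesis = "today we talk about pricing with our guest jane smyth"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

One misspelled surname in ten words already gives 10% WER, which is why proper nouns are the usual thing to fix by hand.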

Can I generate chapter markers for Spotify and Apple Podcasts from my transcript?

Yes. Chapter markers with timestamps are one of the outputs when you generate an AI summary from your transcript: pick the Podcast summary type in the summary editor to get a chapter list with times, per-chapter descriptions, and key quotes. Apple Podcasts reads ID3 chapter tags embedded in the episode audio file; Spotify turns timestamps in the episode description into clickable jump points. Most podcast hosting platforms (Buzzsprout, Transistor, Captivate, Simplecast) let you paste chapter timestamps directly into the episode edit page, so you don't have to encode them into the audio yourself. Chapters help discovery on both platforms and reduce listener drop-off.

Can I transcribe a backlog of old podcast episodes in bulk?

Yes. VexaScribe supports bulk transcription — upload up to 50 episodes at once as a single batch, and download all transcripts plus AI summaries as a ZIP archive when the batch completes. This is the workflow most podcast agencies and prolific podcasters use to catch up on 6-12 months of un-transcribed back catalog in one session. Batch processing uses the same $2-$20/month plan minutes as single uploads. For very large backlogs (100+ episodes), split into multiple batches; the Studio plan at $20/month covers about 100 hours of audio.

Can I get a full SEO blog post from a podcast episode transcript?

Yes, in two steps. First, get the transcript. Second, use the Podcast summary type to generate structured show notes with chapters, key quotes, and speaker discussion points — that's already usable as a blog post skeleton. For a longer piece, use the transcript editor's regenerate feature with the Lecture or General summary type to get a different structural angle, then paste both into your editor of choice. A typical 45-60 minute episode produces 1,500-2,500 words of blog content. Publishing the transcript itself alongside the episode also has meaningful SEO impact — searchable long-tail keywords the audio can't rank for on its own.

How do I get a transcript of any podcast?

Five methods, in honest order.
(1) Apple Podcasts: since March 2024, most English/French/Spanish/German shows have auto-transcripts. Open the episode and tap the quote-bubble icon at the top-right. Free, no signup, viewer-only (no download).
(2) Spotify: creators can opt in to display transcripts on the episode page; coverage is thin outside major shows.
(3) Happy Scribe podcast library (podcasts.happyscribe.com): user-uploaded searchable transcripts, mostly popular English shows.
(4) YouTube: if the podcast publishes to YouTube, use /tools/youtube-transcript to pull the auto-caption.
(5) Upload the audio yourself to VexaScribe: works for any podcast, any language, and produces a downloadable transcript in TXT/DOCX/SRT with speaker labels.
Method 5 is the only path that reliably works for non-English shows, obscure podcasts, and any use case that needs the transcript as a file.

Where can I find existing podcast transcripts online?

Four places.
(1) Apple Podcasts app: tap the quote-bubble icon on episodes that have auto-transcripts (mostly English/French/Spanish/German, post-2024).
(2) Spotify: some shows opt in to display transcripts on the web episode page; check the episode's page on open.spotify.com.
(3) Happy Scribe's podcast library (podcasts.happyscribe.com): a searchable database of user-contributed podcast transcripts, tilted toward popular English shows.
(4) The podcast's own website or Substack: some hosts publish transcripts for SEO or accessibility.
If none of these have your show, upload the episode audio here and get a full transcript back in 5-10 minutes.

Does Spotify have podcast transcripts?

Partially, since 2023. Spotify supports creator-uploaded transcripts (VTT/SRT) that display on the web episode page and inside the app on some devices. It's opt-in, not automatic (the creator has to upload the file), so coverage is thin outside major podcasts. Spotify's own official podcasts have transcripts; smaller independent shows usually don't. If Spotify doesn't show a transcript for the episode you want, get the audio file from the show's RSS feed (RSS-published shows are downloadable directly; Spotify's offline downloads on paid plans stay inside the app and can't be exported) and upload it here for a full AI transcript in your language.

Does Apple Podcasts have transcripts?

Yes, for many shows since March 2024 (iOS 17.4). Apple auto-generates transcripts for episodes in English, French, Spanish, and German, with no creator action required. Look for the quote-bubble icon (💬) in the top-right of the episode player; tap it to view the transcript. Limits: (1) viewer-only, no download or export; (2) English/French/Spanish/German only, with no auto-transcripts for other languages yet; (3) transcripts come from Apple's proprietary model, whose accuracy is comparable to Otter and not always as high as Whisper Large-v3. For non-Apple-supported languages, exportable files, or higher accuracy, upload the audio here.

Transcribe Your Next Episode in 10 Minutes

30 min free, no credit card. Upload one episode and see speaker labels, AI show notes, and SRT export.
