Formerly NovaScribe: same team, same product, refreshed name.

Transcribe Audio to Text — Free AI Transcription in 99 Languages

Upload any audio or video file and get accurate transcripts with speaker labels, timestamps, and 5 export formats. Free 30-minute trial — no credit card.

VexaScribe (formerly NovaScribe) is a free online audio-to-text converter that transcribes audio and video files into accurate, timestamped text using OpenAI's Whisper Large-v3 model. Upload MP3, WAV, M4A, MP4, MOV, FLAC, and 11 other formats up to 5 GB. A one-hour file is transcribed in 5–10 minutes with 95% accuracy on clear English audio, and 99 languages are supported with automatic detection. Free tier includes 30 minutes; paid plans start at $2/month for 200 minutes.

Use VexaScribe to transcribe interviews, podcasts, voice memos, lectures, Zoom calls, and dictation. Every transcript includes speaker diarization (Speaker 1, Speaker 2…), word-level timestamps, and an editable transcript view, so the text is ready to paste, share, or export to TXT, DOCX, SRT, VTT, or JSON.

30 minutes free · No credit card · 99 languages · Speaker labels

How to Transcribe Audio to Text

Three steps from upload to finished transcript — audio-to-text transcription with no setup and no software to install.

1. Upload your file
Drag and drop or browse for an audio or video file. We accept MP3, WAV, M4A, MP4, MOV, FLAC, OGG, AAC, AIFF, WMA, AVI, MKV, WebM, and 4 more formats. Up to 5 GB and 10 hours per file.
2. AI transcribes in minutes
VexaScribe runs OpenAI's Whisper Large-v3 model on your audio. A 60-minute recording typically completes in 5–10 minutes. Close the tab and come back; we'll keep processing.
3. Edit, export, share
Review the transcript in our built-in editor. Rename speakers, fix any errors, then export to TXT, DOCX, SRT, VTT, or JSON. Share via link or download.

Supported Audio and Video Formats

17 formats covering virtually every recording device and tool. Files up to 5 GB and 10 hours per upload.

Audio Formats

MP3: Most common
WAV: Lossless
M4A: iPhone default
FLAC: Lossless
OGG: Open format
AAC: Apple/streaming
AIFF: Pro audio
WMA: Windows
AMR: Mobile
OPUS: Modern web

Video Formats

MP4: Most common
MOV: Apple/QuickTime
AVI: Windows legacy
MKV: High-quality
WebM: Web video
FLV: Flash legacy
WMV: Windows

Audio is extracted automatically from video files. Video itself is not retained after transcription.

File limits: 5 GB per file, 10 hours per file. No monthly upload limit beyond your plan's included minutes.
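The limits above can be checked before uploading. The sketch below is a hypothetical local helper, not part of VexaScribe: the function name `check_upload` is made up for illustration, the extension sets are copied from the format lists on this page, and it assumes 1 GB means 10^9 bytes.

```python
from pathlib import Path

# Extensions copied from the audio and video format lists on this page.
AUDIO_EXTS = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "aiff", "wma", "amr", "opus"}
VIDEO_EXTS = {"mp4", "mov", "avi", "mkv", "webm", "flv", "wmv"}

MAX_BYTES = 5 * 1000**3      # assumption: 1 GB = 10^9 bytes
MAX_SECONDS = 10 * 60 * 60   # 10 hours per file


def check_upload(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in AUDIO_EXTS | VIDEO_EXTS:
        problems.append(f"unsupported format: .{ext}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 5 GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("file is longer than 10 hours")
    return problems
```

For example, `check_upload("lecture.MP4", 2 * 1000**3, 3 * 3600)` returns an empty list, while a 6 GB file is flagged for size.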

Format-specific deep dives: MP3 to text · SRT generator (audio → subtitles) · transcript to summary

What Can You Transcribe?

If it has audio, VexaScribe can transcribe it. Common use cases:

Podcast episodes

Show notes, blog posts, SEO content, searchable archives. Solo and multi-host shows supported with speaker labels.

Interviews

Journalism, qualitative research, HR. Multi-speaker diarization separates interviewer from subject automatically.

Lectures and classes

Students capturing lectures for review. Teachers generating written course notes from recorded sessions.

Meetings

Zoom, Google Meet, Microsoft Teams calls. Upload the recording or send VexaScribe's meeting bot to join.

Phone calls

Sales calls, customer interviews, support recordings. Record on any device, upload, get a transcript with speakers.

Video content

YouTube videos, training videos, course content. Generate SRT/VTT subtitles with word-level timestamps.

Audio File to Text — Every Format, One Converter

If you think of this job as "converting a file" rather than "transcribing audio" — same thing, same tool. Pick the file, upload it (or paste a Google Drive share link / direct audio URL and skip the upload), and download the text. One conversion pass produces all five output formats: TXT, DOCX, SRT, VTT, and JSON — no re-processing per format, unlike single-format converter sites.

Audio formats (10)

MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS — up to 5 GB and 10 hours per file. For WhatsApp voice notes specifically, see WhatsApp transcription.

Video formats (7)

MP4, MOV, AVI, MKV, WebM, FLV, WMV — the audio track is extracted automatically, no manual conversion step.

File size caps compared (verified July 2026)

VexaScribe accepts files up to 5 GB. For comparison: Canva's free converter caps at 4.5 MB (~4 minutes of MP3), Zamzar at 200 MB free, OpenAI's Whisper API at 25 MB. If your file is a multi-hour lecture, all-day meeting export, or uncompressed WAV master, most converter sites force you to split it first — here you don't.
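To see what these caps mean in listening time, a rough conversion from file size to minutes helps. This is a back-of-the-envelope sketch that assumes a constant 128 kbps MP3 and 1 MB = 10^6 bytes; real files vary with bitrate, so treat the results as estimates only.

```python
def mp3_minutes(size_bytes: float, bitrate_kbps: int = 128) -> float:
    """Approximate playing time of a constant-bitrate file, in minutes."""
    seconds = size_bytes * 8 / (bitrate_kbps * 1000)
    return seconds / 60


MB = 1000**2  # assumption: decimal megabytes

print(round(mp3_minutes(4.5 * MB), 1))   # 4.7   (Canva's free cap)
print(round(mp3_minutes(25 * MB), 1))    # 26.0  (Whisper API cap)
print(round(mp3_minutes(200 * MB), 1))   # 208.3 (Zamzar's free cap)
```

At a higher bitrate the same caps cover less audio, which is why a 4.5 MB limit lands at around four minutes.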

Only need a one-off conversion? The free 30-minute trial covers most single files with no credit card.

Transcribe in 99 Languages — With Automatic Detection

No need to select language manually. VexaScribe auto-detects the spoken language from the audio. Accuracy varies by language tier:

Tier 1

~5% Word Error Rate (highest accuracy)

English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Japanese

Tier 2

~8–12% Word Error Rate

Arabic, Chinese, Korean, Hindi, Turkish, Vietnamese, Thai, Indonesian, Hebrew, Czech, Swedish, Norwegian, Danish, Finnish, Greek, Ukrainian

+ 73 more languages

Including Welsh, Swahili, Filipino, Bengali, Punjabi, Tamil, Telugu, Marathi, Urdu, Persian, Romanian, Hungarian, Bulgarian, Croatian, and many more. Accuracy varies by language and audio quality.

What You Get With Every Transcript

Every transcription includes these features at no extra cost on every paid plan.

Speaker diarization

Automatic speaker detection and labeling. Multiple speakers appear as Speaker 1, Speaker 2, Speaker 3, and so on. Rename them in the editor (e.g., "Host", "Guest", actual names).

Word-level timestamps

Every word is timestamped to the millisecond. Click any word in the transcript editor to jump to that moment in the audio. Essential for video subtitles and quote verification.

Multiple export formats

TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (developers). All formats available on every paid plan with no upgrade required.

AI summaries

Optional AI-generated summary with key points, decisions, action items, and chapter markers. Available on all paid plans. Useful for meeting notes, podcast show notes, and lecture review.

AI Chat

Ask the transcript natural-language questions and get answers with clickable timestamps that jump to the exact moment in the audio. Citations are validated server-side against the actual transcript text — no hallucinated quotes. Available on paid plans; powered by OpenAI models.

Audio File to Text — Every Recording Type We Handle

Whatever the source of your recording, VexaScribe transcribes it in the same upload-and-go workflow. Every new account starts with 30 free minutes, no credit card required.

Voice memos & dictation

Drop the .m4a, .mp3, or .wav file from iPhone Voice Memos, an Android voice recorder, Otter live capture, or a hardware dictaphone straight into VexaScribe. The transcript preserves punctuation and paragraph breaks, so your dictation reads like prose, not a wall of words.

Recorded interviews

Whether your interview was recorded on a Zoom call, a field recorder, a smartphone, or a USB lavalier mic, VexaScribe transcribes audio recordings to text with speaker labels. Two-, three-, and four-person conversations get separated automatically — rename Speaker 1 to the interviewee in the editor and export.

Podcast episodes & raw RSS

Upload the final mix or the raw multitrack stem. Audio-to-text transcription for a 60-minute episode finishes in about 7 minutes with timestamps you can drop into show notes, chapter markers, or YouTube descriptions.

Lecture & meeting recordings

Long-form recordings from classrooms, university lectures, board meetings, town halls, and webinars. Files up to 10 hours and 5 GB work in a single upload, with no chunking needed. On paid plans, the transcript comes back with an auto-generated AI summary.

Phone calls & voicemails

Compressed phone audio (8 kHz mono telephony) is supported. Accuracy lands at 85–92% on clear calls and degrades on heavily compressed VoIP, so review noisy stretches in the editor before exporting the transcript.

WhatsApp & Telegram voice notes

OPUS-encoded voice notes from messaging apps transcribe cleanly. Forward the .ogg or .opus file to yourself, save it, then drag it into VexaScribe to convert audio to text in under a minute for most short messages.

Field recordings & research interviews

Qualitative researchers, journalists, and ethnographers upload long-form .wav or .flac field recordings. The transcript is timestamped to the second so you can re-listen to any quote, and JSON export ships clean data into NVivo, ATLAS.ti, or MAXQDA.

Court hearings & legal depositions

Multi-speaker legal recordings (depositions, witness statements, hearing audio) work well with VexaScribe's speaker diarization. Outputs preserve verbatim timing — useful when you need to cite a moment in evidence. Always have a certified human verify before filing.

Need a specific format guide? See MP3 to text, WAV to text, M4A to text, or OGG to text. For phone voicemails specifically, see /voicemail-to-text.

Voice memo transcription (iPhone, Android, Google Recorder)

Voice memos from the iPhone Voice Memos app, Samsung Voice Recorder, Google Recorder (Pixel), and other stock voice-recording apps all export as .m4a, .amr, or .3gp — every one of which VexaScribe accepts directly. Typical accuracy for a clear voice memo (single speaker, phone held near mouth, quiet environment) is 94-97%, and processing time is 5-15 seconds for a 1-minute memo.

How to transcribe an iPhone voice memo

Open the Voice Memos app on your iPhone.
Tap the memo you want to transcribe.
Tap the three-dot menu, then Share.
Choose Save to Files (or AirDrop / Mail).
Open the .m4a file on your computer and drop it into the uploader on this page.

iOS 18 added native voice-memo transcription — it's English-only for the polished view and shows the transcript inside the Voice Memos app but doesn't give you a portable .txt or .docx file. Use this page instead when you want a proper transcript file, speaker labels for multi-person memos, or transcription in any of 98 non-English languages.

How to transcribe an Android voice memo

Behavior varies by phone maker, but the reliable path on every Android is:

Open your voice-recording app (Samsung Voice Recorder, Google Recorder, or the OEM equivalent).
Tap the recording, tap the three-dot menu, choose Share or Save.
The file exports as .m4a (most modern apps), .amr (older Samsung), or .3gp (very old devices).
Move the file to your computer (email to yourself, Google Drive, or USB) and drop it into the uploader here.

Google Recorder on Pixel already provides on-device transcription in English, plus a growing list of other languages depending on Pixel model. Use this page when you want a portable file for other tools, speaker labels, or a language Google Recorder doesn't support yet.

Sample voice-memo transcript

[00:00:02] Note to self — I need to email Sarah about the
[00:00:07] project timeline before Friday. Also, remind me to
[00:00:12] pick up the dry cleaning on the way home.
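Timestamped segments like the sample above map naturally onto subtitle cues. The sketch below shows one way to turn (start time, text) pairs into SRT text. It is an illustration and not VexaScribe's exporter: the function names are hypothetical, each cue is assumed to end where the next one starts, and the final cue gets an arbitrary 5-second duration.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments: list[tuple[float, str]], last_duration: float = 5.0) -> str:
    """Build SRT text from (start_seconds, text) pairs.

    Each cue ends where the next one starts; the final cue gets
    `last_duration` seconds, an arbitrary choice for this sketch.
    """
    cues = []
    for i, (start, text) in enumerate(segments):
        end = segments[i + 1][0] if i + 1 < len(segments) else start + last_duration
        cues.append(f"{i + 1}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Start times taken from the sample memo above; text shortened.
memo = [
    (2, "I need to email Sarah about the"),
    (7, "project timeline before Friday. Also, remind me to"),
    (12, "pick up the dry cleaning on the way home."),
]
print(segments_to_srt(memo))
```

The first cue printed is numbered 1 and spans `00:00:02,000 --> 00:00:07,000`.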

Meeting recording transcription (Zoom, Google Meet, Teams)

Post-meeting transcription for cloud recordings from Zoom, Google Meet, Microsoft Teams, and any other conferencing tool that gives you an .mp4 or .m4a download. Automatic speaker diarization tags 2-8 distinct speakers, and you can rename them to real names in the editor. A typical 60-minute meeting finishes in 5-10 minutes at 91-95% accuracy with laptop mics.

Zoom cloud recordings

Zoom cloud recordings download as .mp4 (audio+video) or .m4a (audio-only) depending on your account settings. Either works — we extract the audio track from the .mp4 automatically. For a shorter upload, choose "Audio only (M4A)" in your Zoom account's recording settings before recording future meetings.

Google Meet cloud recordings

Google Meet recordings save to the meeting organizer's Google Drive as .mp4. Right-click the file → Download, then drop the .mp4 into the uploader here. Meet also produces its own transcript when the recording feature is enabled, but it's English-primary and doesn't handle multi-language meetings well — re-transcribing with Whisper Large-v3 catches names and technical terms Google Meet's built-in transcription typically misses.

Microsoft Teams recordings

Teams meeting recordings appear in the meeting chat as .mp4 (or occasionally .m4a). Click the recording, choose Download, and drop the file here. Teams also has built-in Live Transcription — if the organizer enabled it during the call, the transcript downloads as .docx. Whisper Large-v3 typically outperforms Teams' Live Transcription on accented speech and technical vocabulary, so re-transcribing is worth it for high-stakes meetings.

Sample 4-speaker meeting transcript

[00:03:14] Speaker 1: So the main question is timeline.
[00:03:18] Speaker 2: Right, and I think we can hit Q3 if
[00:03:22] Speaker 2: engineering is ready by end of July.
[00:03:26] Speaker 3: We are, but design has to lock scope
[00:03:31] Speaker 3: by next Friday, otherwise slip.
[00:03:36] Speaker 4: Agreed. I'll get scope docs to design tomorrow.
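If you export a transcript in the plain-text layout shown above, consecutive lines from the same speaker can be merged into whole turns with a few lines of Python. This is a minimal sketch that assumes every line follows the `[HH:MM:SS] Speaker N: words` pattern of the sample; the function name `parse_turns` is hypothetical, and actual exports may be laid out differently.

```python
import re

# Matches lines such as "[00:03:14] Speaker 1: So the main question is timeline."
LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] (Speaker \d+): (.*)")


def parse_turns(text: str) -> list[dict]:
    """Parse transcript lines and merge consecutive lines from one speaker."""
    turns = []
    for line in text.splitlines():
        match = LINE.fullmatch(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        h, m, s, speaker, words = match.groups()
        start = int(h) * 3600 + int(m) * 60 + int(s)
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + words  # same speaker keeps talking
        else:
            turns.append({"start": start, "speaker": speaker, "text": words})
    return turns


sample = """\
[00:03:14] Speaker 1: So the main question is timeline.
[00:03:18] Speaker 2: Right, and I think we can hit Q3 if
[00:03:22] Speaker 2: engineering is ready by end of July.
"""
for turn in parse_turns(sample):
    print(turn["start"], turn["speaker"], turn["text"])
```

Here the two Speaker 2 lines collapse into one turn starting at second 198, which is usually what you want for quoting or analysis.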

For live meeting transcription (during the call, not after), see /meeting-transcription — that page uses a meeting bot that joins the call as a participant.

Podcast episode transcription

One-off transcription of a specific podcast episode. Drop the .mp3 (from your podcast host's download link, or exported from Apple/Spotify/Google Podcasts), get back a full transcript with host/guest speaker labels in 5-10 minutes for a 60-minute episode. Typical accuracy for podcast-quality audio (dedicated mics, treated rooms) is 93-96%.

Common podcast workflows

● Show notes from timestamps. The transcript comes with segment-level timestamps every few seconds. Search for the phrase you want to quote, copy the timestamp, drop it into your show notes as a chapter marker or callout.
● SEO blog post from an episode. Take the 60-minute transcript, cut it to a 1,500-2,500-word article with H2 sections from the natural topic changes. One episode becomes one long-form post plus 3-5 social clips.
● Direct quote extraction. Journalists and researchers cite podcast episodes by pulling verbatim quotes with timestamps. Search the transcript in-editor, cite [00:14:23-00:14:47] alongside the quote.
● Accessibility. Publish the transcript alongside the audio for hearing-impaired listeners. Required in many jurisdictions for publicly funded podcasts (US ADA, EU Accessibility Act from 2025).

For an ongoing podcast workflow — auto-transcribing every new episode from an RSS feed — see /podcast-transcription. Same engine, plus per-episode automation.

Interview transcription

Two-speaker interviews (interviewer + subject) are among the highest-accuracy use cases — Whisper Large-v3 hits 94-97% on clean interview audio and speaker diarization cleanly separates interviewer from subject in 95%+ of cases. Field recorders (Zoom H1, H4n, RØDECaster), phone recordings, and Zoom-call interviews all work.

Recording setup tips for interviews

● Record each speaker on a separate track when possible. Multi-track recorders (Zoom H-series, RØDECaster, most DAWs) let you upload each speaker's WAV separately for near-perfect diarization.
● Use lavalier mics close to each speaker. Even $30 USB lavaliers dramatically outperform laptop built-in mics; the accuracy jump is 5-10 percentage points.
● Avoid overlapping speech. When two speakers talk over each other, diarization can misattribute the crossover. Coach interview subjects to wait a beat before responding; this matters more than mic quality.

For qualitative research

Researchers using NVivo, ATLAS.ti, or MAXQDA can export the transcript as JSON (with word-level timestamps) or DOCX (with segment-level timestamps and speaker labels formatted for direct import). Verbatim transcription for research coding typically requires 10-15 minutes of light editing per interview hour to fix proper nouns and confirm turn boundaries.

For qualitative-research-specific workflow guidance, see /transcription-for-qualitative-research.

For journalism

Journalists need clean verbatim transcripts with reliable timestamps for citing quotes. Speaker labels stay stable across the whole interview, and the timestamp on any quoted phrase lets you double-check the audio before publication. For high-stakes reporting (legal claims, direct quotes with named subjects), have a colleague spot-check the transcript against the audio at the timestamps you plan to cite — AI at 95% still misses one word in twenty.

How VexaScribe compares to other audio-to-text tools

A quick side-by-side against the tools most people also evaluate when they search “transcribe audio to text”. Verified July 2026.

Feature	VexaScribe	HappyScribe	Otter	Rev (AI)
Free tier	30 min at signup	10 min AI trial	300 min/mo	45 min/mo
Max file size	5 GB	No hard limit stated	4 hr/file	Standard file caps
Languages	99 (auto-detect)	60+ free / 150+ paid	5	15
Speaker labels	Every plan free	Included	Included	Included
Effective cost/hr	$0.20–$0.60	~$5–$17	~$3.40 (Pro)	$15 ($0.25/min)
Signup required	Yes (email, no card)	Yes	Yes	Yes

Sources: HappyScribe pricing, Otter pricing, Rev pricing, VexaScribe pricing (all verified July 28, 2026). Effective cost/hr is calculated from each vendor's cheapest paid plan divided by included minutes.
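The VexaScribe range in the table can be reproduced from the plan prices listed in the pricing section and FAQ on this page. The sketch below assumes every included minute is used; `cost_per_hour` is an illustrative helper name.

```python
def cost_per_hour(monthly_price: float, included_minutes: int) -> float:
    """Effective price of one audio hour when a plan's minutes are fully used."""
    return monthly_price / included_minutes * 60


# (monthly price in USD, included minutes) as listed on this page
plans = {"Starter": (2, 200), "Basic": (5, 1000), "Pro": (10, 2500), "Studio": (20, 6000)}

for name, (price, minutes) in plans.items():
    print(f"{name}: ${cost_per_hour(price, minutes):.2f}/hr")
# Starter: $0.60/hr
# Basic: $0.30/hr
# Pro: $0.24/hr
# Studio: $0.20/hr
```

If you use only part of a plan's minutes, the effective hourly cost is proportionally higher.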

Honest picks: Otter is the better choice if your primary need is live meeting captions rather than uploaded-file transcription. Rev is the better choice when you need a certified-human upgrade path on the same platform. For everything else — long files, 99 languages, developer JSON exports, lowest cost per hour — VexaScribe wins on our own comparison. For a broader 13-tool ranking see our Best transcription software 2026 analysis.

How Accurate Is VexaScribe Transcription?

VexaScribe (formerly NovaScribe) achieves 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker.

Real-world accuracy varies by audio condition:

● Clear podcast audio: 3–6% WER (94–97% accurate)
● Noisy interviews, background music: 8–15% WER (85–92% accurate)
● Strong accents, technical jargon, multiple overlapping speakers: 10–20% WER (80–90% accurate)

We recommend reviewing transcripts before publishing critical content — no AI tool achieves the 99%+ accuracy of human transcription, but VexaScribe is 20–100× cheaper than human services like Rev ($1.50/min). For a deeper breakdown of Whisper accuracy by model size, language, and audio condition, see How Accurate Is Whisper in 2026?

Methodology: Word Error Rate (WER) is calculated as (Substitutions + Insertions + Deletions) / Total Words. We use the industry-standard formula. See our editorial standards for full testing methodology.
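For readers who want to check a transcript themselves, the WER formula above can be computed with a word-level edit distance. This is a minimal sketch rather than our testing pipeline: it lowercases and splits on whitespace only, so punctuation differences count as errors unless you strip them first, and "Total Words" is taken to mean the words in the reference transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(substitution, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1] / len(ref)


reference = "pick up the dry cleaning on the way home"
hypothesis = "pick up the dry cleaning on way home"
print(round(word_error_rate(reference, hypothesis), 3))  # 0.111
```

One dropped word out of nine gives a WER of about 11%, which corresponds to roughly 89% accuracy.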

Simple, Transparent Pricing

Pay for what you use. No per-seat fees, no hidden charges. Cancel anytime.

Starter

$2/month

200 min/month

Solo creators

Basic

$5/month

1,000 min/month

Regular podcasters

Pro

$10/month

2,500 min/month

Heavy use

See all plans, including Studio and Team →

Frequently Asked Questions

How does VexaScribe transcribe audio to text?

VexaScribe (formerly NovaScribe) uses OpenAI's Whisper Large-v3 model to convert speech to text. Upload an audio or video file, and the AI processes the entire recording — adding speaker labels, word-level timestamps, and optional AI summaries. A 60-minute file typically completes in 5-10 minutes.

What audio and video formats can I transcribe?

VexaScribe accepts MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. For video files, we extract the audio track automatically.

How long does it take to transcribe a 1-hour audio file?

Most 1-hour files complete in 5-10 minutes. Processing speed depends on audio quality, current load, and file format. You can close the browser tab and return — the transcript will be waiting in your dashboard when it's ready.
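For files of other lengths, the 5-10 minute figure can be scaled as a rough guide. The sketch below assumes processing time grows linearly with duration, which is a simplification; queue load and audio quality will shift the real number.

```python
def processing_window(duration_minutes: float) -> tuple[float, float]:
    """Scale the '5-10 minutes per audio hour' figure to any recording length."""
    hours = duration_minutes / 60
    return (5 * hours, 10 * hours)


low, high = processing_window(90)
print(f"{low:.1f} to {high:.1f} minutes")  # 7.5 to 15.0 minutes
```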

Is VexaScribe free to use?

Yes, you get 30 minutes of transcription free with no credit card required. After the free tier, paid plans start at $2/month for 200 minutes (Starter), $5/month for 1,000 minutes (Basic), $10/month for 2,500 minutes (Pro), and $20/month for 6,000 minutes (Studio). Cancel anytime.

How accurate is VexaScribe transcription?

VexaScribe achieves around 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker. Real-world accuracy varies: clear podcast audio averages 3-6% WER, noisy interviews 8-15% WER, and audio with strong accents or technical jargon 10-20% WER. We recommend reviewing transcripts before publishing critical content.

What languages are supported?

99 languages including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Polish, Japanese, Chinese, Korean, Arabic, Turkish, Hindi, Vietnamese, Thai, and many more. Language is detected automatically — no need to select it manually before each upload.

Can I transcribe video files?

Yes. Upload MP4, MOV, AVI, MKV, WebM, FLV, or WMV files and we extract the audio track automatically. The transcript includes timestamps so you can sync with your video editing tool, generate subtitles (SRT/VTT export), or repurpose video content into blog posts.

Does VexaScribe identify multiple speakers?

Yes, automatic speaker diarization is included on every transcript. Multiple speakers are labeled Speaker 1, Speaker 2, Speaker 3, and so on. You can rename speakers in the built-in editor (e.g., "Host", "Guest", actual names) for clarity in the final transcript.

Is my audio data private and secure?

Audio files transit over TLS 1.2+ encryption and are stored encrypted at rest in AWS eu-west-2. We do not train AI models on your audio. We do not sell user data. You can delete files at any time from your dashboard, and account deletion is self-serve.

How do I export the transcript?

VexaScribe exports to TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (structured data for developers). All formats are available on every paid plan. SRT and VTT include word-level timestamps for video editors.

How do I transcribe audio recordings to text?

Sign up at VexaScribe (30 minutes free, no credit card), drag your recording onto the upload area, and wait. Transcription takes roughly 8–17% of the file's duration, so a 60-minute recording is ready in 5–10 minutes. The output includes speaker labels, word-level timestamps, and an editable transcript view, and you can export the transcript as TXT, DOCX, SRT, VTT, or JSON. Works for voice memos, interviews, podcasts, lectures, Zoom calls, and phone recordings.

Is there a free audio transcription to text option?

Yes. Every new account gets 30 minutes of free audio transcription to text with all features enabled — speaker diarization, 99-language support, timestamps, and export to TXT, DOCX, SRT, VTT, and JSON. No credit card required to start. After the free tier, paid plans start at $2/month for 200 minutes (about $0.01 per minute), which is significantly cheaper than typical pay-per-minute transcription services charging $0.10–$0.25 per minute.

What's the difference between an audio file to text converter and a transcription service?

A bare audio-to-text converter usually returns a wall of raw text with no speakers, no timestamps, and no editor, so you have to clean it up yourself. A transcription service like VexaScribe returns a structured transcript: speakers are labeled (Speaker 1, Speaker 2…), every word is timestamped, the text is paragraph-broken for readability, you can edit and re-export in-browser, and AI summaries with action items are generated automatically on paid plans. Same upload, much more usable output.

How do I convert an audio file to text?

Three steps, converter-style. (1) Pick your audio file — MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, or OPUS, up to 5 GB per file — and drag it into VexaScribe, or paste a Google Drive share link / direct audio URL and skip the upload. (2) The AI converts speech to text in 5-10 minutes for a 1-hour file, with speaker labels and timestamps added automatically. (3) Download the text as TXT, DOCX, SRT, VTT, or JSON. Unlike single-format converter sites, one upload produces all five output formats — no re-processing per format.

What audio file formats can be converted to text?

All ten common audio formats: MP3 (the most common), WAV (uncompressed studio/voice-recorder), M4A (iPhone voice memos, AAC), FLAC (lossless), OGG (WhatsApp voice notes use Opus-in-OGG), AAC, AIFF (Mac audio), WMA (legacy Windows), AMR (old phone recordings), and OPUS. Video containers work too (MP4, MOV, AVI, MKV, WebM, FLV, WMV), with the audio track extracted automatically. If you have a format-specific question, see our dedicated guides: MP3 to text, WAV to text, M4A to text, OGG to text, MP4 to text.

Is there a free audio file to text converter?

Yes — VexaScribe converts your first 30 minutes free with no credit card, including speaker labels and all export formats. That's enough for most one-off conversions. Other free options have tighter catches: Canva's converter caps files at 4.5 MB (about 4 minutes of MP3), TurboScribe's free tier allows 3 files/day at 30 minutes each, and Otter's Basic tier caps recordings at 30 minutes with limited file imports. For unlimited free conversion, self-hosted OpenAI Whisper costs nothing but requires Python and a GPU. See our full free transcription comparison for every option's real limit.

How do I transcribe a voice memo from my iPhone?

Open the Voice Memos app on your iPhone. Tap the voice memo you want to transcribe. Tap the three-dot menu, then Share. Choose Save to Files (or AirDrop / Mail). The recording saves as an .m4a. Drop the .m4a into VexaScribe here; the transcript comes back in about 5-15 seconds for a typical 1-minute memo, complete with timestamps and paragraph breaks. iOS 18 added native voice-memo transcription, but it's English-only for the polished view and doesn't produce a portable text file for export. Our tool works in 99 languages, produces a portable transcript in TXT/DOCX/JSON, and handles memos of any length up to the 10-hour file limit.

Does the iPhone Voice Memos app already transcribe? Why use this?

iOS 18 added on-device voice-memo transcription. It is English-only for the polished experience and viewable in the Voice Memos app, but not directly exportable as a text file. Use ours when: (1) the memo is in a language other than English (Turkish, Spanish, Mandarin, Arabic, and 94 others); (2) you want a clean .txt or .docx to paste into a note-taking app or CMS; (3) you want speaker labels for multi-person recordings (iOS doesn't diarize); (4) you're on iOS 17 or older; (5) accuracy matters: Whisper Large-v3 typically outperforms Apple's on-device model by 5-10 percentage points on accented speech and noisy audio.

Can I transcribe a meeting recording (Zoom, Google Meet, Teams)?

Yes. Zoom cloud recordings download as .mp4 (audio+video) or .m4a (audio-only); drop either into VexaScribe. Google Meet cloud recordings download as .mp4. Microsoft Teams recordings download as .mp4 from the meeting chat. All work identically — we extract the audio track from the video container and run Whisper with speaker diarization. For a typical 60-minute meeting with 4-6 speakers, the transcript is ready in 5-10 minutes and each speaker is tagged Speaker 1, Speaker 2, etc. Rename them to real names in the editor and the change applies across the whole transcript. For live meeting transcription (during the call, not after), see /meeting-transcription — that page uses a meeting bot that joins the call.

What about podcast transcription — is this the right tool?

Yes for one-off transcription of a specific episode. Drop the .mp3 (from your podcast host's download link, or from Apple/Spotify/Google Podcasts if you exported the audio) and the transcript comes back with host/guest speaker labels. For an ongoing podcast workflow — auto-transcribing every new episode as it publishes — /podcast-transcription covers the RSS integration and per-episode automation. Both use the same Whisper Large-v3 engine and produce the same output quality. A typical 60-minute podcast episode with two speakers hits 93-96% accuracy after 5-10 minutes of processing.

Can I transcribe an interview with multiple speakers?

Yes — this is one of the highest-accuracy use cases for the tool. Two-speaker interviews (interviewer + subject) with clean mics hit 94-97% accuracy. Speaker diarization automatically labels each turn Speaker 1 / Speaker 2 — rename to real names in the editor. For qualitative research interviews requiring verbatim transcription and coding, see /transcription-for-qualitative-research (same engine, extra workflow guidance for NVivo/ATLAS.ti/MAXQDA import). For legal depositions or news journalism where verbatim accuracy is required for direct quoting, budget 10-15 minutes of review per interview hour to fix proper nouns and confirm speaker attribution.

How is this different from /voicemail-to-text?

Same Whisper Large-v3 engine, different UX and target user. This page (/transcribe-audio-to-text) is the general audio-to-text tool — voice memos, meetings, podcasts, interviews, any audio file. /voicemail-to-text is specialized for phone voicemails: it includes per-carrier export instructions (iPhone Visual Voicemail, Android voicemail apps, Google Voice, business VoIP), an accuracy comparison against carrier transcription, and short-form-audio-optimized UX. If your audio is a voicemail, /voicemail-to-text has better instructions. If it's a voice memo, meeting, podcast, or any other audio, this page is faster.

How do I get a transcript from an audio file?

Three steps. (1) Upload your file (MP3, WAV, M4A, OPUS, OGG, FLAC, AAC, AIFF, WMA, or AMR, up to 5 GB and 10 hours per file), or paste a Google Drive share link. (2) Whisper Large-v3 processes the audio in about 8-17% of the recording's length, so a 60-minute file returns a transcript in 5-10 minutes. Speaker diarization, timestamps, and paragraph breaks are added automatically. (3) Download the transcript as TXT, DOCX, SRT, VTT, or JSON. The transcript stays in your dashboard for re-download or export in additional formats without re-processing. 30 minutes free, no credit card required to start.

What's the difference between an audio-to-text converter and a transcript generator?

Framing, not technology — same speech recognition underneath. An 'audio-to-text converter' typically refers to a quick single-format utility: paste or upload audio, get a text blob back, done. A 'transcript generator' produces a structured transcript object: speaker labels (Speaker 1, Speaker 2), per-word timestamps synced to the audio, paragraph breaks by pause or speaker change, and an editable in-browser view before you download. VexaScribe produces the transcript object by default. Every plan — including the free 30-minute trial — exports in TXT, DOCX, SRT, VTT, and JSON from a single upload, so you don't re-process to switch formats.

How do I transcribe a voice recording to text?

Same 3 steps as any audio file. (1) Save the voice recording as a file on your computer, whether it comes from a phone recorder app, a dedicated audio recorder (Sony ICD, Zoom H1n, Tascam DR series), or a browser-based recorder. (2) Drop the file (MP3, M4A, WAV, OPUS, OGG, FLAC; up to 5 GB, 10 hours) into VexaScribe. (3) Wait 5-10 minutes per audio hour; Whisper Large-v3 runs at roughly 6-12× real-time. Download the transcript in TXT, DOCX, SRT, VTT, or JSON. Works on recordings from any device, in 99 languages, with speaker labels and word-level timestamps. First 30 minutes free, no credit card.

What's the best software to transcribe a voice recording?

Depends on your priorities. For publication-quality quotes: Rev (human) at 99%+ accuracy, $1.99/min. For best AI value: VexaScribe at 93-95% Whisper Large-v3, $2-20/month, EU hosting. For meeting-note UX: Otter at ~88% English, $16.99/mo Pro. For transcript-first editing: Descript at 95%, $24-45/mo. For EU privacy + human option: HappyScribe at 92% AI or 99% human, €15-72/mo. For developer API: Deepgram Nova-3 or AssemblyAI Universal-2 at $0.0043-0.006/min. The honest cheap default for one-off recordings is VexaScribe's 30-min free trial; for ongoing needs, pick by budget and privacy requirement.

Can I transcribe a voice recording for free?

Three legitimate free paths. (1) VexaScribe 30-minute free trial — one-time, no credit card, all export formats (TXT/DOCX/SRT/VTT/JSON), speaker labels, 99 languages. Enough for most one-off recordings. (2) OpenAI Whisper self-hosted — free forever with Python + a GPU (or slow CPU). Best privacy since nothing leaves your machine. (3) Browser-based free tools with tighter limits — TurboScribe free tier (3 files/day), HappyScribe (10 min free), Otter Basic (30 min recording limit). For continuous free use, self-hosted Whisper is the only option; for a one-time transcription, the VexaScribe trial covers a typical 25-30 minute recording end-to-end.

How do I turn a voice recording into text on my computer?

Two paths. Web-based (recommended): drop the recording file into any browser-based transcription tool (VexaScribe, Otter web, Rev web, HappyScribe web). Works on any operating system — Windows, macOS, Linux, ChromeOS. No installation, output ready in 5-10 minutes for a 60-minute recording. Desktop app (offline): install OpenAI Whisper via Python and run locally, or use whisper.cpp (C++ port) which runs on CPU without Python. Requires more setup but produces transcripts without the audio ever leaving your machine — best for sensitive content. VexaScribe's web workflow covers most needs; whisper.cpp is the honest recommendation when privacy is paramount.

What's the difference between voice recording software and transcription software?

Two different jobs. Voice recording software captures audio from a microphone (Windows Voice Recorder, Audacity, Voice Memos on iPhone/Mac, Samsung Voice Recorder on Android). It saves an audio file — but doesn't convert it to text. Transcription software takes an existing audio file and produces text (VexaScribe, Rev, Otter, Whisper). Modern all-in-one tools do both — Otter records + transcribes live, Descript imports + transcribes. For a workflow where you already record on your phone and want text later, use any recording app + a dedicated transcription tool (upload the file). For live meeting note-taking, an all-in-one bot (Otter, Fireflies, Fathom) or dedicated meeting-note-taker page is a better fit.

Start Transcribing in 30 Seconds

30 minutes of free transcription, no credit card required. Upload any audio file and see the result yourself.
