Formerly NovaScribe: same team, same product, refreshed name.
Audio to Text Converter
Convert audio to text online in 99 languages. Upload any audio or video file — get accurate transcripts with speaker labels, timestamps, and AI summaries in minutes.
VexaScribe (formerly NovaScribe) is a free online audio-to-text converter that transcribes audio and video files into accurate, timestamped text using OpenAI's Whisper Large-v3 model. Upload MP3, WAV, M4A, MP4, MOV, FLAC, and 11 other formats up to 5 GB. A one-hour file is typically transcribed in 5–10 minutes, with about 95% accuracy on clear English audio. 99 languages are supported with automatic detection. The free tier includes 30 minutes; paid plans start at $2/month for 200 minutes.
Use VexaScribe to transcribe interviews, podcasts, voice memos, lectures, Zoom calls, and dictation. Every transcript includes speaker diarization (Speaker 1, Speaker 2…), word-level timestamps, and an editable transcript view, so the text is ready to paste, share, or export to TXT, DOCX, SRT, VTT, or JSON.
How to Transcribe Audio to Text
Three steps from upload to finished transcript — audio-to-text transcription with no setup and no software to install.
1. Upload your file
Drag and drop or browse for an audio or video file. We accept MP3, WAV, M4A, MP4, MOV, FLAC, OGG, AAC, AIFF, WMA, AVI, MKV, WebM, and 4 more formats (AMR, OPUS, FLV, WMV). Up to 5 GB and 10 hours per file.
2. AI transcribes in minutes
VexaScribe runs OpenAI's Whisper Large-v3 model on your audio. A 60-minute recording typically completes in 5–10 minutes. You can close the tab and come back later; processing continues in the background.
3. Edit, export, share
Review the transcript in our built-in editor. Rename speakers, fix any errors, then export to TXT, DOCX, SRT, VTT, or JSON. Share via link or download.
Supported Audio and Video Formats
17 formats covering virtually every recording device and tool. Files up to 5 GB and 10 hours per upload.
Audio Formats
- MP3: Most common
- WAV: Lossless
- M4A: iPhone default
- FLAC: Lossless
- OGG: Open format
- AAC: Apple/streaming
- AIFF: Pro audio
- WMA: Windows
- AMR: Mobile
- OPUS: Modern web
Video Formats
- MP4: Most common
- MOV: Apple/QuickTime
- AVI: Windows legacy
- MKV: High-quality
- WebM: Web video
- FLV: Flash legacy
- WMV: Windows
Audio is extracted automatically from video files. Video itself is not retained after transcription.
Format-specific deep dives: MP3 to text · SRT generator (audio → subtitles) · transcript to summary
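The format list and size limit above lend themselves to a quick local check before uploading. The following is a minimal sketch, not VexaScribe's own validation: the function name `check_upload` is hypothetical, and it assumes "5 GB" means 5 × 1000³ bytes, which the page does not specify. The 10-hour duration limit is not checked here.

```python
from pathlib import Path

# Extensions taken from the audio and video format lists above (17 in total).
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "aiff", "wma", "amr", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm", "flv", "wmv"}

# Assumption: "5 GB" is read as 5 * 1000**3 bytes; the page does not say which unit is meant.
MAX_BYTES = 5 * 1000**3


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Hypothetical pre-upload check: return a list of problems (empty means OK)."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 5 GB")
    return problems
```

For example, `check_upload("interview.M4A", 48_000_000)` returns an empty list, while a `.txt` file or a 6 GB recording each produce one entry describing the problem.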
What Can You Transcribe?
If it has audio, VexaScribe can transcribe it. Common use cases:
Podcast episodes
Show notes, blog posts, SEO content, searchable archives. Solo and multi-host shows supported with speaker labels.
Interviews
Journalism, qualitative research, HR. Multi-speaker diarization separates interviewer from subject automatically.
Lectures and classes
Students capturing lectures for review. Teachers generating written course notes from recorded sessions.
Meetings
Zoom, Google Meet, Microsoft Teams calls. Upload the recording or send VexaScribe's meeting bot to join.
Phone calls
Sales calls, customer interviews, support recordings. Record on any device, upload, get a transcript with speakers.
Video content
YouTube videos, training videos, course content. Generate SRT/VTT subtitles with word-level timestamps.
Transcribe in 99 Languages — With Automatic Detection
No need to select a language manually. VexaScribe auto-detects the spoken language from the audio. Accuracy varies by language tier:
~5% Word Error Rate (highest accuracy)
~8–12% Word Error Rate
+ 73 more languages
Including Welsh, Swahili, Filipino, Bengali, Punjabi, Tamil, Telugu, Marathi, Urdu, Persian, Romanian, Hungarian, Bulgarian, Croatian, and many more. Accuracy varies by language and audio quality.
What You Get With Every Transcript
Every transcription includes these features at no extra cost on every paid plan.
Speaker diarization
Automatic speaker detection and labeling. Multiple speakers appear as Speaker 1, Speaker 2, Speaker 3, and so on. Rename them in the editor (e.g., "Host", "Guest", actual names).
Word-level timestamps
Every word is timestamped to the millisecond. Click any word in the transcript editor to jump to that moment in the audio. Essential for video subtitles and quote verification.
Multiple export formats
TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (developers). All formats available on every paid plan with no upgrade required.
AI summaries
Optional AI-generated summary with key points, decisions, action items, and chapter markers. Available on all paid plans. Useful for meeting notes, podcast show notes, and lecture review.
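To illustrate how word-level timestamps relate to subtitle export, here is a rough sketch that groups timestamped words into SRT cues. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption for illustration only; the actual structure of VexaScribe's JSON export is not documented on this page. Only the SRT layout itself (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues of up to max_words words.

    Each word is assumed (hypothetically) to look like
    {"word": "Hello", "start": 0.0, "end": 0.4}.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Splitting on a fixed word count keeps the sketch short; a real subtitle workflow would more likely break cues at pauses or sentence boundaries.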
Audio File to Text — Every Recording Type We Handle
Whatever the source of your recording, VexaScribe converts it to text in the same upload-and-go workflow. Free transcription starts with 30 minutes on signup, with no credit card required.
Voice memos & dictation
iPhone Voice Memos, Android voice recorder, Otter live capture, hardware dictaphones: drop the .m4a, .mp3, or .wav file straight into VexaScribe. The transcript includes punctuation and paragraph breaks, so your dictation reads like prose, not a wall of words.
Recorded interviews
Whether your interview was recorded on a Zoom call, a field recorder, a smartphone, or a USB lavalier mic, VexaScribe transcribes audio recordings to text with speaker labels. Two-, three-, and four-person conversations get separated automatically — rename Speaker 1 to the interviewee in the editor and export.
Podcast episodes & raw RSS
Upload the final mix or the raw multitrack stem. Audio-to-text transcription for a 60-minute episode finishes in about 7 minutes with timestamps you can drop into show notes, chapter markers, or YouTube descriptions.
Lecture & meeting recordings
Long-form recordings from classrooms, university lectures, board meetings, town halls, and webinars. Files up to 10 hours and 5 GB work in a single upload, with no chunking needed. Transcripts come back with auto-generated AI summaries on paid plans.
Phone calls & voicemails
Compressed phone audio (8 kHz mono telephony) is supported. Accuracy lands at 85–92% on clear calls and degrades on heavily compressed VoIP, so review noisy stretches in the editor before exporting the transcript.
WhatsApp & Telegram voice notes
OPUS-encoded voice notes from messaging apps transcribe cleanly. Forward the .ogg or .opus file to yourself, save it, then drag it into VexaScribe to convert audio to text in under a minute for most short messages.
Field recordings & research interviews
Qualitative researchers, journalists, and ethnographers upload long-form .wav or .flac field recordings. The transcript is timestamped so you can re-listen to any quote, and JSON export ships clean data into NVivo, ATLAS.ti, or MAXQDA.
Court hearings & legal depositions
Multi-speaker legal recordings (depositions, witness statements, hearing audio) work well with VexaScribe's speaker diarization. Outputs preserve verbatim timing, which is useful when you need to cite a moment in evidence. Always have a certified human transcriber verify the transcript before filing.
Need a specific format guide? See MP3 to text, WAV to text, M4A to text, or OGG to text.
How Accurate Is VexaScribe Transcription?
VexaScribe (formerly NovaScribe) achieves 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker.
Real-world accuracy varies by audio condition:
- Clear podcast audio: 3–6% WER (94–97% accurate)
- Noisy interviews, background music: 8–15% WER (85–92% accurate)
- Strong accents, technical jargon, multiple overlapping speakers: 10–20% WER (80–90% accurate)
We recommend reviewing transcripts before publishing critical content — no AI tool achieves the 99%+ accuracy of human transcription, but VexaScribe is 20–100× cheaper than human services like Rev ($1.50/min). For a deeper breakdown of Whisper accuracy by model size, language, and audio condition, see How Accurate Is Whisper in 2026?
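The accuracy figures above pair each Word Error Rate with "100% minus WER". As a sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), the function below compares a trusted reference transcript against a machine transcript. It is a generic illustration, not VexaScribe's benchmarking code, and the lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    # Simplifying assumption: normalize by lowercasing and splitting on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = ("one two three four five six seven eight nine ten eleven twelve "
             "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty")
hypothesis = reference.replace("three", "tree")  # one substituted word out of 20
print(word_error_rate(reference, hypothesis))  # 0.05
```

One wrong word in twenty gives a WER of 0.05, which corresponds to the "95% accurate" figure quoted for clear English audio. Note that WER can exceed 100% when the hypothesis contains many inserted words, so "accuracy" in this sense is only a rough shorthand.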
Simple, Transparent Pricing
Pay for what you use. No per-seat fees, no hidden charges. Cancel anytime.
Starter: $2/month · 200 min/month · Solo creators
Basic: $5/month · 1,000 min/month · Regular podcasters
Pro: $10/month · 2,500 min/month · Heavy use
Frequently Asked Questions
How does VexaScribe transcribe audio to text?
VexaScribe (formerly NovaScribe) uses OpenAI's Whisper Large-v3 model to convert speech to text. Upload an audio or video file, and the AI processes the entire recording, adding speaker labels, word-level timestamps, and optional AI summaries. A 60-minute file typically completes in 5–10 minutes.
What audio and video formats can I transcribe?
VexaScribe accepts MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. For video files, we extract the audio track automatically.
How long does it take to transcribe a 1-hour audio file?
Most 1-hour files complete in 5–10 minutes. Processing speed depends on audio quality, current load, and file format. You can close the browser tab and return later; the transcript will be waiting in your dashboard when it's ready.
Is VexaScribe free to use?
Yes, you get 30 minutes of transcription free with no credit card required. After the free tier, paid plans are $2/month for 200 minutes (Starter), $5/month for 1,000 minutes (Basic), $10/month for 2,500 minutes (Pro), and $20/month for 6,000 minutes (Studio). Cancel anytime.
How accurate is VexaScribe transcription?
VexaScribe achieves around 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker. Real-world accuracy varies: clear podcast audio averages 3–6% WER, noisy interviews 8–15% WER, and audio with strong accents or technical jargon 10–20% WER. We recommend reviewing transcripts before publishing critical content.
What languages are supported?
99 languages including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Polish, Japanese, Chinese, Korean, Arabic, Turkish, Hindi, Vietnamese, Thai, and many more. Language is detected automatically — no need to select it manually before each upload.
Can I transcribe video files?
Yes. Upload MP4, MOV, AVI, MKV, WebM, FLV, or WMV files and we extract the audio track automatically. The transcript includes timestamps so you can sync with your video editing tool, generate subtitles (SRT/VTT export), or repurpose video content into blog posts.
Does VexaScribe identify multiple speakers?
Yes, automatic speaker diarization is included on every transcript. Multiple speakers are labeled Speaker 1, Speaker 2, Speaker 3, and so on. You can rename speakers in the built-in editor (e.g., "Host", "Guest", actual names) for clarity in the final transcript.
Is my audio data private and secure?
Audio files are transferred over TLS 1.2+ and stored encrypted at rest in AWS eu-west-2. We do not train AI models on your audio. We do not sell user data. You can delete files at any time from your dashboard, and account deletion is self-serve.
How do I export the transcript?
VexaScribe exports to TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (structured data for developers). All formats are available on every paid plan. SRT and VTT include word-level timestamps for video editors.
How do I transcribe audio recordings to text?
Sign up at VexaScribe (30 minutes free, no credit card), drag your recording onto the upload area, and wait. Transcription typically takes roughly 8–17% of the file's duration, so a 60-minute recording is ready in 5–10 minutes. The output includes speaker labels, word-level timestamps, and an editable transcript view, and you can export the transcript as TXT, DOCX, SRT, VTT, or JSON. This works for voice memos, interviews, podcasts, lectures, Zoom calls, and phone recordings.
Is there a free audio transcription to text option?
Yes. Every new account gets 30 minutes of free transcription with all features enabled: speaker diarization, 99-language support, timestamps, and export to TXT, DOCX, SRT, VTT, and JSON. No credit card is required to start. After the free tier, paid plans start at $2/month for 200 minutes ($0.01 per minute), which is significantly cheaper than typical pay-per-minute transcription services charging $0.10–$0.25 per minute.
What's the difference between an audio file to text converter and a transcription service?
A bare audio-to-text converter usually returns a wall of raw text with no speakers, no timestamps, and no editor, so you have to clean it up yourself. A transcription service like VexaScribe returns a structured transcript: speakers are labeled (Speaker 1, Speaker 2…), every word is timestamped, the text is broken into paragraphs for readability, you can edit and re-export in-browser, and AI summaries with action items are generated automatically on paid plans. Same upload, much more usable output.
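The per-minute figures quoted in the FAQ follow from simple division of plan price by included minutes. The sketch below works through that arithmetic using the prices and allowances listed in the FAQ; the helper names are hypothetical, and the per-minute rate assumes the full monthly allowance is used.

```python
from typing import Optional

# Plan name -> (price in USD per month, included minutes), as quoted in the FAQ.
PLANS = {
    "Starter": (2, 200),
    "Basic": (5, 1000),
    "Pro": (10, 2500),
    "Studio": (20, 6000),
}


def cost_per_minute(plan: str) -> float:
    """Effective price per included minute, assuming the whole allowance is used."""
    price_usd, minutes = PLANS[plan]
    return price_usd / minutes


def cheapest_plan(minutes_needed: int) -> Optional[str]:
    """Return the lowest-priced plan whose allowance covers minutes_needed, if any."""
    covering = [(price, name) for name, (price, minutes) in PLANS.items()
                if minutes >= minutes_needed]
    return min(covering)[1] if covering else None


for name in PLANS:
    print(f"{name}: ${cost_per_minute(name):.4f} per minute")
```

Under these assumptions Starter works out to $0.01 per minute and the larger plans to less than that, so someone needing about 800 minutes a month would land on Basic via `cheapest_plan(800)`. Using only part of an allowance raises the effective per-minute cost accordingly.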
Learn more
MP3 to text
The most common consumer audio format — bitrate guide inside
WAV to text
Uncompressed PCM — when WAV actually beats MP3 for accuracy
Whisper transcription
Hosted Whisper Large-v3 — 99 languages, no GPU
SRT generator
Generate .srt subtitle files with timestamps
Video to SRT
4-step workflow — upload video, get .srt subtitles
Video to text
Plain-text transcripts from any video file — MP4, MOV, MKV, WebM
MP4 to text
MP4 video to TXT, DOCX, JSON transcript
M4A to text
iPhone Voice Memos to transcript — 30-min free trial
OGG to text
WhatsApp voice notes, Discord recordings, Linux audio
Transcribe Spanish audio
All regional dialects + Spanish-to-English translation
Transcrever áudio em texto (Português)
Brazilian Portuguese guide — Whisper Tier 1, LGPD-friendly, BRL pricing
Transcription for qualitative research
Methodology, IRB, CAQDAS — for academic researchers
How to add subtitles to a video
Step-by-step guide: YouTube, Premiere, CapCut, iPhone
Interview transcription
For researchers, journalists, podcasters & HR — workflow + cost math
Lecture transcription
For students, MOOC learners & academics — AI study guides + 99 languages
Speaker labels — how they work
Pipeline mechanics, DER benchmarks, SRT/VTT/DOCX format examples
YouTube transcript downloader
Paste a YouTube URL, download SRT/VTT/TXT in seconds
TikTok transcript generator
Paste a TikTok URL, get transcript in 6 formats
Instagram transcript generator
Reels, Posts, IGTV — get transcript in 6 formats
Transcribe and translate
Translate transcripts into 133 languages
How accurate is Whisper?
WER benchmarks across LibriSpeech & FLEURS
AI transcription — full guide
How it works, accuracy, tools landscape, pricing models
13 Best transcription software 2026
13 tools tested — Otter, VexaScribe, Rev, Descript, Granola, AssemblyAI, Deepgram
Pricing
All plans, side-by-side
Editorial standards
How we test and disclose