Formerly NovaScribe: same team, same product, refreshed name.
Transcribe Audio to Text
AI-powered transcription in 99 languages. Upload any audio or video file — get accurate text with speaker labels, timestamps, and AI summaries in minutes.
VexaScribe (formerly NovaScribe) converts audio and video files into accurate, timestamped text using OpenAI's Whisper Large-v3 model. Upload MP3, WAV, M4A, MP4, MOV, FLAC, and 11 other formats, up to 5 GB per file. A one-hour file typically takes 5–10 minutes to transcribe, with about 95% accuracy on clear English audio. 99 languages are supported with automatic detection. The free tier includes 30 minutes; paid plans start at $2/month for 200 minutes.
How It Works
Three steps from upload to finished transcript. No setup, no software to install.
1. Upload your file
Drag and drop or browse for an audio or video file. We accept MP3, WAV, M4A, MP4, MOV, FLAC, OGG, AAC, AIFF, WMA, AVI, MKV, WebM, and 4 more formats. Files can be up to 5 GB and 10 hours long.
2. AI transcribes in minutes
VexaScribe runs OpenAI's Whisper Large-v3 model on your audio. A 60-minute recording typically completes in 5–10 minutes. You can close the tab and come back later; processing continues in the background.
3. Edit, export, share
Review the transcript in our built-in editor. Rename speakers, fix any errors, then export to TXT, DOCX, SRT, VTT, or JSON. Share via link or download.
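As a rough illustration of the turnaround figure above, the sketch below scales the quoted 5–10 minutes per hour of audio to other recording lengths. Linear scaling is an assumption made here for illustration, and `estimate_processing_minutes` is a hypothetical helper, not part of any VexaScribe API; actual times will vary.

```python
def estimate_processing_minutes(duration_minutes: float) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes.

    Scales the quoted 5-10 minutes per 60 minutes of audio linearly,
    which is an assumption rather than a documented guarantee.
    """
    if duration_minutes < 0:
        raise ValueError("duration must be non-negative")
    low_per_hour, high_per_hour = 5.0, 10.0  # figures quoted on this page
    scale = duration_minutes / 60.0
    return (low_per_hour * scale, high_per_hour * scale)


print(estimate_processing_minutes(60))  # (5.0, 10.0)
print(estimate_processing_minutes(90))  # (7.5, 15.0)
```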
Supported Audio and Video Formats
17 formats covering virtually every recording device and tool. Files up to 5 GB and 10 hours per upload.
Audio Formats
- MP3: most common
- WAV: lossless
- M4A: iPhone default
- FLAC: lossless
- OGG: open format
- AAC: Apple/streaming
- AIFF: pro audio
- WMA: Windows
- AMR: mobile
- OPUS: modern web
Video Formats
- MP4: most common
- MOV: Apple/QuickTime
- AVI: Windows legacy
- MKV: high-quality
- WebM: web video
- FLV: Flash legacy
- WMV: Windows
Audio is extracted automatically from video files. Video itself is not retained after transcription.
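If you want to pre-check files before uploading, the format list and limits above can be expressed as a small local check. This is a minimal sketch: `check_upload` is a hypothetical helper, it looks only at the file extension, and it assumes "5 GB" means 5 GiB, which the page does not specify.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "aiff", "wma", "amr", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm", "flv", "wmv"}
MAX_BYTES = 5 * 1024**3     # assumption: "5 GB" read as 5 GiB
MAX_SECONDS = 10 * 60 * 60  # 10 hours per file


def check_upload(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file fits the stated limits."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 5 GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("recording is longer than 10 hours")
    return problems


print(check_upload("episode.MP3", 50_000_000, 3600))  # []
print(check_upload("slides.pdf", 1_000, 1))           # ['unsupported format: pdf']
```

An extension check is only a first filter; the service itself decides what it accepts once the file is uploaded.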
What Can You Transcribe?
If it has audio, VexaScribe can transcribe it. Common use cases:
Podcast episodes
Show notes, blog posts, SEO content, searchable archives. Solo and multi-host shows supported with speaker labels.
Interviews
Journalism, qualitative research, HR. Multi-speaker diarization separates interviewer from subject automatically.
Lectures and classes
Students capturing lectures for review. Teachers generating written course notes from recorded sessions.
Meetings
Zoom, Google Meet, Microsoft Teams calls. Upload the recording or send VexaScribe's meeting bot to join.
Phone calls
Sales calls, customer interviews, support recordings. Record on any device, upload, get a transcript with speakers.
Video content
YouTube videos, training videos, course content. Generate SRT/VTT subtitles with word-level timestamps.
Transcribe in 99 Languages — With Automatic Detection
No need to select a language manually. VexaScribe auto-detects the spoken language from the audio. Accuracy varies by language tier:
- Highest-accuracy tier: ~5% Word Error Rate
- Next tier: ~8–12% Word Error Rate
+ 73 more languages
Including Welsh, Swahili, Filipino, Bengali, Punjabi, Tamil, Telugu, Marathi, Urdu, Persian, Romanian, Hungarian, Bulgarian, Croatian, and many more. Accuracy varies by language and audio quality.
What You Get With Every Transcript
Every transcription includes these features at no extra cost on every paid plan.
Speaker diarization
Automatic speaker detection and labeling. Multiple speakers appear as Speaker 1, Speaker 2, Speaker 3, and so on. Rename them in the editor (e.g., "Host", "Guest", actual names).
Word-level timestamps
Every word is timestamped to the millisecond. Click any word in the transcript editor to jump to that moment in the audio. Essential for video subtitles and quote verification.
Multiple export formats
TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (developers). All formats available on every paid plan with no upgrade required.
AI summaries
Optional AI-generated summary with key points, decisions, action items, and chapter markers. Available on all paid plans. Useful for meeting notes, podcast show notes, and lecture review.
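To show how word-level timestamps relate to subtitle files, here is a minimal sketch that groups timestamped words into SRT cues. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption for illustration and may not match the actual JSON export; `srt_time` and `words_to_srt` are hypothetical helpers.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        start, end = chunk[0]["start"], chunk[-1]["end"]
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues


# Hypothetical word list, in seconds
words = [
    {"word": "Welcome", "start": 0.0, "end": 0.42},
    {"word": "to", "start": 0.42, "end": 0.55},
    {"word": "the", "start": 0.55, "end": 0.68},
    {"word": "show", "start": 0.68, "end": 1.1},
]
print(words_to_srt(words, max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a real subtitle exporter would more likely break cues on pauses, punctuation, or line length.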
How Accurate Is VexaScribe Transcription?
VexaScribe (formerly NovaScribe) achieves 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker.
Real-world accuracy varies by audio condition:
- Clear podcast audio: 3–6% WER (94–97% accurate)
- Noisy interviews, background music: 8–15% WER (85–92% accurate)
- Strong accents, technical jargon, multiple overlapping speakers: 10–20% WER (80–90% accurate)
We recommend reviewing transcripts before publishing critical content — no AI tool achieves the 99%+ accuracy of human transcription, but VexaScribe is 20–100× cheaper than human services like Rev ($1.50/min).
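Word Error Rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance, so you can measure accuracy on your own recordings. It illustrates the metric only and is not VexaScribe's evaluation code; the lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 20%, accuracy: 80%
```

The "accuracy" figures on this page correspond to 1 minus WER. Because insertions count as errors, WER can in principle exceed 100% on very poor audio.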
Simple, Transparent Pricing
Pay for what you use. No per-seat fees, no hidden charges. Cancel anytime.
- Starter: $2/month, 200 min/month. For solo creators.
- Basic: $5/month, 1,000 min/month. For regular podcasters.
- Pro: $10/month, 2,500 min/month. For heavy use.
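To compare plans on a per-minute basis, the arithmetic can be sketched as follows. The prices and minute allowances are the ones quoted in the FAQ on this page, and the $1.50/min human rate is the Rev figure quoted above; `effective_rate` is a hypothetical helper.

```python
PLANS = {  # plan name: (USD per month, included minutes), as quoted in the FAQ
    "Starter": (2, 200),
    "Basic": (5, 1_000),
    "Pro": (10, 2_500),
    "Studio": (20, 6_000),
}
HUMAN_RATE = 1.50  # USD per minute, the Rev figure quoted on this page


def effective_rate(price: float, minutes_used: float) -> float:
    """Cost per transcribed minute, given how many minutes were actually used."""
    if minutes_used <= 0:
        raise ValueError("minutes_used must be positive")
    return price / minutes_used


for name, (price, included) in PLANS.items():
    rate = effective_rate(price, included)
    print(f"{name}: ${rate:.4f}/min at full use, {HUMAN_RATE / rate:.0f}x cheaper")
```

The multiple depends on how much of the allowance you use. For example, transcribing only 50 minutes on Starter works out to $0.04 per minute, or about 37.5 times cheaper than $1.50/min.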
Frequently Asked Questions
How does VexaScribe transcribe audio to text?
VexaScribe (formerly NovaScribe) uses OpenAI's Whisper Large-v3 model to convert speech to text. Upload an audio or video file, and the AI processes the entire recording, adding speaker labels, word-level timestamps, and optional AI summaries. A 60-minute file typically completes in 5–10 minutes.
What audio and video formats can I transcribe?
VexaScribe accepts MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. For video files, we extract the audio track automatically.
How long does it take to transcribe a 1-hour audio file?
Most 1-hour files complete in 5-10 minutes. Processing speed depends on audio quality, current load, and file format. You can close the browser tab and return — the transcript will be waiting in your dashboard when it's ready.
Is VexaScribe free to use?
Yes, you get 30 minutes of transcription free with no credit card required. Beyond the free tier, there are four paid plans: Starter at $2/month for 200 minutes, Basic at $5/month for 1,000 minutes, Pro at $10/month for 2,500 minutes, and Studio at $20/month for 6,000 minutes. Cancel anytime.
How accurate is VexaScribe transcription?
VexaScribe achieves around 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker. Real-world accuracy varies: clear podcast audio averages 3-6% WER, noisy interviews 8-15% WER, and audio with strong accents or technical jargon 10-20% WER. We recommend reviewing transcripts before publishing critical content.
What languages are supported?
99 languages including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Polish, Japanese, Chinese, Korean, Arabic, Turkish, Hindi, Vietnamese, Thai, and many more. Language is detected automatically — no need to select it manually before each upload.
Can I transcribe video files?
Yes. Upload MP4, MOV, AVI, MKV, WebM, FLV, or WMV files and we extract the audio track automatically. The transcript includes timestamps so you can sync with your video editing tool, generate subtitles (SRT/VTT export), or repurpose video content into blog posts.
Does VexaScribe identify multiple speakers?
Yes, automatic speaker diarization is included on every transcript. Multiple speakers are labeled Speaker 1, Speaker 2, Speaker 3, and so on. You can rename speakers in the built-in editor (e.g., "Host", "Guest", actual names) for clarity in the final transcript.
Is my audio data private and secure?
Audio files are transmitted over TLS 1.2+ and stored encrypted at rest in AWS eu-west-2. We do not train AI models on your audio. We do not sell user data. You can delete files at any time from your dashboard, and account deletion is self-serve.
How do I export the transcript?
VexaScribe exports to TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (structured data for developers). All formats are available on every paid plan. SRT and VTT include word-level timestamps for video editors.