Formerly NovaScribe: same team, same product, refreshed name.

Transcribe Audio to Text

AI-powered transcription in 99 languages. Upload any audio or video file — get accurate text with speaker labels, timestamps, and AI summaries in minutes.

VexaScribe (formerly NovaScribe) converts audio and video files into accurate, timestamped text using OpenAI's Whisper Large-v3 model. Upload MP3, WAV, M4A, MP4, MOV, FLAC, or any of 11 other formats, up to 5 GB per file. A one-hour file is typically transcribed in 5–10 minutes, with about 95% accuracy on clear English audio. 99 languages are supported, with automatic detection. The free tier includes 30 minutes; paid plans start at $2/month for 200 minutes.

30 minutes free · No credit card · 99 languages · Speaker labels

How It Works

Three steps from upload to finished transcript. No setup, no software to install.

  1. Upload your file

    Drag and drop or browse for an audio or video file. We accept MP3, WAV, M4A, MP4, MOV, FLAC, OGG, AAC, AIFF, WMA, AVI, MKV, WebM, and 4 more formats (17 in total). Up to 5 GB and 10 hours per file.

  2. AI transcribes in minutes

    VexaScribe runs OpenAI's Whisper Large-v3 model on your audio. A 60-minute recording typically completes in 5–10 minutes. Close the tab and come back later; we'll keep processing.

  3. Edit, export, share

    Review the transcript in our built-in editor. Rename speakers, fix any errors, then export to TXT, DOCX, SRT, VTT, or JSON. Share via link or download.

Supported Audio and Video Formats

17 formats covering virtually every recording device and tool. Files up to 5 GB and 10 hours per upload.

Audio Formats

  • MP3 (most common)
  • WAV (lossless)
  • M4A (iPhone default)
  • FLAC (lossless)
  • OGG (open format)
  • AAC (Apple/streaming)
  • AIFF (pro audio)
  • WMA (Windows)
  • AMR (mobile)
  • OPUS (modern web)

Video Formats

  • MP4 (most common)
  • MOV (Apple/QuickTime)
  • AVI (Windows legacy)
  • MKV (high-quality)
  • WebM (web video)
  • FLV (Flash legacy)
  • WMV (Windows)

Audio is extracted automatically from video files. Video itself is not retained after transcription.

File limits: 5 GB per file, 10 hours per file. No monthly upload limit beyond your plan's included minutes.
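
The limits above can be checked locally before uploading. The following is a minimal sketch, not VexaScribe's own validation: the function name is hypothetical, the duration must be supplied by the caller, and "5 GB" is read as 5 × 1024³ bytes, which is an assumption.

```python
from pathlib import Path

# Extensions come from the audio and video format lists above.
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "aiff", "wma", "amr", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm", "flv", "wmv"}

# Documented per-file limits. Treating 5 GB as 5 * 1024**3 bytes is an assumption.
MAX_BYTES = 5 * 1024**3
MAX_HOURS = 10.0


def upload_problems(filename: str, size_bytes: int, duration_hours: float) -> list:
    """Return the reasons a file falls outside the stated limits (empty list if none)."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 5 GB")
    if duration_hours > MAX_HOURS:
        problems.append("recording is longer than 10 hours")
    return problems
```

For example, `upload_problems("episode.MP3", 100_000_000, 1.0)` returns an empty list, while a `.txt` file is reported as an unsupported format.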

What Can You Transcribe?

If it has audio, VexaScribe can transcribe it. Common use cases:

Podcast episodes

Show notes, blog posts, SEO content, searchable archives. Solo and multi-host shows supported with speaker labels.

Interviews

Journalism, qualitative research, HR. Multi-speaker diarization separates interviewer from subject automatically.

Lectures and classes

Students capturing lectures for review. Teachers generating written course notes from recorded sessions.

Meetings

Zoom, Google Meet, Microsoft Teams calls. Upload the recording or send VexaScribe's meeting bot to join.

Phone calls

Sales calls, customer interviews, support recordings. Record on any device, upload, get a transcript with speakers.

Video content

YouTube videos, training videos, course content. Generate SRT/VTT subtitles with word-level timestamps.

Transcribe in 99 Languages — With Automatic Detection

No need to select a language manually. VexaScribe auto-detects the spoken language from the audio. Accuracy varies by language tier:

Tier 1

~5% Word Error Rate (highest accuracy)

English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Japanese

Tier 2

~8–12% Word Error Rate

Arabic, Chinese, Korean, Hindi, Turkish, Vietnamese, Thai, Indonesian, Hebrew, Czech, Swedish, Norwegian, Danish, Finnish, Greek, Ukrainian

+ 73 more languages

Including Welsh, Swahili, Filipino, Bengali, Punjabi, Tamil, Telugu, Marathi, Urdu, Persian, Romanian, Hungarian, Bulgarian, Croatian, and many more. Accuracy varies by language and audio quality.

What You Get With Every Transcript

Every transcript on a paid plan includes these features at no extra cost.

Speaker diarization

Automatic speaker detection and labeling. Multiple speakers appear as Speaker 1, Speaker 2, Speaker 3, and so on. Rename them in the editor (e.g., "Host", "Guest", actual names).

Word-level timestamps

Every word is timestamped to the millisecond. Click any word in the transcript editor to jump to that moment in the audio. Essential for video subtitles and quote verification.

Multiple export formats

TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (developers). All formats available on every paid plan with no upgrade required.
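
To illustrate how word-level timestamps relate to subtitle files, here is a rough sketch that groups timestamped words into SRT cues. The record shape (`word`, `start_ms`, `end_ms`) and the function names are hypothetical stand-ins, not VexaScribe's documented JSON schema; only the `HH:MM:SS,mmm` timestamp layout and the numbered-cue structure come from the SRT format itself.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def words_to_srt(words: list, max_words: int = 8) -> str:
    """Group timestamped words into numbered SRT cues of at most max_words words.

    Each item in `words` is assumed to look like
    {"word": "Hello", "start_ms": 0, "end_ms": 400} (a hypothetical shape).
    """
    cues = []
    for index in range(0, len(words), max_words):
        chunk = words[index:index + max_words]
        start = srt_timestamp(chunk[0]["start_ms"])   # cue starts with its first word
        end = srt_timestamp(chunk[-1]["end_ms"])      # and ends with its last word
        text = " ".join(item["word"] for item in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Splitting on a fixed word count keeps the sketch short; a real subtitle workflow would more likely break cues at pauses or sentence boundaries.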

AI summaries

Optional AI-generated summary with key points, decisions, action items, and chapter markers. Available on all paid plans. Useful for meeting notes, podcast show notes, and lecture review.

How Accurate Is VexaScribe Transcription?

VexaScribe (formerly NovaScribe) achieves 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker.

Real-world accuracy varies by audio condition:

  • Clear podcast audio: 3–6% WER (94–97% accurate)
  • Noisy interviews, background music: 8–15% WER (85–92% accurate)
  • Strong accents, technical jargon, multiple overlapping speakers: 10–20% WER (80–90% accurate)

We recommend reviewing transcripts before publishing critical content — no AI tool achieves the 99%+ accuracy of human transcription, but VexaScribe is 20–100× cheaper than human services like Rev ($1.50/min).

Methodology: Word Error Rate (WER) is calculated with the industry-standard formula: (Substitutions + Insertions + Deletions) / Total Words, where Total Words is the word count of the reference transcript. See our editorial standards for full testing methodology.
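
As an illustration of the formula, the sketch below computes WER with a word-level edit distance. It is not VexaScribe's test harness: lowercasing and whitespace splitting are simplifying assumptions, and real evaluations usually normalize punctuation and numbers as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the minimum number of edits needed to turn the
    # reference words processed so far into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from the hypothesis
            insertion = current[j - 1] + 1  # extra word in the hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

One wrong word in a 20-word reference gives 1/20 = 0.05, which is the 5% WER (95% accuracy) figure quoted above.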

Simple, Transparent Pricing

Pay for what you use. No per-seat fees, no hidden charges. Cancel anytime.

  • Starter: $2/month for 200 min/month (solo creators)
  • Basic: $5/month for 1,000 min/month (regular podcasters)
  • Pro: $10/month for 2,500 min/month (heavy use)
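
To compare plans, it can help to work out the effective price per included minute. The snippet below is a small sketch of that arithmetic using the listed prices; the function names are illustrative, and the Studio figures are taken from the pricing answer in the FAQ rather than from the plan list in this section.

```python
from typing import Optional

# (monthly price in USD, included minutes per month)
PLANS = {
    "Starter": (2.00, 200),
    "Basic": (5.00, 1_000),
    "Pro": (10.00, 2_500),
    "Studio": (20.00, 6_000),  # mentioned in the FAQ only
}


def price_per_minute(plan: str) -> float:
    """Monthly price divided by included minutes, in dollars per minute."""
    monthly_price, included_minutes = PLANS[plan]
    return monthly_price / included_minutes


def cheapest_plan_for(minutes_needed: int) -> Optional[str]:
    """Lowest-priced plan whose included minutes cover the monthly need, else None."""
    candidates = [
        (price, name)
        for name, (price, minutes) in PLANS.items()
        if minutes >= minutes_needed
    ]
    return min(candidates)[1] if candidates else None
```

Starter works out to $0.01 per included minute and Pro to $0.004, so the per-minute price falls as the plan size grows.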

Frequently Asked Questions

How does VexaScribe transcribe audio to text?

VexaScribe (formerly NovaScribe) uses OpenAI's Whisper Large-v3 model to convert speech to text. Upload an audio or video file, and the AI processes the entire recording, adding speaker labels, word-level timestamps, and optional AI summaries. A 60-minute file typically completes in 5–10 minutes.

What audio and video formats can I transcribe?

VexaScribe accepts MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. For video files, we extract the audio track automatically.

How long does it take to transcribe a 1-hour audio file?

Most 1-hour files complete in 5–10 minutes. Processing speed depends on audio quality, current load, and file format. You can close the browser tab and return later; the transcript will be waiting in your dashboard when it's ready.

Is VexaScribe free to use?

Yes. You get 30 minutes of transcription free, with no credit card required. Beyond the free tier, paid plans are $2/month for 200 minutes (Starter), $5/month for 1,000 minutes (Basic), $10/month for 2,500 minutes (Pro), and $20/month for 6,000 minutes (Studio). Cancel anytime.

How accurate is VexaScribe transcription?

VexaScribe achieves around 95% accuracy (5% Word Error Rate) on clear English audio with a single speaker. Real-world accuracy varies: clear podcast audio averages 3–6% WER, noisy interviews 8–15% WER, and audio with strong accents or technical jargon 10–20% WER. We recommend reviewing transcripts before publishing critical content.

What languages are supported?

99 languages including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Polish, Japanese, Chinese, Korean, Arabic, Turkish, Hindi, Vietnamese, Thai, and many more. Language is detected automatically — no need to select it manually before each upload.

Can I transcribe video files?

Yes. Upload MP4, MOV, AVI, MKV, WebM, FLV, or WMV files and we extract the audio track automatically. The transcript includes timestamps so you can sync with your video editing tool, generate subtitles (SRT/VTT export), or repurpose video content into blog posts.

Does VexaScribe identify multiple speakers?

Yes, automatic speaker diarization is included on every transcript. Multiple speakers are labeled Speaker 1, Speaker 2, Speaker 3, and so on. You can rename speakers in the built-in editor (e.g., "Host", "Guest", actual names) for clarity in the final transcript.

Is my audio data private and secure?

Audio files are transferred over TLS 1.2 or later and are stored encrypted at rest in AWS eu-west-2. We do not train AI models on your audio. We do not sell user data. You can delete files at any time from your dashboard, and account deletion is self-serve.

How do I export the transcript?

VexaScribe exports to TXT (plain text), DOCX (Word document), SRT (video subtitles), VTT (web subtitles), and JSON (structured data for developers). All formats are available on every paid plan. SRT and VTT include word-level timestamps for video editors.

Start Transcribing in 30 Seconds

30 minutes of free transcription, no credit card required. Upload any audio file and see the result for yourself.