Verified June 2026

Bulk Transcription — 50 Files, One Flat Rate, London Storage

Whisper Large-v3 accuracy across 99 languages. Audio and transcripts stored in AWS eu-west-2 (London) under UK-GDPR. No AI training on your audio. Built for agencies, research labs, legal teams, and media archives that need predictable cost and procurement-ready disclosure.

VexaScribe transcribes up to 50 files in a single batch with mixed audio and video formats: MP3, M4A, WAV, FLAC, OGG, OPUS, MP4, MOV, MKV, WEBM, AVI. Whisper Large-v3 gives you 92–95% accuracy on Tier 1 languages (including English, Spanish, French, German, Italian, Portuguese, Japanese, Mandarin, and Korean), and we ship the result as a ZIP with original filenames preserved plus a CSV manifest of per-file metadata. Pricing is flat-rate ($2–$20/month, 200–6,000 minutes). At 100 hours of monthly volume that's $20 vs $1,500 on Rev AI or $11,940 on Rev Human.

No AI model training on user data. Storage in AWS eu-west-2 (London) under UK-GDPR, with LGPD compliance for Brazilian clients. Transcription processing runs on specialized ML inference infrastructure; the partner is named in our DPA, which is available on request for research labs, legal firms, and corporate procurement.

Honest scope: bulk is currently UI-first (drag-and-drop, dashboard, ZIP download), not a programmatic API. If you need an API with S3 sync and webhooks, AssemblyAI or Deepgram are better fits today. 30-minute free trial on signup, no credit card.

  • Up to 50 files per batch
  • Mixed audio + video formats
  • ZIP delivery, original filenames
  • DPA / NDA on request

The essentials

  • Up to 50 files per batch, mixed audio and video formats. No daily cap beyond your plan's monthly minutes.
  • Whisper Large-v3 on specialized ML inference infrastructure — 92–95% accuracy on Tier 1 languages, 99 languages total.
  • ZIP delivery with original filenames preserved + CSV manifest of per-file metadata (duration, language, speaker count).
  • Flat-rate $2–$20/month. At 100h/month: $20 vs $1,500 Rev AI vs $11,940 Rev Human. For pure-API consumers, AssemblyAI batch still wins at $15.
  • London storage in AWS eu-west-2. Audio and transcripts under UK-GDPR. LGPD-friendly for Brazilian clients. No AI training on user data. Inference partner disclosed in DPA.
  • Honest scope: UI-first today. Drag-and-drop, dashboard, ZIP download. No programmatic API yet — if you need one, AssemblyAI or Deepgram are stronger fits.
  • DPA + NDA available for research labs (IRB), legal firms (privileged data), and corporate procurement.

Who actually needs bulk transcription

“Bulk transcription” is a B2B-leaning query — almost no consumer searches it. The six personas below cover ~95% of real bulk customers. Each has a distinct pain that shapes which tool wins.

Content / marketing agency

5–30 client podcasts or videos per week

Per-minute pricing kills margin, manual one-by-one uploads burn the producer's day, needs branded DOCX + SRT deliverables per client.

Academic / UX research team

Just finished a study with 50–200 interviews

Needs verbatim transcripts, consistent speaker labels, NVivo / ATLAS.ti / MAXQDA import, IRB-compliant data residency, one-shot ZIP delivery.

Legal / paralegal team

Deposition or discovery batches

Confidentiality (no AI training), retention controls, accurate timestamps, DPA + NDA on file before any data moves.

Media archive / newsroom

Years of interviews, broadcasts, podcasts to digitize

Wants to make a back catalog searchable, EU residency for European partnerships, predictable monthly cost rather than per-minute archive billing.

B2B product builder

Building an app where transcription is one feature

Per-minute API math, webhook reliability, predictable monthly cap. Honest note: if you need a true bulk API today, AssemblyAI ($0.15/hr) or Deepgram ($0.26/hr) are stronger fits — VexaScribe bulk is UI-first.

Educator / course producer

Library of lectures, webinars, or training sessions

Accuracy on technical jargon, batch SRT export for WCAG/EAA accessibility compliance, multiple language tracks for international cohorts.

Across all six personas, the same four pains recur in different words: per-minute pricing destroys budget predictability as volume grows, one-by-one uploads waste hours, procurement needs compliance documentation, and downstream automation depends on consistent output. Bulk transcription is not a discount feature; it's a workflow feature.

How VexaScribe bulk works (3 steps)

  1. Drag a folder

    Select up to 50 files at once from your desktop, or drop a folder onto the upload area. Mixed formats are fine: MP3 + M4A + MP4 + WAV in the same batch. Audio is extracted from video automatically.

  2. Pick languages, formats, options

    Auto-detect language per file (recommended) or force a single language. Choose one or several output formats: TXT, DOCX, SRT, VTT, PDF, JSON. Toggle diarization on/off.

  3. Walk away, return to a ZIP

    Processing happens in parallel. Status dashboard shows per-file progress. When complete, download a single ZIP with original filenames preserved plus a CSV manifest. Partial-batch download is supported if some files fail.

Typical processing speed: 5–10 minutes per hour of audio when GPU capacity is available. A 50-file batch of one-hour interviews completes in roughly 60–90 minutes (parallelized) — not 50 hours of sequential processing.
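As a rough sanity check on that estimate, the sketch below simulates a batch on a fixed pool of parallel slots. The slot count of 5 and the function name are assumptions made for illustration; VexaScribe does not publish its actual concurrency, and real throughput depends on GPU availability.

```python
import heapq

def estimate_batch_minutes(file_hours, minutes_per_audio_hour, workers):
    """Estimate wall-clock minutes for a batch on `workers` parallel slots.

    Files are assigned longest-first to whichever slot frees up earliest.
    """
    # Every slot starts idle at t = 0 minutes.
    slots = [0.0] * workers
    heapq.heapify(slots)
    for hours in sorted(file_hours, reverse=True):
        start = heapq.heappop(slots)  # earliest-free slot
        heapq.heappush(slots, start + hours * minutes_per_audio_hour)
    return max(slots)

batch = [1.0] * 50  # fifty one-hour interviews
one_at_a_time = estimate_batch_minutes(batch, 7.5, 1)  # single slot
parallel = estimate_batch_minutes(batch, 7.5, 5)       # 5 assumed slots
print(f"one at a time: {one_at_a_time:.0f} min, parallel: {parallel:.0f} min")
```

At the midpoint speed of 7.5 minutes per audio hour, five assumed slots land inside the quoted 60–90 minute window; fewer slots or slower GPUs push the figure up.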

The math: flat-rate vs per-minute (100-hour project)

The load-bearing question for buyers. We picked 100 hours because it's the realistic size of one research study, one media archive sprint, or one quarter of a marketing agency's podcast workload. Numbers verified June 2026.

| Provider | Calculation | 100h total | Note |
| --- | --- | --- | --- |
| Rev AI | 100 h × 60 min/h × $0.25/min | $1,500 | Per-minute AI; human is 8× more |
| Rev Human | 100 h × 60 min/h × $1.99/min | $11,940 | Verbatim certified for legal/broadcast |
| Otter Business | $30/user/mo, capped 6,000 min/user | $30 (fits 100h) | No Portuguese support |
| AssemblyAI batch | 100 h × 60 min/h × $0.0025/min + add-ons | ~$15 base, ~$54 loaded | API only, no UI |
| Trint BulkScribe | Custom enterprise contract from ~$0.20/h | ~$20+, contract required | Enterprise sales cycle |
| VexaScribe Studio | Flat monthly subscription | $20 / month | UI-first, ZIP delivery, London storage (UK-GDPR) |

Honest read of this table

Flat-rate dominates per-minute pricing once you exceed ~5 hours/month, and dominates seat-based pricing once you exceed one user's needs. For pure-API consumers (developers building a transcription feature into their own product), AssemblyAI's $15 base is genuinely cheaper than VexaScribe Studio's $20 — we don't compete on that segment, and we'll tell you that honestly rather than pretend our UI matters to a developer building a SaaS. For everyone else — agencies, researchers, legal teams, media archives, educators — the managed UI plus flat-rate plus London storage under UK-GDPR is the better deal.
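The 100-hour totals quoted for the per-minute providers can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the list prices stated on this page; the dictionary keys and function name are illustrative, and add-ons or contract pricing are not modeled.

```python
from decimal import Decimal

MINUTES = 100 * 60  # 100 hours expressed in minutes

# Per-minute list prices quoted on this page (USD per audio minute).
PER_MINUTE = {
    "Rev AI": Decimal("0.25"),
    "Rev Human": Decimal("1.99"),
    "AssemblyAI batch (base)": Decimal("0.0025"),
}
# Flat or seat-based monthly prices quoted on this page (USD).
FLAT_MONTHLY = {
    "VexaScribe Studio": Decimal("20"),
    "Otter Business (1 seat)": Decimal("30"),
}

def per_minute_total(rate, minutes=MINUTES):
    """Cost of `minutes` of audio at a per-minute rate."""
    return rate * minutes

totals = {name: per_minute_total(rate) for name, rate in PER_MINUTE.items()}
totals.update(FLAT_MONTHLY)

# Print cheapest first.
for name, usd in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{name:<26} ${usd:,.2f}")
```

Decimal is used instead of float so that currency amounts stay exact.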

Batch mechanics: queues, parallelism, failure handling

The boring details that matter when you're processing 50 files at once and one of them is corrupted.

  • Mixed-format support. MP3, M4A, WAV, FLAC, OGG, OPUS, MP4, MOV, MKV, WEBM, AVI in one batch. Audio is extracted from video automatically — no pre-conversion needed.
  • Parallel processing. Multiple files transcribe simultaneously, not sequentially. A 50-file batch of one-hour interviews finishes in 60–90 minutes total, not 50 hours.
  • Per-file status dashboard. Each file shows: queued, uploading, processing, complete, or failed-with-reason. Refresh anytime; nothing is lost if you close the tab.
  • Independent failure handling. If 3 of 50 files fail (corrupted header, no audio track, exceeds 10-hour cap), the other 47 finish normally. Failed files retry once free; you can fix and re-upload manually after that.
  • Speaker labels (honest limit). Diarization is consistent within each file (Speaker 1 through Speaker 10) but NOT auto-matched across files. The bulk-rename UI applies consistent labels across all files in a batch in one operation; a typical research study with one interviewer and many participants takes ~30 seconds total to relabel.
  • Partial-batch download. Don't wait for stragglers — download a ZIP of completed files anytime. Failed files can finish later and be downloaded separately.
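The independent-failure behavior described above follows a common pattern: run each file as its own task and record failures separately instead of aborting the batch. The sketch below illustrates that pattern only; it is not VexaScribe's implementation, and `transcribe` is a hypothetical stand-in that fails on purpose for files ending in `.bad`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def transcribe(path):
    """Hypothetical stand-in for a real transcription call."""
    if path.endswith(".bad"):
        raise ValueError("corrupted header")
    return f"transcript of {path}"

def run_batch(paths, max_workers=4):
    """Process every file independently; return (completed, failed) dicts."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                done[path] = fut.result()
            except Exception as exc:
                # Record the reason and keep going; other files are unaffected.
                failed[path] = str(exc)
    return done, failed

done, failed = run_batch(["a.mp3", "b.bad", "c.wav"])
print("completed:", sorted(done))
print("failed:", failed)
```

Because results are collected per file, a partial download is simply whatever is in `done` at the moment you ask.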

Output formats and the ZIP manifest

Pick one or several formats per batch; every file in the batch gets every format you selected. The ZIP delivery preserves your original filenames, with the extension matched to the output (interview.mp3 → interview.docx, interview.srt, etc.).

| Format | Contains | Typically used by |
| --- | --- | --- |
| TXT | Plain text, raw transcript | Quick read, copy-paste, LLM prompt input |
| DOCX | Word document with speakers, timestamps | Researchers (NVivo/ATLAS.ti import), journalists, legal teams |
| SRT | Subtitle file with timing | Video creators, YouTube, Premiere, DaVinci, CapCut |
| VTT | Web subtitle (HTML5 video) | Web players, browser-native captions |
| PDF | Formatted, print-ready transcript | Client deliverables, legal exhibits, archives |
| JSON | Structured with word-level timestamps | Developers, search indexing, custom downstream tools |

Every ZIP includes a CSV manifest with one row per file: original filename, duration, detected language, number of speakers, processing timestamp, word count. The manifest is the bridge for downstream automation — pipe it into a script that uploads to your CAQDAS tool, CMS, or shared drive without manual mapping.
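A downstream script might read the manifest along these lines. The column header names and sample rows below are assumptions for illustration (the page lists the fields but not their exact header spelling), so adjust them to match the manifest you actually receive.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical manifest contents; real header names may differ.
SAMPLE_MANIFEST = """\
original_filename,duration_seconds,detected_language,speaker_count,processed_at,word_count
interview-maria-2026-06-10.mp3,3605,pt,2,2026-06-11T09:14:00Z,8120
focus-group-03.mp4,5410,en,6,2026-06-11T09:31:00Z,14302
"""

def plan_uploads(manifest_text, formats=("docx", "srt")):
    """Map each manifest row to the transcript files expected in the ZIP."""
    plan = []
    for row in csv.DictReader(io.StringIO(manifest_text)):
        # Output files keep the original name with the extension swapped.
        stem = PurePosixPath(row["original_filename"]).stem
        plan.append({
            "source": row["original_filename"],
            "language": row["detected_language"],
            "minutes": round(int(row["duration_seconds"]) / 60, 1),
            "outputs": [f"{stem}.{ext}" for ext in formats],
        })
    return plan

for item in plan_uploads(SAMPLE_MANIFEST):
    print(item["source"], "->", item["outputs"])
```

From here each entry can be handed to whatever uploader your CAQDAS tool, CMS, or shared drive provides.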

Compliance and security (the procurement section)

Everything procurement, legal, and IT typically asks about. Depth matters here — buyers need to forward this section to a security review.

Storage residency

AWS eu-west-2 (London, UK). Audio, transcripts, and account data stored under UK-GDPR (post-Brexit EU-GDPR equivalent).

Transcription processing

Whisper Large-v3 runs on specialized ML inference infrastructure. Inference partner is US-domiciled — full name and role disclosed in our DPA. For end-to-end EU-only processing, ask about our dedicated infrastructure option.

LGPD compliance

Lei 13.709/2018. London storage satisfies cross-border requirements for Brazilian clients via EU/UK adequacy posture. DPO contact provided on request.

No AI training

Contractual commitment in our Terms and DPA. Your audio and transcripts never train any model — ours or our inference partner's, per their published Terms of Service.

Encryption

TLS 1.2+ in transit (upload, inference call, download). AES-256 at rest in S3. Per-bucket encryption keys. Audit logs available on Studio + Enterprise plans.

Retention

30-day default post-transcription retention. Auto-delete on transcript download available on Pro and Studio. On-request immediate deletion always honored.

DPA + NDA

Standard GDPR Article 28 DPA on request, typically 1–2 business days. Project-specific NDAs for sensitive batches (legal investigations, media archives) on 2–5 day turnaround.

Sub-processors: AWS (primary storage and application compute, eu-west-2 London) and our ML inference partner (Whisper Large-v3 and pyannote.audio inference). Both are named with their specific role and location in our DPA. We do not use the OpenAI API, AssemblyAI, or Deepgram. Honest disclosure for procurement: because our inference partner is US-domiciled, the transcription step is within the legal reach of the US CLOUD Act even though storage is in London. For adversarial legal contexts — sources under US legal pressure, sealed legal evidence with US-government adversaries — get in touch about our dedicated infrastructure option.

Languages: 99 supported, Tier 1 PT-BR

Whisper Large-v3 supports 99 languages with accuracy that tiers by training data volume. Bulk batches can mix languages — language is auto-detected per file from the first 30 seconds.

Tier 1 (92–95%)

English, Spanish, French, German, Italian, Dutch, Russian, Polish, Portuguese (PT and BR), Japanese, Mandarin, Korean.

Tier 2 (88–92%)

Arabic, Turkish, Hindi, Vietnamese, Thai, Indonesian, Ukrainian, Czech, Hungarian, Romanian, Swedish, Danish, Finnish.

Tier 3 (75–88%)

Swahili, Bengali, Punjabi, Tamil, Telugu, Welsh, and other lower-resource languages. Sample test with your audio recommended before committing a large batch.

Notable PT-BR differentiator: Otter.ai does NOT support Portuguese in 2026 — its official supported languages are English, French, and Spanish only. For Brazilian agencies, Portuguese-language researchers, and LATAM media operations, VexaScribe is the practical choice over Otter for bulk Portuguese workloads. We cover this in depth in our PT-BR transcription guide.

Frequently asked questions

How many files can I upload at once in a bulk batch?

Up to 50 files per batch on every paid plan. The 50-file limit is generous for most workflows: a research lab with 30 hour-long interviews fits in one batch; a podcast agency with weekly client deliverables runs one batch per client. If you need more, run consecutive batches — there's no per-day or per-account cap beyond your plan's monthly minutes. The 50-file ceiling exists because larger batches degrade UI responsiveness; processing 200+ files via API is on the roadmap for developer use cases.

What's the maximum file size and total batch size limit?

Per file: 5 GB or 10 hours, whichever limit is reached first. Per batch: 50 files. There is no separate total-batch-size limit, so the theoretical maximum batch is 50 files × 5 GB = 250 GB, though we recommend keeping batches under ~25 GB in practice for upload reliability over typical office internet. Long files (3–10 hours) work fine; they are common with full-day depositions, half-day workshops, or oral-history projects. For files over 10 hours, split before upload using a free tool like ffmpeg or Audacity.
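If you prepare batches with a script, the limits above can be checked before upload. This is a minimal sketch: the function names are hypothetical, "5 GB" is read as decimal gigabytes (an assumption), and the ffmpeg command is only built, not executed. It uses ffmpeg's segment muxer with stream copy, which cuts near the requested time without re-encoding, so the default chunk length is set to 9 hours to leave headroom under the 10-hour cap.

```python
MAX_FILES = 50
MAX_BYTES = 5 * 1000**3   # 5 GB, assumed to mean decimal gigabytes
MAX_SECONDS = 10 * 3600   # 10 hours

def check_batch(files):
    """files: list of (name, size_bytes, duration_seconds). Returns problem strings."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"batch has {len(files)} files; limit is {MAX_FILES}")
    for name, size, seconds in files:
        if size > MAX_BYTES:
            problems.append(f"{name}: larger than 5 GB")
        if seconds > MAX_SECONDS:
            problems.append(f"{name}: longer than 10 hours, split before upload")
    return problems

def ffmpeg_split_command(src, out_pattern, chunk_seconds=9 * 3600):
    """Build (not run) an ffmpeg call that splits `src` into chunks via stream copy."""
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(chunk_seconds), "-c", "copy", out_pattern]

batch = [
    ("deposition-day1.wav", 4_200_000_000, 11 * 3600),  # too long
    ("workshop.mp4", 1_800_000_000, 4 * 3600),          # fine
]
for problem in check_batch(batch):
    print(problem)
print(" ".join(ffmpeg_split_command("deposition-day1.wav", "deposition-day1_%03d.wav")))
```

Pass the returned list to `subprocess.run` if you want the script to perform the split itself.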

Do you support mixed formats (MP3, M4A, MP4, WAV) in one batch?

Yes. Mix any combination of MP3, M4A, WAV, FLAC, OGG, OPUS (audio) plus MP4, MOV, MKV, WEBM, AVI (video) in a single batch. Audio is extracted from video automatically, so there is no need to pre-convert. Each file is detected, processed, and transcribed independently. The full ZIP is ready once every file has finished, and a partial ZIP of completed files can be downloaded at any time. Mixed-format support matters for real workflows: an agency receives MP3s from one client, M4As from iPhones, and MP4s from Zoom recordings, and shouldn't have to pre-process everything just to transcribe.

What happens if one file fails mid-batch — do I lose the others?

No. Each file is processed independently. If 3 of 50 files fail (corrupted audio, unsupported codec, exceeds duration cap), the other 47 finish and you can download a partial ZIP containing the successful files. Failed files appear in the dashboard with the specific error reason — corrupted header, no audio track detected, file exceeds 10 hours, etc. You can retry failed files individually (free retry within 24 hours) or fix the source and re-upload. The batch never silently swallows failures, and successful work is never blocked by a single bad file.

Can I download the whole batch as a ZIP with original filenames?

Yes. The ZIP preserves your original filenames — “interview-maria-2026-06-10.mp3” becomes “interview-maria-2026-06-10.docx” (or .srt, .txt, etc., depending on the format you selected). The ZIP also includes a CSV manifest with per-file metadata: original filename, duration, detected language, number of speakers, processing timestamp, word count. The manifest makes downstream automation easy — pipe it into a script that uploads to your CAQDAS tool (NVivo, ATLAS.ti), CMS, or shared drive without manual mapping. Multiple output formats per file are supported in the same batch (one ZIP with both .docx and .srt for every file).

Are speaker labels consistent across files in the same batch?

Speaker labels are consistent WITHIN each file (Speaker 1, Speaker 2... up to 10) but NOT auto-matched ACROSS files. This is an honest technical limitation of diarization in 2026: cross-file speaker identification requires voice-print enrollment, which adds complexity and privacy implications we've chosen not to ship. Workaround: use the bulk-rename UI in the dashboard to apply consistent labels (Speaker 1 → “Interviewer”, Speaker 2 → “Participant”) across all files in a batch in one operation. For research studies where the same interviewer appears in all 50 files, this takes ~30 seconds total. We're transparent about this because over-promising cross-file matching is a common industry trap.
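If you prefer to relabel exported TXT files yourself rather than in the dashboard, the same idea fits in a few lines. This sketch assumes each transcript line starts with a label such as "Speaker 1:" (the exact export layout is an assumption) and that a given label refers to the same role in every file, which you should spot-check because labels are not matched across files.

```python
import re

# Matches a generic diarization label at the start of a line, before the colon.
LABEL = re.compile(r"^(Speaker \d+)(?=:)", flags=re.MULTILINE)

def relabel_speakers(text, mapping):
    """Replace generic labels with names from `mapping`; unknown labels stay as-is."""
    return LABEL.sub(lambda m: mapping.get(m.group(1), m.group(1)), text)

mapping = {"Speaker 1": "Interviewer", "Speaker 2": "Participant"}
transcripts = {
    "interview-01.txt": "Speaker 1: How did it go?\nSpeaker 2: Well.",
    "interview-02.txt": "Speaker 1: Tell me more.\nSpeaker 10: Sure.",
}
relabeled = {name: relabel_speakers(body, mapping) for name, body in transcripts.items()}
for name, body in relabeled.items():
    print(name)
    print(body)
```

The pattern captures the whole number, so "Speaker 10" is never mistaken for "Speaker 1".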

Do you train your AI models on my files?

No. We contractually commit to never training models on user audio or transcripts — verifiable in our Terms and DPA. We use OpenAI's Whisper Large-v3 (open-source, MIT license) for transcription and pyannote.audio for diarization. Inference runs on specialized ML compute infrastructure (our inference partner is disclosed by name in the DPA). Our inference partner is contractually committed to not training models on inference data per their published Terms of Service. This is a deliberate differentiator vs. providers like Otter.ai, which trains on user audio by default with manual opt-out.

Where is my audio stored and processed?

Storage: AWS eu-west-2 (London, UK). All audio, transcripts, and account data live in London under UK-GDPR (the post-Brexit equivalent to EU-GDPR), with AES-256 encryption at rest and TLS 1.2+ in transit. Processing: during transcription, audio is sent from our London infrastructure to an ML inference partner that runs Whisper Large-v3 on specialized GPU infrastructure, then results return to London for storage and delivery. Honest disclosure: our inference partner is US-domiciled, so the transcription step is within reach of the US CLOUD Act even though all storage is in London. For workloads where end-to-end EU residency is non-negotiable — adversarial legal contexts, sensitive journalistic sources under US legal pressure — get in touch about our dedicated infrastructure option. We retain audio for 30 days post-transcription by default; auto-delete on transcript download is available on Pro and Studio plans, and on-request immediate deletion is always supported.

Is there a bulk API for S3 / webhook workflows?

Not yet. As of June 2026, bulk transcription is UI-first — drag-and-drop folder upload, dashboard status, ZIP download. Programmatic API access (POST a list of S3 URLs, receive webhooks per file completion) is on the roadmap but not shipped. If your workflow strictly requires API/webhook integration with S3 sync, we honestly recommend AssemblyAI ($0.15/hr batch) or Deepgram ($0.26/hr) — both have mature batch APIs. For everyone else (agencies, researchers, legal teams, podcasters), the UI workflow is faster than wrangling API code: drag a folder, walk away, return to a ZIP. We'll announce API access here when it ships.

How does flat-rate pricing compare to Rev or Otter for 100 hours?

For 100 hours in a single month: Rev AI charges $0.25/min = $1,500. Rev Human charges $1.99/min = $11,940. Otter Business charges $30/user/month but caps imported-file minutes at 6,000/user — 100 hours fits, so $30, but go over and you need a second seat. AssemblyAI batch is $0.0025/min = $15 (genuinely cheaper if you only need API). Trint BulkScribe starts around $0.20/hour but requires enterprise contract negotiation. VexaScribe Studio is $20/month flat for 6,000 minutes = 100 hours. The pattern: flat-rate dominates per-minute pricing once you exceed ~5 hours/month, and dominates seat-based pricing once you exceed one user's needs. For pure-API consumers, AssemblyAI may still win — we don't compete on that segment.

Is bulk transcription supported in Portuguese, Spanish, and other languages?

Yes. All 99 Whisper Large-v3 languages are supported in bulk. Tier 1 languages (92–95% accuracy on clean audio): English, Spanish, French, German, Italian, Dutch, Russian, Polish, Portuguese (PT and BR), Japanese, Mandarin, Korean. Tier 2 (88–92%): Arabic, Turkish, Hindi, Vietnamese, Thai, Indonesian, Ukrainian, Czech, Hungarian, Romanian, Swedish, Danish, Finnish. Tier 3 (75–88%): Swahili, Bengali, Punjabi, Tamil, Telugu, Welsh, and other lower-resource languages. A single batch can contain mixed languages; language is auto-detected per file from the first 30 seconds. Notable: Otter.ai does NOT support Portuguese in 2026 (English/French/Spanish only), making VexaScribe a practical choice for Brazilian agencies, Portuguese researchers, and LATAM media operations.

Can I get a DPA or sign an NDA for a legal or research batch?

Yes. We provide a standard Data Processing Agreement (DPA) on request to any paid account — typical for research labs needing IRB documentation, legal firms with privileged client data, and corporate buyers requiring procurement review. NDAs for specific projects (large media archives, sensitive corporate audio, legal investigations) are signed on a case-by-case basis. Email legal@vexascribe.com with your batch details (estimated volume, sensitivity level, retention requirements). The standard DPA covers GDPR Article 28 processor obligations, full sub-processor disclosure (AWS for storage and our ML inference partner — both named, with their roles and locations), London storage residency, and the no-AI-training clause. Turnaround is typically 1-2 business days for DPA, 2-5 days for custom NDA.

Methodology and sources

  • Pricing verified June 2026 on vendor sites: Rev.com, Otter.ai, AssemblyAI, Deepgram, Trint, OpenAI Whisper API. Pricing changes, so always confirm on the source.
  • Whisper Large-v3: OpenAI, November 2023. MIT license. Paper: Radford et al. “Robust Speech Recognition via Large-Scale Weak Supervision” (2022).
  • Diarization: pyannote.audio 3.1 (MIT license). Bredin et al., IRIT (CNRS, Université de Toulouse).
  • GDPR: Regulation (EU) 2016/679. UK-GDPR: the retained UK version of that regulation (post-Brexit), read alongside the Data Protection Act 2018.
  • LGPD: Lei 13.709/2018 (Brazil). ANPD guidance 2024–2025 on cross-border transfers.
  • US CLOUD Act: 18 U.S.C. § 2713 (2018).
  • Otter.ai PT-BR support: verified on otter.ai/languages June 2026; English, French, Spanish only.

Related guides