Formerly NovaScribe: same team, same product, refreshed name.
Transcript Summary — Audio & Video to AI Summary
Upload an audio or video recording. Get a full transcript and a structured AI summary in one of 6 purpose-built types: Meeting, Sales, Interview, Lecture, Podcast, or General.
VexaScribe (formerly NovaScribe) generates an AI summary directly from your audio or video recording. Upload an MP3, WAV, MP4, MOV, or any of 17 supported formats up to 5 GB. We transcribe with Whisper Large-v3 (95% accuracy on clear audio, 99 languages), then generate a structured summary from one of 6 purpose-built templates: General, Meeting, Sales Call, Interview, Lecture, or Podcast. A 60-minute file typically completes in 5-10 minutes, including both transcription and summary. Summaries preserve 94%+ of named entities in our internal evaluation, run on instruction-tuned LLMs whose summaries are judged on par with human-written ones on news summarization benchmarks (Zhang et al., TACL 2023), and can be translated to any of 99 target languages. Knowledge workers waste roughly 31 hours per month in unproductive meetings (Atlassian State of Teams 2025), and turning that audio backlog into structured summaries is a direct way to win some of that time back.
Why summarize transcripts at all?
Raw transcripts are searchable but unreadable. Summarization is what turns 90 minutes of audio into a 90-second decision — and the time savings are measurable.
The numbers confirm what every knowledge worker already feels: meetings and unfiltered transcripts are a productivity tax. AI summaries are the recovery mechanism. See how we verify these stats.
Pick your summary type (6 options)
The right summary depends on what kind of recording you uploaded. VexaScribe ships six purpose-built templates — each returns a different structured field set. Pick one before you click Generate, and switch any time without re-uploading.
General
Default for any audio without specific context — voice memos, briefings, mixed content
Sections included
- Topics — high-level subjects discussed
Use when the recording doesn't fit a specialized category — flexible high-level structure.
Meeting
Team meetings, standups, planning sessions
Sections included
- Action items — owner, task, deadline, priority
- Decisions — what was decided and by whom
- Open questions — unresolved items to follow up on
- Blockers — what's blocking progress and who raised it
Decisions, action items, blockers, open questions — surfaced in a structured layout ready for distribution.
Sales Call
Sales discovery, demo, negotiation, closing calls
Sections included
- Action items — owner, task, deadline, priority
- Client needs — need, priority, evidence quote
- Objections — objection, seller response, resolution status
- Competitor mentions — name, context, positioning
- Pricing discussions — what was discussed and the outcome
- Deal next steps — owner, deadline, priority
- Sentiment — qualitative read on prospect interest
- Deal stage — Discovery, Demo, Negotiation, Closing, Closed Won/Lost
- Qualification signals — BANT cues observed (Budget, Authority, Need, Timeline)
The richest of the six types — full sales-cycle metadata including BANT, deal stage, objection handling, and prospect sentiment.
Interview
Job interviews, candidate assessments
Sections included
- Notable exchanges — topic, summary, who was involved
- Strengths — strengths shown by the candidate
- Concerns — concerns raised during the interview
- Overall assessment — synthesized hire-or-no-hire reasoning
Designed for hiring loops: surfaces candidate strengths, concerns, and a synthesized hire-or-no-hire assessment.
Lecture
Educational content, classes, training sessions
Sections included
- Key concepts — concept and explanation
- Examples given — example, concept it illustrates, teaching point
- Terminology — term-and-definition glossary
- Takeaways — key learning points
- Review questions — question with answer hint, as a study aid
- Further reading — topic and why it's relevant
Built for studying — review questions, a terminology glossary, and further-reading suggestions are all included.
Podcast
Conversational shows, host + guest formats
Sections included
- Speaker profiles — speaker label mapped to name and role
- Discussion points — topic, speaker, their position
- Key insights — insight, who said it, context
- Agreements and disagreements — topic, type, speakers involved
- Recommendations — item, who recommended it, reason
- Guest highlights — speaker, moment, why it's notable
Tailored for show-notes publishing and guest-focused podcasts — captures speaker positions, recommendations, and standout moments.
Quality is consistent across types — instruction-tuned models match human-written quality on standardized news benchmarks (Zhang et al., TACL 2023). The differentiator is which structured fields surface what you actually need from the conversation.
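To make "a different structured field set" concrete, here is a minimal sketch of how the Meeting type's sections could be modeled. The class and field names are illustrative assumptions, not VexaScribe's actual schema, and the sample values are hand-written for the example rather than product output.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ActionItem:
    owner: str
    task: str
    deadline: str
    priority: str


@dataclass
class Decision:
    what: str
    decided_by: str


@dataclass
class Blocker:
    what: str
    raised_by: str


@dataclass
class MeetingSummary:
    # Hypothetical field names mirroring the Meeting sections listed above.
    action_items: List[ActionItem] = field(default_factory=list)
    decisions: List[Decision] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)
    blockers: List[Blocker] = field(default_factory=list)


# Hand-filled sample values, for illustration only.
summary = MeetingSummary(
    action_items=[ActionItem("Priya", "Redline the renewal contract", "Friday", "high")],
    decisions=[Decision("Pause the SSO migration until legal signs off", "team")],
    open_questions=["Slack vs Teams switch"],
)

# asdict() converts the nested dataclasses into plain dicts and lists.
meeting_dict = asdict(summary)
print(sorted(meeting_dict))
```

The other five types would follow the same pattern with their own section lists, which is why switching types changes the shape of the output and not only its wording.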
See it on real recordings
One short before/after example for each summary type — Meeting, Sales Call, Lecture, Interview, Podcast, and General. Each side capped at ~80 words so you can scan all six in two minutes.
Audio transcript excerpt
…so the renewal pricing came back at 18% up year-over-year, which Priya flagged as risky given Acme's Q2 results. Tom suggested we counter at 9% and lock for 24 months. Priya owns the redline by Friday. We also agreed to pause the SSO migration until legal signs off — Rahul will confirm the lawyer's bandwidth Monday. No decision on the Slack-vs-Teams switch; bumped to next week's all-hands…
AI summary output
Audio transcript excerpt
…yeah, the pain is mostly that we run a Monday revenue meeting and Sarah pulls the numbers from three different dashboards. It takes her like three or four hours every week. Budget-wise we have something allocated for next quarter but Q2 is locked. Decision-wise, I'm the one signing but our CFO needs to bless anything over $50K annual. Send me an ROI model with realistic numbers and I'll loop her in next week…
AI summary output
Audio transcript excerpt
…so when we talk about externalities in welfare economics we mean a cost or benefit imposed on a third party that isn't reflected in the market price. Pollution is the classic negative case. Pigou's solution in 1920 was a tax equal to the marginal external cost — the so-called Pigouvian tax. Coase pushed back in 1960: if transaction costs are low, parties can bargain to an efficient outcome regardless of who holds the property right…
AI summary output
Audio transcript excerpt
…my approach to a hard reorg is to compress the information gap. The week before is the hardest part — when leadership knows and the team doesn't. We had to lay off forty people last year. I made the decision Monday, communicated Tuesday, supported them with severance and intros Wednesday. The trust we kept came from speed, not from the message…
AI summary output
Audio transcript excerpt
Host: …and what most founders miss is that the first 50 customers aren't a market — they're a focus group. Guest: I'd push back slightly. They're a market if you've already nailed the ICP. The trap is when you confuse the loudest five for the average. Host: Fair. So how do you avoid that? Guest: Write the ICP in one sentence and tape it to your monitor…
AI summary output
Audio transcript excerpt
…just dumping some thoughts about the offsite. The hike was great, dinner was fine but service was slow. The most useful session was the founder Q&A — three things came up worth following up on: the pricing experiment timeline, the hire for ML platform, and whether we want to go back to the same venue next year…
AI summary output
AI summary vs human-written: when each wins
AI wins on cost, speed, and scale. Humans still win on legal liability, deep cultural nuance, and situations where a single mistake is unacceptable. The honest scorecard:
| Criterion | AI summary | Human-written | Winner |
|---|---|---|---|
| Cost per 60-min summary | ~$0.00–$0.30 | $30–$80 (freelance) | AI |
| Turnaround time | ~15 seconds once a transcript exists (5-10 minutes from raw audio for a 60-minute file) | 2–24 hours | AI |
| Accuracy on standard transcripts | ~94%+ entity preservation; on par with human on news benchmarks (Zhang et al., TACL 2023) | ~96–98% with domain-expert reviewer | Human (narrowly) |
| Cultural / idiomatic nuance | Misses sarcasm, regional idioms | Strong when reviewer shares context | Human |
| Legal / medical liability | Not certified; output is not auditable testimony | Trained transcriptionist + signed attestation | Human |
| Scale (1,000+ transcripts/week) | Trivial | Requires a team | AI |
| Format consistency across runs | High — deterministic templates | Variable across humans | AI |
| Long-tail languages (e.g., Welsh, Swahili) | Strong on top 25, weaker on long tail | Depends on reviewer availability | Tie / depends |
AI wins for everyday meetings, podcasts, and lectures — the volume problem. Humans still win when a single misquote can land you in court or harm a patient: legal depositions, medical records, high-stakes journalism. VexaScribe's stance: ship AI summaries with entity-flagging, route the 5% of high-stakes cases to human review. See our editorial review process.
Summarize in any language — or translate while you summarize
VexaScribe transcribes in 99 input languages and can deliver the summary in any of 99 target languages — input and output are independent. Upload a Spanish meeting recording and get English action items, or vice versa, in a single workflow.
Common pairings:
- Spanish meeting recording → English action items (US team distribution)
- German lecture recording → English chapters + key concepts (international students)
- Japanese podcast → English show notes (cross-market publishing)
- Portuguese interview → English quotes + themes (research synthesis)
For pure audio translation without summarization, see transcribe and translate audio.
Export the summary anywhere
The summary downloads as Markdown, DOCX, or plain text — and copy-to-clipboard preserves Markdown formatting so you can paste cleanly into Notion, Obsidian, Google Docs, or Slack. The full transcript exports separately as TXT, DOCX, SRT, VTT, or JSON.
| Integration | Supported export | Setup |
|---|---|---|
| Markdown file | .md with frontmatter + headings | Direct download, no setup |
| DOCX (Word) | Word-compatible with styles | Direct download, no setup |
| Plain text | .txt for any text editor | Direct download, no setup |
| Notion (paste) | Copy to clipboard with Markdown formatting | Paste into any Notion page |
| Obsidian (paste) | Markdown with wiki-link compatible headings | Paste into vault |
| Google Drive (paste) | Copy to clipboard, paste into Doc | Manual paste |
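As a rough picture of what a ".md with frontmatter + headings" export could look like, the sketch below renders a summary into Markdown. The frontmatter keys (`title`, `type`) and the function name are assumptions made for illustration; the actual export layout may differ.

```python
import json
from typing import Dict, List


def render_markdown(title: str, summary_type: str, sections: Dict[str, List[str]]) -> str:
    """Render sections as Markdown with a YAML-style frontmatter block."""
    # json.dumps quotes the title so characters like ':' stay valid in the frontmatter.
    lines = ["---", f"title: {json.dumps(title)}", f"type: {summary_type}", "---", ""]
    for heading, bullets in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {bullet}" for bullet in bullets)
        lines.append("")
    return "\n".join(lines)


doc = render_markdown(
    "Weekly sync",
    "meeting",
    {"Decisions": ["Pause the SSO migration until legal signs off"]},
)
print(doc)
```

Plain `##` headings and `-` bullets paste cleanly into Notion, Obsidian, and most Markdown-aware editors.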
What happens to your recording
Audio uploads are encrypted in transit (TLS 1.2+) and at rest. We don't train AI models on your recordings, and you can delete files any time.
- TLS 1.2+ in transit, encrypted at rest in AWS eu-west-2.
- No model training on customer data — your recordings, transcripts, and summaries are yours.
- Self-serve deletion any time from your dashboard.
- Account deletion purges all recordings, transcripts, and summaries.
Full details in our privacy policy.
How to generate a transcript summary
Three steps from audio file to structured AI summary.
1. Upload audio or video
   Drag-drop an MP3, WAV, M4A, MP4, MOV, or any of 17 supported formats up to 5 GB. Source language is auto-detected across 99 supported languages.
2. Pick a summary type
   Choose from 6 types: General, Meeting, Sales Call, Interview, Lecture, or Podcast. Each generates a different structural template. Switch types and regenerate without re-uploading.
3. Edit and export
   Review the transcript and summary side-by-side in the synced editor. Correct anything, then download the summary as Markdown or DOCX, or copy to clipboard for Notion / Drive / Slack.
Transcript Summary — Frequently Asked Questions
How does transcript summary work on VexaScribe?
Upload an audio or video recording (MP3, WAV, M4A, MP4, MOV, and 12 other formats up to 5 GB). VexaScribe (formerly NovaScribe) transcribes the audio with Whisper Large-v3, then generates an AI summary tailored to your chosen type — General, Meeting, Sales Call, Interview, Lecture, or Podcast. A 60-minute file typically completes in 5-10 minutes including both transcription and summary.
What audio and video formats are supported?
MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. For video files, audio is extracted automatically.
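If you batch-upload, a local pre-check against these limits can save a failed upload. The following is a minimal sketch: the function name is hypothetical, and it assumes "5 GB" means decimal gigabytes, which the page does not specify.

```python
from pathlib import Path
from typing import List

AUDIO_EXTS = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "aiff", "wma", "amr", "opus"}
VIDEO_EXTS = {"mp4", "mov", "avi", "mkv", "webm", "flv", "wmv"}
SUPPORTED_EXTS = AUDIO_EXTS | VIDEO_EXTS  # the 17 formats listed above

MAX_BYTES = 5 * 10**9  # assumption: "5 GB" read as decimal gigabytes
MAX_HOURS = 10


def check_upload(filename: str, size_bytes: int, duration_hours: float) -> List[str]:
    """Return a list of problems; an empty list means the file is within the limits."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 5 GB")
    if duration_hours > MAX_HOURS:
        problems.append("recording longer than 10 hours")
    return problems


print(check_upload("standup.MP3", 48_000_000, 0.5))
print(check_upload("notes.pdf", 1_000, 0.1))
```

The check looks only at the file extension, so a mislabeled file would still pass it.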
What summary types are available?
Six purpose-built types — General, Meeting, Sales Call, Interview, Lecture, and Podcast. Every summary includes an executive summary, chapters, and key quotes. Each type then adds specialized sections: Meeting adds action items, decisions, open questions, and blockers. Sales Call adds client needs, objections, competitor mentions, pricing discussions, deal stage, sentiment, and BANT qualification signals. Lecture adds key concepts, terminology, review questions, and further reading. Interview adds notable exchanges, strengths, concerns, and an overall hire assessment. Podcast adds speaker profiles, discussion points, agreements and disagreements, and guest highlights. General adds a high-level topics list.
How accurate are AI summaries compared to a human summarizer?
On standardized news benchmarks, instruction-tuned LLM summaries are judged on par with human-written ones (Zhang et al., TACL 2023). VexaScribe summaries preserve 94%+ of named entities and decisions in our internal evaluation. Humans still win on legally-sensitive content and deep cultural nuance; see the AI vs human comparison section above for the honest breakdown.
Is my recording private — do you train on it?
No. Audio files are encrypted in transit (TLS 1.2+) and stored encrypted at rest in AWS eu-west-2. We don't use customer data to train AI models. You can delete files any time from your dashboard, and account deletion purges all recordings, transcripts, and summaries.
Which languages does the summary support?
VexaScribe transcribes in 99 languages with automatic language detection. Summaries can be generated in the source language, or translated into any of 99 target languages — feed a Spanish meeting recording and get English action items, or vice versa.
What does the free tier include?
30 minutes of transcription on the free preview, with summary generation included. Paid plans: Starter $2/mo (200 min), Basic $5/mo (1,000 min), Pro $10/mo (2,500 min), Studio $20/mo (6,000 min). All plans include all 6 summary types and all export formats.
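To compare plans for a given monthly volume, the arithmetic can be sketched as below. Prices and quotas are the ones quoted above; the helper name is hypothetical, and the sketch assumes usage must fit inside the included minutes, since overage pricing is not described here.

```python
from typing import Optional

# (name, monthly price in USD, included minutes), taken from the plan list above
PLANS = [
    ("Starter", 2, 200),
    ("Basic", 5, 1000),
    ("Pro", 10, 2500),
    ("Studio", 20, 6000),
]


def cheapest_plan(minutes_per_month: int) -> Optional[str]:
    """Return the lowest-priced plan whose quota covers the usage, or None if none does."""
    covering = [(price, name) for name, price, quota in PLANS if quota >= minutes_per_month]
    return min(covering)[1] if covering else None


for name, price, quota in PLANS:
    # Effective rate if the full quota is used.
    print(f"{name}: ${price / quota:.4f} per included minute")

print(cheapest_plan(900))  # Basic
```

The per-minute rate falls from $0.0100 on Starter to about $0.0033 on Studio, so the larger plans pay off only if you use most of the quota.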
Why not just paste the transcript into ChatGPT?
You'd need to record, transcribe, and summarize separately: three tools and three workflows, because pasting text into ChatGPT presupposes you already have a transcript. VexaScribe is the integrated path: upload audio once, get the transcript and the summary together. You also get structured templates per content type (Meeting / Sales / Interview / Lecture / Podcast / General), an entity cross-check against the source transcript, and zero-retention contracts for business data.
Can the AI hallucinate facts that weren't in the recording?
It's possible but rare with instruction-tuned models grounded only on the transcript. VexaScribe runs an entity-cross-check pass that flags any name, number, or quote in the summary not present verbatim in the source transcript. Flagged items render with an underline so you can verify before exporting.
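The idea behind a verbatim cross-check can be sketched in a few lines. This is a deliberately crude approximation, not VexaScribe's implementation: it treats capitalized words and numbers as "entities" and flags any that do not appear as a substring of the transcript. The function and pattern names are hypothetical.

```python
import re
from typing import List

# Crude stand-in for entity detection: numbers (with optional $ or %) and capitalized words.
ENTITY_PATTERN = re.compile(r"\$?\d[\d,.]*%?|\b[A-Z][a-zA-Z]+\b")


def flag_unsupported(summary: str, transcript: str) -> List[str]:
    """Return summary tokens that do not appear verbatim in the transcript."""
    flagged = []
    for match in ENTITY_PATTERN.finditer(summary):
        token = match.group().rstrip(".,")  # drop trailing sentence punctuation
        if token and token not in transcript and token not in flagged:
            flagged.append(token)
    return flagged


transcript = (
    "the renewal pricing came back at 18% up year-over-year, which Priya flagged "
    "as risky. Tom suggested we counter at 9% and lock for 24 months."
)
summary = "Priya flagged the 18% increase. Tom proposed a 12% counter for 24 months."

print(flag_unsupported(summary, transcript))  # ['12%']
```

A real check would need proper entity recognition and token-boundary matching. This version would also flag ordinary sentence-initial words such as "The", and it would accept "8%" because it is a substring of "18%".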
Can I edit the summary before exporting?
Yes. Every output section is inline-editable before you hit Export — change wording, drop bullets, reorder chapters, then export. Edits save automatically.
Which summary type should I pick for a 60-minute team meeting versus a 90-minute lecture?
Team meeting: pick Meeting — surfaces action items, decisions, and blockers in a structured layout. Lecture: pick Lecture — generates chapters with timestamps plus a key-concepts/terminology block for study. Podcast: pick Podcast for publishable show notes. Sales call: pick Sales Call for objections, next steps, and BANT-style qualification.