Formerly NovaScribe: same team, same product, refreshed name.

Transcript Summary — Audio & Video to AI Summary

Upload an audio or video recording. Get a full transcript and a structured AI summary in 6 purpose-built types — Meeting, Sales, Interview, Lecture, Podcast, or General.

VexaScribe (formerly NovaScribe) generates an AI summary directly from your audio or video recording. Upload an MP3, WAV, MP4, MOV, or any of 17 supported formats up to 5 GB. We transcribe with Whisper Large-v3 (95% accuracy on clear audio, 99 languages), then generate a structured summary tailored to your chosen type — one of 6 purpose-built templates: General, Meeting, Sales Call, Interview, Lecture, or Podcast. A 60-minute file completes in 5-10 minutes, including both transcription and summary. Summaries preserve 94%+ of named entities in our internal evaluation, run on instruction-tuned LLMs that match human-written quality on standard benchmarks (Zhang et al., TACL 2023), and can be translated to any of 99 target languages. Knowledge workers waste roughly 31 hours per month in unproductive meetings (Atlassian State of Teams 2025) — turning that audio backlog into structured summaries is the fastest ROI you can ship this quarter.

30 minutes free · 99 languages · 6 summary types · 5 GB / 10 hr per file

Why summarize transcripts at all?

Raw transcripts are searchable but unreadable. Summarization is what turns 90 minutes of audio into a 90-second decision — and the time savings are measurable.

Numbers verify what every knowledge worker already feels: meetings and unfiltered transcripts are a productivity tax. AI summaries are the recovery mechanism. See how we verify these stats.

Pick your summary type (6 options)

The right summary depends on what kind of recording you uploaded. VexaScribe ships six purpose-built templates — each returns a different structured field set. Pick one before you click Generate, and switch any time without re-uploading.

Every summary includes: an executive summary (TL;DR paragraph), chapters (time-bounded sections of the recording), and key quotes (notable verbatim quotes from the transcript). Each type adds the sections below on top.

General

Default for any audio without specific context — voice memos, briefings, mixed content

Sections included

  • Topics: high-level subjects discussed

Use when the recording doesn't fit a specialized category — flexible high-level structure.

Meeting

Team meetings, standups, planning sessions

Sections included

  • Action items: owner, task, deadline, priority
  • Decisions: what was decided and by whom
  • Open questions: unresolved items to follow up on
  • Blockers: what's blocking progress and who raised it

Decisions, action items, blockers, open questions — surfaced in a structured layout ready for distribution.

Sales Call

Sales discovery, demo, negotiation, closing calls

Sections included

  • Action items: owner, task, deadline, priority
  • Client needs: need, priority, evidence quote
  • Objections: objection, seller response, resolution status
  • Competitor mentions: name, context, positioning
  • Pricing discussions: what was discussed and the outcome
  • Deal next steps: owner, deadline, priority
  • Sentiment: qualitative read on prospect interest
  • Deal stage: Discovery, Demo, Negotiation, Closing, Closed Won/Lost
  • Qualification signals: BANT cues observed (Budget, Authority, Need, Timeline)

The richest of the six types — full sales-cycle metadata including BANT, deal stage, objection handling, and prospect sentiment.

Interview

Job interviews, candidate assessments

Sections included

  • Notable exchanges: topic, summary, who was involved
  • Strengths: strengths shown by the candidate
  • Concerns: concerns raised during the interview
  • Overall assessment: synthesized hire-or-no-hire reasoning

Designed for hiring loops — surfaces candidate strengths, concerns, and a synthesized hire-or-pass recommendation.

Lecture

Educational content, classes, training sessions

Sections included

  • Key concepts: concept and explanation
  • Examples given: example, concept it illustrates, teaching point
  • Terminology: term-and-definition glossary
  • Takeaways: key learning points
  • Review questions: question with answer hint, as a study aid
  • Further reading: topic and why it's relevant

Built for studying — review questions, a terminology glossary, and further-reading suggestions are all included.

Podcast

Conversational shows, host + guest formats

Sections included

  • Speaker profiles: speaker label mapped to name and role
  • Discussion points: topic, speaker, their position
  • Key insights: insight, who said it, context
  • Agreements and disagreements: topic, type, speakers involved
  • Recommendations: item, who recommended it, reason
  • Guest highlights: speaker, moment, why it's notable

Tailored for show-notes publishing and guest-focused podcasts — captures speaker positions, recommendations, and standout moments.

Quality is consistent across types — instruction-tuned models match human-written quality on standardized news benchmarks (Zhang et al., TACL 2023). The differentiator is which structured fields surface what you actually need from the conversation.
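The field sets listed for the six types can be pictured as a simple lookup: three shared sections plus a per-type list. The sketch below is illustrative only and is not VexaScribe's actual schema; the names `COMMON_SECTIONS`, `TYPE_SECTIONS`, and `sections_for` are hypothetical, and only the section labels are taken from the lists above.

```python
# Hypothetical sketch: the section labels come from this page,
# but the data structure and function names are assumptions.
COMMON_SECTIONS = ["Executive summary", "Chapters", "Key quotes"]

TYPE_SECTIONS = {
    "General": ["Topics"],
    "Meeting": ["Action items", "Decisions", "Open questions", "Blockers"],
    "Sales Call": [
        "Action items", "Client needs", "Objections", "Competitor mentions",
        "Pricing discussions", "Deal next steps", "Sentiment", "Deal stage",
        "Qualification signals",
    ],
    "Interview": ["Notable exchanges", "Strengths", "Concerns", "Overall assessment"],
    "Lecture": [
        "Key concepts", "Examples given", "Terminology", "Takeaways",
        "Review questions", "Further reading",
    ],
    "Podcast": [
        "Speaker profiles", "Discussion points", "Key insights",
        "Agreements and disagreements", "Recommendations", "Guest highlights",
    ],
}


def sections_for(summary_type: str) -> list[str]:
    """Return the shared sections followed by the type-specific ones."""
    if summary_type not in TYPE_SECTIONS:
        raise ValueError(f"unknown summary type: {summary_type}")
    return COMMON_SECTIONS + TYPE_SECTIONS[summary_type]
```

Under this sketch, switching type only changes which list is appended, which is consistent with regenerating a summary without re-uploading the recording.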

See it on real recordings

One short before/after example for each summary type — Meeting, Sales Call, Lecture, Interview, Podcast, and General. Each side capped at ~80 words so you can scan all six in two minutes.

Meeting

Audio transcript excerpt

…so the renewal pricing came back at 18% up year-over-year, which Priya flagged as risky given Acme's Q2 results. Tom suggested we counter at 9% and lock for 24 months. Priya owns the redline by Friday. We also agreed to pause the SSO migration until legal signs off — Rahul will confirm the lawyer's bandwidth Monday. No decision on the Slack-vs-Teams switch; bumped to next week's all-hands…

AI summary output

Executive summary: Renewal counter agreed; SSO paused pending legal review.
Decisions: Counter Acme at 9% uplift, 24-month lock. Pause SSO migration pending legal sign-off.
Action items: Priya — redline by Fri (high). Rahul — confirm lawyer bandwidth Mon (medium).
Blockers: Legal sign-off on SSO (raised by Tom).
Open questions: Slack vs. Teams switch — moved to next all-hands.

Sales Call

Audio transcript excerpt

…yeah, the pain is mostly that we run a Monday revenue meeting and Sarah pulls the numbers from three different dashboards. It takes her like three or four hours every week. Budget-wise we have something allocated for next quarter but Q2 is locked. Decision-wise, I'm the one signing but our CFO needs to bless anything over $50K annual. Send me an ROI model with realistic numbers and I'll loop her in next week…

AI summary output

Executive summary: Strong intent; Q2 budget locked; CFO sign-off required above $50K.
Client needs: Manual revenue reporting (3-4 hrs/week, high priority) — "Sarah pulls numbers from three different dashboards."
Objections: Q2 budget locked → seller offered Q3 start; status: deferred.
Pricing discussions: $50K annual threshold triggers CFO involvement.
Deal next steps: Send ROI model, loop in CFO (owner: seller; due: next week).
Sentiment: Engaged.
Deal stage: Discovery.
Qualification signals (BANT): Budget — Q3+ allocated. Authority — split between champion (<$50K) and CFO. Need — confirmed. Timeline — Q3.

Lecture

Audio transcript excerpt

…so when we talk about externalities in welfare economics we mean a cost or benefit imposed on a third party that isn't reflected in the market price. Pollution is the classic negative case. Pigou's solution in 1920 was a tax equal to the marginal external cost — the so-called Pigouvian tax. Coase pushed back in 1960: if transaction costs are low, parties can bargain to an efficient outcome regardless of who holds the property right…

AI summary output

Executive summary: Externalities and two solutions — Pigouvian tax (1920) and Coase Theorem (1960).
Chapters: 00:00 Externalities · 04:12 Pigouvian tax · 09:30 Coase Theorem.
Key concepts: Externality — third-party cost or benefit not in the market price. Pigouvian tax — tax equal to marginal external cost. Coase Theorem — when transaction costs are low, parties bargain to an efficient outcome regardless of who holds the property right.
Terminology: externality, Pigouvian tax, Coase Theorem, transaction costs, property rights.
Review questions: Why does Coase challenge Pigou? When does the Coase Theorem fail in practice?

Interview

Audio transcript excerpt

…my approach to a hard reorg is to compress the information gap. The week before is the hardest part — when leadership knows and the team doesn't. We had to lay off forty people last year. I made the decision Monday, communicated Tuesday, supported them with severance and intros Wednesday. The trust we kept came from speed, not from the message…

AI summary output

Executive summary: Candidate has run a 40-person layoff and emphasizes speed of communication as the trust-preservation mechanism.
Notable exchanges: Reorg leadership — candidate compressed the Mon/Tue/Wed announcement-to-support cycle; involved the full leadership team.
Strengths: Crisis communication framework; willingness to make hard calls; concrete severance and support process.
Concerns: Did not address one-on-one impact or follow-up retention with the surviving team.
Overall assessment: Lean hire — strong on operational rigor under pressure. Probe further on long-term morale recovery in the next round.

Podcast

Audio transcript excerpt

Host: …and what most founders miss is that the first 50 customers aren't a market — they're a focus group. Guest: I'd push back slightly. They're a market if you've already nailed the ICP. The trap is when you confuse the loudest five for the average. Host: Fair. So how do you avoid that? Guest: Write the ICP in one sentence and tape it to your monitor…

AI summary output

Executive summary: Debate on whether the first 50 customers are a market or a focus group; resolution via ICP discipline.
Speaker profiles: Speaker 1 — Host (founder, prior exit). Speaker 2 — Guest (3x founder, current CEO).
Discussion points: First-50 customers — Host says focus group; Guest says market if the ICP is locked.
Agreements and disagreements: Disagreement on framing, resolved via the ICP one-liner heuristic.
Recommendations: Tape a one-sentence ICP to your monitor (Guest) — defends against feature-creep from the loudest five customers.
Guest highlights: "They're a market if you've already nailed the ICP." (12:18)

General

Audio transcript excerpt

…just dumping some thoughts about the offsite. The hike was great, dinner was fine but service was slow. The most useful session was the founder Q&A — three things came up worth following up on: the pricing experiment timeline, the hire for ML platform, and whether we want to go back to the same venue next year…

AI summary output

Executive summary: Voice memo reviewing the company offsite — strong founder Q&A, three follow-ups identified.
Chapters: 00:00 Logistics recap · 01:15 Founder Q&A highlights · 02:40 Action items.
Topics: Offsite logistics (hike, dinner); founder Q&A; pricing experiment timeline; ML platform hire; venue decision for next year.
Key quotes: "The most useful session was the founder Q&A."

AI summary vs human-written: when each wins

AI wins on cost, speed, and scale. Humans still win on legal liability, deep cultural nuance, and high-stakes work where a single mistake can be fatal. The honest scorecard:

| Criterion | AI summary | Human-written | Winner |
|---|---|---|---|
| Cost per 60-min summary | ~$0.00–$0.30 | $30–$80 (freelance) | AI |
| Turnaround time | ~15 seconds | 2–24 hours | AI |
| Accuracy on standard transcripts | ~94%+ entity preservation; on par with human on news benchmarks (Zhang et al., TACL 2023) | ~96–98% with domain-expert reviewer | Human (narrowly) |
| Cultural / idiomatic nuance | Misses sarcasm, regional idioms | Strong when reviewer shares context | Human |
| Legal / medical liability | Not certified; output is not auditable testimony | Trained transcriptionist + signed attestation | Human |
| Scale (1,000+ transcripts/week) | Trivial | Requires a team | AI |
| Format consistency across runs | High (deterministic templates) | Variable across humans | AI |
| Long-tail languages (e.g., Welsh, Swahili) | Strong on top 25, weaker on long tail | Depends on reviewer availability | Tie / depends |

AI wins for everyday meetings, podcasts, and lectures — the volume problem. Humans still win when a single misquote can land you in court or harm a patient: legal depositions, medical records, high-stakes journalism. VexaScribe's stance: ship AI summaries with entity-flagging, route the 5% of high-stakes cases to human review. See our editorial review process.

Summarize in any language — or translate while you summarize

VexaScribe transcribes in 99 input languages and can deliver the summary in any of 99 target languages — input and output are independent. Upload a Spanish meeting recording and get English action items, or vice versa, in a single workflow.

Common pairings:

  • Spanish meeting recording → English action items (US team distribution)
  • German lecture recording → English chapters + key concepts (international students)
  • Japanese podcast → English show notes (cross-market publishing)
  • Portuguese interview → English quotes + themes (research synthesis)

For pure audio translation without summarization, see transcribe and translate audio.

Export the summary anywhere

The summary downloads as Markdown, DOCX, or plain text — and copy-to-clipboard preserves Markdown formatting so you can paste cleanly into Notion, Obsidian, Google Docs, or Slack. The full transcript exports separately as TXT, DOCX, SRT, VTT, or JSON.

| Integration | Supported export | Setup |
|---|---|---|
| Markdown file | .md with frontmatter + headings | Direct download, no setup |
| DOCX (Word) | Word-compatible with styles | Direct download, no setup |
| Plain text | .txt for any text editor | Direct download, no setup |
| Notion (paste) | Copy to clipboard with Markdown formatting | Paste into any Notion page |
| Obsidian (paste) | Markdown with wiki-link compatible headings | Paste into vault |
| Google Drive (paste) | Copy to clipboard, paste into Doc | Manual paste |
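To make "frontmatter + headings" concrete, here is a rough sketch of what a Markdown export could look like. It is an assumption-heavy illustration, not the actual VexaScribe export format: the function `to_markdown` and the frontmatter keys `title` and `summary_type` are hypothetical. Values are quoted with `json.dumps`, which yields double-quoted strings that are also valid YAML for ordinary text.

```python
import json


def to_markdown(title: str, summary_type: str, sections: dict[str, list[str]]) -> str:
    """Render a summary as Markdown with a YAML frontmatter block.

    Hypothetical layout: one '##' heading per section, one bullet per item.
    """
    lines = [
        "---",
        f"title: {json.dumps(title)}",              # quoted so colons in titles are safe
        f"summary_type: {json.dumps(summary_type)}",
        "---",
        "",
    ]
    for heading, items in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


doc = to_markdown(
    "Weekly sync",
    "Meeting",
    {"Decisions": ["Pause SSO migration pending legal sign-off"]},
)
print(doc)
```

Plain headings and bullets of this kind paste cleanly into Markdown-aware tools such as Notion or Obsidian, which is the reason a frontmatter-plus-headings layout suits this kind of export.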

See which integrations are on the free tier.

What happens to your recording

Audio uploads are encrypted in transit (TLS 1.2+) and at rest. We don't train AI models on your recordings, and you can delete files any time.

  • TLS 1.2+ in transit, encrypted at rest in AWS eu-west-2.
  • No model training on customer data — your recordings, transcripts, and summaries are yours.
  • Self-serve deletion any time from your dashboard.
  • Account deletion purges all recordings, transcripts, and summaries.

Full details in our privacy policy.

How to generate a transcript summary

Three steps from audio file to structured AI summary.

  1. Upload audio or video

    Drag-drop an MP3, WAV, M4A, MP4, MOV, or any of 17 supported formats up to 5 GB. Source language is auto-detected across 99 supported languages.

  2. Pick a summary type

    Choose from 6 types: General, Meeting, Sales Call, Interview, Lecture, or Podcast. Each generates a different structural template. Switch types and regenerate without re-uploading.

  3. Edit and export

    Review the transcript and summary side-by-side in the synced editor. Correct anything, then download the summary as Markdown or DOCX, or copy to clipboard for Notion / Drive / Slack.

Transcript Summary — Frequently Asked Questions

How does transcript summary work on VexaScribe?

Upload an audio or video recording (MP3, WAV, M4A, MP4, MOV, and 12 other formats up to 5 GB). VexaScribe (formerly NovaScribe) transcribes the audio with Whisper Large-v3, then generates an AI summary tailored to your chosen type — General, Meeting, Sales Call, Interview, Lecture, or Podcast. A 60-minute file typically completes in 5-10 minutes including both transcription and summary.

What audio and video formats are supported?

MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. For video files, audio is extracted automatically.
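If you want to pre-check files before uploading, the limits above can be expressed as a small validation helper. This is a sketch under stated assumptions, not an official client: `check_upload` is a hypothetical function, "5 GB" is read as 5 × 1024³ bytes, and each format is matched by a file extension of the same name (real files may use variants such as `.aif`).

```python
from pathlib import Path

# Format names from the list above, treated as file extensions (assumption).
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "aiff", "wma", "amr", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm", "flv", "wmv"}

MAX_BYTES = 5 * 1024**3   # assumption: "5 GB" interpreted as 5 GiB
MAX_HOURS = 10


def check_upload(filename: str, size_bytes: int, duration_hours: float) -> list[str]:
    """Return a list of problems; an empty list means the file fits the stated limits."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 5 GB")
    if duration_hours > MAX_HOURS:
        problems.append("recording longer than 10 hours")
    return problems
```

For example, `check_upload("standup.MP3", 50_000_000, 0.25)` returns an empty list, while a `.pdf` file is reported as an unsupported format.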

What summary types are available?

Six purpose-built types — General, Meeting, Sales Call, Interview, Lecture, and Podcast. Every summary includes an executive summary, chapters, and key quotes. Each type then adds specialized sections: Meeting adds action items, decisions, open questions, and blockers. Sales Call adds client needs, objections, competitor mentions, pricing discussions, deal stage, sentiment, and BANT qualification signals. Lecture adds key concepts, terminology, review questions, and further reading. Interview adds notable exchanges, strengths, concerns, and an overall hire assessment. Podcast adds speaker profiles, discussion points, agreements and disagreements, and guest highlights. General adds a high-level topics list.

How accurate are AI summaries compared to a human summarizer?

On standardized news benchmarks, instruction-tuned LLM summaries are judged on par with human-written ones (Zhang et al., TACL 2023). VexaScribe summaries preserve 94%+ of named entities and decisions in our internal evaluation. Humans still win on legally sensitive content and deep cultural nuance; see the AI vs human comparison section above for the honest breakdown.

Is my recording private — do you train on it?

No. Audio files are encrypted in transit (TLS 1.2+) and stored encrypted at rest in AWS eu-west-2. We don't use customer data to train AI models. You can delete files yourself any time from your dashboard, and account deletion purges all recordings, transcripts, and summaries.

Which languages does the summary support?

VexaScribe transcribes in 99 languages with automatic language detection. Summaries can be generated in the source language, or translated into any of 99 target languages — feed a Spanish meeting recording and get English action items, or vice versa.

What does the free tier include?

30 minutes of transcription on the free preview, with summary generation included. Paid plans: Starter $2/mo (200 min), Basic $5/mo (1,000 min), Pro $10/mo (2,500 min), Studio $20/mo (6,000 min). All plans include all 6 summary types and all export formats.
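The plan prices above imply an effective per-minute rate when the full monthly allowance is used. The sketch below works through that arithmetic; `PLANS`, `price_per_minute`, and `smallest_plan_for` are hypothetical names, and it assumes no overage billing or discounts, since none are described here.

```python
from typing import Optional

# (monthly price in USD, included minutes), copied from the plan list above.
PLANS = {
    "Starter": (2, 200),
    "Basic": (5, 1000),
    "Pro": (10, 2500),
    "Studio": (20, 6000),
}


def price_per_minute(plan: str) -> float:
    """Effective USD per minute if the whole monthly allowance is used."""
    monthly_price, included_minutes = PLANS[plan]
    return monthly_price / included_minutes


def smallest_plan_for(minutes_needed: int) -> Optional[str]:
    """Cheapest plan whose allowance covers the need, or None if none does."""
    candidates = [
        (price, name)
        for name, (price, minutes) in PLANS.items()
        if minutes >= minutes_needed
    ]
    return min(candidates)[1] if candidates else None


for name in PLANS:
    print(f"{name}: ${price_per_minute(name):.4f} per minute")
```

With these figures, `smallest_plan_for(900)` selects Basic, and a need above 6,000 minutes per month returns `None` because no listed plan covers it.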

Why not just paste the transcript into ChatGPT?

You'd need to record, transcribe, and summarize separately — three tools, three workflows, and ChatGPT has no audio input. VexaScribe is the integrated path: upload audio once, get the transcript and the summary together. Plus structured templates per content type (Meeting / Sales / Interview / Lecture / Podcast / General), entity-cross-checking against the source audio, and zero-retention contracts for business data.

Can the AI hallucinate facts that weren't in the recording?

It's possible but rare with instruction-tuned models grounded only on the transcript. VexaScribe runs an entity-cross-check pass that flags any name, number, or quote in the summary not present verbatim in the source transcript. Flagged items render with an underline so you can verify before exporting.
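The idea behind such a cross-check can be illustrated with a few regular expressions. This is a deliberately crude sketch and not VexaScribe's implementation: `flag_unsupported` is a hypothetical function, "names" are approximated as capitalized words, and matching is case-insensitive (a relaxation of strict verbatim matching, so a sentence-initial "Counter" still matches "counter" in the transcript).

```python
import re

NUMBER = re.compile(r"\$?\d+(?:[.,]\d+)*%?")   # 9%, 24, $50, 1,000
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")   # rough stand-in for names
QUOTED = re.compile(r'"([^"]+)"')              # text inside double quotes
WORD = re.compile(r"[a-z']+")


def flag_unsupported(summary: str, transcript: str) -> list[str]:
    """Return quotes, numbers, and capitalized words in the summary
    that cannot be found in the transcript."""
    transcript_lower = transcript.lower()
    transcript_numbers = set(NUMBER.findall(transcript))
    transcript_words = set(WORD.findall(transcript_lower))

    flagged = []
    for quote in QUOTED.findall(summary):
        if quote.lower() not in transcript_lower:
            flagged.append(quote)
    for number in NUMBER.findall(summary):
        if number not in transcript_numbers:
            flagged.append(number)
    for name in CAPITALIZED.findall(summary):
        if name.lower() not in transcript_words:
            flagged.append(name)
    return list(dict.fromkeys(flagged))  # de-duplicate, keep first-seen order


transcript = (
    "Tom suggested we counter at 9% and lock for 24 months. "
    "Priya owns the redline by Friday."
)
summary = (
    "Counter at 9%, lock for 36 months. "
    'Rahul owns the redline by Friday, per "lock for 24 months".'
)
print(flag_unsupported(summary, transcript))  # ['36', 'Rahul']
```

Here "36" and "Rahul" are flagged because neither appears in the transcript, while the quoted phrase passes because it occurs verbatim. A production check would need proper named-entity recognition and tolerance for reformatted numbers and dates.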

Can I edit the summary before exporting?

Yes. Every output section is inline-editable before you hit Export — change wording, drop bullets, reorder chapters, then export. Edits save automatically.

Which summary type should I pick for a 60-minute team meeting versus a 90-minute lecture?

Team meeting: pick Meeting — surfaces action items, decisions, and blockers in a structured layout. Lecture: pick Lecture — generates chapters with timestamps plus a key-concepts/terminology block for study. Podcast: pick Podcast for publishable show notes. Sales call: pick Sales Call for objections, next steps, and BANT-style qualification.

Summarize Your First Recording Free

30 minutes of free transcription, all 6 summary types included, no credit card required.