Updated June 28, 2026

Transcript Formatting: Styles, Standards, and How to Clean Up AI Output

By VexaScribe Editorial · Published June 28, 2026

Transcript formatting in 2026 falls into three styles. Verbatim captures every word, including “um,” “uh,” self-corrections, and false starts, and is used for research, legal work, and journalism quote-checking. Intelligent verbatim removes most filler but keeps meaningful repetitions and emphasis; it is the most common default for business and journalism workflows. Edited (clean read) smooths the text for publication: it removes hesitations and may consolidate sentences, which suits blog posts, books built from interviews, and content workflows. Different use cases call for different styles: research and legal lean verbatim; business meetings, content publishing, and accessible reading lean intelligent verbatim or edited. This reference covers the three styles, formatting conventions by use case (research, legal, business, journalism, podcasts, video), the practical decisions you'll need to make (filler words, timestamps, speaker labels, paragraph breaks), and workflows for converting raw AI output to publish-ready format.

Key takeaways

→ Three styles: verbatim, intelligent verbatim, edited. Each has appropriate use cases. Picking the right style first prevents a lot of downstream rework.
→ The decision rule for filler words: keep them if the verbal behavior is data (research, legal). Remove them if the transcript is for reading (publication, business, accessibility).
→ AI defaults to intelligent verbatim. Most modern tools (VexaScribe, Otter, Sonix) produce something close to intelligent verbatim automatically. Strict verbatim and edited styles require manual work.
→ Speaker labels need renaming. AI tools produce “Speaker 1, Speaker 2”; every workflow that publishes the transcript requires a manual rename to actual names or roles.
→ Match format to use case. Research conventions (Jefferson, GAT, Bailey) differ from legal (Q/A numbered lines), which differ from journalism (full name first use, then last name).
→ Consistency matters more than which convention you pick. Use the same labeling, timestamp granularity, and filler word handling throughout a single transcript.

The three transcript styles

The most important decision in transcript formatting is which of the three styles your use case calls for. Pick wrong and downstream cleanup is painful.

Verbatim

Every word as spoken. Filler words preserved. Self-corrections preserved. False starts preserved. Repetitions preserved. Pauses noted. Used when the verbal behavior itself is data.

Example:

Interviewer: So, um, can you tell me about — about your experience?

Participant: Yeah, well, I mean, when I first — when I first started, it was, you know, really hard. I didn't, um, I didn't really know what I was doing. [pause] And then, like, after a few months, I started to figure it out, you know?

Intelligent verbatim

Meaning preserved, most non-substantive filler removed. Repetitions collapsed when meaningless. False starts smoothed. Meaningful hesitations and emphasis preserved. This is the default for most business, journalism, and content workflows.

Same content, intelligent verbatim:

Interviewer: Can you tell me about your experience?

Participant: Yeah, well, when I first started, it was really hard. I didn't really know what I was doing. And then, after a few months, I started to figure it out.

Edited (clean read)

Smoothed for publication. All filler removed. Sentences may be consolidated or restructured. The speaker's voice is preserved but the prose is publication-ready. Used for blog posts, books from interviews, sermon-to-blog workflows.

Same content, edited:

Interviewer: Can you tell me about your experience?

Participant: When I first started, it was really hard — I didn't know what I was doing. After a few months, I started to figure it out.

The line between styles is a judgment call. Anything that conveys hesitation, emphasis, or meaning stays in intelligent verbatim; pure verbal noise gets cleaned. Edited goes further, restructuring for prose flow rather than preserving the conversational rhythm.
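
As a rough illustration of the gap between verbatim and intelligent verbatim, the sketch below strips a small, hypothetical list of filler phrases with a regular expression. The list (`FILLERS`) and the function name `strip_fillers` are our assumptions, not any tool's API. The code cannot tell a meaning-bearing hesitation from noise, so treat its output as a first pass for a human editor.

```python
import re

# Hypothetical filler list; which phrases count as noise is a judgment call.
# "like" is left out on purpose because it is often meaning-bearing.
FILLERS = ["um", "uh", "you know", "I mean"]

def strip_fillers(text, fillers=FILLERS):
    """Remove standalone filler phrases and the comma that trails them."""
    # Longest phrases first so multi-word fillers win over shorter ones.
    alternatives = "|".join(
        re.escape(f) for f in sorted(fillers, key=len, reverse=True)
    )
    # The lookarounds keep "uh" inside "uh-huh" or "huh" from matching.
    pattern = re.compile(
        rf"(?<![\w-])(?:{alternatives})(?![\w-]),?\s*", flags=re.IGNORECASE
    )
    cleaned = pattern.sub("", text)
    # Tidy any space left in front of punctuation, then collapse double spaces.
    cleaned = re.sub(r"\s+([,.?!])", r"\1", cleaned)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(strip_fillers("So, um, can you tell me about your experience?"))
```

The sketch deliberately leaves repetitions ("I didn't, I didn't") and false starts alone, and it will wrongly remove a phrase like "I mean" when it carries meaning. Both cases need the judgment described above.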

Formatting decisions you'll need to make

Style is the first decision; these are the second. The right answer depends on the style you've chosen plus the specific publication context.

Decision	Verbatim	Intelligent verbatim	Edited
Filler words	Keep all ("um," "uh," "like," "you know")	Remove most; keep meaning-bearing ones	Remove all
Self-corrections	Keep ("I think — I mean, I believe...")	Often consolidate or mark with em-dash	Smooth into final sentence
Repetitions	Keep all	Collapse meaningless ones, keep emphasis	Remove all
False starts	Keep	Mark with em-dash or remove	Remove
Sentence completeness	As-spoken; incomplete sentences preserved	Light smoothing	Complete sentences throughout
Punctuation	Indicate verbal pauses and intonation	Natural punctuation reflecting flow	Standard prose punctuation
Capitalization	Standard prose capitalization	Standard prose	Standard prose, may add proper-noun corrections
Crosstalk	Mark with brackets ([crosstalk])	Mark or simplify to one speaker	Simplify to readable single thread

Decisions not in the table

● Speaker labels: full names, initials, role-based, or generic? See the by-use-case section below.
● Timestamps: none, sentence-level, paragraph-level, or word-level? See our dedicated timestamps reference for the four format options.
● Paragraph breaks: speaker turns only, topic changes, time intervals, or AI-generated? Most transcripts work best with a mix — break at speaker turns AND at topical shifts within long monologues.
● Header metadata: include interview date, location, participants, length, transcriber? Required for research and journalism; optional for business.

Formatting by use case

Conventions vary by discipline and purpose. Quick reference table, then detailed subsections.

Use case	Style	Speakers	Timestamps	Notes
Qualitative research (NVivo, ATLAS.ti, MAXQDA)	Verbatim	Roles (Interviewer/Participant) or IDs (P1, P2)	Paragraph-level or 30-60s intervals	Pauses, laughter, overlapping speech in brackets
Conversation analysis (Jefferson notation)	Verbatim with special notation	Numbered or initialed	Line-level with precise timing	Specialized symbols for pauses, intonation, emphasis, overlap
Legal depositions and court	Strict verbatim	Q (question) / A (answer)	Page-line references	Numbered lines, page headers, certification page; certified court reporters have specific format requirements
Business meetings and calls	Intelligent verbatim	Full name + role on first use	Sentence-level optional	Action items pulled out separately; AI summary often at top
Journalism interviews	Verbatim for quote-check + intelligent verbatim for publication	Full name first use, then last name or initial	Preserved for fact-checking	Maintain two versions: verbatim working copy + edited publication version
Podcast and audio content	Intelligent verbatim or edited	Full names	Optional; chapter markers preferred for SEO	H2 headers at topical breaks improve SEO and readability
Blog posts and articles from interviews	Edited / clean read	Named with context	Usually removed	Smooth prose, may consolidate paragraphs across speaker turns
Video subtitles and captions	Heavily edited for reading speed	Speaker name in brackets when speaker changes	Word-level required for cue boundaries	Different beast — SRT/VTT cue format; max ~80 chars / 5s per cue typically

Research transcripts (NVivo, ATLAS.ti, MAXQDA)

Qualitative researchers using NVivo, ATLAS.ti, or MAXQDA follow verbatim conventions. The verbal behavior is data — filler words, hesitations, and corrections are preserved because they carry meaning.

Standard format:

Interview ID: P003
Date: 2026-06-15
Duration: 47:23
Interviewer: Sarah Chen
Participant: P003 (anonymized)
Transcriber: AI + human review

[00:00:14]
Interviewer: Thank you for agreeing to participate. Can you tell me about, um, when you first realized that — that your approach needed to change?

[00:00:32]
P003: Yeah, well, I mean, it was a, [pause 2s] it was a gradual thing. I didn't, you know, I didn't wake up one day and think "this isn't working." It was more like, um, [laughs] it was more like a series of small failures, you know?

[00:01:08]
Interviewer: Mm-hm. Can you give me an example?

Conventions: verbatim, speaker labels by role with participant ID, paragraph-level timestamps at speaker turns, bracketed notes for pauses and non-verbal behavior, header with study metadata.
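
A minimal sketch of how the layout above could be generated from diarized segments. The input shape (a header dict plus a list of turns with `start`, `speaker`, and `text` keys) is an assumption for illustration; real tools export different structures.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as a bracketed [HH:MM:SS] marker."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"

def render_research_transcript(header, turns):
    """Header lines first, then one timestamped block per speaker turn."""
    lines = [f"{key}: {value}" for key, value in header.items()]
    lines.append("")
    for turn in turns:
        lines.append(format_timestamp(turn["start"]))
        lines.append(f"{turn['speaker']}: {turn['text']}")
        lines.append("")
    return "\n".join(lines).rstrip("\n")

header = {"Interview ID": "P003", "Date": "2026-06-15"}
turns = [
    {"start": 14, "speaker": "Interviewer", "text": "Can you give me an example?"},
    {"start": 32, "speaker": "P003", "text": "Yeah, well, [pause 2s] it was gradual."},
]
print(render_research_transcript(header, turns))
```

Bracketed notes such as [pause 2s] are passed through as plain text here; the sketch does not detect pauses itself.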

Specialized notation: conversation analysis researchers use the Jefferson Transcription System (Gail Jefferson) with specific symbols for overlap ([), latching (=), prolonged sound (::), and emphasis (underlining). German-language conversation analysis commonly uses GAT (Gesprächsanalytisches Transkriptionssystem). These specialized systems sit on top of verbatim transcription: you produce the verbatim first, then annotate. See our qualitative research guide for deeper workflow context.

Legal and court transcripts

Strict verbatim, produced by certified court reporters. Format requirements are set by jurisdiction and, in the US, by the NCRA (National Court Reporters Association), with similar bodies elsewhere. Court reporters have specialized training and certification; this section offers general guidance for understanding the format and is not a substitute for court-reporter standards.

Standard deposition format:

                    DEPOSITION OF JANE DOE
                    Page 47

  1   Q.  Please state your full name for the record.
  2
  3   A.  My name is Jane Doe.
  4
  5   Q.  And what is your current occupation, Ms. Doe?
  6
  7   A.  I'm a software engineer at — at Acme Corporation.
  8
  9   Q.  How long have you held that position?
 10
 11   A.  I've been there for about three years.

Conventions: strict verbatim (including self-corrections marked with em-dash), numbered lines, Q/A format with bold or capitalized speaker indicators, page headers identifying the proceeding, certification page at the end signed by the court reporter.
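
For internal prep copies only, the numbered Q/A layout of the sample page can be approximated with a few lines of code. This is a sketch that mirrors the spacing of the example above; the function name and input shape are assumptions. It does not wrap long answers, paginate, or add headers, and it is not a substitute for a certified record.

```python
def number_deposition_lines(exchanges):
    """exchanges: list of (marker, text) pairs, where marker is "Q" or "A".

    Every line gets a number, including the blank spacer between exchanges,
    matching the sample page layout.
    """
    numbered = []
    line_no = 1
    for index, (marker, text) in enumerate(exchanges):
        numbered.append(f"{line_no:>3}   {marker}.  {text}")
        line_no += 1
        if index < len(exchanges) - 1:
            numbered.append(f"{line_no:>3}")  # numbered blank spacer line
            line_no += 1
    return "\n".join(numbered)

print(number_deposition_lines([
    ("Q", "Please state your full name for the record."),
    ("A", "My name is Jane Doe."),
]))
```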

AI tools and court-admissibility: AI-generated transcripts are not court-admissible as certified records in 2026. They're useful for prep, discovery review, and internal review — but the filed record requires a certified court reporter. See our legal transcription guide for the broader framing of where AI fits and doesn't in legal workflows.

Business meetings and calls

Intelligent verbatim as default. Readability matters more than verbal fidelity. Common pattern: AI-generated summary at the top, action items pulled out separately, full transcript below for reference.

Standard format:

Meeting: Q3 Planning Review
Date: 2026-06-25
Duration: 52 minutes
Attendees: Sarah Chen (VP Marketing), Marcus Rivera (CTO), Jordan Park (PM)

SUMMARY
The team aligned on Q3 priorities: shipping the new analytics dashboard
by end of August, expanding the EU customer success team, and pausing
the redesign of the onboarding flow until Q4.

ACTION ITEMS
- Marcus: finalize analytics dashboard scope by July 5
- Sarah: hiring plan for EU CS team by July 12
- Jordan: communicate redesign delay to design team this week

TRANSCRIPT

Sarah Chen [VP Marketing] (00:00:32):
Let's start with where we are on the analytics dashboard. Marcus, what's
the status?

Marcus Rivera [CTO] (00:00:45):
We're about 70% done with the core data model. The frontend work is going
to be tight — I think we can ship by end of August if we don't take on
any more scope changes.

Conventions: intelligent verbatim, full speaker name with role on first use, sentence-level timestamps optional but useful for navigation, summary and action items at top for executive readers who won't read the whole transcript.

Journalism and interview transcripts

Journalists typically maintain two versions: a verbatim working copy for quote-checking and fact-verification, and an intelligent verbatim or edited version for publication. The verbatim copy is the source of record; the published version is the readable extract.

Standard format (working copy):

Interview: Dr. Maria Santos, NIH
Topic: New approach to chronic fatigue research
Date: 2026-06-20 | Location: Bethesda, MD | Duration: 1:14:32
Reporter: Alex Kim

[00:02:14]
Kim: Tell me about how this approach came together. When did you first
realize that the existing framework wasn't capturing what you were
seeing in patients?

[00:02:38]
Santos: Yeah, it was, um, it was probably 2019. I'd been seeing these
patients for, you know, maybe ten years at that point. And the standard
framework just — it didn't fit. It didn't explain the pattern of
remissions and relapses we were seeing.

[00:03:12]
Kim: When you say "the standard framework," can you spell that out?

[00:03:20]
Santos: Sure. So the model that, that everyone was using assumed a
relatively stable course over time, with, you know, gradual decline or
gradual improvement. But what we were actually seeing was much more
episodic.

Conventions: verbatim style for the working copy preserves the source for quote-verification, timestamps preserved for fact-checking and locating the original audio, last-name-only speakers after first use, header with interview metadata for citation. For published quotes, journalists then produce a clean edited version that maintains the speaker's voice while removing distracting verbal noise.

Podcast and audio content transcripts

Intelligent verbatim or lightly edited for readability. Often optimized for SEO — H2 headers at topical breaks, intro paragraph summarizing the episode, full transcript below with speaker names.

Standard SEO-friendly format:

# How AI Note-Takers Are Changing Meeting Culture
## Episode 47 of the Founder Stack podcast

In this episode, we talked with Sarah Chen about how AI note-takers
are reshaping meeting culture at startups — what works, what doesn't,
and where the technology is heading.

## The starting point: meeting overload

Host: Sarah, thanks for joining us. Let's start at the beginning. What
problem were you trying to solve when you started looking at AI note-takers?

Sarah Chen: We were drowning in meetings. Like every startup that's
grown past 30 people, suddenly half our day was meetings, and nobody
was actually capturing what was decided.

## The first wave of tools

Host: And what was your first experiment?

Sarah Chen: We tried Otter first, in 2024. It was good for English-only
meetings but our team is bilingual, so it didn't really work for us.

SEO considerations: H2 headers at topical breaks help Google understand structure, intro paragraph summarizes for search snippets, speaker names provide context, schema markup (Article, PodcastEpisode if applicable, FAQPage if the episode is Q&A heavy) improves rich-result eligibility. Cross-link from podcast platform pages to the transcript page. See our podcast transcription guide for broader podcaster-specific context.

Video subtitles and captions

Different beast from prose transcripts. Subtitles are time-coded cues with strict character and duration constraints, formatted as SRT (SubRip) or VTT (WebVTT) files. Heavy editing for reading speed is required — viewers can't read as fast as people speak, so cues are condensed.

SRT cue example:

1
00:00:01,500 --> 00:00:04,200
Welcome to the show. Today we're talking
about AI note-takers.

2
00:00:04,201 --> 00:00:07,850
Specifically, how they're changing
how teams work.

3
00:00:07,851 --> 00:00:11,300
[Sarah Chen]: We were drowning
in meetings.

Conventions: max ~80 characters per cue, ~5 seconds per cue typical, max 2 lines per cue, speaker labels in brackets when speaker changes mid-stream, cue boundaries on real word starts and ends (not interpolated). For deeper coverage including the SRT vs VTT format distinction and the captions vs subtitles terminology, see our transcription timestamps reference and captions vs subtitles guide.
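
The constraints above can be sketched as a small cue builder. It assumes word-level timing as a list of dicts with `text`, `start`, and `end` (seconds), which is a hypothetical shape rather than any particular tool's export. The 80-character and 5-second limits come from the conventions listed above; the 40-character line width is our assumption.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def split_two_lines(text, line_width=40):
    """Break a cue into at most two lines at the space nearest the middle."""
    if len(text) <= line_width:
        return text
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return text
    middle = len(text) // 2
    cut = min(spaces, key=lambda i: abs(i - middle))
    return text[:cut] + "\n" + text[cut + 1:]

def build_cues(words, max_chars=80, max_seconds=5.0):
    """Group words into cues; boundaries always fall on real word times."""
    cues, current = [], []
    for word in words:
        candidate = current + [word]
        text = " ".join(w["text"] for w in candidate)
        duration = candidate[-1]["end"] - candidate[0]["start"]
        if current and (len(text) > max_chars or duration > max_seconds):
            cues.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        cues.append(current)
    return cues

def to_srt(words):
    blocks = []
    for number, cue in enumerate(build_cues(words), start=1):
        text = " ".join(w["text"] for w in cue)
        start = srt_timestamp(cue[0]["start"])
        end = srt_timestamp(cue[-1]["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{split_two_lines(text)}")
    return "\n\n".join(blocks) + "\n"
```

The grouping is greedy and ignores sentence boundaries and speaker changes, so a real caption pass would still need the condensing and speaker labeling described above.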

Workflow — raw AI output to publish-ready

A practical step-by-step for converting a raw AI-generated transcript into the format you actually need.

1. Generate the raw AI transcript. Any modern tool (VexaScribe, Otter, Sonix, Notta, Rev AI, self-hosted Whisper) produces something close to intelligent verbatim with speaker diarization. The defaults: AI capitalization, AI punctuation, speakers labeled “Speaker 1, Speaker 2,” paragraph breaks at speaker turns.
2. Choose your target style. Decide upfront: verbatim, intelligent verbatim, or edited? Different styles require different cleanup decisions. Picking wrong wastes time later.
3. Rename speakers. Replace “Speaker 1, Speaker 2” with actual names or roles. For research, use participant IDs. For legal, use Q/A. For journalism, use full names on first use, then last name or initial. This is usually a 2-minute find-and-replace in your editor.
4. Apply style-specific cleanup. For verbatim: typically no further cleanup needed (AI tools may have removed some filler; add it back if needed for fidelity). For intelligent verbatim: lightly clean remaining filler, smooth false starts. For edited: heavily restructure for prose flow.
5. Refine paragraph breaks. AI defaults to breaking at speaker turns. Within long monologues, add breaks at topical shifts for readability. Aim for 2-4 sentence paragraphs in business and journalism; longer paragraphs OK in research transcripts where preserving conversational rhythm matters.
6. Manual review for technical content. AI capitalization, brand names, technical terminology, specialty vocabulary, foreign-language words, and proper nouns require manual verification. Read through once specifically looking for these.
7. Adjust timestamps to your target granularity. Most tools support paragraph-level, sentence-level, or word-level timestamps. Pick by use case (research: paragraph-level; business: optional sentence-level; subtitles: word-level required).
8. Add header metadata. For research, legal, and journalism: date, location, participants, duration, transcriber. For business: meeting name, attendees, action items at top.
9. Export to target format. DOCX for editing and review, PDF for finalization, Markdown for blog/content workflows, SRT/VTT for subtitles, plain TXT for research import to NVivo/ATLAS.ti/MAXQDA.
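
The speaker-renaming step is the easiest one to script. The sketch below assumes labels of the form "Speaker N:" at the start of a line, which is common but not universal; the mapping and function name are hypothetical.

```python
import re

def rename_speakers(transcript, names):
    """Swap generic diarization labels for real names, roles, or IDs.

    names maps a generic label such as "Speaker 1" to its replacement.
    Labels missing from the mapping are left untouched.
    """
    def swap(match):
        label = match.group(1)
        return names.get(label, label) + ":"

    # Anchoring on the full "Speaker <digits>:" label means "Speaker 1"
    # never clobbers the front of "Speaker 10", which a plain
    # find-and-replace would do.
    return re.sub(r"^(Speaker \d+):", swap, transcript, flags=re.MULTILINE)

raw = "Speaker 1: Can you give me an example?\nSpeaker 2: Sure."
print(rename_speakers(raw, {"Speaker 1": "Interviewer", "Speaker 2": "P003"}))
```

Diarization sometimes assigns one person two labels or merges two people into one, so a quick listen is still worth it before trusting the mapping.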

Time estimate: for one hour of audio, expect 15-30 minutes of cleanup for intelligent verbatim, 30-60 minutes for edited (with prose restructuring), and 5-10 minutes for verbatim when the tool kept the filler (mostly speaker renaming and metadata); restoring filler the tool dropped means re-listening and takes longer. The initial AI transcription itself takes 5-10 minutes.

Tool-specific format notes

Different downstream tools expect different formats. Quick reference:

● Microsoft Word and Google Docs: DOCX export from your transcription tool typically works well. Use Heading styles (H1, H2) for sections so the navigation pane works. For long documents, consider adding a clickable table of contents.
● Markdown (blog and content workflows): use ## for section headers, ** for speaker names in bold, > for quoted callouts. Most static site generators handle Markdown transcripts natively.
● NVivo: import as plain TXT or DOCX. NVivo's “Transcribed Source” format supports time-coded segments — export from your tool with sentence-level timestamps for best results. NVivo can also auto-detect speakers from labeled transcripts.
● ATLAS.ti: imports DOCX with speaker labels preserved. Time codes in the transcript link back to the audio file if you import both.
● MAXQDA: imports DOCX or specialized MAXQDA transcript format. Supports paragraph-level time coding linked to audio.
● SRT and VTT (subtitles): word-level timing required for high-quality cue boundaries. Most modern AI tools (VexaScribe, AssemblyAI, Deepgram, Whisper-based tools) produce word-level timing internally — check that your export uses it for SRT/VTT generation.
● Notion, Obsidian, and content workflows: Markdown works natively. For Notion, you can also paste DOCX directly. Speaker labels in bold or as toggle blocks improve scannability.
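
For the Markdown route, a minimal sketch of the conventions listed above (## for section headers, ** for bold speaker names). The input structure, a title plus sections of (speaker, text) turns, is an assumption for illustration.

```python
def to_markdown(title, sections):
    """sections: list of (heading, turns); turns: list of (speaker, text)."""
    parts = [f"# {title}", ""]
    for heading, turns in sections:
        parts.append(f"## {heading}")
        parts.append("")
        for speaker, text in turns:
            parts.append(f"**{speaker}:** {text}")
            parts.append("")
    # Exactly one trailing newline keeps diffs and static site builds tidy.
    return "\n".join(parts).rstrip("\n") + "\n"

print(to_markdown(
    "How AI Note-Takers Are Changing Meeting Culture",
    [("The starting point: meeting overload",
      [("Host", "Sarah, thanks for joining us."),
       ("Sarah Chen", "We were drowning in meetings.")])],
))
```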

Common mistakes

Patterns we see repeatedly when reviewing reader-submitted transcripts:

1. Over-cleaning a verbatim transcript

Impact: Loses the research/legal value — the filler words and hesitations ARE the data in qualitative research and the verbatim record in legal contexts

2. Under-cleaning an intelligent verbatim transcript

Impact: Reader struggles with too much filler and false-start noise; defeats the purpose of producing readable content

3. Inconsistent speaker labels across pages

Impact: "Sarah Chen" on page 3, "S. Chen" on page 7, "SC" on page 12 — readers lose track; in legal contexts can be cited as an error

4. Missing timestamps when fact-checking is needed

Impact: Journalism quote-checking and legal verification need timestamps to locate the original audio; removing them too early forces re-transcription if a question arises

5. Inconsistent filler word handling within the same transcript

Impact: Reader perceives bias or arbitrary editing; in research, undermines methodological transparency

6. Trusting AI capitalization and punctuation on technical content

Impact: Technical terms, brand names, abbreviations, and specialty vocabulary often get capitalized wrong by AI; manual review required for any publication

7. Paragraph breaks only at speaker turns

Impact: Within long monologues, AI defaults often produce wall-of-text paragraphs; add topical breaks for readability

8. Removing pauses in research transcripts

Impact: In qualitative research, pauses are meaningful (hesitation, reflection, conversational floor); marking them in brackets is standard

9. Applying one format to all use cases

Impact: Research-grade verbatim is wrong for a blog post; clean-read edited is wrong for legal depositions; match the format to the use case
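
Mistake 3 (inconsistent speaker labels) is one of the few that can be caught mechanically. The sketch below groups labels by their initials and reports any group with more than one spelling. It is a heuristic under stated assumptions: labels sit at the start of a line and end in a colon, and only the transcript body is passed in, since header lines such as "Date:" would otherwise be mistaken for labels.

```python
import re
from collections import defaultdict

# A label is assumed to be a short run of letters, dots, apostrophes,
# spaces, or hyphens at the start of a line, followed by a colon.
LABEL = re.compile(r"^([A-Za-z][A-Za-z.' -]{0,39}):\s", flags=re.MULTILINE)

def initials(label):
    """Reduce "Sarah Chen", "S. Chen", and "SC" to the same key, "SC"."""
    tokens = label.replace(".", " ").split()
    if len(tokens) == 1 and tokens[0].isupper():
        return tokens[0]
    return "".join(token[0].upper() for token in tokens)

def find_label_variants(body):
    """Return {initials: [labels]} for every key with more than one spelling."""
    groups = defaultdict(set)
    for label in LABEL.findall(body):
        groups[initials(label)].add(label)
    return {key: sorted(labels) for key, labels in groups.items() if len(labels) > 1}

body = "Sarah Chen: Let's start.\nMarcus Rivera: Sure.\nS. Chen: Next item.\nSC: Done.\n"
print(find_label_variants(body))
```

Two different people who share initials will be flagged too, so read the result as a list of things to check rather than a list of errors.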

Frequently asked questions

What's the difference between verbatim and intelligent verbatim transcripts?

Verbatim captures every word — including filler words ("um," "uh," "like"), self-corrections, false starts, repetitions, and incomplete sentences. Used when the verbal behavior itself is data: qualitative research, court reporting, journalism quote-checking, conversation analysis. Intelligent verbatim preserves meaning but removes most non-substantive filler — a few "um"s removed, meaningless repetitions collapsed, false starts cleaned up, but the speaker's voice and meaningful pauses preserved. Used as the default for business meetings, content publishing, journalism, and accessible reading. The line between them is judgment: anything that conveys hesitation, emphasis, or meaning stays; pure verbal noise gets cleaned. For most transcription tools, intelligent verbatim is the implicit default — AI tools typically remove some filler automatically.

Should I keep filler words like 'um' and 'uh' in a transcript?

Depends on the use case. Keep filler words when: (1) doing qualitative research where verbal hesitation is data (conversation analysis, discourse analysis, language acquisition research); (2) producing a legal verbatim transcript for court or deposition use; (3) journalism quote-checking where exact wording matters; (4) linguistics or sociolinguistics research. Remove filler words when: (1) producing a published blog post or article from interview content; (2) meeting transcripts for executive review (readability matters more than fidelity); (3) podcast show notes and SEO-friendly content versions; (4) accessibility transcripts where the goal is readable comprehension. Default decision: if you're publishing the transcript for reading, remove filler. If you're using the transcript as research data or legal record, keep filler.

How do I add or remove timestamps from a transcript?

Most transcription tools support this as an export option. To add timestamps: re-export from your tool with the "include timestamps" option toggled on; choose granularity (word-level, sentence-level, paragraph-level). To remove timestamps: re-export with the option toggled off; or use find-and-replace with regex if you only have the file. For deeper guidance on the four timestamp formats (SRT comma-decimal, VTT period-decimal, JSON fractional seconds, TXT brackets) and how granularity affects subtitle quality vs reading workflows, see our dedicated guide on transcription timestamps. Word-level matters for subtitle generation and clip extraction; paragraph-level is fine for prose reading; sentence-level is the most common compromise.
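
When all you have is the file, the regex route can look like the sketch below. It assumes timestamps written as [HH:MM:SS] or (HH:MM:SS), optionally with fractional seconds, as in the examples on this page; other tools use other shapes, so adjust the pattern to your export.

```python
import re

# Bracketed or parenthesized clock times, e.g. [00:01:08] or (00:00:32.5).
TIMESTAMP = re.compile(r"[\[(]\d{1,2}:\d{2}:\d{2}(?:[.,]\d{1,3})?[\])]")

def strip_timestamps(text):
    without = TIMESTAMP.sub("", text)
    # "Name (00:00:32):" leaves "Name :" behind; close that gap.
    without = re.sub(r"[ \t]+:", ":", without)
    # Remove trailing spaces, then collapse the blank lines left by
    # timestamps that sat on their own line.
    without = re.sub(r"[ \t]+$", "", without, flags=re.MULTILINE)
    without = re.sub(r"\n{3,}", "\n\n", without)
    return without.strip("\n")

sample = "[00:00:14]\nInterviewer: Hi.\n\n[00:00:32]\nP003: Hello."
print(strip_timestamps(sample))
```

Work on a copy: as noted under common mistakes, removing timestamps too early makes later fact-checking harder.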

What's the standard format for an interview transcript?

Depends on the field. For qualitative research (NVivo, ATLAS.ti, MAXQDA), the convention is a header with date, location, and interviewer/participant identifiers, then speaker-labeled paragraphs with timestamps every 30-60 seconds or at speaker turns. Verbatim style is preferred: filler words preserved, pauses and non-verbal sounds noted in brackets ([pause], [laughs]). For journalism, the convention is a header with interview metadata (date, subject, location, length), then speaker-labeled, timestamped paragraphs; the working copy stays verbatim for quote-checking, and an intelligent verbatim or edited version is produced for publication. For legal depositions, the convention is strict verbatim with numbered lines, Q/A format, and a certification page (handled by court reporters, not general tools). For business interviews and content production, intelligent verbatim with speaker names is typical. No single universal standard exists; check your discipline's or organization's specific conventions.

What's the difference between a research transcript and a business transcript?

Research transcripts (qualitative research, social science, linguistics): typically verbatim style, speaker-labeled by role (Interviewer/Participant), paragraph-level timestamps, special notation for pauses and overlapping speech, formatted for import into NVivo, ATLAS.ti, or MAXQDA. The verbal behavior is data — filler words, hesitations, and corrections are preserved because they carry meaning. Business transcripts (meetings, calls, internal): typically intelligent verbatim, speaker-labeled by name and role, sentence-level timestamps, AI-generated summary often included at the top, action items pulled out separately. Readability and decision-tracking matter more than verbal fidelity. The fundamental difference: research treats the audio AS data; business treats the audio as a SOURCE of data. Different goals, different formats.

How do I label speakers in a transcript?

Common conventions, from most formal to most casual. (1) Court reporting and legal: Q for question, A for answer, plus full names listed in the caption page (e.g., "Q. Please state your name. A. John Smith."). (2) Academic research: roles abbreviated (Interviewer I:, Participant P:, or Researcher R:), with the participant identifier noted in the document header ("P1", "P2" for de-identified studies). (3) Journalism: full names on first use, then last name or initials thereafter ("Smith said..." or "S:"). (4) Business: full names with role ("Sarah Chen, VP Marketing:") for executive readers; first names alone for informal team minutes. (5) AI defaults: most AI tools produce "Speaker 1, Speaker 2" and require manual renaming. Consistency matters more than which convention you pick — use the same labeling style throughout a single transcript.

Can AI generate properly formatted transcripts out of the box?

Mostly intelligent verbatim by default, with structural decisions (speaker labels, paragraph breaks, timestamps) configurable but not always perfect. Most AI tools (VexaScribe, Otter, Sonix, Notta) generate something close to intelligent verbatim automatically — they remove some filler words, capitalize sentences, add punctuation, identify speakers via diarization. What requires manual work: speaker renaming ("Speaker 1" to actual names), paragraph break refinement (AI may break at speaker turns when topic shifts would be better), special formatting decisions (verbatim style choice, timestamp granularity, header metadata). For research workflows requiring true verbatim with conversation analysis notation, no current AI tool produces the final format automatically — you start with AI output and refine to the specialized notation. For business and journalism workflows, AI defaults are usually 80% there.

What's the best format for a podcast transcript for SEO?

Three components matter for SEO. (1) Structural content: an H1 with the episode title, a brief introductory paragraph summarizing the episode, then the transcript proper with H2 section headers at topic breaks (use AI-generated chapter markers as a starting point). Speakers labeled with names. (2) Style: intelligent verbatim or lightly edited, meaning readable prose rather than raw verbatim. Filler removed. Long monologues broken into 2-4 sentence paragraphs. (3) Schema: include FAQPage schema if the episode covers Q&A content, Article schema for the transcript page, and consider PodcastEpisode schema if you have multiple episodes. Cross-link from your podcast platform pages to the transcript page. Many podcasters underestimate the SEO value: Google indexes transcripts, so you can rank for long-tail keywords from your own content.
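
As an illustration of the schema component, the sketch below builds a small JSON-LD object using the schema.org PodcastEpisode and PodcastSeries types. The URL is a placeholder and the function name is hypothetical; which properties a given search engine actually reads is not something this sketch can promise, so validate the output with the search engine's own testing tool.

```python
import json

def podcast_episode_jsonld(series_name, episode_name, episode_number, url):
    """Build a minimal JSON-LD description of one podcast episode page."""
    data = {
        "@context": "https://schema.org",
        "@type": "PodcastEpisode",
        "name": episode_name,
        "episodeNumber": episode_number,
        "url": url,  # placeholder; use the transcript page's canonical URL
        "partOfSeries": {"@type": "PodcastSeries", "name": series_name},
    }
    return json.dumps(data, indent=2)

print(podcast_episode_jsonld(
    "Founder Stack",
    "How AI Note-Takers Are Changing Meeting Culture",
    47,
    "https://example.com/episodes/47",
))
```

The resulting string would typically be embedded in the page inside a script element of type application/ld+json.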

Methodology & disclosure

Sources: Jefferson Transcription System references draw on Gail Jefferson's foundational conversation analysis methodology. GAT (Gesprächsanalytisches Transkriptionssystem) is the German conversation analysis convention. NCRA (National Court Reporters Association) standards verified against ncra.org. CAQDAS software documentation verified against the official sites for NVivo, ATLAS.ti, and MAXQDA. SRT and WebVTT format conventions verified against the W3C WebVTT 1.0 specification.

Disclosure: This page is published by VexaScribe. The formatting conventions described are not VexaScribe inventions — they're established standards across disciplines. We've described them clearly because the existing reference material is fragmented across disciplines and we wanted a single page covering the cross-discipline picture. Where we mention VexaScribe specifically (in the workflow section), the same workflow applies to any modern AI transcription tool.

Editorial standards: See our editorial standards.