Key takeaways
- Transcription method matters: verbatim for conversation analysis, intelligent verbatim for thematic analysis and grounded theory, clean read for narrative summaries.
- Jeffersonian notation is the standard for conversation analysis (CA); AI doesn't produce it directly but can be a starting point for manual notation, saving 60-70% of pre-notation work.
- AI has documented bias: Koenecke et al. (PNAS 2020) found 35% WER for Black speakers vs 19% for white speakers across major commercial ASR systems.
- IRB/GDPR consent must explicitly cover third-party AI transcription services and data flow; HIPAA research requires a BAA.
- CAQDAS-friendly exports: DOCX with consistent speaker labels works in NVivo, MAXQDA, ATLAS.ti, Dedoose, Quirkos; SRT/VTT enables timestamp sync in MAXQDA and ATLAS.ti.
- Manual transcription costs 4-6 hours per audio hour; AI runs at 4-10× real-time at ~$0.20-$0.60/hour.
- Document AI use in methods sections: disclose tool, verification process, accuracy limitations.
- Verify every transcript against audio before coding: AI hallucination is a documented risk.
Transcription method by analytical framework
The right transcription method depends on what you'll do with the transcript. Conversation analysis needs full prosodic detail; thematic analysis works from cleaned content. Forcing one method on every project either wastes effort (over-transcribing for thematic work) or loses analytical purchase (under-transcribing for CA).
| Analytical framework | Recommended method | Rationale |
|---|---|---|
| Conversation analysis (CA) | Full Jeffersonian notation | Analyzes turn-taking, overlap, pause timing, prosody |
| Discourse analysis | Modified Jefferson or near-verbatim | Linguistic features and interactional detail matter |
| Interpretative phenomenological analysis (IPA) | Verbatim with pauses, emotional markers | Meaning-making in delivery is central |
| Thematic analysis (Braun & Clarke) | Intelligent verbatim | Content > delivery for coding themes |
| Grounded theory | Intelligent verbatim | Code from cleaned content; theoretical sampling iterative |
| Phenomenology | Verbatim or intelligent verbatim | Varies by tradition (descriptive vs hermeneutic) |
| Narrative inquiry | Intelligent verbatim | Story structure preserved, smoothed delivery |
| Ethnography | Mixed — selective verbatim + field notes | Context-dependent; field notes carry analytical weight |
| Oral history | Strict verbatim, OHA standards | Archival quality required for permanent record |
Methodology literature note: Braun & Clarke (thematic analysis) emphasize matching transcription depth to analytical aims rather than over-transcribing as a default. Gibbs (2007) and Poland (1995) frame similar tiered approaches but don't share a single canonical numbered framework — "Levels 1-4" terminology varies across authors.
Verbatim vs intelligent verbatim vs clean read
Three commonly-used transcription styles in qualitative research. Each captures different levels of speech detail and fits different analytical needs.
| Style | What it captures | Typical uses | AI fit |
|---|---|---|---|
| Strict verbatim | Every utterance, fillers, stutters, false starts, repetitions, non-verbal sounds, partial words | Conversation analysis, IPA, oral history, court transcripts | AI output is a starting point; Whisper-style models often omit fillers and false starts, so restore them manually against the audio |
| Intelligent verbatim | All meaning-bearing speech; filler words and stutters removed | Thematic analysis, grounded theory, narrative inquiry, most applied qualitative research | Best AI fit — minimal manual editing required |
| Clean read / edited | Smoothed grammar, polished prose, content-only | Executive summaries, public-facing research outputs, content summaries | Use AI summary feature instead of transcript |
Naturalized vs denaturalized: these terms appear in qualitative methodology literature with opposite definitions depending on the source. Oliver, Serovich & Mason (2005) describe naturalized transcription as capturing every utterance in detail (similar to strict verbatim) and denaturalized as removing idiosyncratic elements. Bucholtz (2000) uses the terms in reverse. When citing or writing methods, specify which framework you follow rather than assuming readers share your definition.
Jeffersonian notation for conversation analysis
Developed by sociologist Gail Jefferson, the Jeffersonian transcription system is the standard for conversation analysis (CA) and many discourse analysis projects. It captures interactional details — overlap, pause timing, prosody, breathing — that are lost in standard transcription but central to CA's analytical concerns.
Key Jeffersonian symbols
| Symbol | Meaning |
|---|---|
| [ ] | Overlapping speech (square brackets mark where overlap starts/ends) |
| = | Latching (one speaker continues immediately, no gap) |
| (0.5) | Timed pause in seconds and tenths |
| (.) | Micro-pause shorter than 0.2 seconds |
| : | Sound elongation (more colons = longer) |
| underline | Stress or emphasis on syllable |
| °word° | Quiet speech (lowered volume) |
| >word< | Speeded up speech |
| <word> | Slowed speech |
| hh | Audible exhalation (more h's = longer) |
| .hh | Audible inhalation |
| ↑ ↓ | Pitch shift up or down |
| (( )) | Transcriber's non-verbal description (e.g., ((laughs))) |
| (word) | Uncertain hearing; transcriber's best guess |
Sample Jeffersonian transcript
A: I [really- ]
B: [yeah no I get it]=
A: =yeah (0.5) it's just (.) hh ((sighs)) °hard°
B: ↑right (.) so what do you do
A: I just (0.3) >try to figure it out< on my own
AI and Jeffersonian notation. AI transcription does not produce Jeffersonian notation directly. Even Whisper Large-v3 outputs a clean text transcript without interactional markup. What AI can do for CA researchers: produce the base transcript that you then mark up manually with Jefferson symbols. This saves roughly 60-70% of pre-notation work — typing the words is mechanical; marking the prosody is the analytical labor. Several CA researchers report using AI for the first pass then spending 2-3 hours per audio hour on Jefferson notation, vs 6-8 hours for fully manual transcription plus notation.
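One way to get more out of the AI first pass is to pre-mark candidate pauses from word timestamps before the manual Jefferson pass. The sketch below is a minimal illustration under stated assumptions: the word dictionaries (`word`, `start`, `end`) are a hypothetical input shape rather than any specific tool's export, the 0.2-second micro-pause boundary comes from the symbol table above, and the 0.1-second floor for ignoring tiny gaps is an arbitrary choice. ASR timestamps are imprecise, so every marker it inserts is a prompt to re-time by ear, not a finished notation.

```python
from typing import Dict, List


def pause_marker(gap_seconds: float, min_gap: float = 0.1,
                 micro_threshold: float = 0.2) -> str:
    """Return a Jefferson-style pause marker for a silence of the given length."""
    if gap_seconds < min_gap:
        # Too short to mark (or negative, i.e. possible overlap or latching):
        # left for the analyst to judge from the audio.
        return ""
    if gap_seconds < micro_threshold:
        return "(.)"                     # micro-pause
    return f"({gap_seconds:.1f})"        # timed pause in seconds and tenths


def premark_pauses(words: List[Dict]) -> str:
    """Join timestamped words, inserting candidate pause markers between them.

    Each item is assumed to look like {"word": "yeah", "start": 0.0, "end": 0.4}.
    """
    parts: List[str] = []
    previous_end = None
    for item in words:
        if previous_end is not None:
            marker = pause_marker(item["start"] - previous_end)
            if marker:
                parts.append(marker)
        parts.append(item["word"])
        previous_end = item["end"]
    return " ".join(parts)


if __name__ == "__main__":
    demo = [
        {"word": "yeah", "start": 0.0, "end": 0.4},
        {"word": "it's", "start": 0.9, "end": 1.1},
        {"word": "just", "start": 1.25, "end": 1.5},
        {"word": "hard", "start": 1.52, "end": 1.9},
    ]
    print(premark_pauses(demo))
```

Overlap, latching, elongation, volume, and pitch are deliberately left out: none of them can be read reliably from timestamps alone.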
Reference: Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation analysis: Studies from the first generation (pp. 13-31). John Benjamins.
AI transcription accuracy and demographic bias
Documented bias in commercial automated speech recognition (ASR) systems is a critical methodological consideration for qualitative researchers — particularly for research with populations affected by acoustic model gaps.
Koenecke et al. (2020) — racial disparities in commercial ASR
Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., & Goel, S. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684-7689.
Study design: Tested five major commercial ASR systems (Amazon, Apple, Google, IBM, Microsoft) on 19.8 hours of audio from 42 white and 73 Black speakers.
Key findings:
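- Average word error rate (WER): ~35% for Black speakers vs ~19% for white speakers
- System-level range: Black-speaker WER ran from about 27% on the best-performing system to about 45% on the worst; white-speaker WER ran from about 15% to about 23%
- Disparity attributed to acoustic models trained on insufficient African American Vernacular English (AAVE) data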
Graham & Roll (2024) — Whisper accent bias
Graham, C., & Roll, N. (2024). Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits. JASA Express Letters, 4(2), 025206.
Found systematic accuracy differences in OpenAI Whisper: higher accuracy for American than for British or Australian English, for native than for non-native speakers, and for read than for conversational speech. Disparities are smaller than the Koenecke et al. findings on older commercial ASR but remain meaningful for research conclusions on accented populations.
Implications for qualitative research:
- Document AI bias in methods sections. Cite Koenecke et al. (2020) and discuss whether your participant population is affected.
- Verify every transcript against audio. Don't trust AI blindly; manual verification is the documented standard.
- Consider human transcription for affected populations. For research on Black speakers, AAVE users, heavily accented English, or non-native speakers where accuracy is central to findings.
- AI hallucination is a documented risk. AI can invent content during silence or unclear audio. Manual review is essential.
- Report observed accuracy when accuracy is central. If your findings depend on what participants said precisely, report your verification process and observed error rates by participant demographic.
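Reporting observed error rates by demographic group requires a WER figure per group. WER is the word-level edit distance (substitutions + deletions + insertions) between the verified transcript and the AI transcript, divided by the number of words in the verified transcript. The sketch below is one minimal way to compute it with the standard library. The sample dictionaries (`group`, `reference`, `hypothesis`) are a hypothetical layout, and the only normalization applied is lowercasing, so strip punctuation first if your two transcripts punctuate differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1        # reference word missing from AI output
            insertion = current[j - 1] + 1    # extra word in AI output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1], len(ref)


def wer_by_group(samples: List[Dict[str, str]]) -> Dict[str, float]:
    """Pool errors and reference words within each group, then divide."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for sample in samples:
        distance, length = word_errors(sample["reference"], sample["hypothesis"])
        errors[sample["group"]] += distance
        words[sample["group"]] += length
    return {group: errors[group] / words[group] for group in words if words[group]}


if __name__ == "__main__":
    samples = [
        {"group": "A", "reference": "the cat sat", "hypothesis": "the cat sat"},
        {"group": "B", "reference": "we went home early", "hypothesis": "we want home"},
    ]
    print(wer_by_group(samples))
```

Pooling errors within a group before dividing weights long interviews more heavily than short ones; averaging per-interview WER instead is an equally defensible choice, as long as the methods section says which was used.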
For broader accuracy methodology, including LibriSpeech and FLEURS benchmarks, see "How accurate is Whisper?"
IRB, GDPR, and consent for AI transcription
Third-party AI transcription introduces specific compliance questions that must be addressed in your IRB protocol, informed consent form, and (for EU researchers) data processing documentation.
US IRB considerations
- Consent form disclosure. Name the third-party vendor (or describe the vendor category if the vendor may change). If your protocol promised "no third-party sharing", using cloud transcription violates it unless the protocol is amended first.
- Data flow disclosure. Where audio is processed, where transcripts are stored, retention timeline, deletion process.
- Confidentiality agreement. Documented commitment that the vendor will not use participant data for purposes beyond transcription.
- Protocol amendments. Switching transcription vendors mid-study typically requires an IRB amendment.
HIPAA considerations (US health research)
If your audio contains Protected Health Information (PHI), the transcription vendor becomes a Business Associate and a Business Associate Agreement (BAA) is required before any PHI-containing audio leaves your environment.
Honest disclosure: VexaScribe is not currently HIPAA-certified and does not sign BAAs. PHI-affected research should use HIPAA-compliant vendors: Rev offers BAAs on its Enterprise tier, Verbit offers BAAs, and so do some institutional transcription services. For research involving health information that does not meet the PHI threshold (anonymized recordings, public health research without identifiable data), the BAA requirement may not apply; consult your IRB and institutional compliance office.
GDPR considerations (EU research)
- Lawful basis: typically informed consent for research purposes (Article 6(1)(a)) or legitimate interest (Article 6(1)(f)) with appropriate safeguards.
- Article 28 Data Processing Agreement (DPA): required with any third-party processor.
- EU data residency: preferred when possible. VexaScribe stores data in AWS eu-west-2 (London), which is in the UK and therefore outside the EU/EEA. Transfers from the EEA to the UK rely on the EU's adequacy decision for the UK, so confirm that it is still in force and acceptable to your ethics committee.
- Cross-border transfer safeguards: Standard Contractual Clauses (SCCs) or adequacy decisions for data leaving the EEA.
- Right to withdraw: participant can request data deletion at any time.
- Retention policy: aligned with your institutional ethics approval, typically 3-7 years for de-identified research transcripts.
Sample IRB protocol language
"Audio recordings will be transcribed using VexaScribe, an AI-based transcription service that uses the Whisper Large-v3 model. Audio files will be uploaded over TLS 1.2+ encrypted connection and stored encrypted at rest in AWS eu-west-2. Recordings will be deleted from the service within [N] days of transcription. Transcripts will be manually verified by the research team against original audio before analysis. VexaScribe does not train models on user audio per their stated policy. A Data Processing Agreement will be in place with VexaScribe prior to processing participant data."
CAQDAS integration: NVivo, MAXQDA, ATLAS.ti, Dedoose, Quirkos, Taguette
Computer-Assisted Qualitative Data Analysis Software (CAQDAS) is where your transcripts will live during analysis. Import format compatibility matters — and a few CAQDAS tools handle timestamped transcripts better than others.
| Tool | Owner | Import formats | Timestamp sync | Speaker labels |
|---|---|---|---|---|
| NVivo 14 | Lumivero | DOCX, RTF, TXT, PDF | Via CSV/TSV format | Yes (consistent label format) |
| MAXQDA | VERBI | DOCX, RTF, TXT, PDF, SRT, VTT | Native auto-sync (SRT/VTT) | Yes |
| ATLAS.ti | Scientific Software | DOCX, RTF, TXT, PDF, SRT, VTT | Yes (SRT/VTT or RTF with timecodes) | Yes |
| Dedoose | SocioCultural Research | DOCX, TXT, spreadsheets | Limited | Yes (basic) |
| Quirkos | Quirkos | DOCX, ODT, TXT, RTF, PDF, XLSX | No | Yes |
| Taguette | OSS (Taguette project) | PDF, DOCX, TXT, HTML, EPUB, MOBI, ODT, RTF | No | Yes |
Most portable format: DOCX with consistent speaker labels (e.g., Speaker 1: on a new line). Speaker labels survive import in all six tools listed above when formatted consistently.
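Label consistency is easy to enforce mechanically before import. The following is a minimal sketch that rewrites common variants such as `SPEAKER 1:`, `speaker2:` or `Speaker 01:` at the start of a line into a single `Speaker N: ` form. The variant list is an assumption about what transcripts tend to contain, not a description of any tool's output; named labels (for example `Interviewer:`) are left untouched.

```python
import re

# Matches "speaker", optional spaces, a number, optional spaces, and a colon
# at the start of a line, in any letter case.
LABEL = re.compile(r"^\s*speaker\s*(\d+)\s*:\s*", re.IGNORECASE)


def normalize_speaker_labels(text: str) -> str:
    """Rewrite leading speaker labels to the form 'Speaker N: '."""
    normalized = []
    for line in text.splitlines():
        normalized.append(
            LABEL.sub(lambda m: f"Speaker {int(m.group(1))}: ", line, count=1)
        )
    return "\n".join(normalized)


if __name__ == "__main__":
    raw = "SPEAKER 1: hello\nspeaker2: hi\nSpeaker 01:   yes\nno label here"
    print(normalize_speaker_labels(raw))
```

Running this over the plain-text export and then pasting the result into the DOCX keeps one label per speaker, which is what the import step in most CAQDAS tools keys on.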
For timestamp sync: SRT or VTT — MAXQDA and ATLAS.ti both auto-sync these formats for clip-based coding. NVivo can use CSV/TSV with timecodes.
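If your transcription tool gives you timestamped segments but no SRT file, the format is simple enough to write yourself: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, the text, and a blank line between entries. The sketch below assumes a hypothetical segment layout (`start` and `end` in seconds, plus `speaker` and `text`); adapt the keys to whatever your JSON export actually contains.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text from segments, keeping the speaker label in each cue."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        text = f"{seg['speaker']}: {seg['text'].strip()}"
        blocks.append(f"{index}\n{time_line}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "speaker": "Speaker 1", "text": "Hello there"},
        {"start": 2.5, "end": 3661.5, "speaker": "Speaker 2", "text": " Hi "},
    ]
    print(segments_to_srt(demo))
```

Keeping the speaker label inside the cue text means the label survives even in tools that treat SRT purely as timed text.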
REFI-QDA standard: The Rotterdam Exchange Format Initiative for Qualitative Data Analysis enables project-level interchange between NVivo, MAXQDA, ATLAS.ti, Quirkos, Dedoose, and Taguette. Useful when collaborating across institutions with different CAQDAS preferences.
VexaScribe export: All four formats (TXT, DOCX, JSON, SRT) from a single transcription — no re-processing required. DOCX with speaker labels is the most portable for general CAQDAS workflows; SRT for MAXQDA/ATLAS.ti timestamp sync.
Citation conventions for AI-transcribed sources
As of 2026, APA, Chicago, and MLA have no transcription-specific citation entry. Current practice — per the APA Style Blog and most institutional library guides — treats AI tools like software references. Field-wide standards are still emerging.
APA 7 style reference (software treatment)
OpenAI. (2024). Whisper (large-v3) [Speech recognition model]. https://openai.com/whisper
In-text citation: (OpenAI, 2024). For transcription via a service that uses Whisper, also cite the service: Transcription was performed using VexaScribe (Whisper Large-v3; OpenAI, 2024).
Sample methods-section disclosure
"Audio recordings of [N] semi-structured interviews were transcribed using VexaScribe (Whisper Large-v3) and verified verbatim against original recordings by the first author. Transcription accuracy was estimated at approximately 95% on clean audio; all transcripts were manually corrected before coding. We acknowledge documented racial disparities in commercial ASR accuracy (Koenecke et al., 2020) and conducted enhanced verification on interviews with [specific population] participants. VexaScribe's stated privacy policy is that the service does not train models on user audio. A Data Processing Agreement was in place prior to transcription."
Honest note: Citation conventions for AI-assisted transcription are field-emerging. Expect this guidance to evolve as journals develop specific requirements. Consult your target journal's author guidelines and your discipline's methodology literature for current best practice.
Pricing: AI vs human vs in-house
Verified 2026 rates for major research-transcription services. Per-audio-hour pricing varies dramatically: at the rates below, VexaScribe is roughly 80-600× cheaper than human transcription ($0.20-$0.60 vs $47-$119 per audio hour), while human transcription delivers 99%+ accuracy and BAA availability for HIPAA-affected research.
| Service | Type | Per audio hour | Per minute | Notes |
|---|---|---|---|---|
| VexaScribe | AI | $0.20-$0.60 | ~$0.003-$0.01 | Whisper Large-v3, 99 languages, no model training on user audio |
| Rev (AI) | AI | ~$15 | $0.25/min | Pay-as-you-go, no subscription |
| Rev (Human) | Human | ~$119 | $1.99/min | 12-48hr turnaround, 99%+ accuracy |
| TranscribeMe (AI) | AI | ~$4 | $0.07/min | Budget AI tier |
| TranscribeMe (Human) | Human | ~$47-$48 | $0.79/min | Budget human tier; longer turnaround |
| GoTranscript (Human) | Human | ~$59 | $0.99/min | Mid-market human transcription |
| Verbit | Mixed | Enterprise custom | Custom | Enterprise-only; broadcast and legal-grade |
| Self-hosted Whisper | AI | $0 | $0 | No licensing cost; requires your own GPU and Python skills |
Cost math by project scale
- PhD thesis (15-30 interviews × 60 min each): 15-30 hours audio = $3-$18 AI vs roughly $1,800-$3,600 human transcription at Rev's ~$119/hour.
- Multi-year longitudinal study (100+ interviews): at 100 audio hours, $20-$60 AI vs roughly $12,000 human at the same rate; both scale linearly with additional hours.
- UX research sprint (10 customer interviews × 45 min): 7.5 hours audio = $1.50-$4.50 AI vs roughly $900 human at the same rate.
- Manual self-transcription cost: 4-6 hours of researcher time per audio hour. For a 20-hour study, that's 80-120 hours of work, typically 2-3 weeks of full-time labor.
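The arithmetic behind these estimates is linear in audio hours, so it is easy to rerun for your own project. The sketch below is a minimal calculator, assuming the VexaScribe range ($0.20-$0.60/hour) and Rev's human rate ($1.99/min, about $119.40/hour) from the pricing table, and the 4-6 hours of manual work per audio hour stated above. The function and constant names are illustrative only.

```python
from typing import Dict, Tuple

AI_RATE_PER_HOUR = (0.20, 0.60)           # USD, low and high end of the AI range
HUMAN_RATE_PER_HOUR = 119.40              # USD, $1.99/min human transcription
MANUAL_HOURS_PER_AUDIO_HOUR = (4, 6)      # researcher hours per audio hour


def scale(audio_hours: float, low_high: Tuple[float, float]) -> Tuple[float, float]:
    """Multiply a (low, high) per-hour range by the number of audio hours."""
    low, high = low_high
    return round(audio_hours * low, 2), round(audio_hours * high, 2)


def estimate(interviews: int, minutes_each: float) -> Dict[str, object]:
    """Estimate transcription cost and effort for a set of interviews."""
    audio_hours = interviews * minutes_each / 60
    return {
        "audio_hours": audio_hours,
        "ai_usd": scale(audio_hours, AI_RATE_PER_HOUR),
        "human_usd": round(audio_hours * HUMAN_RATE_PER_HOUR, 2),
        "self_transcription_hours": scale(audio_hours, MANUAL_HOURS_PER_AUDIO_HOUR),
    }


if __name__ == "__main__":
    # A UX sprint: 10 interviews of 45 minutes each.
    print(estimate(10, 45))
```

Swap in a different human rate (for example $47.40/hour for the budget tier) to see how sensitive the comparison is to vendor choice; the AI figures exclude the researcher time spent verifying transcripts.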
For a full transcription cost analysis across 14 tools with an interactive calculator, see "How much does transcription cost?"
Researcher audience segments
Qualitative research isn't one workflow. Five distinct researcher segments use transcription with different priorities, tools, and budgets.
Academic qualitative researchers
PhDs, postdocs, faculty doing IRB-approved interview studies. Recording conditions vary from quiet interview rooms to home video calls. Accuracy is critical because transcripts often become quoted material in publications.
Typical tools: NVivo, MAXQDA, ATLAS.ti — institutional licenses common
Budget: Modest — grant-funded or department-funded; usually $50-$500 per study for transcription
Workflow: AI transcription + manual verification + qualitative coding in CAQDAS
UX researchers
Industry researchers running usability studies, customer interviews, ethnographic fieldwork. Often pressed for speed (one or two days from interview to insight). Strict verbatim usually not required; intelligent verbatim is typical.
Typical tools: Dovetail, Notably, Condens — research-specific platforms with AI summarization. Otter and Rev for raw transcription
Budget: Company-funded; per-study budgets often $100-$1,000 for transcription
Workflow: AI transcription → AI tagging in research platform → highlights and insights for stakeholders
Market researchers (focus groups)
Agency and in-house researchers running focus groups (4-12 participants) and in-depth interviews (IDIs). Crosstalk-heavy audio is a recurring challenge. Vendor-driven workflows are common.
Typical tools: Verbit, Rev enterprise tiers, sometimes Trint
Budget: Project-funded; transcription often 5-15% of total research budget
Workflow: Studio recording → vendor transcription (often human or hybrid) → thematic synthesis
Ethnographers and fieldwork researchers
Anthropologists, sociologists, organizational researchers doing extended fieldwork. Recordings often noisy (public spaces, vehicles, outdoor environments). Selective transcription is the norm — full audio archived, key exchanges transcribed verbatim.
Typical tools: ATLAS.ti, NVivo for the transcribed portions; field notes in Evernote, Obsidian, or paper
Budget: Variable; full transcription often impractical given hours of recording
Workflow: Archival recording → field notes → selective AI transcription of analytically important segments
Oral historians
Researchers creating archival records for permanent collection. Long-form interviews (multiple hours, sometimes multiple sessions). Often involves non-standard accents, elderly speakers, and historical-specific vocabulary. Strict verbatim is the norm per Oral History Association (OHA) standards.
Typical tools: Specialized oral history platforms; OHMS (Oral History Metadata Synchronizer); sometimes manual transcription
Budget: Often grant-funded; archival quality may require human transcription only
Workflow: Audio recording → human transcription (or AI + heavy manual review) → archival deposit with metadata
Tool comparison for research transcription
Six transcription options mapped to research-specific criteria: IRB-friendliness, BAA availability for HIPAA research, CAQDAS-compatible export, and per-audio-hour cost.
| Tool | Best for research use | IRB-friendly | BAA available | CAQDAS export | Per audio hour |
|---|---|---|---|---|---|
| VexaScribe | Academic + UX research, multi-language, intelligent verbatim | Yes (DPA available) | No (not HIPAA-certified) | DOCX/SRT/JSON/TXT | $0.20-$0.60 |
| Rev (AI) | Mixed research workflows, vendor familiarity | Yes | Yes (Enterprise) | DOCX/TXT | ~$15 |
| Rev (Human) | Verbatim accuracy, oral history, PHI research with BAA | Yes | Yes (Enterprise) | DOCX/TXT | ~$119 |
| TranscribeMe | Budget-conscious human transcription | Limited | Limited (Enterprise) | DOCX/TXT | $47-$48 (human) |
| Verbit | Enterprise research, broadcast-grade accuracy | Yes | Yes | Multiple formats | Custom |
| Self-hosted Whisper | Privacy-critical research, technical teams, high volume | Yes (no third-party) | n/a (you control data) | Any (custom export) | $0 |
Decision rule. For most academic and UX research with standard methodology and participants whose speech ASR handles well, VexaScribe or comparable AI is sufficient: $3-$18 for a PhD thesis worth of transcription, plus manual verification. For PHI-affected research, oral history, or research on populations significantly affected by ASR bias, human transcription via Rev or Verbit is appropriate. Self-hosted Whisper fits technical research teams with privacy-critical content (e.g., research with vulnerable populations, security research, sensitive policy work).
Frequently Asked Questions
What's the best transcription method for qualitative research?
Depends on your analytical framework. For thematic analysis (Braun & Clarke), grounded theory, and most narrative inquiry, intelligent verbatim (removing filler words but preserving all meaning-bearing speech) is standard. For conversation analysis (CA) and discourse analysis, Jeffersonian notation with full pauses, overlaps, and prosody markers is required. For interpretative phenomenological analysis (IPA), verbatim with pauses and emotional markers is typical because meaning-making in delivery matters. For oral history, strict verbatim per Oral History Association (OHA) standards is the norm. AI transcription produces a clean transcript that researchers can use directly for thematic and grounded theory work; for CA and discourse analysis, AI output serves as the starting point for manual Jefferson notation.
Can I use AI transcription for IRB-approved research?
Yes, with proper disclosure and consent. Your IRB protocol and informed consent form must name the third-party transcription vendor (or describe the vendor category if vendors may change), disclose the data flow (where audio is processed, where transcripts are stored, retention timeline), and confirm the vendor has appropriate confidentiality terms. If your protocol promised 'no third-party sharing', using cloud transcription violates it unless the protocol is amended first. If your research involves Protected Health Information (PHI), you need a Business Associate Agreement (BAA) with the vendor; VexaScribe is not currently HIPAA-certified, so PHI research should use HIPAA-compliant alternatives. For EU researchers under GDPR, you need an Article 28 Data Processing Agreement (DPA) and ideally EU data residency. Most IRBs approve AI transcription with these safeguards in place.
Is AI transcription accurate enough for academic publication?
Yes, with verification. AI transcription (Whisper Large-v3 and equivalents) achieves 92-97% word accuracy on clean recordings of native English speakers. However, Koenecke et al. (Proceedings of the National Academy of Sciences, 2020) documented significant racial disparities — average WER of 35% for Black speakers vs 19% for white speakers across major commercial ASR systems. For research on populations affected by ASR bias (including Black speakers, AAVE, heavily accented English, non-native English, regional dialects), accuracy gaps are substantial. Best practice: verify every transcript against the original audio before coding, document accuracy limitations in your methods section, and consider human transcription or verification for populations where AI bias is documented.
How do I cite AI transcription in my methods section?
APA, Chicago, and MLA have no transcription-specific citation entry as of 2026. Current practice (per the APA Style Blog) treats AI tools like software. Example APA 7 reference: 'OpenAI. (2024). Whisper (large-v3) [Speech recognition model]. https://openai.com/whisper'. Sample methods-section disclosure: 'Audio recordings were transcribed using VexaScribe (Whisper Large-v3) and verified verbatim against original recordings by the first author. Transcription accuracy was estimated at approximately 95% on clean audio; all transcripts were corrected before coding. VexaScribe does not train models on user audio per their stated policy.' Field-wide citation standards for AI-assisted transcription are still emerging — expect this guidance to evolve.
Will my transcripts import into NVivo, MAXQDA, or ATLAS.ti?
Yes — DOCX with consistent speaker labels imports cleanly into all major CAQDAS tools. NVivo 14 (Lumivero) accepts DOCX, RTF, TXT, PDF; for timestamp sync, CSV/TSV formatted transcripts work. MAXQDA (VERBI) has the strongest auto-sync — it imports SRT and VTT natively with timestamps preserved. ATLAS.ti accepts DOCX, RTF, TXT, PDF and supports timestamped transcripts via SRT/VTT or RTF with timecodes. Dedoose imports DOCX, TXT, and spreadsheets. Quirkos imports DOCX, ODT, TXT, RTF, PDF, XLSX. Taguette (open-source) accepts PDF, DOCX, TXT, HTML, EPUB. VexaScribe exports DOCX, TXT, JSON, and SRT from every transcription — DOCX is the most portable for CAQDAS workflows; SRT/VTT enables timestamp sync in MAXQDA and ATLAS.ti specifically. The REFI-QDA standard allows project-level interchange between major CAQDAS tools.
Does GDPR allow AI transcription of EU participant interviews?
Yes, with proper safeguards. EU researchers transcribing participant audio need: (1) a lawful basis for processing, typically informed consent for research purposes, (2) an Article 28 Data Processing Agreement (DPA) with the transcription vendor, (3) ideally EU data residency for the audio and transcripts, (4) documented cross-border transfer safeguards if data leaves the EEA, (5) participant right to withdraw and request data deletion, (6) retention and deletion policy aligned with your ethics approval. VexaScribe stores data in AWS eu-west-2 (London) with TLS 1.2+ encryption in transit and encryption at rest. Because London is in the UK, outside the EU/EEA, this is a cross-border transfer under item (4); it relies on the EU's adequacy decision for the UK, so confirm that the decision is still in force. For GDPR-strict research, also verify your institutional ethics committee's specific requirements; some EU universities require additional vendor assessments beyond standard DPAs.
How does AI transcription bias affect my research?
Significantly, for research on populations affected by ASR demographic bias. Koenecke et al. (PNAS 2020) found average word error rate of 35% for Black speakers compared to 19% for white speakers across major commercial ASR systems including Amazon, Apple, Google, IBM, and Microsoft. The disparity is attributed to acoustic models trained on insufficient African American Vernacular English (AAVE) data. Graham & Roll (JASA Express Letters 2024) found similar accent-based disparities in Whisper across American vs British/Australian English, native vs non-native speakers, and read vs conversational speech. Implications: document the limitation in your methods section, verify every transcript manually, consider human transcription for affected populations, and report observed accuracy by participant demographic if accuracy is central to your findings.
What's the difference between verbatim and intelligent verbatim?
Verbatim (also 'true verbatim' or 'strict verbatim') captures every utterance — filler words ('um', 'uh'), stutters, false starts, repetitions, non-verbal sounds, partial words. It preserves the full delivery of speech and is required for conversation analysis, IPA, and oral history. Intelligent verbatim (also 'clean verbatim') removes fillers, stutters, and false starts but preserves all meaning-bearing speech and full sentences. It's the standard for thematic analysis, grounded theory, and most applied qualitative research where content matters more than delivery. Clean read (also 'edited transcript') goes further — smoothing grammar and producing polished prose. Use it when summarizing content rather than analyzing speech. Naturalized vs denaturalized terms are used inconsistently across the literature (Oliver, Serovich & Mason 2005 and Bucholtz 2000 use them in reverse senses) — specify your framework when writing methods.
Do I need a BAA if my research involves health information?
Yes, if your research involves Protected Health Information (PHI) and you're operating under HIPAA. The transcription vendor becomes a Business Associate and a Business Associate Agreement (BAA) is required before any PHI-containing audio leaves your secure environment. VexaScribe is not currently HIPAA-certified and does not sign BAAs, so HIPAA-affected research should use a HIPAA-compliant vendor: Rev offers BAAs on its Enterprise tier, Verbit offers BAAs, and so do some institutional transcription services. For research involving health information that does not meet the HIPAA PHI threshold (e.g., wellness research without identifiable health data, public health research with anonymized recordings), the BAA requirement may not apply; consult your IRB and institutional compliance office for the specific determination on your project.
Is human transcription still necessary for some research?
Yes, for specific scenarios. Use human transcription when: (1) your research focuses on populations affected by AI bias (Black speakers, AAVE, heavily accented English, regional dialects) and accuracy is central to findings, (2) you're doing conversation analysis with Jeffersonian notation that requires expert transcribers familiar with CA conventions, (3) your audio has heavy crosstalk or overlap (multi-person focus groups, family interviews) where AI diarization fails, (4) you need court-grade or publication-grade verbatim for legal, medical, or archival oral history work, (5) your IRB protocol or institutional policy requires human verification of all transcripts. The hybrid approach works well for many projects: AI for initial transcription of all recordings, human verification for critical interviews you'll quote directly. This typically costs 5-15% of pure human transcription at 95%+ accuracy on quoted passages.
Methodology & disclosure
Verification window. Pricing for VexaScribe, Rev, TranscribeMe, GoTranscript, Verbit verified against each vendor's pricing page between May 28 and June 2, 2026. CAQDAS import format documentation verified against current vendor documentation (NVivo 14 / Lumivero, MAXQDA / VERBI, ATLAS.ti, Dedoose, Quirkos, Taguette) in the same window.
Peer-reviewed sources. Koenecke, A., Nam, A., Lake, E. et al. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684-7689. Graham, C., & Roll, N. (2024). Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits. JASA Express Letters, 4(2), 025206. Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation analysis: Studies from the first generation (pp. 13-31). John Benjamins.
Conflict of interest. VexaScribe is our product. We've disclosed pricing for every comparable tool and honestly identified scenarios where each competitor wins — Rev (Human) for verbatim oral history and PHI research with BAA, Verbit for enterprise broadcast-grade work, self-hosted Whisper for privacy-critical research teams, TranscribeMe for budget human transcription.
Honest limitations disclosure. (1) VexaScribe is not currently HIPAA-certified — researchers studying PHI should use HIPAA-compliant vendors with available BAA. (2) VexaScribe uses Whisper Large-v3, which is subject to the demographic accuracy disparities documented by Koenecke et al. (2020) and Graham & Roll (2024). (3) AI transcription does not produce Jeffersonian notation directly — CA researchers will need to add notation manually. (4) Field-wide citation conventions for AI-assisted transcription are still emerging.
No affiliate links. VexaScribe does not earn commissions from any of the alternative tools mentioned on this page. Recommendations reflect honest editorial assessment based on documented features, verified pricing, and peer-reviewed methodology literature.
What changed since last update? First publication, June 2, 2026. Future updates will be reflected in the "Verified" badge and datePublished/dateModified schema fields.
Editorial standards. Full disclosure policy at editorial standards.