Transcription for Qualitative Research in 2026: Honest Guide

Key takeaways

•Transcription method matters — verbatim for conversation analysis, intelligent verbatim for thematic analysis and grounded theory, clean read for narrative summaries.
•Jeffersonian notation is the standard for conversation analysis (CA); AI doesn't produce it directly but can be a starting point for manual notation, saving 60-70% of pre-notation work.
•AI has documented bias — Koenecke et al. (PNAS 2020) found 35% WER for Black speakers vs 19% for white speakers across major commercial ASR systems.
•IRB/GDPR consent must explicitly cover third-party AI transcription services and data flow; HIPAA research requires a BAA.
•CAQDAS-friendly exports — DOCX with consistent speaker labels works in NVivo, MAXQDA, ATLAS.ti, Dedoose, Quirkos; SRT/VTT enables timestamp sync in MAXQDA and ATLAS.ti.
•Manual transcription takes 4-6 hours of work per audio hour; AI processes audio 4-10× faster than real time at ~$0.20-$0.60 per audio hour.
•Document AI use in methods sections — disclose tool, verification process, accuracy limitations.
•Verify every transcript against audio before coding — AI hallucination is a documented risk.

Transcription method by analytical framework

The right transcription method depends on what you'll do with the transcript. Conversation analysis needs full prosodic detail; thematic analysis works from cleaned content. Forcing one method on every project either wastes effort (over-transcribing for thematic work) or loses analytical purchase (under-transcribing for CA).

Analytical framework	Recommended method	Rationale
Conversation analysis (CA)	Full Jeffersonian notation	Analyzes turn-taking, overlap, pause timing, prosody
Discourse analysis	Modified Jefferson or near-verbatim	Linguistic features and interactional detail matter
Interpretative phenomenological analysis (IPA)	Verbatim with pauses, emotional markers	Meaning-making in delivery is central
Thematic analysis (Braun & Clarke)	Intelligent verbatim	Content > delivery for coding themes
Grounded theory	Intelligent verbatim	Code from cleaned content; theoretical sampling is iterative
Phenomenology	Verbatim or intelligent verbatim	Varies by tradition (descriptive vs hermeneutic)
Narrative inquiry	Intelligent verbatim	Story structure preserved, smoothed delivery
Ethnography	Mixed — selective verbatim + field notes	Context-dependent; field notes carry analytical weight
Oral history	Strict verbatim, OHA standards	Archival quality required for permanent record

Methodology literature note: Braun & Clarke (thematic analysis) emphasize matching transcription depth to analytical aims rather than over-transcribing as a default. Gibbs (2007) and Poland (1995) frame similar tiered approaches but don't share a single canonical numbered framework — "Levels 1-4" terminology varies across authors.

Verbatim vs intelligent verbatim vs clean read

Three commonly-used transcription styles in qualitative research. Each captures different levels of speech detail and fits different analytical needs.

Style	What it captures	Typical uses	AI fit
Strict verbatim	Every utterance, fillers, stutters, false starts, repetitions, non-verbal sounds, partial words	Conversation analysis, IPA, oral history, court transcripts	AI output is a usable base, but Whisper-style models often omit fillers and false starts; restore them against the audio
Intelligent verbatim	All meaning-bearing speech; filler words and stutters removed	Thematic analysis, grounded theory, narrative inquiry, most applied qualitative research	Best AI fit — minimal manual editing required
Clean read / edited	Smoothed grammar, polished prose, content-only	Executive summaries, public-facing research outputs, content summaries	Use AI summary feature instead of transcript

Naturalized vs denaturalized: these terms appear in qualitative methodology literature with opposite definitions depending on the source. Oliver, Serovich & Mason (2005) describe naturalized transcription as capturing every utterance in detail (similar to strict verbatim) and denaturalized as removing idiosyncratic elements. Bucholtz (2000) uses the terms in reverse. When citing or writing methods, specify which framework you follow rather than assuming readers share your definition.

Jeffersonian notation for conversation analysis

Developed by sociologist Gail Jefferson, the Jeffersonian transcription system is the standard for conversation analysis (CA) and many discourse analysis projects. It captures interactional details — overlap, pause timing, prosody, breathing — that are lost in standard transcription but central to CA's analytical concerns.

Key Jeffersonian symbols

Symbol	Meaning
[ ]	Overlapping speech (square brackets mark where overlap starts/ends)
=	Latching (one speaker continues immediately, no gap)
(0.5)	Timed pause in seconds and tenths
(.)	Micro-pause shorter than 0.2 seconds
:	Sound elongation (more colons = longer)
underline	Stress or emphasis on syllable
°word°	Quiet speech (lowered volume)
>word<	Speeded up speech
<word>	Slowed speech
hh	Audible exhalation (more h's = longer)
.hh	Audible inhalation
↑ ↓	Pitch shift up or down
(( ))	Transcriber's non-verbal description (e.g., ((laughs)))
(word)	Uncertain hearing; transcriber's best guess

Sample Jeffersonian transcript

A: I [really- ]
B:    [yeah no I get it]=
A: =yeah (0.5) it's just (.) hh ((sighs)) °hard°
B: ↑right (.) so what do you do
A: I just (0.3) >try to figure it out< on my own

AI and Jeffersonian notation. AI transcription does not produce Jeffersonian notation directly. Even Whisper Large-v3 outputs a clean text transcript without interactional markup. What AI can do for CA researchers: produce the base transcript that you then mark up manually with Jefferson symbols. This saves roughly 60-70% of pre-notation work — typing the words is mechanical; marking the prosody is the analytical labor. Several CA researchers report using AI for the first pass then spending 2-3 hours per audio hour on Jefferson notation, vs 6-8 hours for fully manual transcription plus notation.
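As a minimal sketch of that first pass, assuming your transcription export includes word-level timestamps (not every export does), the gaps between words can be pre-filled as Jefferson pause markers for you to check against the audio. The Word type, the function name, and the 0.1-second floor are illustrative choices, not part of any tool's API, and ASR word timings are approximate, so treat every inserted value as a draft.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def add_pause_markers(words, micro=0.2, floor=0.1):
    """Insert Jefferson-style pause markers between timestamped words.

    Gaps below `floor` are ignored, gaps below `micro` become "(.)",
    and longer gaps are written in seconds and tenths, e.g. "(0.5)".
    """
    out = []
    for prev, cur in zip([None] + words[:-1], words):
        if prev is not None:
            gap = cur.start - prev.end
            if gap >= micro:
                out.append(f"({gap:.1f})")
            elif gap >= floor:
                out.append("(.)")
        out.append(cur.text)
    return " ".join(out)


# Hypothetical word timings for illustration only.
words = [
    Word("it's", 0.00, 0.20),
    Word("just", 0.25, 0.50),
    Word("hard", 1.00, 1.30),
]
print(add_pause_markers(words))  # it's just (0.5) hard
```

This only covers silent gaps within one speaker's turn. Overlap, latching, prosody, and breathing still need to be marked by ear.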

Reference: Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation analysis: Studies from the first generation (pp. 13-31). John Benjamins.

AI transcription accuracy and demographic bias

Documented bias in commercial automated speech recognition (ASR) systems is a critical methodological consideration for qualitative researchers — particularly for research with populations affected by acoustic model gaps.

Koenecke et al. (2020) — racial disparities in commercial ASR

Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., & Goel, S. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684-7689.

Study design: Tested five major commercial ASR systems (Amazon, Apple, Google, IBM, Microsoft) on 19.8 hours of audio from 42 white and 73 Black speakers.

Key findings:

• Average word error rate (WER): ~35% for Black speakers vs ~19% for white speakers
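• Range across the five systems: WER for Black speakers reached ~45% on the worst-performing system, while WER for white speakers was as low as ~15% on the best-performing one (the two figures come from different systems, so they are not a like-for-like comparison)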
• Disparity attributed to acoustic models trained on insufficient African American Vernacular English (AAVE) data
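For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the AI transcript into the reference transcript, divided by the number of words in the reference. The sketch below is one plain way to compute it with a word-level edit distance. It only lowercases and splits on whitespace; published evaluations usually apply fuller text normalization, so treat the function as illustrative rather than as the method used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


reference = "i just try to figure it out on my own"
hypothesis = "i just tried to figure out on my own"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")  # WER: 20%
```

In this example the reference has 10 words and the hypothesis contains one substitution ("tried" for "try") and one deletion ("it"), giving 2 / 10 = 20%.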

Graham & Roll (2024) — Whisper accent bias

Graham, C., & Roll, N. (2024). Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits. JASA Express Letters, 4(2), 025206.

Found systematic accuracy differences in OpenAI Whisper across: American > British/Australian English; native > non-native speakers; read > conversational speech. Disparities are smaller than the Koenecke et al. findings on older commercial ASR but remain meaningful for research conclusions on accented populations.

Implications for qualitative research:

→Document AI bias in methods sections. Cite Koenecke et al. (2020) and discuss whether your participant population is affected.
→Verify every transcript against audio. Don't trust AI blindly — manual verification is the documented standard.
→Consider human transcription for affected populations. For research on Black speakers, AAVE users, heavily accented English, or non-native speakers where accuracy is central to findings.
→AI hallucination is a documented risk. AI can invent content during silence or unclear audio. Manual review is essential.
→Report observed accuracy when accuracy is central. If your findings depend on what participants said precisely, report your verification process and observed error rates by participant demographic.
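One way to produce the per-group figures mentioned in the last point is to log, for each verified transcript, how many words you corrected and how many you checked, then pool the counts by group. The sketch below assumes such a log exists; the group labels and counts are invented for illustration, and a count of corrections only approximates a formal WER.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Pool verification counts per group: total corrections / total words checked."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, n_errors, n_words in records:
        errors[group] += n_errors
        words[group] += n_words
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical verification log: (group label, words corrected, words checked)
log = [
    ("group_a", 12, 400),
    ("group_a", 8, 600),
    ("group_b", 45, 500),
]
for group, rate in error_rate_by_group(log).items():
    print(f"{group}: {rate:.1%}")
```

Pooling counts before dividing weights long interviews more heavily than short ones. Averaging per-interview rates is an equally defensible choice, as long as the methods section states which was used.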

For broader accuracy methodology including LibriSpeech and FLEURS benchmarks, see "How accurate is Whisper?"

IRB, GDPR, and consent for AI transcription

Third-party AI transcription introduces specific compliance questions that must be addressed in your IRB protocol, informed consent form, and (for EU researchers) data processing documentation.

US IRB considerations

→Consent form disclosure. Name the third-party vendor (or describe the vendor category if the vendor may change). If your approved protocol promised "no third-party sharing," using cloud transcription without first amending the protocol violates it.
→Data flow disclosure. Where audio is processed, where transcripts are stored, retention timeline, deletion process.
→Confidentiality agreement. Documented commitment that vendor will not use participant data for purposes beyond transcription.
→Protocol amendments. Switching transcription vendors mid-study typically requires IRB amendment.

HIPAA considerations (US health research)

If your audio contains Protected Health Information (PHI), the transcription vendor becomes a Business Associate and a Business Associate Agreement (BAA) is required before any PHI-containing audio leaves your environment.

Honest disclosure: VexaScribe is not currently HIPAA-certified and does not sign BAAs. PHI-affected research should use a HIPAA-compliant vendor: Rev offers BAAs on its Enterprise tier, Verbit offers BAAs, and some institutional transcription services do as well. For research involving health information that does not meet the PHI threshold (anonymized recordings, public health research without identifiable data), the BAA requirement may not apply; consult your IRB and institutional compliance office.

GDPR considerations (EU research)

→Lawful basis: typically informed consent for research purposes (Article 6(1)(a)) or legitimate interest (Article 6(1)(f)) with appropriate safeguards.
→Article 28 Data Processing Agreement (DPA): required with any third-party processor.
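→Data residency: EU/EEA residency is preferred when possible. VexaScribe stores data in AWS eu-west-2 (London). London is in the UK, which is outside the EU/EEA, so EU researchers should treat this as a transfer that relies on the EU's adequacy decision for the UK and confirm that decision's current status with their data protection officer.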
→Cross-border transfer safeguards: Standard Contractual Clauses (SCCs) or adequacy decisions for data leaving the EEA.
→Right to withdraw: participant can request data deletion at any time.
→Retention policy: aligned with your institutional ethics approval, typically 3-7 years for de-identified research transcripts.

Sample IRB protocol language

"Audio recordings will be transcribed using VexaScribe, an AI-based transcription service that uses the Whisper Large-v3 model. Audio files will be uploaded over TLS 1.2+ encrypted connection and stored encrypted at rest in AWS eu-west-2. Recordings will be deleted from the service within [N] days of transcription. Transcripts will be manually verified by the research team against original audio before analysis. VexaScribe does not train models on user audio per their stated policy. A Data Processing Agreement will be in place with VexaScribe prior to processing participant data."

CAQDAS integration: NVivo, MAXQDA, ATLAS.ti, Dedoose, Quirkos, Taguette

Computer-Assisted Qualitative Data Analysis Software (CAQDAS) is where your transcripts will live during analysis. Import format compatibility matters — and a few CAQDAS tools handle timestamped transcripts better than others.

Tool	Owner	Import formats	Timestamp sync	Speaker labels
NVivo 14	Lumivero	DOCX, RTF, TXT, PDF	Via CSV/TSV format	Yes (consistent label format)
MAXQDA	VERBI	DOCX, RTF, TXT, PDF, SRT, VTT	Native auto-sync (SRT/VTT)	Yes
ATLAS.ti	Scientific Software	DOCX, RTF, TXT, PDF, SRT, VTT	Yes (SRT/VTT or RTF with timecodes)	Yes
Dedoose	SocioCultural Research	DOCX, TXT, spreadsheets	Limited	Yes (basic)
Quirkos	Quirkos	DOCX, ODT, TXT, RTF, PDF, XLSX	No	Yes
Taguette	OSS (Taguette project)	PDF, DOCX, TXT, HTML, EPUB, MOBI, ODT, RTF	No	Yes

Most portable format: DOCX with consistent speaker labels (e.g., Speaker 1: on a new line). Speaker labels survive import in all six tools listed above when formatted consistently.
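Label consistency is easy to check before import. The sketch below assumes a plain-text version of the transcript with one speaker turn per line and a known set of expected labels. The function name and the sample transcript are hypothetical, and the pattern would need adjusting for transcripts that wrap turns across lines.

```python
import re

# A label is any short run of text before the first colon, followed by speech.
LABEL = re.compile(r"^([^:\n]{1,40}):\s+\S")


def find_label_problems(text, allowed):
    """Return (line number, line) pairs whose speaker label is missing or unexpected."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank separator lines are fine
        m = LABEL.match(line)
        if m is None or m.group(1) not in allowed:
            problems.append((n, line))
    return problems


transcript = """Speaker 1: Can you tell me about your first week?
Speaker 2: It was a lot to take in.
speaker 2: But people were helpful.
And the training was good."""

for n, line in find_label_problems(transcript, {"Speaker 1", "Speaker 2"}):
    print(f"line {n}: {line}")
```

Here line 3 is flagged because the label's capitalization differs, and line 4 because it has no label at all.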

For timestamp sync: SRT or VTT — MAXQDA and ATLAS.ti both auto-sync these formats for clip-based coding. NVivo can use CSV/TSV with timecodes.
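If your transcription output is a list of timed segments rather than a ready-made SRT file, the format is simple enough to write yourself: a running index, a timecode line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, the text, and a blank line. The sketch below assumes segments arrive as (start, end, text) tuples in seconds. The function names and sample segments are illustrative, and whether a speaker label inside the subtitle text is recognized as a speaker on import is something to confirm in your CAQDAS tool.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timecode HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical segments for illustration only.
segments = [
    (0.0, 3.2, "Speaker 1: Can you tell me about your first week?"),
    (3.9, 6.45, "Speaker 2: It was a lot to take in."),
]
print(to_srt(segments))
```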

REFI-QDA standard: The Rotterdam Exchange Format Initiative for Qualitative Data Analysis enables project-level interchange between NVivo, MAXQDA, ATLAS.ti, Quirkos, Dedoose, and Taguette. Useful when collaborating across institutions with different CAQDAS preferences.

VexaScribe export: All four formats (TXT, DOCX, JSON, SRT) from a single transcription — no re-processing required. DOCX with speaker labels is the most portable for general CAQDAS workflows; SRT for MAXQDA/ATLAS.ti timestamp sync.

AI Chat as a coding aid — what it does and doesn't replace

AI Chat is a per-transcript Q&A interface that lets researchers ask thematic questions of an interview transcript and get answers with citations linked to the exact moment in the audio. It's genuinely useful for first-pass theme spotting and quote-finding — but it's not a replacement for systematic qualitative coding, and how you frame it in your methods chapter matters.

Where it helps

● First-pass theme spotting. “What concerns did the informant raise about workplace dynamics?” → ranked list with timestamps, useful as a starting point for inductive coding in NVivo / ATLAS.ti / MAXQDA.
● Quote retrieval for the findings chapter. “Find quotes that illustrate the tension between autonomy and structure” → quotes with timestamps ready to cite, with the link back to source audio for verification.
● Member-checking preparation. Pull all the participant's statements on a specific topic before re-contacting them for clarification.
● Cross-language interviews. Ask in English about a transcript in Indonesian, Turkish, Spanish — get answers in your working language with original-language quotes preserved.

Where it does NOT replace systematic coding

● Codebook development. Iterative open / axial / selective coding (Strauss & Corbin) requires researcher judgment that AI doesn't replicate.
● Inter-rater reliability. AI Chat doesn't replace having a second coder; methodological transparency requires documented coder agreement.
● Saturation tracking. Theoretical saturation is a researcher decision based on full immersion in the data — AI summaries can short-circuit that.
● Cross-transcript analysis. AI Chat is scoped to ONE transcript at a time. You can't ask “what theme appears across all 12 interviews?” — that's still CAQDAS territory.

How to disclose it in your methods chapter

If you use AI Chat for first-pass theme spotting or quote retrieval, disclose it — same as you would disclose using NVivo's auto-coding features. Suggested phrasing:

“Interviews were transcribed using VexaScribe (Whisper Large-v3). Initial theme identification was supported by VexaScribe's AI Chat (an OpenAI-based question-answering interface scoped to individual transcripts). All AI-surfaced quotes were verified against source audio before inclusion in the analysis. Final coding was performed by the author in [NVivo / ATLAS.ti / MAXQDA] following [Braun & Clarke / Charmaz / chosen approach].”

IRB and consent considerations

If your IRB protocol limits AI processing of participant data, AI Chat is subject to the same constraints as transcription itself — same vendor (us), same data handling (no training on user data, encrypted in transit and at rest, 90-day chat conversation retention with user-controlled deletion). Disclose in your consent forms if your institution requires AI-tool disclosure. For sensitive populations, anonymize identifying details in the transcript itself before generating chat queries, since the AI sees whatever the transcript contains.
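A minimal sketch of that anonymization step, assuming the research team keeps its own list of names and places to replace: the mapping, the pseudonym style, and the function name below are hypothetical. A fixed list only catches identifiers you already know about, so this is a first pass before a manual read-through, not a complete de-identification method.

```python
import re


def pseudonymize(text, replacements):
    """Replace each known identifier with its pseudonym (whole words, case-insensitive)."""
    # Longest names first so "Maria Lopez" is replaced before "Maria".
    for name in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        # A function replacement avoids any backslash handling in the pseudonym.
        text = pattern.sub(lambda _m, repl=replacements[name]: repl, text)
    return text


# Hypothetical mapping kept by the research team and never uploaded.
mapping = {
    "Maria Lopez": "[Participant 3]",
    "Maria": "[Participant 3]",
    "Northside Clinic": "[Workplace A]",
}
raw = "Maria said Northside Clinic changed after Maria Lopez left."
print(pseudonymize(raw, mapping))
# [Participant 3] said [Workplace A] changed after [Participant 3] left.
```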

Citation conventions for AI-transcribed sources

As of 2026, APA, Chicago, and MLA have no transcription-specific citation entry. Current practice — per the APA Style Blog and most institutional library guides — treats AI tools like software references. Field-wide standards are still emerging.

APA 7 style reference (software treatment)

OpenAI. (2024). Whisper (large-v3) [Speech recognition model]. https://openai.com/whisper

In-text citation: (OpenAI, 2024). For transcription via a service that uses Whisper, also cite the service: Transcription was performed using VexaScribe (Whisper Large-v3; OpenAI, 2024).

Sample methods-section disclosure

"Audio recordings of [N] semi-structured interviews were transcribed using VexaScribe (Whisper Large-v3) and verified verbatim against original recordings by the first author. Transcription accuracy was estimated at approximately 95% on clean audio; all transcripts were manually corrected before coding. We acknowledge documented racial disparities in commercial ASR accuracy (Koenecke et al., 2020) and conducted enhanced verification on interviews with [specific population] participants. VexaScribe's stated privacy policy is that the service does not train models on user audio. A Data Processing Agreement was in place prior to transcription."

Honest note: Citation conventions for AI-assisted transcription are field-emerging. Expect this guidance to evolve as journals develop specific requirements. Consult your target journal's author guidelines and your discipline's methodology literature for current best practice.

Pricing: AI vs human vs in-house

Verified 2026 rates for major research-transcription services. Per-audio-hour pricing varies dramatically: the lowest-priced AI services are roughly 100-500× cheaper than human transcription (Rev's AI tier, at ~$15/hour, is closer to 3-8× cheaper), while human transcription delivers 99%+ accuracy and BAA availability for HIPAA-affected research.

Service	Type	Per audio hour	Per minute	Notes
VexaScribe	AI	$0.20-$0.60	~$0.003-$0.01	Whisper Large-v3, 99 languages, no model training on user audio
Rev (AI)	AI	~$15	$0.25/min	Pay-as-you-go, no subscription
Rev (Human)	Human	~$119	$1.99/min	12-48hr turnaround, 99%+ accuracy
TranscribeMe (AI)	AI	~$4	$0.07/min	Budget AI tier
TranscribeMe (Human)	Human	~$47-$48	$0.79/min	Budget human tier; longer turnaround
GoTranscript (Human)	Human	~$59	$0.99/min	Mid-market human transcription
Verbit	Mixed	Enterprise custom	Custom	Enterprise-only; broadcast and legal-grade
Self-hosted Whisper	AI	$0	$0	No license fee; requires your own GPU and Python skills (hardware and staff time not included)

Cost math by project scale

→PhD thesis (15-30 interviews × 60 min each): 15-30 hours audio = $3-$18 AI vs $1,800-$3,600 human transcription.
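→Multi-year longitudinal study (100 interviews × 60 min each): 100 hours audio = $20-$60 AI vs about $12,000 human at ~$120/hour; costs scale linearly beyond 100 interviews.
→UX research sprint (10 customer interviews × 45 min): 7.5 hours audio = $1.50-$4.50 AI vs about $900 human at ~$120/hour.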
→Manual self-transcription cost: 4-6 hours of researcher time per audio hour. For a 20-hour study, that's 80-120 hours of work — typically 2-3 weeks of full-time labor.
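The arithmetic behind these estimates can be rerun for your own project size. The sketch below uses the AI rate range from the pricing table and assumes a human rate of $120 per audio hour (rounded from Rev's $1.99/min); the constant and function names are illustrative, and you should substitute the rates you are actually quoted.

```python
AI_RATE_PER_HOUR = (0.20, 0.60)      # VexaScribe range from the pricing table
HUMAN_RATE_PER_HOUR = 120.0          # assumption: rounded from Rev (Human) at $1.99/min
MANUAL_HOURS_PER_AUDIO_HOUR = (4, 6)


def project_costs(interviews: int, minutes_each: float) -> dict:
    """Estimate transcription cost and self-transcription labor for one project."""
    audio_hours = interviews * minutes_each / 60
    return {
        "audio_hours": audio_hours,
        "ai_cost": tuple(round(audio_hours * r, 2) for r in AI_RATE_PER_HOUR),
        "human_cost": round(audio_hours * HUMAN_RATE_PER_HOUR, 2),
        "manual_hours": tuple(audio_hours * h for h in MANUAL_HOURS_PER_AUDIO_HOUR),
    }


for label, n, minutes in [("PhD thesis, low end", 15, 60),
                          ("PhD thesis, high end", 30, 60),
                          ("UX sprint", 10, 45)]:
    print(label, project_costs(n, minutes))
```

For the UX sprint, this gives 7.5 audio hours, $1.50-$4.50 for AI transcription, and 30-45 hours of labor if transcribed manually.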

For full transcription cost analysis across 14 tools with an interactive calculator, see "How much does transcription cost?"

Researcher audience segments

Qualitative research isn't one workflow. Five distinct researcher segments use transcription with different priorities, tools, and budgets.

Academic qualitative researchers

PhDs, postdocs, faculty doing IRB-approved interview studies. Recording conditions vary from quiet interview rooms to home video calls. Accuracy is critical because transcripts often become quoted material in publications.

Typical tools: NVivo, MAXQDA, ATLAS.ti — institutional licenses common

Budget: Modest — grant-funded or department-funded; usually $50-$500 per study for transcription

Workflow: AI transcription + manual verification + qualitative coding in CAQDAS

UX researchers

Industry researchers running usability studies, customer interviews, ethnographic fieldwork. Often pressed for speed (one or two days from interview to insight). Strict verbatim usually not required; intelligent verbatim is typical.

Typical tools: Dovetail, Notably, Condens — research-specific platforms with AI summarization. Otter and Rev for raw transcription

Budget: Company-funded; per-study budgets often $100-$1,000 for transcription

Workflow: AI transcription → AI tagging in research platform → highlights and insights for stakeholders

Market researchers (focus groups)

Agency and in-house researchers running focus groups (4-12 participants) and in-depth interviews (IDIs). Crosstalk-heavy audio is a recurring challenge. Vendor-driven workflows are common.

Typical tools: Verbit, Rev enterprise tiers, sometimes Trint

Budget: Project-funded; transcription often 5-15% of total research budget

Workflow: Studio recording → vendor transcription (often human or hybrid) → thematic synthesis

Ethnographers and fieldwork researchers

Anthropologists, sociologists, organizational researchers doing extended fieldwork. Recordings often noisy (public spaces, vehicles, outdoor environments). Selective transcription is the norm — full audio archived, key exchanges transcribed verbatim.

Typical tools: ATLAS.ti, NVivo for the transcribed portions; field notes in Evernote, Obsidian, or paper

Budget: Variable; full transcription often impractical given hours of recording

Workflow: Archival recording → field notes → selective AI transcription of analytically important segments

Oral historians

Researchers creating archival records for permanent collection. Long-form interviews (multiple hours, sometimes multiple sessions). Often involves non-standard accents, elderly speakers, and period-specific vocabulary. Strict verbatim is the norm per Oral History Association (OHA) standards.

Typical tools: Specialized oral history platforms; OHMS (Oral History Metadata Synchronizer); sometimes manual transcription

Budget: Often grant-funded; archival quality may require human transcription only

Workflow: Audio recording → human transcription (or AI + heavy manual review) → archival deposit with metadata

Tool comparison for research transcription

Six transcription options mapped to research-specific criteria: IRB-friendliness, BAA availability for HIPAA research, CAQDAS-compatible export, and per-audio-hour cost.

Tool	Best for research use	IRB-friendly	BAA available	CAQDAS export	Per audio hour
VexaScribe	Academic + UX research, multi-language, intelligent verbatim	Yes (DPA available)	No (not HIPAA-certified)	DOCX/SRT/JSON/TXT	$0.20-$0.60
Rev (AI)	Mixed research workflows, vendor familiarity	Yes	Yes (Enterprise)	DOCX/TXT	~$15
Rev (Human)	Verbatim accuracy, oral history, PHI research with BAA	Yes	Yes (Enterprise)	DOCX/TXT	~$119
TranscribeMe	Budget-conscious human transcription	Limited	Limited (Enterprise)	DOCX/TXT	$47-$48 (human)
Verbit	Enterprise research, broadcast-grade accuracy	Yes	Yes	Multiple formats	Custom
Self-hosted Whisper	Privacy-critical research, technical teams, high volume	Yes (no third-party)	n/a (you control data)	Any (custom export)	$0

Decision rule. For most academic and UX research with clean audio, participant populations not heavily affected by ASR bias, and standard methodology, VexaScribe or a comparable AI service is sufficient: $3-$18 for a PhD thesis worth of transcription, plus manual verification. For PHI-affected research, oral history, or research on populations significantly affected by ASR bias, human transcription via Rev or Verbit is appropriate. Self-hosted Whisper fits technical research teams with privacy-critical content (e.g., research with vulnerable populations, security research, sensitive policy work).

Frequently Asked Questions

What's the best transcription method for qualitative research?

Depends on your analytical framework. For thematic analysis (Braun & Clarke), grounded theory, and most narrative inquiry, intelligent verbatim (removing filler words but preserving all meaning-bearing speech) is standard. For conversation analysis (CA), Jeffersonian notation with full pauses, overlaps, and prosody markers is required; discourse analysis typically uses a modified Jefferson or near-verbatim transcript. For interpretative phenomenological analysis (IPA), verbatim with pauses and emotional markers is typical because meaning-making in delivery matters. For oral history, strict verbatim per Oral History Association (OHA) standards is the norm. AI transcription produces a clean transcript that researchers can use directly for thematic and grounded theory work; for CA and discourse analysis, AI output serves as the starting point for manual Jefferson notation.

Can I use AI transcription for IRB-approved research?

Yes, with proper disclosure and consent. Your IRB protocol and informed consent form must name the third-party transcription vendor (or describe the vendor category if vendors may change), disclose the data flow (where audio is processed, where transcripts are stored, retention timeline), and confirm the vendor has appropriate confidentiality terms. If your approved protocol promised 'no third-party sharing', using cloud transcription without first amending the protocol violates it. If your research involves Protected Health Information (PHI), you need a Business Associate Agreement (BAA) with the vendor; VexaScribe is not currently HIPAA-certified, so PHI research should use HIPAA-compliant alternatives. For EU researchers under GDPR, you need an Article 28 Data Processing Agreement (DPA) and ideally EU data residency. Most IRBs approve AI transcription with these safeguards in place.

Is AI transcription accurate enough for academic publication?

Yes, with verification. AI transcription (Whisper Large-v3 and equivalents) achieves 92-97% word accuracy on clean recordings of native English speakers. However, Koenecke et al. (Proceedings of the National Academy of Sciences, 2020) documented significant racial disparities — average WER of 35% for Black speakers vs 19% for white speakers across major commercial ASR systems. For research on populations affected by ASR bias (including Black speakers, AAVE, heavily accented English, non-native English, regional dialects), accuracy gaps are substantial. Best practice: verify every transcript against the original audio before coding, document accuracy limitations in your methods section, and consider human transcription or verification for populations where AI bias is documented.

How do I cite AI transcription in my methods section?

APA, Chicago, and MLA have no transcription-specific citation entry as of 2026. Current practice (per the APA Style Blog) treats AI tools like software. Example APA 7 reference: 'OpenAI. (2024). Whisper (large-v3) [Speech recognition model]. https://openai.com/whisper'. Sample methods-section disclosure: 'Audio recordings were transcribed using VexaScribe (Whisper Large-v3) and verified verbatim against original recordings by the first author. Transcription accuracy was estimated at approximately 95% on clean audio; all transcripts were corrected before coding. VexaScribe does not train models on user audio per their stated policy.' Field-wide citation standards for AI-assisted transcription are still emerging — expect this guidance to evolve.

Will my transcripts import into NVivo, MAXQDA, or ATLAS.ti?

Yes — DOCX with consistent speaker labels imports cleanly into all major CAQDAS tools. NVivo 14 (Lumivero) accepts DOCX, RTF, TXT, PDF; for timestamp sync, CSV/TSV formatted transcripts work. MAXQDA (VERBI) has the strongest auto-sync — it imports SRT and VTT natively with timestamps preserved. ATLAS.ti accepts DOCX, RTF, TXT, PDF and supports timestamped transcripts via SRT/VTT or RTF with timecodes. Dedoose imports DOCX, TXT, and spreadsheets. Quirkos imports DOCX, ODT, TXT, RTF, PDF, XLSX. Taguette (open-source) accepts PDF, DOCX, TXT, HTML, EPUB. VexaScribe exports DOCX, TXT, JSON, and SRT from every transcription — DOCX is the most portable for CAQDAS workflows; SRT/VTT enables timestamp sync in MAXQDA and ATLAS.ti specifically. The REFI-QDA standard allows project-level interchange between major CAQDAS tools.

Does GDPR allow AI transcription of EU participant interviews?

Yes, with proper safeguards. EU researchers transcribing participant audio need: (1) a lawful basis for processing, typically informed consent for research purposes, (2) an Article 28 Data Processing Agreement (DPA) with the transcription vendor, (3) ideally EU/EEA data residency for the audio and transcripts, (4) documented cross-border transfer safeguards if data leaves the EEA, (5) participant right to withdraw and request data deletion, (6) retention and deletion policy aligned with your ethics approval. VexaScribe stores data in AWS eu-west-2 (London) with TLS 1.2+ encryption in transit and encryption at rest; because London is in the UK, outside the EU/EEA, this counts as a transfer that relies on the EU's adequacy decision for the UK, so confirm that decision's current status. For GDPR-strict research, also verify your institutional ethics committee's specific requirements, since some EU universities require additional vendor assessments beyond standard DPAs.

How does AI transcription bias affect my research?

Significantly, for research on populations affected by ASR demographic bias. Koenecke et al. (PNAS 2020) found average word error rate of 35% for Black speakers compared to 19% for white speakers across major commercial ASR systems including Amazon, Apple, Google, IBM, and Microsoft. The disparity is attributed to acoustic models trained on insufficient African American Vernacular English (AAVE) data. Graham & Roll (JASA Express Letters 2024) found similar accent-based disparities in Whisper across American vs British/Australian English, native vs non-native speakers, and read vs conversational speech. Implications: document the limitation in your methods section, verify every transcript manually, consider human transcription for affected populations, and report observed accuracy by participant demographic if accuracy is central to your findings.

What's the difference between verbatim and intelligent verbatim?

Verbatim (also 'true verbatim' or 'strict verbatim') captures every utterance — filler words ('um', 'uh'), stutters, false starts, repetitions, non-verbal sounds, partial words. It preserves the full delivery of speech and is required for conversation analysis, IPA, and oral history. Intelligent verbatim (also 'clean verbatim') removes fillers, stutters, and false starts but preserves all meaning-bearing speech and full sentences. It's the standard for thematic analysis, grounded theory, and most applied qualitative research where content matters more than delivery. Clean read (also 'edited transcript') goes further — smoothing grammar and producing polished prose. Use it when summarizing content rather than analyzing speech. Naturalized vs denaturalized terms are used inconsistently across the literature (Oliver, Serovich & Mason 2005 and Bucholtz 2000 use them in reverse senses) — specify your framework when writing methods.

Do I need a BAA if my research involves health information?

Yes, if your research involves Protected Health Information (PHI) and you're operating under HIPAA. The transcription vendor becomes a Business Associate and a Business Associate Agreement (BAA) is required before any PHI-containing audio leaves your secure environment. VexaScribe is not currently HIPAA-certified and does not sign BAAs — HIPAA-affected research should use a HIPAA-compliant vendor (Rev offers BAAs on Enterprise tier, Verbit offers BAAs, some institutional transcription services). For research involving health information that does not meet the HIPAA PHI threshold (e.g., wellness research without identifiable health data, public health research with anonymized recordings), the BAA requirement may not apply — consult your IRB and institutional compliance office for the specific determination on your project.

Is human transcription still necessary for some research?

Yes, for specific scenarios. Use human transcription when: (1) your research focuses on populations affected by AI bias (Black speakers, AAVE, heavily accented English, regional dialects) and accuracy is central to findings, (2) you're doing conversation analysis with Jeffersonian notation that requires expert transcribers familiar with CA conventions, (3) your audio has heavy crosstalk or overlap (multi-person focus groups, family interviews) where AI diarization fails, (4) you need court-grade or publication-grade verbatim for legal, medical, or archival oral history work, (5) your IRB protocol or institutional policy requires human verification of all transcripts. The hybrid approach works well for many projects: AI for initial transcription of all recordings, human verification for critical interviews you'll quote directly. This typically costs 5-15% of pure human transcription at 95%+ accuracy on quoted passages.

Methodology & disclosure

Verification window. Pricing for VexaScribe, Rev, TranscribeMe, GoTranscript, Verbit verified against each vendor's pricing page between May 28 and June 2, 2026. CAQDAS import format documentation verified against current vendor documentation (NVivo 14 / Lumivero, MAXQDA / VERBI, ATLAS.ti, Dedoose, Quirkos, Taguette) in the same window.

Peer-reviewed sources. Koenecke, A., Nam, A., Lake, E. et al. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684-7689. Graham, C., & Roll, N. (2024). Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits. JASA Express Letters, 4(2), 025206. Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation analysis: Studies from the first generation (pp. 13-31). John Benjamins.

Conflict of interest. VexaScribe is our product. We've disclosed pricing for every comparable tool and honestly identified scenarios where each competitor wins — Rev (Human) for verbatim oral history and PHI research with BAA, Verbit for enterprise broadcast-grade work, self-hosted Whisper for privacy-critical research teams, TranscribeMe for budget human transcription.

Honest limitations disclosure. (1) VexaScribe is not currently HIPAA-certified — researchers studying PHI should use HIPAA-compliant vendors with available BAA. (2) VexaScribe uses Whisper Large-v3, which is subject to the demographic accuracy disparities documented by Koenecke et al. (2020) and Graham & Roll (2024). (3) AI transcription does not produce Jeffersonian notation directly — CA researchers will need to add notation manually. (4) Field-wide citation conventions for AI-assisted transcription are still emerging.

No affiliate links. VexaScribe does not earn commissions from any of the alternative tools mentioned on this page. Recommendations reflect honest editorial assessment based on documented features, verified pricing, and peer-reviewed methodology literature.

What changed since last update? First publication, June 2, 2026. Future updates will be reflected in the "Verified" badge and datePublished/dateModified schema fields.

Editorial standards. Full disclosure policy at editorial standards.
