iPhone Voice Memo Transcription — Apple Built-in vs AI vs Human
Apple's built-in Voice Memos transcription is free and on-device — but only for iPhone 12+ on iOS 18, in 10 languages. For everything else — older iPhones, 89 other languages, longer memos, SRT export, speaker labels — third-party AI covers the gap.
By VexaScribe Editorial · Published July 3, 2026 · Verified July 2026
TL;DR — Which method fits your situation
- ■ iPhone 12+ on iOS 18, English (or 9 other supported languages), short memo: Apple's built-in wins. Free, private, on-device, instant. Skip our tool — use theirs.
- ■ iPhone 11 or earlier, or iOS 17 or earlier: Apple's built-in doesn't exist on your device. Use third-party AI (VexaScribe 30-min free trial).
- ■ Voice memo in Arabic, Hindi, Turkish, or ~85 other languages Apple doesn't support: VexaScribe covers 99 languages via Whisper Large-v3.
- ■ Multi-hour memo, need speaker labels, need SRT/VTT for video: Third-party AI is the answer.
- ■ Verbatim for legally-sensitive content: Rev human transcription at $1.50/min. Not us, not Apple.
Which method fits your situation (decision tree)
Read down until you hit the situation that matches yours. The answer is what to use.
If: iPhone 12 or later, iOS 18, memo in a supported language, under ~30 minutes
→ Use Apple's built-in Voice Memos transcription.
Free, on-device (private), instant, and Apple's tool is genuinely good for this case.
If: iPhone 11 or earlier (11 Pro Max, XS, XR, X, 8, SE 2nd gen)
→ Apple's built-in isn't available on your device. Use third-party AI.
Apple gates on-device Voice Memos transcription to iPhone 12+ hardware. No workaround.
If: Memo is in Arabic, Hindi, Turkish, Russian, Vietnamese, Thai, Polish, Dutch, or ~80 other unsupported languages
→ Apple only supports 10 languages. Use third-party AI (Whisper Large-v3).
VexaScribe covers 99 languages including all the ones Apple's built-in doesn't.
If: Memo is 60+ minutes and Apple's transcript doesn't appear or gets stuck
→ Third-party AI handles multi-hour files reliably.
Users report Apple's built-in occasionally fails on very long memos, especially if the app gets killed by iOS in the background.
If: You need SRT for a video / VTT for the web / DOCX for a document / JSON for a pipeline
→ Apple's built-in exports plain text only. Use third-party.
VexaScribe exports 5 formats including SRT, VTT, JSON — required for video subtitles, web accessibility, or developer workflows.
If: Two-person conversation recorded as a voice memo (interview, meeting notes)
→ Apple's built-in doesn't identify separate speakers. Use third-party AI with diarization.
VexaScribe auto-detects Speaker 1, Speaker 2, and lets you rename them (Host / Guest / whoever).
If: You want a structured summary + action items from the memo
→ Apple's Writing Tools (iPhone 15 Pro or 16) OR VexaScribe transcript-to-summary on any device.
Apple Intelligence Writing Tools work on the latest iPhones only. VexaScribe generates 6 types of structured summary from any iPhone.
If: Legally-sensitive verbatim (deposition prep, medical dictation, court exhibit)
→ Human transcription (Rev at $1.50/min) or for depositions, your certified court reporter.
AI accuracy at 92-95% isn't sufficient for legal admissibility; a certified transcript from a licensed professional is what filings and court exhibits require.
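The decision tree above can be sketched as a small function. This is an illustrative reading of the rules on this page, not an official tool: the `Memo` fields and the `recommend` name are hypothetical, and the 60-minute cutoff is an assumption taken from the point where users report failures. It checks the disqualifiers first, so the order of the checks does not change the answer.

```python
from dataclasses import dataclass

# The 10 languages this guide lists for Apple's built-in transcription.
APPLE_LANGUAGES = {
    "english", "spanish", "portuguese", "italian", "french", "german",
    "japanese", "korean", "simplified chinese", "traditional chinese",
}


@dataclass
class Memo:
    iphone_12_or_later: bool
    ios_major: int
    language: str
    minutes: float
    speakers: int = 1
    needs_timed_export: bool = False   # SRT / VTT / DOCX / JSON
    legally_sensitive: bool = False


def recommend(memo: Memo) -> str:
    """Return 'human', 'third-party AI', or 'apple built-in'."""
    if memo.legally_sensitive:
        return "human"
    if not memo.iphone_12_or_later or memo.ios_major < 18:
        return "third-party AI"
    if memo.language.lower() not in APPLE_LANGUAGES:
        return "third-party AI"
    # Assumed cutoff: 60+ minute memos are where failures get reported.
    if memo.needs_timed_export or memo.speakers > 1 or memo.minutes >= 60:
        return "third-party AI"
    return "apple built-in"
```

For example, `recommend(Memo(True, 18, "English", 12))` returns `"apple built-in"`, while the same memo in Turkish returns `"third-party AI"`.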
All methods compared at a glance
Pricing verified July 3, 2026 against vendor pages. Accuracy figures are typical ranges from published benchmarks and our own experience with each service.
| Method | Who | Languages | Cost | Accuracy | Speaker labels | Turnaround |
|---|---|---|---|---|---|---|
| Apple built-in (iOS 18 Voice Memos) | iPhone 12+ on iOS 18 | 10 | Free | ~92-95% clean audio | No | Nearly real-time |
| VexaScribe (third-party AI) | Any iPhone (via .m4a export) | 99 | $0 (30-min trial) / $2-$20/mo | ~92-95% clean audio | Yes (auto-diarization) | 2-10 min |
| Rev AI | Any iPhone | ~40 | $0.25/min (~$15/hr) | ~90-95% | Yes | 5-24 hours |
| Rev Human | Any iPhone | English focused | $1.50/min (~$90/hr) | ~99% verbatim | Yes | 24-72 hours |
| Whisper self-hosted | Technical users with Mac/PC and GPU | 99 | Free forever | ~92-95% | With pyannote add-on | 5-30 min depending on hardware |
Method 1: Apple's built-in Voice Memos transcription (iOS 18+)
iOS 18 added on-device transcription to Voice Memos. This is the right answer for most everyday users — no signup, no upload, no cost, and your audio never leaves the phone.
Requirements
- ■ iPhone 12 or later
- ■ iOS 18 or later
- ■ Memo recorded in a supported language (see below)
How to use
- Open Voice Memos (Utilities folder)
- Tap the recording
- Tap the transcript icon
- Select text and Copy, or tap Copy Transcript for the full text
Strengths
- ■ Free, no account needed
- ■ On-device processing — audio never sent to Apple or anyone else
- ■ Nearly real-time transcription
- ■ Auto-transcribes older memos on first open
- ■ Apple Writing Tools can summarize (iPhone 15 Pro / 16 with Apple Intelligence)
Limits
- ■ iPhone 12+ only (11 and earlier: not available)
- ■ 10 languages only (see list below)
- ■ Plain text export only — no SRT, VTT, DOCX, or JSON
- ■ No speaker labels — the whole memo comes back as one continuous block of text, with no diarization
- ■ Very long memos (60+ min) can fail or lag
- ■ iOS 17 or earlier: no built-in transcription at all
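Apple supports these 10 languages: English (all variants), Spanish, Portuguese, Italian, French, German, Japanese, Korean, Simplified Chinese, Traditional Chinese
Apple does NOT support these (Whisper does): Arabic, Hindi, Turkish, Russian, Vietnamese, Thai, Indonesian, Polish, Dutch, and roughly 80 others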
Bottom line for Method 1: if you have iPhone 12+ on iOS 18 and your memo is in a supported language, Apple's built-in is the right answer. Free, private, instant, done. The rest of this page is for when it isn't.
Method 2: Third-party AI (VexaScribe and alternatives)
When Apple's built-in doesn't fit, third-party AI transcription covers the gap. Same underlying technology class (large speech-to-text models), but with wider language support, more export formats, speaker labeling, and no device restriction.
When to use Method 2 instead of Apple's built-in
- ■ Non-supported language (89 out of 99 Whisper languages)
- ■ iPhone 11 or earlier
- ■ iOS 17 or earlier
- ■ Multi-hour memo where Apple's built-in fails or lags
- ■ Need SRT/VTT/DOCX/JSON export
- ■ Need speaker labels for a two-person interview or meeting
- ■ Need AI summary + action items on any iPhone (not just iPhone 15 Pro / 16)
- ■ Batch: multiple memos processed at once (Apple has no batch UI)
How to use VexaScribe for iPhone voice memos
- On iPhone: open Voice Memos → tap the memo
- Tap the three-dot (...) menu → Share → Save to Files (or share directly to email, Drive, Dropbox)
- The file saves as .m4a (Apple's voice memo format)
- On any device (phone browser or desktop), sign up at VexaScribe — 30 minutes free, no card
- Upload the .m4a file to VexaScribe
- Auto-transcribes with speaker labels + timestamps in ~2-10 minutes depending on length
- Export as TXT / DOCX / SRT / VTT / JSON
- Optional: generate a structured summary or translate to another language
Pricing
- ■ Free 30-minute trial — no credit card, one-time
- ■ Starter — $2/mo for 200 min (~6-10 typical voice memos)
- ■ Basic — $5/mo for 1,000 min
- ■ Pro — $10/mo for 2,500 min
- ■ Studio — $20/mo for 6,000 min
All plans include speaker labels, 99-language support, all 5 export formats, and AI summary. See full pricing.
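If you want to check which plan fits your monthly volume, the arithmetic is simple enough to sketch. The plan figures are copied from the list above; the function names `cheapest_plan` and `effective_rate` are hypothetical helpers for illustration, not part of any VexaScribe API.

```python
# (name, monthly price in USD, included minutes), copied from the pricing list.
PLANS = [
    ("Starter", 2, 200),
    ("Basic", 5, 1000),
    ("Pro", 10, 2500),
    ("Studio", 20, 6000),
]


def cheapest_plan(minutes_per_month: float):
    """Return (name, price) of the lowest-priced plan covering the usage, or None."""
    for name, price, included in sorted(PLANS, key=lambda plan: plan[1]):
        if minutes_per_month <= included:
            return name, price
    return None  # usage exceeds the largest listed plan


def effective_rate(price: float, included_minutes: int) -> float:
    """Cost per included minute in USD, assuming the full allowance is used."""
    return price / included_minutes
```

At these prices, `cheapest_plan(150)` picks Starter and `cheapest_plan(201)` moves up to Basic. The per-minute rate only holds if you use the whole allowance; light users pay more per minute actually transcribed.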
Honest alternatives to consider
- ■ Otter.ai — 300 min/month free tier, English-focused, weaker multilingual coverage
- ■ Notta — 120 min/month free tier, multilingual, mobile-first
- ■ Rev AI — $0.25/min pay-as-you-go, developer API
- ■ Whisper self-hosted — free forever, requires Mac/PC with GPU and Python skills
- ■ MacWhisper — Mac app that runs the open-source Whisper model locally with a UI
Method 3: Human transcription (Rev)
For a small slice of use cases, human transcription is the right answer despite costing about 6× Rev's own AI rate ($1.50 vs $0.25 per minute). Trained transcriptionists hit ~99% verbatim accuracy, add non-verbal notation, and handle overlapping speech better than any AI.
When human transcription is the right choice
- ■ Legally-sensitive verbatim (deposition prep — but see our deposition transcription page for the certified path)
- ■ Medical dictation where the 5-8% AI error rate is unacceptable
- ■ Content where non-verbal notation matters (“subject paused before answering”)
- ■ Interviews with heavy accents or specialty jargon where AI accuracy drops below 90%
When it's overkill
- ■ Personal voice memos, brainstorming notes, journal entries
- ■ Interviews you'll edit for a blog post or podcast episode
- ■ Meeting notes for internal use
- ■ Any cost-sensitive workflow (about 6× the per-minute price of Rev AI)
- ■ Anything you need faster than 24-72 hours
Cost + turnaround: Rev Human is $1.50/minute (~$90/audio hour), 24-72 hour turnaround. Rev AI is $0.25/minute (~$15/audio hour), 5-24 hour turnaround. See rev.com/pricing for current rates.
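The per-hour figures follow directly from the per-minute rates. A minimal sketch of that arithmetic, using only the two Rev rates quoted on this page (the `cost` and `cost_ratio` helpers are illustrative names, and the rates may have changed since verification):

```python
# Per-minute rates in USD as quoted above; see rev.com/pricing for current values.
RATES_PER_MIN = {"Rev Human": 1.50, "Rev AI": 0.25}


def cost(service: str, minutes: float) -> float:
    """Total cost in USD for a recording of the given length."""
    return round(RATES_PER_MIN[service] * minutes, 2)


def cost_ratio(service_a: str, service_b: str) -> float:
    """How many times more service_a costs per minute than service_b."""
    return RATES_PER_MIN[service_a] / RATES_PER_MIN[service_b]
```

A 60-minute memo comes to `cost("Rev Human", 60)` = 90.0 and `cost("Rev AI", 60)` = 15.0, and `cost_ratio("Rev Human", "Rev AI")` is 6.0.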
Method 4: Desktop workflow (advanced)
For power users who want maximum control, or air-gapped requirements where audio can't touch the cloud.
The workflow
- Voice Memos syncs from iPhone to Mac via iCloud (or export .m4a manually)
- Open Voice Memos on Mac or use Finder to grab the .m4a file
- Run Whisper locally:
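  whisper voice-memo.m4a --model large-v3 --output_format all
  (--output_format takes a single value: txt, vtt, srt, tsv, json, or all; "all" writes every format, including SRT and TXT)
- Or use MacWhisper, a Mac app with a drag-and-drop UI that runs Whisper locally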
- Optional: pair with pyannote.audio for speaker diarization (open source, requires Python)
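If you prefer Python over the command line, the same local workflow can be sketched with the open-source `openai-whisper` package. This is a minimal sketch, not the only way to do it: it assumes the package and ffmpeg are installed, and it formats the SRT by hand from the `segments` list that `transcribe` returns rather than relying on the package's own writers. The helper names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)  # blank line between cues


def transcribe_to_srt(audio_path: str, model_name: str = "large-v3") -> str:
    """Transcribe locally and return SRT text. Audio never leaves the machine."""
    import whisper  # pip install openai-whisper; needs ffmpeg on PATH

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("voice-memo.m4a")` downloads the model on first use, which takes a while for Large-v3; smaller models such as `"small"` trade accuracy for speed.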
When Method 4 makes sense
- ■ Batch processing: transcribing dozens of memos overnight on your own machine
- ■ Air-gapped requirement: memo content that can't leave your machine — even encrypted cloud storage isn't allowed by your firm's data policy
- ■ Very long memos: local processing skips the upload step and any service file-size limit, though speed depends entirely on your hardware
- ■ You already have a GPU: M1/M2/M3/M4 Macs handle Whisper Large-v3 in reasonable time; older Intel Macs can be very slow
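For the batch case, a short script can queue every exported memo through the Whisper command-line tool overnight. This is a sketch under assumptions: the `whisper` CLI from the `openai-whisper` package is on your PATH, the memos sit in one folder as `.m4a` files, and the function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_commands(memo_dir, out_dir, model="large-v3"):
    """One whisper CLI invocation per .m4a file in memo_dir, in sorted order."""
    return [
        ["whisper", str(path), "--model", model,
         "--output_format", "srt", "--output_dir", str(out_dir)]
        for path in sorted(Path(memo_dir).glob("*.m4a"))
    ]


def run_batch(memo_dir, out_dir):
    """Run the commands one after another; return the paths that failed."""
    failed = []
    for cmd in build_commands(memo_dir, out_dir):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[1])
    return failed
```

Running the files sequentially keeps memory use predictable on a single machine; the returned list tells you which memos to retry in the morning.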
Common problems and fixes
Old voice memo won't transcribe on iOS 18
Cause: Some pre-iOS 18 memos don't trigger auto-transcription. The reason is unclear; this is based on scattered user reports.
Fix: Export via Share → Save to Files, upload to VexaScribe (or any third-party tool). Works reliably.
Transcription stuck at “Processing”
Cause: Very long memo (60+ min), or Voice Memos app got killed by iOS in the background.
Fix: Force-quit and reopen Voice Memos. If persistent, export the .m4a and use a third-party service.
Voice memo has 2 speakers but Apple's transcript doesn't distinguish them
Cause: Apple's built-in doesn't do speaker diarization — one continuous block of text.
Fix: Upload to VexaScribe. Auto-diarization identifies Speaker 1 and Speaker 2; rename in editor (Host / Guest / whoever).
Non-English memo but I only see English text (or nothing)
Cause: Language isn't in Apple's 10 supported languages, OR iOS transcription defaulted to English.
Fix: Confirm iPhone system language matches memo language for consistent behavior. For any of the ~89 languages Apple doesn't support, use VexaScribe (99 languages, auto-detect).
Need SRT for a video I'm making from the memo
Cause: Apple's built-in exports plain text only, no timestamped subtitle formats.
Fix: Export .m4a → VexaScribe → SRT export. Word-level timestamps included.
I want a summary of my memo — not just the transcript
Cause: Apple Writing Tools (Apple Intelligence) can summarize but require iPhone 15 Pro or iPhone 16 series.
Fix: If you don't have those iPhones, use VexaScribe transcript-to-summary — works on any device, includes 6 summary types (Meeting, Interview, Podcast, and more).
Frequently asked questions
Which iPhones can transcribe voice memos on-device?
iPhone 12 or later, running iOS 18 or later. iPhone 11, 11 Pro, 11 Pro Max, XS, XR, X, 8, and SE (2nd generation) do not have on-device Voice Memos transcription. For those devices — and for any iPhone still on iOS 17 or earlier — the memo has to be transcribed by a third-party service. Older memos recorded before you upgraded to iOS 18 will auto-transcribe the first time you open them in the new Voice Memos app, though some users report specific old files fail to transcribe (unclear cause; the fix is to export and use a third-party tool).
What languages does Apple's built-in transcription support?
10 languages as of iOS 18: English (all variants), Spanish, Portuguese, Italian, French, German, Japanese, Korean, Simplified Chinese, and Traditional Chinese. That leaves roughly 89 of Whisper's 99 supported languages uncovered by Apple's built-in — including Arabic, Hindi, Turkish, Russian, Vietnamese, Thai, Indonesian, Polish, Dutch, and dozens of others. If your voice memos are in one of those languages, third-party AI transcription with Whisper Large-v3 is the answer.
How do I export a voice memo to a third-party app?
On the iPhone: open Voice Memos, tap the recording you want to export, tap the three-dot (...) menu, choose Share, and pick either "Save to Files" (which produces a .m4a file you can upload later) or share directly to email, Drive, Dropbox, or another app. The exported file is an .m4a, which is an MPEG-4 audio container: AAC-encoded by default, or Apple Lossless (ALAC) if you have switched Voice Memos to lossless quality in Settings. Nearly every transcription service accepts .m4a directly, including VexaScribe. On the Mac: Voice Memos syncs via iCloud; you can export from the Mac's Voice Memos app the same way.
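If an upload is rejected, it can help to confirm the exported file really is an MPEG-4 container before blaming the service. A minimal sketch: MPEG-4 files typically begin with a box whose type is `ftyp`, followed by a four-character major brand (commonly `M4A ` for audio-only files). The function names here are illustrative, and this only inspects the header; it does not validate the audio itself.

```python
def looks_like_mpeg4(header: bytes) -> bool:
    """True if the first bytes look like an MPEG-4 file (first box type is 'ftyp')."""
    return len(header) >= 12 and header[4:8] == b"ftyp"


def major_brand(header: bytes) -> str:
    """Return the 4-character major brand (e.g. 'M4A '), or '' if not MPEG-4."""
    if not looks_like_mpeg4(header):
        return ""
    return header[8:12].decode("ascii", errors="replace")


def brand_of_file(path: str) -> str:
    """Read the first 12 bytes of a file and report its major brand."""
    with open(path, "rb") as handle:
        return major_brand(handle.read(12))
```

An empty string from `brand_of_file` suggests the file was renamed or truncated somewhere between the Share sheet and your upload folder.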
Is my voice memo private if I use a third-party tool?
Depends entirely on the tool. Apple's built-in transcription is on-device — the audio never leaves your iPhone, which is genuinely the strongest privacy answer. VexaScribe stores uploaded files encrypted at rest in AWS eu-west-2 (London, GDPR-compliant), transfers over TLS, and does not use customer data to train AI models. Rev, Otter, Notta, and Trint each have their own data policies — always read them before uploading sensitive content. For maximum privacy on non-supported languages or older iPhones, self-hosted Whisper is the option that keeps audio entirely local (open-source, requires a computer with a GPU).
What accuracy can I expect from AI vs Apple's built-in?
Both hit roughly 92-95% word accuracy on clean audio in English or another well-resourced language. Apple's built-in may be slightly better in supported English variants because it's optimized for iPhone microphones and the specific speech patterns of voice-memo recordings. Third-party AI (Whisper Large-v3) may be slightly better for non-English languages, multi-speaker interviews, or heavily accented English. In practice, the difference is small — both are close enough that other features (language coverage, export formats, speaker labels, length limits) usually decide which tool fits.
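"Word accuracy" is usually reported as 1 minus the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into a correct reference, divided by the reference length. To measure it on your own memos, here is a minimal sketch. It assumes you have a hand-corrected reference transcript, and it only lowercases and splits on whitespace; published benchmarks normalize text more carefully, so your numbers may not match theirs exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word), # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)
```

A result of 0.05 to 0.08 corresponds to the 92-95% accuracy range quoted above. One wrong word in a four-word sentence gives 0.25, which is why short test clips produce noisy numbers.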
Can I get a written summary of my voice memo?
Yes, from two places. Apple's Writing Tools (part of Apple Intelligence, available on iPhone 15 Pro and iPhone 16 series) can summarize a Voice Memos transcript with a tap — but only on those specific devices. On any device, VexaScribe generates a structured summary from the transcript using our six summary types (General, Meeting, Sales Call, Interview, Lecture, Podcast) — including action items, chapters, and key quotes. Details on the summary workflow: see our transcript-to-summary page.
What if my memo is very long — 2 hours or more?
Apple's built-in transcription can process any length, but users report the transcription sometimes fails to appear for very long memos (60+ min), especially if the Voice Memos app gets killed by iOS in the background. If that happens, export the memo (Share → Save to Files) and upload the .m4a to VexaScribe or another third-party tool. Third-party services routinely handle multi-hour recordings — VexaScribe accepts files up to 5 GB, which is about 10 hours of typical voice-memo audio.
Does Apple's built-in work offline?
Yes — Apple's Voice Memos transcription is on-device, meaning it runs locally on your iPhone with no internet connection required. This is a genuine privacy strength: your audio never leaves the phone. Third-party AI services (VexaScribe, Rev, Otter) require uploading the audio to a cloud service, which needs an internet connection. For truly offline transcription with more features, self-hosted Whisper on your own Mac is the option.
Can I transcribe voicemails the same way?
Voicemails are technically a different audio format and get delivered through your carrier's voicemail system, not Voice Memos. Visual Voicemail on iPhone displays transcripts for voicemails automatically (an Apple + carrier feature, English only). For exported voicemails or voicemails in other languages, use our voicemail transcription workflow — different process from voice memos.
Do voice memos work with speaker labels?
Apple's built-in transcription does not produce speaker labels — the whole memo is transcribed as one continuous block of text regardless of how many people are speaking. This matters for interviews, meetings recorded as voice memos, or any two-person conversation. Third-party AI tools with speaker diarization (VexaScribe, Otter) automatically identify Speaker 1, Speaker 2, and so on, and let you rename them (e.g., "Host", "Guest"). Accuracy is best with 2-4 distinct voices in a quiet recording.
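To make the idea concrete, here is one common way speaker labels get attached to a transcript: each transcript segment takes the speaker whose diarization turn overlaps it the most. This is an illustrative sketch only, not a description of how VexaScribe, Otter, or any other service implements it; the data shapes and function names are assumptions.

```python
def overlap(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach a speaker to each segment.

    segments: dicts with 'start', 'end', 'text' (seconds).
    turns: (start, end, speaker) tuples from a diarization step.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "Unknown", 0.0
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


def rename_speakers(labeled, names):
    """Swap generic labels for real names, e.g. {'Speaker 1': 'Host'}."""
    return [{**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
            for seg in labeled]
```

Overlapping speech is where this simple rule breaks down, since a segment can only receive one label; that is part of why accuracy is best with a few distinct voices in a quiet recording.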
Related resources
M4A to text (format-focused)
The format-scoped version of this guide — for M4A files from any source, not just iPhone voice memos.
Voicemail transcription
Voicemails are a different format — Visual Voicemail, exported voicemail files, third-party services.
Voice memo to summary
Turn a voice memo transcript into structured notes — action items, chapters, key quotes.
How accurate is Whisper?
Third-party AI transcription is powered by Whisper Large-v3. Accuracy by language and audio condition.
Transcribe audio to text (main product)
Full VexaScribe overview — all formats, 99 languages, speaker labels, 5 export formats.
Pricing
30-min free trial, $2-$20/month plans. No card required to start.
Verified July 3, 2026 · Apple feature details cross-checked against Apple Support (Voice Memos transcription) · Pricing verified against rev.com/pricing, otter.ai/pricing, and our own pricing page.
Editorial standards: read our policy.