FaceTime Transcription on Mac: How to Get a Real Transcript (Not Just Captions)
Apple's FaceTime supports Live Captions but does not produce a saved transcript. Here's how to capture a complete, exportable FaceTime transcript on macOS, fully on-device.
FaceTime is the default video calling app on Apple devices, and many people use it for conversations that do warrant a transcript — telehealth visits, therapy sessions, calls with elderly family members, journalism interviews, recorded conversations between distant friends. Apple's Live Captions feature in macOS Ventura and later renders captions in real time on Apple Silicon Macs, but those captions vanish at the end of the call. There is no built-in transcript export.
This guide explains how FaceTime captioning actually works, where it falls short, and how to produce a saved, searchable, speaker-labeled FaceTime transcript on macOS — without sending any audio to a cloud service.
What FaceTime Gives You — and What It Doesn't
| Feature | Available in FaceTime | Saved as transcript |
|---|---|---|
| Live Captions (Ventura+, Apple Silicon) | ✅ | ❌ |
| Saved transcript / chat log | ❌ | — |
| Recording | Via QuickTime / screen recording | ❌ (saved as media only) |
| Speaker labels | ❌ | — |
| Export to TXT / SRT / VTT | ❌ | — |
Live Captions does its job — it makes calls accessible in real time — but it is fundamentally an accessibility surface, not a documentation feature. The captions are computed on-device and discarded as soon as the call ends.
The Two Paths to a FaceTime Transcript on Mac
Path 1: Record the Call, Transcribe the File
macOS supports screen recording with system audio via ScreenCaptureKit (introduced in macOS 12.3, with audio capture added in macOS 13, and the foundation of QuickTime's screen-recording feature). You can:
- Start a screen recording before joining the FaceTime call
- Capture the call audio as part of the recording
- Run a local transcription tool over the resulting MOV/MP4 to produce a transcript
This works on any Mac running a recent macOS release. The downside: it is a manual two-step process, and the recording file is large because it includes video.
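Some local transcription tools expect a plain audio file rather than a MOV/MP4 container, so a common preparatory step is to extract the audio track as 16 kHz mono WAV. The sketch below only builds an ffmpeg command line; it assumes ffmpeg is installed separately, and the function name and file name are illustrative, not part of any tool mentioned in this guide.

```python
from pathlib import Path


def build_ffmpeg_extract_command(recording: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that extracts mono 16-bit PCM audio.

    The output path is the input path with a .wav suffix (an assumption
    made for this sketch).
    """
    source = Path(recording)
    target = source.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(source),         # input screen recording (MOV/MP4)
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample, 16 kHz by default
        "-c:a", "pcm_s16le",       # uncompressed 16-bit PCM
        str(target),
    ]


# Hypothetical file name, for illustration only.
command = build_ffmpeg_extract_command("facetime-call.mov")
print(" ".join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would perform the extraction. Keeping the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.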
Path 2: Live Capture with Auto-Transcription
A Mac menu-bar app can detect that FaceTime is the active conferencing window and capture both your microphone and system audio in 16 kHz mono, the format most speech-recognition models expect. When the call ends, the app runs transcription and diarization locally. No file shuffling, no manual conversion.
This is what Hapi does on macOS Sonoma and later.
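How any particular app combines transcription with diarization is not documented here. One common approach is to give each transcribed segment the speaker whose diarization turns overlap it the most. The following is a minimal sketch of that idea; the `Segment` and `Turn` types, the speaker names, and the timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of speech (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="Unknown"):
    """Pair each segment with the speaker that overlaps it longest."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder when no turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg))
    return labeled


# Made-up example data.
segments = [
    Segment(0.0, 2.0, "Hi, can you hear me?"),
    Segment(2.0, 4.5, "Yes, loud and clear."),
]
turns = [Turn(0.0, 2.2, "Speaker 1"), Turn(2.2, 5.0, "Speaker 2")]

for speaker, seg in label_segments(segments, turns):
    print(f"{speaker}: {seg.text}")
```

Summing overlap per speaker, rather than taking the first overlapping turn, keeps a segment from being mislabeled when a turn boundary falls slightly inside it.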
Setting Up Local FaceTime Transcription
A practical workflow:
- Install Hapi. Grant Microphone and Screen Recording permissions.
- Start FaceTime normally. No browser extension, no virtual audio device, no participant bot.
- Hapi auto-detects that FaceTime is active and begins capturing audio.
- Talk freely. Live drafts may appear during the call; the final, clean transcript is generated after the call ends.
- End the call. Hapi runs Parakeet-class transcription and ECAPA diarization for speaker labels. The Mac's Neural Engine handles the work.
- Review and export. Output formats: TXT, Markdown, JSON, SRT, VTT.
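To make the subtitle formats in the last step concrete, here is a minimal sketch of how speaker-labeled cues can be written as SRT and VTT. It is not Hapi's exporter; the cue tuples and the "Speaker: text" prefix are assumptions chosen for illustration. The two formats differ mainly in the timestamp separator (comma in SRT, period in VTT), SRT's numbered cues, and VTT's `WEBVTT` header.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(cues) -> str:
    """cues: iterable of (start, end, speaker, text) tuples."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n"
            f"{speaker}: {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


def to_vtt(cues) -> str:
    """Same cues as WebVTT: header line, period as millisecond separator."""
    blocks = ["WEBVTT\n"]
    for start, end, speaker, text in cues:
        blocks.append(
            f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(blocks)


# Made-up example cues.
cues = [
    (0.0, 2.0, "Speaker 1", "Hi, can you hear me?"),
    (2.0, 4.5, "Speaker 2", "Yes, loud and clear."),
]
print(to_srt(cues))
print(to_vtt(cues))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the formatted timestamps.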
Privacy: Why Local Matters Here Specifically
FaceTime is end-to-end encrypted between Apple devices. That guarantee covers the call itself — Apple cannot read the contents of a FaceTime call in transit. The moment you upload a recording to a cloud transcription service, you break that property: the third-party service can read the audio, store it, and (depending on policy) use it to train models.
Local transcription preserves the original property. Audio that was end-to-end encrypted in transit stays on the Mac after the call. The transcription model never transmits anything.
This matters most for:
- Telehealth and therapy calls subject to HIPAA
- Family calls containing health, financial, or relationship information
- Journalist interviews with sources who chose FaceTime specifically for its encryption
- Cross-border family conversations where one or both ends sit in jurisdictions with surveillance concerns
Local vs. Other Transcription Approaches
| Dimension | Live Captions | Cloud transcription | Local Mac capture |
|---|---|---|---|
| Saved transcript | ❌ | ✅ | ✅ |
| Audio destination | On-device only | Vendor cloud | Stays on Mac |
| Speaker labels | ❌ | Sometimes | ✅ (diarized) |
| Cost | Free | Paid SaaS | Free |
| Works offline | ✅ | ❌ | ✅ |
| Preserves E2E privacy property | ✅ | ❌ | ✅ |
| Export formats | None | Varies | TXT/MD/JSON/SRT/VTT |
Bottom Line
FaceTime gives you Live Captions but not a saved transcript. The fix most consistent with FaceTime's privacy model is a local Mac capture tool that keeps audio on-device, just as Live Captions does. You get a searchable, speaker-labeled transcript, the audio never leaves the Mac, and the workflow takes one click.