How to Transcribe Video Files on Mac: Complete Guide (SRT, VTT, Subtitles)
Step-by-step guide to transcribing video files on Mac. Learn to extract audio, generate subtitles (SRT/VTT), transcribe YouTube videos, and compare top tools (Hapi, MacWhisper, Descript, Rev).
Quick Answer: Best Video Transcription Methods for Mac
| Method | Cost | Accuracy | Subtitle Export | Best For |
|---|---|---|---|---|
| Hapi | Free | 95-99% | SRT, VTT | Local files, privacy, unlimited use |
| MacWhisper | $30 | 90-95% | SRT, VTT | One-time purchase, offline |
| Descript | $12-24/mo | 90-95% | SRT, VTT | Video editing + transcription |
| Rev | $1.50/min | 99% (human) | SRT, VTT | Human accuracy, tight deadlines |
| YouTube auto-captions | Free | 70-80% | SRT (auto) | Quick previews, accessibility |
Recommended: Hapi for most users. It has the highest accuracy of the automated options above, is free, works offline, and has no watermarks or upload limits.
Why Transcribe Video Files?
1. Subtitle/Caption Generation
Accessibility: Captions on YouTube, Vimeo, and social media make video usable for deaf and hard-of-hearing viewers and are expected under ADA/WCAG accessibility requirements.
SEO: Search engines index video captions, improving discoverability. YouTube videos with captions rank 7% higher in search results.
International reach: Translate captions to multiple languages (Hapi's AI can translate transcripts to 50+ languages).
2. Content Repurposing
Turn video content into:
- Blog posts (transcript → article)
- Social media quotes (extract key moments)
- Email newsletters (summarize main points)
- Podcast episodes (extract audio track)
- Documentation (meeting recordings → written minutes)
3. Searchable Video Archive
Problem: Finding specific moments in 100+ hours of footage.
Solution: Transcribe all videos, search transcripts for keywords, jump directly to timestamp.
Example: "Find all mentions of 'quarterly budget' in Q4 2025 board meetings" — instant results vs scrubbing through hours of video.
4. Video Editing Workflow
Editors use transcripts to:
- Find exact quotes without re-watching
- Create highlight reels (search for "best moments")
- Write b-roll shot lists (identify visual needs)
- Collaborate with clients (share transcript for approval before editing)
Video Transcription Methods on Mac
Method 1: Hapi (Local, Private, Free)
Best for: Unlimited transcription, privacy-conscious users, SRT/VTT export, Mac-native workflow.
Step 1: Prepare Video File
Supported formats: MP4, MOV, M4V, AVI, MKV, WebM (all common video formats)
Audio extraction: Hapi auto-extracts audio track from video — no manual conversion needed.
File location: Drag video file directly to Hapi, or File → Open Video.
Step 2: Transcribe
- Open Hapi (menu bar icon or ⌘Space → "Hapi")
- Drag video file into Hapi window
- Hapi shows video metadata (duration, codec, resolution)
- Click "Transcribe"
- Processing starts (WhisperKit engine, 100% local)
Processing time: ~1× real-time (10-minute video = 10 minutes processing on M1 Mac)
Accuracy: 95-99% with WhisperKit Large v3 model
Step 3: Review & Edit
- Transcript appears with timestamps
- Speaker labels if multiple voices detected (auto-diarization)
- Click any timestamp to preview video at that moment
- Edit text directly in Hapi (corrections persist)
Step 4: Generate Subtitles
SRT format (universal compatibility):
1
00:00:02,150 --> 00:00:05,400
Welcome to our Q4 product demo.

2
00:00:05,600 --> 00:00:09,200
Today we'll cover three new features.
VTT format (web styling):
WEBVTT

00:00:02.150 --> 00:00:05.400
<c.speaker1>Welcome to our Q4 product demo.</c>

00:00:05.600 --> 00:00:09.200 align:center
<i>Today we'll cover three new features.</i>
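Under the hood, SRT export is mostly a matter of formatting timestamps. The sketch below assumes the transcript is available as hypothetical (start_seconds, end_seconds, text) tuples. It illustrates the SRT layout and is not Hapi's actual export code.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timecode, HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds to avoid float drift
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Serialize (start_seconds, end_seconds, text) segments as numbered SRT cues."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{number}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n")
    # Each cue ends with a newline; joining on "\n" leaves one blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        (2.15, 5.4, "Welcome to our Q4 product demo."),
        (5.6, 9.2, "Today we'll cover three new features."),
    ]
    print(to_srt(demo))
```

Running it prints the same two cues as the SRT sample above.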
Export steps:
- Click "Export" button
- Select format: SRT or VTT
- Choose output location
- File ready for video editor (Final Cut Pro, Premiere, DaVinci Resolve)
Step 5: Add Subtitles to Video
Final Cut Pro:
- Import SRT file: File → Import → Captions
- Drag SRT to timeline above video clip
- Captions appear on their own caption track (you can choose to burn them in when you export)
- Style in Captions inspector (font, color, position)
DaVinci Resolve:
- File → Import → Subtitle (the file lands in the Media Pool), then drag it onto the timeline
- Select SRT/VTT file
- Captions auto-sync to video
- Edit in Subtitles panel
iMovie (limited support):
- iMovie doesn't support SRT import
- Workaround: Use Hapi's TXT export → copy/paste into iMovie text overlays manually
Hapi Unique Features for Video
✅ Batch processing: transcribe 10+ videos overnight and export them all as SRT in one batch
✅ Timestamp accuracy: WhisperKit timecodes are accurate to roughly ±100 ms (about 3 frames at 30 fps)
✅ Multi-language detection: auto-switches between languages within the same video (EN ↔ ES)
✅ No file size limits: transcribe 2-hour 4K videos without upload restrictions
✅ 100% local: video never leaves your Mac, which simplifies GDPR/HIPAA compliance
✅ AI content generation: "Create YouTube description from this transcript" (Qwen3 local LLM)
Method 2: MacWhisper (Offline, One-Time Purchase)
Best for: Users who want offline transcription without subscription, similar to Hapi.
How to Use MacWhisper
Setup:
- Purchase MacWhisper Pro ($30 one-time, Mac App Store)
- Download Whisper model (Large v2 recommended, 3GB)
- Grant microphone + files permissions
Transcribe:
- Drag video file to MacWhisper dock icon
- Select language (or "Auto-detect")
- Click "Transcribe"
- Export as SRT, VTT, TXT, DOCX
Processing: Slightly slower than Hapi (~1.2× real time on M1, so a 10-minute video takes about 12 minutes)
Accuracy: 90-95% (using Whisper Large v2, older model than Hapi's v3)
MacWhisper vs Hapi
| Feature | MacWhisper | Hapi |
|---|---|---|
| Cost | $30 one-time | Free |
| Accuracy | 90-95% (Whisper Large v2) | 95-99% (Whisper Large v3) |
| Speaker labels | ❌ No | ✅ Yes (auto-diarization) |
| AI features | ❌ No | ✅ Summaries, repurposing, translation |
| Batch export | ❌ No | ✅ Yes |
| Updates | App Store only | Continuous (built-in updater) |
| Interface | Simple single-window | Multi-transcript management |
When to choose MacWhisper: You prefer App Store apps with minimal features and don't need speaker labels or AI tools.
Method 3: Descript (Cloud, Video Editor + Transcription)
Best for: Video creators who edit and transcribe in same tool, team collaboration.
How to Use Descript
Setup:
- Sign up at descript.com
- Download Mac app
- Create project
Transcribe + Edit Workflow:
- Upload video to Descript (cloud storage)
- Auto-transcription starts (processed in the cloud, roughly 5× faster than real time)
- Edit video by editing the transcript:
  - Delete a sentence in the transcript → the video cuts that section
  - Rearrange text → the video reorders clips
  - Remove filler words ("um", "uh") → the video cuts them automatically
- Export video + SRT/VTT subtitles
Unique to Descript:
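- Overdub: AI voice clone for fixing mistakes. Type the correction into the transcript (e.g., "Q3 revenue" → "Q4 revenue") and Overdub generates the new words in your voice, with no re-recording needed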
- Studio Sound — one-click audio enhancement (removes echo, background noise)
- Screen recording — record screen + webcam, auto-transcribe
- Collaboration — team members comment on specific transcript lines
Pricing
- Free: 1 hour transcription/month, watermarked exports
- Creator: $12/month — 10 hours/month, no watermarks
- Pro: $24/month — 30 hours/month, API access
When to choose Descript: You edit videos frequently and want text-based editing workflow. Not ideal if you only need transcription (Hapi is free and private).
Method 4: Rev (Human Transcription, Highest Accuracy)
Best for: Legal depositions, medical consultations, critical accuracy requirements.
How to Use Rev
Upload:
- Go to rev.com
- Upload video file (or paste YouTube URL)
- Select "Transcription" or "Captions"
Turnaround:
- Automated: ~5 minutes (AI transcription, $0.25/min)
- Human: 12 hours typical (99% accuracy, $1.50/min)
Delivery:
- Email notification when ready
- Download TXT, SRT, VTT, DOCX
- Timestamps accurate to ±0.5 seconds
Accuracy comparison:
- Rev AI: 85-90%
- Rev Human: 99% (professional transcribers)
When to choose Rev Human:
- Legal/medical contexts that require a human-verified transcript
- Heavy accents or technical jargon
- Absolute accuracy required (no room for errors)
Cost example: 60-minute video = $90 (human) vs $15 (AI) vs $0 (Hapi local)
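Here is the arithmetic behind that comparison as a small sketch. It uses the per-minute rates quoted in this guide ($1.50 human, $0.25 AI, $0 local). Prices change, so treat the rate table as a placeholder to update.

```python
# Per-minute rates quoted in this guide; update them before budgeting.
RATES_PER_MINUTE = {
    "Rev Human": 1.50,
    "Rev AI": 0.25,
    "Hapi (local)": 0.00,
}


def transcription_cost(minutes: float) -> dict[str, float]:
    """Cost of transcribing `minutes` of video with each option, rounded to cents."""
    return {name: round(rate * minutes, 2) for name, rate in RATES_PER_MINUTE.items()}


if __name__ == "__main__":
    for name, cost in transcription_cost(60).items():
        print(f"{name}: ${cost:.2f}")
```

For a 60-minute video this prints $90.00, $15.00 and $0.00, matching the figures above.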
Method 5: YouTube Auto-Captions (Free, Lowest Accuracy)
Best for: Quick previews, rough drafts, videos already on YouTube.
How to Use YouTube Auto-Captions
Download auto-generated captions:
- Upload video to YouTube (unlisted if private)
- Wait 10-30 minutes for auto-captions
- Download SRT:
  - Open YouTube Studio
  - Video → Subtitles
  - Click 3 dots → Download → .srt
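Or, for a video that's already on YouTube, fetch the captions with yt-dlp (no Studio access or re-upload needed):
# Install yt-dlp
brew install yt-dlp
# Download only the auto-generated English captions (--skip-download skips the video itself)
# Quote the URL: zsh, the default macOS shell, treats "?" in an unquoted URL as a wildcard
yt-dlp --write-auto-subs --sub-langs en --skip-download "[VIDEO_URL]"
# Result: "<video title> [<video_id>].en.vtt" in the current directory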
Accuracy: 70-80%. Auto-captions struggle with:
- Technical terms (YouTube misses industry jargon)
- Multiple speakers talking simultaneously
- Background music or noise
- Non-native English accents
Non-speech markers: YouTube auto-captions insert tags such as "[Music]" and "[Applause]" for non-speech audio. These clutter the transcript and have to be cleaned out.
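If you do start from a YouTube caption file, such as the .vtt that yt-dlp saves, a little cleanup goes a long way. The sketch below assumes the file follows the usual auto-caption layout: a header with Kind:/Language: lines, cues with inline word-timing tags, and rolling cues that often repeat the previous line. `vtt_to_plain_text` is a hypothetical helper, not part of yt-dlp or Hapi.

```python
import re

# Inline tags: <c>, </c>, and word-level timing tags such as <00:00:02.500>.
INLINE_TAG = re.compile(r"<[^>]+>")
# Bracketed non-speech markers such as [Music], [Applause], [Laughter].
NON_SPEECH = re.compile(r"\[[^\]]*\]")


def vtt_to_plain_text(vtt: str) -> str:
    """Collapse a WebVTT caption file into plain transcript lines."""
    kept = []
    previous = None
    for raw_line in vtt.splitlines():
        line = raw_line.strip()
        # Skip blank lines and cue timing lines.
        if not line or "-->" in line:
            continue
        # Skip the header and single-line comments.
        if line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE")):
            continue
        line = NON_SPEECH.sub("", INLINE_TAG.sub("", line))
        line = " ".join(line.split())  # tidy whitespace left behind by removals
        # Rolling auto-captions repeat lines across cues; keep only the first copy.
        if line and line != previous:
            kept.append(line)
            previous = line
    return "\n".join(kept)
```

This sketch only handles single-line NOTE comments. Text from multi-line NOTE blocks would leak through.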
When to use: You already have the video on YouTube, need a rough draft quickly, and don't mind low accuracy.
YouTube Video Transcription (Private)
Problem: YouTube auto-captions are inaccurate (70-80%) and require uploading video publicly/unlisted.
Better solution: Download video, transcribe locally with Hapi.
Download YouTube Video Locally
Using yt-dlp (best quality, fastest):
# Install yt-dlp, plus ffmpeg (needed to merge separate video/audio streams and to convert audio)
brew install yt-dlp ffmpeg
# Download video (best quality); quote the URL so zsh doesn't treat "?" as a wildcard
yt-dlp -f "bestvideo+bestaudio" "[YOUTUBE_URL]"
# Download audio only (smaller file)
yt-dlp -f "bestaudio" --extract-audio --audio-format mp3 "[YOUTUBE_URL]"
# Download with metadata (title, uploader, date)
yt-dlp --add-metadata "[YOUTUBE_URL]"
# Result: files saved to the current directory
Using 4K Video Downloader (GUI app):
- Download and install the app from 4kdownload.com
- Copy YouTube URL
- Click "Paste Link"
- Select quality (1080p, 4K, etc.)
- Download completes
Transcribe Downloaded YouTube Video
- Open Hapi
- Drag downloaded MP4/MKV file
- Click "Transcribe"
- Hapi processes locally (95-99% accuracy)
- Export SRT/VTT with accurate timestamps
Comparison:
| Method | Accuracy | Privacy | Cost |
|---|---|---|---|
| YouTube auto-captions | 70-80% | Video uploaded to Google | Free |
| Hapi local transcription | 95-99% | Video stays on your Mac | Free |
Why local is better:
- No upload time (transcribe 2GB video without internet)
- No "[Music]"-style markers to strip from the transcript
- Speaker labels (YouTube doesn't label speakers)
- Privacy (confidential videos never uploaded)
- No usage limits (YouTube caps auto-caption generation)
Subtitle Format Comparison
SRT (SubRip Text)
Universal standard — works everywhere (YouTube, Vimeo, VLC, Final Cut, Premiere).
Format:
1
00:00:00,500 --> 00:00:02,500
This is the first subtitle.

2
00:00:03,000 --> 00:00:06,000
This is the second subtitle.
It can have multiple lines.
Structure:
- Cue number (1, 2, 3...)
- Timing line: start --> end, each as hours:minutes:seconds,milliseconds
- Text (one or more lines)
- Blank line separator
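That structure is simple enough to parse by hand. Below is a minimal Python sketch. It assumes well-formed input: a number line, a timing line, then text. The `Cue` class and `parse_srt` function are illustrative names, not a library API.

```python
import re
from dataclasses import dataclass

SRT_TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    number: int
    start_ms: int
    end_ms: int
    text: str


def parse_timecode(value: str) -> int:
    """Convert an SRT timecode (HH:MM:SS,mmm) to milliseconds."""
    match = SRT_TIMECODE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"Not an SRT timecode: {value!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues: blocks separated by blank lines, each holding
    a cue number, a 'start --> end' timing line, and one or more text lines."""
    # Normalize Windows line endings and drop a UTF-8 byte-order mark if present.
    content = content.replace("\r\n", "\n").lstrip("\ufeff")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        number, timing, *text_lines = block.split("\n")
        start, end = timing.split(" --> ")
        cues.append(Cue(int(number), parse_timecode(start), parse_timecode(end), "\n".join(text_lines)))
    return cues
```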
Limitations:
- No standardized styling (many players honor <i>, <b>, and <u> tags, but support varies)
- No standard positioning (captions usually render bottom-center)
- No reliable color, font, or size control
When to use: General compatibility, upload to social media, work with all video editors.
VTT (WebVTT)
Web standard — HTML5 video player format with styling support.
Format:
WEBVTT

00:00:00.500 --> 00:00:02.500
This is the first subtitle.

00:00:03.000 --> 00:00:06.000 align:center
<c.speaker1>Speaker 1:</c> This has custom styling.

NOTE This is a comment (not displayed)

00:00:07.000 --> 00:00:10.000 line:90% position:50%
<i>Italic text</i> and <b>bold text</b>
Advanced features:
- Styling: <i>, <b>, <u>, and <c.className> for CSS
- Positioning: align:start|center|end, position:X%, line:Y%
- Cue settings: Control vertical/horizontal placement
- Comments: NOTE lines for internal notes
- Metadata: Custom data embedded in cues
When to use:
- Web video players (Vimeo, custom HTML5 players)
- Need colored speaker labels (e.g., Speaker 1 = blue, Speaker 2 = red)
- Multi-language with styled subtitles
- Accessibility requirements (position captions to avoid obscuring sign language interpreter)
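HTML5 `<track>` elements only accept WebVTT, so web players usually need SRT files converted. For plain SRT files the conversion is mechanical: add the WEBVTT header and swap the millisecond comma for a period. This is a minimal sketch. It does not translate player-specific SRT extras such as `<font>` tags.

```python
import re

# An SRT timing line, e.g. "00:00:00,500 --> 00:00:02,500".
SRT_TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT.

    Adds the mandatory WEBVTT header and replaces the comma before the
    milliseconds with a period on timing lines. Cue numbers are kept;
    WebVTT accepts them as optional cue identifiers.
    """
    body = srt.replace("\r\n", "\n").lstrip("\ufeff").strip()
    lines = [SRT_TIMING.sub(r"\1.\2 --> \3.\4", line) for line in body.split("\n")]
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```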
Which Format to Export?
| Scenario | Format | Why |
|---|---|---|
| Upload to YouTube | SRT | Universal compatibility, no styling needed |
| Upload to Vimeo | VTT | Vimeo supports VTT styling |
| Social media (Instagram, TikTok) | SRT | Platforms auto-convert, SRT works everywhere |
| Final Cut Pro | SRT | Native support, easier to style in FCP |
| Web embedding | VTT | HTML5 <track> tag, custom CSS styling |
| Accessibility compliance | VTT | Positioning and styling keep captions legible and clear of on-screen content (e.g., a sign language interpreter) |
Advanced Workflows
Workflow 1: Multi-Language Subtitles for YouTube
Goal: Upload video with English, Spanish, French, German subtitles.
Steps:
- Transcribe in original language (English):
  - Open video in Hapi
  - Transcribe → Export SRT (English)
- Translate transcript (using Hapi AI):
  - Open transcript in Hapi
  - Click "AI Chat"
  - Paste prompt:
    Translate this transcript to Spanish, preserving all timestamps. Output as SRT format.
  - Repeat for French, German
- Upload to YouTube:
  - YouTube Studio → Video → Subtitles
  - Upload english.srt → Language: English
  - Upload spanish.srt → Language: Spanish
  - Upload french.srt → Language: French
  - Upload german.srt → Language: German
- Viewers choose language in YouTube player settings
Time: ~10 minutes for 4 languages (vs hours of manual translation)
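Language models can occasionally alter or drop timecodes when asked to rewrite a whole SRT file. One way to guarantee the timing survives is to translate only the cue text. In this sketch, `translate` is a hypothetical stand-in for whatever translation step you use (Hapi's AI Chat, another tool, or a person). The code only handles the SRT plumbing.

```python
import re
from typing import Callable


def translate_srt(srt: str, translate: Callable[[str], str]) -> str:
    """Translate only the text of each cue; numbers and timing lines pass through unchanged."""
    blocks = re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip())
    translated = []
    for block in blocks:
        number, timing, *text_lines = block.split("\n")
        # Only the cue text goes to the translator; timestamps are never rewritten.
        text = translate("\n".join(text_lines))
        translated.append(f"{number}\n{timing}\n{text}")
    return "\n\n".join(translated) + "\n"


if __name__ == "__main__":
    # Placeholder translator for illustration; swap in your real translation step.
    phrasebook = {"Hello.": "Hola.", "Thank you.": "Gracias."}
    english = "1\n00:00:01,000 --> 00:00:02,000\nHello.\n\n2\n00:00:03,000 --> 00:00:04,000\nThank you.\n"
    print(translate_srt(english, lambda text: phrasebook.get(text, text)))
```

Sending cues one at a time costs some context, so in practice you might batch several cues per request and split the result back up. The key property is that timing lines never pass through the translator.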
Workflow 2: Video → Blog Post Automation
Goal: Turn a 30-minute video into a 1,500-word blog post with about 5 minutes of hands-on work (after transcription).
Steps:
- Transcribe video (Hapi)
- Open AI Chat in Hapi
- Paste repurposing prompt:
  Convert this video transcript into a blog post.
  Format:
    - SEO-optimized title (H1)
    - 2-3 sentence intro
    - 5 main sections (H2 headers)
    - Bullet points for key takeaways
    - Conclusion with call-to-action
  Tone: Professional but conversational
  Length: ~1,500 words
- Hapi AI generates blog post using Qwen3 local LLM
- Copy output → paste into WordPress/Notion
- Add images from video screenshots
- Publish
Result: One video creates 3+ content pieces:
- YouTube video (original)
- Blog post (transcript repurposed)
- Social media quotes (extract key moments from transcript)
Time saved: 90% vs writing from scratch (5 min vs 60 min)
Workflow 3: Searchable Video Library
Problem: 200+ training videos, can't find specific information without watching everything.
Solution: Transcribe all videos, build searchable archive.
Setup:
- Batch transcribe all videos:
  - Open Hapi
  - Drag 200 video files into Hapi window
  - Click "Transcribe All"
  - Processing runs overnight
- Export as searchable format:
  - Select all transcripts
  - Export as JSON (includes timestamps + metadata)
- Build search interface (optional, for teams):
  {
    "video_id": "training-001",
    "title": "Onboarding: Customer Support Tools",
    "transcript": [
      {"timestamp": "00:02:15", "text": "To escalate a ticket..."},
      {"timestamp": "00:05:40", "text": "Use the priority queue for..."}
    ]
  }
- Search using Hapi's built-in search:
  - Open Hapi → Transcripts
  - Search bar: "priority queue"
  - Results show all videos mentioning that term
  - Click result → jump to exact timestamp
ROI: Support team finds answers in 10 seconds vs 20 minutes of video scrubbing.
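If you build that team search interface yourself, the core lookup is only a few lines. This minimal sketch assumes the export is a JSON array of records with the `video_id` / `transcript` / `timestamp` / `text` fields from the example record above. Hapi's actual JSON export may use different field names.

```python
import json


def search_transcripts(videos: list[dict], term: str) -> list[tuple[str, str, str]]:
    """Return (video_id, timestamp, text) for each segment containing term, case-insensitively."""
    needle = term.lower()
    return [
        (video["video_id"], segment["timestamp"], segment["text"])
        for video in videos
        for segment in video["transcript"]
        if needle in segment["text"].lower()
    ]


if __name__ == "__main__":
    # Hypothetical export: a JSON array of records shaped like the example above.
    export = json.loads("""
    [
      {
        "video_id": "training-001",
        "title": "Onboarding: Customer Support Tools",
        "transcript": [
          {"timestamp": "00:02:15", "text": "To escalate a ticket..."},
          {"timestamp": "00:05:40", "text": "Use the priority queue for..."}
        ]
      }
    ]
    """)
    for video_id, timestamp, text in search_transcripts(export, "priority queue"):
        print(f"{video_id} @ {timestamp}: {text}")
```

A linear scan is fine for a few hundred videos. For much larger libraries, a full-text index (such as SQLite FTS) would be the natural next step.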
Workflow 4: Podcast Video Repurposing
Goal: Extract podcast clips for social media (Instagram Reels, TikTok, YouTube Shorts).
Steps:
- Transcribe full podcast episode (90 min)
- Ask the AI to extract highlights:
  Find the 5 most engaging 60-second segments from this podcast.
  Criteria:
    - Self-contained topic (no context needed)
    - Strong opening hook
    - Quotable conclusion
  Output:
    - Timestamp range
    - Topic summary
    - Why it's shareable
- Hapi AI returns:
  Clip 1: 00:12:30 - 00:13:45
  Topic: "Why most startups fail at SEO"
  Hook: "SEO is not about keywords. It's about user intent."
  Shareable: Contrarian take, actionable insight
  Clip 2: 00:34:20 - 00:35:10
  Topic: "How to validate a business idea in 48 hours"
  Hook: "You don't need to build the product first."
  Shareable: Tactical step-by-step process
- Export SRT for each clip (e.g., 00:12:30 - 00:13:45), with timestamps shifted so each clip starts at 00:00:00
- Edit in Final Cut/Premiere:
  - Cut video clip to timestamp range
  - Import corresponding SRT
  - Style captions (large font, emoji for emphasis)
  - Export as 1080×1920 vertical video
- Upload to social media with auto-generated captions
Time: 10 clips in 30 minutes vs 4 hours of manual scrubbing + editing.
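Clip SRTs need one extra step. Cues cut from the full episode still carry episode timestamps (00:12:30 and up), while the exported clip starts at 00:00:00. Below is a minimal sketch of trimming and re-basing an SRT to a clip range. `clip_srt` and its helpers are hypothetical names, and the code assumes well-formed SRT input.

```python
import re

# Accepts "HH:MM:SS" (as in the AI's clip list) or a full SRT "HH:MM:SS,mmm".
TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:,(\d{3}))?")


def to_ms(timecode: str) -> int:
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"Unrecognized timecode: {timecode!r}")
    hours, minutes, seconds = (int(g) for g in match.groups()[:3])
    millis = int(match.group(4) or 0)
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def to_timecode(ms: int) -> str:
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def clip_srt(srt: str, clip_start: str, clip_end: str) -> str:
    """Keep cues that overlap [clip_start, clip_end), trim them to that range,
    shift them so the clip begins at 00:00:00,000, and renumber from 1."""
    start_ms, end_ms = to_ms(clip_start), to_ms(clip_end)
    kept = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        _number, timing, *text_lines = block.split("\n")
        cue_start, cue_end = (to_ms(part) for part in timing.split(" --> "))
        if cue_end <= start_ms or cue_start >= end_ms:
            continue  # cue falls entirely outside the clip
        new_start = max(cue_start, start_ms) - start_ms
        new_end = min(cue_end, end_ms) - start_ms
        text = "\n".join(text_lines)
        kept.append(f"{len(kept) + 1}\n{to_timecode(new_start)} --> {to_timecode(new_end)}\n{text}")
    return ("\n\n".join(kept) + "\n") if kept else ""
```

For Clip 1 above you would call `clip_srt(episode_srt, "00:12:30", "00:13:45")` and import the result alongside the trimmed video. Cues that straddle a clip boundary are trimmed rather than dropped, so the first words of the clip still get a caption.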
Troubleshooting
"Hapi can't open this video file"
Cause: Unsupported codec or DRM-protected video.
Fix:
- Convert video using HandBrake (free):
  - Install: brew install --cask handbrake
  - Open HandBrake
  - Drag video file
  - Preset: "Fast 1080p30"
  - Start encode
  - Transcribe converted MP4
- Check for DRM: Purchased videos (iTunes, Amazon Prime) are DRM-locked. Transcription apps can't decode them, and stripping the DRM may violate copyright law.
Subtitles Out of Sync with Video
Cause: Video was edited after transcription (cuts added/removed).
Fix:
- Re-transcribe final edited video
- Or shift all subtitles:
  - Open SRT in Aegisub (free subtitle editor)
  - Timing → Shift Times, and apply to all rows
  - Enter 2.5 seconds, then choose Forward (subtitles appear later) or Backward (subtitles appear earlier)
  - Save adjusted SRT
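If you'd rather script the shift than open Aegisub, a short function can offset every timecode. This minimal sketch assumes standard SRT timing lines. Positive offsets make subtitles appear later and negative ones earlier. Nothing is moved before 00:00:00,000.

```python
import re

SRT_TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt: str, offset_seconds: float) -> str:
    """Shift every timecode on SRT timing lines by offset_seconds (clamped at zero)."""
    offset_ms = round(offset_seconds * 1000)

    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # Only timing lines (those containing "-->") are touched; cue text is left alone.
    lines = [
        SRT_TIMECODE.sub(shift, line) if "-->" in line else line
        for line in srt.splitlines()
    ]
    return "\n".join(lines) + "\n"
```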
SRT File Shows Garbled Text in Video Editor
Cause: Character encoding mismatch (UTF-8 vs ANSI).
Fix:
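- Open SRT in TextEdit (Mac)
- Format → Make Plain Text (only needed if the file opened as rich text)
- File → Save As (hold Option while opening the File menu if Save As isn't shown)
- Plain Text Encoding: Unicode (UTF-8) (dropdown at the bottom of the save dialog)
- Re-import to video editor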
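To fix many files at once, you can script the same re-encoding. This minimal sketch assumes the garbled files were saved as Windows "ANSI" (cp1252, common for Western European text). Files from other regions need a different fallback, such as cp1251 for Cyrillic.

```python
from pathlib import Path


def to_utf8_bytes(raw: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged; anything else
    is decoded with fallback_encoding (an assumption about the file's origin).
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback_encoding).encode("utf-8")


def convert_file_to_utf8(path: str, fallback_encoding: str = "cp1252") -> None:
    """Rewrite a subtitle file in place as UTF-8."""
    file = Path(path)
    file.write_bytes(to_utf8_bytes(file.read_bytes(), fallback_encoding))
```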
YouTube Rejects My SRT Upload
Common errors:
Error: "Invalid timecode format"
Fix: Ensure SRT uses commas (,), not periods (.):
00:00:02,500 --> 00:00:05,400 ✅ Correct
00:00:02.500 --> 00:00:05.400 ❌ Wrong (VTT format)
Error: "Overlapping cues"
Fix: Check for cues that overlap:
1
00:00:02,000 --> 00:00:05,000
First subtitle.

2
00:00:04,000 --> 00:00:06,000 ❌ Overlaps with cue 1
Second subtitle.
Remove overlap by adjusting end timecode of first cue to 00:00:03,999.
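You can catch both problems before uploading. This minimal sketch converts VTT-style periods to commas and trims overlapping cues to end 1 ms before the next cue starts, the same fix as above. It assumes each cue has a number line, a timing line, and text. `fix_srt` is a hypothetical helper, not a YouTube tool.

```python
import re

# A VTT-style timecode: period instead of comma before the milliseconds.
PERIOD_TIMECODE = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")
SRT_TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(timecode: str) -> int:
    match = SRT_TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"Unrecognized timecode: {timecode!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_timecode(ms: int) -> str:
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def fix_srt(srt: str) -> tuple[str, list[str]]:
    """Return (fixed_srt, problems) after fixing timecode separators and overlaps."""
    problems = []
    cues = []  # each entry: [number, start_ms, end_ms, text]
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        number, timing, *text_lines = block.split("\n")
        fixed_timing = PERIOD_TIMECODE.sub(r"\1,\2", timing)
        if fixed_timing != timing:
            problems.append(f"cue {number}: period instead of comma in timecode")
        start, end = fixed_timing.split(" --> ")
        cues.append([number, to_ms(start), to_ms(end), "\n".join(text_lines)])
    for current, following in zip(cues, cues[1:]):
        if current[2] > following[1]:
            problems.append(f"cue {current[0]}: overlaps cue {following[0]}")
            # End 1 ms before the next cue starts, but never before this cue starts.
            current[2] = max(current[1], following[1] - 1)
    fixed = "\n\n".join(
        f"{number}\n{to_timecode(start)} --> {to_timecode(end)}\n{text}"
        for number, start, end, text in cues
    )
    return fixed + "\n", problems


if __name__ == "__main__":
    broken = "1\n00:00:02,000 --> 00:00:05,000\nFirst subtitle.\n\n2\n00:00:04.000 --> 00:00:06.000\nSecond subtitle.\n"
    fixed, problems = fix_srt(broken)
    print("\n".join(problems))
    print(fixed)
```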
Hapi Transcription Takes Too Long
Expected times (M1 Mac, WhisperKit Large v3):
- 10-minute video: ~10 minutes
- 60-minute video: ~60 minutes
- 120-minute video: ~2 hours
If significantly slower:
- Check available RAM (close other apps)
- Restart Hapi (clears model cache)
- Try Medium model (faster, 93-96% accuracy):
  - Hapi Settings → Models → WhisperKit Medium
- Upgrade to M1 Pro/Max (2× faster transcription)
Privacy & Security
What Data Leaves Your Mac?
| Tool | Data Sent to Cloud |
|---|---|
| Hapi | Nothing (100% local) |
| MacWhisper | Nothing (100% local) |
| Descript | Video + audio uploaded to Descript servers |
| Rev | Video + audio uploaded to Rev servers |
| YouTube auto-captions | Video uploaded to Google servers |
GDPR/HIPAA Compliance
Hapi & MacWhisper: No data transmission, so no third-party processor is involved, which simplifies GDPR/HIPAA compliance.
Descript: Requires Business Associate Agreement (BAA) for HIPAA. GDPR-compliant with DPA (Data Processing Agreement).
Rev: Offers HIPAA-compliant service ($1.75/min vs $1.50 standard). GDPR-compliant.
YouTube: Not HIPAA-compliant (no BAA available). GDPR-compliant for non-healthcare use.
Confidential Video Best Practices
- Never upload to cloud services (use Hapi or MacWhisper)
- Store locally on encrypted Mac drive (FileVault enabled)
- Delete cloud copies after download (if you used Descript/Rev)
- Use local AI for repurposing (Hapi's local Qwen3 rather than cloud tools like ChatGPT/Claude)
Which Video Transcription Tool Should You Choose?
Use Hapi if you:
- Want free unlimited transcription
- Need highest accuracy (95-99%)
- Value privacy (100% local processing)
- Want AI features (summaries, repurposing, translation)
- Transcribe Mac screen recordings, Zoom calls, YouTube videos
- Need SRT/VTT subtitle export
- Use a Mac (Apple silicon M1/M2/M3 recommended)
Use MacWhisper if you:
- Prefer App Store purchases ($30 one-time)
- Don't need speaker labels or AI tools
- Want minimal interface (single window)
- Have an offline-only use case
Use Descript if you:
- Edit videos frequently (text-based editing saves time)
- Need collaboration features (team comments)
- Want overdub voice cloning
- Have budget for subscription ($12-24/mo)
- Don't mind cloud upload (faster processing)
Use Rev Human if you:
- Need 99% accuracy for legal/medical context
- Can afford $1.50/minute ($90/hour)
- Have a tight deadline (12-hour turnaround)
- Have recordings with heavy accents or complex terminology
Use YouTube auto-captions if you:
- Need a rough draft quickly (auto-captions usually appear within 10-30 minutes)
- Already have the video on YouTube
- Can accept 70-80% accuracy
- Can only use a free tool
Get Started
For most Mac users who want free, accurate, private video transcription with subtitle export, Hapi is the best choice.