How to Transcribe Video Files on Mac: Complete Guide (SRT, VTT, Subtitles)
Step-by-step guide to transcribing video files on Mac. Learn to extract audio, generate subtitles (SRT/VTT), transcribe YouTube videos, and compare top tools (Hapi, MacWhisper, Descript, Rev).
Quick Answer: Best Video Transcription Methods for Mac
| Method | Cost | Accuracy | Subtitle Export | Best For |
|---|---|---|---|---|
| Hapi | Free | 95-99% | SRT, VTT | Local files, privacy, unlimited use |
| MacWhisper | $30 | 90-95% | SRT, VTT | One-time purchase, offline |
| Descript | $12-24/mo | 90-95% | SRT, VTT | Video editing + transcription |
| Rev | $1.50/min | 99% (human) | SRT, VTT | Human accuracy, tight deadlines |
| YouTube auto-captions | Free | 70-80% | SRT (auto) | Quick previews, accessibility |
Recommended: Hapi for most users. It has the highest automated accuracy in this comparison, is free, works offline, and has no watermarks or upload limits.
Why Transcribe Video Files?
1. Subtitle/Caption Generation
Accessibility: Captions make videos on YouTube, Vimeo, and social media usable for deaf and hard-of-hearing viewers, and are often needed for ADA/WCAG compliance.
SEO: Search engines index video captions, improving discoverability. YouTube videos with captions rank 7% higher in search results.
International reach: Translate captions to multiple languages (Hapi's AI can translate transcripts to 50+ languages).
2. Content Repurposing
Turn video content into:
- Blog posts (transcript → article)
- Social media quotes (extract key moments)
- Email newsletters (summarize main points)
- Podcast episodes (extract audio track)
- Documentation (meeting recordings → written minutes)
3. Searchable Video Archive
Problem: Finding specific moments in 100+ hours of footage.
Solution: Transcribe all videos, search transcripts for keywords, jump directly to timestamp.
Example: "Find all mentions of 'quarterly budget' in Q4 2025 board meetings" — instant results vs scrubbing through hours of video.
4. Video Editing Workflow
Editors use transcripts to:
- Find exact quotes without re-watching
- Create highlight reels (search for "best moments")
- Write b-roll shot lists (identify visual needs)
- Collaborate with clients (share transcript for approval before editing)
Video Transcription Methods on Mac
Method 1: Hapi (Local, Private, Free)
Best for: Unlimited transcription, privacy-conscious users, SRT/VTT export, Mac-native workflow.
Step 1: Prepare Video File
Supported formats: MP4, MOV, M4V, AVI, MKV, WebM (all common video formats)
Audio extraction: Hapi auto-extracts audio track from video — no manual conversion needed.
File location: Drag video file directly to Hapi, or File → Open Video.
Step 2: Transcribe
- Open Hapi (menu bar icon or ⌘Space → "Hapi")
- Drag video file into Hapi window
- Hapi shows video metadata (duration, codec, resolution)
- Click "Transcribe"
- Processing starts (WhisperKit engine, 100% local)
Processing time: ~1× real-time (10-minute video = 10 minutes processing on M1 Mac)
Accuracy: 95-99% with WhisperKit Large v3 model
Step 3: Review & Edit
- Transcript appears with timestamps
- Speaker labels if multiple voices detected (auto-diarization)
- Click any timestamp to preview video at that moment
- Edit text directly in Hapi (corrections persist)
Step 4: Generate Subtitles
SRT format (universal compatibility):
1
00:00:02,150 --> 00:00:05,400
Welcome to our Q4 product demo.

2
00:00:05,600 --> 00:00:09,200
Today we'll cover three new features.
VTT format (web styling):
WEBVTT

00:00:02.150 --> 00:00:05.400
<c.speaker1>Welcome to our Q4 product demo.</c>

00:00:05.600 --> 00:00:09.200 align:center
<i>Today we'll cover three new features.</i>
Export steps:
- Click "Export" button
- Select format: SRT or VTT
- Choose output location
- File ready for video editor (Final Cut Pro, Premiere, DaVinci Resolve)
Step 5: Add Subtitles to Video
Final Cut Pro:
- Import SRT file: File → Import → Captions
- Drag SRT to timeline above video clip
- Captions appear burned-in or as separate track
- Style in Captions inspector (font, color, position)
DaVinci Resolve:
- Right-click timeline → Subtitles → Import Subtitle
- Select SRT/VTT file
- Captions auto-sync to video
- Edit in Subtitles panel
iMovie (limited support):
- iMovie doesn't support SRT import
- Workaround: Use Hapi's TXT export → copy/paste into iMovie text overlays manually
Hapi Unique Features for Video
✅ Batch processing: transcribe 10+ videos overnight, export all as SRT batch
✅ Timestamp accuracy: WhisperKit generates frame-accurate timecodes (±100ms)
✅ Multi-language detection: auto-switches between languages in same video (EN ↔ ES)
✅ No file size limits: transcribe 4K 2-hour videos without upload restrictions
✅ 100% local: video never leaves your Mac (GDPR/HIPAA compliant)
✅ AI content generation: "Create YouTube description from this transcript" (Qwen3 local LLM)
Method 2: MacWhisper (Offline, One-Time Purchase)
Best for: Users who want offline transcription without subscription, similar to Hapi.
How to Use MacWhisper
Setup:
- Purchase MacWhisper Pro ($30 one-time, Mac App Store)
- Download Whisper model (Large v2 recommended, 3GB)
- Grant microphone + files permissions
Transcribe:
- Drag video file to MacWhisper dock icon
- Select language (or "Auto-detect")
- Click "Transcribe"
- Export as SRT, VTT, TXT, DOCX
Processing: Slightly slower than Hapi (~1.2× real-time on M1)
Accuracy: 90-95% (using Whisper Large v2, older model than Hapi's v3)
MacWhisper vs Hapi
| Feature | MacWhisper | Hapi |
|---|---|---|
| Cost | $30 one-time | Free |
| Accuracy | 90-95% (Whisper v2) | 95-99% (Whisper v3) |
| Speaker labels | ❌ No | ✅ Yes (auto-diarization) |
| AI features | ❌ No | ✅ Summaries, repurposing, translation |
| Batch export | ❌ No | ✅ Yes |
| Updates | App Store only | Continuous (built-in updater) |
| Interface | Simple single-window | Multi-transcript management |
When to choose MacWhisper: You prefer App Store apps with minimal features, don't need speaker labels or AI tools.
Method 3: Descript (Cloud, Video Editor + Transcription)
Best for: Video creators who edit and transcribe in same tool, team collaboration.
How to Use Descript
Setup:
- Sign up at descript.com
- Download Mac app
- Create project
Transcribe + Edit Workflow:
- Upload video to Descript (cloud storage)
- Auto-transcription starts (processed in the cloud, roughly 5× faster than real time)
- Edit video by editing transcript:
  - Delete sentence in transcript → video cuts that section
  - Rearrange text → video reorders clips
  - Add filler word removal ("um", "uh") → video auto-cuts pauses
- Export video + SRT/VTT subtitles
Unique to Descript:
- Overdub — AI voice clone to fix mistakes ("Actually, Q3 revenue..." → re-record just that word)
- Studio Sound — one-click audio enhancement (removes echo, background noise)
- Screen recording — record screen + webcam, auto-transcribe
- Collaboration — team members comment on specific transcript lines
Pricing
- Free: 1 hour transcription/month, watermarked exports
- Creator: $12/month — 10 hours/month, no watermarks
- Pro: $24/month — 30 hours/month, API access
When to choose Descript: You edit videos frequently and want text-based editing workflow. Not ideal if you only need transcription (Hapi is free and private).
Method 4: Rev (Human Transcription, Highest Accuracy)
Best for: Legal depositions, medical consultations, critical accuracy requirements.
How to Use Rev
Upload:
- Go to rev.com
- Upload video file (or paste YouTube URL)
- Select "Transcription" or "Captions"
Turnaround:
- Automated: ~5 minutes (AI transcription, $0.25/min)
- Human: 12 hours typical (99% accuracy, $1.50/min)
Delivery:
- Email notification when ready
- Download TXT, SRT, VTT, DOCX
- Timestamps accurate to ±0.5 seconds
Accuracy comparison:
- Rev AI: 85-90% (automated)
- Rev Human: 99% (professional transcribers)
When to choose Rev Human:
- Legal/medical context (admissible in court)
- Heavy accents or technical jargon
- Absolute accuracy required (no room for errors)
Cost example: 60-minute video = $90 (human) vs $15 (AI) vs $0 (Hapi local)
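The cost example is simple per-minute arithmetic, so it is easy to rerun for other video lengths. A minimal Python sketch, using only the per-minute rates quoted in this guide; the names `RATES_PER_MINUTE`, `transcription_cost`, and `cost_table` are illustrative and not part of any tool mentioned here:

```python
# Per-minute rates quoted in this guide (USD); Hapi runs locally at no charge.
RATES_PER_MINUTE = {"Rev human": 1.50, "Rev AI": 0.25, "Hapi": 0.00}


def transcription_cost(minutes: float, rate_per_minute: float) -> float:
    """Return the cost in dollars, rounded to cents."""
    return round(minutes * rate_per_minute, 2)


def cost_table(minutes: float) -> dict[str, float]:
    """Cost of transcribing a video of the given length with each option."""
    return {
        name: transcription_cost(minutes, rate)
        for name, rate in RATES_PER_MINUTE.items()
    }


if __name__ == "__main__":
    for name, cost in cost_table(60).items():
        print(f"{name}: ${cost:.2f}")
```

Rates change over time, so treat the dictionary as a placeholder to update from the provider's current pricing page.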
Method 5: YouTube Auto-Captions (Free, Lowest Accuracy)
Best for: Quick previews, rough drafts, videos already on YouTube.
How to Use YouTube Auto-Captions
Download auto-generated captions:
- Upload video to YouTube (unlisted if private)
- Wait 10-30 minutes for auto-captions
- Download SRT:
  - Open YouTube Studio
  - Video → Subtitles
  - Click 3 dots → Download → .srt
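Or, if the video is already on YouTube, use yt-dlp (faster, no upload):
# Install yt-dlp
brew install yt-dlp
# Download only the auto-generated English captions (--skip-download skips the video itself)
yt-dlp --write-auto-subs --sub-lang en --skip-download [VIDEO_URL]
# Result: a .en.vtt caption file in the current directory
Accuracy: 70-80%. Struggles with: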
- Technical terms (YouTube misses industry jargon)
- Multiple speakers talking simultaneously
- Background music or noise
- Non-native English accents
Non-speech markers: YouTube auto-captions insert markers such as "[Music]" where non-speech audio is detected, which clutters the transcript.
When to use: You already have the video on YouTube, need a rough draft quickly, don't mind low accuracy.
YouTube Video Transcription (Private)
Problem: YouTube auto-captions are inaccurate (70-80%) and require uploading video publicly/unlisted.
Better solution: Download video, transcribe locally with Hapi.
Download YouTube Video Locally
Using yt-dlp (best quality, fastest):
# Install yt-dlp
brew install yt-dlp
# Download video (best quality)
yt-dlp -f "bestvideo+bestaudio" [YOUTUBE_URL]
# Download audio only (smaller file)
yt-dlp -f "bestaudio" --extract-audio --audio-format mp3 [YOUTUBE_URL]
# Download with metadata (title, uploader, date)
yt-dlp --add-metadata [YOUTUBE_URL]
# Result: video saved to current directory
Using 4K Video Downloader (GUI app):
- Download the app from 4kdownload.com
- Copy YouTube URL
- Click "Paste Link"
- Select quality (1080p, 4K, etc.)
- Download completes
Transcribe Downloaded YouTube Video
- Open Hapi
- Drag downloaded MP4/MKV file
- Click "Transcribe"
- Hapi processes locally (95-99% accuracy)
- Export SRT/VTT with accurate timestamps
Comparison:
| Method | Accuracy | Privacy | Cost |
|---|---|---|---|
| YouTube auto-captions | 70-80% | Video uploaded to Google | Free |
| Hapi local transcription | 95-99% | Video stays on your Mac | Free |
Why local is better:
- No upload time (transcribe 2GB video without internet)
- No ad markers in transcript
- Speaker labels (YouTube doesn't label speakers)
- Privacy (confidential videos never uploaded)
- No usage limits (YouTube caps auto-caption generation)
Subtitle Format Comparison
SRT (SubRip Text)
Universal standard — works everywhere (YouTube, Vimeo, VLC, Final Cut, Premiere).
Format:
1
00:00:00,500 --> 00:00:02,500
This is the first subtitle.

2
00:00:03,000 --> 00:00:06,000
This is the second subtitle.
It can have multiple lines.
Structure:
- Cue number (1, 2, 3...)
- Timecode line: start --> end, each written as hours:minutes:seconds,milliseconds
- Text (one or more lines)
- Blank line separator
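The four-part structure above is regular enough to generate by hand. As a rough sketch, assuming you already have transcript segments as (start seconds, end seconds, text) tuples, the following Python builds SRT text; `srt_timestamp` and `to_srt` are hypothetical helper names, not functions from Hapi or any other tool in this guide:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        # Cue number, timecode line, text, then a trailing newline so that
        # joining with "\n" leaves one blank line between cues.
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


segments = [
    (2.15, 5.4, "Welcome to our Q4 product demo."),
    (5.6, 9.2, "Today we'll cover three new features."),
]
print(to_srt(segments))
```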
Limitations:
- No styling (all text looks same)
- No positioning (always bottom-center)
- No color, font, or size control
When to use: General compatibility, upload to social media, work with all video editors.
VTT (WebVTT)
Web standard — HTML5 video player format with styling support.
Format:
WEBVTT

00:00:00.500 --> 00:00:02.500
This is the first subtitle.

00:00:03.000 --> 00:00:06.000 align:center
<c.speaker1>Speaker 1:</c> This has custom styling.

NOTE This is a comment (not displayed)

00:00:07.000 --> 00:00:10.000 line:90% position:50%
<i>Italic text</i> and <b>bold text</b>
Advanced features:
- Styling: <i>, <b>, <u>, <c.className> for CSS
- Positioning: align:start|center|end, position:X%, line:Y%
- Cue settings: Control vertical/horizontal placement
- Comments: NOTE lines for internal notes
- Metadata: Custom data embedded in cues
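If a tool only gives you SRT, a basic WebVTT file can be derived mechanically: add the WEBVTT header and swap the comma before the milliseconds for a period. A minimal Python sketch, assuming well-formed SRT input; `srt_to_vtt` is an illustrative name, and the cue numbers are kept because WebVTT accepts them as optional cue identifiers:

```python
import re

# Matches an SRT timecode line such as "00:00:02,150 --> 00:00:05,400".
TIMECODE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to plain WebVTT (no styling or positioning added)."""
    out = ["WEBVTT", ""]  # header followed by a blank line
    for line in srt_text.strip().splitlines():
        match = TIMECODE.match(line)
        if match:
            # Only the millisecond separator changes on timing lines.
            out.append(f"{match[1]}.{match[2]} --> {match[3]}.{match[4]}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Styling tags and cue settings such as `align:center` would still need to be added afterwards, by hand or in a subtitle editor.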
When to use:
- Web video players (Vimeo, custom HTML5 players)
- Need colored speaker labels (e.g., Speaker 1 = blue, Speaker 2 = red)
- Multi-language with styled subtitles
- Accessibility requirements (position captions to avoid obscuring sign language interpreter)
Which Format to Export?
| Scenario | Format | Why |
|---|---|---|
| Upload to YouTube | SRT | Universal compatibility, no styling needed |
| Upload to Vimeo | VTT | Vimeo supports VTT styling |
| Social media (Instagram, TikTok) | SRT | Platforms auto-convert, SRT works everywhere |
| Final Cut Pro | SRT | Native support, easier to style in FCP |
| Web embedding | VTT | HTML5 <track> tag, custom CSS styling |
| Accessibility compliance | VTT | WCAG requires positioning + color contrast |
Advanced Workflows
Workflow 1: Multi-Language Subtitles for YouTube
Goal: Upload video with English, Spanish, French, German subtitles.
Steps:
- Transcribe in original language (English):
  - Open video in Hapi
  - Transcribe → Export SRT (English)
- Translate transcript (using Hapi AI):
  - Open transcript in Hapi
  - Click "AI Chat"
  - Paste prompt: Translate this transcript to Spanish, preserving all timestamps. Output as SRT format.
  - Repeat for French, German
- Upload to YouTube:
  - YouTube Studio → Video → Subtitles
  - Upload english.srt → Language: English
  - Upload spanish.srt → Language: Spanish
  - Upload french.srt → Language: French
  - Upload german.srt → Language: German
- Viewers choose language in YouTube player settings
Time: ~10 minutes for 4 languages (vs hours of manual translation)
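Since the translation prompt asks the model to preserve all timestamps, it is worth confirming that it did before uploading. A small Python sketch, assuming both files are ordinary SRT and that no subtitle text itself contains "-->"; the function names are illustrative:

```python
def timecode_lines(srt_text: str) -> list[str]:
    """Return every timing line (the ones containing '-->') in order."""
    return [line.strip() for line in srt_text.splitlines() if "-->" in line]


def timing_mismatches(original_srt: str, translated_srt: str) -> list[str]:
    """Describe where the translated file's timing differs from the original."""
    original = timecode_lines(original_srt)
    translated = timecode_lines(translated_srt)
    problems = []
    if len(original) != len(translated):
        problems.append(
            f"cue count differs: {len(original)} vs {len(translated)}"
        )
    # Compare cue by cue for as many cues as both files share.
    for index, (a, b) in enumerate(zip(original, translated), start=1):
        if a != b:
            problems.append(f"cue {index}: {a} != {b}")
    return problems
```

An empty list means the translated file kept the original timing; anything else points at the cue to re-check.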
Workflow 2: Video → Blog Post Automation
Goal: Turn a 30-minute video into a 1,500-word blog post in about 5 minutes of hands-on time (transcription itself runs at roughly 1× real-time).
Steps:
- Transcribe video (Hapi)
- Open AI Chat in Hapi
- Paste repurposing prompt:
    Convert this video transcript into a blog post.
    Format:
    - SEO-optimized title (H1)
    - 2-3 sentence intro
    - 5 main sections (H2 headers)
    - Bullet points for key takeaways
    - Conclusion with call-to-action
    Tone: Professional but conversational
    Length: ~1,500 words
- Hapi AI generates blog post using Qwen3 local LLM
- Copy output → paste into WordPress/Notion
- Add images from video screenshots
- Publish
Result: One video creates 3+ content pieces:
- YouTube video (original)
- Blog post (transcript repurposed)
- Social media quotes (extract key moments from transcript)
Time saved: 90% vs writing from scratch (5 min vs 60 min)
Workflow 3: Searchable Video Library
Problem: 200+ training videos, can't find specific information without watching everything.
Solution: Transcribe all videos, build searchable archive.
Setup:
- Batch transcribe all videos:
  - Open Hapi
  - Drag 200 video files into Hapi window
  - Click "Transcribe All"
  - Processing runs overnight
- Export as searchable format:
  - Select all transcripts
  - Export as JSON (includes timestamps + metadata)
- Build search interface (optional, for teams):
    {
      "video_id": "training-001",
      "title": "Onboarding: Customer Support Tools",
      "transcript": [
        {"timestamp": "00:02:15", "text": "To escalate a ticket..."},
        {"timestamp": "00:05:40", "text": "Use the priority queue for..."}
      ]
    }
- Search using Hapi's built-in search:
  - Open Hapi → Transcripts
  - Search bar: "priority queue"
  - Results show all videos mentioning that term
  - Click result → jump to exact timestamp
ROI: Support team finds answers in 10 seconds vs 20 minutes of video scrubbing.
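For teams that build their own search on top of the exported JSON, a keyword lookup needs very little code. The sketch below assumes each exported record has the `video_id`, `title`, and `transcript` fields shown in the JSON example in this workflow; the actual export schema may differ, and `search_transcripts` is a hypothetical helper:

```python
import json


def search_transcripts(videos, query):
    """Return (video_id, title, timestamp, text) for each segment containing query."""
    needle = query.lower()
    hits = []
    for video in videos:
        for segment in video["transcript"]:
            if needle in segment["text"].lower():  # case-insensitive match
                hits.append(
                    (video["video_id"], video["title"],
                     segment["timestamp"], segment["text"])
                )
    return hits


# Sample data mirroring the JSON record in this workflow.
videos = json.loads("""
[{"video_id": "training-001",
  "title": "Onboarding: Customer Support Tools",
  "transcript": [
    {"timestamp": "00:02:15", "text": "To escalate a ticket..."},
    {"timestamp": "00:05:40", "text": "Use the priority queue for..."}
  ]}]
""")

for video_id, title, timestamp, text in search_transcripts(videos, "priority queue"):
    print(f"{video_id} [{timestamp}] {title}: {text}")
```

A plain substring scan is enough for a few hundred transcripts; a larger archive would likely want a proper full-text index.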
Workflow 4: Podcast Video Repurposing
Goal: Extract podcast clips for social media (Instagram Reels, TikTok, YouTube Shorts).
Steps:
- Transcribe full podcast episode (90 min)
- AI extracts highlights:
    Find the 5 most engaging 60-second segments from this podcast.
    Criteria:
    - Self-contained topic (no context needed)
    - Strong opening hook
    - Quotable conclusion
    Output:
    - Timestamp range
    - Topic summary
    - Why it's shareable
- Hapi AI returns:
    Clip 1: 00:12:30 - 00:13:45
    Topic: "Why most startups fail at SEO"
    Hook: "SEO is not about keywords. It's about user intent."
    Shareable: Contrarian take, actionable insight

    Clip 2: 00:34:20 - 00:35:10
    Topic: "How to validate a business idea in 48 hours"
    Hook: "You don't need to build the product first."
    Shareable: Tactical step-by-step process
- Export SRT for each clip (e.g., 00:12:30 - 00:13:45)
- Edit in Final Cut/Premiere:
  - Cut video clip to timestamp range
  - Import corresponding SRT
  - Style captions (large font, emoji for emphasis)
  - Export as 1080×1920 vertical video
- Upload to social media with auto-generated captions
Time: 10 clips in 30 minutes vs 4 hours of manual scrubbing + editing.
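A clip's subtitles have to start at zero, not at the clip's position in the full episode. If your tool only exports the full-episode SRT, the per-clip file can be cut from it. A hedged Python sketch, assuming well-formed SRT without cue settings on the timing line; `clip_srt` and its helpers are illustrative names:

```python
import re

TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def from_ms(total: int) -> str:
    """Convert milliseconds back to 'HH:MM:SS,mmm'."""
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def clip_srt(srt_text: str, clip_start: str, clip_end: str) -> str:
    """Keep cues that overlap the clip and rebase them so the clip starts at 0."""
    lo, hi = to_ms(clip_start), to_ms(clip_end)
    kept = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (to_ms(part) for part in lines[1].split("-->"))
        if end <= lo or start >= hi:
            continue  # cue lies entirely outside the clip
        # Trim cues that straddle a clip boundary, then shift to clip time.
        kept.append((max(start, lo) - lo, min(end, hi) - lo, lines[2:]))
    out = []
    for number, (start, end, text) in enumerate(kept, start=1):
        out.append("\n".join(
            [str(number), f"{from_ms(start)} --> {from_ms(end)}", *text]
        ))
    return "\n\n".join(out) + "\n"
```

For the first clip in the example, the call would be `clip_srt(full_srt, "00:12:30,000", "00:13:45,000")`, where `full_srt` is the text of the full-episode file.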
Troubleshooting
"Hapi can't open this video file"
Cause: Unsupported codec or DRM-protected video.
Fix:
- Convert video using HandBrake (free):
    brew install --cask handbrake
  - Open HandBrake
  - Drag video file
  - Preset: "Fast 1080p30"
  - Start encode
  - Transcribe converted MP4
- Check for DRM: Purchased videos (iTunes, Amazon Prime) are DRM-locked and can't be transcribed legally.
Subtitles Out of Sync with Video
Cause: Video was edited after transcription (cuts added/removed).
Fix:
- Re-transcribe final edited video
- Or shift all subtitles:
- Open SRT in Aegisub (free subtitle editor)
- Select all cues
- Shift times by the offset: e.g. +2.5 seconds if subtitles appear too early, or -2.5 seconds if they appear too late
- Save adjusted SRT
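The same constant shift can be scripted when there are many files to fix. A minimal Python sketch, assuming a standard SRT file and a single offset for the whole video; `shift_srt` is an illustrative name, and times that would go negative are clamped to zero:

```python
import re

STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, offset_seconds: float) -> str:
    """Shift every timestamp on timing lines by offset_seconds (negative = earlier)."""
    offset_ms = round(offset_seconds * 1000)

    def shift(match: re.Match) -> str:
        h, m, s, ms = map(int, match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(0, total)  # clamp so no cue starts before 00:00:00,000
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # Only touch timing lines, so timestamps quoted in subtitle text stay intact.
    lines = [
        STAMP.sub(shift, line) if "-->" in line else line
        for line in srt_text.splitlines()
    ]
    return "\n".join(lines) + "\n"
```

A constant shift only helps when the drift is uniform; if cuts were made in the middle of the video, re-transcribing the final edit is the safer route.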
SRT File Shows Garbled Text in Video Editor
Cause: Character encoding mismatch (UTF-8 vs ANSI).
Fix:
- Open SRT in TextEdit (Mac)
- Format → Make Plain Text
- File → Save As
- Encoding: UTF-8 (bottom dropdown)
- Re-import to video editor
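The TextEdit steps can also be done in a script. The sketch below assumes the garbled file was saved in Windows-1252 (what "ANSI" usually means for Western-language files); that fallback codec is an assumption to adjust for other languages, and `to_utf8` and `convert_file` are hypothetical helper names:

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Decode as UTF-8 if valid, otherwise with the fallback codec; return UTF-8 bytes."""
    try:
        text = raw.decode("utf-8-sig")  # also strips a byte-order mark if present
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy codec, replacing undecodable bytes.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")


def convert_file(path: str) -> None:
    """Rewrite the subtitle file at path in place as UTF-8."""
    file = Path(path)
    file.write_bytes(to_utf8(file.read_bytes()))
```

Keep a copy of the original file before converting in place, since a wrong fallback codec produces different but equally wrong characters.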
YouTube Rejects My SRT Upload
Common errors:
Error: "Invalid timecode format"
Fix: Ensure SRT uses commas (,), not periods (.):
00:00:02,500 --> 00:00:05,400 ✅ Correct
00:00:02.500 --> 00:00:05.400 ❌ Wrong (VTT format)
Error: "Overlapping cues"
Fix: Check for cues that overlap:
1
00:00:02,000 --> 00:00:05,000
First subtitle.

2
00:00:04,000 --> 00:00:06,000 ❌ Overlaps with cue 1
Second subtitle.
Remove overlap by adjusting end timecode of first cue to 00:00:03,999.
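Both problems described here can be caught before uploading. A rough Python sketch that scans an SRT file for period-separated timecodes and for cues that start before the previous cue ends; `check_srt` is an illustrative name, and the checks mirror the two errors in this section and are not YouTube's actual validation logic:

```python
import re

# Accepts either separator so that VTT-style periods can be reported.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})([,.])(\d{3}) --> (\d{2}):(\d{2}):(\d{2})([,.])(\d{3})"
)


def check_srt(srt_text: str) -> list[str]:
    """Report period-separated timecodes and cues that overlap the previous cue."""
    problems = []
    previous_end = None
    for line_number, line in enumerate(srt_text.splitlines(), start=1):
        match = TIMING.match(line.strip())
        if not match:
            continue  # not a timing line
        g = match.groups()
        if g[3] == "." or g[8] == ".":
            problems.append(
                f"line {line_number}: use ',' before milliseconds, not '.'"
            )
        start = ((int(g[0]) * 60 + int(g[1])) * 60 + int(g[2])) * 1000 + int(g[4])
        end = ((int(g[5]) * 60 + int(g[6])) * 60 + int(g[7])) * 1000 + int(g[9])
        if previous_end is not None and start < previous_end:
            problems.append(
                f"line {line_number}: cue starts before the previous cue ends"
            )
        previous_end = end
    return problems
```

An empty result means neither issue was found; other upload errors would need their own checks.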
Hapi Transcription Takes Too Long
Expected times (M1 Mac, WhisperKit Large v3):
- 10-minute video: ~10 minutes
- 60-minute video: ~60 minutes
- 120-minute video: ~2 hours
If significantly slower:
- Check available RAM (close other apps)
- Restart Hapi (clears model cache)
- Try Medium model (faster, 93-96% accuracy):
- Hapi Settings → Models → WhisperKit Medium
- Upgrade to M1 Pro/Max (2× faster transcription)
Privacy & Security
What Data Leaves Your Mac?
| Tool | Data Sent to Cloud |
|---|---|
| Hapi | Nothing (100% local) |
| MacWhisper | Nothing (100% local) |
| Descript | Video + audio uploaded to Descript servers |
| Rev | Video + audio uploaded to Rev servers |
| YouTube auto-captions | Video uploaded to Google servers |
GDPR/HIPAA Compliance
Hapi & MacWhisper: Compliant by default (no data transmission).
Descript: Requires Business Associate Agreement (BAA) for HIPAA. GDPR-compliant with DPA (Data Processing Agreement).
Rev: Offers HIPAA-compliant service ($1.75/min vs $1.50 standard). GDPR-compliant.
YouTube: Not HIPAA-compliant (no BAA available). GDPR-compliant for non-healthcare use.
Confidential Video Best Practices
- Never upload to cloud services (use Hapi or MacWhisper)
- Store locally on encrypted Mac drive (FileVault enabled)
- Delete cloud copies after download (if you used Descript/Rev)
- Use local AI for repurposing (Hapi's Qwen3 vs ChatGPT/Claude)
Which Video Transcription Tool Should You Choose?
Use Hapi if you:
- Want free unlimited transcription
- Need highest accuracy (95-99%)
- Value privacy (100% local processing)
- Want AI features (summaries, repurposing, translation)
- Transcribe Mac screen recordings, Zoom calls, YouTube videos
- Need SRT/VTT subtitle export
- Use Mac (M1/M2/M3 recommended)
Use MacWhisper if you:
- Prefer App Store purchases ($30 one-time)
- Don't need speaker labels or AI tools
- Want minimal interface (single window)
- Have an offline-only use case
Use Descript if you:
- Edit videos frequently (text-based editing saves time)
- Need collaboration features (team comments)
- Want overdub voice cloning
- Have budget for subscription ($12-24/mo)
- Don't mind cloud upload (faster processing)
Use Rev Human if you:
- Need 99% accuracy for legal/medical context
- Can afford $1.50/minute ($90/hour)
- Have a tight deadline (12-hour turnaround)
- Work with heavy accents or complex terminology
Use YouTube auto-captions if you:
- Need a rough draft quickly
- Already have the video on YouTube
- Can accept 70-80% accuracy
- Need a free option
Get Started
For most Mac users who want free, accurate, private video transcription with subtitle export, Hapi is the best choice.