
How to Transcribe Video Files on Mac: Complete Guide (SRT, VTT, Subtitles)

Step-by-step guide to transcribing video files on Mac. Learn to extract audio, generate subtitles (SRT/VTT), transcribe YouTube videos, and compare top tools (Hapi, MacWhisper, Descript, Rev).

Tags: video transcription, subtitle generation, SRT files, VTT captions, Mac workflow, YouTube transcription

Quick Answer: Best Video Transcription Methods for Mac

Method | Cost | Accuracy | Subtitle Export | Best For
Hapi | Free | 95-99% | SRT, VTT | Local files, privacy, unlimited use
MacWhisper | $30 | 90-95% | SRT, VTT | One-time purchase, offline
Descript | $12-24/mo | 90-95% | SRT, VTT | Video editing + transcription
Rev | $1.50/min | 99% (human) | SRT, VTT | Human accuracy, tight deadlines
YouTube auto-captions | Free | 70-80% | SRT (auto) | Quick previews, accessibility

Recommended: Hapi for most users. It has the highest accuracy of the automated options, is free, works offline, and has no watermarks or upload limits.

Why Transcribe Video Files?

1. Subtitle/Caption Generation

Accessibility: Captions make videos on YouTube, Vimeo, and social media usable for deaf and hard-of-hearing viewers, and they are a core part of ADA/WCAG compliance.

SEO: Search engines index video captions, improving discoverability. YouTube videos with captions rank 7% higher in search results.

International reach: Translate captions to multiple languages (Hapi's AI can translate transcripts to 50+ languages).

2. Content Repurposing

Turn video content into:

  • Blog posts (transcript → article)
  • Social media quotes (extract key moments)
  • Email newsletters (summarize main points)
  • Podcast episodes (extract audio track)
  • Documentation (meeting recordings → written minutes)

3. Searchable Video Archive

Problem: Finding specific moments in 100+ hours of footage.

Solution: Transcribe all videos, search transcripts for keywords, jump directly to timestamp.

Example: "Find all mentions of 'quarterly budget' in Q4 2025 board meetings" — instant results vs scrubbing through hours of video.

4. Video Editing Workflow

Editors use transcripts to:

  • Find exact quotes without re-watching
  • Create highlight reels (search for "best moments")
  • Write b-roll shot lists (identify visual needs)
  • Collaborate with clients (share transcript for approval before editing)

Video Transcription Methods on Mac

Method 1: Hapi (Local, Private, Free)

Best for: Unlimited transcription, privacy-conscious users, SRT/VTT export, Mac-native workflow.

Step 1: Prepare Video File

Supported formats: MP4, MOV, M4V, AVI, MKV, WebM (all common video formats)

Audio extraction: Hapi auto-extracts audio track from video — no manual conversion needed.

File location: Drag video file directly to Hapi, or File → Open Video.

Step 2: Transcribe

  1. Open Hapi (menu bar icon or ⌘Space → "Hapi")
  2. Drag video file into Hapi window
  3. Hapi shows video metadata (duration, codec, resolution)
  4. Click "Transcribe"
  5. Processing starts (WhisperKit engine, 100% local)

Processing time: ~1× real-time (10-minute video = 10 minutes processing on M1 Mac)

Accuracy: 95-99% with WhisperKit Large v3 model

Step 3: Review & Edit

  1. Transcript appears with timestamps
  2. Speaker labels if multiple voices detected (auto-diarization)
  3. Click any timestamp to preview video at that moment
  4. Edit text directly in Hapi (corrections persist)

Step 4: Generate Subtitles

SRT format (universal compatibility):

1
00:00:02,150 --> 00:00:05,400
Welcome to our Q4 product demo.

2
00:00:05,600 --> 00:00:09,200
Today we'll cover three new features.

VTT format (web styling):

WEBVTT

00:00:02.150 --> 00:00:05.400
<c.speaker1>Welcome to our Q4 product demo.</c>

00:00:05.600 --> 00:00:09.200 align:center
<i>Today we'll cover three new features.</i>

Export steps:

  1. Click "Export" button
  2. Select format: SRT or VTT
  3. Choose output location
  4. File ready for video editor (Final Cut Pro, Premiere, DaVinci Resolve)

Step 5: Add Subtitles to Video

Final Cut Pro:

  1. Import SRT file: File → Import → Captions
  2. Drag SRT to timeline above video clip
  3. Captions appear burned-in or as separate track
  4. Style in Captions inspector (font, color, position)

DaVinci Resolve:

  1. Right-click timeline → Subtitles → Import Subtitle
  2. Select SRT/VTT file
  3. Captions auto-sync to video
  4. Edit in Subtitles panel

iMovie (limited support):

  1. iMovie doesn't support SRT import
  2. Workaround: Use Hapi's TXT export → copy/paste into iMovie text overlays manually

Hapi Unique Features for Video

  ✅ Batch processing: transcribe 10+ videos overnight, export all as SRT batch
  ✅ Timestamp accuracy: WhisperKit generates frame-accurate timecodes (±100ms)
  ✅ Multi-language detection: auto-switches between languages in same video (EN ↔ ES)
  ✅ No file size limits: transcribe 4K 2-hour videos without upload restrictions
  ✅ 100% local: video never leaves your Mac (GDPR/HIPAA compliant)
  ✅ AI content generation: "Create YouTube description from this transcript" (Qwen3 local LLM)

Method 2: MacWhisper (Offline, One-Time Purchase)

Best for: Users who want offline transcription without subscription, similar to Hapi.

How to Use MacWhisper

Setup:

  1. Purchase MacWhisper Pro ($30 one-time, Mac App Store)
  2. Download Whisper model (Large v2 recommended, 3GB)
  3. Grant microphone + files permissions

Transcribe:

  1. Drag video file to MacWhisper dock icon
  2. Select language (or "Auto-detect")
  3. Click "Transcribe"
  4. Export as SRT, VTT, TXT, DOCX

Processing: Slightly slower than Hapi (~1.2× real-time on M1)

Accuracy: 90-95% (using Whisper Large v2, older model than Hapi's v3)

MacWhisper vs Hapi

Feature | MacWhisper | Hapi
Cost | $30 one-time | Free
Accuracy | 90-95% (Whisper v2) | 95-99% (Whisper v3)
Speaker labels | ❌ No | ✅ Yes (auto-diarization)
AI features | ❌ No | ✅ Summaries, repurposing, translation
Batch export | ❌ No | ✅ Yes
Updates | App Store only | Continuous (built-in updater)
Interface | Simple single-window | Multi-transcript management

When to choose MacWhisper: You prefer App Store apps with minimal features, don't need speaker labels or AI tools.

Method 3: Descript (Cloud, Video Editor + Transcription)

Best for: Video creators who edit and transcribe in same tool, team collaboration.

How to Use Descript

Setup:

  1. Sign up at descript.com
  2. Download Mac app
  3. Create project

Transcribe + Edit Workflow:

  1. Upload video to Descript (cloud storage)
  2. Auto-transcription starts (about 5× faster than real time, processed in the cloud)
  3. Edit video by editing transcript:
    • Delete sentence in transcript → video cuts that section
    • Rearrange text → video reorders clips
    • Add filler word removal ("um", "uh") → video auto-cuts pauses
  4. Export video + SRT/VTT subtitles

Unique to Descript:

  • Overdub — AI voice clone to fix mistakes ("Actually, Q3 revenue..." → re-record just that word)
  • Studio Sound — one-click audio enhancement (removes echo, background noise)
  • Screen recording — record screen + webcam, auto-transcribe
  • Collaboration — team members comment on specific transcript lines

Pricing

  • Free: 1 hour transcription/month, watermarked exports
  • Creator: $12/month — 10 hours/month, no watermarks
  • Pro: $24/month — 30 hours/month, API access

When to choose Descript: You edit videos frequently and want text-based editing workflow. Not ideal if you only need transcription (Hapi is free and private).

Method 4: Rev (Human Transcription, Highest Accuracy)

Best for: Legal depositions, medical consultations, critical accuracy requirements.

How to Use Rev

Upload:

  1. Go to rev.com
  2. Upload video file (or paste YouTube URL)
  3. Select "Transcription" or "Captions"

Turnaround:

  • Automated: ~5 minutes (AI transcription, $0.25/min)
  • Human: 12 hours typical (99% accuracy, $1.50/min)

Delivery:

  • Email notification when ready
  • Download TXT, SRT, VTT, DOCX
  • Timestamps accurate to ±0.5 seconds

Accuracy comparison:

  • Rev AI: 85-90% (comparable to other cloud AI services such as Descript)
  • Rev Human: 99% (professional transcribers)

When to choose Rev Human:

  • Legal/medical context (admissible in court)
  • Heavy accents or technical jargon
  • Absolute accuracy required (no room for errors)

Cost example: 60-minute video = $90 (human) vs $15 (AI) vs $0 (Hapi local)
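
The cost example above is plain per-minute arithmetic. A minimal sketch, assuming the rates quoted in this guide still apply (the function and dictionary names are illustrative, and real prices may differ):

```python
# Per-minute rates quoted in this guide (USD). Treat them as assumptions.
RATES_PER_MINUTE = {
    "rev_human": 1.50,
    "rev_ai": 0.25,
    "hapi_local": 0.00,
}

def transcription_cost(minutes: float, service: str) -> float:
    """Return the estimated cost in USD for transcribing `minutes` of video."""
    return round(minutes * RATES_PER_MINUTE[service], 2)

# Compare all three options for a 60-minute video.
for service in RATES_PER_MINUTE:
    print(f"{service}: ${transcription_cost(60, service):.2f}")
```

Changing the `60` to your own video length gives a quick budget check before choosing a service.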

Method 5: YouTube Auto-Captions (Free, Lowest Accuracy)

Best for: Quick previews, rough drafts, videos already on YouTube.

How to Use YouTube Auto-Captions

Download auto-generated captions:

  1. Upload video to YouTube (unlisted if private)
  2. Wait 10-30 minutes for auto-captions
  3. Download SRT:
    • Open YouTube Studio
    • Video → Subtitles
    • Click 3 dots → Download → .srt

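Or use yt-dlp (faster, for videos that are already on YouTube):

# Install yt-dlp
brew install yt-dlp

# Download only the auto-generated English captions (the video itself is skipped)
yt-dlp --write-auto-subs --sub-langs en --skip-download "VIDEO_URL"

# Result: a .en.vtt file named after the video title and ID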

Accuracy: 70-80%. Struggles with:

  • Technical terms (YouTube misses industry jargon)
  • Multiple speakers talking simultaneously
  • Background music or noise
  • Non-native English accents

Marker limitation: YouTube auto-captions insert tags such as "[Music]" wherever they detect non-speech audio, which clutters the transcript.

When to use: You already have the video on YouTube, need a rough draft quickly, don't mind low accuracy.

YouTube Video Transcription (Private)

Problem: YouTube auto-captions are inaccurate (70-80%) and require uploading video publicly/unlisted.

Better solution: Download video, transcribe locally with Hapi.

Download YouTube Video Locally

Using yt-dlp (best quality, fastest):

# Install yt-dlp (ffmpeg is needed to merge streams and convert audio)
brew install yt-dlp ffmpeg

# Download video (best quality video and audio, merged)
yt-dlp -f "bestvideo+bestaudio" "YOUTUBE_URL"

# Download audio only (smaller file)
yt-dlp -f "bestaudio" --extract-audio --audio-format mp3 "YOUTUBE_URL"

# Download and embed metadata (title, uploader, date)
yt-dlp --embed-metadata "YOUTUBE_URL"

# Result: file saved to current directory

Using 4K Video Downloader (GUI app):

  1. Download the app from 4kdownload.com
  2. Copy YouTube URL
  3. Click "Paste Link"
  4. Select quality (1080p, 4K, etc.)
  5. Download completes

Transcribe Downloaded YouTube Video

  1. Open Hapi
  2. Drag downloaded MP4/MKV file
  3. Click "Transcribe"
  4. Hapi processes locally (95-99% accuracy)
  5. Export SRT/VTT with accurate timestamps

Comparison:

Method | Accuracy | Privacy | Cost
YouTube auto-captions | 70-80% | Video uploaded to Google | Free
Hapi local transcription | 95-99% | Video stays on your Mac | Free

Why local is better:

  • No upload time (transcribe 2GB video without internet)
  • No "[Music]"-style markers in transcript
  • Speaker labels (YouTube doesn't label speakers)
  • Privacy (confidential videos never uploaded)
  • No usage limits (YouTube caps auto-caption generation)

Subtitle Format Comparison

SRT (SubRip Text)

Universal standard — works everywhere (YouTube, Vimeo, VLC, Final Cut, Premiere).

Format:

1
00:00:00,500 --> 00:00:02,500
This is the first subtitle.

2
00:00:03,000 --> 00:00:06,000
This is the second subtitle.
It can have multiple lines.

Structure:

  1. Cue number (1, 2, 3...)
  2. Timecode (hours:minutes:seconds,milliseconds)
  3. Text (one or more lines)
  4. Blank line separator
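
The four-part structure above is simple enough to generate by hand. A minimal sketch, assuming you already have segments as (start seconds, end seconds, text) tuples; the function names are illustrative and not part of any tool mentioned here:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode: HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def build_srt(segments: list[tuple[float, float, str]]) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SRT text."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        # Cue number, timecode line, text, then a blank line via the join below.
        cues.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

print(build_srt([
    (0.5, 2.5, "This is the first subtitle."),
    (3.0, 6.0, "This is the second subtitle.\nIt can have multiple lines."),
]))
```

Running it reproduces the two-cue example shown in the Format block above.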

Limitations:

  • No styling (all text looks same)
  • No positioning (always bottom-center)
  • No color, font, or size control

When to use: General compatibility, upload to social media, work with all video editors.

VTT (WebVTT)

Web standard — HTML5 video player format with styling support.

Format:

WEBVTT

00:00:00.500 --> 00:00:02.500
This is the first subtitle.

00:00:03.000 --> 00:00:06.000 align:center
<c.speaker1>Speaker 1:</c> This has custom styling.

NOTE This is a comment (not displayed)

00:00:07.000 --> 00:00:10.000 line:90% position:50%
<i>Italic text</i> and <b>bold text</b>

Advanced features:

  • Styling: <i>, <b>, <u>, <c.className> for CSS
  • Positioning: align:start|center|end, position:X%, line:Y%
  • Cue settings: Control vertical/horizontal placement
  • Comments: NOTE lines for internal notes
  • Metadata: Custom data embedded in cues
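
For plain captions, the two formats differ mainly in the WEBVTT header and the decimal separator in timecodes. A minimal conversion sketch under that assumption (it adds no styling or positioning, and `srt_to_vtt` is an illustrative name):

```python
import re

# Matches an SRT timing line such as "00:00:02,150 --> 00:00:05,400".
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)

def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to basic WebVTT.

    Only timing lines are changed, so commas inside subtitle text survive.
    SRT cue numbers are kept; WebVTT allows them as cue identifiers.
    """
    body = TIMING.sub(r"\1.\2 --> \3.\4", srt_text.strip())
    return "WEBVTT\n\n" + body + "\n"

print(srt_to_vtt("1\n00:00:00,500 --> 00:00:02,500\nThis is the first subtitle.\n"))
```

Speaker classes or cue settings such as align:center would still need to be added afterwards.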

When to use:

  • Web video players (Vimeo, custom HTML5 players)
  • Need colored speaker labels (e.g., Speaker 1 = blue, Speaker 2 = red)
  • Multi-language with styled subtitles
  • Accessibility requirements (position captions to avoid obscuring sign language interpreter)

Which Format to Export?

Scenario | Format | Why
Upload to YouTube | SRT | Universal compatibility, no styling needed
Upload to Vimeo | VTT | Vimeo supports VTT styling
Social media (Instagram, TikTok) | SRT | Platforms auto-convert, SRT works everywhere
Final Cut Pro | SRT | Native support, easier to style in FCP
Web embedding | VTT | HTML5 <track> tag, custom CSS styling
Accessibility compliance | VTT | WCAG requires positioning + color contrast

Advanced Workflows

Workflow 1: Multi-Language Subtitles for YouTube

Goal: Upload video with English, Spanish, French, German subtitles.

Steps:

  1. Transcribe in original language (English):

    • Open video in Hapi
    • Transcribe → Export SRT (English)
  2. Translate transcript (using Hapi AI):

    • Open transcript in Hapi
    • Click "AI Chat"
    • Paste prompt:
    Translate this transcript to Spanish, preserving all timestamps.
    Output as SRT format.
    
    • Repeat for French, German
  3. Upload to YouTube:

    • YouTube Studio → Video → Subtitles
    • Upload english.srt → Language: English
    • Upload spanish.srt → Language: Spanish
    • Upload french.srt → Language: French
    • Upload german.srt → Language: German
  4. Viewers choose language in YouTube player settings

Time: ~10 minutes for 4 languages (vs hours of manual translation)
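
Because an AI translation can silently alter or drop a timecode, it may be worth checking each translated file against the original before uploading. A minimal sketch, assuming both files are standard SRT (the function names are illustrative):

```python
import re

# Captures a full SRT timing line, tolerating trailing whitespace.
TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})[ \t\r]*$",
    re.MULTILINE,
)

def timecodes(srt_text: str) -> list[str]:
    """Return every timing line in the file, in order."""
    return TIMING_LINE.findall(srt_text)

def same_timing(original_srt: str, translated_srt: str) -> bool:
    """True when both files have identical timing lines in the same order."""
    return timecodes(original_srt) == timecodes(translated_srt)

english = "1\n00:00:02,150 --> 00:00:05,400\nWelcome to our Q4 product demo.\n"
spanish = "1\n00:00:02,150 --> 00:00:05,400\nBienvenidos a nuestra demo del Q4.\n"
print(same_timing(english, spanish))
```

If the check fails, regenerating the translation is usually simpler than repairing timecodes by hand.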

Workflow 2: Video → Blog Post Automation

Goal: Turn 30-minute video into 1,500-word blog post in 5 minutes.

Steps:

  1. Transcribe video (Hapi)

  2. Open AI Chat in Hapi

  3. Paste repurposing prompt:

    Convert this video transcript into a blog post.
    
    Format:
    - SEO-optimized title (H1)
    - 2-3 sentence intro
    - 5 main sections (H2 headers)
    - Bullet points for key takeaways
    - Conclusion with call-to-action
    
    Tone: Professional but conversational
    Length: ~1,500 words
    
  4. Hapi AI generates blog post using Qwen3 local LLM

  5. Copy output → paste into WordPress/Notion

  6. Add images from video screenshots

  7. Publish

Result: One video creates 3+ content pieces:

  • YouTube video (original)
  • Blog post (transcript repurposed)
  • Social media quotes (extract key moments from transcript)

Time saved: 90% vs writing from scratch (5 min vs 60 min)

Workflow 3: Searchable Video Library

Problem: 200+ training videos, can't find specific information without watching everything.

Solution: Transcribe all videos, build searchable archive.

Setup:

  1. Batch transcribe all videos:

    • Open Hapi
    • Drag 200 video files into Hapi window
    • Click "Transcribe All"
    • Processing runs overnight
  2. Export as searchable format:

    • Select all transcripts
    • Export as JSON (includes timestamps + metadata)
  3. Build search interface (optional, for teams):

    {
      "video_id": "training-001",
      "title": "Onboarding: Customer Support Tools",
      "transcript": [
        {"timestamp": "00:02:15", "text": "To escalate a ticket..."},
        {"timestamp": "00:05:40", "text": "Use the priority queue for..."}
      ]
    }
    
  4. Search using Hapi's built-in search:

    • Open Hapi → Transcripts
    • Search bar: "priority queue"
    • Results show all videos mentioning that term
    • Click result → jump to exact timestamp

ROI: Support team finds answers in 10 seconds vs 20 minutes of video scrubbing.
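
For teams that want their own search on top of the exported files, a keyword lookup over the JSON layout shown in step 3 might look like the sketch below. It assumes every export follows that layout exactly; the real export schema may differ, and `search_transcripts` is an illustrative name:

```python
import json

def search_transcripts(videos: list[dict], term: str) -> list[tuple[str, str, str]]:
    """Return (video_id, timestamp, text) for every segment containing `term`."""
    needle = term.lower()
    return [
        (video["video_id"], segment["timestamp"], segment["text"])
        for video in videos
        for segment in video["transcript"]
        if needle in segment["text"].lower()  # case-insensitive substring match
    ]

# One exported transcript, using the example structure from this guide.
library = [json.loads("""
{
  "video_id": "training-001",
  "title": "Onboarding: Customer Support Tools",
  "transcript": [
    {"timestamp": "00:02:15", "text": "To escalate a ticket..."},
    {"timestamp": "00:05:40", "text": "Use the priority queue for..."}
  ]
}
""")]

for hit in search_transcripts(library, "priority queue"):
    print(hit)
```

A substring match is enough for a few hundred transcripts; a larger archive would likely want a proper full-text index.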

Workflow 4: Podcast Video Repurposing

Goal: Extract podcast clips for social media (Instagram Reels, TikTok, YouTube Shorts).

Steps:

  1. Transcribe full podcast episode (90 min)

  2. AI extracts highlights:

    Find the 5 most engaging 60-second segments from this podcast.
    
    Criteria:
    - Self-contained topic (no context needed)
    - Strong opening hook
    - Quotable conclusion
    
    Output:
    - Timestamp range
    - Topic summary
    - Why it's shareable
    
  3. Hapi AI returns:

    Clip 1: 00:12:30 - 00:13:45
    Topic: "Why most startups fail at SEO"
    Hook: "SEO is not about keywords. It's about user intent."
    Shareable: Contrarian take, actionable insight
    
    Clip 2: 00:34:20 - 00:35:10
    Topic: "How to validate a business idea in 48 hours"
    Hook: "You don't need to build the product first."
    Shareable: Tactical step-by-step process
    
  4. Export SRT for each clip (e.g., 00:12:30 - 00:13:45)

  5. Edit in Final Cut/Premiere:

    • Cut video clip to timestamp range
    • Import corresponding SRT
    • Style captions (large font, emoji for emphasis)
    • Export as 1080×1920 vertical video
  6. Upload to social media with auto-generated captions

Time: 10 clips in 30 minutes vs 4 hours of manual scrubbing + editing.
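
If your tool cannot export a subtitle file for a timestamp range directly, the per-clip SRT from step 4 can be cut from the full-episode SRT. A minimal sketch, assuming standard SRT input and "HH:MM:SS" clip boundaries as in the AI output above; cues that straddle a boundary are dropped, and all names are illustrative:

```python
import re

TIMING = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})")

def to_ms(h, m, s, ms="0") -> int:
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def to_timecode(total_ms: int) -> str:
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def clip_srt(srt_text: str, clip_start: str, clip_end: str) -> str:
    """Keep cues that fall entirely inside the clip and rebase them to 00:00:00."""
    start_ms = to_ms(*clip_start.split(":"))
    end_ms = to_ms(*clip_end.split(":"))
    kept = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        match = TIMING.match(lines[1]) if len(lines) > 2 else None
        if not match:
            continue  # skip anything that is not a numbered cue with text
        cue_start = to_ms(*match.groups()[:4])
        cue_end = to_ms(*match.groups()[4:])
        if cue_start >= start_ms and cue_end <= end_ms:
            timing = f"{to_timecode(cue_start - start_ms)} --> {to_timecode(cue_end - start_ms)}"
            kept.append(f"{len(kept) + 1}\n{timing}\n" + "\n".join(lines[2:]) + "\n")
    return "\n".join(kept)

episode = (
    "1\n00:12:28,000 --> 00:12:29,500\nBefore the clip.\n\n"
    "2\n00:12:31,000 --> 00:12:34,000\nSEO is not about keywords.\n\n"
    "3\n00:13:50,000 --> 00:13:52,000\nAfter the clip.\n"
)
print(clip_srt(episode, "00:12:30", "00:13:45"))
```

Rebasing matters because the exported clip starts at zero in the editor, not at its position in the original episode.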

Troubleshooting

"Hapi can't open this video file"

Cause: Unsupported codec or DRM-protected video.

Fix:

  1. Convert video using HandBrake (free):

    brew install --cask handbrake
    
    • Open HandBrake
    • Drag video file
    • Preset: "Fast 1080p30"
    • Start encode
    • Transcribe converted MP4
  2. Check for DRM: Purchased videos (iTunes, Amazon Prime) are DRM-locked and can't be transcribed legally.

Subtitles Out of Sync with Video

Cause: Video was edited after transcription (cuts added/removed).

Fix:

  1. Re-transcribe final edited video
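  2. Or shift all subtitles (this only fixes a constant offset, not cuts made in the middle of the video):
    • Open SRT in Aegisub (free subtitle editor)
    • Select all cues
    • Timing → Shift Times → move forward by 2.5 seconds if subtitles appear too early, or backward if they appear too late
    • Save adjusted SRT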
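
The same constant shift can be scripted when there are many files to fix. A minimal sketch, assuming standard SRT timecodes (`shift_srt` is an illustrative name; times that would go negative are clamped to zero):

```python
import re

TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text: str, offset_seconds: float) -> str:
    """Shift every timecode by offset_seconds (negative moves subtitles earlier)."""
    offset_ms = round(offset_seconds * 1000)

    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(group) for group in match.groups())
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # Only touch timing lines so timecode-like text inside a subtitle is left alone.
    return "\n".join(
        TIMECODE.sub(shift, line) if "-->" in line else line
        for line in srt_text.split("\n")
    )

print(shift_srt("1\n00:00:02,000 --> 00:00:05,000\nFirst subtitle.\n", 2.5))
```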

SRT File Shows Garbled Text in Video Editor

Cause: Character encoding mismatch (UTF-8 vs ANSI).

Fix:

  1. Open SRT in TextEdit (Mac)
  2. Format → Make Plain Text
  3. File → Save As
  4. Encoding: UTF-8 (bottom dropdown)
  5. Re-import to video editor
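
The same re-encoding can be scripted for a batch of files. A minimal sketch, assuming the garbled files were saved as Windows-1252 (what "ANSI" usually means on Western-language Windows systems); the function name and the default source encoding are assumptions to adjust for your files:

```python
from pathlib import Path

def reencode_to_utf8(path: str, source_encoding: str = "cp1252") -> None:
    """Rewrite a subtitle file as UTF-8 in place, leaving line endings untouched."""
    raw = Path(path).read_bytes()
    try:
        # Already UTF-8? "utf-8-sig" also strips a leading byte-order mark.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Fall back to the assumed legacy encoding.
        text = raw.decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))

# Example call (hypothetical file name):
# reencode_to_utf8("captions.srt")
```

Keep a backup copy first, since a wrong guess about the source encoding would bake the garbled characters in.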

YouTube Rejects My SRT Upload

Common errors:

Error: "Invalid timecode format"

Fix: Ensure SRT uses commas (,), not periods (.):

00:00:02,500 --> 00:00:05,400  ✅ Correct
00:00:02.500 --> 00:00:05.400  ❌ Wrong (VTT format)

Error: "Overlapping cues"

Fix: Check for cues that overlap:

1
00:00:02,000 --> 00:00:05,000
First subtitle.

2
00:00:04,000 --> 00:00:06,000  ❌ Overlaps with cue 1
Second subtitle.

Remove overlap by adjusting end timecode of first cue to 00:00:03,999.
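
For long files, the same adjustment can be automated. A minimal sketch, assuming standard SRT with cues in chronological order; it trims each overlapping cue to end 1 ms before the next one starts, and all names are illustrative:

```python
import re

TIMING = re.compile(r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_ms(parts) -> int:
    h, m, s, ms = (int(p) for p in parts)
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def to_timecode(total_ms: int) -> str:
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def fix_overlaps(srt_text: str) -> str:
    """Trim any cue that runs past the start of the following cue."""
    lines = srt_text.split("\n")
    # Collect (line index, regex match) for every timing line.
    matches = ((i, TIMING.match(line)) for i, line in enumerate(lines))
    timings = [(i, m) for i, m in matches if m]
    for (index, current), (_, following) in zip(timings, timings[1:]):
        next_start = to_ms(following.groups()[:4])
        if to_ms(current.groups()[4:]) > next_start:
            start_text = lines[index].split(" --> ")[0]
            lines[index] = f"{start_text} --> {to_timecode(next_start - 1)}"
    return "\n".join(lines)

overlapping = (
    "1\n00:00:02,000 --> 00:00:05,000\nFirst subtitle.\n\n"
    "2\n00:00:04,000 --> 00:00:06,000\nSecond subtitle.\n"
)
print(fix_overlaps(overlapping))
```

For the example above, the first cue ends at 00:00:03,999 after the fix, matching the manual adjustment.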

Hapi Transcription Takes Too Long

Expected times (M1 Mac, WhisperKit Large v3):

  • 10-minute video: ~10 minutes
  • 60-minute video: ~60 minutes
  • 120-minute video: ~2 hours

If significantly slower:

  1. Check available RAM (close other apps)
  2. Restart Hapi (clears model cache)
  3. Try Medium model (faster, 93-96% accuracy):
    • Hapi Settings → Models → WhisperKit Medium
  4. Upgrade to M1 Pro/Max (2× faster transcription)

Privacy & Security

What Data Leaves Your Mac?

Tool | Data Sent to Cloud
Hapi | Nothing (100% local)
MacWhisper | Nothing (100% local)
Descript | Video + audio uploaded to Descript servers
Rev | Video + audio uploaded to Rev servers
YouTube auto-captions | Video uploaded to Google servers

GDPR/HIPAA Compliance

Hapi & MacWhisper: No data leaves your Mac, so no third-party processor is involved. Compliance still depends on how you store and share the files yourself.

Descript: Requires Business Associate Agreement (BAA) for HIPAA. GDPR-compliant with DPA (Data Processing Agreement).

Rev: Offers HIPAA-compliant service ($1.75/min vs $1.50 standard). GDPR-compliant.

YouTube: Not HIPAA-compliant (no BAA available). GDPR-compliant for non-healthcare use.

Confidential Video Best Practices

  1. Never upload to cloud services (use Hapi or MacWhisper)
  2. Store locally on encrypted Mac drive (FileVault enabled)
  3. Delete cloud copies after download (if you used Descript/Rev)
  4. Use local AI for repurposing (Hapi's Qwen3 vs ChatGPT/Claude)

Which Video Transcription Tool Should You Choose?

Use Hapi if you:

  • Want free unlimited transcription
  • Need highest accuracy (95-99%)
  • Value privacy (100% local processing)
  • Want AI features (summaries, repurposing, translation)
  • Transcribe Mac screen recordings, Zoom calls, YouTube videos
  • Need SRT/VTT subtitle export
  • Use Mac (M1/M2/M3 recommended)

Use MacWhisper if you:

  • Prefer App Store purchases ($30 one-time)
  • Don't need speaker labels or AI tools
  • Want minimal interface (single window)
  • Offline-only use case

Use Descript if you:

  • Edit videos frequently (text-based editing saves time)
  • Need collaboration features (team comments)
  • Want overdub voice cloning
  • Have budget for subscription ($12-24/mo)
  • Don't mind cloud upload (faster processing)

Use Rev Human if you:

  • Need 99% accuracy for legal/medical context
  • Can afford $1.50/minute ($90/hour)
  • Tight deadline (12-hour turnaround)
  • Heavy accents or complex terminology

Use YouTube auto-captions if you:

  • Need a rough draft quickly (auto-captions typically appear 10-30 minutes after upload)
  • Video already on YouTube
  • Don't care about 70-80% accuracy
  • Free is only option

Get Started

For most Mac users who want free, accurate, private video transcription with subtitle export, Hapi is the best choice.
