
Arabic Voice to Text on Mac: MSA, Dialects, and Right-to-Left Workflow

How to transcribe Modern Standard Arabic and regional dialects on a Mac, locally and privately. RTL handling, diacritic policy, and dialect coverage.


Arabic is often ranked as the world's fifth-most-spoken language, with roughly 420 million speakers across 22 countries and a long tail of diaspora communities. It is also one of the most challenging languages for automatic speech recognition, for three structural reasons: a writing system that omits short vowels, a wide gap between formal Modern Standard Arabic and the regional spoken dialects, and bidirectional text that breaks naive editor pipelines.

This guide walks through what realistically works for Arabic transcription on macOS in 2026, with a focus on private, on-device tools.

The Three Arabic Problems Speech Recognition Has to Solve

1. Diglossia: MSA vs. dialects

Native Arabic speakers grow up bilingual within their own language. Modern Standard Arabic (المعيارية / فصحى) is the language of news, formal speeches, religious sermons, books, and most published media. Dialects (العامية) — Egyptian, Levantine, Gulf, Iraqi, Maghrebi, Sudanese — are what people actually speak at home, in WhatsApp voice notes, and in conversations between friends.

Most training corpora skew heavily toward MSA because broadcast and lecture content is plentiful and well-aligned. The practical result:

| Variety | Realistic accuracy on Mac models | Use case |
|---|---|---|
| Modern Standard Arabic | Strong: comparable to other major languages | News, lectures, formal interviews |
| Egyptian | Good: large media corpus | Films, TV, daily speech in Egypt |
| Levantine (Syrian/Lebanese/Palestinian/Jordanian) | Good: significant media presence | Daily speech in the Levant |
| Gulf (Khaleeji) | Workable, with errors on regional vocabulary | Daily speech in GCC countries |
| Maghrebi (Moroccan/Algerian/Tunisian) | Weaker: large French/Berber loan-vocabulary | Daily speech in North Africa |
| Iraqi | Weaker: smaller training footprint | Daily speech in Iraq |

If you record dialectal voice notes and need names or place references to land correctly, expect to do a quick edit pass.

2. Diacritics and vocalization

Modern Arabic writing usually omits short vowels (the marks above and below letters that disambiguate pronunciation). Speech-to-text models follow that convention: output is unvocalized by default. This is correct for almost every practical use — emails, articles, transcripts, captions — and matches how native readers write.

You only need vocalized (مَشْكُول) output for specific use cases: Quranic recitation, classical poetry, second-language learning materials, or text-to-speech inputs that need pronunciation hints. Those flows typically run a separate text-to-text diacritization model after the speech step.
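If text arrives vocalized and you want the unvocalized convention instead, stripping the marks is a small Unicode operation. The following is a minimal sketch, not any particular tool's implementation; the function name `strip_harakat` and the exact set of marks removed are assumptions chosen for illustration.

```python
import re

# Assumed mark set: fathatan through sukun (U+064B-U+0652, which includes
# shadda) plus the superscript alef (U+0670). Extend it if your text uses
# other Quranic annotation marks.
HARAKAT = re.compile("[\u064B-\u0652\u0670]")

def strip_harakat(text: str) -> str:
    """Return text with Arabic short-vowel and related marks removed."""
    return HARAKAT.sub("", text)

print(strip_harakat("مَشْكُول"))  # مشكول
```

Going the other way, from unvocalized to vocalized text, cannot be done with a rule like this, which is why those flows rely on a separate diacritization model.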

3. Right-to-left rendering

Arabic is written right-to-left, but numbers, dates, code, English brand names, and URLs run left-to-right inside Arabic text. Modern macOS apps handle this well because they implement the Unicode Bidirectional Algorithm. In practice:

  • Works cleanly: Pages, Notes, Mail, Safari forms, modern editors (VS Code with the right config)
  • Quirky: older Cocoa apps, some terminal contexts, embedded chat widgets that hard-code direction

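A useful smoke test: dictate Arabic into a native macOS text field such as Notes. If it renders correctly there, the transcript itself is fine, and any garbling you see elsewhere comes from the receiving app's direction handling rather than from the speech engine.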
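For destinations that hard-code direction, one common workaround is to wrap a snippet in Unicode directional isolates before pasting it. The sketch below assumes a simplified first-strong heuristic (it does not implement the full Unicode Bidirectional Algorithm), and the helper names `base_direction` and `isolate` are hypothetical.

```python
import unicodedata

# Unicode directional isolate controls.
LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"

def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic letters
            return "rtl"
        if bidi_class == "L":          # Latin and other left-to-right letters
            return "ltr"
    return "ltr"  # only digits, punctuation, or whitespace were found

def isolate(text: str) -> str:
    """Wrap text in the isolate that matches its own base direction."""
    opener = RLI if base_direction(text) == "rtl" else LRI
    return f"{opener}{text}{PDI}"

print(base_direction("مرحبا Apple"))  # rtl
print(base_direction("Apple مرحبا"))  # ltr
```

Wrapping an English brand name or a URL with `isolate` before inserting it into Arabic text keeps the surrounding paragraph from reordering it, provided the receiving app honors the isolate characters at all.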

How Local Arabic Dictation Works on Apple Silicon

Two on-device approaches dominate on Mac in 2026:

  1. Apple's built-in dictation: uses Apple's on-device speech models. Arabic is supported on Apple Silicon Macs and runs locally in configurations that support offline dictation.
  2. Third-party local apps: Hapi runs Parakeet (NVIDIA's open multilingual model) and WhisperKit-derived models on the Neural Engine. Audio is captured by a menu-bar app, transcribed locally, and pasted at the cursor.

Both keep audio on the device. The differences show up in workflow:

| Dimension | Apple Dictation | Hapi (local) |
|---|---|---|
| Activation | Fn-key shortcut, requires text field | Global hotkey, works system-wide |
| Auto-paste anywhere | No | Yes |
| Filler-word cleanup ("يعني", "مم") | No | Yes (heuristic) |
| Code-switching EN/AR | Manual language toggle | Automatic per segment |
| Dialect handling | MSA-leaning | Multilingual model handles dialects |
| Cost | Free, built-in | Free, separate install |
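To give a sense of what a heuristic filler-word cleanup can look like, here is a rough sketch. It is not Hapi's actual implementation, which the comparison above does not describe; the filler list and the function name `remove_fillers` are assumptions for illustration.

```python
import re

# Assumed filler list. "يعني" also has a literal meaning ("it means"), so a
# real tool would need context; this sketch removes it unconditionally.
FILLERS = ("يعني", "مم")

# Match a filler only as a whole whitespace-delimited token, optionally
# followed by an Arabic or Latin comma.
_FILLER_RE = re.compile(
    r"(?<!\S)(?:" + "|".join(re.escape(f) for f in FILLERS) + r")[،,]?(?!\S)"
)

def remove_fillers(text: str) -> str:
    """Drop filler tokens and collapse the whitespace they leave behind."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()

print(remove_fillers("يعني، الاجتماع مم بكرة"))  # الاجتماع بكرة
```

Matching on whole tokens matters here: without the lookarounds, the pattern for "مم" would also eat the start of ordinary words such as "ممكن".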

A Realistic Arabic Transcription Workflow

A workflow that holds up across the kinds of recordings real users have:

  1. Press the hotkey, dictate naturally in your dialect. Don't try to "switch to MSA" mid-thought — the model handles dialect input better than forced-MSA hybrid speech.
  2. Let the engine output unvocalized text. Don't ask for harakat unless you actually need them.
  3. Do a quick read-through pass. Names, brands, and code-switched English need most of your attention.
  4. For long-form recordings (interviews, lectures), record the full file first, then batch-transcribe. Streaming dictation is for short bursts.

Common Failure Modes and How to Recover

  • Numbers spelled in Arabic words instead of Hindu-Arabic digits. Some models output "خمسة وعشرون" instead of "25". A simple find/replace dictionary or post-processing rule fixes this.
  • English brand names transliterated into Arabic letters. "أبل" instead of "Apple." Multilingual segmentation models avoid this; older monolingual Arabic models do not.
  • Lost question marks at the end of dialectal questions. Arabic intonation for questions is subtler than English; punctuation post-processing helps.
  • Hamza placement on alif. Choosing between أ, إ, and ا is a persistent failure mode for most models. For an individual user, hand correction is faster than retraining a model.
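The number-word fix in the first item can be handled with a small post-processing rule. The sketch below is deliberately narrow: it assumes MSA nominative spellings, covers only "unit and tens" compounds from 21 to 99, and leaves standalone words such as "واحد" alone because they often are not numerals in context. The dictionaries and the function name `compounds_to_digits` are illustrative assumptions.

```python
import re

# Assumed vocabulary: MSA nominative forms only. Dialectal and oblique forms
# (for example "عشرين") would need extra entries.
UNITS = {"واحد": 1, "اثنان": 2, "ثلاثة": 3, "أربعة": 4, "خمسة": 5,
         "ستة": 6, "سبعة": 7, "ثمانية": 8, "تسعة": 9}
TENS = {"عشرون": 20, "ثلاثون": 30, "أربعون": 40, "خمسون": 50,
        "ستون": 60, "سبعون": 70, "ثمانون": 80, "تسعون": 90}

WAW = "\u0648"  # the conjunction "and", written attached to the tens word

# Arabic states the unit first: unit, whitespace, then waw + tens.
_COMPOUND = re.compile(
    r"(?<!\w)(" + "|".join(UNITS) + r")\s+" + WAW
    + "(" + "|".join(TENS) + r")(?!\w)"
)

def compounds_to_digits(text: str) -> str:
    """Replace compounds like 'five and twenty' with ASCII digits."""
    return _COMPOUND.sub(
        lambda m: str(UNITS[m.group(1)] + TENS[m.group(2)]), text
    )

print(compounds_to_digits("خمسة وعشرون"))  # 25
```

The output uses ASCII digits; if your documents use Arabic-Indic digits instead, map the result through a digit translation table as a final step.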

When You Should Stay Off the Cloud

Arabic content has a higher-than-average privacy stake for many users:

  • Journalists working on stories about state actors
  • Activists, lawyers, or NGO workers in restrictive jurisdictions
  • Family or community recordings shared in voice notes
  • Religious or political discussions whose context may be misread by automated systems

For all of these, a fully local pipeline — audio captured on the Mac, transcribed on the Neural Engine, stored in a local file — has a meaningful threat-model advantage over cloud transcription. Your audio does not transit a third-party sub-processor and is not retained on infrastructure subject to a foreign jurisdiction.

Bottom Line

On-device Arabic transcription on a modern Mac is good enough today for daily use across MSA and the major dialects. It will not replace a human transcriber for Maghrebi colloquial or for vocalized classical text, but it will save serious time on interviews, voice notes, and meeting summaries — without the audio ever leaving your machine.
