If you've ever tried to extract text from a scanned Arabic document, you already know the pain. Most OCR tooling is built English-first. Arabic adds three problems on top:
- Right-to-left (RTL) text that breaks naive layout assumptions.
- Connected letters (cursive joining and ligatures): the same letter changes shape depending on its position in the word.
- **Diacritics and a different numeral…
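
To make the first two points concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It is not part of any OCR pipeline. It only shows that Unicode encodes four positional shapes of one letter (beh, chosen arbitrarily) that all normalize back to the same base character, and that the letter carries a right-to-left bidi class. The helper names `describe_forms` and `is_rtl` are made up for this illustration.

```python
import unicodedata

# Presentation forms of the letter beh (U+0628), one per position in a word.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe_forms(forms):
    """Return (position, Unicode name, NFKC-normalized base letter) per form."""
    return [
        (pos, unicodedata.name(ch), unicodedata.normalize("NFKC", ch))
        for pos, ch in forms.items()
    ]


def is_rtl(ch):
    """True if the character's bidi class is right-to-left (R or AL)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


if __name__ == "__main__":
    for pos, name, base in describe_forms(BEH_FORMS):
        # Every positional shape maps back to the same base code point.
        print(f"{pos:9} {name:35} -> U+{ord(base):04X}")
    print("beh is RTL:", is_rtl("\u0628"))
```

An OCR engine has to recognize all four shapes as the same letter, which is roughly the reverse of the normalization step shown here.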