Studio feature

Captions in 100+ languages. Right script. Right direction.

Most screen recorders cap captions at 5-10 European languages. El Ojo Studio uses Whisper Large v3 — the same model OpenAI ships with ChatGPT — plus Llama 3.3 70B for correction. RTL for Urdu, Arabic, Hebrew. CJK and Devanagari fonts. Roman transliteration if your audience reads English but speaks Hindi.

How auto-captions work

Hit “Generate transcript” in Studio and three things happen in sequence:

Whisper transcribes

Audio extracted at 16 kHz mono, sent to Groq's Whisper Large v3 endpoint. Returns word-level timestamps, per-word confidence, and detected language.

LLM corrects

Llama 3.3 70B reads each segment and fixes punctuation, proper nouns, recognition errors. Especially helpful for Urdu, Hindi, Pashto where Whisper sometimes mis-segments. Per-segment, deterministic.

Captions render

Pick a style (Karaoke / Hormozi / MrBeast / Minimal). Studio renders the overlay with the right font and direction. Auto-emoji on key words. Filters out words you've deleted in the transcript editor.

Language coverage

Whisper Large v3 supports 99 languages out of the box. El Ojo adds language hints for the ones most users ask about:

Urdu, Arabic, Hebrew, Farsi

Captions render with direction: rtl and a font stack tuned for Naskh / Nastaliq scripts. Word order stays correct — no “reversed letters” bug.

Chinese, Japanese, Korean

Automatic font fallback to "Noto Sans CJK" so glyphs render correctly. Per-character timing for Karaoke style.

Hindi, Marathi, Bengali, Tamil

Devanagari and other Indic font stacks. Word-level timing with diacritic preservation. Optional Roman transliteration for English-script audiences.

Hinglish, Roman Urdu

Toggle on “Transliterate to Roman” and Llama converts the Devanagari / Arabic-script transcript to Roman. Useful for diaspora audiences and accessibility.

Spanish, French, German, Italian, Portuguese

Whisper Large v3 is most accurate on European languages. Captions ship with the right accented characters and punctuation style.

90+ more languages

Auto-detect handles the rest. From Vietnamese to Swahili to Welsh — if Whisper supports it, El Ojo captions it.

Caption styles

Four built-in styles. Pick once, reuse forever. Each style scales with the player — embedded player, fullscreen, or the editor preview all render at the right size.

Word-by-word fill

Each word fills in as it's spoken — reads like a karaoke prompter. Best for music or fast-paced talks.

Bold yellow highlight

Black bold text with a yellow highlight on the current word. Popular on TikTok and Instagram Reels. Aggressive attention-grabber.

Huge centered text

Large white text, centered, drop shadow. The classic YouTube creator look. Reads even at thumbnail size.

Clean and understated

Small white text at the bottom. Doesn't compete with your content. Best for technical or formal recordings.

Frequently asked

Caption FAQ

Which Whisper model does El Ojo use?

Whisper Large v3 via Groq. Word-level timestamps via verbose_json. Languages auto-detected; you can also hint the language for higher accuracy on Urdu, Hindi, Pashto.

How does LLM correction work?

After Whisper returns the transcript, Llama 3.3 70B reads each segment and corrects spelling, punctuation, proper nouns, and obvious recognition errors. Runs only if you opt in. Deterministic per segment.

What is Roman / Hinglish transliteration?

If your audio is in Urdu, Hindi, or another Indic language, El Ojo can output the transcript in Roman / Latin script — what users call "Hinglish" for Hindi and "Roman Urdu" for Urdu. Useful for audiences who speak the language but don't read the native script.

Which caption styles are available?

Four: Karaoke (word-by-word fill), Hormozi (bold black-and-yellow highlight), MrBeast (huge centered text), and Minimal (clean understated). Toggle in the editor.

Do captions burn into the final MP4?

Yes when you render with the burn-in flag. Otherwise they live as overlay metadata you can toggle on/off in the editor preview.

Which languages have RTL support?

Arabic, Urdu, Hebrew, Farsi/Persian. Captions render right-to-left with the correct font stack. CJK (Chinese, Japanese, Korean) and Devanagari (Hindi, Marathi) also use the appropriate font automatically.

Caption your next video in any language

Free tier of Studio includes auto-captions on your first 10 videos. Pro removes the cap.