This is the workflow that made HeyGen famous, and it is now available in Lumen. Upload a video in one language and get back the same video, with the same person, gestures, and voice, speaking a different language.

How it differs from AI Dub

AI Dub replaces only the audio. Video Translate replaces the audio and the mouth shapes, so the lips match the new speech. Use Video Translate when the speaker's face is visible on screen. Use AI Dub when new audio alone is enough.

Setup

  1. Upload your source video.
  2. Pick the source language (or let Lumen auto-detect it).
  3. Pick the target language.
  4. Leave Lip-sync, Preserve voice, and Enhance speech all on.

Speaker count

If more than one person speaks, tell Lumen how many speakers there are. Diarisation (working out who speaks when) is automatic, but the count helps the model split the speakers correctly. For one-person videos, leave it at 1.
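
As a checklist, the setup choices and the speaker count can be written down as a small record. The sketch below is illustrative only: VideoTranslateSettings and its field names are hypothetical and do not correspond to any Lumen API. It simply mirrors the form described above, with the three toggles on and the speaker count at 1 by default.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VideoTranslateSettings:
    """Hypothetical record of the Video Translate form (not a Lumen API)."""

    target_language: str
    source_language: Optional[str] = None  # None stands for auto-detect
    lip_sync: bool = True                  # recommended: leave on
    preserve_voice: bool = True            # recommended: leave on
    enhance_speech: bool = True            # recommended: leave on
    speaker_count: int = 1                 # 1 for one-person videos

    def problems(self) -> List[str]:
        """Return a list of obvious mistakes; empty means the form looks fine."""
        issues = []
        if not self.target_language.strip():
            issues.append("target language is required")
        if self.source_language is not None and self.source_language == self.target_language:
            issues.append("source and target language are the same")
        if self.speaker_count < 1:
            issues.append("speaker count must be at least 1")
        return issues


# A one-person video, auto-detected source language, translated to Spanish.
settings = VideoTranslateSettings(target_language="es")
print(settings.problems())
```

For an interview with two speakers, the only change would be speaker_count=2.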

Cost & runtime

Expect roughly 18 credits per minute of source video, and about 60 seconds of render time for a 30-second clip. The lip-sync pass is the most expensive part: if you don't need it, turn it off and you'll cut the cost in half.
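
To budget a job before uploading, the figures above can be turned into a quick estimate. This is a rough sketch, not Lumen's pricing logic: the function names are made up for illustration, and it assumes that cost and render time scale linearly with clip length, which the figures above suggest but do not guarantee.

```python
CREDITS_PER_SOURCE_MINUTE = 18.0         # "roughly 18 credits per source minute"
RENDER_SECONDS_PER_SOURCE_SECOND = 2.0   # ~60 s of rendering for a 30 s clip
LIP_SYNC_OFF_COST_FACTOR = 0.5           # turning lip-sync off halves the cost


def estimate_credits(source_seconds: float, lip_sync: bool = True) -> float:
    """Approximate credit cost, assuming cost is linear in clip length."""
    if source_seconds < 0:
        raise ValueError("source_seconds must be non-negative")
    credits = source_seconds / 60.0 * CREDITS_PER_SOURCE_MINUTE
    if not lip_sync:
        credits *= LIP_SYNC_OFF_COST_FACTOR
    return credits


def estimate_render_seconds(source_seconds: float) -> float:
    """Approximate render time with all options on, assuming linear scaling.

    No render-time figure is given for lip-sync off, so none is modelled here.
    """
    if source_seconds < 0:
        raise ValueError("source_seconds must be non-negative")
    return source_seconds * RENDER_SECONDS_PER_SOURCE_SECOND


# A 30-second clip with everything on, then with lip-sync off.
print(estimate_credits(30), estimate_render_seconds(30))
print(estimate_credits(30, lip_sync=False))
```

Under these assumptions a 30-second clip comes to about 9 credits with lip-sync on and about 4.5 with it off. Treat the result as a ballpark, since the source figures are themselves approximate.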