AI avatars used to take half a day of studio recording. With Lumen's Avatar V model (the same family as HeyGen's flagship), you need one clear photo and a script. That's it.

Pick a reference

Two options work best:

Write a script that sounds like you

Read it out loud first. If you wouldn't actually say it, the avatar will sound stilted. Use contractions, short sentences, and occasional CAPS to emphasise words. Avatar V is audio-driven, so emphasis in your typed script shapes the voice read, and the voice read shapes the avatar's face.
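If you'd like a quick sanity check before pasting a script in, the advice above can be sketched as a small Python helper. This is not part of Lumen. The function name, the word-count threshold, the CAPS limit, and the contraction list are all hypothetical choices made for illustration, and reading the script aloud is still the better test.

```python
import re

# Hypothetical thresholds; none of these come from Lumen. Tune them to taste.
MAX_WORDS_PER_SENTENCE = 18
MAX_CAPS_WORDS = 3

# Formal phrasings mapped to the contraction most people would say aloud.
CONTRACTIONS = {
    "do not": "don't",
    "cannot": "can't",
    "it is": "it's",
    "you are": "you're",
    "we are": "we're",
    "will not": "won't",
}


def review_script(script: str) -> list[str]:
    """Return human-readable warnings for a draft avatar script."""
    warnings = []

    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    for sentence in sentences:
        word_count = len(sentence.split())
        if word_count > MAX_WORDS_PER_SENTENCE:
            warnings.append(f"Long sentence ({word_count} words): {sentence[:40]}...")

    # Flag formal phrasings that usually sound stiff when spoken.
    lowered = script.lower()
    for formal, spoken in CONTRACTIONS.items():
        if re.search(rf"\b{re.escape(formal)}\b", lowered):
            warnings.append(f"Consider '{spoken}' instead of '{formal}'")

    # Count fully capitalised words of two or more letters.
    # Note: acronyms such as "AI" are counted too, so treat this as a rough signal.
    caps_words = re.findall(r"\b[A-Z]{2,}\b", script)
    if len(caps_words) > MAX_CAPS_WORDS:
        warnings.append(f"{len(caps_words)} CAPS words; emphasis works best when occasional")

    return warnings


if __name__ == "__main__":
    draft = "We do not ship late. It's REALLY that simple."
    for warning in review_script(draft):
        print(warning)
```

An empty list means the draft passed these rough checks, not that it sounds natural.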

Voice + language

Pick a voice that matches the energy of your script. Lumen covers 175+ languages and automatically cross-renders the voice in the language you select, so a friendly American voice can deliver the same script in Japanese with the same warmth.

Background and motion

Motion options are Still, Subtle, or Expressive. Subtle is the right default for almost everything. Use Expressive only for high-energy content like ads or sales pitches.

Generate and review

A 30-second avatar renders in roughly 90 seconds. Treat your first generation as a review pass and check three things: hand gestures (do they distract?), eye contact (does the avatar look at the right spot?), and mouth shapes (do the lips match the audio?).

Tip — drive realism with audio

The single biggest lever for avatar quality is the audio you feed it. Monotone audio = a flat avatar. A warm, expressive read = a warm, expressive avatar. If your generations feel lifeless, re-record the voiceover in Lumen using the same voice, but put more energy into the read.