April 16, 2026
Avatar V: live webinar recap + top questions answered

# Avatar V

This guide recaps the HeyGen live webinar hosted by Adam (Product Manager, Avatars & Voice) and Nick (Head of AI Solutions), introducing Avatar V, the latest generation of HeyGen's digital twin technology. It covers what was demonstrated, key tips from the presenters, and the most common questions from the 1,200+ attendees.
HeyGen's mission is to make visual storytelling accessible to everyone, with a focus on business communication videos. Avatar V is the next major step in that direction.
## What is Avatar V?
Avatar V is HeyGen's newest avatar model. It combines three ingredients to create realistic, customizable videos of yourself:
- Your photo, for your appearance and identity
- Your voice, cloned from your recording
- Your video reference: 15 seconds of footage that teaches the model how you move
The key innovation over Avatar IV is that Avatar V builds a fine-tuned model based on your reference video. Instead of predicting your motion, teeth, facial expressions, and gestures from a single image alone, it watches how you actually move and uses that as the foundation for every video you generate.
This means no matter what outfit, background, or setting you choose, the motion and delivery will always look authentically like you.
## What's new vs. Avatar IV
Previous avatar models offered either realism or customizability but not both. Avatar V combines them:
- Photo avatars (old): Customisable appearance, but not realistic enough because the model had to guess your motion from a single image.
- Video avatars (old): Realistic motion, but you couldn't change your outfit or background without re-filming.
- Avatar V (new): Fully customisable appearance AND realistic motion learned from your reference video. The best of both worlds.
Other improvements highlighted during the webinar:
- Improved script-to-gesture alignment: Avatar V is more expressive and more naturally in sync with your audio than Avatar IV
- Better hand motion: Avatar IV's gesturing was sometimes random and uncontrolled; Avatar V is smoother and more natural
- Stable long-form generation: one-shot videos up to 60 minutes (on the Business plan), with scenes up to 3 minutes each
- Teeth consistency: the model now preserves your teeth from the reference video (a known issue in Avatar IV)
- Landscape and portrait support
- Selfie/handheld camera angle support with improved hand tracking
- Same cost as Avatar IV
## Tips for best results
### Recording your 15-second reference video
- Be expressive and natural: your motion, energy, and gestures are all learned from this video. A flat, reserved delivery will produce a stiff avatar.
- Include natural hand gestures if you want your avatar to gesture. What you do in those 15 seconds is what it will replicate.
- You do NOT need a professional setup. Make sure your face is clearly lit with no heavy shadows; the model needs to clearly see your face to generate accurate new looks.
- You can record via webcam or upload a pre-recorded video. Both work. Uploading is fine as long as you still complete the consent code step.
- You can say whatever feels natural during the 15 seconds. The only required part is reading the consent code at the end: "For safety purposes, my unique code is [number]."
- Use your phone camera if it is higher quality than your webcam. Phone cameras generally produce better results.
### Voice clone
- A voice clone is automatically generated from your 15-second video, but a standalone voice clone recording usually produces better quality.
- For the standalone clone, focus only on your voice: speak clearly and enunciate. You do not need to worry about gestures.
- If your laptop mic isn't great, record on your phone (iPhone Voice Memos works well; enable lossless recording) and upload the audio file.
- If you have an accent, use a script that includes words that naturally capture your accent. You can ask ChatGPT to adapt the sample script for your specific accent.
- Your voice clone can speak multiple languages; you don't need separate recordings per language.
- HeyGen offers ElevenLabs (V2 for stability, V3 for expressiveness), Fish (often better for non-American English accents like British or Australian), and other engines.
- You can integrate an existing ElevenLabs voice using your Voice ID.
### Generating looks
- Three looks are automatically generated from your 15-second footage as a starting point.
- If you are not satisfied, try changing your base look: upload different photos of yourself and see which produces the most accurate likeness.
- For best results, train a Personal Model: upload 10-80 photos (30+ recommended). This takes 15-20 minutes but produces consistently better similarity.
- When training, include a variety: different angles (front, side, 45 degrees), different expressions (happy, serious), different lighting, different distances. The more context the model has, the less it has to guess.
- Side-angle photos are especially helpful; without them, the model has to predict how you look from the side, which can be off.
- Use real uploaded photos for training, not AI-generated ones (though you can include a few generated ones you're happy with).
### Script & motion
- Avatar V is audio-driven. The energy and emotion in your voice directly controls how your avatar looks and moves. Excited voice = excited avatar. Monotone audio = flat, underperforming avatar.
- You can use CAPS in your script to emphasise words and prompt more energy.
- In AI Studio, you can assign different motion reference videos to different scenes: for example, a more expressive motion style for your opening, and a calmer professional style for your main content.
- For side-angle shots, record a separate 15-second clip at that angle and use it as the motion reference for those scenes.
- Motion prompts are available to further guide how the avatar moves.
## Frequently asked questions
Questions submitted by attendees during the webinar. Answers from the HeyGen team are included where provided.
### Getting started with Avatar V
Q: How do I access Avatar V? I couldn't find it in the app.
A: Go to the Avatar tab and select 'Quick Create', then tap 'Clone Me'. Avatar V is now the default model for real human avatars; once your digital twin is created, it will be automatically selected whenever you generate a video.
Q: If I already created an avatar with Avatar IV, can I make a new one with Avatar V?
A: Yes. You can create a new digital twin using the Avatar V pipeline. The process only takes about a minute: record 15 seconds, clone your voice, and generate your looks.
Q: If I went through all the setup steps for Avatar IV, do I have to repeat everything for Avatar V?
A: Yes, you will need to re-record a 15-second reference video because Avatar V's pipeline is different: it uses that video as a motion reference to animate your photo. This is the key change from Avatar IV.
Q: I created an avatar recently and there was no security code. Can I use that to create an Avatar V?
A: You will still need to go through the new Avatar V consent process. The security code (consent) step is required for Avatar V. Previously, video footage and consent were separate steps; Avatar V now combines them into a single recording.
### Reference video & recording
Q: Can I upload a pre-recorded video instead of recording live? Is 15 seconds really the limit?
A: Yes, you can upload a pre-recorded video. The recommended input is 15 seconds. For Avatar V's technology, there is no quality difference between 15 seconds and a longer video; the model does not benefit from more footage at this stage. What matters much more is the quality and expressiveness of those 15 seconds.
Q: Do inputs longer than 15 seconds create better quality avatars?
A: No, not for Avatar V. The technology is designed around 15 seconds and does not improve with longer footage for the base digital twin. However, in AI Studio you can assign different motion reference videos to different scenes, which gives you more control over motion style per scene.
Q: I already uploaded a reference video with client permission. Will that work for Avatar V?
A: It depends on whether the consent code was read aloud in that video. Avatar V requires the subject to read the security code at the end of the recording. If your existing video does not include this, a new recording will be needed.
Q: What is the best recording setup: phone or PC? Horizontal or vertical?
A: Use your phone camera if it is better than your webcam, as phone cameras typically produce superior image quality. For voice, use your PC mic or record separately on your phone and upload the audio. Good lighting with no heavy shadows is the most important factor. You do not need a professional studio.
Q: How does my reference video affect the motion of my avatar?
A: Directly and significantly. Motion is learned entirely from your 15-second reference video. If you record with flat, reserved body language, your avatar will move that way. If you record with natural energy and hand gestures, your avatar will replicate that style. If you are unhappy with the motion, you can redo your avatar once per billing cycle with a new, more expressive recording.
Q: My Avatar V output moved excessively and looked unrealistic compared to Avatar IV. Why?
A: The avatar's motion is driven by your reference video. If the result feels exaggerated, try re-recording your reference video with more controlled, deliberate movements. You can also adjust the motion reference per scene in AI Studio to fine-tune the energy level.
### Avatar looks & appearance
Q: How can I change outfits without HeyGen altering my teeth?
A: This is a known issue that is actively being worked on. Nick confirmed that Avatar V now preserves your teeth from the reference video, which was a major problem in Avatar IV. If you are still experiencing tooth changes with look generation, try training a Personal Model with a variety of photos; the more reference material the model has of your face, the more accurately it maintains your features.
Q: The AI-generated look changes my clothes but the result looks unnatural. How do I improve this?
A: Try using a different base photo that better matches the angle and lighting you want, or train a Personal Model with 30+ photos. Experimenting with different remix templates and prompts can also help; results vary depending on the base look used.
Q: My AI-generated look made me look much older or younger than I am. How do I fix this?
A: Upload better base photos of yourself, ideally well-lit and clearly showing your face at the angle you want to generate. Training a Personal Model on 30+ real photos of yourself is the most reliable way to get a consistent, accurate likeness.
Q: Does Avatar V support full-body (head to toe) avatars?
A: Yes, full-body is possible. However, Nick noted that smaller faces in the frame still show some quality drop-off, and this is being actively improved in the next version. For best results currently, use a half-body or upper-body framing where your face is clearly visible.
Q: Can Avatar V make me look different from how I actually look (e.g. slimmer)?
A: Avatar V is designed to faithfully represent your appearance. It is not intended as a body-modification tool. AI-generated looks can vary, but the system is built to maintain your identity.
Q: Can I use a real photo of my own environment (office, home) as a background?
A: Yes. You can upload your own background photos when generating a look, or use AI-generated backgrounds. Backgrounds are part of the look customisation: you can prompt for specific environments or upload your own.
Q: Can Avatar V generate different angles, not just front-facing?
A: Yes, side angles and 45-degree shots are supported. Nick noted that HeyGen is actively improving side-angle quality (an upcoming fix will address the issue of the avatar looking at the camera when it should be looking sideways). Including side-angle photos of yourself when training a Personal Model helps significantly.
### Voice & language
Q: Do I need to record a separate reference video for each language?
A: No. A single voice clone recording supports multiple languages. You do not need separate recordings per language.
Q: HeyGen has struggled to capture my subtle accent. Does Avatar V allow longer audio input for voice cloning?
A: A longer standalone voice clone recording can help. Adam recommended using a script that includes words and phrases that naturally capture your specific accent; you can ask ChatGPT to adapt the sample script to include your accent markers. This helps the model capture the full range of your speech.
Q: Do I still need ElevenLabs, or does HeyGen's voice clone replace it?
A: HeyGen natively offers ElevenLabs integration as one of its voice engines, as well as Fish and other options. For most users, HeyGen's built-in cloning will be sufficient. If you already have an ElevenLabs voice you're happy with, you can continue using it via the Voice ID integration. Nick noted that ElevenLabs V2 is stable and V3 is more expressive but slightly less consistent.
Q: Has Avatar V improved accent handling compared to Avatar IV?
A: Yes. The voice clone quality is improved, and using a script that captures your accent markers during the clone recording will help further. Adam specifically called this out as something users can improve through their own script choices.
### Credits, plans & pricing
Q: How many credits does Avatar V use?
A: Avatar V costs 20 Premium Credits per minute of generated video, the same as Avatar IV. The cost did not increase with the new model.
Q: How much content can I create on the Creator plan?
A: The Creator plan includes 200 Premium Credits per month. At 20 credits per minute, that gives you approximately 10 minutes of Avatar V video per month with included credits. You can purchase additional Premium Credit packs if you need more.
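If it helps to budget your own usage, here is a minimal sketch of the credit arithmetic quoted above (20 Premium Credits per minute; 200 credits per month on the Creator plan). The function names are illustrative, and the round-up-to-the-next-minute behaviour is an assumption; check your plan page for the exact billing rules.

```python
import math

# Credit figures quoted in the webinar answers above; confirm against your own plan.
CREDITS_PER_MINUTE = 20      # Avatar V cost per generated minute (same as Avatar IV)
CREATOR_PLAN_CREDITS = 200   # Premium Credits included per month on the Creator plan

def minutes_included(monthly_credits: int) -> float:
    """Minutes of Avatar V video covered by a monthly credit allowance."""
    return monthly_credits / CREDITS_PER_MINUTE

def credits_needed(video_minutes: float) -> int:
    """Credits for a video of the given length (assumes rounding up to a whole minute)."""
    return math.ceil(video_minutes) * CREDITS_PER_MINUTE

print(minutes_included(CREATOR_PLAN_CREDITS))  # 10.0 minutes per month
print(credits_needed(3.5))                     # 80 credits for a 3.5-minute video
```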
### Creating videos with Avatar V
Q: Can Avatar V add personalised text, lower-thirds, or supers to videos?
A: Yes. You can add text overlays, supers, and other elements in AI Studio after generating your avatar video. Nick showed an example of using an avatar positioned to the left of frame with an overlay on the right side.
Q: Can I control pauses and timing in the script for more realistic delivery?
A: You can use punctuation (commas, ellipses, full stops) to create natural pauses. The avatar's delivery is primarily driven by your audio, so recording or uploading audio with the exact pacing you want will give you the most control.
Q: Is there a list of script commands or prompts I can use to control the avatar?
A: Motion prompts are available to guide how the avatar moves. CAPS in the script can help emphasise certain words. A full list of supported commands was not covered in this webinar; check the HeyGen Academy and Community for documentation.
Q: Can I direct the avatar to do specific gestures like nodding or raising a hand?
A: This is an upcoming feature. Nick previewed custom prompting functionality (e.g. 'be expressive here', 'show a thumbs up here') that is currently in development. It is not yet available but is on the roadmap.
Q: How does the avatar adapt its emotion and movement?
A: Avatar V is primarily audio-driven. The energy, emotion, and delivery style of your voice directly controls how the avatar moves and looks. Excited audio = excited avatar. Calm, professional audio = calm avatar. You can also assign different motion reference videos to different scenes in AI Studio to further shape the motion style.
Q: Can I create a video using just a photo and audio, without a video reference?
A: A video reference is required for Avatar V's digital twin pipeline. However, HeyGen also offers photo avatar and virtual avatar options that do not require a reference video.
Q: Can I upload an audio file instead of typing a script?
A: Yes. In addition to typing a script, you can record audio directly, upload an audio file, or use voice mirroring (where you provide audio in another voice and it is delivered in your cloned voice).
### API & integrations
Q: Is Avatar V available through the API?
A: Not fully yet. Avatar V is available via Video Agent when using your real human avatar, and Video Agent can be called via the API. Full direct API support for Avatar V (including the photo + voice + video reference pipeline) is not yet publicly available but is coming.
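For teams planning ahead, here is a rough sketch of what a video-generation request could look like once your digital twin is reachable through the API. The endpoint, header, and payload fields below are modelled on HeyGen's existing v2 video-generation API and are assumptions in this context; the avatar and voice IDs are placeholders, so check the official API documentation for the current schema and for Avatar V availability.

```python
import os
import requests

# Hypothetical sketch: generate a video with your digital twin via HeyGen's REST API.
# Endpoint and payload fields are assumptions based on the existing v2 video API;
# confirm the exact schema (and Avatar V support) in the official docs.
API_KEY = os.environ["HEYGEN_API_KEY"]

payload = {
    "video_inputs": [
        {
            "character": {"type": "avatar", "avatar_id": "YOUR_AVATAR_ID"},   # placeholder
            "voice": {
                "type": "text",
                "voice_id": "YOUR_VOICE_ID",                                  # placeholder
                "input_text": "Welcome to our quarterly product update!",
            },
        }
    ],
    "dimension": {"width": 1280, "height": 720},
}

response = requests.post(
    "https://api.heygen.com/v2/video/generate",
    headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())  # typically returns a video ID you can poll for status
```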
For questions not answered in this guide, post in the HeyGen Community Forum or contact [email protected].