January 8, 2025 · Last updated on January 25, 2025

Best practices for creating your AI avatar and voice


Avatar & Voice shooting tips and tricks

Jay Richardson

Want to create a professional and engaging avatar? These simple steps will ensure your avatars look authentic and sound natural, making them perfect for presentations, marketing, and more.

This guide was written by HeyGen customer and creative technologist Jay Richardson to help you achieve the best results for both video and voice recordings. In case you missed it, check out the recap of Jay's presentation and demonstration on this topic at Visionary Voices, HeyGen's first-ever community event, along with his shooting tips video below.


What’s covered in this guide

  1. Video recording guidelines for avatars
  2. Common mistakes to avoid
  3. Audio recording tips for voice cloning
  4. Additional resources for perfecting your avatar


Creating high-quality avatar footage

Follow these steps for optimized video recording:

🎥 Camera: Use a 4K camera or a modern smartphone with good resolution. If you are using a smartphone, you’ll get the best results using cinematic mode in 4K.

Maintain a natural line of sight by positioning the camera at eye level. Disable auto-focus and use manual focus instead. On most smartphones, you can tap and hold on your face to lock focus.

Disable auto-exposure in order to maintain consistent exposure. On most smartphones, you can lock exposure by tapping and holding on your face until a lock icon appears.

Before recording, check for consistency by recording a short video with movement to determine whether the brightness or focus changes during the recording. If so, try locking focus and exposure.


💡 Lighting: Utilize natural lighting for the best video quality by positioning yourself facing a window with soft, indirect sunlight. Sit or stand with the window directly in front of you, avoiding direct sunlight to prevent harsh shadows and overexposure. Ensure your background is evenly lit, simple, and free of shadows to maintain focus and enhance overall video quality.


🎬 Background: Opt for a clean and simple background, especially if you intend to remove your avatar’s original background in future videos. Avoid backgrounds with excessive detail or motion to ensure the focus remains on you. 


📏 Stability: Use a tripod, or otherwise stabilize your phone or camera.


🧍🏾 Movement: Avoid quick movements and limit head turns to 30 degrees. For videos taken while walking, try to stabilize the phone by gripping it tightly.


⏺️ Recording: Position yourself at the center of the frame, ensuring your head and shoulders are clearly visible.

Keep a slight distance from the camera (2-3 feet) to avoid distortion and maintain a balanced perspective.

Sit or stand with good posture, avoiding slouching, and make steady eye contact with the camera to create a sense of connection.

The first 15 seconds of your video are critical—minimize blinking and excessive movement to maintain focus and clarity when speaking. Afterward, feel free to incorporate natural movements and gestures.

Aim to record for 2.5 to 3 minutes for the best results. Speak clearly at a moderate pace, using natural facial expressions and gestures, just as you would in a face-to-face conversation. Once you’ve created and uploaded your video, follow the consent prompts to give HeyGen clearance to create your avatar!
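The numeric targets in this section (4K resolution and 2.5 to 3 minutes of footage) can be checked before you upload. The following is a minimal sketch in Python and is not part of HeyGen's tooling: the `check_recording` function, its parameters, and the 3840x2160 threshold used for "4K" are assumptions made for illustration. You would read the width, height, and duration from your own file, for example from the file details on your phone or computer.

```python
# Hypothetical pre-upload check; names and thresholds are illustrative assumptions.
MIN_WIDTH, MIN_HEIGHT = 3840, 2160     # assumed definition of "4K" (UHD)
MIN_SECONDS, MAX_SECONDS = 150, 180    # 2.5 to 3 minutes, as recommended in the guide


def check_recording(width, height, duration_seconds):
    """Return a list of warnings; an empty list means the targets are met."""
    warnings = []
    # Compare long and short sides so portrait (vertical) video is handled too.
    long_side, short_side = max(width, height), min(width, height)
    if long_side < MIN_WIDTH or short_side < MIN_HEIGHT:
        warnings.append("resolution is below 4K")
    if duration_seconds < MIN_SECONDS:
        warnings.append("recording is shorter than 2.5 minutes")
    elif duration_seconds > MAX_SECONDS:
        warnings.append("recording is longer than 3 minutes")
    return warnings


if __name__ == "__main__":
    # Example values only; substitute the details of your own recording.
    for warning in check_recording(1920, 1080, 95):
        print("Check before uploading:", warning)
```

A warning here is only a prompt to re-check your footage against the recommendations above; it does not reflect any upload limit enforced by HeyGen.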


Common mistakes to avoid

Avoid these common pitfalls to ensure high-quality results:

  1. Fast movements: Avoid quick, jerky actions.
  2. Poor lighting: Steer clear of backlighting or harsh shadows.
  3. Camera angle: Keep the camera level; avoid tilting.
  4. Background noise: Record in a quiet space to prevent disturbances.


Audio recording guidelines

To create a natural-sounding voice clone, follow these audio recording tips:

🎙️ Microphone: Use a high-quality external or Bluetooth mic, and record in a quiet area to limit background noise. If you are using a wireless microphone, position it about 6-8 inches from your mouth. Make sure your hand does not obstruct the microphone, and if you are using an external microphone, make sure it does not rub against any fabric.


🤫 Environment: Record in a quiet, noise-free space.


🗣️ Speech: Speak naturally and clearly, with slight emotional exaggeration. Discuss familiar topics for authenticity, like daily routines, hobbies, or personal stories. This approach helps you naturally express the tone and energy you’d use in a relaxed setting. Avoid reading from a script to convey authentic vocal variation and emotion.

Include breaks and pauses to ensure authenticity. Maintain a consistent, clear tone throughout the recording for the best results. Speak at a steady, conversational volume—neither too soft nor too loud. Keep a balanced pace, avoiding rushing or speaking too slowly, and ensure your voice remains controlled and even throughout.


Common issues and solutions

Below are suggestions for resolving the most common issues our users face when creating custom avatars and voices.


Unnatural body movements

Use natural, subtle hand movements throughout the recording, and avoid excessive body or head movement. These gestures will help make your avatar look more natural and at ease.


Lip sync

Maintain clear pronunciation and a steady tone. Pause with your lips closed at appropriate moments to improve lip-syncing accuracy.


Video quality

Use high-resolution cameras (smartphones or professional cameras are better than laptop webcams) and ensure proper lighting without harsh shadows.


Too monotone

  1. Provide training data with natural inflections and expressive tones.
  2. First, try cloning your voice separately if you haven't already.
  3. If you have, then try recording a new audio sample, sounding very excited and expressive, and add that as a "custom emotion" to your voice.
  4. Also try adjusting the stability, clarity/similarity and style exaggeration settings until you like what you hear.


The accent

  1. Use Professional Voice Cloning (PVC) for unique accents and record samples consistently in the desired accent.
  2. For non-English TTS, use the multilingual v2 model. For English, use turbo v2.
  3. If issues persist, try re-creating your voice clone with a new sample and choose the "remove background noise" option in HeyGen's voice clone.
  4. If issues continue, try regenerating the voice clone a couple of times with the same audio sample.
  5. If still unresolved, you can set the accent in the studio. While this fixes the accent, it may make it sound less like your original voice.


Doesn't sound like me

Use Professional Voice Cloning (PVC) and ensure high-quality, consistent audio with enough training data.

  1. First, try cloning your voice separately if you haven't already.
  2. If you have, then try recording a new audio sample with no background noise and clean audio quality.
  3. When you upload the audio on HeyGen, choose the 'remove background noise' option when submitting.
  4. If still having issues, try re-recording a new audio sample.
  5. If issues persist, try regenerating the voice clone with the same sample.


Incorrect pause

  1. Adjust the script to simulate pauses (e.g., using punctuation or ellipses).
  2. Click “Add pause section”, or insert an in-text pause by clicking “Add 0.5s pause”, to manually add your own pauses.
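If you draft scripts outside HeyGen, the first approach (simulating pauses with ellipses) can be applied in bulk. This is a minimal sketch in Python under stated assumptions: the `[pause]` marker and the `add_pauses` function are hypothetical conventions invented for this example, not HeyGen features. You would mark pauses in your draft and convert them before pasting the script into the editor.

```python
import re


def add_pauses(script, marker="[pause]"):
    """Replace each hypothetical pause marker in the script with an ellipsis."""
    # Absorb whitespace around the marker so the ellipsis attaches to the
    # preceding word and is followed by a single space.
    pattern = r"\s*" + re.escape(marker) + r"\s*"
    return re.sub(pattern, "... ", script).strip()


if __name__ == "__main__":
    draft = "Welcome back [pause] let's get started."
    print(add_pauses(draft))
```

How long an ellipsis pauses depends on the voice, so listen to the result and fall back to the manual pause controls if the timing is off.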


Pace

  1. Modify the script's punctuation or sentence structure to test different pacing effects.
  2. Adjust the speed in Voice settings.


Incorrect Rise/Fall Pitch

If you’re encountering issues with incorrect rise or fall in pitch, Enterprise users can adjust the punctuation in Proofread to achieve the desired pitch. Modifying punctuation, such as adding commas or question marks, can help refine the intonation.


Recap

By following these guidelines, you can create high-quality avatars with realistic voice and visual fidelity. Have questions or more tips and tricks to share? Join the discussion in this forum post.


This guide covered:

  1. Video recording guidelines for avatars
  2. Common mistakes to avoid
  3. Audio recording tips for voice cloning
  4. Additional resources for perfecting your avatar

Now that you’ve learned the most important aspects of avatar creation, check out these detailed guides to help you create your personal avatar:

  1. How to use Video Avatars to make your digital twin
  2. How to use Motion Avatars to create an animated digital twin
  3. How to use HeyGen AI Voice for high-quality video narration


About Jay Richardson

Jay Richardson is a leader in AI-driven media and immersive content creation, blending technology with storytelling to elevate brand experiences globally. He has been featured on ABC Nightline News for travel content and was the executive producer of several Vogue covers around the world. His company AskMeAi.co has been leading AI technology for cutting-edge AI assistants in the hospitality sector, working with renowned hotels like St. Regis Resorts and 5-star resorts such as Sripanwa. Additionally, they have produced over 1,000 innovative avatars using HeyGen technology for training, social media, and data for top tech companies like ClickUp. Richardson is also the founder of LiveRichTravel, a team of high-profile content creators and influencers whose work showcases luxury travel destinations with the aim of boosting tourism and celebrating the beauty and diversity of the world. LiveRichTravel’s partners and clients include Netflix, Hulu, Maldives Tourism Board, and more.
