Voices

October 4, 2024 · Last updated on May 31, 2025

Creating and generating custom Voices for lifelike video narration

Looking for a fast and easy way to create professional voiceovers for your videos?

Meet HeyGen AI Voice, your go-to tool for high-quality video narration. Let’s walk through the simple steps to bring your videos to life with HeyGen’s AI-powered voice engine.

Did you know learners retain up to 80% more information from videos compared to text?

What’s covered in this guide

In this guide, we’ll cover:

What is HeyGen AI Voice?

How to create your own voice

How to customize your voice with multiple Emotions

How to add a new voice to a video

How to generate AI Voices

Prompting guide

Common issues and solutions

What is HeyGen AI Voice?

HeyGen Voice offers a vast library of AI-generated voices that cover a whopping 175 languages and capture a variety of emotions, from friendly to serious and everything in between. Plus, there are numerous tone, accent, and style options to work with.

Whether you're narrating a corporate presentation, a YouTube video, or an educational course, there's a voice that fits perfectly. You can even create your own emotions for your voice by uploading simple audio files.

For example, if you want to add a sad version of your voice to the mix, you can upload a separate audio file of you speaking in a sad voice. The options are endless!

How to create your own voice

Start by reviewing our ultimate guide to hyper-realistic Custom Voice Cloning. Next, log into your HeyGen account.

From the dashboard, click ‘AI Voice’ to discover our library of AI voices.

Next, click ‘Create New Voice’. Note that you can also select an existing public voice from the library instead.

You’ll then be asked to provide your own audio, for example by uploading an audio file.

For best results, upload a file that’s 2 minutes long. You can also create an avatar and voice through the Hyper Realistic Avatar creation process.

Finally, click on ‘Create new voice’.
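Before uploading, it can help to confirm that your recording is close to the recommended 2 minutes. The snippet below is a minimal sketch, not part of HeyGen: it assumes your recording is a WAV file (this guide does not list the accepted formats), and the 15-second tolerance and the function names are illustrative choices.

```python
import wave

TARGET_SECONDS = 120.0    # the guide recommends a file that is 2 minutes long
TOLERANCE_SECONDS = 15.0  # assumption: how far from 2 minutes still counts as close


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_close_to_target(duration: float,
                       target: float = TARGET_SECONDS,
                       tolerance: float = TOLERANCE_SECONDS) -> bool:
    """Return True if the duration is within `tolerance` seconds of `target`."""
    return abs(duration - target) <= tolerance
```

For example, `is_close_to_target(wav_duration_seconds("my_voice.wav"))` tells you whether a hypothetical file named `my_voice.wav` is roughly the right length before you upload it.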

How to customize your voice with multiple Emotions

Once the voice is generated, you’ll find it in the AI Voice library.

Click into your new voice, and you’ll notice a handful of emotions that are available. Choose one that you’d like to be the default, and you’re ready to go.

Next, let’s add a custom emotion to the mix. When you click into the new voice you’ve created, you can click ‘Add emotion’ and then upload a specific audio file.

Examples of emotions include sad, silly, whispering, excitement, and more.

Once you’ve uploaded your specific audio file, all you have to do is name it, and you’re set!

How to add a new voice to a video

When you’re in the process of creating a video, all you have to do is add the voice to any portion of your script.

To do this, click the portion of the script you’d like to add it to and select the voice you’ve created.

How to generate AI Voices

In addition to HeyGen's high-quality AI voiceover options, users on a paid Creator, Teams, Agency, or Enterprise plan can generate custom voices using text prompts.

Start by clicking the "Create New Voice" button, then choose the "Generate Voice" option.

You can name your voice and select the desired age, gender, and ethnicity. These details help shape the overall tone and character of the voice.

Write a short description explaining how you want the voice to sound—whether it's professional, friendly, calm, or energetic. If you’re unsure, you can click on "Try a Sample" to hear example voices for inspiration.

Once you’re satisfied with your customizations, click "Generate Voice." HeyGen will provide you with three voice options to choose from. Listen to each and select the one that best matches your needs.

By generating an AI voice, you add a new layer of personalization to your videos, enhancing how your message is delivered!

Prompting guide

Voice Design Types

Realistic Voice Design:

To create an original, realistic voice, you can specify attributes such as age, accent/nationality, gender, tone, pitch, intonation, speed, and emotion. Example prompts include:

“A young Indian female with a soft, high voice. Conversational, slow, and calm.”

“An old British male with a raspy, deep voice. Professional, relaxed, and assertive.”

“A middle-aged Australian female with a warm, low voice. Corporate, fast, and happy.”

Character Voice Design:

For creative characters, simpler prompts work well to generate unique voices. Example prompts include:

“A massive evil ogre, troll.”

“A sassy little squeaky mouse.”

“An angry old pirate, shouting.”

Other characters we’ve had success with include Goblin, Vampire, Elf, Troll, Werewolf, Ghost, Alien, Giant, Witch, Wizard, Zombie, Demon, Devil, Pirate, Genie, Ogre, Orc, Knight, Samurai, Banshee, Yeti, Druid, Robot, Monkey, Monster, and Dracula.

Voice Attributes

Each attribute varies in importance when designing your AI voice:

Age (High Importance): Choose from options like Young, Teenage, Adult, Middle-Aged, Old, etc.

Accent/Nationality (High Importance): Options include British, Indian, Polish, American, and more.

Gender (High Importance): Select from Male, Female, or Gender Neutral.

Tone (Optional): Examples include Gruff, Soft, Warm, and Raspy.

Pitch (Optional): Options like Deep, Low, High, and Squeaky are available.

Intonation (Optional): Options include Conversational, Professional, Corporate, Urban, and Posh.

Speed (Optional): You can set the speed to Fast, Quick, Slow, or Relaxed.

Emotion/Delivery (Optional): Choose emotions such as Angry, Calm, Scared, Happy, Assertive, Whispering, or Shouting.
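If you write many prompts, a small helper can keep them in the same shape as the realistic voice examples above. This is only a sketch for composing the text you paste into the description field: the function name and parameters are made up for illustration and are not a HeyGen API. It treats the three high-importance attributes as required and the rest as optional.

```python
from typing import List, Optional


def _article(word: str) -> str:
    """Pick "A" or "An" from the first letter (a simple heuristic)."""
    return "An" if word[:1].lower() in ("a", "e", "i", "o", "u") else "A"


def _join_with_and(items: List[str]) -> str:
    """Join items as "a", "a and b", or "a, b, and c"."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_voice_prompt(age: str, accent: str, gender: str,
                       tone: Optional[str] = None,
                       pitch: Optional[str] = None,
                       intonation: Optional[str] = None,
                       speed: Optional[str] = None,
                       emotion: Optional[str] = None) -> str:
    """Compose a voice description from the attributes listed in this guide."""
    # Required, high-importance attributes come first.
    subject = f"{_article(age)} {age} {accent} {gender}"

    # Tone and pitch describe the voice itself.
    voice_traits = [t for t in (tone, pitch) if t]
    if voice_traits:
        article = _article(voice_traits[0]).lower()
        subject += f" with {article} {', '.join(voice_traits)} voice"
    sentences = [subject + "."]

    # Intonation, speed, and emotion describe the delivery.
    delivery = [d for d in (intonation, speed, emotion) if d]
    if delivery:
        text = _join_with_and(delivery)
        sentences.append(text[0].upper() + text[1:] + ".")

    return " ".join(sentences)
```

Calling `build_voice_prompt("young", "Indian", "female", tone="soft", pitch="high", intonation="conversational", speed="slow", emotion="calm")` reproduces the first example prompt in this guide.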

Common issues and solutions

Doesn’t sound like me

Voice clones may not perfectly replicate the source voice due to issues in the training data or audio quality. To address this:

Use Professional Voice Cloning (PVC) with high-quality, consistent audio.

Record clean audio samples with no background noise.

Enable the Remove Background Noise option when submitting your audio.

If issues persist, record new samples or regenerate the clone with the same input.

Pronunciation

Pronunciation accuracy depends on the language of the training data. To improve this:

Record training samples in the language you plan to use.

Ensure natural inflection and clear articulation during recording.

Voice instability

Voice instability can result from inconsistent training data or audio quality. Improve stability by:

Recording high-quality training samples with consistent tone and volume.

Using sufficient training data to provide the AI with varied but stable input.
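One rough way to check volume consistency before uploading is to compare the average loudness of your recordings. The sketch below is not a HeyGen tool. It assumes your samples are 16-bit PCM WAV files, and the 3 dB threshold is an arbitrary starting point rather than an official limit.

```python
import math
import statistics
import struct
import wave
from typing import Dict, List


def rms_level_db(path: str) -> float:
    """Return the RMS level of a 16-bit PCM WAV file in dB relative to full scale."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = wav_file.readframes(wav_file.getnframes())
    count = len(raw) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", raw[:count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return float("-inf")  # pure silence
    return 10 * math.log10(mean_square / (32768.0 ** 2))


def inconsistent_samples(levels: Dict[str, float],
                         max_spread_db: float = 3.0) -> List[str]:
    """Return the names whose level is more than `max_spread_db` from the median."""
    finite = [v for v in levels.values() if math.isfinite(v)]
    if not finite:
        return list(levels)
    median = statistics.median(finite)
    return [name for name, level in levels.items()
            if not math.isfinite(level) or abs(level - median) > max_spread_db]
```

You could build `levels` by calling `rms_level_db` on each recording, then re-record any file that `inconsistent_samples` flags as noticeably louder or quieter than the rest.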

Incorrect tone

Achieving the desired tone requires tonal consistency in the input samples. To correct tone issues:

Record training samples in the target tone.

Ensure that all samples maintain a consistent emotional or tonal expression.

Too monotone

Monotone voice clones can lack natural expression. Enhance expressiveness by:

Including training data with natural inflections and varied tones.

Recording new samples with expressive delivery and adding them as a custom emotion to your voice.

Adjusting settings like stability, clarity/similarity, and style exaggeration for a more dynamic result.

Emotions don’t sound like me

To capture emotional nuances, training samples must reflect the desired emotional expressions. Include diverse samples showcasing different emotions to enhance the clone's ability to replicate them.

Accent

For unique accents, consistent training data is essential. To address accent issues:

Use Professional Voice Cloning and record samples in the desired accent.

For non-English TTS, use the Multilingual v2 model. For English, use Turbo v2.

If problems persist, regenerate the voice clone using the same or new samples.

Adjust the accent in HeyGen Studio if necessary, keeping in mind this may slightly alter the voice’s authenticity.

Best practices for successful voice cloning

Use high-quality audio: Ensure training samples are free from background noise and recorded in a quiet environment.

Provide sufficient data: Include a diverse range of samples, covering various tones, emotions, and expressions.

Maintain consistency: Ensure all samples are recorded with consistent tone, volume, and quality.

Enable background noise removal: Use HeyGen’s Remove Background Noise option for cleaner input.

Test and adjust: Experiment with settings such as stability, clarity, and style exaggeration to fine-tune the output.

Recap

In this guide, you learned:

What is HeyGen AI Voice?

How to create your own voice

How to customize your voice with multiple Emotions

How to add a new voice to a video

How to generate AI Voices

Prompting guide

Common issues and solutions

We’re looking forward to seeing what you’ll create with HeyGen!
