Voice & Language Configuration

Voice Configuration controls how your assistant sounds and how it listens during a live call. These settings directly affect the caller experience a misconfigured voice feels robotic, and wrong sensitivity causes the assistant to cut callers off mid-sentence.

Language

Set the primary language the assistant speaks and listens in. This determines which voice models are available for selection and how the speech recognition engine interprets the caller's input.

Set the assistant's primary operating language here. This determines:

Which voice models are available for selection in Voice Configuration
How the speech recognition engine interprets and transcribes the caller's input
Which system phrases and fallback messages are used

Voice

Your assistant's voice is powered by a selected model from a provider. In the example above, the assistant uses Apollo from Deepgram. Each provider has a distinct character:

Provider	Character
Deepgram	Optimized for low-latency telephony fast, clear, and reliable on real phone calls
ElevenLabs	More expressive and human-sounding, best for brand experiences where richness matters
Sarvam	Designed for Indian languages and multilingual support natural, clear, and localized voice output

Two actions are available directly on the voice card:

Action	Description
Listen	Plays a preview of the selected voice so you can hear exactly how it sounds before saving
Edit	Opens voice selection to browse and switch to a different voice model or provider

Always click Listen before saving. The same script sounds completely different across providers and voice models what reads well on paper may not sound right out loud.

Patience Level

Patience Level controls how long the assistant waits after the caller goes silent whether they finished speaking, are still thinking, or didn't respond at all before it re-prompts them.

Level	Best For
Low	Fast-paced sales or support flows where hesitation is uncommon
Medium	General use balanced for most audiences (recommended default)
High	Elderly callers, complex topics, non-native speakers, or anyone who needs more time to gather their thoughts

⚠️

Setting patience too low causes the assistant to re-prompt callers who are still thinking, which feels rude and abrupt. Setting it too high creates awkward dead air on calls with fast, confident speakers. Start with Medium and adjust based on real call recordings.

Advanced Settings

Click Advanced Settings to expand three slider controls that fine-tune the expressive quality of the assistant's voice. These settings are specific to the voice model and provider selected.

Stability `0.5` (default)

Controls how consistent and predictable the assistant's voice sounds across sentences.

Value	Effect
Low (toward 0.0)	More expressive and varied delivery the voice has more natural inflection but can sound inconsistent across long calls
High (toward 1.0)	More stable and uniform delivery reliable and predictable, but can feel flat or monotone

A value of 0.5 balances natural variation with consistency, which works well for most call types.

Style Exaggeration `0.0` (default)

Amplifies the stylistic characteristics of the selected voice its energy, personality, and expressiveness.

Value	Effect
0.0	The voice performs neutrally, closest to how it sounds in preview
Higher values	The voice becomes more dramatic and energetic useful for high-energy sales or engagement calls, but can sound unnatural if pushed too far

Keep Style Exaggeration at 0.0 for professional or formal call types such as customer support, feedback collection, or appointment reminders. Increase it slightly only for outbound campaigns where energy and enthusiasm are intentional.

Similarity `0.80` (default)

Controls how closely the assistant's generated speech adheres to the original trained voice model.

Value	Effect
Low (toward 0.0)	The voice drifts further from the original model can introduce artifacts or inconsistency
High (toward 1.0)	The voice stays very close to the original trained model cleaner and more consistent output

A value of 0.80 is recommended for most use cases. Only reduce this if you intentionally want more variation from the base voice.

⚠️

Stability, Style Exaggeration, and Similarity interact with each other. A high Style Exaggeration combined with a low Stability can produce unpredictable and inconsistent audio quality. Change one slider at a time and use Listen to preview the result before saving.

Prompt Call Configuration