Parameters
General Parameters
- Server URL: Fireworks’ default co-located endpoint is “wss://audio-agent.link.fireworks.ai/v1/audio/agent”. Use a different URL if you’ve been instructed to use a personalized endpoint.
- API Key: Use your Fireworks API key
- System prompt: System prompt for the LLM
- Tools:
  - Tools for the LLM, defined in OpenAI’s function-calling JSON format. See the OpenAI or Fireworks docs for instructions on defining tools. Inline comments are not supported.
  - Specify instructions for calling each tool in the LLM system prompt.
  - The voice agent UI assumes that every tool call returns a response. Provide that response to the LLM as JSON. For example, a scheduling tool could return {"scheduling": "succeeded"} or {"suggested_time": "3 pm"}.
- Echo cancellation (AEC): Enable echo cancellation if you are not using headphones. Otherwise, disable it to avoid introducing artifacts.
- High-pass filter and noise suppression: Both help eliminate noise if you anticipate usage in noisy environments. Disable them in quiet environments.
- Autogain control: Helps stabilize user volume if users may be far from the mic or change their distance to it. Disable this setting if volume is stable.
- Minimum delay: Minimum time, in seconds, before the agent can respond. Lower values reduce the fastest possible response time but may cause the agent to interrupt users between sentences.
- Max interrupt delay: Maximum time, in seconds, before the agent responds once the user has said anything. Lower values reduce the slowest possible response time but may cause the agent to interrupt users between sentences.
- Max follow-up delay: Maximum time, in seconds, before the agent starts speaking if the user has not said anything. Lower values mean the agent follows up more aggressively during periods of silence.
- Choosing voices: Voices that begin with “fw” are Fireworks voice models. Voices that do not begin with “fw” are powered by the open-source Kokoro model. Fireworks voice models can be prompted with IPA for specific pronunciations (see the custom pronunciation guide below), while Kokoro voices support non-English languages.
- Changing voice language: Different Kokoro voices correspond to different languages; for example, voices starting with ‘e’ are Spanish. See the full voice list for specifics. To have text generated in a different language, write your system prompt in that language. You may also need to explicitly instruct the LLM to respond in that language.
- TTS Speed: Change how quickly the voice model speaks
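As a rough sketch of the Tools guidance above, the Python below defines a hypothetical schedule_meeting tool in OpenAI’s function-calling format, plus a handler that always answers with JSON, since the voice agent UI expects every tool call to produce a response. The tool name, its time parameter, and the AVAILABLE slots are illustrative assumptions, and how your deployment actually executes tool calls may differ.

```python
import json

# Hypothetical tool definition in OpenAI's function-calling format.
# The name, description, and parameters are examples, not a Fireworks-provided tool.
SCHEDULE_TOOL = {
    "type": "function",
    "function": {
        "name": "schedule_meeting",
        "description": "Book a meeting at the requested time.",
        "parameters": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "Requested time, e.g. '3 pm'"},
            },
            "required": ["time"],
        },
    },
}

# Hypothetical free slots used only for this sketch.
AVAILABLE = {"3 pm", "4 pm"}


def handle_schedule_meeting(arguments: str) -> str:
    """Run the tool and always return a JSON string for the LLM."""
    args = json.loads(arguments)  # OpenAI-style call arguments arrive as a JSON string
    if args["time"] in AVAILABLE:
        return json.dumps({"scheduling": "succeeded"})
    # Requested slot is taken: suggest the earliest free slot instead.
    return json.dumps({"suggested_time": sorted(AVAILABLE)[0]})


# json.dumps emits comment-free JSON, which matters because the Tools field
# does not accept inline comments.
print(json.dumps(SCHEDULE_TOOL, indent=2))
print(handle_schedule_meeting('{"time": "9 am"}'))
```

Check the OpenAI or Fireworks docs for whether the Tools field expects a single tool object or a list of them.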
Custom pronunciation via International Phonetic Alphabet
Have words that need to be pronounced a particular way? For example, say you want the British pronunciation of Nike (one syllable) instead of the American pronunciation. Fireworks TTS can vocalize precise pronunciations using the International Phonetic Alphabet (IPA), a notation that represents the individual sounds of spoken language. For example, the British pronunciation of Nike is represented as <ipa>nˈaɪk</ipa> in IPA. To use custom pronunciation, you’ll need to prompt your LLM to output the IPA transcription, wrapped in <ipa> tags, wherever that word would otherwise appear. For example, we use the prompt:
Generating IPA
To generate the IPA, you can use our Python script (see below) or prompt ChatGPT with our syntax reference; we’ve had success with ChatGPT this way (see example prompt).
Syntax Reference
Allowed Symbols
Stress: ˈ (primary), ˌ (secondary)
WARNING: the primary stress mark ˈ (U+02C8) is not a normal apostrophe.
Consonants: b, d, f, h, j, k, l, m, n, p, s, t, v, w, z, ɡ, ŋ, ɹ, ʃ, ʒ, ð, θ, ɾ
Vowels: ə, i, u, ɑ, ɔ, ɛ, ɜ, ɪ, ʊ, ʌ, æ, a, o, ɒ, ᵻ, ɐ, ː
Stress-Mark Rules
- Place ˈ or ˌ immediately before the vowel that carries stress.
- Never put a stress mark after the vowel or before a consonant.
- Use one primary stress for any monosyllable; add secondary stress only when needed in longer words.
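To catch common mistakes before IPA reaches the TTS (an apostrophe typed in place of ˈ, a stress mark before a consonant, a symbol outside the list), a small checker can help. The hypothetical check_transcription below is a sketch built from the Allowed Symbols list and the rules above, not a Fireworks tool. Two parts are assumptions: the extra vowel e, taken from the eɪ in the JSON example in the table below, and the at-most-one-primary-stress-per-word check, which is a general IPA convention rather than a stated rule.

```python
# Symbols from the "Allowed Symbols" list. Escapes are used where an
# ASCII look-alike could slip in by mistake.
PRIMARY = "\u02c8"    # ˈ primary stress (not an apostrophe)
SECONDARY = "\u02cc"  # ˌ secondary stress
LENGTH = "\u02d0"     # ː length mark
CONSONANTS = set("bdfhjklmnpstvwz\u0261ŋɹʃʒðθɾ")  # \u0261 is IPA ɡ, not ASCII g
# Assumption: "e" is added because the examples table uses it in "eɪ" (JSON).
VOWELS = set("əiuɑɔɛɜɪʊʌæaoɒᵻɐ") | {"e"}
ALLOWED = CONSONANTS | VOWELS | {PRIMARY, SECONDARY, LENGTH, " "}


def check_transcription(ipa: str) -> list[str]:
    """Return a list of problems in one IPA transcription; empty means it passed."""
    problems = []
    for i, ch in enumerate(ipa):
        if ch in ("'", "\u2019"):
            problems.append(f"index {i}: apostrophe found; use {PRIMARY} (U+02C8) for primary stress")
        elif ch not in ALLOWED:
            problems.append(f"index {i}: {ch!r} is not an allowed symbol")
        elif ch in (PRIMARY, SECONDARY):
            # Rule: the stress mark goes immediately before the stressed vowel.
            nxt = ipa[i + 1] if i + 1 < len(ipa) else ""
            if nxt not in VOWELS:
                problems.append(f"index {i}: stress mark is not immediately before a vowel")
    # Assumed convention: at most one primary stress per word.
    for word in ipa.split():
        if word.count(PRIMARY) > 1:
            problems.append(f"{word!r}: more than one primary stress")
    return problems


print(check_transcription("nˈaɪk"))  # British "Nike" from the guide
print(check_transcription("n'aɪk"))  # apostrophe typed instead of ˈ
```

Running the checker over IPA that ChatGPT or a script produced is a quick way to spot substitutions such as an ASCII r for ɹ.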
Tag Syntax
Wrap each transcription in angle-bracket tags: <ipa>…</ipa>
Examples
Word / Phrase | IPA options |
---|---|
GIF | <ipa>dʒˈɪf</ipa> or <ipa>ɡˈɪf</ipa> |
SQL | <ipa>sˈikwəl</ipa> or <ipa>ɛskjuːɛl</ipa> |
JSON | <ipa>dʒˈeɪsɑn</ipa> or <ipa>dʒˈeɪsᵊn</ipa> |
Nike | <ipa>nˈaɪki</ipa> (US) or <ipa>nˈaɪk</ipa> (UK) |
the quick brown fox | <ipa>ðə kwˈɪk bɹˈaʊn fˈɑks</ipa> |
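While tuning your system prompt, it helps to confirm that the LLM’s replies actually follow this tag syntax. The hypothetical extract_ipa helper below pulls the transcriptions out of a reply and raises if the tags are unbalanced, nested, or out of order. It is a sketch for offline testing under those assumptions, not part of the voice agent.

```python
import re

# One <ipa>...</ipa> span; non-greedy so adjacent spans stay separate.
IPA_SPAN = re.compile(r"<ipa>(.*?)</ipa>", re.DOTALL)


def extract_ipa(reply: str) -> list[str]:
    """Return the transcriptions inside <ipa> tags, raising ValueError on bad tags."""
    opens, closes = reply.count("<ipa>"), reply.count("</ipa>")
    if opens != closes:
        raise ValueError(f"unbalanced tags: {opens} <ipa> vs {closes} </ipa>")
    spans = IPA_SPAN.findall(reply)
    # Nested or misordered tags produce fewer matches than opening tags.
    if len(spans) != opens:
        raise ValueError("nested or misordered <ipa> tags")
    return spans


reply = "In the UK it's <ipa>nˈaɪk</ipa>; in the US it's <ipa>nˈaɪki</ipa>."
print(extract_ipa(reply))
```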
Python Script for generating IPA
You can easily generate valid eSpeak IPA strings from English by installing the misaki Python package and using the following code snippet: