24-12-2025

Imagine this: you're building a voice menu for a bank. A customer calls, and the system needs to read out a verification code: "Your code is A-7-K-2-M." You type the text, hit "generate"… and hear something like "Your code is aksm." The TTS read the letters as a word.
Or another scenario: you're creating voiceover for a language course where the teacher spells out a name: "My name is Pieter. P-I-E-T-E-R." But the synthesizer stubbornly pronounces "Pieter" as a whole word, ignoring the hyphens.
Sound familiar? Then this article is for you. We'll break down 5 ways to make TTS spell out text letter by letter — from simple to advanced. All methods work, but each one shines in different situations.
Let's start with a real task. One of our SpeechGen users asked support: how do I voice a phrase so that the name is spelled out with a 1-second pause between letters?
"The name is Pieter. P-I-E-T-E-R. Pieter."
Here are 5 ways to do it.
The simplest option — just add pauses between letters:
The name is Pieter.
P <break time="1s"/>
I <break time="1s"/>
E <break time="1s"/>
T <break time="1s"/>
E <break time="1s"/>
R. <break time="1s"/>
Pieter.
Pros: Simple, works everywhere. Cons: Some letters may be pronounced differently than you expect, for example "I" as "ih" instead of "eye," "E" as "eh" instead of "ee," or "C" as "k" instead of "see." The result depends on the voice and language.
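If you need this markup for many names, it can be generated instead of typed by hand. The following is a minimal Python sketch; the function name `spell_with_breaks` is hypothetical and not part of SpeechGen, and it assumes the target voice accepts the `<break>` tag exactly as shown above.

```python
def spell_with_breaks(word: str, pause: str = "1s") -> str:
    """Return one line per letter, each followed by an SSML <break> tag."""
    # Keep alphabetic characters only, so hyphens and spaces are ignored.
    letters = [ch.upper() for ch in word if ch.isalpha()]
    return "\n".join(f'{letter} <break time="{pause}"/>' for letter in letters)


print("The name is Pieter.")
print(spell_with_breaks("Pieter"))
print("Pieter.")
```

The pause length is a parameter, so the same helper covers both the slow 1-second dictation and a faster reading.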
Add an explicit instruction: "this is a letter, pronounce it as a letter":
The name is Pieter.
<say-as interpret-as="characters">P</say-as> <break time="1s"/>
<say-as interpret-as="characters">I</say-as> <break time="1s"/>
<say-as interpret-as="characters">E</say-as> <break time="1s"/>
<say-as interpret-as="characters">T</say-as> <break time="1s"/>
<say-as interpret-as="characters">E</say-as> <break time="1s"/>
<say-as interpret-as="characters">R</say-as>. <break time="1s"/>
Pieter.
Pros: More reliable letter pronunciation, standard SSML. Cons: More code to write. Doesn't work with HD voices — they don't support this tag.
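Since the main drawback here is the amount of markup, a small generator can help. This is a sketch under assumptions: `spell_with_say_as` is a hypothetical helper name, and whether a given voice honors `interpret-as="characters"` still has to be checked by listening.

```python
from xml.sax.saxutils import escape


def spell_with_say_as(text: str, pause: str = "1s") -> str:
    """Wrap every non-space character in <say-as interpret-as="characters">."""
    tags = []
    for ch in text:
        if ch.isspace():
            continue
        # escape() protects characters such as "&" and "<" inside the markup.
        tags.append(f'<say-as interpret-as="characters">{escape(ch.upper())}</say-as>')
    # Put a pause between characters, but not after the last one.
    return f' <break time="{pause}"/>\n'.join(tags)


print(spell_with_say_as("Pieter"))
```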
Write out the letters as they sound in the English alphabet:
The name is Pieter.
Pee <break time="1s"/>
Eye <break time="1s"/>
Ee <break time="1s"/>
Tee <break time="1s"/>
Ee <break time="1s"/>
Are. <break time="1s"/>
Pieter.
Pros: Works on any engine, even without SSML support. Cons: You need to know how each letter name is spelled, and the spellings are language-specific.
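A lookup table removes the need to remember the spellings. The sketch below is illustrative only: the spellings used in this article (Pee, Eye, Ee, Tee, Are, Ay, Kay, Em, Eff, Bee, Ess, Double-You) come from the examples, while the remaining entries are assumptions that should be checked by ear with your chosen voice. The names `LETTER_NAMES`, `DIGIT_NAMES`, and `spell_phonetically` are hypothetical.

```python
# English letter names written the way they sound (partly assumed, see above).
LETTER_NAMES = {
    "A": "Ay", "B": "Bee", "C": "See", "D": "Dee", "E": "Ee", "F": "Eff",
    "G": "Jee", "H": "Aitch", "I": "Eye", "J": "Jay", "K": "Kay", "L": "El",
    "M": "Em", "N": "En", "O": "Oh", "P": "Pee", "Q": "Cue", "R": "Are",
    "S": "Ess", "T": "Tee", "U": "You", "V": "Vee", "W": "Double-You",
    "X": "Ex", "Y": "Why", "Z": "Zee",
}
DIGIT_NAMES = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_phonetically(text: str, pause: str = "1s") -> str:
    """Replace each letter or digit with its spoken name, one per line."""
    words = []
    for ch in text.upper():
        if ch in LETTER_NAMES:
            words.append(LETTER_NAMES[ch])
        elif ch in DIGIT_NAMES:
            words.append(DIGIT_NAMES[ch])
        # Any other character (hyphen, space, punctuation) is skipped.
    return f' <break time="{pause}"/>\n'.join(words)


print(spell_phonetically("Pieter"))
```

If the engine has no SSML support at all, the `<break>` tags can be swapped for commas or periods, which most voices render as short pauses.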
The radio alphabet — each letter is replaced with a code word:
The name is Pieter.
Papa <break time="1s"/>
India <break time="1s"/>
Echo <break time="1s"/>
Tango <break time="1s"/>
Echo <break time="1s"/>
Romeo. <break time="1s"/>
Pieter.
Pros: Maximum clarity, perfect for noisy environments or poor connections. Cons: Sounds formal and technical, so it doesn't suit every context.
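The substitution is a plain table lookup, sketched below in Python. The code words are the standard NATO set; "Alpha" and "Juliet" are written in their common English spellings rather than the official "Alfa" and "Juliett", on the assumption that TTS voices handle the common forms more predictably. The helper names are hypothetical.

```python
NATO = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliet",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def nato_words(text: str) -> list[str]:
    """Return the NATO code word for each letter; other characters are skipped."""
    return [NATO[ch] for ch in text.upper() if ch in NATO]


def nato_ssml(text: str, pause: str = "1s") -> str:
    """Join the code words with SSML pauses, one word per line."""
    return f' <break time="{pause}"/>\n'.join(nato_words(text))


print(nato_ssml("Pieter"))
```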
For maximum control — specify exact pronunciation using the International Phonetic Alphabet:
The name is Pieter.
<phoneme alphabet="ipa" ph="piː">P</phoneme> <break time="1s"/>
<phoneme alphabet="ipa" ph="aɪ">I</phoneme> <break time="1s"/>
<phoneme alphabet="ipa" ph="iː">E</phoneme> <break time="1s"/>
<phoneme alphabet="ipa" ph="tiː">T</phoneme> <break time="1s"/>
<phoneme alphabet="ipa" ph="iː">E</phoneme> <break time="1s"/>
<phoneme alphabet="ipa" ph="ɑːr">R</phoneme>. <break time="1s"/>
Pieter.
Pros: Absolute control over pronunciation. Cons: Requires IPA knowledge, not all engines support it.
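To avoid retyping transcriptions, the IPA strings can live in one table. The sketch below only includes the transcriptions that appear in this article's examples; it deliberately does not guess the rest of the alphabet, and letters without an entry are returned unchanged. `IPA_NAMES` and `phoneme_tag` are hypothetical names.

```python
# IPA transcriptions taken from the examples in this article (partial table).
IPA_NAMES = {
    "A": "eɪ", "E": "iː", "I": "aɪ", "K": "keɪ",
    "M": "ɛm", "P": "piː", "R": "ɑːr", "T": "tiː",
}


def phoneme_tag(letter: str) -> str:
    """Wrap a letter in a <phoneme> tag if its IPA transcription is known."""
    key = letter.upper()
    ipa = IPA_NAMES.get(key)
    if ipa is None:
        # No transcription on file: leave the letter for the voice to handle.
        return key
    return f'<phoneme alphabet="ipa" ph="{ipa}">{key}</phoneme>'


def spell_with_phonemes(word: str, pause: str = "1s") -> str:
    """Build one <phoneme> line per letter, separated by pauses."""
    tags = [phoneme_tag(ch) for ch in word if ch.isalpha()]
    return f' <break time="{pause}"/>\n'.join(tags)


print(spell_with_phonemes("Pieter"))
```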
| Method | Simplicity | Reliability | Naturalness | When to Use |
|---|---|---|---|---|
| <break> | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Quick results, trusted voice |
| <say-as> | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Standard solution for most tasks |
| Phonetic | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Engines without SSML, or when maximum compatibility is needed |
| NATO | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | IVR, radio, critical clarity |
| <phoneme> | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Full control, custom pronunciation |
Our recommendation: Start with <say-as> — it's the sweet spot between simplicity and reliability. We tested all five methods with various voices in SpeechGen — each one works, so choose based on your task.
The same approach works for any situation where you need to "break up" text into individual elements.
Sometimes a phone number needs to be read digit by digit, not as "eight hundred, five fifty-five."
With <say-as> tag:
Call us at:
<say-as interpret-as="characters">1</say-as> <break time="300ms"/>
<say-as interpret-as="characters">8</say-as> <break time="300ms"/>
<say-as interpret-as="characters">0</say-as> <break time="300ms"/>
<say-as interpret-as="characters">0</say-as> <break time="500ms"/>
<say-as interpret-as="characters">5</say-as> <break time="300ms"/>
<say-as interpret-as="characters">5</say-as> <break time="300ms"/>
<say-as interpret-as="characters">5</say-as> <break time="500ms"/>
<say-as interpret-as="characters">7</say-as> <break time="300ms"/>
<say-as interpret-as="characters">2</say-as> <break time="300ms"/>
<say-as interpret-as="characters">3</say-as> <break time="300ms"/>
<say-as interpret-as="characters">4</say-as>.
Phonetic spelling (works with any voice, including HD):
Call us at:
one <break time="300ms"/>
eight <break time="300ms"/>
zero <break time="300ms"/>
zero <break time="500ms"/>
five <break time="300ms"/>
five <break time="300ms"/>
five <break time="500ms"/>
seven <break time="300ms"/>
two <break time="300ms"/>
three <break time="300ms"/>
four.
Notice that the pauses between digit groups are slightly longer (500ms instead of 300ms), which makes the number easier to follow.
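The short-pause and long-pause pattern can be produced automatically from a number written with hyphens between groups. This is a sketch with assumed names (`speak_phone_number`, `DIGIT_WORDS`); it uses the spelled-out digit variant so it works with any voice, and it treats each hyphen-separated chunk as one group.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_phone_number(number: str, short: str = "300ms", long: str = "500ms") -> str:
    """Read a hyphen-grouped number digit by digit, pausing longer between groups."""
    # Keep only ASCII digits in each group and drop groups that end up empty.
    groups = [[c for c in part if c in "0123456789"] for part in number.split("-")]
    groups = [g for g in groups if g]
    lines = []
    for gi, group in enumerate(groups):
        for di, digit in enumerate(group):
            word = DIGIT_WORDS[int(digit)]
            last_in_group = di == len(group) - 1
            if last_in_group and gi == len(groups) - 1:
                lines.append(f"{word}.")  # final digit ends the sentence
            else:
                pause = long if last_in_group else short
                lines.append(f'{word} <break time="{pause}"/>')
    return "\n".join(lines)


print("Call us at:")
print(speak_phone_number("1800-555-7234"))
```

Writing the input as `1800-555-7234` reproduces the grouping used in the example above; `1-800-555-7234` would add a longer pause after the leading "one".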
Verification codes usually mix letters and digits, and the same techniques work well for them.
With <say-as> tag:
Your verification code is:
<say-as interpret-as="characters">A</say-as> <break time="400ms"/>
<say-as interpret-as="characters">7</say-as> <break time="400ms"/>
<say-as interpret-as="characters">K</say-as> <break time="400ms"/>
<say-as interpret-as="characters">2</say-as> <break time="400ms"/>
<say-as interpret-as="characters">M</say-as> <break time="400ms"/>
<say-as interpret-as="characters">9</say-as>.
IPA transcription (maximum control over pronunciation):
Your verification code is:
<phoneme alphabet="ipa" ph="eɪ">A</phoneme> <break time="400ms"/>
<phoneme alphabet="ipa" ph="ˈsɛvən">7</phoneme> <break time="400ms"/>
<phoneme alphabet="ipa" ph="keɪ">K</phoneme> <break time="400ms"/>
<phoneme alphabet="ipa" ph="tuː">2</phoneme> <break time="400ms"/>
<phoneme alphabet="ipa" ph="ɛm">M</phoneme> <break time="400ms"/>
<phoneme alphabet="ipa" ph="naɪn">9</phoneme>.
Phonetic spelling (universal for all voices):
Your verification code is:
Ay <break time="400ms"/>
seven <break time="400ms"/>
Kay <break time="400ms"/>
two <break time="400ms"/>
Em <break time="400ms"/>
nine.
Pro tip: say the code twice so the user has time to write it down:
Your verification code is:
Ay <break time="400ms"/> seven <break time="400ms"/> Kay <break time="400ms"/>
two <break time="400ms"/> Em <break time="400ms"/> nine.
<break time="800ms"/>
Again: Ay <break time="400ms"/> seven <break time="400ms"/> Kay <break time="400ms"/>
two <break time="400ms"/> Em <break time="400ms"/> nine.
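The repetition is easy to automate once the spoken form of the code exists as a list of words. A minimal sketch, with hypothetical helper names `speak_code` and `say_twice`, and with the 400ms and 800ms pauses copied from the example above:

```python
def speak_code(tokens: list[str], pause: str = "400ms") -> str:
    """Join already-spelled tokens (e.g. "Ay", "seven") with SSML pauses."""
    return f' <break time="{pause}"/> '.join(tokens) + "."


def say_twice(spoken: str, gap: str = "800ms", lead_in: str = "Again:") -> str:
    """Repeat a spoken fragment after a longer pause and a short lead-in word."""
    return f'{spoken}\n<break time="{gap}"/>\n{lead_in} {spoken}'


code = speak_code(["Ay", "seven", "Kay", "two", "Em", "nine"])
print("Your verification code is:")
print(say_twice(code))
```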
An email address is a classic case for dictation:
Send your application to:
mike <break time="300ms"/>
dot <break time="300ms"/>
johnson <break time="300ms"/>
at <break time="300ms"/>
gmail <break time="300ms"/>
dot <break time="300ms"/>
com.
<break time="500ms"/>
That's:
<say-as interpret-as="characters">M</say-as> <break time="300ms"/>
<say-as interpret-as="characters">I</say-as> <break time="300ms"/>
<say-as interpret-as="characters">K</say-as> <break time="300ms"/>
<say-as interpret-as="characters">E</say-as> <break time="500ms"/>
dot <break time="300ms"/>
<say-as interpret-as="characters">J</say-as> <break time="300ms"/>
<say-as interpret-as="characters">O</say-as> <break time="300ms"/>
<say-as interpret-as="characters">H</say-as> <break time="300ms"/>
<say-as interpret-as="characters">N</say-as> <break time="300ms"/>
<say-as interpret-as="characters">S</say-as> <break time="300ms"/>
<say-as interpret-as="characters">O</say-as> <break time="300ms"/>
<say-as interpret-as="characters">N</say-as> <break time="500ms"/>
at gmail dot com.
First pronounce it naturally, then spell it out, so the listener gets two chances to catch the address.
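The "say it, then spell it" pattern can be sketched as a small Python function. The name `speak_email` is hypothetical, and the sketch assumes a simple address whose local part contains only letters, digits, and dots; anything more exotic would need extra handling.

```python
def speak_email(address: str) -> str:
    """Read an email address naturally, then spell out the part before the @."""
    local, _, domain = address.partition("@")
    domain_spoken = domain.replace(".", " dot ")
    natural = f'{local.replace(".", " dot ")} at {domain_spoken}.'

    spelled_words = []
    for word in local.split("."):
        letters = [f'<say-as interpret-as="characters">{ch.upper()}</say-as>'
                   for ch in word]
        spelled_words.append(' <break time="300ms"/> '.join(letters))
    # Longer pause before each "dot" so word boundaries stay audible.
    spelled = ' <break time="500ms"/> dot <break time="300ms"/> '.join(spelled_words)

    return (f"{natural}\n"
            f'<break time="500ms"/>\n'
            f"That's: {spelled} <break time=\"500ms\"/> at {domain_spoken}.")


print("Send your application to:")
print(speak_email("mike.johnson@gmail.com"))
```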
Some abbreviations are read as words (NASA, NATO), others letter by letter (FBI, CEO). TTS doesn't always guess correctly.
FBI — letter by letter with <say-as>:
The <say-as interpret-as="characters">FBI</say-as> investigation revealed new evidence.
FBI — phonetic spelling (for HD voices):
The Eff Bee Eye investigation revealed new evidence.
NASA — as a word (usually TTS handles this, but if not):
The <phoneme alphabet="ipa" ph="ˈnæsə">NASA</phoneme> mission launched successfully.
Tricky case — URL:
Visit our website at:
<say-as interpret-as="characters">AWS</say-as>
dot amazon dot com.
Or phonetically:
Visit our website at:
Ay Double-You Ess
dot amazon dot com.
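For longer scripts, tagging every abbreviation by hand gets tedious. One possible approach, sketched below, is to keep a list of abbreviations that should be spelled out and wrap only those. The set `SPELLED_OUT` and the function `mark_initialisms` are hypothetical; the list must be maintained by you, because nothing in the text itself tells FBI apart from NASA.

```python
import re

# Abbreviations that should be read letter by letter (maintained by hand).
SPELLED_OUT = {"FBI", "CEO", "AWS"}


def mark_initialisms(text: str, spelled: frozenset = frozenset(SPELLED_OUT)) -> str:
    """Wrap listed all-caps abbreviations in <say-as interpret-as="characters">."""
    def wrap(match: re.Match) -> str:
        word = match.group(0)
        if word in spelled:
            return f'<say-as interpret-as="characters">{word}</say-as>'
        return word  # e.g. NASA stays as-is and is read as a word

    # Match whole words made of two or more capital letters.
    return re.sub(r"\b[A-Z]{2,}\b", wrap, text)


print(mark_initialisms("The FBI investigation revealed new evidence."))
print(mark_initialisms("The NASA mission launched successfully."))
```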
Spelling in voiceover is needed more often than you'd think:
IVR and voice assistants — verification codes, order numbers, product SKUs. When a bot says "your tracking number," the customer needs to hear each letter and digit separately.
E-learning and language courses — spelling practice, dictations, spelling bees. Students need to hear clear pronunciation of each letter.
Podcasts and videos — when you need to dictate an email, unusual name, or website URL. "Write to us at info@..." — and then spell it out.
Audiobooks and documentation — abbreviations that are read letter by letter (FBI, CEO), serial numbers in technical texts.
SSML gives you full control over how TTS pronounces text. For spelling, you have at least 5 tools — from simple pauses to precise IPA transcription.
The golden rule: listen to the result. Different voices and engines behave differently. What sounds perfect with one voice may need adjustments for another.
Try these methods in the SpeechGen editor — paste the code, choose a voice, and hear the result instantly.
Have questions about SSML or need help with a specific case? Reach out to support — we're happy to help.