How to Spell Out Text in TTS: 5 Ways to Make a Voice Read Letter by Letter

24-12-2025

Imagine this: you're building a voice menu for a bank. A customer calls, and the system needs to read out a verification code: "Your code is A-7-K-2-M." You type the text, hit "generate"… and hear something like "Your code is aksm." The TTS read the letters as a word.

Or another scenario: you're creating voiceover for a language course where the teacher spells out a name: "My name is Pieter. P-I-E-T-E-R." But the synthesizer stubbornly pronounces "Pieter" as a whole word, ignoring the hyphens.

Sound familiar? Then this article is for you. We'll break down 5 ways to make TTS spell out text letter by letter — from simple to advanced. All methods work, but each one shines in different situations.

The Core Case: Spelling PIETER

Let's start with a real task. One of our SpeechGen users asked support: how do I voice a phrase so that the name is spelled out with a 1-second pause between letters?

"The name is Pieter. P-I-E-T-E-R. Pieter."

Here are 5 ways to do it.

Method 1: Pauses Only (<break>)

The simplest option — just add pauses between letters:

The name is Pieter. 
  P <break time="1s"/> 
  I <break time="1s"/> 
  E <break time="1s"/> 
  T <break time="1s"/> 
  E <break time="1s"/> 
  R. <break time="1s"/> 
  Pieter.
Voice: AndrewPRO

Pros: Simple; works with any voice that accepts the <break> tag. Cons: Some letters may be pronounced differently than you expect, for example "I" as "ih" instead of "eye," "E" as "eh" instead of "ee," or "C" as "k" instead of "see." The result depends on the voice and language.
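If you generate prompts in code rather than by hand, the pause-only markup can be built with a few lines of Python. This is a minimal sketch: the helper name `spell_with_breaks` is our own invention for illustration, not part of SpeechGen or any SSML library.

```python
def spell_with_breaks(word: str, pause: str = "1s") -> str:
    """Return the letters of `word` separated by SSML <break> tags."""
    # Keep only letters and digits, so hyphens or spaces in the input are skipped.
    letters = [ch.upper() for ch in word if ch.isalnum()]
    separator = f' <break time="{pause}"/> '
    return separator.join(letters)


spelled = spell_with_breaks("Pieter")
print(f'The name is Pieter. {spelled}. <break time="1s"/> Pieter.')
```

The pause length is a parameter, so the same helper covers the 1-second case above and shorter pauses for codes or phone numbers.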

Method 2: Pauses + <say-as> Tag

Add an explicit instruction: "this is a letter, pronounce it as a letter":

The name is Pieter. 
  <say-as interpret-as="characters">P</say-as> <break time="1s"/>
  <say-as interpret-as="characters">I</say-as> <break time="1s"/>
  <say-as interpret-as="characters">E</say-as> <break time="1s"/>
  <say-as interpret-as="characters">T</say-as> <break time="1s"/>
  <say-as interpret-as="characters">E</say-as> <break time="1s"/>
  <say-as interpret-as="characters">R</say-as>. <break time="1s"/> 
  Pieter.
Voice: AndrewPRO

Pros: More reliable letter pronunciation, standard SSML. Cons: More code to write. Doesn't work with HD voices — they don't support this tag.
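Since the main drawback here is the amount of markup, it can help to generate it. The sketch below wraps each character in a <say-as> tag and joins them with pauses. The function name `spell_with_say_as` is hypothetical, and the XML escaping is our own precaution for characters such as "&", not something the method above requires.

```python
from xml.sax.saxutils import escape


def spell_with_say_as(text: str, pause: str = "1s") -> str:
    """Wrap every non-space character in <say-as interpret-as="characters">."""
    parts = [
        # escape() turns &, < and > into entities so the SSML stays well-formed.
        f'<say-as interpret-as="characters">{escape(ch)}</say-as>'
        for ch in text
        if not ch.isspace()
    ]
    # One tag per line, each followed by a pause (except the last).
    return f' <break time="{pause}"/>\n'.join(parts)


print(spell_with_say_as("PIETER"))
```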

Method 3: Phonetic Spelling

Write out the letters as they sound in the English alphabet:

The name is Pieter. 
  Pee <break time="1s"/>
  Eye <break time="1s"/>
  Ee <break time="1s"/>
  Tee <break time="1s"/>
  Ee <break time="1s"/>
  Are. <break time="1s"/>
  Pieter.
Voice: AndrewPRO

Pros: The spelled-out letter names work on any engine, even without SSML support (only the <break> pauses need SSML). Cons: You need to know how letter names are spelled, and the spellings don't carry over to other languages.
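A lookup table keeps the letter-name spellings in one place. The sketch below only contains the spellings that appear in this article, so treat the table as a starting point to extend; the names `LETTER_NAMES` and `to_letter_names` are ours. It joins the names with commas on the assumption that an engine without SSML will still pause slightly at punctuation, which you should verify by listening.

```python
# Letter names exactly as spelled in this article; extend the table for other letters.
LETTER_NAMES = {
    "A": "Ay", "B": "Bee", "E": "Ee", "F": "Eff", "I": "Eye", "K": "Kay",
    "M": "Em", "P": "Pee", "R": "Are", "S": "Ess", "T": "Tee", "W": "Double-You",
}


def to_letter_names(word: str, separator: str = ", ") -> str:
    """Replace each letter with its spoken name; unknown characters pass through."""
    names = [LETTER_NAMES.get(ch.upper(), ch) for ch in word if not ch.isspace()]
    return separator.join(names)


print(to_letter_names("Pieter"))
```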

Method 4: NATO Alphabet

The radio alphabet — each letter is replaced with a code word:

The name is Pieter. 
  Papa <break time="1s"/>
  India <break time="1s"/>
  Echo <break time="1s"/>
  Tango <break time="1s"/>
  Echo <break time="1s"/>
  Romeo. <break time="1s"/>
  Pieter.
Voice: AndrewPRO

Pros: Maximum clarity, perfect for noisy environments or poor connections. Cons: Sounds like radio communication, so it doesn't suit every context.
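The code words are fixed, so the conversion is easy to automate. Here is a small sketch with a hypothetical helper `to_nato`. It uses the official spellings "Alfa" and "Juliett"; if a voice reads those oddly, swapping in "Alpha" and "Juliet" is a reasonable thing to try.

```python
NATO_WORDS = (
    "Alfa Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett Kilo Lima Mike "
    "November Oscar Papa Quebec Romeo Sierra Tango Uniform Victor Whiskey X-ray "
    "Yankee Zulu"
).split()

# Map each letter to its code word, keyed by the word's first letter.
NATO = {word[0]: word for word in NATO_WORDS}


def to_nato(word: str, pause: str = "1s") -> str:
    """Replace letters with NATO code words; digits and symbols pass through."""
    words = [NATO.get(ch.upper(), ch) for ch in word if not ch.isspace()]
    return f' <break time="{pause}"/> '.join(words)


print(to_nato("Pieter"))
```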

Method 5: IPA Transcription (<phoneme>)

For maximum control — specify exact pronunciation using the International Phonetic Alphabet:

The name is Pieter. 
  <phoneme alphabet="ipa" ph="piː">P</phoneme> <break time="1s"/>
  <phoneme alphabet="ipa" ph="aɪ">I</phoneme> <break time="1s"/>
  <phoneme alphabet="ipa" ph="iː">E</phoneme> <break time="1s"/>
  <phoneme alphabet="ipa" ph="tiː">T</phoneme> <break time="1s"/>
  <phoneme alphabet="ipa" ph="iː">E</phoneme> <break time="1s"/>
  <phoneme alphabet="ipa" ph="ɑːr">R</phoneme>. <break time="1s"/>
  Pieter.
Voice: AndrewPRO

Pros: Absolute control over pronunciation. Cons: Requires IPA knowledge, not all engines support it.
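If you don't want to type IPA by hand every time, you can keep the transcriptions in a table and generate the tags. The sketch below contains only the transcriptions used in this article; `LETTER_IPA` and `spell_with_phonemes` are illustrative names, and characters missing from the table are simply emitted as plain text.

```python
from xml.sax.saxutils import escape

# IPA for letter names, taken from the examples in this article only.
LETTER_IPA = {
    "A": "eɪ", "E": "iː", "I": "aɪ", "K": "keɪ",
    "M": "ɛm", "P": "piː", "R": "ɑːr", "T": "tiː",
}


def spell_with_phonemes(word: str, pause: str = "1s") -> str:
    """Wrap each known letter in a <phoneme> tag with its IPA transcription."""
    parts = []
    for ch in word.upper():
        if ch.isspace():
            continue
        ipa = LETTER_IPA.get(ch)
        if ipa is None:
            # No transcription on file: fall back to the escaped character.
            parts.append(escape(ch))
        else:
            parts.append(f'<phoneme alphabet="ipa" ph="{ipa}">{ch}</phoneme>')
    return f' <break time="{pause}"/> '.join(parts)


print(spell_with_phonemes("Pieter"))
```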

Which Method Should You Choose?

| Method | Simplicity | Reliability | Naturalness | When to Use |
|---|---|---|---|---|
| <break> | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Quick results, trusted voice |
| <say-as> | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Standard solution for most tasks |
| Phonetic | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Engine without SSML or need compatibility |
| NATO | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | IVR, radio, critical clarity |
| <phoneme> | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Full control, custom pronunciation |

Our recommendation: Start with <say-as>. It's the sweet spot between simplicity and reliability. We tested all five methods with various voices in SpeechGen, and each one works (with the exception that HD voices don't support <say-as>), so choose based on your task.

Other Use Cases

The same approach works for any situation where you need to "break up" text into individual elements.

Phone Number

Sometimes you need the number to sound digit by digit, not "eight hundred five fifty-five."

With <say-as> tag:

Call us at: 
  <say-as interpret-as="characters">1</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">8</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">0</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">0</say-as> <break time="500ms"/>
  <say-as interpret-as="characters">5</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">5</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">5</say-as> <break time="500ms"/>
  <say-as interpret-as="characters">7</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">2</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">3</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">4</say-as>.
Voice: AndrewPRO

Phonetic spelling (works with any voice, including HD):

Call us at: 
  one <break time="300ms"/>
  eight <break time="300ms"/>
  zero <break time="300ms"/>
  zero <break time="500ms"/>
  five <break time="300ms"/>
  five <break time="300ms"/>
  five <break time="500ms"/>
  seven <break time="300ms"/>
  two <break time="300ms"/>
  three <break time="300ms"/>
  four.
Voice: Achernar USHD

Notice that the pauses after the 800 and 555 groups are slightly longer (500ms instead of 300ms), which makes the number easier to follow.
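The grouping idea can be automated: short pauses inside a group, longer ones between groups. The sketch below assumes the number arrives with separators such as hyphens or spaces (for example "1-800-555-7234"); `phone_to_ssml` is a hypothetical helper name. Unlike the hand-written example, it also puts the longer pause after the leading "1", because it treats every separated chunk as a group.

```python
import re

DIGIT_WORDS = "zero one two three four five six seven eight nine".split()


def phone_to_ssml(number: str, short: str = "300ms", long: str = "500ms") -> str:
    """Read a phone number digit by digit, pausing longer between digit groups."""
    # Every run of digits is one group; separators such as "-" or " " are dropped.
    groups = re.findall(r"[0-9]+", number)
    spoken_groups = [
        f' <break time="{short}"/> '.join(DIGIT_WORDS[int(d)] for d in group)
        for group in groups
    ]
    return f' <break time="{long}"/> '.join(spoken_groups)


print("Call us at: " + phone_to_ssml("1-800-555-7234") + ".")
```

Because it writes the digits as words, the output doesn't depend on <say-as> and should also suit voices that don't support that tag.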

Verification Code

Verification codes usually mix letters and digits, and the same techniques apply to both.

With <say-as> tag:

Your verification code is: 
  <say-as interpret-as="characters">A</say-as> <break time="400ms"/>
  <say-as interpret-as="characters">7</say-as> <break time="400ms"/>
  <say-as interpret-as="characters">K</say-as> <break time="400ms"/>
  <say-as interpret-as="characters">2</say-as> <break time="400ms"/>
  <say-as interpret-as="characters">M</say-as> <break time="400ms"/>
  <say-as interpret-as="characters">9</say-as>.
Voice: AndrewPRO

IPA transcription (maximum control over pronunciation):

Your verification code is: 
  <phoneme alphabet="ipa" ph="eɪ">A</phoneme> <break time="400ms"/>
  <phoneme alphabet="ipa" ph="ˈsɛvən">7</phoneme> <break time="400ms"/>
  <phoneme alphabet="ipa" ph="keɪ">K</phoneme> <break time="400ms"/>
  <phoneme alphabet="ipa" ph="tuː">2</phoneme> <break time="400ms"/>
  <phoneme alphabet="ipa" ph="ɛm">M</phoneme> <break time="400ms"/>
  <phoneme alphabet="ipa" ph="naɪn">9</phoneme>.
Voice: AndrewPRO

Phonetic spelling (universal for all voices):

Your verification code is: 
  Ay <break time="400ms"/>
  seven <break time="400ms"/>
  Kay <break time="400ms"/>
  two <break time="400ms"/>
  Em <break time="400ms"/>
  nine.
Voice: AndrewPRO

Pro tip — repeat the code twice so the user has time to write it down:

Your verification code is: 
  Ay <break time="400ms"/> seven <break time="400ms"/> Kay <break time="400ms"/> 
  two <break time="400ms"/> Em <break time="400ms"/> nine.
  <break time="800ms"/>
  Again: Ay <break time="400ms"/> seven <break time="400ms"/> Kay <break time="400ms"/> 
  two <break time="400ms"/> Em <break time="400ms"/> nine.
Voice: AndrewPRO
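In an IVR the code changes on every call, so the prompt is normally built at runtime. Here is a rough sketch of the "say it twice" pattern using <say-as>; the function names are made up for this example, and the input check (letters and digits only) is an assumption about what your codes look like.

```python
def spell_code(code: str, pause: str = "400ms") -> str:
    """Spell a code character by character with <say-as> and short pauses."""
    tags = [f'<say-as interpret-as="characters">{ch}</say-as>' for ch in code]
    return f' <break time="{pause}"/> '.join(tags)


def verification_prompt(code: str, repeat_gap: str = "800ms") -> str:
    """Build a prompt that reads the code, pauses, then reads it again."""
    if not code.isalnum():
        # Assumption: codes contain only letters and digits.
        raise ValueError("code must contain only letters and digits")
    spelled = spell_code(code.upper())
    return (
        f"Your verification code is: {spelled}. "
        f'<break time="{repeat_gap}"/> '
        f"Again: {spelled}."
    )


print(verification_prompt("A7K2M9"))
```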

Email Address

Email is a classic case where you need to dictate:

Send your application to: 
  mike <break time="300ms"/>
  dot <break time="300ms"/>
  johnson <break time="300ms"/>
  at <break time="300ms"/>
  gmail <break time="300ms"/>
  dot <break time="300ms"/>
  com.
  <break time="500ms"/>
  That's: 
  <say-as interpret-as="characters">M</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">I</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">K</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">E</say-as> <break time="500ms"/>
  dot <break time="300ms"/>
  <say-as interpret-as="characters">J</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">O</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">H</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">N</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">S</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">O</say-as> <break time="300ms"/>
  <say-as interpret-as="characters">N</say-as> <break time="500ms"/>
  at gmail dot com.
Voice: AndrewPRO

First pronounce it naturally, then spell it out — double insurance.
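The "say it, then spell it" pattern is mechanical enough to script. The sketch below reads the address naturally, then spells the part before the @ and repeats the domain as words. Both helper names are hypothetical, and the code assumes a simple address whose local part contains only letters, digits, and dots.

```python
def speak_symbols(text: str) -> str:
    """Replace '.' and '@' with the words a person would say."""
    return text.replace(".", " dot ").replace("@", " at ")


def email_to_ssml(address: str, pause: str = "300ms") -> str:
    """Read an email address naturally, then spell out the part before the @."""
    if "@" not in address:
        raise ValueError("expected an address containing '@'")
    local, _, domain = address.partition("@")
    brk = f' <break time="{pause}"/> '
    pieces = []
    for ch in local:
        if ch == ".":
            pieces.append("dot")
        else:
            pieces.append(f'<say-as interpret-as="characters">{ch.upper()}</say-as>')
    spelled_local = brk.join(pieces)
    natural = speak_symbols(address)
    long_break = '<break time="500ms"/>'
    return f"{natural}. {long_break} That's: {spelled_local}{brk}at {speak_symbols(domain)}."


print("Send your application to: " + email_to_ssml("mike.johnson@gmail.com"))
```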

Abbreviations: Word or Letters?

Some abbreviations are read as words (NASA, NATO), others letter by letter (FBI, CEO). TTS doesn't always guess correctly.

FBI — letter by letter with <say-as>:

The <say-as interpret-as="characters">FBI</say-as> investigation revealed new evidence.
Voice: AndrewPRO

FBI — phonetic spelling (for HD voices):

The Eff Bee Eye investigation revealed new evidence.
Voice: Achernar USHD

NASA — as a word (usually TTS handles this, but if not):

The <phoneme alphabet="ipa" ph="ˈnæsə">NASA</phoneme> mission launched successfully.
Voice: AndrewPRO

Tricky case — URL:

Visit our website at: 
  <say-as interpret-as="characters">AWS</say-as> 
  dot amazon dot com.
Voice: AndrewPRO

Or phonetically:

Visit our website at: 
  Ay Double-You Ess 
  dot amazon dot com.
Voice: AndrewPRO
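For longer scripts, tagging every abbreviation by hand gets tedious. One possible approach is to keep a list of abbreviations that should be read letter by letter and wrap them automatically. The list and the function name `mark_initialisms` below are placeholders you would adapt to your own texts; anything not on the list, such as NASA, is left for the voice to read as a word.

```python
import re


def mark_initialisms(text: str, spelled=("FBI", "CEO", "AWS")) -> str:
    """Wrap the listed abbreviations in <say-as interpret-as="characters">."""
    # \b keeps the match to whole words, so "AWS" is not matched inside "LAWS".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, spelled)) + r")\b")
    return pattern.sub(r'<say-as interpret-as="characters">\1</say-as>', text)


print(mark_initialisms("The FBI investigation revealed new evidence."))
```

The match is case-sensitive on purpose, so an ordinary lowercase word that happens to share the same letters is not spelled out.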

Who Will Find This Useful

Spelling in voiceover is needed more often than you'd think:

IVR and voice assistants — verification codes, order numbers, product SKUs. When a bot says "your tracking number," the customer needs to hear each letter and digit separately.

E-learning and language courses — spelling practice, dictations, spelling bees. Students need to hear clear pronunciation of each letter.

Podcasts and videos — when you need to dictate an email, unusual name, or website URL. "Write to us at info@..." — and then spell it out.

Audiobooks and documentation — abbreviations that are read letter by letter (FBI, CEO), serial numbers in technical texts.

Conclusion

SSML gives you full control over how TTS pronounces text. For spelling, you have at least 5 tools — from simple pauses to precise IPA transcription.

The golden rule: listen to the result. Different voices and engines behave differently. What sounds perfect with one voice may need adjustments for another.

Try these methods in the SpeechGen editor — paste the code, choose a voice, and hear the result instantly.


Have questions about SSML or need help with a specific case? Reach out to support — we're happy to help.
