Using SSML's <emphasis> Tag: A Guide

05-09-2023

This guide elaborates on the SSML emphasis tag, illustrating its utility in adding stress or prominence to specific words or phrases in text-to-speech synthesis. By understanding its distinction from the prosody tag and tackling potential issues, users can effectively leverage the emphasis tag for more dynamic speech outputs.

Yesterday the weather was rainy. <emphasis level="strong">And today the weather is sunny.</emphasis>




Apply the <emphasis> tag to entire sentences where possible. Wrapping an individual word or phrase inside a sentence can introduce unwanted pauses in the spoken output. You can still emphasize a single word, but you may then need to remove the extra pauses with audio editing software. If you are not comfortable with audio editing, avoid using the emphasis tag on isolated words or phrases within a sentence.
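To make the difference concrete, here are the two placements side by side, reusing the weather sentence from the top of this guide. The first line wraps the whole sentence, which is the form recommended above. The second wraps a single word, which may produce an audible pause around "sunny"; how noticeable that is will likely depend on the selected voice, so treat this as an illustration rather than a guaranteed result.

```xml
<emphasis level="strong">And today the weather is sunny.</emphasis>

And today the weather is <emphasis level="strong">sunny</emphasis>.
```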

How it works

When you want to stress a sentence in text-to-speech synthesis, use the <emphasis> tag. It changes both the loudness and the speed of the speech. Raising the emphasis prompts SpeechGen to speak the content more loudly and more slowly, while lowering it produces faster, softer speech. The level attribute sets how much emphasis is applied.

The possible values for the level attribute are:

  • strong: Raises the volume and slows the pace, making the speech noticeably louder and more drawn-out.

  • moderate: Also raises the volume and slows the pace, but less than strong. This is the default setting.

  • reduced: Lowers the volume and speeds up the pace, producing softer, quicker speech.

  • none: Has no effect.
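As a rough illustration of the four values above, the snippet below applies each level to a whole sentence. The sentences are invented for this example, and the attribute values are written in lowercase, as the SSML specification defines them. How strongly each level changes the result is an assumption that depends on the voice you select.

```xml
<emphasis level="strong">This offer ends tonight.</emphasis>
<emphasis level="moderate">Please read the instructions first.</emphasis>
<emphasis level="reduced">Terms and conditions apply.</emphasis>
<emphasis level="none">This sentence is read without added stress.</emphasis>
```

Listening to the four lines one after another is a quick way to check whether a given voice responds to the level attribute at all.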

Differences from Prosody

Both the <emphasis> and <prosody> tags modify speech output, but they serve different purposes:

  1. Emphasis: The emphasis tag adds stress or prominence to the enclosed text. You can specify the level of emphasis.
  2. Prosody: The prosody tag adjusts pitch, rate, and volume. It provides more granular control over speech attributes.
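The contrast between the two tags can be sketched with the same sentence marked up both ways. In the first line, the level attribute is the only control and the synthesizer decides how to stress the text. In the second, rate and volume are set explicitly. The values slow and loud come from the W3C SSML specification; whether a particular SpeechGen voice honors them is an assumption you should verify by listening.

```xml
<emphasis level="strong">And today the weather is sunny.</emphasis>

<prosody rate="slow" volume="loud">And today the weather is sunny.</prosody>
```

If you only need a sentence to stand out, the emphasis form is shorter. If you need to tune speed or loudness independently, the prosody form gives you separate attributes for each.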



