SSML Markup Capabilities for Speech Synthesis
05-09-2023, 11-10-2023
SSML (Speech Synthesis Markup Language) is an XML-based markup language. It is used to annotate text that a speech synthesizer, such as a neural network voice, converts into speech.
- What is its purpose? With SSML, you can control tone, accent, and pronunciation, and add pauses and other audio effects.
- Usage goals: The main goal is to make synthesized speech sound natural and expressive. SSML also helps ensure accurate pronunciation of numbers, dates, phone numbers, and other specific information.
- Who created it? SSML was developed by the World Wide Web Consortium (W3C). This organization sets web standards.
- What is its mission? SSML aims to provide a standard way to control synthesized speech, such as pronunciation, volume, pitch, and rate, across different speech synthesis platforms.
The SSML specification is available on the official W3C website: https://www.w3.org/TR/speech-synthesis/
Basic Rules for Writing SSML Tags
- SSML tags are enclosed in angle brackets, as in HTML. Example: <speak>text</speak>.
- Most tags come as an opening and closing pair. Empty tags such as <break> are written as a single self-closing tag: <break/>.
- Within tags, you can use attributes to adjust pronunciation settings.
- Some tags can be nested within others.
- SSML tag and attribute syntax follows XML standards.
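Because SSML follows XML syntax, the rules above can be checked mechanically before text is sent for synthesis. The following is a minimal sketch in Python using the standard library XML parser. The function name is_well_formed and the sample strings are illustrative assumptions, and the check covers XML well-formedness only, not whether a particular voice supports a given tag.

```python
import xml.etree.ElementTree as ET


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


# Nested tags with attributes; <break/> is written as a self-closing tag.
good = '<speak>Hello <break time="1s"/> <emphasis level="strong">world</emphasis></speak>'

# The <emphasis> tag is never closed, so the XML parser rejects this string.
bad = '<speak>Hello <emphasis level="strong">world</speak>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

A check like this catches unclosed tags and unescaped characters such as a bare & early, before they reach the synthesizer.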
Supported Tags
SpeechGen supports the most common SSML tags. Some voices may ignore certain tags or attributes. Specific details are given in the documentation for each parameter.
Below is a list of the main tags. Each tag has its own detailed documentation.
Break
Break – Pause. This is the most popular tag on SpeechGen. It lets you control the duration of a pause.
<break time="2s"/>
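When pauses are inserted programmatically, a small helper can keep the tag syntax consistent. The sketch below is one possible approach in Python. The helper name break_tag is hypothetical, and it assumes the target voice accepts the time value in milliseconds (the "ms" unit defined in the W3C specification) as well as in seconds.

```python
def break_tag(seconds: float) -> str:
    """Build a self-closing <break/> tag for a pause of the given length."""
    if seconds < 0:
        raise ValueError("pause length must not be negative")
    # Whole milliseconds avoid awkward decimal formatting for short pauses.
    millis = round(seconds * 1000)
    return f'<break time="{millis}ms"/>'


# Insert a two-second pause between two sentences.
text = f"First sentence.{break_tag(2)} Second sentence after a pause."
print(text)
```

If a voice only accepts seconds, the same helper could emit the "s" unit instead, as in the <break time="2s"/> form shown above.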
Say-as
say-as is the main SSML tag and has many settings. It controls how different types of information are pronounced.
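The say-as variants in this section differ only in the interpret-as value and a few optional attributes, so they can be generated by one function. The following Python sketch shows the idea. The helper name say_as is hypothetical and is not part of SpeechGen or the W3C specification. It escapes the text so that characters such as & do not break the XML.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str, **attrs: str) -> str:
    """Wrap text in a <say-as> tag with the given interpret-as value.

    Extra keyword arguments (for example format or detail) become
    additional attributes, in the order they were passed.
    """
    parts = [f"interpret-as={quoteattr(interpret_as)}"]
    parts += [f"{name}={quoteattr(value)}" for name, value in attrs.items()]
    return f"<say-as {' '.join(parts)}>{escape(text)}</say-as>"


print(say_as("Ashlee", "spell-out"))
print(say_as("1969.07.21", "date", format="ymd", detail="1"))
```

The two calls reproduce the spell-out and date snippets used on this page.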
Spell-out – Spell the text letter by letter
<say-as interpret-as="spell-out">Ashlee</say-as>
Cardinal
<say-as interpret-as="cardinal">123456789</say-as>
Ordinal
<say-as interpret-as="ordinal">3</say-as>
Fraction
<say-as interpret-as="fraction">3 1/2</say-as>
Date
<say-as interpret-as="date" format="ymd" detail="1">1969.07.21</say-as>
Time
<say-as interpret-as="time" format="hms12">2:30</say-as>
Telephone
My number is <say-as interpret-as="telephone">8883451715</say-as>
Currency
<say-as interpret-as="currency">79.4 USD</say-as>
Alias
<sub alias="Doctor">Dr.</sub> Smith.
Prosody
<prosody pitch="x-low">I'm speaking this text with an x-low constant pitch</prosody>
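The pitch attribute takes a fixed set of named levels in the W3C specification: x-low, low, medium, high, x-high, and default. A typo in the label is easy to miss, so the Python sketch below validates it before building the tag. The helper name prosody_pitch is hypothetical. The sketch handles only the named labels, not the frequency or relative values that the specification also allows, and it does not check which levels a particular voice actually honors.

```python
from xml.sax.saxutils import escape

# Named pitch levels defined by the W3C SSML specification.
PITCH_LABELS = {"x-low", "low", "medium", "high", "x-high", "default"}


def prosody_pitch(text: str, pitch: str) -> str:
    """Wrap text in a <prosody> tag after validating the pitch label."""
    if pitch not in PITCH_LABELS:
        raise ValueError(f"unknown pitch label: {pitch!r}")
    return f'<prosody pitch="{pitch}">{escape(text)}</prosody>'


print(prosody_pitch("I'm speaking this text with an x-low constant pitch", "x-low"))
```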
Emphasis
<emphasis level="strong">And today the weather is sunny.</emphasis>
Phoneme
<phoneme alphabet="ipa" ph="haɪˈpɜːrbəli">Hyperbole</phoneme>