SSML Markup Capabilities for Speech Synthesis

, 11-10-2023

SSML (Speech Synthesis Markup Language) is an XML-based markup language for annotating text that a speech synthesizer, such as a neural network voice, converts into speech.

  • What is it for? SSML lets you control tone, emphasis, and pronunciation, and add pauses and other audio effects.
  • Usage goals: The main goal is synthesized speech that sounds natural and expressive. SSML also ensures accurate pronunciation of numbers, dates, phone numbers, and other specific information.
  • Who created it? SSML was developed by the World Wide Web Consortium (W3C), the organization that sets web standards.
  • What is its mission? SSML aims to standardize and improve how speech synthesis is controlled on the web and in other digital applications.

The full SSML specification is available on the official W3C website: https://www.w3.org/TR/speech-synthesis/

Basic Rules for Writing SSML Tags

  • SSML tags are enclosed in angle brackets, as in HTML. Example: <speak>text</speak>.
  • Each opening tag needs a matching closing tag. Empty tags such as <break/> are the exception: they close themselves with />.
  • Attributes inside a tag adjust pronunciation settings, for example time="2s" in <break time="2s"/>.
  • Some tags can be nested within others.
  • SSML tag and attribute syntax follows XML rules, so characters such as & and < in the text must be escaped as &amp; and &lt;.
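
Because SSML follows XML rules, text that comes from user input has to be escaped before it is wrapped in tags. The Python sketch below shows one way to do this with the standard library. The helper names to_ssml and is_well_formed are hypothetical, and the check only confirms that the markup parses as XML, not that a given voice supports every tag in it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical helpers for illustration; not part of any SpeechGen API.


def to_ssml(plain_text: str) -> str:
    """Wrap plain text in a <speak> root, escaping &, < and > first."""
    return f"<speak>{escape(plain_text)}</speak>"


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML (SSML semantics are not checked)."""
    try:
        ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return True


doc = to_ssml("Fish & chips cost < $10")
print(doc)  # <speak>Fish &amp; chips cost &lt; $10</speak>
print(is_well_formed(doc))  # True
print(is_well_formed("<speak>Fish & chips</speak>"))  # False: bare "&"
```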

Supported Tags

SpeechGen supports the most common SSML tags. Some voices may ignore certain tag attributes; see the documentation for each parameter for details.

Below is a list of the main tags, each with a markup example.

Break

Break (pause) is the most popular tag on SpeechGen. It inserts a pause, and its time attribute sets the pause duration.

<break time="2s"/>
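
A minimal Python sketch for generating break tags, assuming pause lengths are given in milliseconds and converted to the s or ms units that the time attribute accepts. The functions break_tag and join_with_pauses are hypothetical helpers, not part of any SpeechGen API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical helpers for illustration; not part of any SpeechGen API.


def break_tag(milliseconds: int) -> str:
    """Build a self-closing <break/> tag; whole seconds use "s", others "ms"."""
    if milliseconds < 0:
        raise ValueError("pause length must be non-negative")
    if milliseconds % 1000 == 0:
        time_value = f"{milliseconds // 1000}s"
    else:
        time_value = f"{milliseconds}ms"
    return f'<break time="{time_value}"/>'


def join_with_pauses(sentences: list[str], pause_ms: int) -> str:
    """Escape each sentence, join them with the same pause, wrap in <speak>."""
    body = break_tag(pause_ms).join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"


print(break_tag(2000))  # <break time="2s"/>
print(break_tag(750))  # <break time="750ms"/>
ssml = join_with_pauses(["Step one.", "Step two."], 1500)
print(ssml)  # <speak>Step one.<break time="1500ms"/>Step two.</speak>
ET.fromstring(ssml)  # parses without error, so the markup is well-formed
```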

Say-as

say-as is the primary SSML tag and has the most settings. Its interpret-as attribute tells the voice how to pronounce a piece of information: letter by letter, as a number, a date, a phone number, and so on.
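
Since the say-as variants differ only in their attributes and content, one generic builder can produce all of them. This Python sketch uses xml.etree.ElementTree, which escapes text and attribute values automatically. The say_as function is a hypothetical helper, and which interpret-as values and extra attributes are honored depends on the voice.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper for illustration; not part of any SpeechGen API.


def say_as(text: str, interpret_as: str, **attributes: str) -> str:
    """Serialize a <say-as> element; extra keyword arguments become attributes."""
    element = ET.Element("say-as", {"interpret-as": interpret_as, **attributes})
    element.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(element, encoding="unicode")


print(say_as("123456789", "cardinal"))
# <say-as interpret-as="cardinal">123456789</say-as>
print(say_as("1969.07.21", "date", format="ymd", detail="1"))
# <say-as interpret-as="date" format="ymd" detail="1">1969.07.21</say-as>
```

Attribute order in the output follows insertion order, which ElementTree preserves in Python 3.8 and later.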

Spell-out – Spell the text letter by letter

<say-as interpret-as="spell-out">Ashlee</say-as>
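
Spell-out is handy for acronyms. The sketch below assumes, as a hypothetical rule, that any standalone word of two or more capital letters should be spelled out. Real texts will need exceptions, because some acronyms (NASA, for example) are normally pronounced as words.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule for illustration: a standalone word of 2+ capitals is an acronym.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def spell_out_acronyms(text: str) -> str:
    """Escape the text, then wrap each all-caps word in a spell-out say-as tag."""
    return ACRONYM.sub(
        lambda m: f'<say-as interpret-as="spell-out">{m.group(0)}</say-as>',
        escape(text),
    )


print(spell_out_acronyms("The FBI and CIA issued a statement."))
```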

Cardinal

<say-as interpret-as="cardinal">123456789</say-as>
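
Numbers in real text often contain thousands separators. A minimal sketch, assuming the voice expects plain digits as in the cardinal example: it wraps each integer in a cardinal say-as tag and removes the commas. Decimals such as 3.14 are not handled; the pattern would split them into two integers. mark_cardinals is a hypothetical helper.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical helper for illustration; not part of any SpeechGen API.
# Integers, optionally with comma thousands separators (1,250,000 or 42).
INTEGER = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b")


def mark_cardinals(text: str) -> str:
    """Wrap each integer in a cardinal say-as tag, removing comma separators."""

    def wrap(match: re.Match) -> str:
        digits = match.group(0).replace(",", "")
        return f'<say-as interpret-as="cardinal">{digits}</say-as>'

    return INTEGER.sub(wrap, escape(text))


print(mark_cardinals("The city has 1,250,000 residents and 12 parks."))
```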

Ordinal

<say-as interpret-as="ordinal">3</say-as>
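
Texts usually write ordinals as 1st, 2nd, or 3rd. This sketch rewrites such forms into the digits-only ordinal markup used in the example. It assumes English suffixes and does not check that the suffix matches the number; mark_ordinals is a hypothetical helper.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical helper for illustration; not part of any SpeechGen API.
# Matches English ordinal forms such as 1st, 2nd, 3rd, 21st, 100th.
ORDINAL = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b")


def mark_ordinals(text: str) -> str:
    """Replace '3rd'-style ordinals with a say-as tag holding only the digits."""
    return ORDINAL.sub(r'<say-as interpret-as="ordinal">\1</say-as>', escape(text))


print(mark_ordinals("She finished 3rd in the 21st race."))
```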

Fraction

<say-as interpret-as="fraction">3 1/2</say-as>
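
The fraction example uses a mixed number: a whole part followed by a proper fraction. When the value comes from code, Python's fractions.Fraction can be converted to that layout. mixed_number and fraction_tag are hypothetical helpers, and negative values are left out to keep the sketch short.

```python
from fractions import Fraction

# Hypothetical helpers for illustration; not part of any SpeechGen API.


def mixed_number(value: Fraction) -> str:
    """Format a non-negative Fraction as 'whole num/den', e.g. 7/2 -> '3 1/2'."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    whole, remainder = divmod(value.numerator, value.denominator)
    if remainder == 0:
        return str(whole)
    if whole == 0:
        return f"{remainder}/{value.denominator}"
    return f"{whole} {remainder}/{value.denominator}"


def fraction_tag(value: Fraction) -> str:
    """Wrap the mixed-number text in a fraction say-as tag."""
    return f'<say-as interpret-as="fraction">{mixed_number(value)}</say-as>'


print(fraction_tag(Fraction(7, 2)))  # <say-as interpret-as="fraction">3 1/2</say-as>
print(fraction_tag(Fraction(3, 4)))  # <say-as interpret-as="fraction">3/4</say-as>
```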

Date

<say-as interpret-as="date" format="ymd" detail="1">1969.07.21</say-as>
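
The format attribute tells the voice the order of the date fields, so the text has to follow that order. Formatting a datetime.date with strftime codes guarantees the year.month.day layout used in the example. The detail value is copied unchanged from the example; how a particular voice interprets it is not covered here, and date_tag is a hypothetical helper.

```python
from datetime import date

# Hypothetical helper for illustration; not part of any SpeechGen API.


def date_tag(day: date) -> str:
    """Render a date as YYYY.MM.DD, matching the format="ymd" field order."""
    # detail="1" is copied from the SpeechGen example, not derived from the date.
    return (
        '<say-as interpret-as="date" format="ymd" detail="1">'
        f"{day:%Y.%m.%d}</say-as>"
    )


print(date_tag(date(1969, 7, 21)))
# <say-as interpret-as="date" format="ymd" detail="1">1969.07.21</say-as>
```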

Time

<say-as interpret-as="time" format="hms12">2:30</say-as>
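
The hms12 format in the example implies a 12-hour clock. A minimal sketch, assuming the input is a 24-hour datetime.time: it converts the hour and zero-pads the minutes. It does not append AM or PM, because the example does not show whether the voice expects it; time_tag is a hypothetical helper.

```python
from datetime import time

# Hypothetical helper for illustration; not part of any SpeechGen API.


def time_tag(moment: time) -> str:
    """Format a 24-hour time as h:mm on a 12-hour clock for format="hms12"."""
    hour = moment.hour % 12 or 12  # 0 -> 12, 12 -> 12, 14 -> 2
    # AM/PM is not appended; whether a voice needs it is an open assumption.
    return (
        '<say-as interpret-as="time" format="hms12">'
        f"{hour}:{moment.minute:02d}</say-as>"
    )


print(time_tag(time(14, 30)))
# <say-as interpret-as="time" format="hms12">2:30</say-as>
```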

Telephone

My number is <say-as interpret-as="telephone">8883451715</say-as>
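
Phone numbers are often stored with brackets, spaces, and hyphens. This sketch assumes the voice handles a bare digit string like the one in the example, so it strips everything except digits and plus signs. Whether SpeechGen also accepts punctuated numbers directly is not stated here, and telephone_tag is a hypothetical helper.

```python
import re

# Hypothetical helper for illustration; not part of any SpeechGen API.


def telephone_tag(raw_number: str) -> str:
    """Strip everything except digits and '+' before wrapping in a telephone tag."""
    cleaned = re.sub(r"[^\d+]", "", raw_number)
    return f'<say-as interpret-as="telephone">{cleaned}</say-as>'


print("My number is " + telephone_tag("(888) 345-1715"))
# My number is <say-as interpret-as="telephone">8883451715</say-as>
```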

Currency

<say-as interpret-as="currency">79.4 USD</say-as>
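
For amounts computed in code, decimal.Decimal avoids binary floating-point artifacts (in Python, 0.1 + 0.2 evaluates to 0.30000000000000004). The sketch below rounds to two decimal places and appends a three-letter ISO 4217 code, in the same amount-then-code order as the example. The two-decimal rounding is a design choice rather than a documented SpeechGen requirement, and currency_tag is a hypothetical helper.

```python
from decimal import Decimal

# Hypothetical helper for illustration; not part of any SpeechGen API.


def currency_tag(amount: Decimal, code: str) -> str:
    """Round to two decimal places and append an upper-case currency code."""
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"expected a three-letter currency code, got {code!r}")
    rounded = amount.quantize(Decimal("0.01"))
    return f'<say-as interpret-as="currency">{rounded} {code.upper()}</say-as>'


print(currency_tag(Decimal("79.4"), "usd"))
# <say-as interpret-as="currency">79.40 USD</say-as>
```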

Alias

<sub alias="Doctor">Dr.</sub> Smith.
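
When the same abbreviations recur, a lookup table can insert sub tags automatically. The ALIASES dictionary and expand_aliases function here are hypothetical, and a fixed table cannot resolve ambiguous abbreviations (St. can mean Street or Saint).

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical abbreviation table for illustration; a real one would be larger.
ALIASES = {"Dr.": "Doctor", "St.": "Street"}
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, ALIASES)) + ")")


def expand_aliases(text: str) -> str:
    """Escape the text, then wrap known abbreviations in <sub alias="...">."""
    return PATTERN.sub(
        lambda m: f"<sub alias={quoteattr(ALIASES[m.group(0)])}>{m.group(0)}</sub>",
        escape(text),
    )


print(expand_aliases("Dr. Smith lives on Baker St."))
# <sub alias="Doctor">Dr.</sub> Smith lives on Baker <sub alias="Street">St.</sub>
```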

Prosody

<prosody pitch="x-low">I'm speaking this text with an x-low constant pitch</prosody>
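
The W3C specification defines the named pitch levels x-low, low, medium, high, x-high, and default; it also allows numeric values, which this sketch deliberately rejects. Validating the level before sending text catches typos early. prosody_pitch is a hypothetical helper, and an individual voice may still ignore the attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper for illustration; not part of any SpeechGen API.
# Named pitch levels from the W3C SSML specification; numeric values are rejected here.
PITCH_LEVELS = {"x-low", "low", "medium", "high", "x-high", "default"}


def prosody_pitch(text: str, pitch: str) -> str:
    """Wrap text in <prosody pitch="...">, rejecting unknown level names."""
    if pitch not in PITCH_LEVELS:
        raise ValueError(f"unsupported pitch level: {pitch!r}")
    element = ET.Element("prosody", {"pitch": pitch})
    element.text = text
    return ET.tostring(element, encoding="unicode")


print(prosody_pitch("I'm speaking this text with an x-low constant pitch", "x-low"))
# <prosody pitch="x-low">I'm speaking this text with an x-low constant pitch</prosody>
```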

Emphasis

<emphasis level="strong">And today the weather is sunny.</emphasis>
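
Emphasis often targets a single word rather than a whole sentence. This sketch wraps the first occurrence of a word, accepting the levels strong, moderate, none, and reduced defined in the W3C specification. It matches plain substrings, so sun would also match inside sunny; emphasize_word is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# Hypothetical helper for illustration; not part of any SpeechGen API.
# Emphasis levels from the W3C SSML specification.
EMPHASIS_LEVELS = {"strong", "moderate", "none", "reduced"}


def emphasize_word(sentence: str, word: str, level: str = "strong") -> str:
    """Wrap the first occurrence of word in <emphasis>; the rest stays plain text."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unsupported emphasis level: {level!r}")
    before, found, after = sentence.partition(word)
    if not found:
        return escape(sentence)  # word absent: return the escaped sentence
    return (
        f'{escape(before)}<emphasis level="{level}">'
        f"{escape(found)}</emphasis>{escape(after)}"
    )


print(emphasize_word("And today the weather is sunny.", "today"))
# And <emphasis level="strong">today</emphasis> the weather is sunny.
```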

Phoneme

<phoneme alphabet="ipa" ph="haɪˈpɜːrbəli">Hyperbole</phoneme>
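
A small pronunciation lexicon can apply phoneme tags consistently across a text. The LEXICON dictionary and apply_lexicon function are hypothetical; matching is case-sensitive and whole-word only, and the IPA string is taken from the example.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon for illustration: word -> IPA transcription.
LEXICON = {"Hyperbole": "haɪˈpɜːrbəli"}
WORDS = re.compile(r"\b(" + "|".join(map(re.escape, LEXICON)) + r")\b")


def apply_lexicon(text: str) -> str:
    """Escape the text, then wrap each lexicon word in an IPA phoneme tag."""
    return WORDS.sub(
        lambda m: (
            f'<phoneme alphabet="ipa" ph={quoteattr(LEXICON[m.group(1)])}>'
            f"{m.group(1)}</phoneme>"
        ),
        escape(text),
    )


print(apply_lexicon("Hyperbole is common in ads."))
```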
