SSMD Syntax Reference

This page provides a complete reference for SSMD markup syntax.

Text and Emphasis

SSMD supports all four SSML emphasis levels for fine-grained control over speech emphasis.

Moderate Emphasis

Use single asterisks for moderate (default) emphasis:

ssmd.to_ssml("This is *important*")
# → <speak>This is <emphasis>important</emphasis></speak>

Strong Emphasis

Use double asterisks for strong emphasis:

ssmd.to_ssml("This is **very important**")
# → <speak>This is <emphasis level="strong">very important</emphasis></speak>

Reduced Emphasis

Use single underscores for reduced (subtle) emphasis:

ssmd.to_ssml("This is _less important_")
# → <speak>This is <emphasis level="reduced">less important</emphasis></speak>

No Emphasis

Use explicit annotation syntax for no emphasis (rarely used):

ssmd.to_ssml('[monotone reading]{emphasis="none"}')
# → <speak><emphasis level="none">monotone reading</emphasis></speak>

Note

The “none” emphasis level is rarely needed in practice. It explicitly instructs the TTS engine to speak without any emphasis, which can be useful for robotic or monotone speech effects.

Breaks and Pauses

Time-Based Breaks

Specify duration in milliseconds or seconds using … followed by a time value:

ssmd.to_ssml("Wait ...500ms please")
# → <speak>Wait <break time="500ms"/> please</speak>

ssmd.to_ssml("Wait ...2s please")
# → <speak>Wait <break time="2s"/> please</speak>

Note

Bare … (without a time or strength code) is NOT treated as a break. It will be preserved as literal ellipsis in your text.

Strength-Based Breaks

Use strength codes for semantic pauses:

ssmd.to_ssml("Hello ...n world")   # none
ssmd.to_ssml("Hello ...w world")   # weak (x-weak)
ssmd.to_ssml("Hello ...c world")   # comma (medium)
ssmd.to_ssml("Hello ...s world")   # sentence (strong)
ssmd.to_ssml("Hello ...p world")   # paragraph (x-strong)

Strength codes:

n - none
w - weak (x-weak)
c - comma (medium)
s - sentence (strong)
p - paragraph (x-strong)

Paragraphs

Blank lines separate paragraphs:

text = """
This is the first paragraph.
Still in first paragraph.

This is the second paragraph.
"""

ssmd.to_ssml(text)
# → <speak>This is the first paragraph.
#    Still in first paragraph.
#    This is the second paragraph.</speak>

Headings

Use hash marks for headings (configurable):

from ssmd import Document

text = """
# Main Title
Content here.

## Subtitle
More content.
"""

doc = Document(text, config={
   'heading_levels': {
      1: [('emphasis', 'strong'), ('pause', '500ms')],
      2: [('emphasis', 'moderate')]
   }
})


ssml = doc.to_ssml()

Annotations

Annotations use the format [text]{key="value"} where annotations can be:

Language Codes

Specify language with ISO codes:

# Auto-complete to full locale
ssmd.to_ssml('[Bonjour]{lang="fr"}')
# → <speak><lang xml:lang="fr-FR">Bonjour</lang></speak>

# Explicit locale
ssmd.to_ssml('[Hello]{lang="en-GB"}')
# → <speak><lang xml:lang="en-GB">Hello</lang></speak>

Common language codes:

en → en-US
fr → fr-FR
de → de-DE
es → es-ES
it → it-IT
ja → ja-JP
zh → zh-CN
ru → ru-RU

Voice Selection

SSMD supports two ways to specify voices: inline annotations for short phrases and block directives for longer passages (ideal for dialogue and scripts).

Inline Voice Annotations

Perfect for short voice changes within a sentence:

# Simple voice name
ssmd.to_ssml('[Hello]{voice="Joanna"}')
# → <speak><voice name="Joanna">Hello</voice></speak>

# Cloud TTS voice (e.g., Google Wavenet, AWS Polly)
ssmd.to_ssml('[Hello]{voice="en-US-Wavenet-A"}')
# → <speak><voice name="en-US-Wavenet-A">Hello</voice></speak>

# Language and gender attributes
ssmd.to_ssml('[Bonjour]{voice-lang="fr-FR" gender="female"}')
# → <speak><voice language="fr-FR" gender="female">Bonjour</voice></speak>

# All attributes (language, gender, variant)
ssmd.to_ssml('[Text]{voice-lang="en-GB" gender="male" variant="1"}')
# → <speak><voice language="en-GB" gender="male" variant="1">Text</voice></speak>

Voice attributes:

voice="NAME" - Voice name (e.g., Joanna, en-US-Wavenet-A)
voice-lang="LANG" - Language code (e.g., en-GB)
gender="GENDER" - male, female, or neutral
variant="NUMBER" - Variant number for tiebreaking

Voice Directives (Block Syntax)

Perfect for dialogue, podcasts, and scripts with multiple speakers:

script = """
<div voice="af_sarah">
Welcome to Tech Talk! I'm Sarah, and today we're diving into the
fascinating world of text-to-speech technology.
...s
</div>

<div voice="am_michael">
And I'm Michael! We've got an amazing episode lined up. The advances
in neural TTS have been incredible lately.
...s
</div>

<div voice="af_sarah">
So what are we covering today?
</div>
"""

ssmd.to_ssml(script)
# Each voice directive creates a separate voice block in SSML

Voice directives support all voice attributes:

# Language and gender
multilingual = """
<div voice-lang="fr-FR" gender="female">
Bonjour! Comment allez-vous aujourd'hui?
</div>

<div voice-lang="en-GB" gender="male">
Hello there! Lovely weather we're having.
</div>

<div voice-lang="es-ES" gender="female" variant="1">
¡Hola! ¿Cómo estás?
</div>
"""
Voice directive features:

Use <div voice="name"> block syntax

Supports all attributes: language, gender, variant

Applies to all text until the next directive or paragraph break

Automatically detected on SSML→SSMD conversion for long voice blocks

Much more readable than inline annotations for dialogue

Mixing inline and directive syntax:
# Block directive for main speaker, inline for interruptions
text = """
<div voice="sarah">
Hello everyone, [but wait!]{voice="michael"} Michael interrupts...
</div>

<div voice="michael">
Sorry, I had to jump in there!
</div>
"""

Phonetic Pronunciation

IPA (International Phonetic Alphabet)

ssmd.to_ssml('[tomato]{ph="təˈmeɪtoʊ"}')
# → <speak><phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme></speak>

ssmd.to_ssml('[hello]{ipa="həˈloʊ"}')
# → <speak><phoneme alphabet="ipa" ph="həˈloʊ">hello</phoneme></speak>

X-SAMPA (Extended Speech Assessment Methods Phonetic Alphabet)

ssmd.to_ssml('[dictionary]{sampa="dIkS@n@ri"}')
# → <speak><phoneme alphabet="x-sampa" ph="dIkS@n@ri">dictionary</phoneme></speak>

Substitution (Aliases)

Replace text with alternative pronunciation:

ssmd.to_ssml('[H2O]{sub="water"}')
# → <speak><sub alias="water">H2O</sub></speak>

ssmd.to_ssml('[AWS]{sub="Amazon Web Services"}')
# → <speak><sub alias="Amazon Web Services">AWS</sub></speak>

ssmd.to_ssml('[NATO]{sub="North Atlantic Treaty Organization"}')

Say-As Interpretations

Control how text is interpreted:

# Telephone number
ssmd.to_ssml('[+1-555-0123]{as="telephone"}')

# Date with format
ssmd.to_ssml('[31.12.2024]{as="date" format="dd.mm.yyyy"}')

# Say-as with detail attribute (verbosity control)
ssmd.to_ssml('[123]{as="cardinal" detail="2"}')
# → <speak><say-as interpret-as="cardinal" detail="2">123</say-as></speak>

ssmd.to_ssml('[12/31/2024]{as="date" format="mdy" detail="1"}')
# → <speak><say-as interpret-as="date" format="mdy" detail="1">12/31/2024</say-as></speak>

# Spell out characters
ssmd.to_ssml('[NASA]{as="character"}')

# Number types
ssmd.to_ssml('[123]{as="cardinal"}')     # one hundred twenty-three
ssmd.to_ssml('[1st]{as="ordinal"}')      # first
ssmd.to_ssml('[123]{as="digits"}')       # one two three
ssmd.to_ssml('[3.14]{as="fraction"}')    # three point one four

# Time
ssmd.to_ssml('[14:30]{as="time"}')

# Expletive (censored/beeped)
ssmd.to_ssml('[damn]{as="expletive"}')

Supported interpret-as values:

character - Spell out
cardinal - Number
ordinal - First, second, etc.
digits - Individual digits
fraction - Decimal numbers
unit - Measurements
date - Dates
time - Time values
telephone - Phone numbers
address - Street addresses
expletive - Censored words

The detail attribute (1-2) controls verbosity level and is platform-specific. Higher values generally provide more detailed pronunciation.

Prosody (Voice Control)

Use prosody annotations with explicit key/value pairs:

ssmd.to_ssml('[loud]{volume="loud"}')
ssmd.to_ssml('[slow]{rate="slow"}')
ssmd.to_ssml('[high]{pitch="high"}')
ssmd.to_ssml('[loud and fast]{volume="loud" rate="fast"}')

Scale-Based Values (1-5)

ssmd.to_ssml('[extra loud]{volume="5"}')
ssmd.to_ssml('[extra fast]{rate="5"}')
ssmd.to_ssml('[extra high]{pitch="5"}')

Scale mapping:

Volume: 0=silent, 1=x-soft, 2=soft, 3=medium, 4=loud, 5=x-loud
Rate: 1=x-slow, 2=slow, 3=medium, 4=fast, 5=x-fast
Pitch: 1=x-low, 2=low, 3=medium, 4=high, 5=x-high

Relative Values

# Decibels for volume
ssmd.to_ssml('[louder]{volume="+6dB"}')
ssmd.to_ssml('[quieter]{volume="-3dB"}')

# Percentages for rate and pitch
ssmd.to_ssml('[faster]{rate="+20%"}')
ssmd.to_ssml('[slower]{rate="-10%"}')
ssmd.to_ssml('[higher]{pitch="+15%"}')
ssmd.to_ssml('[lower]{pitch="-5%"}')

Audio Files

Basic Audio

# With description
ssmd.to_ssml('[doorbell]{src="https://example.com/sounds/bell.mp3"}')
# → <audio src="https://example.com/sounds/bell.mp3"><desc>doorbell</desc></audio>

# No description
ssmd.to_ssml('[]{src="beep.mp3"}')
# → <audio src="beep.mp3"></audio>

Audio with Fallback

ssmd.to_ssml('[cat purring]{src="cat.ogg" alt="Sound file not loaded"}')
# → <audio src="cat.ogg"><desc>cat purring</desc>Sound file not loaded</audio>

The fallback text is spoken if the audio file can’t be played.

Advanced Audio Attributes

SSMD supports advanced audio control through SSML attributes:

Audio Clipping

Play a portion of an audio file by specifying start and end times:

ssmd.to_ssml('[music]{src="song.mp3" clip="5s-30s"}')
# → <audio src="song.mp3" clipBegin="5s" clipEnd="30s"><desc>music</desc></audio>

ssmd.to_ssml('[intro]{src="podcast.mp3" clip="0s-10s"}')
# → <audio src="podcast.mp3" clipBegin="0s" clipEnd="10s"><desc>intro</desc></audio>

Speed Control

Adjust playback speed using percentages:

ssmd.to_ssml('[announcement]{src="speech.mp3" speed="150%"}')
# → <audio src="speech.mp3" speed="150%"><desc>announcement</desc></audio>

ssmd.to_ssml('[slow]{src="message.mp3" speed="80%"}')
# → <audio src="message.mp3" speed="80%"><desc>slow</desc></audio>

Repeat Audio

Repeat audio playback a specific number of times:

ssmd.to_ssml('[jingle]{src="ad.mp3" repeat="3"}')
# → <audio src="ad.mp3" repeatCount="3"><desc>jingle</desc></audio>

ssmd.to_ssml('[beep]{src="alert.mp3" repeat="5"}')
# → <audio src="alert.mp3" repeatCount="5"><desc>beep</desc></audio>

Volume Adjustment

Control audio volume using decibel adjustment:

ssmd.to_ssml('[alarm]{src="alert.mp3" level="+6dB"}')
# → <audio src="alert.mp3" soundLevel="+6dB"><desc>alarm</desc></audio>

ssmd.to_ssml('[background]{src="music.mp3" level="-3dB"}')
# → <audio src="music.mp3" soundLevel="-3dB"><desc>background</desc></audio>

Combining Attributes

Multiple audio attributes can be combined with fallback text:

ssmd.to_ssml('[bg music]{src="music.mp3" clip="0s-10s" speed="120%" level="-3dB" alt="Fallback text"}')
# → <audio src="music.mp3" clipBegin="0s" clipEnd="10s" speed="120%" soundLevel="-3dB">
#    <desc>bg music</desc>Fallback text</audio>

ssmd.to_ssml('[effect]{src="sound.mp3" clip="2s-5s" repeat="2" alt="Sound unavailable"}')
# → <audio src="sound.mp3" clipBegin="2s" clipEnd="5s" repeatCount="2">
#    <desc>effect</desc>Sound unavailable</audio>

Note

Audio attribute support varies by TTS platform. Amazon Polly and Google Cloud TTS support most of these features. Always test with your specific TTS engine.

Markers

Markers create synchronization points for events:

ssmd.to_ssml('I always wanted a @animal cat as a pet.')
# → <speak>I always wanted a <mark name="animal"/> cat as a pet.</speak>

ssmd.to_ssml('Click @here to continue.')
# → <speak>Click <mark name="here"/> to continue.</speak>

Markers are removed when stripping to plain text:

ssmd.to_text('Click @here now')
# → Click now

Extensions

Platform-specific extensions allow you to use TTS features beyond standard SSML.

Amazon Polly Extensions

Amazon Polly provides effects like whispering and dynamic range compression:

# Whisper effect
ssmd.to_ssml('[secret message]{ext="whisper"}')
# → <amazon:effect name="whispered">secret message</amazon:effect>

# Dynamic range compression (for voice over music)
ssmd.to_ssml('[announcement]{ext="drc"}')
# → <amazon:effect name="drc">announcement</amazon:effect>

Google Cloud TTS Speaking Styles

Google Cloud TTS supports speaking styles for Neural2 and Studio voices. You can configure these using SSMD’s extension system:

from ssmd import Document

# Configure Google TTS styles as extensions
doc = Document(config={
    'extensions': {
        'cheerful': lambda text: f'<google:style name="cheerful">{text}</google:style>',
        'calm': lambda text: f'<google:style name="calm">{text}</google:style>',
        'empathetic': lambda text: f'<google:style name="empathetic">{text}</google:style>',
        'apologetic': lambda text: f'<google:style name="apologetic">{text}</google:style>',
        'firm': lambda text: f'<google:style name="firm">{text}</google:style>',
    }
})

# Use styles in your content
doc.add_sentence("[Welcome to our service!]{ext=\"cheerful\"}")
doc.add_sentence("[We apologize for the inconvenience.]{ext=\"apologetic\"}")
doc.add_sentence("[Please remain calm.]{ext=\"calm\"}")

ssml = doc.to_ssml()
# → <speak>
#    <google:style name="cheerful">Welcome to our service!</google:style>
#    <google:style name="apologetic">We apologize for the inconvenience.</google:style>
#    <google:style name="calm">Please remain calm.</google:style>
#    </speak>

Available Google TTS speaking styles:

cheerful - Upbeat and positive tone
calm - Relaxed and soothing tone
empathetic - Understanding and compassionate tone
apologetic - Sorry and regretful tone
firm - Confident and authoritative tone
news - Professional news anchor tone (some voices)
conversational - Natural conversation tone (some voices)

Note

Google TTS speaking styles are only supported by specific Neural2 and Studio voices. See the Google Cloud TTS documentation for voice compatibility.

Custom Extensions

You can define your own extensions for any custom SSML tags your TTS platform supports:

from ssmd import Document

doc = Document(config={
    'extensions': {
        'robotic': lambda text: f'<voice-transformation type="robot">{text}</voice-transformation>',
        'echo': lambda text: f'<audio-effect type="echo">{text}</audio-effect>',
    }
})

doc.add_sentence("[Hello]{ext=\"robotic\"}")
doc.add_sentence("[world]{ext=\"echo\"}")

For a complete Google TTS styles example, see examples/google_tts_styles.py.

Combining Multiple Annotations

Multiple annotations can be space-separated inside the braces:

ssmd.to_ssml('[Bonjour]{lang="fr" volume="5" rate="2"}')
# → <lang xml:lang="fr-FR"><prosody volume="x-loud" rate="slow">Bonjour</prosody></lang>

ssmd.to_ssml('[important]{volume="5" as="character"}')
# → <prosody volume="x-loud"><say-as interpret-as="character">important</say-as></prosody>

Escaping

XML Special Characters

XML special characters are automatically escaped:

ssmd.to_ssml('5 < 10 & 10 > 5')
# → <speak>5 &lt; 10 &amp; 10 &gt; 5</speak>

Security

All user input is automatically sanitized to prevent XML injection attacks. Special characters in both text content and annotation parameters are properly escaped:

# Malicious input is safely escaped
ssmd.to_ssml('[text]{sub="value<script>alert(1)</script>"}')
# → <speak><sub alias="value&lt;script&gt;alert(1)&lt;/script&gt;">text</sub></speak>

The library ensures:

XML validity: Output is always valid, well-formed XML
Injection prevention: User input cannot break out of attribute values or inject tags
Automatic escaping: All special characters (<, >, &, ", ') are escaped

You can safely use SSMD with untrusted user input in TTS applications.

Literal Asterisks

To include literal asterisks without emphasis, escape them or use different patterns:

# These won't be treated as emphasis
ssmd.to_ssml('2 * 3 = 6')
# → <speak>2 * 3 = 6</speak>

ssmd.to_ssml('* list item')
# → <speak>* list item</speak>