SSMD Syntax Reference
This page provides a complete reference for SSMD markup syntax.
Text and Emphasis
SSMD supports all four SSML emphasis levels for fine-grained control over speech emphasis.
Moderate Emphasis
Use single asterisks for moderate (default) emphasis:
import ssmd
ssmd.to_ssml("This is *important*")
# → <speak>This is <emphasis>important</emphasis></speak>
Strong Emphasis
Use double asterisks for strong emphasis:
ssmd.to_ssml("This is **very important**")
# → <speak>This is <emphasis level="strong">very important</emphasis></speak>
Reduced Emphasis
Use single underscores for reduced (subtle) emphasis:
ssmd.to_ssml("This is _less important_")
# → <speak>This is <emphasis level="reduced">less important</emphasis></speak>
No Emphasis
Use explicit annotation syntax for no emphasis (rarely used):
ssmd.to_ssml('[monotone reading]{emphasis="none"}')
# → <speak><emphasis level="none">monotone reading</emphasis></speak>
Note
The “none” emphasis level is rarely needed in practice. It explicitly instructs the TTS engine to speak without any emphasis, which can be useful for robotic or monotone speech effects.
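To make the three shorthand forms concrete, here is a toy converter built on Python's re module. It is a minimal sketch, not ssmd's parser: the function name emphasis_to_ssml is hypothetical, and the patterns ignore nesting, escaping, and underscores inside identifiers.

```python
import re

# Toy patterns for the shorthand emphasis forms (illustration only).
_STRONG = re.compile(r"\*\*(.+?)\*\*")
# A single asterisk only opens emphasis when a non-space character follows it.
_MODERATE = re.compile(r"\*(\S(?:.*?\S)?)\*")
_REDUCED = re.compile(r"_(\S(?:.*?\S)?)_")


def emphasis_to_ssml(text: str) -> str:
    """Convert **strong**, *moderate* and _reduced_ spans to SSML."""
    # Strong must run first, or "**x**" would be misread by the single-asterisk rule.
    text = _STRONG.sub(r'<emphasis level="strong">\1</emphasis>', text)
    text = _MODERATE.sub(r"<emphasis>\1</emphasis>", text)
    text = _REDUCED.sub(r'<emphasis level="reduced">\1</emphasis>', text)
    return f"<speak>{text}</speak>"


print(emphasis_to_ssml("This is **very important**"))
# <speak>This is <emphasis level="strong">very important</emphasis></speak>
```

Requiring a non-space character right after the opening asterisk is also why input such as 2 * 3 = 6 passes through this sketch untouched.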
Breaks and Pauses
Time-Based Breaks
Specify duration in milliseconds or seconds using ... followed by a time value:
ssmd.to_ssml("Wait ...500ms please")
# → <speak>Wait <break time="500ms"/> please</speak>
ssmd.to_ssml("Wait ...2s please")
# → <speak>Wait <break time="2s"/> please</speak>
Note
Bare ... (without a time or strength code) is NOT treated as a break. It is preserved as a literal ellipsis in your text.
Strength-Based Breaks
Use strength codes for semantic pauses:
ssmd.to_ssml("Hello ...n world") # none
ssmd.to_ssml("Hello ...w world") # weak (x-weak)
ssmd.to_ssml("Hello ...c world") # comma (medium)
ssmd.to_ssml("Hello ...s world") # sentence (strong)
ssmd.to_ssml("Hello ...p world") # paragraph (x-strong)
Strength codes:
n - none
w - weak (x-weak)
c - comma (medium)
s - sentence (strong)
p - paragraph (x-strong)
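The break rules above fit in a single regular expression. The sketch below is illustrative only: the helper breaks_to_ssml and the STRENGTHS table are hypothetical, with SSML values taken from the list above. It also shows why a bare ellipsis survives: without a number-plus-unit or a strength letter, the pattern simply does not match.

```python
import re

# Strength letters mapped to the SSML strength values listed above.
STRENGTHS = {"n": "none", "w": "x-weak", "c": "medium", "s": "strong", "p": "x-strong"}

# "..." followed by either a duration (500ms, 2s) or exactly one strength letter.
# The negative lookahead stops "...so" from being read as "...s" + "o".
_BREAK = re.compile(r"\.\.\.(?:(\d+(?:ms|s))|([nwcsp]))(?!\w)")


def breaks_to_ssml(text: str) -> str:
    """Replace SSMD break codes with <break/> tags (illustrative sketch)."""

    def replace(match: re.Match) -> str:
        duration, strength = match.groups()
        if duration:
            return f'<break time="{duration}"/>'
        return f'<break strength="{STRENGTHS[strength]}"/>'

    return _BREAK.sub(replace, text)


print(breaks_to_ssml("Wait ...500ms please"))  # Wait <break time="500ms"/> please
print(breaks_to_ssml("Hello ...c world"))      # Hello <break strength="medium"/> world
print(breaks_to_ssml("Wait... what?"))         # Wait... what?
```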
Paragraphs
Blank lines separate paragraphs:
text = """
This is the first paragraph.
Still in first paragraph.

This is the second paragraph.
"""
ssmd.to_ssml(text)
# → <speak>This is the first paragraph.
# Still in first paragraph.
# This is the second paragraph.</speak>
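Paragraph detection amounts to splitting on blank lines while keeping single newlines inside a paragraph. A minimal sketch follows; the helper split_paragraphs is hypothetical, and how ssmd renders each paragraph afterwards is not modelled here.

```python
import re


def split_paragraphs(text: str) -> list:
    """Split text on blank (or whitespace-only) lines; single newlines stay inside."""
    chunks = re.split(r"\n[ \t]*\n", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


text = """
This is the first paragraph.
Still in first paragraph.

This is the second paragraph.
"""
print(len(split_paragraphs(text)))  # 2
```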
Headings
Use hash marks for headings (configurable):
from ssmd import Document
text = """
# Main Title
Content here.
## Subtitle
More content.
"""
doc = Document(text, config={
'heading_levels': {
1: [('emphasis', 'strong'), ('pause', '500ms')],
2: [('emphasis', 'moderate')]
}
})
ssml = doc.to_ssml()
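One way to picture the heading_levels config is as a list of actions applied to the heading text. The sketch below assumes that an 'emphasis' action wraps the heading in an <emphasis> tag and a 'pause' action appends a <break>. The SSML ssmd actually emits may differ, and render_heading is a hypothetical helper.

```python
import re

# One to six hash marks, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def render_heading(line, heading_levels):
    """Apply the configured actions for a heading line; return None for other lines."""
    match = HEADING.match(line)
    if match is None:
        return None  # not a heading line
    level, text = len(match.group(1)), match.group(2).strip()
    pause = ""
    for kind, value in heading_levels.get(level, []):
        if kind == "emphasis":
            text = f'<emphasis level="{value}">{text}</emphasis>'
        elif kind == "pause":
            pause = f'<break time="{value}"/>'
    return text + pause


config = {
    1: [("emphasis", "strong"), ("pause", "500ms")],
    2: [("emphasis", "moderate")],
}
print(render_heading("# Main Title", config))
# <emphasis level="strong">Main Title</emphasis><break time="500ms"/>
```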
Annotations
Annotations use the format [text]{key="value"}, where the key selects one of the annotation types described in the following sections.
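Because every annotation shares this shape, splitting one into its text and its key/value pairs takes only two regular expressions. This is a simplified sketch: parse_annotation is a hypothetical helper, and it does not handle escaped quotes or brackets inside the text.

```python
import re

# [text]{...} where the text may be empty (as in audio-only annotations).
ANNOTATION = re.compile(r'\[([^\]]*)\]\{([^}]*)\}')
# key="value" pairs; keys may contain hyphens, e.g. voice-lang.
PAIR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_annotation(source):
    """Return (text, {key: value}) for a single SSMD annotation."""
    match = ANNOTATION.fullmatch(source)
    if match is None:
        raise ValueError(f"not an annotation: {source!r}")
    text, attrs = match.group(1), match.group(2)
    return text, dict(PAIR.findall(attrs))


print(parse_annotation('[Bonjour]{lang="fr" volume="5" rate="2"}'))
# ('Bonjour', {'lang': 'fr', 'volume': '5', 'rate': '2'})
```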
Language Codes
Specify language with ISO codes:
# Auto-complete to full locale
ssmd.to_ssml('[Bonjour]{lang="fr"}')
# → <speak><lang xml:lang="fr-FR">Bonjour</lang></speak>
# Explicit locale
ssmd.to_ssml('[Hello]{lang="en-GB"}')
# → <speak><lang xml:lang="en-GB">Hello</lang></speak>
Common language codes:
en → en-US
fr → fr-FR
de → de-DE
es → es-ES
it → it-IT
ja → ja-JP
zh → zh-CN
ru → ru-RU
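The auto-completion can be modelled as a lookup in the table above. In this sketch, codes that already contain a region (such as en-GB) pass through unchanged, and codes missing from the table are also left as-is; that second fallback is an assumption, not documented ssmd behavior. Both helper names are hypothetical.

```python
# Region defaults copied from the table above.
DEFAULT_LOCALES = {
    "en": "en-US", "fr": "fr-FR", "de": "de-DE", "es": "es-ES",
    "it": "it-IT", "ja": "ja-JP", "zh": "zh-CN", "ru": "ru-RU",
}


def complete_locale(code: str) -> str:
    if "-" in code:
        return code  # already a full locale such as en-GB
    return DEFAULT_LOCALES.get(code, code)  # unknown codes pass through (assumption)


def lang_to_ssml(text: str, code: str) -> str:
    return f'<lang xml:lang="{complete_locale(code)}">{text}</lang>'


print(lang_to_ssml("Bonjour", "fr"))   # <lang xml:lang="fr-FR">Bonjour</lang>
print(lang_to_ssml("Hello", "en-GB"))  # <lang xml:lang="en-GB">Hello</lang>
```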
Voice Selection
SSMD supports two ways to specify voices: inline annotations for short phrases and block directives for longer passages (ideal for dialogue and scripts).
Inline Voice Annotations
Perfect for short voice changes within a sentence:
# Simple voice name
ssmd.to_ssml('[Hello]{voice="Joanna"}')
# → <speak><voice name="Joanna">Hello</voice></speak>
# Cloud TTS voice (e.g., Google Wavenet, AWS Polly)
ssmd.to_ssml('[Hello]{voice="en-US-Wavenet-A"}')
# → <speak><voice name="en-US-Wavenet-A">Hello</voice></speak>
# Language and gender attributes
ssmd.to_ssml('[Bonjour]{voice-lang="fr-FR" gender="female"}')
# → <speak><voice language="fr-FR" gender="female">Bonjour</voice></speak>
# All attributes (language, gender, variant)
ssmd.to_ssml('[Text]{voice-lang="en-GB" gender="male" variant="1"}')
# → <speak><voice language="en-GB" gender="male" variant="1">Text</voice></speak>
Voice attributes:
voice="NAME" - Voice name (e.g., Joanna, en-US-Wavenet-A)
voice-lang="LANG" - Language code (e.g., en-GB)
gender="GENDER" - male, female, or neutral
variant="NUMBER" - Variant number for tiebreaking
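The attribute list above is essentially a rename table from annotation keys to <voice> attributes. A minimal sketch, assuming attributes are emitted in the order they appear in the annotation (voice_to_ssml and VOICE_KEYS are hypothetical names):

```python
# SSMD annotation keys and the <voice> attributes they become.
VOICE_KEYS = {"voice": "name", "voice-lang": "language", "gender": "gender", "variant": "variant"}


def voice_to_ssml(text: str, attrs: dict) -> str:
    # Keep annotation order; ignore keys that are not voice attributes.
    parts = [f'{VOICE_KEYS[key]}="{value}"' for key, value in attrs.items() if key in VOICE_KEYS]
    return f"<voice {' '.join(parts)}>{text}</voice>"


print(voice_to_ssml("Bonjour", {"voice-lang": "fr-FR", "gender": "female"}))
# <voice language="fr-FR" gender="female">Bonjour</voice>
```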
Voice Directives (Block Syntax)
Perfect for dialogue, podcasts, and scripts with multiple speakers:
script = """
<div voice="af_sarah">
Welcome to Tech Talk! I'm Sarah, and today we're diving into the fascinating world of text-to-speech technology. ...s
</div>
<div voice="am_michael">
And I'm Michael! We've got an amazing episode lined up. The advances in neural TTS have been incredible lately. ...s
</div>
<div voice="af_sarah">
So what are we covering today?
</div>
"""
ssmd.to_ssml(script)
# Each voice directive creates a separate voice block in SSML
Voice directives support all voice attributes:
# Language and gender
multilingual = """
<div voice-lang="fr-FR" gender="female">
Bonjour! Comment allez-vous aujourd'hui?
</div>
<div voice-lang="en-GB" gender="male">
Hello there! Lovely weather we're having.
</div>
<div voice-lang="es-ES" gender="female" variant="1">
¡Hola! ¿Cómo estás?
</div>
"""
Voice directive features:
Use <div voice="name"> block syntax
Supports all attributes: language, gender, variant
Applies to all text until the next directive or paragraph break
Automatically detected on SSML→SSMD conversion for long voice blocks
Much more readable than inline annotations for dialogue
Mixing inline and directive syntax:
# Block directive for main speaker, inline for interruptions
text = """
<div voice="sarah">
Hello everyone, [but wait!]{voice="michael"} Michael interrupts...
</div>
<div voice="michael">
Sorry, I had to jump in there!
</div>
"""
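To see how a script decomposes into speaker turns, the sketch below pulls (voice, text) pairs out of <div voice="..."> blocks and wraps each one in an SSML <voice> element. It is a toy: split_voice_blocks and script_to_voices are hypothetical helpers, only the voice attribute is handled, nested divs are not supported, and ssmd's real output may be structured differently.

```python
import re

# Non-greedy body so consecutive blocks are matched separately.
DIV = re.compile(r'<div\s+voice="([^"]*)">\s*(.*?)\s*</div>', re.DOTALL)


def split_voice_blocks(script):
    """Return a list of (voice_name, body_text) tuples in script order."""
    return DIV.findall(script)


def script_to_voices(script):
    blocks = [f'<voice name="{name}">{body}</voice>' for name, body in split_voice_blocks(script)]
    return "<speak>" + "".join(blocks) + "</speak>"


script = """
<div voice="af_sarah">
So what are we covering today?
</div>
<div voice="am_michael">
Neural TTS!
</div>
"""
print(split_voice_blocks(script))
# [('af_sarah', 'So what are we covering today?'), ('am_michael', 'Neural TTS!')]
```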
Phonetic Pronunciation
IPA (International Phonetic Alphabet)
ssmd.to_ssml('[tomato]{ph="təˈmeɪtoʊ"}')
# → <speak><phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme></speak>
ssmd.to_ssml('[hello]{ipa="həˈloʊ"}')
# → <speak><phoneme alphabet="ipa" ph="həˈloʊ">hello</phoneme></speak>
X-SAMPA (Extended Speech Assessment Methods Phonetic Alphabet)
ssmd.to_ssml('[dictionary]{sampa="dIkS@n@ri"}')
# → <speak><phoneme alphabet="x-sampa" ph="dIkS@n@ri">dictionary</phoneme></speak>
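All three phonetic keys produce the same <phoneme> element and differ only in the alphabet attribute. A minimal sketch of that mapping (phoneme_to_ssml is a hypothetical helper; if several keys are present, the first match in PHONEME_KEYS wins, which is an arbitrary choice of this sketch):

```python
# Annotation key -> SSML alphabet name, per the examples above.
PHONEME_KEYS = {"ph": "ipa", "ipa": "ipa", "sampa": "x-sampa"}


def phoneme_to_ssml(text: str, attrs: dict) -> str:
    for key, alphabet in PHONEME_KEYS.items():
        if key in attrs:
            return f'<phoneme alphabet="{alphabet}" ph="{attrs[key]}">{text}</phoneme>'
    return text  # no phonetic key: leave the text untouched


print(phoneme_to_ssml("dictionary", {"sampa": "dIkS@n@ri"}))
# <phoneme alphabet="x-sampa" ph="dIkS@n@ri">dictionary</phoneme>
```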
Substitution (Aliases)
Replace text with alternative pronunciation:
ssmd.to_ssml('[H2O]{sub="water"}')
# → <speak><sub alias="water">H2O</sub></speak>
ssmd.to_ssml('[AWS]{sub="Amazon Web Services"}')
# → <speak><sub alias="Amazon Web Services">AWS</sub></speak>
ssmd.to_ssml('[NATO]{sub="North Atlantic Treaty Organization"}')
Say-As Interpretations
Control how text is interpreted:
# Telephone number
ssmd.to_ssml('[+1-555-0123]{as="telephone"}')
# Date with format
ssmd.to_ssml('[31.12.2024]{as="date" format="dd.mm.yyyy"}')
# Say-as with detail attribute (verbosity control)
ssmd.to_ssml('[123]{as="cardinal" detail="2"}')
# → <speak><say-as interpret-as="cardinal" detail="2">123</say-as></speak>
ssmd.to_ssml('[12/31/2024]{as="date" format="mdy" detail="1"}')
# → <speak><say-as interpret-as="date" format="mdy" detail="1">12/31/2024</say-as></speak>
# Spell out characters
ssmd.to_ssml('[NASA]{as="character"}')
# Number types
ssmd.to_ssml('[123]{as="cardinal"}') # one hundred twenty-three
ssmd.to_ssml('[1st]{as="ordinal"}') # first
ssmd.to_ssml('[123]{as="digits"}') # one two three
ssmd.to_ssml('[3/4]{as="fraction"}')  # three quarters (exact wording varies by engine)
# Time
ssmd.to_ssml('[14:30]{as="time"}')
# Expletive (censored/beeped)
ssmd.to_ssml('[damn]{as="expletive"}')
Supported interpret-as values:
character - Spell out
cardinal - Number
ordinal - First, second, etc.
digits - Individual digits
fraction - Fractions (e.g., 3/4)
unit - Measurements
date - Dates
time - Time values
telephone - Phone numbers
address - Street addresses
expletive - Censored words
The detail attribute (1-2) controls verbosity level and is platform-specific.
Higher values generally provide more detailed pronunciation.
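The say-as examples follow a fixed pattern: as becomes interpret-as, and format and detail are copied through when present. The sketch below reproduces that attribute order; say_as_to_ssml is a hypothetical helper, not part of ssmd.

```python
def say_as_to_ssml(text: str, attrs: dict) -> str:
    # "as" is required; "format" and "detail" are optional and copied verbatim.
    parts = [f'interpret-as="{attrs["as"]}"']
    for key in ("format", "detail"):
        if key in attrs:
            parts.append(f'{key}="{attrs[key]}"')
    return f"<say-as {' '.join(parts)}>{text}</say-as>"


print(say_as_to_ssml("123", {"as": "cardinal", "detail": "2"}))
# <say-as interpret-as="cardinal" detail="2">123</say-as>
```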
Prosody (Voice Control)
Use prosody annotations with explicit key/value pairs:
ssmd.to_ssml('[loud]{volume="loud"}')
ssmd.to_ssml('[slow]{rate="slow"}')
ssmd.to_ssml('[high]{pitch="high"}')
ssmd.to_ssml('[loud and fast]{volume="loud" rate="fast"}')
Scale-Based Values (1-5)
ssmd.to_ssml('[extra loud]{volume="5"}')
ssmd.to_ssml('[extra fast]{rate="5"}')
ssmd.to_ssml('[extra high]{pitch="5"}')
Scale mapping:
Volume: 0=silent, 1=x-soft, 2=soft, 3=medium, 4=loud, 5=x-loud
Rate: 1=x-slow, 2=slow, 3=medium, 4=fast, 5=x-fast
Pitch: 1=x-low, 2=low, 3=medium, 4=high, 5=x-high
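The scale mapping is a per-attribute lookup table. In this sketch, values that are not scale numbers (keywords such as "loud" or relative values such as "+6dB" and "-10%") pass through unchanged, and attributes are emitted in volume, rate, pitch order. Both behaviors are assumptions of the sketch, and prosody_to_ssml is a hypothetical helper.

```python
# Scale numbers -> SSML keywords, copied from the mapping above.
SCALES = {
    "volume": {"0": "silent", "1": "x-soft", "2": "soft", "3": "medium", "4": "loud", "5": "x-loud"},
    "rate": {"1": "x-slow", "2": "slow", "3": "medium", "4": "fast", "5": "x-fast"},
    "pitch": {"1": "x-low", "2": "low", "3": "medium", "4": "high", "5": "x-high"},
}


def prosody_to_ssml(text: str, attrs: dict) -> str:
    parts = []
    for key in ("volume", "rate", "pitch"):
        if key in attrs:
            # Non-scale values (keywords, dB, percentages) are kept verbatim.
            value = SCALES[key].get(attrs[key], attrs[key])
            parts.append(f'{key}="{value}"')
    return f"<prosody {' '.join(parts)}>{text}</prosody>"


print(prosody_to_ssml("Bonjour", {"volume": "5", "rate": "2"}))
# <prosody volume="x-loud" rate="slow">Bonjour</prosody>
```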
Relative Values
# Decibels for volume
ssmd.to_ssml('[louder]{volume="+6dB"}')
ssmd.to_ssml('[quieter]{volume="-3dB"}')
# Percentages for rate and pitch
ssmd.to_ssml('[faster]{rate="+20%"}')
ssmd.to_ssml('[slower]{rate="-10%"}')
ssmd.to_ssml('[higher]{pitch="+15%"}')
ssmd.to_ssml('[lower]{pitch="-5%"}')
Audio Files
Basic Audio
# With description
ssmd.to_ssml('[doorbell]{src="https://example.com/sounds/bell.mp3"}')
# → <audio src="https://example.com/sounds/bell.mp3"><desc>doorbell</desc></audio>
# No description
ssmd.to_ssml('[]{src="beep.mp3"}')
# → <audio src="beep.mp3"></audio>
Audio with Fallback
ssmd.to_ssml('[cat purring]{src="cat.ogg" alt="Sound file not loaded"}')
# → <audio src="cat.ogg"><desc>cat purring</desc>Sound file not loaded</audio>
The fallback text is spoken if the audio file can’t be played.
Advanced Audio Attributes
SSMD supports advanced audio control through SSML attributes:
Audio Clipping
Play a portion of an audio file by specifying start and end times:
ssmd.to_ssml('[music]{src="song.mp3" clip="5s-30s"}')
# → <audio src="song.mp3" clipBegin="5s" clipEnd="30s"><desc>music</desc></audio>
ssmd.to_ssml('[intro]{src="podcast.mp3" clip="0s-10s"}')
# → <audio src="podcast.mp3" clipBegin="0s" clipEnd="10s"><desc>intro</desc></audio>
Speed Control
Adjust playback speed using percentages:
ssmd.to_ssml('[announcement]{src="speech.mp3" speed="150%"}')
# → <audio src="speech.mp3" speed="150%"><desc>announcement</desc></audio>
ssmd.to_ssml('[slow]{src="message.mp3" speed="80%"}')
# → <audio src="message.mp3" speed="80%"><desc>slow</desc></audio>
Repeat Audio
Repeat audio playback a specific number of times:
ssmd.to_ssml('[jingle]{src="ad.mp3" repeat="3"}')
# → <audio src="ad.mp3" repeatCount="3"><desc>jingle</desc></audio>
ssmd.to_ssml('[beep]{src="alert.mp3" repeat="5"}')
# → <audio src="alert.mp3" repeatCount="5"><desc>beep</desc></audio>
Volume Adjustment
Control audio volume using decibel adjustment:
ssmd.to_ssml('[alarm]{src="alert.mp3" level="+6dB"}')
# → <audio src="alert.mp3" soundLevel="+6dB"><desc>alarm</desc></audio>
ssmd.to_ssml('[background]{src="music.mp3" level="-3dB"}')
# → <audio src="music.mp3" soundLevel="-3dB"><desc>background</desc></audio>
Combining Attributes
Multiple audio attributes can be combined with fallback text:
ssmd.to_ssml('[bg music]{src="music.mp3" clip="0s-10s" speed="120%" level="-3dB" alt="Fallback text"}')
# → <audio src="music.mp3" clipBegin="0s" clipEnd="10s" speed="120%" soundLevel="-3dB">
# <desc>bg music</desc>Fallback text</audio>
ssmd.to_ssml('[effect]{src="sound.mp3" clip="2s-5s" repeat="2" alt="Sound unavailable"}')
# → <audio src="sound.mp3" clipBegin="2s" clipEnd="5s" repeatCount="2">
# <desc>effect</desc>Sound unavailable</audio>
Note
Audio attribute support varies by TTS platform. Amazon Polly and Google Cloud TTS support most of these features. Always test with your specific TTS engine.
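The audio examples above reduce to a few renames (repeat to repeatCount, level to soundLevel), a split of clip into clipBegin and clipEnd, and optional <desc> and fallback text. The sketch below reproduces that mapping and the attribute order shown in the examples; audio_to_ssml is a hypothetical helper, and it omits the <speak> wrapper and any escaping.

```python
# Annotation keys that map one-to-one onto <audio> attributes, in output order.
AUDIO_RENAMES = {"speed": "speed", "repeat": "repeatCount", "level": "soundLevel"}


def audio_to_ssml(desc: str, attrs: dict) -> str:
    parts = [f'src="{attrs["src"]}"']
    if "clip" in attrs:
        # "5s-30s" -> clipBegin="5s" clipEnd="30s"
        begin, end = attrs["clip"].split("-", 1)
        parts += [f'clipBegin="{begin}"', f'clipEnd="{end}"']
    for key, ssml_name in AUDIO_RENAMES.items():
        if key in attrs:
            parts.append(f'{ssml_name}="{attrs[key]}"')
    body = f"<desc>{desc}</desc>" if desc else ""
    body += attrs.get("alt", "")  # fallback text spoken if the file cannot be played
    return f"<audio {' '.join(parts)}>{body}</audio>"


print(audio_to_ssml("music", {"src": "song.mp3", "clip": "5s-30s"}))
# <audio src="song.mp3" clipBegin="5s" clipEnd="30s"><desc>music</desc></audio>
```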
Markers
Markers create synchronization points for events:
ssmd.to_ssml('I always wanted a @animal cat as a pet.')
# → <speak>I always wanted a <mark name="animal"/> cat as a pet.</speak>
ssmd.to_ssml('Click @here to continue.')
# → <speak>Click <mark name="here"/> to continue.</speak>
Markers are removed when stripping to plain text:
ssmd.to_text('Click @here now')
# → Click now
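Marker handling can be sketched with one pattern used two ways: substituted with <mark/> for SSML, or removed (with the leftover double space collapsed) for plain text. The sketch only recognizes @name when it starts a word, so an address like a@b.com is left alone; that rule and both helper names are assumptions, not documented ssmd behavior.

```python
import re

# "@name" only when preceded by whitespace or the start of the string.
MARKER = re.compile(r"(?<!\S)@(\w+)")


def markers_to_ssml(text: str) -> str:
    return MARKER.sub(r'<mark name="\1"/>', text)


def strip_markers(text: str) -> str:
    without = MARKER.sub("", text)
    return re.sub(r" {2,}", " ", without).strip()  # collapse the gap the marker left


print(markers_to_ssml("Click @here to continue."))  # Click <mark name="here"/> to continue.
print(strip_markers("Click @here now"))             # Click now
```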
Extensions
Platform-specific extensions allow you to use TTS features beyond standard SSML.
Amazon Polly Extensions
Amazon Polly provides effects like whispering and dynamic range compression:
# Whisper effect
ssmd.to_ssml('[secret message]{ext="whisper"}')
# → <amazon:effect name="whispered">secret message</amazon:effect>
# Dynamic range compression (for voice over music)
ssmd.to_ssml('[announcement]{ext="drc"}')
# → <amazon:effect name="drc">announcement</amazon:effect>
Google Cloud TTS Speaking Styles
Google Cloud TTS supports speaking styles for Neural2 and Studio voices. You can configure these using SSMD’s extension system:
from ssmd import Document
# Configure Google TTS styles as extensions
doc = Document(config={
'extensions': {
'cheerful': lambda text: f'<google:style name="cheerful">{text}</google:style>',
'calm': lambda text: f'<google:style name="calm">{text}</google:style>',
'empathetic': lambda text: f'<google:style name="empathetic">{text}</google:style>',
'apologetic': lambda text: f'<google:style name="apologetic">{text}</google:style>',
'firm': lambda text: f'<google:style name="firm">{text}</google:style>',
}
})
# Use styles in your content
doc.add_sentence("[Welcome to our service!]{ext=\"cheerful\"}")
doc.add_sentence("[We apologize for the inconvenience.]{ext=\"apologetic\"}")
doc.add_sentence("[Please remain calm.]{ext=\"calm\"}")
ssml = doc.to_ssml()
# → <speak>
# <google:style name="cheerful">Welcome to our service!</google:style>
# <google:style name="apologetic">We apologize for the inconvenience.</google:style>
# <google:style name="calm">Please remain calm.</google:style>
# </speak>
Available Google TTS speaking styles:
cheerful - Upbeat and positive tone
calm - Relaxed and soothing tone
empathetic - Understanding and compassionate tone
apologetic - Sorry and regretful tone
firm - Confident and authoritative tone
news - Professional news anchor tone (some voices)
conversational - Natural conversation tone (some voices)
Note
Google TTS speaking styles are only supported by specific Neural2 and Studio voices. See the Google Cloud TTS documentation for voice compatibility.
Custom Extensions
You can define your own extensions for any custom SSML tags your TTS platform supports:
from ssmd import Document
doc = Document(config={
'extensions': {
'robotic': lambda text: f'<voice-transformation type="robot">{text}</voice-transformation>',
'echo': lambda text: f'<audio-effect type="echo">{text}</audio-effect>',
}
})
doc.add_sentence("[Hello]{ext=\"robotic\"}")
doc.add_sentence("[world]{ext=\"echo\"}")
For a complete Google TTS styles example, see examples/google_tts_styles.py.
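Conceptually, an extension is a named function from text to SSML. The sketch below models ext= dispatch with a plain dictionary. The built-in whisper and drc entries mirror the Polly examples above, while letting custom entries override built-ins and raising ValueError for unknown names are assumptions of this sketch, not documented ssmd behavior; apply_extension is a hypothetical helper.

```python
from typing import Callable, Dict, Optional

Extension = Callable[[str], str]

# Defaults mirroring the Amazon Polly examples above.
BUILTIN_EXTENSIONS: Dict[str, Extension] = {
    "whisper": lambda text: f'<amazon:effect name="whispered">{text}</amazon:effect>',
    "drc": lambda text: f'<amazon:effect name="drc">{text}</amazon:effect>',
}


def apply_extension(text: str, name: str, custom: Optional[Dict[str, Extension]] = None) -> str:
    # Custom entries are layered over the built-ins (assumption).
    registry = {**BUILTIN_EXTENSIONS, **(custom or {})}
    try:
        handler = registry[name]
    except KeyError:
        raise ValueError(f"unknown extension: {name!r}") from None
    return handler(text)


styles = {"cheerful": lambda text: f'<google:style name="cheerful">{text}</google:style>'}
print(apply_extension("Welcome to our service!", "cheerful", styles))
# <google:style name="cheerful">Welcome to our service!</google:style>
```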
Combining Multiple Annotations
Multiple annotations can be space-separated inside the braces:
ssmd.to_ssml('[Bonjour]{lang="fr" volume="5" rate="2"}')
# → <lang xml:lang="fr-FR"><prosody volume="x-loud" rate="slow">Bonjour</prosody></lang>
ssmd.to_ssml('[important]{volume="5" as="character"}')
# → <prosody volume="x-loud"><say-as interpret-as="character">important</say-as></prosody>
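Both outputs above follow the same nesting order: say-as innermost, then prosody, then lang outermost. The sketch below reproduces that order for these three annotation types only. The tables are trimmed copies of the mappings documented earlier, and combine_annotations is a hypothetical helper; ssmd's handling of other combinations is not modelled.

```python
# Trimmed copies of the documented language and scale mappings.
LOCALES = {"en": "en-US", "fr": "fr-FR", "de": "de-DE", "es": "es-ES"}
SCALES = {
    "volume": {"0": "silent", "1": "x-soft", "2": "soft", "3": "medium", "4": "loud", "5": "x-loud"},
    "rate": {"1": "x-slow", "2": "slow", "3": "medium", "4": "fast", "5": "x-fast"},
    "pitch": {"1": "x-low", "2": "low", "3": "medium", "4": "high", "5": "x-high"},
}


def combine_annotations(text: str, attrs: dict) -> str:
    # Innermost layer first: say-as wraps the raw text.
    if "as" in attrs:
        text = f'<say-as interpret-as="{attrs["as"]}">{text}</say-as>'
    prosody = [f'{k}="{SCALES[k].get(attrs[k], attrs[k])}"' for k in ("volume", "rate", "pitch") if k in attrs]
    if prosody:
        text = f"<prosody {' '.join(prosody)}>{text}</prosody>"
    # Outermost layer: lang.
    if "lang" in attrs:
        text = f'<lang xml:lang="{LOCALES.get(attrs["lang"], attrs["lang"])}">{text}</lang>'
    return text


print(combine_annotations("Bonjour", {"lang": "fr", "volume": "5", "rate": "2"}))
# <lang xml:lang="fr-FR"><prosody volume="x-loud" rate="slow">Bonjour</prosody></lang>
```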
Escaping
XML Special Characters
XML special characters are automatically escaped:
ssmd.to_ssml('5 < 10 & 10 > 5')
# → <speak>5 &lt; 10 &amp; 10 &gt; 5</speak>
Security
All user input is automatically sanitized to prevent XML injection attacks. Special characters in both text content and annotation parameters are properly escaped:
# Malicious input is safely escaped
ssmd.to_ssml('[text]{sub="value<script>alert(1)</script>"}')
# → <speak><sub alias="value&lt;script&gt;alert(1)&lt;/script&gt;">text</sub></speak>
The library ensures:
XML validity: Output is always valid, well-formed XML
Injection prevention: User input cannot break out of attribute values or inject tags
Automatic escaping: All special characters (<, >, &, ", ') are escaped
You can safely use SSMD with untrusted user input in TTS applications.
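The same guarantees can be reproduced with the standard library, which is a useful way to check what "escaped" means in practice. This sketch uses xml.sax.saxutils.escape, which handles &, < and >, plus an extra entity table for both quote characters inside attribute values. The helper names are hypothetical and not ssmd functions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() always handles &, < and >; quotes are added for attribute values.
ATTR_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_text(value: str) -> str:
    return escape(value)


def escape_attr(value: str) -> str:
    return escape(value, ATTR_ENTITIES)


def sub_to_ssml(text: str, alias: str) -> str:
    return f'<sub alias="{escape_attr(alias)}">{escape_text(text)}</sub>'


snippet = sub_to_ssml("text", "value<script>alert(1)</script>")
print(snippet)  # <sub alias="value&lt;script&gt;alert(1)&lt;/script&gt;">text</sub>
ET.fromstring(f"<speak>{snippet}</speak>")  # raises ParseError if the output is malformed
```

Parsing the result with ElementTree is a cheap way to confirm that hostile input, such as an alias containing a double quote, stays inside its attribute value.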
Literal Asterisks
To include literal asterisks without emphasis, escape them or use different patterns:
# These won't be treated as emphasis
ssmd.to_ssml('2 * 3 = 6')
# → <speak>2 * 3 = 6</speak>
ssmd.to_ssml('* list item')
# → <speak>* list item</speak>