Parser API
The SSMD Parser provides an alternative to SSML generation by extracting structured data from SSMD text. This is useful when you need programmatic control over SSMD features or want to build custom TTS pipelines.
When to Use the Parser
Use the parser API when you need to:
- Process SSMD features programmatically: extract and handle features individually
- Build custom TTS pipelines: implement your own text-to-speech workflow
- Handle text transformations: process say-as, substitution, and phoneme conversions
- Create multi-voice dialogue systems: build voice-specific processing pipelines
- Analyze SSMD content: extract metadata and features without generating SSML
Overview
The parser extracts SSMD markup into structured segments, allowing you to process each feature individually instead of generating a complete SSML document.
from ssmd import parse_paragraphs
script = """
<div voice="sarah">
Hello! Call [+1-555-0123]{as="telephone"} for info.
</div>
<div voice="michael">
Thanks *Sarah*!
</div>
"""
# Parse into structured paragraphs
# Note: convert_say_as and tts are placeholders for your own helper and TTS engine
for paragraph in parse_paragraphs(script):
    for sentence in paragraph.sentences:
        # Get voice configuration
        voice_name = sentence.voice.name if sentence.voice else "default"
        # Build complete text from segments
        full_text = ""
        for seg in sentence.segments:
            # Handle text transformations
            if seg.say_as:
                text = convert_say_as(seg.text, seg.say_as.interpret_as)
            elif seg.substitution:
                text = seg.substitution
            elif seg.phoneme:
                text = seg.text  # TTS engine handles phoneme
            else:
                text = seg.text
            full_text += text
        # Speak with TTS engine
        tts.speak(full_text, voice=voice_name)
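The loop above calls convert_say_as, which SSMD does not provide; your application has to supply it. The following is a minimal sketch of what such a helper might look like, assuming only the "telephone" and "characters" interpretation types need special handling. The function body, the digit-word table, and the fallback behaviour are illustrative assumptions, not library behaviour.

```python
# Hypothetical helper: this is one possible shape, not part of the ssmd package.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def convert_say_as(text: str, interpret_as: str) -> str:
    """Turn annotated text into a plain string a basic TTS engine can read."""
    if interpret_as == "telephone":
        # Read each digit separately and skip punctuation such as "+" and "-".
        return " ".join(DIGIT_WORDS[ch] for ch in text if ch in DIGIT_WORDS)
    if interpret_as == "characters":
        # Spell the text out one character at a time.
        return " ".join(text)
    # Unknown interpretation types fall back to the original text.
    return text


print(convert_say_as("+1-555-0123", "telephone"))
```

Engines with native say-as support may not need a helper like this at all; in that case the segment's say_as attributes can be passed through unchanged.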
Parser Functions
parse_paragraphs
Parse SSMD text into structured paragraphs with sentences and segments.
- ssmd.parse_paragraphs(text: str, *, capabilities: TTSCapabilities | str | None = None, heading_levels: dict | None = None, extensions: dict | None = None, sentence_detection: bool = True, language: str = 'en', use_spacy: bool | None = None, model_size: str | None = None, parse_yaml_header: bool = False, strict_parse: bool = False) -> list[Paragraph]
Parse SSMD text into a list of Paragraphs.
This is the main parsing function. It handles:
- Directive blocks (<div …> … </div>)
- Paragraph and sentence splitting
- All SSMD markup (emphasis, annotations, breaks, etc.)
Args:
- text: SSMD markdown text
- capabilities: TTS capabilities for filtering (optional)
- heading_levels: Custom heading configurations
- extensions: Custom extension handlers
- sentence_detection: If True, split text into sentences
- language: Default language for sentence detection
- use_spacy: If True, use spaCy for sentence detection
- model_size: spaCy model size ("sm", "md", "lg", "trf")
- parse_yaml_header: If True, parse YAML front matter and apply heading/extensions config while stripping it from the body. If False, YAML front matter is preserved as plain text.
- strict_parse: If True, strip unsupported features based on capabilities.
Returns:
List of Paragraph objects
parse_sentences
Parse SSMD text into structured sentences with segments. This is a convenience
wrapper that flattens the paragraphs returned by parse_paragraphs().
- ssmd.parse_sentences(ssmd_text: str, *, capabilities: TTSCapabilities | str | None = None, include_default_voice: bool = True, sentence_detection: bool = True, language: str = 'en', model_size: str | None = None, spacy_model: str | None = None, use_spacy: bool | None = None, heading_levels: dict | None = None, extensions: dict | None = None, parse_yaml_header: bool = False, strict_parse: bool = False) -> list[Sentence]
Parse SSMD text into sentences (backward compatible API).
This is an alias for parse_paragraphs() with the old parameter names. Returned sentences include paragraph_index and sentence_index metadata.
Args:
- ssmd_text (str): SSMD formatted text to parse
- capabilities (TTSCapabilities | str | None): TTS capabilities or preset name, used to filter features based on TTS engine support
- include_default_voice (bool): Include text before the first voice directive (default: True). If False, sentences without voice context are excluded
- sentence_detection (bool): Split text into sentences (default: True)
- language (str): Language code for sentence detection (default: "en")
- model_size (str | None): spaCy model size, one of "sm", "md", "lg", "trf"; the small model is used when unset
- spacy_model (str | None): Full spaCy model name. Deprecated alias; only the model size is inferred from the name, so prefer model_size
- use_spacy (bool | None): If True, force spaCy for sentence detection; if False, use fast regex splitting. The default (None) auto-detects whether spaCy is installed
- heading_levels (dict | None): Custom heading configurations
- extensions (dict | None): Custom extension handlers
- parse_yaml_header (bool): If True, parse YAML front matter and apply heading/extensions config while stripping it from the body. If False, YAML front matter is preserved as plain text
- strict_parse (bool): If True, strip unsupported features based on capabilities
Returns:
List of Sentence objects (alias: SSMDSentence). Each sentence includes paragraph_index and sentence_index metadata.
Example:
from ssmd import parse_sentences
sentences = parse_sentences("Hello *world*! This is great.")
for sent in sentences:
    print(f"Voice: {sent.voice.name if sent.voice else 'default'}")
    print(f"Segments: {len(sent.segments)}")
    for seg in sent.segments:
        print(f" - {seg.text!r} (emphasis={seg.emphasis})")
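Because each returned sentence carries paragraph_index and sentence_index, the flat list can be regrouped into paragraphs when needed. The sketch below shows one way to do that with itertools.groupby. It assumes the sentences arrive in document order and uses SimpleNamespace stand-ins instead of real Sentence objects so that it runs without SSMD installed; group_by_paragraph is a hypothetical helper name.

```python
from itertools import groupby
from types import SimpleNamespace


def group_by_paragraph(sentences):
    """Group a flat, document-ordered sentence list by paragraph_index."""
    return [
        list(group)
        for _, group in groupby(sentences, key=lambda s: s.paragraph_index)
    ]


# Stand-in objects that mimic only the two index attributes.
flat = [
    SimpleNamespace(paragraph_index=0, sentence_index=0),
    SimpleNamespace(paragraph_index=0, sentence_index=1),
    SimpleNamespace(paragraph_index=1, sentence_index=2),
]
paragraphs = group_by_paragraph(flat)
print([len(p) for p in paragraphs])  # prints [2, 1]
```

groupby only merges adjacent items, so sort by paragraph_index first if the list has been reordered.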
Sentence Detection Configuration
Control how sentences are detected and split. SSMD uses phrasplit for intelligent sentence detection with optional spaCy support for maximum accuracy.
Fast Mode (Regex-Based, No spaCy Required)
Fast mode uses regex-based splitting, which needs no extra dependencies and works well for well-formatted text. Select it explicitly with use_spacy=False:
from ssmd import parse_sentences
# Fast regex splitting (works out-of-the-box, no spaCy needed)
sentences = parse_sentences(
    "Hello world. This is fast.",
    use_spacy=False
)
Auto-Detection (Recommended)
By default (use_spacy left unset), SSMD checks whether spaCy is installed and uses it for better accuracy, falling back to regex splitting otherwise:
# Auto-detect: uses spaCy if installed, falls back to regex
sentences = parse_sentences("Hello. World.")
# Works without spaCy, better accuracy with spaCy
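If your own code needs to know which mode will be available, for example to log it or to choose settings up front, you can check for spaCy yourself. This is a minimal sketch using the standard library's importlib.util.find_spec. It illustrates the idea of auto-detection and is not a description of how SSMD implements it; spacy_available and sentence_detection_kwargs are hypothetical helper names.

```python
import importlib.util


def spacy_available() -> bool:
    """Return True if the spacy package can be found in this environment."""
    return importlib.util.find_spec("spacy") is not None


def sentence_detection_kwargs(prefer_accuracy: bool = True) -> dict:
    """Build keyword arguments for parse_sentences (hypothetical helper)."""
    if prefer_accuracy and spacy_available():
        return {"use_spacy": True}
    # Regex splitting always works, with or without spaCy installed.
    return {"use_spacy": False}


print(sentence_detection_kwargs())
```

The resulting dictionary could then be passed as parse_sentences(text, **sentence_detection_kwargs()).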
Model Size Selection
When spaCy is installed, choose different model sizes for quality vs. speed tradeoffs:
# Small model (fast, good accuracy) - DEFAULT
sentences = parse_sentences("Hello. World.")
# Uses: en_core_web_sm, fr_core_news_sm, etc.
# Medium model (better accuracy)
sentences = parse_sentences("Hello. World.", model_size="md")
# Uses: en_core_web_md, fr_core_news_md, etc.
# Large model (best accuracy)
sentences = parse_sentences("Hello. World.", model_size="lg")
# Uses: en_core_web_lg, fr_core_news_lg, etc.
# Transformer model (research-grade quality, slowest)
sentences = parse_sentences("Hello. World.", model_size="trf")
# Uses: en_core_web_trf, fr_dep_news_trf, etc.
Deprecated spacy_model Alias
The spacy_model parameter is retained for backward compatibility and only infers the
model size from the name. Prefer model_size for clarity:
sentences = parse_sentences(
    "Technical text here.",
    spacy_model="en_core_web_lg"  # infers model_size="lg"
)
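When migrating old call sites from spacy_model to model_size, it can help to derive the size from the existing model name. The sketch below shows one plausible way to do so by reading the trailing suffix of the name. It is an assumption about how such inference could work, not SSMD's actual implementation, and infer_model_size is a hypothetical helper.

```python
KNOWN_SIZES = ("sm", "md", "lg", "trf")


def infer_model_size(model_name: str, default: str = "sm") -> str:
    """Return the size suffix of a spaCy model name, or the default."""
    # spaCy model names end in a size suffix, e.g. "en_core_web_lg".
    suffix = model_name.rsplit("_", 1)[-1]
    return suffix if suffix in KNOWN_SIZES else default


print(infer_model_size("en_core_web_lg"))  # prints lg
```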
Multi-Language Support
The model_size parameter works across all spaCy-supported languages:
script = """
<div voice="fr-FR">
Bonjour tout le monde!
</div>
<div voice="en-US">
Hello everyone!
</div>
"""
# Uses fr_core_news_md for French, en_core_web_md for English
sentences = parse_sentences(script, model_size="md")
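To see which models a given language and size combination refers to, and therefore which ones to download, a small lookup can help. The sketch below covers only the English and French model names mentioned on this page; the mapping tables and the model_name helper are illustrative assumptions, and how SSMD resolves model names internally is not specified here.

```python
# Name patterns taken from the examples on this page (English and French only).
MODEL_PATTERNS = {
    "en": "en_core_web_{size}",
    "fr": "fr_core_news_{size}",
}
# Transformer models do not always follow the regular pattern.
TRF_OVERRIDES = {"fr": "fr_dep_news_trf"}


def model_name(language: str, size: str = "sm") -> str:
    """Return the spaCy model name for a language code and model size."""
    base = language.split("-")[0].lower()  # "fr-FR" -> "fr"
    if size == "trf" and base in TRF_OVERRIDES:
        return TRF_OVERRIDES[base]
    if base not in MODEL_PATTERNS:
        raise KeyError(f"no model pattern known for language {language!r}")
    return MODEL_PATTERNS[base].format(size=size)


print(model_name("fr-FR", "md"))  # prints fr_core_news_md
```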
Installation
SSMD works out-of-the-box with fast regex mode. For spaCy support:
# Install spaCy support
pip install "ssmd[spacy]"
# Install models for your languages
python -m spacy download en_core_web_sm # English (small)
python -m spacy download en_core_web_md # English (medium)
python -m spacy download en_core_web_lg # English (large)
python -m spacy download fr_core_news_sm # French
See the spaCy models documentation for a complete list of available models.
Performance Comparison
| Mode | Speed (relative to spaCy sm) | Accuracy | Size | Use Case |
|---|---|---|---|---|
| Regex | 60x faster | 85-90% | 0 MB | Simple text, speed-critical |
| spaCy sm | Baseline | ~95% | ~30 MB | Balanced accuracy/performance |
| spaCy md | Slower | ~97% | ~100 MB | Better accuracy |
| spaCy lg | 2x slower | ~98% | ~500 MB | Best accuracy |
| spaCy trf | 10x slower | ~99%+ | ~1 GB | Research, maximum quality |
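The table can be turned into a simple selection rule. The sketch below picks the largest model that fits a download-size budget and falls back to regex mode when none fits. The sizes are the approximate figures from the table (with ~1 GB taken as 1000 MB), and choose_mode is a hypothetical helper; it looks at size only, so add your own speed constraint if latency matters.

```python
# Approximate download sizes in MB, copied from the table above.
MODEL_SIZES_MB = {"sm": 30, "md": 100, "lg": 500, "trf": 1000}


def choose_mode(budget_mb: int) -> dict:
    """Pick the largest spaCy model within the budget, else regex mode."""
    fitting = [size for size, mb in MODEL_SIZES_MB.items() if mb <= budget_mb]
    if not fitting:
        # Nothing fits: regex splitting needs no model download.
        return {"use_spacy": False}
    best = max(fitting, key=MODEL_SIZES_MB.get)
    return {"use_spacy": True, "model_size": best}


print(choose_mode(120))
```

The returned dictionary uses the use_spacy and model_size parameter names documented for parse_sentences, so it could be passed as keyword arguments.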
parse_segments
Parse SSMD text into segments without sentence grouping.
- ssmd.parse_segments(ssmd_text: str, *, capabilities: TTSCapabilities | str | None = None, voice_context: VoiceAttrs | None = None) -> list[Segment]
Parse SSMD text into segments (backward compatible API).
Parameters:
- ssmd_text (str): SSMD text to parse
- capabilities (TTSCapabilities | str | None): Filter features based on TTS engine support
- voice_context (VoiceAttrs | None): Current voice context
Returns: List of Segment objects (alias: SSMDSegment)
Example:
from ssmd import parse_segments
segments = parse_segments('Call [+1-555-0123]{as="telephone"} now')
for seg in segments:
    if seg.say_as:
        print(f"Say-as: {seg.text!r} as {seg.say_as.interpret_as}")
Data Structures
Paragraph (alias: SSMDParagraph)
Represents a paragraph containing sentences.
- class ssmd.Paragraph(sentences: list[Sentence] = <factory>)
Bases: object
A paragraph containing sentences.
Paragraphs group sentences separated by blank lines in SSMD.
Attributes:
- sentences (list[Sentence]): List of sentences in the paragraph
Sentence (alias: SSMDSentence)
Represents a complete sentence with voice context and segments.
- class ssmd.Sentence(segments: list[Segment] = <factory>, voice: VoiceAttrs | None = None, language: str | None = None, prosody: ProsodyAttrs | None = None, is_paragraph_end: bool = False, paragraph_index: int = 0, sentence_index: int = 0, breaks_after: list[BreakAttrs] = <factory>)
Bases: object
A sentence containing segments with directive context.
Represents a logical sentence unit that should be spoken together. Sentences are split on:
- Directive changes (<div …> blocks)
- Sentence boundaries (.!?) when sentence_detection=True
- Paragraph breaks (blank lines)
Attributes:
- segments (list[Segment]): List of text segments making up the sentence
- voice (VoiceAttrs | None): Voice context for the entire sentence (from <div voice=…> directives)
- language (str | None): Language directive for the sentence
- prosody (ProsodyAttrs | None): Prosody directive for the sentence
- is_paragraph_end (bool): True if the sentence ends with a paragraph break
- paragraph_index (int): Zero-based paragraph index for this sentence
- sentence_index (int): Zero-based sentence index within the document
- breaks_after (list[BreakAttrs]): Pauses after the sentence
- to_ssml(capabilities: TTSCapabilities | None = None, extensions: dict | None = None, wrap_sentence: bool = False, warnings: list[str] | None = None) -> str
Convert sentence to SSML.
Args:
- capabilities: TTS engine capabilities for filtering
- extensions: Custom extension handlers
- wrap_sentence: If True, wrap content in <s> tag
- warnings: Optional list to collect warnings
Returns:
SSML string
Segment (alias: SSMDSegment)
Represents a text segment with associated metadata and features.
- class ssmd.Segment(text: str, emphasis: bool | str = False, prosody: ProsodyAttrs | None = None, language: str | None = None, voice: VoiceAttrs | None = None, say_as: SayAsAttrs | None = None, substitution: str | None = None, phoneme: PhonemeAttrs | None = None, audio: AudioAttrs | None = None, extension: str | None = None, breaks_before: list[BreakAttrs] = <factory>, breaks_after: list[BreakAttrs] = <factory>, marks_before: list[str] = <factory>, marks_after: list[str] = <factory>)
Bases: object
A segment of text with SSMD features.
Represents a portion of text with specific formatting and processing attributes. Segments are the atomic units of SSMD content.
Attributes:
- text (str): Raw text content of this segment
- emphasis (bool | str): Emphasis level (True/"moderate", "strong", "reduced", "none", False)
- prosody (ProsodyAttrs | None): Prosody attributes (volume, rate, pitch)
- language (str | None): Language code for this segment (e.g., "fr-FR")
- voice (VoiceAttrs | None): Inline voice settings for this segment
- say_as (SayAsAttrs | None): Text interpretation hints
- substitution (str | None): Replacement text (alias)
- phoneme (PhonemeAttrs | None): Phonetic pronunciation (with ph and alphabet attributes)
- audio (AudioAttrs | None): Audio file to play
- extension (str | None): Platform-specific extension name
- breaks_before (list[BreakAttrs]): Pauses before this segment
- breaks_after (list[BreakAttrs]): Pauses after this segment
- marks_before (list[str]): Event marker names before this segment
- marks_after (list[str]): Event marker names after this segment
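Since a segment bundles text with the breaks and marks around it, a pipeline often needs to flatten it into an ordered stream of events. The sketch below shows one possible ordering (marks, then breaks, then text, then trailing breaks and marks). That ordering, the segment_events helper, and the SimpleNamespace stand-ins are assumptions made for illustration; only the attribute names come from the Segment and BreakAttrs definitions.

```python
from types import SimpleNamespace


def segment_events(seg):
    """Flatten one segment into an ordered list of (kind, value) events."""
    events = [("mark", name) for name in seg.marks_before]
    events += [("break", brk.time or brk.strength) for brk in seg.breaks_before]
    # Prefer the substitution text when one is present.
    events.append(("text", seg.substitution or seg.text))
    events += [("break", brk.time or brk.strength) for brk in seg.breaks_after]
    events += [("mark", name) for name in seg.marks_after]
    return events


# Stand-in object carrying only the attributes the helper reads.
seg = SimpleNamespace(
    text="H2O",
    substitution="water",
    marks_before=["start"],
    marks_after=[],
    breaks_before=[],
    breaks_after=[SimpleNamespace(time="500ms", strength=None)],
)
print(segment_events(seg))
```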
VoiceAttrs
Voice configuration attributes.
- class ssmd.VoiceAttrs(name: str | None = None, language: str | None = None, gender: Literal['male', 'female', 'neutral'] | None = None, variant: int | None = None)
Bases: object
Voice attributes for TTS voice selection.
Attributes:
- name (str | None): Voice name (e.g., "sarah", "Joanna", "en-US-Wavenet-A")
- language (str | None): BCP-47 language code (e.g., "en-US", "fr-FR")
- gender (Literal['male', 'female', 'neutral'] | None): Voice gender
- variant (int | None): Variant number for disambiguation
ProsodyAttrs
Prosody attributes for controlling volume, rate, and pitch.
- class ssmd.ProsodyAttrs(volume: str | None = None, rate: str | None = None, pitch: str | None = None)
Bases: object
Prosody attributes for volume, rate, and pitch control.
Attributes:
- volume (str | None): Volume level ("silent", "x-soft", "soft", "medium", "loud", "x-loud", or relative like "+10dB")
- rate (str | None): Speech rate ("x-slow", "slow", "medium", "fast", "x-fast", a percentage like "120%", or relative like "+20%")
- pitch (str | None): Pitch level ("x-low", "low", "medium", "high", "x-high", or relative like "-5%" or "+20%")
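Engines that take a numeric speed factor instead of SSML-style strings need the rate value converted. The sketch below shows one way to interpret the three forms listed above: named levels, unsigned percentages, and signed relative changes. The multipliers assigned to the named levels are assumed values chosen for illustration, and rate_to_multiplier is a hypothetical helper.

```python
# Assumed multipliers for the named rates; real engines may use other values.
NAMED_RATES = {
    "x-slow": 0.5, "slow": 0.75, "medium": 1.0, "fast": 1.25, "x-fast": 1.5,
}


def rate_to_multiplier(rate):
    """Convert a prosody rate string into a playback speed multiplier."""
    if rate is None:
        return 1.0
    if rate in NAMED_RATES:
        return NAMED_RATES[rate]
    if rate.endswith("%"):
        value = float(rate[:-1])
        # A leading sign marks a relative change; unsigned values are absolute.
        if rate[0] in "+-":
            return 1.0 + value / 100.0
        return value / 100.0
    raise ValueError(f"unsupported rate value: {rate!r}")


print(rate_to_multiplier("+20%"), rate_to_multiplier("fast"))
```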
BreakAttrs
Pause/break attributes.
- class ssmd.BreakAttrs(time: str | None = None, strength: str | None = None)
Bases: object
Break/pause attributes.
Attributes:
- time (str | None): Break duration (e.g., "500ms", "2s")
- strength (str | None): Break strength ("none", "x-weak", "weak", "medium", "strong", "x-strong")
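When a TTS engine has no native pause support, the time string can be converted to seconds and used to insert silence. This is a minimal sketch that handles only the two unit suffixes shown above ("ms" and "s"); break_seconds is a hypothetical helper, not part of SSMD.

```python
def break_seconds(time: str) -> float:
    """Convert a break duration such as "500ms" or "2s" into seconds."""
    # Check "ms" first, because "500ms" also ends with "s".
    if time.endswith("ms"):
        return float(time[:-2]) / 1000.0
    if time.endswith("s"):
        return float(time[:-1])
    raise ValueError(f"unsupported break duration: {time!r}")


print(break_seconds("500ms"), break_seconds("2s"))  # prints 0.5 2.0
```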
SayAsAttrs
Say-as interpretation attributes.
- class ssmd.SayAsAttrs(interpret_as: str, format: str | None = None, detail: str | None = None)
Bases: object
Say-as attributes for text interpretation.
Attributes:
- interpret_as (str): Interpretation type ("telephone", "date", "cardinal", "ordinal", "characters", "expletive", etc.)
- format (str | None): Optional format string (e.g., "mdy" or "dd.mm.yyyy" for dates)
- detail (str | None): Optional, platform-specific detail level (e.g., "2" for verbosity)
PhonemeAttrs
Phonetic pronunciation attributes.
- class ssmd.PhonemeAttrs(ph: str, alphabet: str = 'ipa')
Bases: object
Phoneme pronunciation attributes.
Attributes:
- ph (str): Phonetic pronunciation string
- alphabet (str): Phonetic alphabet ("ipa" or "x-sampa", default: "ipa")
AudioAttrs
Audio file attributes.
- class ssmd.AudioAttrs(src: str, alt_text: str | None = None, clip_begin: str | None = None, clip_end: str | None = None, speed: str | None = None, repeat_count: int | None = None, repeat_dur: str | None = None, sound_level: str | None = None)
Bases: object
Audio file attributes.
Attributes:
- src (str): Audio file URL or path
- alt_text (str | None): Fallback text if the audio cannot be played
- clip_begin (str | None): Start time for playback (e.g., "0s", "5s", "500ms")
- clip_end (str | None): End time for playback (e.g., "10s", "5000ms")
- speed (str | None): Playback speed as percentage (e.g., "150%", "80%")
- repeat_count (int | None): Number of times to repeat audio
- repeat_dur (str | None): Total duration for repetitions (e.g., "10s")
- sound_level (str | None): Volume adjustment in dB (e.g., "+6dB", "-3dB")
Usage Examples
Basic Parsing
Extract segments from simple text:
from ssmd import parse_segments
text = "Hello *world*! This is ...500ms great."
segments = parse_segments(text)
for seg in segments:
    print(f"Text: {seg.text!r}")
    if seg.emphasis:
        print(" Has emphasis")
    for brk in seg.breaks_after:
        print(f" Break: {brk.time}")
Text Transformations
Handle say-as, substitution, and phoneme features:
from ssmd import parse_segments
text = """
Call [+1-555-0123]{as="telephone"} for info.
[H2O]{sub="water"} is important.
Say [tomato]{ipa="təˈmeɪtoʊ"} correctly.
"""
segments = parse_segments(text)
for seg in segments:
    if seg.say_as:
        print(f"Say-as: {seg.text!r} as {seg.say_as.interpret_as}")
    elif seg.substitution:
        print(f"Substitute: {seg.text!r} → {seg.substitution!r}")
    elif seg.phoneme:
        print(f"Phoneme: {seg.text!r} → {seg.phoneme.ph!r}")
Multi-Voice Dialogue
Process voice blocks separately:
from ssmd import parse_voice_blocks
script = """
<div voice="sarah">
Hello! Call [+1-555-0123]{as="telephone"} for info.
</div>
<div voice="michael">
Thanks *Sarah*!
</div>
"""
blocks = parse_voice_blocks(script)
for voice, text in blocks:
    if voice:
        print(f"{voice.name}: {text.strip()}")
Complete TTS Workflow
Build sentences from segments for TTS processing:
from ssmd import parse_sentences
script = """
<div voice="sarah">
Hello! Call [+1-555-0123]{as="telephone"} for info.
</div>
<div voice="michael">
Thanks *Sarah*!
</div>
"""
# Note: convert_say_as is a placeholder for your own say-as conversion helper
for sentence in parse_sentences(script):
    # Get voice
    voice_name = sentence.voice.name if sentence.voice else "default"
    # Build complete text
    full_text = ""
    metadata = []
    for seg in sentence.segments:
        # Handle transformations
        if seg.say_as:
            text = convert_say_as(seg.text, seg.say_as.interpret_as)
            metadata.append(f"say-as:{seg.say_as.interpret_as}")
        elif seg.substitution:
            text = seg.substitution
        elif seg.phoneme:
            text = seg.text
            metadata.append(f"phoneme:{seg.phoneme.ph}")
        else:
            text = seg.text
        full_text += text
        # Track emphasis
        if seg.emphasis:
            metadata.append("emphasis")
        # Track breaks
        for brk in seg.breaks_after:
            metadata.append(f"break:{brk.time}")
    # Speak with TTS engine (printed here for demonstration)
    print(f"[{voice_name}] {full_text}")
    if metadata:
        print(f" Metadata: {', '.join(metadata)}")
Advanced Sentence Parsing
Control sentence detection and voice filtering:
from ssmd import parse_sentences
text = """
Welcome to the demo.
This is a new paragraph.
<div voice="sarah">
Sarah speaks here.
</div>
"""
sentences = parse_sentences(
    text,
    sentence_detection=True,  # Split by sentences
    include_default_voice=True,  # Include text before voice directive
)
for i, sent in enumerate(sentences, 1):
    voice_name = sent.voice.name if sent.voice else "(default)"
    text_content = "".join(seg.text for seg in sent.segments)
    para_marker = " [PARA_END]" if sent.is_paragraph_end else ""
    print(f"{i}. [{voice_name}] {text_content!r}{para_marker}")
TTS Engine Integration
Example integration with a TTS engine:
from ssmd import parse_sentences
class TTSEngine:
    def speak(self, text: str, voice: str = "default", **kwargs):
        """Speak text with given voice and parameters."""
        print(f"[TTS] Voice: {voice}, Text: {text}")
        # Your TTS implementation here

def process_ssmd_script(script: str, tts: TTSEngine):
    """Process SSMD script with TTS engine."""
    sentences = parse_sentences(script)
    for sentence in sentences:
        # Configure voice
        voice_config = {}
        if sentence.voice:
            if sentence.voice.name:
                voice_config["voice"] = sentence.voice.name
            if sentence.voice.language:
                voice_config["language"] = sentence.voice.language
        # Build text with transformations
        full_text = ""
        for seg in sentence.segments:
            if seg.say_as:
                # handle_say_as is a placeholder for your own say-as conversion helper
                text = handle_say_as(seg.text, seg.say_as)
            elif seg.substitution:
                text = seg.substitution
            elif seg.phoneme:
                # Plain text; pass seg.phoneme to the engine if it supports phonemes
                text = seg.text
            else:
                text = seg.text
            full_text += text
        # Speak with TTS
        tts.speak(full_text, **voice_config)
# Usage
script = """
<div voice="sarah">
Hello! Today's date is [2024-01-15]{as="date" format="mdy"}.
</div>
<div voice="michael">
Thank you for listening!
</div>
"""
tts = TTSEngine()
process_ssmd_script(script, tts)
Capability Filtering
Filter features based on TTS engine capabilities:
from ssmd import parse_sentences
# Parse with pyttsx3 capabilities (limited SSML support)
sentences = parse_sentences(
    'Hello *world*! [Bonjour]{lang="fr"} everyone!',
    capabilities='pyttsx3'
)
# Unsupported features (emphasis, language) are filtered out
for sent in sentences:
    for seg in sent.segments:
        # seg.emphasis will be False (pyttsx3 doesn't support it)
        # seg.language will be None (pyttsx3 doesn't support it)
        print(seg.text)
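Conceptually, capability filtering resets every unsupported feature on a segment to its default value while leaving the text intact. The sketch below illustrates that idea with a small stand-in dataclass. It is not SSMD's implementation: MiniSegment, filter_segment, and the supported-feature set are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass
class MiniSegment:
    """Stand-in with three of the Segment fields documented above."""
    text: str
    emphasis: Union[bool, str] = False
    language: Optional[str] = None


def filter_segment(seg: MiniSegment, supported: set) -> MiniSegment:
    """Return a copy with unsupported features reset to their defaults."""
    return replace(
        seg,
        emphasis=seg.emphasis if "emphasis" in supported else False,
        language=seg.language if "language" in supported else None,
    )


seg = MiniSegment("Bonjour", emphasis=True, language="fr")
plain = filter_segment(seg, supported=set())
print(plain)
```

Returning a copy instead of mutating the input keeps the original parse result available, for example when the same script is rendered for several engines.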
Complete Demo
See examples/parser_demo.py for a comprehensive demonstration:
python examples/parser_demo.py
The demo includes:
- Basic segment parsing
- Text transformations (say-as, substitution, phoneme)
- Voice block handling
- Complete TTS workflow
- Prosody and language annotations
- Advanced sentence parsing
- Mock TTS integration
See Also
- Quick Start: getting started with SSMD
- SSMD Syntax Reference: full description of the SSMD markup
- Examples: more usage examples
- API Reference: complete API reference