GitHub - speechmatics/speechmatics-python-sdk: Python SDKs for Speechmatics APIs
Speechmatics Python SDK provides convenient access to enterprise-grade speech-to-text APIs from Python applications.
Fully typed with type definitions for all request params and response fields. Modern Python with async/await patterns, type hints, and context managers for production-ready code.
55+ Languages • Realtime & Batch • Custom vocabularies • Speaker diarization • Speaker ID
📋 Table of Contents
- Quickstart
- Why Speechmatics?
- Key Features
- Use Cases
- Documentation
- Authentication
- Advanced Configuration
- Deployment Options
- Community & Support
⚡ Quick Start
Installation
# Choose the package for your use case: # Batch transcription pip install speechmatics-batch # Realtime streaming pip install speechmatics-rt # Voice agents pip install speechmatics-voice # Text-to-speech pip install speechmatics-tts
📦 Package Details • Click to see what's included in each package
speechmatics-batch - Async batch transcription API
- Upload audio files for processing
- Get transcripts with timestamps, speakers, entities
- Supports all audio intelligence features
speechmatics-rt - Realtime WebSocket streaming
- Stream audio for live transcription
- Ultra-low latency (150ms p95)
- Partial and final transcripts
speechmatics-voice - Voice agent SDK
- Build conversational AI applications
- Speaker diarization and turn detection
- Optional ML-based smart turn:
pip install speechmatics-voice[smart]
speechmatics-tts - Text-to-speech
- Convert text to natural-sounding speech
- Multiple voices and languages
- Streaming and batch modes
Setting Up Development Environment
git clone https://github.com/speechmatics/speechmatics-python-sdk.git cd speechmatics-python-sdk python -m venv .venv .venv\Scripts\activate # On Mac/Linux: source .venv/bin/activate # Install development dependencies for all SDKs make install-dev # Install pre-commit hooks pre-commit install
Simple and Pythonic! Get your API key at portal.speechmatics.com
Your First Transcription
Note
All examples use load_dotenv() to load your API key from a .env file. Create a .env file with SPEECHMATICS_API_KEY=your_key_here.
There are several different methods of generating your first transcription:
- Batch Transcription - transcribe audio files
- Realtime Streaming - live microphone transcription
- Text-to-Speech - convert text to audio
- Voice Agent - real-time transcription with speaker diarization and turn detection
Batch Transcription
Transcribe audio files:
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) result = await client.transcribe("audio.wav") print(result.transcript_text) await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Realtime Streaming
Live microphone transcription:
import asyncio import os from dotenv import load_dotenv from speechmatics.rt import ( AsyncClient, ServerMessageType, TranscriptionConfig, TranscriptResult, AudioFormat, AudioEncoding, Microphone, ) load_dotenv() CHUNK_SIZE = 4096 async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) mic = Microphone(sample_rate=16000, chunk_size=CHUNK_SIZE) @client.on(ServerMessageType.ADD_TRANSCRIPT) def on_final(message): result = TranscriptResult.from_message(message) if result.metadata.transcript: print(f"[final]: {result.metadata.transcript}") @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT) def on_partial(message): result = TranscriptResult.from_message(message) if result.metadata.transcript: print(f"[partial]: {result.metadata.transcript}") mic.start() try: await client.start_session( transcription_config=TranscriptionConfig(language="en", enable_partials=True), audio_format=AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000), ) print("Speak now...") while True: await client.send_audio(await mic.read(CHUNK_SIZE)) finally: mic.stop() await client.close() asyncio.run(main())
Installation:
pip install speechmatics-rt python-dotenv pyaudio
Text-to-Speech
Convert text to audio:
import asyncio import os from dotenv import load_dotenv from speechmatics.tts import AsyncClient, Voice, OutputFormat load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) response = await client.generate( text="Hello! Welcome to Speechmatics Text-to-Speech", voice=Voice.SARAH, output_format=OutputFormat.WAV_16000 ) audio_data = await response.read() with open("output.wav", "wb") as f: f.write(audio_data) print("Audio saved to output.wav") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-tts python-dotenv
Voice Agent
Real-time transcription with speaker diarization and turn detection:
import asyncio import os from dotenv import load_dotenv from speechmatics.rt import Microphone from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType load_dotenv() async def main(): client = VoiceAgentClient( api_key=os.getenv("SPEECHMATICS_API_KEY"), config=VoiceAgentConfigPreset.load("adaptive") ) @client.on(AgentServerMessageType.ADD_SEGMENT) def on_segment(message): for segment in message.get("segments", []): print(f"[{segment.get('speaker_id', 'S1')}]: {segment.get('text', '')}") @client.on(AgentServerMessageType.END_OF_TURN) def on_turn_end(message): print("[END OF TURN]") mic = Microphone(sample_rate=16000, chunk_size=320) mic.start() try: await client.connect() print("Voice agent ready. Speak now...") while True: await client.send_audio(await mic.read(320)) finally: mic.stop() await client.disconnect() asyncio.run(main())
Installation:
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio
Tip
Ready for more? Explore 20+ working examples at Speechmatics Academy — voice agents, integrations, use cases, and migration guides.
Built for Production
- 99.9% Uptime SLA - Enterprise-grade reliability
- SOC 2 Type II Certified - Your data is secure
- Flexible Deployment - SaaS, on-premises, or air-gapped
Accuracy That Matters
When 1% WER improvement translates to millions in revenue, you need the best.
| Metric | Speechmatics | Deepgram |
|---|---|---|
| Word Error Rate (WER) | 6.8% | 16.5% |
| Languages Supported | 55+ | 30+ |
| Custom dictionary | 1,000 words | 100 words |
| Speaker diarization | Included | Extra charge |
| Realtime translation | 30+ languages | ❌ |
| Sentiment analysis | ✅ | ❌ |
| On-premises | ✅ | Limited |
| On-device | ✅ | ❌ |
| Air-gapped deployment | ✅ | ❌ |
🚀 Key Features
Realtime Transcription
Stream audio and get instant transcriptions with ultra-low latency. Perfect for voice agents, live captioning, and conversational AI.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.rt import Microphone from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType load_dotenv() async def main(): # Voice SDK with adaptive turn detection - optimised for conversational AI client = VoiceAgentClient( api_key=os.getenv("SPEECHMATICS_API_KEY"), config=VoiceAgentConfigPreset.load("adaptive") ) # Handle transcription segments with speaker labels @client.on(AgentServerMessageType.ADD_SEGMENT) def on_segment(message): for segment in message.get("segments", []): print(f"[{segment.get('speaker_id', 'S1')}]: {segment.get('text', '')}") # Detect when speaker finishes their turn @client.on(AgentServerMessageType.END_OF_TURN) def on_turn_end(message): print("[END OF TURN]") mic = Microphone(sample_rate=16000, chunk_size=320) mic.start() try: await client.connect() print("Voice agent ready. Speak now...") while True: await client.send_audio(await mic.read(320)) finally: mic.stop() await client.disconnect() asyncio.run(main())
Installation:
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio
Batch Transcription
Upload audio files and get accurate transcripts with speaker labels, timestamps, and more.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) # Submit job with advanced features job = await client.submit_job( "example.wav", transcription_config=TranscriptionConfig( language="en", diarization="speaker", enable_entities=True, punctuation_overrides={ "permitted_marks": [".", "?", "!"] } ) ) # Wait for completion result = await client.wait_for_completion(job.id, format_type=FormatType.JSON) # Access results print(f"Transcript: {result.transcript_text}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Speaker Diarization
Automatically detect and label different speakers in your audio.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient, TranscriptionConfig load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) job = await client.submit_job( "example.wav", transcription_config=TranscriptionConfig( language="en", diarization="speaker", speaker_diarization_config={ "prefer_current_speaker": True } ) ) result = await client.wait_for_completion(job.id) # Access full transcript with speaker labels print(f"Full transcript:\n{result.transcript_text}\n") # Access individual results with speaker information for result_item in result.results: if result_item.alternatives: alt = result_item.alternatives[0] speaker = alt.speaker or "Unknown" content = alt.content print(f"Speaker {speaker}: {content}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Custom Dictionary
Add domain-specific terms, names, and acronyms for perfect accuracy.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.rt import ( AsyncClient, ServerMessageType, TranscriptionConfig, TranscriptResult, AudioFormat, AudioEncoding, Microphone, ConversationConfig, ) load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") if not api_key: print("Error: SPEECHMATICS_API_KEY not set") return transcript_parts = [] audio_format = AudioFormat( encoding=AudioEncoding.PCM_S16LE, chunk_size=4096, sample_rate=16000, ) transcription_config = TranscriptionConfig( language="en", enable_partials=True, additional_vocab=[ {"content": "Speechmatics", "sounds_like": ["speech mat ics"]}, {"content": "API", "sounds_like": ["A P I", "A. P. I."]}, {"content": "kubernetes", "sounds_like": ["koo ber net ees"]}, {"content": "Anthropic", "sounds_like": ["an throp ik", "an throw pick"]}, {"content": "OAuth", "sounds_like": ["oh auth", "O auth", "O. Auth"]}, {"content": "PostgreSQL", "sounds_like": ["post gres Q L", "post gres sequel"]}, {"content": "Nginx", "sounds_like": ["engine X", "N jinx"]}, {"content": "GraphQL", "sounds_like": ["graph Q L", "graph quel"]}, ], conversation_config=ConversationConfig( end_of_utterance_silence_trigger=0.5, # seconds of silence to trigger end of utterance ), ) mic = Microphone(sample_rate=16000, chunk_size=4096) if not mic.start(): print("PyAudio not installed") return client = AsyncClient(api_key=api_key) @client.on(ServerMessageType.ADD_TRANSCRIPT) def on_final(message): result = TranscriptResult.from_message(message) if result.metadata.transcript: print(f"[final]: {result.metadata.transcript}") transcript_parts.append(result.metadata.transcript) @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT) def on_partial(message): result = TranscriptResult.from_message(message) if result.metadata.transcript: print(f"[partial]: {result.metadata.transcript}") @client.on(ServerMessageType.END_OF_UTTERANCE) def on_utterance_end(message): print("[END OF UTTERANCE]\n") try: await client.start_session( transcription_config=transcription_config, audio_format=audio_format, ) print("Speak now...") while True: await client.send_audio(await mic.read(4096)) except KeyboardInterrupt: pass finally: mic.stop() await client.close() print(f"\nFull transcript: {' '.join(transcript_parts)}") asyncio.run(main())
Installation:
pip install speechmatics-rt python-dotenv pyaudio
55+ Languages
Native models for major languages, not just multilingual Whisper.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient, TranscriptionConfig load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) # Automatic language detection job = await client.submit_job( "audio.wav", transcription_config=TranscriptionConfig(language="auto") ) result = await client.wait_for_completion(job.id) print(f"Detected language transcript: {result.transcript_text}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Audio Intelligence
Get sentiment, topics, summaries, and chapters from your audio.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import ( AsyncClient, JobConfig, JobType, TranscriptionConfig, SentimentAnalysisConfig, TopicDetectionConfig, SummarizationConfig, AutoChaptersConfig ) load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) # Configure job with all audio intelligence features config = JobConfig( type=JobType.TRANSCRIPTION, transcription_config=TranscriptionConfig(language="en"), sentiment_analysis_config=SentimentAnalysisConfig(), topic_detection_config=TopicDetectionConfig(), summarization_config=SummarizationConfig(), auto_chapters_config=AutoChaptersConfig() ) job = await client.submit_job("example.wav", config=config) result = await client.wait_for_completion(job.id) # Access all results print(f"Transcript: {result.transcript_text}") if result.sentiment_analysis: print(f"Sentiment: {result.sentiment_analysis}") if result.topics: print(f"Topics: {result.topics}") if result.summary: print(f"Summary: {result.summary}") if result.chapters: print(f"Chapters: {result.chapters}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Translation
Transcribe and translate simultaneously to multiple languages.
Code example - Click to expand
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import ( AsyncClient, JobConfig, JobType, TranscriptionConfig, TranslationConfig ) load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) config = JobConfig( type=JobType.TRANSCRIPTION, transcription_config=TranscriptionConfig(language="en"), translation_config=TranslationConfig(target_languages=["es", "fr", "de"]) ) job = await client.submit_job("sample.mp4", config=config) result = await client.wait_for_completion(job.id) # Access original transcript print(f"Original (English): {result.transcript_text}\n") # Access translations if result.translations: for lang_code, segments in result.translations.items(): translated_text = " ".join(seg.get("content", "") for seg in segments) print(f"Translated ({lang_code}): {translated_text}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
🔌 Framework Integrations
For more integration examples including Django, Next.js, and production patterns, visit the Speechmatics Academy.
LiveKit Agents (Voice Assistants)
Build real-time voice assistants with LiveKit Agents - a framework for building voice AI applications with WebRTC.
from dotenv import load_dotenv from livekit import agents from livekit.agents import AgentSession, Agent, RoomInputOptions from livekit.plugins import openai, silero, speechmatics, elevenlabs load_dotenv() class VoiceAssistant(Agent): """Voice assistant agent with Speechmatics STT.""" def __init__(self) -> None: super().__init__(instructions="You are a helpful voice assistant. Be concise and friendly.") async def entrypoint(ctx: agents.JobContext): """ Main entrypoint for the voice assistant. Pipeline: LiveKit Room - Speechmatics STT - OpenAI LLM - ElevenLabs TTS - LiveKit Room """ await ctx.connect() # Speech to text: Speechmatics with speaker diarization stt = speechmatics.STT( language="en", speaker_active_format="<{speaker_id}>{text}</{speaker_id}>", focus_speakers=["S1"], ) # Language Model: OpenAI llm = openai.LLM(model="gpt-4o-mini") # Text-to-Speech: ElevenLabs tts = elevenlabs.TTS(voice_id="21m00Tcm4TlvDq8ikWAM") # Voice Activity Detection: Silero vad = silero.VAD.load() # Create and start Agent Session session = AgentSession(stt=stt, llm=llm, tts=tts, vad=vad) await session.start( room=ctx.room, agent=VoiceAssistant(), room_input_options=RoomInputOptions(), ) # Send initial greeting await session.generate_reply(instructions="Say a short hello and ask how you can help.") if __name__ == "__main__": agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
Installation:
pip install livekit-agents livekit-plugins-speechmatics livekit-plugins-openai livekit-plugins-elevenlabs livekit-plugins-silero
Key Features:
- Realtime WebRTC audio streaming
- Speechmatics STT with speaker diarization
- Configurable LLM and TTS providers
- Voice Activity Detection (VAD)
Pipecat AI (Voice Agents)
Build Realtime voice bots with Pipecat - a framework for voice and multimodal conversational AI.
import asyncio import os from dotenv import load_dotenv from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineTask from pipecat.services.openai.llm import OpenAILLMService, OpenAILLMContext from pipecat.services.speechmatics.stt import SpeechmaticsSTTService, Language from pipecat.services.speechmatics.tts import SpeechmaticsTTSService from pipecat.transports.local.audio import LocalAudioTransport load_dotenv() async def main(): # Configure Speechmatics STT with speaker diarization stt = SpeechmaticsSTTService( api_key=os.getenv("SPEECHMATICS_API_KEY"), params=SpeechmaticsSTTService.InputParams( language=Language.EN, speaker_active_format="@{speaker_id}: {text}" ) ) # Configure Speechmatics TTS tts = SpeechmaticsTTSService( api_key=os.getenv("SPEECHMATICS_API_KEY"), voice_id="sarah" ) # Configure LLM (OpenAI, Anthropic, etc.) llm = OpenAILLMService( api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o" ) # Set up conversation context context = OpenAILLMContext([ {"role": "system", "content": "You are a helpful AI assistant."} ]) context_aggregator = llm.create_context_aggregator(context) # Build pipeline: Audio Input -> STT -> LLM -> TTS -> Audio Output transport = LocalAudioTransport() pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant(), ]) # Run the voice bot runner = PipelineRunner() task = PipelineTask(pipeline) print("Voice bot ready! Speak into your microphone...") await runner.run(task) asyncio.run(main())
Installation:
pip install pipecat-ai[speechmatics, openai] pyaudio
Key Features:
- Real-time STT with speaker diarization
- Natural-sounding TTS with multiple voices
- Interruption handling (users can interrupt bot responses)
- Works with any LLM provider (OpenAI, Anthropic, etc.)
📚 Documentation
Package Documentation
Each SDK package includes detailed documentation:
| Package | Documentation | Description |
|---|---|---|
| speechmatics-batch | README • Migration Guide | Async batch transcription |
| speechmatics-rt | README • Migration Guide | Realtime Streaming |
| speechmatics-voice | README | Voice agent SDK |
| speechmatics-tts | README | Text-to-speech |
Speechmatics Academy
Comprehensive collection of working examples, integrations, and templates: github.com/speechmatics/speechmatics-academy
Fundamentals
| Example | Description | Package |
|---|---|---|
| Hello World | Simplest transcription example | Batch |
| Batch vs Realtime | Learn the difference between API modes | Batch, RT |
| Configuration Guide | Common configuration options | Batch |
| Audio Intelligence | Sentiment, topics, and summaries | Batch |
| Multilingual & Translation | 50+ languages and real-time translation | RT |
| Text-to-Speech | Convert text to natural-sounding speech | TTS |
| Turn Detection | Silence-based turn detection | RT |
| Voice Agent Turn Detection | Smart turn detection with presets | Voice |
| Speaker ID & Focus | Speaker identification and focus control | Voice |
| Channel Diarization | Multi-channel transcription | Voice, RT |
Integrations
| Integration | Example | Features |
|---|---|---|
| LiveKit | Simple Voice Assistant | WebRTC, VAD, diarization, LLM, TTS |
| LiveKit | Telephony with Twilio | Phone calls via SIP, Krisp noise cancellation |
| Pipecat | Simple Voice Bot | Local audio, VAD, LLM, TTS |
| Pipecat | Voice Bot (Web) | Browser-based WebRTC |
| Twilio | Outbound Dialer | Media Streams, ElevenLabs TTS |
| VAPI | Voice Assistant | Voice AI platform integration |
Use Cases
| Industry | Example | Features |
|---|---|---|
| Healthcare | Medical Transcription | Realtime, custom medical vocabulary |
| Media | Video Captioning | SRT generation, batch processing |
| Contact Center | Call Analytics | Channel diarization, sentiment, topics |
| Business | AI Receptionist | LiveKit, Twilio SIP, Google Calendar |
| Seasonal | Santa Voice Agent | LiveKit, Twilio SIP, ElevenLabs TTS, custom voice |
Migration Guides
| From | Guide | Status |
|---|---|---|
| Deepgram | Migration Guide | Available |
Official Documentation
- API Reference - Complete API documentation
- SDK Repository - Python SDK source code
- Developer Portal - Get your API key
🔄 Migrating from speechmatics-python?
The legacy speechmatics-python package has been deprecated. This new SDK offers:
- Cleaner API - More Pythonic, better type hints
- More features - Sentiment, translation, summarization
- Better docs - Comprehensive examples and guides
Migration Guide
speechmatics-python:
from speechmatics.models import BatchTranscriptionConfig from speechmatics.batch_client import BatchClient with BatchClient("API_KEY") as client: job_id = client.submit_job("audio.wav", BatchTranscriptionConfig("en")) transcript = client.wait_for_completion(job_id, transcription_format='txt') print(transcript)
speechmatics-python-sdk:
import asyncio from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType async def main(): client = AsyncClient(api_key="API_KEY") job = await client.submit_job( "audio.wav", transcription_config=TranscriptionConfig(language="en") ) result = await client.wait_for_completion(job.id, format_type=FormatType.TXT) print(result) await client.close() asyncio.run(main())
Full Migration Guides: Batch Migration Guide • Realtime Migration Guide
💡 Use Cases
Healthcare & Medical
HIPAA-compliant transcription for clinical notes, patient interviews, and telemedicine.
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient, TranscriptionConfig load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") client = AsyncClient(api_key=api_key) # Use medical domain for better accuracy with clinical terminology job = await client.submit_job( "patient_interview.wav", transcription_config=TranscriptionConfig( language="en", domain="medical", additional_vocab=[ {"content": "hypertension"}, {"content": "metformin"}, {"content": "echocardiogram"}, {"content": "MRI", "sounds_like": ["M R I"]}, {"content": "CT scan", "sounds_like": ["C T scan"]} ] ) ) result = await client.wait_for_completion(job.id) print(f"Transcript:\n{result.transcript_text}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Voice Agents & Conversational AI
Build Alexa-like experiences with real-time transcription and speaker detection.
import asyncio import os from dotenv import load_dotenv from speechmatics.rt import Microphone from speechmatics.voice import ( VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType, ) load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") # Load a preset configuration (options: adaptive, scribe, captions, external, fast) config = VoiceAgentConfigPreset.load("adaptive") # Initialize microphone mic = Microphone(sample_rate=16000, chunk_size=320) if not mic.start(): print("PyAudio not available - install with: pip install pyaudio") return # Create voice agent client client = VoiceAgentClient(api_key=api_key, config=config) @client.on(AgentServerMessageType.ADD_SEGMENT) def on_segment(message): for segment in message.get("segments", []): speaker_id = segment.get("speaker_id", "S1") text = segment.get("text", "") print(f"[{speaker_id}]: {text}") @client.on(AgentServerMessageType.END_OF_TURN) def on_turn_end(message): print("[END OF TURN]") try: await client.connect() print("Voice agent started. Speak into your microphone (Ctrl+C to stop)...") while True: audio_chunk = await mic.read(320) await client.send_audio(audio_chunk) except KeyboardInterrupt: print("\nStopping...") finally: mic.stop() await client.disconnect() asyncio.run(main())
Installation:
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio
📂 More Use Cases • Click to explore Call Center, Healthcare, Media & Entertainment, Education, and Meetings examples
Call Center Analytics
Transcribe calls with speaker diarization, sentiment analysis, and topic detection.
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import ( AsyncClient, JobConfig, JobType, TranscriptionConfig, SummarizationConfig, SentimentAnalysisConfig, TopicDetectionConfig ) load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") client = AsyncClient(api_key=api_key) config = JobConfig( type=JobType.TRANSCRIPTION, transcription_config=TranscriptionConfig( language="en", diarization="speaker" ), sentiment_analysis_config=SentimentAnalysisConfig(), topic_detection_config=TopicDetectionConfig(), summarization_config=SummarizationConfig( content_type="conversational", summary_length="brief" ) ) job = await client.submit_job("call_recording.wav", config=config) result = await client.wait_for_completion(job.id) print(f"Transcript:\n{result.transcript_text}\n") if result.sentiment_analysis: segments = result.sentiment_analysis.get("segments", []) counts = {"positive": 0, "negative": 0, "neutral": 0} for seg in segments: sentiment = seg.get("sentiment", "").lower() if sentiment in counts: counts[sentiment] += 1 overall = max(counts, key=counts.get) print(f"Sentiment: {overall.capitalize()}") print(f"Breakdown: {counts['positive']} positive, {counts['neutral']} neutral, {counts['negative']} negative") if result.topics and 'summary' in result.topics: overall = result.topics['summary']['overall'] topics = [topic for topic, count in overall.items() if count > 0] print(f"Topics: {', '.join(topics)}") if result.summary: print(f"Summary: {result.summary.get('content')}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Media & Entertainment
Add captions, create searchable archives, generate clips from keywords.
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") client = AsyncClient(api_key=api_key) job = await client.submit_job( "movie.mp4", transcription_config=TranscriptionConfig(language="en") ) # Get SRT captions captions = await client.wait_for_completion(job.id, format_type=FormatType.SRT) # Save captions with open("movie.srt", "w", encoding="utf-8") as f: f.write(captions) print("Captions saved to movie.srt") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Education & E-Learning
Auto-generate lecture transcripts, searchable course content, and accessibility captions.
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") client = AsyncClient(api_key=api_key) job = await client.submit_job( "lecture_recording.wav", transcription_config=TranscriptionConfig( language="en", diarization="speaker", enable_entities=True ) ) result = await client.wait_for_completion(job.id) # Save transcript with open("lecture_transcript.txt", "w", encoding="utf-8") as f: f.write(result.transcript_text) # Save SRT captions for accessibility captions = await client.wait_for_completion(job.id, format_type=FormatType.SRT) with open("lecture_captions.srt", "w", encoding="utf-8") as f: f.write(captions) print("Transcript and captions saved") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Meetings
Turn meetings into searchable, actionable summaries with action items and key decisions.
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import ( AsyncClient, JobConfig, JobType, TranscriptionConfig, SummarizationConfig, AutoChaptersConfig ) load_dotenv() async def main(): api_key = os.getenv("SPEECHMATICS_API_KEY") client = AsyncClient(api_key=api_key) config = JobConfig( type=JobType.TRANSCRIPTION, transcription_config=TranscriptionConfig( language="en", diarization="speaker" ), summarization_config=SummarizationConfig(), auto_chapters_config=AutoChaptersConfig() ) job = await client.submit_job("board_meeting.mp4", config=config) result = await client.wait_for_completion(job.id) print(f"Transcript:\n{result.transcript_text}\n") if result.summary: summary = result.summary.get('content', 'N/A') print(f"Summary:\n{summary}\n") if result.chapters: print("Chapters:") for i, chapter in enumerate(result.chapters, 1): print(f"{i}. {chapter}") await client.close() asyncio.run(main())
Installation:
pip install speechmatics-batch python-dotenv
Architecture
Realtime Flow
sequenceDiagram
participant App as Your App
participant SM as Speechmatics RT
App->>SM: Connect WebSocket (WSS)
App->>SM: StartRecognition (config, audio format)
SM->>App: RecognitionStarted
loop Stream Audio
App->>SM: Audio Chunks (binary)
SM->>App: AudioAdded (ack)
SM->>App: AddPartialTranscript (JSON)
SM->>App: AddTranscript (JSON, final)
end
App->>SM: EndOfStream
SM->>App: EndOfTranscript
Batch Flow
sequenceDiagram
participant App as Your App
participant API as Batch API
participant Queue as Job Queue
participant Engine as Transcription Engine
App->>API: POST /jobs (upload audio)
API->>Queue: Enqueue job
API->>App: Return job_id
Queue->>Engine: Process audio
Engine->>Queue: Store results
loop Poll Status
App->>API: GET /jobs/{id}
API->>App: Status: running/done
end
App->>API: GET /jobs/{id}/transcript
API->>App: Return transcript (JSON/TXT/SRT)
Authentication
Caution
Security Best Practice: Never hardcode API keys in your source code. Always use environment variables or secure secret management systems.
Environment Variable (Recommended)
export SPEECHMATICS_API_KEY="your_api_key_here"
import asyncio import os from dotenv import load_dotenv from speechmatics.batch import AsyncClient load_dotenv() async def main(): client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) # Use client here # ... await client.close() asyncio.run(main())
JWT Token (Temporary Keys)
Warning
Browser Security: For browser-based transcription, always use temporary JWT tokens to avoid exposing your long-lived API key. Pass the token as a query parameter: wss://eu2.rt.speechmatics.com/v2?jwt=<token>
import asyncio from speechmatics.batch import AsyncClient, JWTAuth async def main(): # Generate temporary token (expires after ttl seconds) auth = JWTAuth(api_key="your_api_key", ttl=3600) client = AsyncClient(auth=auth) # Use client here # ... await client.close() asyncio.run(main())
Advanced Configuration
Connection Settings
import asyncio from speechmatics.rt import AsyncClient, ConnectionConfig async def main(): # Configure WebSocket connection parameters conn_config = ConnectionConfig( ping_timeout=60.0, # Timeout waiting for pong response (seconds) ping_interval=20.0, # Interval for WebSocket ping frames (seconds) open_timeout=30.0, # Timeout for establishing connection (seconds) close_timeout=10.0 # Timeout for closing connection (seconds) ) client = AsyncClient( api_key="KEY", url="wss://eu2.rt.speechmatics.com/v2", conn_config=conn_config ) # Use client here # ... await client.close() asyncio.run(main())
Retry & Error Handling
import asyncio from speechmatics.batch import AsyncClient, TranscriptionConfig from speechmatics.batch import BatchError, JobError, AuthenticationError from tenacity import retry, stop_after_attempt, wait_exponential @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10) ) async def transcribe_with_retry(audio_file): client = AsyncClient(api_key="YOUR_API_KEY") try: job = await client.submit_job( audio_file, transcription_config=TranscriptionConfig(language="en") ) return await client.wait_for_completion(job.id) except AuthenticationError: print("Authentication failed") raise except (BatchError, JobError) as e: print(f"Transcription failed: {e}") raise finally: await client.close() asyncio.run(transcribe_with_retry("audio.wav"))
Custom HTTP Client (Batch)
import asyncio from speechmatics.batch import AsyncClient, ConnectionConfig async def main(): # Configure HTTP connection settings for batch API conn_config = ConnectionConfig( connect_timeout=30.0, # Timeout for connection establishment operation_timeout=300.0 # Default timeout for API operations ) client = AsyncClient( api_key="KEY", conn_config=conn_config ) # Use client here # ... await client.close() asyncio.run(main())
Deployment Options
Cloud (SaaS)
Zero infrastructure - just sign up and start transcribing.
import asyncio from speechmatics.batch import AsyncClient async def main(): client = AsyncClient(api_key="YOUR_API_KEY") # Uses global SaaS endpoints automatically await client.close() asyncio.run(main())
Docker Container
Run Speechmatics on your own hardware.
docker pull speechmatics/transcription-engine:latest docker run -p 9000:9000 speechmatics/transcription-engine
import asyncio from speechmatics.batch import AsyncClient async def main(): client = AsyncClient( api_key="YOUR_LICENSE_KEY", url="http://localhost:9000/v2" ) # Use on-premises instance await client.close() asyncio.run(main())
Kubernetes
Scale transcription with k8s orchestration.
# Install the sm-realtime chart helm upgrade --install speechmatics-realtime \ oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \ --version 0.7.0 \ --set proxy.ingress.url="speechmatics.example.com"
🧪 Testing Your Integration
The 5-Minute Test: Can you install, authenticate, and run a successful transcription in under 5 minutes?
# 1. Install (30 seconds) pip install speechmatics-batch python-dotenv # 2. Set API key (30 seconds) export SPEECHMATICS_API_KEY="your_key_here" # 3. Run test (4 minutes) python3 << 'EOF' import asyncio import os from speechmatics.batch import AsyncClient, TranscriptionConfig, AuthenticationError from dotenv import load_dotenv load_dotenv() async def test(): api_key = os.getenv("SPEECHMATICS_API_KEY") # Replace with your audio file path audio_file = "your_audio_file.wav" client = AsyncClient(api_key=api_key) try: print("Submitting transcription job...") job = await client.submit_job(audio_file, transcription_config=TranscriptionConfig(language="en")) print(f"Job submitted: {job.id}") print("Waiting for completion...") result = await client.wait_for_completion(job.id) print(f"\nTranscript: {result.transcript_text}") print("\nTest completed successfully!") except AuthenticationError as e: print(f"\nAuthentication Error: {e}") finally: await client.close() asyncio.run(test()) EOF
If this fails, open an issue - we prioritize developer experience.
Community & Support
Get Help
- GitHub Discussions: Ask questions, share projects
- Stack Overflow: Tag with
speechmatics - Email Support: devrel@speechmatics.com
- Status Page: status.speechmatics.com
Show Your Support
Share what you built:
- Tweet with @Speechmatics
- Post in Show & Tell
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔗 Links
- Website: speechmatics.com
- Documentation: docs.speechmatics.com
- Portal: portal.speechmatics.com
- Status Page: status.speechmatics.com
- Blog: speechmatics.com/blog
- GitHub: @speechmatics