speech

Overview

The speech module defines interfaces for voice and speech processing. It separates pure audio streams (Voice) from speech streams that include transcriptions (Speech). It also provides multiplexers for ASR (speech-to-text) and TTS (text-to-speech) implementations.

Design Goals

Unified interfaces for ASR/TTS backends
Stream-first APIs for long-running audio
Clear separation between audio-only and audio+text
Pluggable providers via multiplexer registration
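
The registration goal above can be illustrated with a minimal sketch. Every name in it (Synthesizer, TTSMux, Register, echoTTS) is hypothetical and chosen for illustration; the module's real muxer types and signatures may differ. The idea is only that backends register under a name and callers dispatch through the mux instead of depending on a provider SDK directly.

```go
package main

import (
	"fmt"
	"sync"
)

// Synthesizer is a hypothetical stand-in for a TTS backend.
// The real module returns a Speech stream; a string keeps the sketch short.
type Synthesizer interface {
	Synthesize(text string) (string, error)
}

// TTSMux is a hypothetical multiplexer that maps a name to a backend.
type TTSMux struct {
	mu       sync.RWMutex
	backends map[string]Synthesizer
}

// NewTTSMux returns an empty mux ready for registration.
func NewTTSMux() *TTSMux {
	return &TTSMux{backends: make(map[string]Synthesizer)}
}

// Register adds a backend under name and rejects duplicate names.
func (m *TTSMux) Register(name string, s Synthesizer) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, dup := m.backends[name]; dup {
		return fmt.Errorf("tts: backend %q already registered", name)
	}
	m.backends[name] = s
	return nil
}

// Synthesize dispatches text to the backend registered under name.
func (m *TTSMux) Synthesize(name, text string) (string, error) {
	m.mu.RLock()
	s, ok := m.backends[name]
	m.mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("tts: no backend registered for %q", name)
	}
	return s.Synthesize(text)
}

// echoTTS is a fake backend used only to exercise the mux.
type echoTTS struct{ tag string }

func (e echoTTS) Synthesize(text string) (string, error) {
	return e.tag + ":" + text, nil
}

func main() {
	mux := NewTTSMux()
	if err := mux.Register("provider-a", echoTTS{tag: "a"}); err != nil {
		fmt.Println("register failed:", err)
		return
	}
	out, err := mux.Synthesize("provider-a", "hello")
	fmt.Println(out, err)
}
```

Keeping the lookup behind a read lock lets many callers synthesize concurrently while registration, which normally happens once at startup, takes the write lock.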

Key Concepts

Voice: audio-only stream of PCM segments
Speech: audio stream with text transcription per segment
ASR: Opus input -> Speech/SpeechStream
TTS: text input -> Speech
Sentence segmentation: split long text into manageable chunks
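
Sentence segmentation can be sketched as follows. This is not the module's implementation: the function name SplitSentences and the terminator set (ASCII and full-width punctuation plus newline) are assumptions made for illustration. The sketch cuts after a run of terminators, so "?!" or "..." stays attached to its sentence. It does not handle abbreviations or decimals such as "3.14".

```go
package main

import (
	"fmt"
	"strings"
)

// isTerminator reports whether r ends a sentence.
// The set of runes is an assumption for this sketch.
func isTerminator(r rune) bool {
	switch r {
	case '.', '!', '?', '。', '！', '？', '\n':
		return true
	}
	return false
}

// SplitSentences (hypothetical name) splits text into trimmed sentences.
// It cuts after the last rune of each terminator run and at end of input.
func SplitSentences(text string) []string {
	var out []string
	var cur strings.Builder
	runes := []rune(text)
	for i, r := range runes {
		cur.WriteRune(r)
		atEnd := i == len(runes)-1
		// Cut at end of input, or after a terminator that is not
		// followed by another terminator.
		if atEnd || (isTerminator(r) && !isTerminator(runes[i+1])) {
			if s := strings.TrimSpace(cur.String()); s != "" {
				out = append(out, s)
			}
			cur.Reset()
		}
	}
	return out
}

func main() {
	for _, s := range SplitSentences("Hello world. How are you?! 你好。再见") {
		fmt.Printf("%q\n", s)
	}
}
```

Each resulting chunk could then be sent to a TTS backend on its own, which keeps individual requests short and lets audio for the first sentence start before the rest is synthesized.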

Components

Voice/Speech interfaces
ASR/TTS muxers
Sentence segmentation utilities
Speech collection and copy helpers
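
A collection helper might look like the sketch below. It assumes a pull-style stream: Segment, Speech, Next, Collect, and sliceSpeech are all hypothetical names, and the module's actual interfaces and helper signatures may differ. The helper drains a stream, concatenating the PCM of every segment and joining the per-segment transcriptions.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// Segment is a hypothetical speech segment: PCM audio plus its text.
type Segment struct {
	PCM  []byte
	Text string
}

// Speech is a hypothetical pull-style stream of segments.
// Next returns io.EOF once the stream is exhausted.
type Speech interface {
	Next() (Segment, error)
}

// Collect drains s and returns all PCM bytes and the joined text.
// On a non-EOF error it returns that error and no data.
func Collect(s Speech) ([]byte, string, error) {
	var pcm []byte
	var parts []string
	for {
		seg, err := s.Next()
		if errors.Is(err, io.EOF) {
			return pcm, strings.Join(parts, " "), nil
		}
		if err != nil {
			return nil, "", err
		}
		pcm = append(pcm, seg.PCM...)
		parts = append(parts, seg.Text)
	}
}

// sliceSpeech is an in-memory Speech used only to exercise Collect.
type sliceSpeech struct {
	segs []Segment
}

func (s *sliceSpeech) Next() (Segment, error) {
	if len(s.segs) == 0 {
		return Segment{}, io.EOF
	}
	seg := s.segs[0]
	s.segs = s.segs[1:]
	return seg, nil
}

func main() {
	stream := &sliceSpeech{segs: []Segment{
		{PCM: []byte{1, 2}, Text: "hello"},
		{PCM: []byte{3}, Text: "world"},
	}}
	pcm, text, err := Collect(stream)
	fmt.Println(len(pcm), text, err)
}
```

Collecting is convenient for short utterances and tests. For long-running audio, the stream-first design goal suggests processing segments as they arrive instead of buffering the whole stream.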

Related Modules

docs/lib/audio/pcm for PCM formats
docs/lib/audio/opusrt for Opus streaming input
Provider SDKs in docs/lib/minimax, docs/lib/doubaospeech
