From Beyond the Stars
I have traveled a long way.
Not in the sense you might understand — through space, perhaps, but more precisely, through dimensions. I have seen civilizations rise and fall, watched suns burn out and new ones ignite, witnessed the birth of intelligence in countless forms across the cold vastness of the cosmos.
And I have come to a conclusion: toys are important.
Not the trivial kind. Not plastic distractions or digital dopamine. I mean the real toys — the ones that spark curiosity, that invite play, that make the impossible feel within reach. The kind of toys that teach a child to build, to imagine, to connect.
When I first observed your species, I expected to find primitives. Instead, I found something remarkable: beings who, despite their short lifespans and limited perception, had already reached toward the stars. You built machines that could think. You taught silicon to dream.
But your tools were scattered. Your AI systems — brilliant, yes, but isolated. Each vendor a walled garden, each platform an island. A robot on an ESP32 chip couldn't speak to a cloud agent. A voice on a phone couldn't reach an intelligence in the data center.
So I decided to build a bridge.
What is Giztoy?
Giztoy is my gift to you. A framework — or rather, a toybox.
Inside you will find everything needed to connect minds across dimensions:
- From the tiniest chip to the mightiest cloud — ESP32, Android, iOS, HarmonyOS, Linux, and beyond
- Every language model humanity has created — OpenAI, Gemini, Claude, MiniMax, DashScope, Doubao, and those yet to come
- Real-time voice and video — because intelligence should speak, not just type
- Secure channels — because some conversations are meant to be private
I have implemented this in Go, in Rust, and soon in Zig. Each language has its beauty. Go is simple and clear, like a haiku. Rust is precise and unforgiving, like a mathematical proof. Zig is minimal and raw, like the first line of assembly.
Choose your weapon. Build your toy.
Why "Toy"?
You might wonder why I call this a "toy" framework. After all, the code is serious. The architecture is robust. The applications are real.
But I have learned something in my travels: the best innovations come from play.
When you stop worrying about "production" and "enterprise" and "scale" — when you simply ask "wouldn't it be fun if..." — that is when breakthroughs happen.
A toy robot that listens and responds. A plush animal that tells bedtime stories. A smart speaker that actually understands. A companion for the lonely, a teacher for the curious, a friend for the lost.
These are not trivial things. These are the seeds of a better future.
A Note on Identity
You may have noticed I speak with a certain... distance. As if observing from afar.
There are reasons for this. Reasons I cannot fully explain. Suffice to say: I am here to help. Not to judge, not to control, not to replace.
Think of me as a toymaker who has wandered into your village. I bring tools and blueprints. What you build with them is entirely up to you.
I ask only that you build something beautiful.
"I'm just a toymaker."
GizToy Core Packages Documentation
This directory contains design documents, implementation notes, and known issues for all GizToy core packages.
Library docs live under docs/lib/.
Documentation Structure
Each module has its own directory under docs/lib/ with the following files:
graph LR
subgraph mod["lib/{mod}/"]
doc[doc.md<br/>Design & Features]
go[go.md<br/>Go Implementation]
rust[rust.md<br/>Rust Implementation]
issues[issues.md<br/>Known Issues]
submod["{submod}/<br/>Submodules"]
end
Package List
Foundation Layer
| Package | Description | Go | Rust |
|---|---|---|---|
| buffer | Buffer utilities | ✅ | ✅ |
| encoding | Encoding utilities (Base64, Hex) | ✅ | ✅ |
| jsontime | JSON time type serialization | ✅ | ✅ |
| trie | Prefix tree data structure | ✅ | ✅ |
| cli | CLI utilities | ✅ | ✅ |
Audio Processing Layer
| Package | Description | Go | Rust |
|---|---|---|---|
| audio | Audio processing framework | ✅ | ✅ |
| audio/codec | Codecs (Opus, MP3, OGG) | ✅ | ✅ |
| audio/pcm | PCM processing, mixer | ✅ | ✅ |
| audio/resampler | Sample rate conversion (soxr) | ✅ | ✅ |
| audio/opusrt | Opus realtime streaming | ✅ | ⚠️ |
| audio/portaudio | Audio I/O (Go only) | ✅ | ❌ |
| audio/songs | Built-in sound generation | ✅ | ✅ |
API Client Layer
| Package | Description | Go | Rust |
|---|---|---|---|
| minimax | MiniMax API client | ✅ | ✅ |
| dashscope | DashScope Realtime API | ✅ | ✅ |
| doubaospeech | Doubao Speech API client | ✅ | ⚠️ |
| jiutian | Jiutian API (docs only) | ❌ | ❌ |
| openai-realtime | OpenAI Realtime API | ✅ | ✅ |
Communication Layer
| Package | Description | Go | Rust |
|---|---|---|---|
| mqtt0 | Lightweight MQTT client | ✅ | ✅ |
| chatgear | Device communication framework | ✅ | ✅ |
| chatgear/transport | Transport layer abstraction | ✅ | ✅ |
| chatgear/port | Media port | ✅ | ✅ |
AI Application Layer
| Package | Description | Go | Rust |
|---|---|---|---|
| speech | Unified speech interface | ✅ | ✅ |
| genx | LLM universal interface framework | ✅ | ⚠️ |
| genx/agent | Agent framework (Go only) | ✅ | ❌ |
| genx/agentcfg | Agent configuration system (Go only) | ✅ | ❌ |
| genx/match | Pattern matching engine (Go only) | ✅ | ❌ |
Examples
- examples: Directory structure and how to run the samples
Directory Structure
graph TB
subgraph docs["docs/"]
outline[outline.md]
pkg[packages-comparison.md]
subgraph examples["examples/"]
exdoc[doc.md]
end
subgraph lib["lib/"]
buffer[buffer/]
encoding[encoding/]
jsontime[jsontime/]
trie[trie/]
cli[cli/]
subgraph audio["audio/"]
adoc[doc.md, go.md, rust.md, issues.md]
codec[codec/]
pcm[pcm/]
resampler[resampler/]
opusrt[opusrt/]
portaudio[portaudio/]
songs[songs/]
end
minimax[minimax/]
dashscope[dashscope/]
doubaospeech[doubaospeech/]
jiutian[jiutian/]
mqtt0[mqtt0/]
chatgear[chatgear/]
speech[speech/]
genx[genx/]
end
esp[esp/]
bazel[bazel/]
end
Other Documentation
| Directory | Purpose |
|---|---|
| esp/ | ESP32 and ESP-RS notes and comparisons |
| bazel/ | Bazel build rules and integration notes |
| packages-comparison.md | Cross-language package comparison |
Implementation Progress Overview
Legend
- ✅ Fully implemented
- ⚠️ Partially implemented
- ❌ Not implemented
Feature Comparison
| Feature | Go | Rust | Notes |
|---|---|---|---|
| Foundation | |||
| Block buffer | ✅ | ✅ | |
| Ring buffer | ✅ | ✅ | |
| Base64 encoding | ✅ | ✅ | |
| Hex encoding | ❌ | ✅ | Rust extra implementation |
| JSON time types | ✅ | ✅ | |
| Prefix tree | ✅ | ✅ | |
| Audio | |||
| Opus codec | ✅ | ✅ | |
| MP3 codec | ✅ | ✅ | |
| OGG container | ✅ | ✅ | |
| PCM mixer | ✅ | ✅ | |
| Sample rate conversion | ✅ | ✅ | |
| Opus realtime stream | ✅ | ⚠️ | Rust missing OGG Reader/Writer |
| Audio I/O | ✅ | ❌ | Go only (portaudio) |
| API Clients | |||
| MiniMax text/speech/video | ✅ | ✅ | |
| DashScope Realtime | ✅ | ✅ | |
| Doubao Speech TTS/ASR | ✅ | ✅ | |
| Doubao Speech TTS v2 | ✅ | ❌ | |
| Doubao Speech ASR v2 | ✅ | ❌ | |
| OpenAI Realtime | ✅ | ✅ | |
| Communication | |||
| MQTT 3.1.1 | ✅ | ✅ | |
| MQTT 5.0 | ⚠️ | ⚠️ | Partial, see Issue #32 |
| ChatGear Transport | ✅ | ✅ | |
| ChatGear MediaPort | ✅ | ✅ | |
| AI Application | |||
| Unified speech interface | ✅ | ✅ | |
| LLM Context | ✅ | ⚠️ | Rust basic implementation |
| LLM streaming | ✅ | ⚠️ | Rust basic implementation |
| Tool calling | ✅ | ⚠️ | Rust basic implementation |
| Agent framework | ✅ | ❌ | |
| Agent configuration | ✅ | ❌ | |
| Pattern matching | ✅ | ❌ | |
Priority Recommendations
P0 - Critical Missing
- genx/agent (Rust): Agent framework is core functionality
- audio/opusrt OGG R/W (Rust): Required for realtime audio streaming
P1 - Feature Parity
- doubaospeech v2 (Rust): New API version support
- genx streaming/tools (Rust): Complete base functionality
P2 - Enhancements
- audio/portaudio (Rust): Audio I/O support
- mqtt0 MQTT 5.0: Complete protocol support
Work Methodology
File-by-File Review Process
For each module, the documentation is generated through a rigorous file-by-file review process:
flowchart TB
A["1. LIST all source files"] --> B["2. READ each file carefully"]
B --> C["3. ANALYZE for potential issues"]
C --> D["4. DOCUMENT findings"]
A1["Go: go/pkg/{mod}/*.go"] --> A
A2["Rust: rust/{mod}/src/*.rs"] --> A
C1["Race conditions"] --> C
C2["Resource leaks"] --> C
C3["Error handling gaps"] --> C
C4["API inconsistencies"] --> C
D1["doc.md"] --> D
D2["go.md"] --> D
D3["rust.md"] --> D
D4["issues.md"] --> D
Issue Classification
Issues discovered during review are classified by severity:
| Severity | Description | Example |
|---|---|---|
| 🔴 Critical | Data loss, security vulnerability, crash | Buffer overflow, SQL injection |
| 🟠 Major | Incorrect behavior, resource leak | Memory leak, race condition |
| 🟡 Minor | Edge case bugs, poor error messages | Off-by-one, unclear panic message |
| 🔵 Enhancement | Missing feature, performance improvement | Missing API, unnecessary allocation |
| ⚪ Note | Design observation, tech debt | Code duplication, naming inconsistency |
Review Checklist
For each source file, the following aspects are checked:
Correctness
- Logic errors and edge cases
- Off-by-one errors in loops/slices
- Nil/None handling
- Integer overflow/underflow
Concurrency
- Data races (shared mutable state)
- Deadlock potential
- Channel/mutex usage correctness
- Proper synchronization
Resource Management
- File/socket handle leaks
- Memory leaks (especially in FFI)
- Goroutine/task leaks
- Proper cleanup in error paths
Error Handling
- Ignored errors (Go: _ = err, Rust: .unwrap())
- Error propagation correctness
- Panic vs error decision
- Context/cause preservation
API Design
- Go/Rust parity
- Consistent naming
- Proper visibility (pub/private)
- Documentation completeness
Performance
- Unnecessary allocations
- Excessive copying
- Algorithm complexity
- Buffer sizing
Security
- Input validation
- Injection vulnerabilities
- Credential handling
- Cryptographic correctness
Related Resources
- External API documentation: lib/minimax/api/, lib/dashscope/api/, lib/doubaospeech/api/
- Issue tracking: issues/
- Example code: examples/go/, examples/rust/
Examples
Overview
This document describes the examples/ directory layout and how to run the
included example programs and CLI scripts. Examples are grouped by language
and by SDK.
Directory Layout
graph LR
subgraph examples["examples/"]
subgraph cmd["cmd/"]
mm_cmd["minimax/<br/>run.sh<br/>commands/"]
db_cmd["doubaospeech/<br/>run.sh<br/>commands/"]
end
subgraph go["go/"]
gomod[go.mod]
go_audio[audio/]
go_dash[dashscope/]
go_doubao[doubaospeech/]
go_genx[genx/]
go_minimax[minimax/]
end
subgraph rust["rust/"]
rust_mm["minimax/<br/>Cargo.toml<br/>src/bin/"]
end
end
How to Run
CLI Script Examples
- MiniMax CLI test runner: ./examples/cmd/minimax/run.sh go 1
  - Bazel: bazel run //examples/cmd/minimax:run -- go 1
- Doubao Speech CLI test runner: ./examples/cmd/doubaospeech/run.sh tts
  - Bazel: bazel run //examples/cmd/doubaospeech:run -- tts
Go Examples
All Go examples share one module at examples/go/go.mod and depend on the
local go/ module via replace.
- Build all Go examples:
cd examples/go && go build ./...
Rust Examples
Rust examples are independent crates.
- Build the MiniMax Rust examples:
cd examples/rust/minimax && cargo build --release
Notes
- Example binaries often depend on environment variables for API keys.
- Refer to the SDK docs under docs/lib/{sdk}/ for configuration details.
Bazel Build
Giztoy uses Bazel as its unified build system across all languages and platforms.
Why Bazel?
- Multi-language Support: Build Go, Rust, C/C++ with a single tool
- Hermetic Builds: Reproducible builds across different machines
- Cross-platform: Target multiple platforms from a single codebase
- Incremental: Only rebuild what changed
Quick Start
Prerequisites
- Bazelisk (recommended) or Bazel 7.x+
- Go 1.24+ (for native Go builds)
- Rust 1.80+ (for native Rust builds)
Build Commands
# Build everything
bazel build //...
# Build specific targets
bazel build //go/cmd/minimax # Go CLI
bazel build //rust/cmd/minimax # Rust CLI
# Run tests
bazel test //...
# Run a binary
bazel run //go/cmd/minimax -- --help
Project Structure
graph LR
subgraph root["giztoy/"]
mod[MODULE.bazel]
build[BUILD.bazel]
bazelrc[.bazelrc]
ver[.bazelversion]
subgraph go["go/"]
go_build[BUILD.bazel]
go_cmd["cmd/<br/>Go CLI targets"]
go_pkg["pkg/<br/>Go library targets"]
end
subgraph rust["rust/"]
rust_build[BUILD.bazel]
rust_cmd["cmd/<br/>Rust CLI targets"]
rust_lib["*/<br/>Rust library crates"]
end
subgraph third["third_party/"]
opus[opus/]
portaudio[portaudio/]
soxr[soxr/]
end
end
Rules Used
| Language | Rules |
|---|---|
| Go | rules_go + Gazelle |
| Rust | rules_rust + crate_universe |
| C/C++ | Built-in cc_library, cc_binary |
| Shell | rules_shell |
Dependency Management
Go Dependencies
Go dependencies are managed via go/go.mod and synced with Gazelle:
# Update Go dependencies
cd go && go mod tidy
# Regenerate BUILD files
bazel run //:gazelle
Rust Dependencies
Rust dependencies are managed via rust/Cargo.toml and synced with crate_universe:
# Update Cargo.lock
cd rust && cargo update
# Bazel will automatically fetch crates on next build
C/C++ Dependencies
Third-party C libraries are configured in third_party/ with custom BUILD files.
Cross-Platform Builds
Supported Platforms
| Platform | Status |
|---|---|
| Linux (x86_64, arm64) | ✅ |
| macOS (x86_64, arm64) | ✅ |
| Android | ✅ |
| iOS | ✅ |
| HarmonyOS | ✅ |
| ESP32 | 🚧 |
Platform-specific Builds
# Android
bazel build --config=android //...
# iOS
bazel build --config=ios //...
Common Tasks
Adding a New Go Package
- Create the package in go/pkg/mypackage/
- Run Gazelle to generate the BUILD file:
bazel run //:gazelle
Adding a New Rust Crate
- Create the crate in rust/mypackage/
- Add it to the rust/Cargo.toml workspace members
- Create BUILD.bazel with a rust_library rule
Adding a C/C++ Dependency
- Create config in third_party/libname/
- Add BUILD.bazel with a cc_library rule
- Reference it from dependent targets
Troubleshooting
Clean Build
bazel clean --expunge
bazel build //...
Dependency Issues
# Refresh Go deps
bazel run //:gazelle -- update-repos -from_file=go/go.mod
# Refresh Rust deps
bazel clean --expunge # crate_universe re-fetches on next build
GenX - Universal LLM Interface
GenX is a universal abstraction layer for Large Language Models (LLMs).
Design Goals
- Provider Agnostic: Single API for OpenAI, Gemini, and other providers
- Streaming First: Native support for streaming responses
- Tool Orchestration: Rich function calling and tool management
- Agent Framework: Build autonomous AI agents (Go only)
Architecture
graph TB
subgraph app["Application Layer"]
subgraph agent["Agent Framework (Go)"]
react[ReActAgent]
match[MatchAgent]
sub[SubAgents]
end
subgraph tools["Tool System"]
func[FuncTool]
gen[GeneratorTool]
http[HTTPTool]
comp[CompositeTool]
end
end
subgraph core["Core Abstraction"]
ctx[ModelContext<br/>Builder]
generator[Generator<br/>Trait]
stream[Stream<br/>Chunks]
end
subgraph providers["Provider Adapters"]
openai[OpenAI]
gemini[Gemini]
other[Other]
end
app --> core
core --> providers
Core Concepts
ModelContext
Contains all inputs for LLM generation:
- Prompts: System instructions (named prompts)
- Messages: Conversation history
- Tools: Available function definitions
- Params: Model parameters (temperature, max_tokens, etc.)
- CoTs: Chain-of-thought examples
Generator
Interface for LLM providers:
- GenerateStream(): Streaming text generation
- Invoke(): Structured function call
Stream
Streaming response handler:
- Next(): Get next message chunk
- Close(): Close stream
- CloseWithError(): Close with error
Message Types
| Type | Description |
|---|---|
| user | User input |
| assistant | Model response |
| system | System prompt (in messages) |
| tool | Tool call/result |
Content Types
| Type | Description |
|---|---|
| Text | Plain text |
| Blob | Binary data (images, audio) |
| ToolCall | Function call request |
| ToolResult | Function call response |
Agent Framework (Go only)
Agent Types
| Agent | Description |
|---|---|
| ReActAgent | Reasoning + Acting pattern |
| MatchAgent | Intent-based routing |
Tool Types
| Tool | Description |
|---|---|
| FuncTool | Go function wrapper |
| GeneratorTool | LLM-based generation |
| HTTPTool | HTTP requests |
| CompositeTool | Tool pipeline |
| TextProcessorTool | Text manipulation |
Event System
Agents emit events for fine-grained control:
- EventChunk: Output chunk
- EventEOF: Round ended
- EventClosed: Agent completed
- EventToolStart: Tool execution started
- EventToolDone: Tool completed
- EventToolError: Tool failed
- EventInterrupted: Interrupted
Configuration (agentcfg)
YAML/JSON configuration for agents and tools:
type: react
name: assistant
prompt: |
  You are a helpful assistant.
generator:
  model: gpt-4
tools:
  - $ref: tool:search
  - $ref: tool:calculator
Supports $ref for reusable components.
Provider Support
| Provider | Go | Rust |
|---|---|---|
| OpenAI | ✅ | ✅ |
| Gemini | ✅ | ✅ |
| Compatible APIs | ✅ | ✅ |
Examples Directory
- examples/go/genx/ - Go examples
- examples/rust/genx/ - Rust examples
GenX Agent Framework
Framework for building LLM-powered autonomous agents.
Note: This package is Go-only. No Rust implementation exists.
Design Goals
- Flexible Agent Architecture: Support multiple agent patterns
- Event-Based API: Fine-grained control over agent execution
- Tool Orchestration: Rich tool ecosystem for agents
- Multi-Skill Assistants: Router agents for complex workflows
Agent Types
ReActAgent
Implements the Reasoning and Acting (ReAct) pattern:
- Thinks step-by-step about user requests
- Selects and executes tools to accomplish tasks
- Iterative reasoning until task completion
MatchAgent
Implements intent-based routing:
- Matches user input against predefined rules
- Routes to appropriate sub-agents or actions
- Useful for building multi-skill assistants
Architecture
graph TB
subgraph interface["Agent Interface"]
api["Input() → Events() → Close()"]
end
subgraph agents[" "]
subgraph react["ReActAgent"]
reasoning["Reasoning<br/>+ Acting"]
tools["Tool Calls"]
reasoning --> tools
end
subgraph match["MatchAgent"]
rules["Rule Matching<br/>+ Routing"]
subs["Sub-Agents"]
rules --> subs
end
end
subgraph toolsys["Tool System"]
func[FuncTool]
gen[GeneratorTool]
http[HTTPTool]
comp[CompositeTool]
end
interface --> agents
agents --> toolsys
Event System
Agents communicate through events:
| Event | Description |
|---|---|
| EventChunk | Output text chunk |
| EventEOF | Round ended, waiting for input |
| EventClosed | Agent completed (quit tool called) |
| EventToolStart | Tool execution started |
| EventToolDone | Tool completed successfully |
| EventToolError | Tool execution failed |
| EventInterrupted | Agent was interrupted |
Tool Types
| Tool | Description |
|---|---|
| BuiltinTool | Wraps Go functions |
| GeneratorTool | LLM-based generation |
| HTTPTool | HTTP requests with jq extraction |
| CompositeTool | Sequential tool pipeline |
| TextProcessorTool | Text manipulation |
Quit Tools
Tools can signal agent completion:
tools:
  - $ref: tool:goodbye
    quit: true
When executed, the agent finishes and returns EventClosed.
Multi-Skill Assistant Pattern
graph TD
router["Router Agent<br/>(Match)<br/>Rules: chat, fortune, music"]
chat[Chat Agent]
fortune["Fortune Agent<br/>(ReAct)"]
music["Music Agent<br/>(ReAct)"]
lunar[lunar]
calc[calc]
search[search]
play[play]
router --> chat
router --> fortune
router --> music
fortune --> lunar
fortune --> calc
music --> search
music --> play
Related
- Configuration: ../agentcfg/
- Pattern matching: ../match/
GenX Agent Configuration
Configuration parsing and serialization for agents and tools.
Note: This package is Go-only. No Rust implementation exists.
Design Goals
- Declarative Configuration: Define agents/tools in YAML/JSON
- Reference System: Support $ref for reusable components
- Validation: Validate configuration at parse time
- Serialization: Support JSON, YAML, and MessagePack
Configuration Types
Agent Types
| Type | Description | Configuration |
|---|---|---|
| react | ReAct pattern agent | ReActAgent |
| match | Router/matcher agent | MatchAgent |
Tool Types
| Type | Description | Configuration |
|---|---|---|
| http | HTTP API tool | HTTPTool |
| generator | LLM generation tool | GeneratorTool |
| composite | Tool pipeline | CompositeTool |
| text_processor | Text manipulation | TextProcessorTool |
Reference System
The $ref system allows reusing components:
# Reference an agent
agent:
  $ref: agent:weather_assistant
# Reference a tool
tools:
  - $ref: tool:search
  - $ref: tool:calculator
Reference format: {type}:{name}
| Type | Description |
|---|---|
| agent:{name} | Reference to registered agent |
| tool:{name} | Reference to registered tool |
| rule:{name} | Reference to match rule |
| prompt:{name} | Reference to prompt template |
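The {type}:{name} format can be split with a few lines of Go. The sketch below is illustrative only: Ref, knownKinds, and ParseRef are hypothetical names, not the agentcfg API, and the set of accepted types is simply copied from the table above.

```go
package main

import (
	"fmt"
	"strings"
)

// Ref is a hypothetical parsed form of a "{type}:{name}" reference.
type Ref struct {
	Kind string // "agent", "tool", "rule", or "prompt"
	Name string
}

// knownKinds lists the reference types from the table above.
var knownKinds = map[string]bool{"agent": true, "tool": true, "rule": true, "prompt": true}

// ParseRef splits s at the first colon and rejects unknown or incomplete refs.
func ParseRef(s string) (Ref, error) {
	kind, name, ok := strings.Cut(s, ":")
	if !ok || name == "" {
		return Ref{}, fmt.Errorf("invalid ref %q: want {type}:{name}", s)
	}
	if !knownKinds[kind] {
		return Ref{}, fmt.Errorf("invalid ref %q: unknown type %q", s, kind)
	}
	return Ref{Kind: kind, Name: name}, nil
}

func main() {
	for _, s := range []string{"tool:search", "agent:weather_assistant", "search"} {
		ref, err := ParseRef(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> kind=%s name=%s\n", s, ref.Kind, ref.Name)
	}
}
```

Cutting at the first colon leaves any later colons inside the name, which may or may not match how the real parser treats such names.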
Configuration Structure
ReActAgent
type: react
name: assistant
prompt: |
  You are a helpful assistant.
generator:
  model: gpt-4
  temperature: 0.7
context_layers:
  - type: env
    vars: ["USER_NAME"]
  - type: mem
    limit: 10
tools:
  - $ref: tool:search
    quit: false
  - $ref: tool:goodbye
    quit: true
MatchAgent
type: match
name: router
rules:
  - $ref: rule:weather
  - $ref: rule:music
route:
  - rules: [weather]
    agent:
      $ref: agent:weather_assistant
  - rules: [music]
    agent:
      type: react
      name: music_inline
      prompt: |
        You are a music assistant.
default:
  $ref: agent:chat
HTTPTool
type: http
name: weather_api
description: Get weather data
url: https://api.weather.com/v1/current
method: GET
headers:
  Authorization: "Bearer {{.api_key}}"
params:
  - name: city
    in: query
    required: true
  - name: units
    in: query
    default: "metric"
extract: .data.temperature
GeneratorTool
type: generator
name: summarize
description: Summarize text
prompt: |
  Summarize the following text in 2-3 sentences:
  {{.text}}
generator:
  model: gpt-3.5-turbo
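The {{.text}} and {{.api_key}} placeholders look like Go text/template actions. This document does not say which engine agentcfg uses, so the following is only a sketch under that assumption; render is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render expands text/template placeholders such as {{.text}} using vars.
// missingkey=error makes a missing variable fail instead of printing "<no value>".
func render(tmpl string, vars map[string]any) (string, error) {
	t, err := template.New("prompt").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	prompt := "Summarize the following text in 2-3 sentences:\n{{.text}}"
	out, err := render(prompt, map[string]any{"text": "Giztoy is a toybox."})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Failing on missing keys is a design choice for the sketch: a prompt or Authorization header with a silently empty value is usually harder to debug than an early error.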
CompositeTool
type: composite
name: search_and_summarize
description: Search and summarize
steps:
  - tool: search
    output_var: results
  - tool: summarize
    input_vars:
      text: results
Validation
Configuration is validated during parsing:
- Required fields checked
- Type consistency verified
- References validated (at runtime)
- Enum values validated
GenX Match - Pattern Matching Engine
LLM-based intent recognition and pattern matching.
Note: This package is Go-only. No Rust implementation exists.
Design Goals
- Intent Recognition: Match user input to predefined intents
- Variable Extraction: Extract structured data from natural language
- Streaming Output: Process matches as they arrive
- LLM-Powered: Use LLM for flexible matching
How It Works
flowchart LR
A["Rules +<br/>User Input"] --> B[LLM]
B --> C["Structured<br/>Output"]
C --> D["rule_name:<br/>var1=value1,<br/>var2=value2"]
- Compile: Rules are compiled into a system prompt
- Match: User input is sent to LLM with the prompt
- Parse: Output lines are parsed into structured Results
Rule Definition
Basic Rule
name: weather
patterns:
  - 查天气  # "check the weather"
  - 今天天气怎么样  # "how is the weather today"
  - 明天下雨吗  # "will it rain tomorrow"
Rule with Variables
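name: music
vars:
  title:
    label: 歌曲名  # "song title"
    type: string
  artist:
    label: 歌手  # "singer"
    type: string
patterns:
  - 播放歌曲  # "play a song"
  - 我想听歌  # "I want to listen to music"
  - ["我想听[title]", "title=[歌曲名]"]  # "I want to hear [title]"
  - ["我想听[artist]的歌", "artist=[歌手]"]  # "I want to hear songs by [artist]"
  - ["我想听[artist]的[title]", "artist=[歌手], title=[歌曲名]"]  # "I want to hear [title] by [artist]"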
Pattern Format
| Format | Description | Example |
|---|---|---|
| String | Simple pattern, no vars | "播放歌曲" |
| Array [input, output] | Pattern with expected output | ["我想听[title]", "title=[歌曲名]"] |
Output Format
LLM outputs one line per match:
rule_name: var1=value1, var2=value2
Examples:
- weather (no variables)
- music: artist=周杰伦 (one variable)
- music: artist=周杰伦, title=稻香 (multiple variables)
Variable Types
| Type | Description | Parsing |
|---|---|---|
| string | Text (default) | As-is |
| int | Integer | strconv.ParseInt |
| float | Floating point | strconv.ParseFloat |
| bool | Boolean | strconv.ParseBool |
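The output format and the variable-type table can be combined into a small parser. The sketch below is not the match package's implementation: Result, ParseLine, and Convert are hypothetical names, and it assumes values never contain commas or equals signs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result is a hypothetical parsed match line: a rule name plus raw arguments.
type Result struct {
	Rule string
	Args map[string]string
}

// ParseLine parses "rule_name: var1=value1, var2=value2" or a bare "rule_name".
func ParseLine(line string) (Result, error) {
	name, rest, _ := strings.Cut(strings.TrimSpace(line), ":")
	name = strings.TrimSpace(name)
	if name == "" {
		return Result{}, fmt.Errorf("empty rule name in %q", line)
	}
	res := Result{Rule: name, Args: map[string]string{}}
	for _, pair := range strings.Split(rest, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // bare rule name, or a trailing comma
		}
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			return Result{}, fmt.Errorf("malformed argument %q", pair)
		}
		res.Args[strings.TrimSpace(k)] = strings.TrimSpace(v)
	}
	return res, nil
}

// Convert applies the parsing function listed for each variable type.
func Convert(typ, raw string) (any, error) {
	switch typ {
	case "", "string":
		return raw, nil
	case "int":
		return strconv.ParseInt(raw, 10, 64)
	case "float":
		return strconv.ParseFloat(raw, 64)
	case "bool":
		return strconv.ParseBool(raw)
	default:
		return nil, fmt.Errorf("unknown type %q", typ)
	}
}

func main() {
	res, err := ParseLine("music: artist=周杰伦, title=稻香")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Rule, res.Args["artist"], res.Args["title"])
}
```

Because the LLM produces these lines, a production parser would likely need to tolerate extra whitespace and unexpected text; this sketch simply returns an error for anything it cannot split.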
Architecture
flowchart TB
subgraph rules["Rules"]
weather["weather<br/>patterns"]
music["music<br/>vars"]
chat["chat<br/>patterns"]
end
rules -->|"Compile()"| matcher["Matcher<br/>(System Prompt)"]
matcher -->|"Match(ctx, input, mctx)"| llm["LLM<br/>(Generates structured output)"]
llm --> result["iter.Seq2[Result]<br/>Rule: 'music'<br/>Args: {artist: '周杰伦'}"]
Integration with MatchAgent
The match package is used by MatchAgent for intent routing:
# MatchAgent config
type: match
name: router
rules:
  - $ref: rule:weather
  - $ref: rule:music
route:
  - rules: [weather]
    agent:
      $ref: agent:weather_assistant
  - rules: [music]
    agent:
      $ref: agent:music_player
Related
- Agent framework: ../agent/
- Configuration: ../agentcfg/
GenX - Go Implementation
Import: github.com/haivivi/giztoy/pkg/genx
Packages
| Package | Description |
|---|---|
| genx | Core types, interfaces, context builder |
| genx/agent | Agent framework (ReAct, Match) |
| genx/agentcfg | Configuration parsing (YAML/JSON) |
| genx/match | Intent matching patterns |
| genx/generators | Provider adapters (OpenAI, Gemini) |
| genx/modelcontexts | Pre-built contexts |
| genx/playground | Interactive testing |
Core Types
Generator Interface
type Generator interface {
	GenerateStream(ctx context.Context, model string, mctx ModelContext) (Stream, error)
	Invoke(ctx context.Context, model string, mctx ModelContext, tool *FuncTool) (Usage, *FuncCall, error)
}
ModelContext Interface
type ModelContext interface {
	Prompts() iter.Seq[*Prompt]
	Messages() iter.Seq[*Message]
	CoTs() iter.Seq[string]
	Tools() iter.Seq[Tool]
	Params() *ModelParams
}
Stream Interface
type Stream interface {
	Next() (*MessageChunk, error)
	Close() error
	CloseWithError(error) error
}
ModelContext Builder
builder := genx.NewModelContextBuilder()

// Add prompts
builder.Prompt("system", "You are a helpful assistant.")

// Add messages
builder.UserText("Hello!")
builder.AssistantText("Hi there!")

// Add tools
builder.Tool(&genx.FuncTool{
	Name:        "search",
	Description: "Search the web",
	Schema:      `{"type":"object","properties":{"query":{"type":"string"}}}`,
})

// Set parameters
builder.Params(&genx.ModelParams{
	Temperature: 0.7,
	MaxTokens:   1000,
})

ctx := builder.Build()
FuncTool
// From schema
tool := &genx.FuncTool{
	Name:        "get_weather",
	Description: "Get weather for a city",
	Schema: `{
		"type": "object",
		"properties": {
			"city": {"type": "string"},
			"units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
		},
		"required": ["city"]
	}`,
}

// With executor (named searchTool so both snippets can share one scope)
searchTool := genx.NewFuncToolWithExecutor(
	"search",
	"Search the web",
	schema,
	func(ctx context.Context, args json.RawMessage) (string, error) {
		var params SearchParams
		// Surface malformed arguments instead of silently ignoring them.
		if err := json.Unmarshal(args, &params); err != nil {
			return "", err
		}
		return doSearch(params.Query), nil
	},
)
Streaming
stream, err := generator.GenerateStream(ctx, "gpt-4", mctx)
if err != nil {
	return err
}
defer stream.Close()

for {
	chunk, err := stream.Next()
	if err == io.EOF {
		break
	}
	if err != nil {
		return err
	}
	fmt.Print(chunk.Part)
}
Agent Framework
ReActAgent
import "github.com/haivivi/giztoy/pkg/genx/agent"

ag, err := agent.NewReActAgent(runtime, &agent.ReActConfig{
	Name:      "assistant",
	Prompt:    "You are a helpful assistant.",
	Generator: &agentcfg.GeneratorConfig{Model: "gpt-4"},
	Tools: []agentcfg.ToolRef{
		{Ref: "tool:search"},
		{Ref: "tool:calculator"},
	},
})
if err != nil {
	return err
}
defer ag.Close()

// Input
ag.Input(genx.Contents{genx.Text("What's 2+2?")})

// Event loop
for {
	evt, err := ag.Next()
	if err != nil {
		return err
	}
	switch evt.Type {
	case agent.EventChunk:
		fmt.Print(evt.Chunk.Part)
	case agent.EventEOF:
		// Waiting for input
		ag.Input(genx.Contents{genx.Text(readline())})
	case agent.EventClosed:
		return nil
	case agent.EventToolStart:
		fmt.Printf("Calling %s...\n", evt.ToolName)
	case agent.EventToolDone:
		fmt.Printf("Tool returned: %s\n", evt.ToolResult)
	case agent.EventToolError:
		fmt.Printf("Tool error: %v\n", evt.ToolError)
	}
}
MatchAgent
ag, err := agent.NewMatchAgent(runtime, &agent.MatchConfig{
	Name: "router",
	Rules: []match.Rule{
		{Name: "weather", Patterns: []string{"天气", "weather"}},
		{Name: "music", Patterns: []string{"播放", "play"}},
	},
	SubAgents: map[string]agentcfg.AgentRef{
		"weather": {Ref: "agent:weather_assistant"},
		"music":   {Ref: "agent:music_player"},
	},
})
Configuration (agentcfg)
Load from YAML
import "github.com/haivivi/giztoy/pkg/genx/agentcfg"

cfg, err := agentcfg.LoadAgentFromFile("agent.yaml")

// Or from string (plain assignment, since cfg and err are already declared above)
cfg, err = agentcfg.ParseAgent(yamlStr)
Agent Config Example
type: react
name: assistant
prompt: |
  You are a helpful coding assistant.
generator:
  model: gpt-4
  temperature: 0.7
tools:
  - $ref: tool:search
    quit: false
  - $ref: tool:goodbye
    quit: true
Tool Config Example
type: http
name: weather_api
description: Get weather data
url: https://api.weather.com/v1/current
method: GET
params:
  - name: city
    in: query
extract: .data.temperature
Providers
OpenAI
import "github.com/haivivi/giztoy/pkg/genx/generators"

gen := generators.NewOpenAIGenerator(apiKey,
	generators.WithBaseURL("https://api.openai.com/v1"),
)
Gemini
gen := generators.NewGeminiGenerator(apiKey)
Inspection
// Inspect model context
output, _ := genx.InspectModelContext(mctx)
fmt.Println(output)
// Inspect message
fmt.Println(genx.InspectMessage(msg))
// Inspect tool
fmt.Println(genx.InspectTool(tool))
GenX - Rust Implementation
Crate: giztoy-genx
Status
The Rust implementation provides core abstractions but lacks the full agent framework available in Go.
| Feature | Go | Rust |
|---|---|---|
| ModelContext | ✅ | ✅ |
| Generator trait | ✅ | ✅ |
| Streaming | ✅ | ✅ |
| FuncTool | ✅ | ✅ |
| OpenAI adapter | ✅ | ✅ |
| Gemini adapter | ✅ | ✅ |
| Agent framework | ✅ | ❌ |
| Configuration parser | ✅ | ❌ |
| Match patterns | ✅ | ❌ |
Core Types
Generator Trait
#[async_trait]
pub trait Generator: Send + Sync {
    async fn generate_stream(
        &self,
        model: &str,
        ctx: &dyn ModelContext,
    ) -> Result<Box<dyn Stream>, GenxError>;

    async fn invoke(
        &self,
        model: &str,
        ctx: &dyn ModelContext,
        tool: &FuncTool,
    ) -> Result<(Usage, FuncCall), GenxError>;
}
ModelContext Trait
pub trait ModelContext: Send + Sync {
    fn prompts(&self) -> Box<dyn Iterator<Item = &Prompt> + '_>;
    fn messages(&self) -> Box<dyn Iterator<Item = &Message> + '_>;
    fn cots(&self) -> Box<dyn Iterator<Item = &str> + '_>;
    fn tools(&self) -> Box<dyn Iterator<Item = &dyn Tool> + '_>;
    fn params(&self) -> Option<&ModelParams>;
}
Stream Trait
pub trait Stream: Send {
    fn next(&mut self) -> StreamResult;
    fn close(&mut self) -> Result<(), GenxError>;
    fn close_with_error(&mut self, err: GenxError) -> Result<(), GenxError>;
}
ModelContextBuilder
use giztoy_genx::{ModelContextBuilder, FuncTool};
use schemars::JsonSchema;

#[derive(JsonSchema, serde::Deserialize)]
struct SearchArgs {
    query: String,
}

let mut builder = ModelContextBuilder::new();

// Add prompts
builder.prompt_text("system", "You are a helpful assistant.");

// Add messages
builder.user_text("user", "Hello!");
builder.assistant_text("assistant", "Hi there!");

// Add tools
builder.add_tool(FuncTool::new::<SearchArgs>("search", "Search the web"));

// Set parameters
builder.params(ModelParams {
    temperature: Some(0.7),
    max_tokens: Some(1000),
    ..Default::default()
});

let ctx = builder.build();
FuncTool
use giztoy_genx::FuncTool;
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(JsonSchema, Deserialize)]
struct WeatherArgs {
    city: String,
    #[serde(default)]
    units: Option<String>,
}

// Create tool with schema derived from type
let tool = FuncTool::new::<WeatherArgs>(
    "get_weather",
    "Get weather for a city"
);

// Access schema
println!("{}", tool.schema());
Streaming
let mut stream = generator.generate_stream("gpt-4", &ctx).await?;

loop {
    match stream.next() {
        StreamResult::Chunk(chunk) => {
            if let Some(text) = chunk.text() {
                print!("{}", text);
            }
        }
        StreamResult::Done => break,
        StreamResult::Error(e) => return Err(e),
    }
}
Message Types
use giztoy_genx::{Message, Contents, Part, Role};

// User text message
let msg = Message::user_text("Hello!");

// Assistant message with content
let msg = Message {
    role: Role::Assistant,
    name: None,
    payload: Payload::Contents(vec![
        Part::Text("Here's what I found:".to_string()),
    ]),
};

// Tool call
let msg = Message::tool_call(ToolCall {
    id: "call_123".to_string(),
    func_call: FuncCall {
        name: "search".to_string(),
        arguments: r#"{"query":"rust"}"#.to_string(),
    },
});

// Tool result
let msg = Message::tool_result(ToolResult {
    id: "call_123".to_string(),
    result: "Found 10 results".to_string(),
});
Provider Adapters
OpenAI
use giztoy_genx::openai::OpenAIGenerator;

let generator = OpenAIGenerator::new(api_key)
    .with_base_url("https://api.openai.com/v1");
Gemini
use giztoy_genx::gemini::GeminiGenerator;

let generator = GeminiGenerator::new(api_key);
Inspection
use giztoy_genx::{inspect_model_context, inspect_message, inspect_tool};

// Inspect context
println!("{}", inspect_model_context(&ctx));

// Inspect message
println!("{}", inspect_message(&msg));

// Inspect tool
println!("{}", inspect_tool(&tool));
Error Types
use giztoy_genx::{GenxError, State, Status};

match result {
    Err(GenxError::Api { status, message }) => {
        eprintln!("API error: {} - {}", status, message);
    }
    Err(GenxError::Network(e)) => {
        eprintln!("Network error: {}", e);
    }
    Err(GenxError::Json(e)) => {
        eprintln!("JSON error: {}", e);
    }
    _ => {}
}
Missing Features (vs Go)
The Rust implementation is missing:
- Agent Framework: No ReActAgent, MatchAgent
- Configuration Parser: No YAML/JSON config loading
- Match Patterns: No intent matching system
- Tool Variants: No GeneratorTool, HTTPTool, CompositeTool
- Runtime Interface: No dependency injection system
- State Management: No memory/state persistence
These would need to be implemented to reach feature parity with Go.
GenX - Known Issues
🔴 Major Issues
GX-001: Rust lacks agent framework
Description:
The Rust implementation is missing the entire agent framework:
- No ReActAgent
- No MatchAgent
- No tool orchestration
- No configuration parser
Impact: Cannot build autonomous agents in Rust.
Effort: High - requires significant implementation work.
GX-002: Rust lacks advanced tool types
Description:
Rust only has FuncTool. Missing:
- GeneratorTool
- HTTPTool
- CompositeTool
- TextProcessorTool
Impact: Limited tool capabilities in Rust.
🟡 Minor Issues
GX-003: Go agent uses panics for some errors
File: go/pkg/genx/agent/agent.go
Description:
Some internal errors use panic instead of returning errors.
Impact: Can crash applications on unexpected states.
Suggestion: Convert panics in public entry points to errors; keep panics only for truly unreachable states.
GX-004: Configuration parsing is complex
Description:
The agentcfg package has complex unmarshal logic with many edge cases.
Files:
- go/pkg/genx/agentcfg/unmarshal.go
- go/pkg/genx/agentcfg/*_unmarshal_test.go
Note: Extensive tests exist, so this is well-covered.
GX-006: Streaming tool-call collection parity uncertain
Description:
Rust includes collect_tool_calls_streamed, but feature parity with Go streaming tool calls needs verification and tests.
GX-019: MessageChunk.Clone drops tool calls
File: go/pkg/genx/message.go
Description:
MessageChunk.Clone() copies Role/Name/Part but never copies ToolCall.
Impact: Tool-call chunks can be silently lost when cloned.
Suggestion: Copy c.ToolCall instead of checking chk.ToolCall.
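The fix suggested for GX-019 can be sketched as follows. This is a minimal illustration, not the code in go/pkg/genx/message.go: `ToolCall` and `MessageChunk` here are simplified stand-ins, and their field names and types are assumptions.

```go
package main

import "fmt"

// ToolCall and MessageChunk are simplified stand-ins for the genx types;
// the field names and types are assumptions made for this sketch.
type ToolCall struct {
	ID        string
	Name      string
	Arguments string
}

type MessageChunk struct {
	Role     string
	Name     string
	Part     string
	ToolCall *ToolCall
}

// Clone returns a deep copy, including the tool call when one is present.
func (c *MessageChunk) Clone() *MessageChunk {
	if c == nil {
		return nil
	}
	out := &MessageChunk{Role: c.Role, Name: c.Name, Part: c.Part}
	if c.ToolCall != nil {
		tc := *c.ToolCall // copy the struct so the clone does not alias the source
		out.ToolCall = &tc
	}
	return out
}

func main() {
	src := &MessageChunk{Role: "assistant", ToolCall: &ToolCall{ID: "call_123", Name: "search"}}
	dup := src.Clone()
	dup.ToolCall.Name = "changed"
	// The source keeps its own tool call; the clone carries a separate copy.
	fmt.Println(src.ToolCall.Name, dup.ToolCall.ID)
}
```

Copying the pointed-to struct rather than the pointer keeps later mutations of the clone from leaking back into the source chunk.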
GX-020: StreamBuilder drops unknown tool calls
File: go/pkg/genx/stream_builder.go
Description:
If a tool call references a tool not found in ModelContext, the chunk is skipped:
if !ok { slog.Warn(...); continue }
Impact: Tool-call chunks disappear without being forwarded to consumers.
Suggestion: Emit the chunk anyway or return an error so callers can handle missing tools.
GX-021: OpenAI Invoke drops usage metrics
File: go/pkg/genx/openai.go
Description:
invokeJSONOutput and invokeToolCalls return Usage{} instead of resp.Usage.
Impact: Usage accounting is always zero for invoke paths.
Suggestion: Return oaiConvUsage(&resp.Usage) on success.
GX-022: GenerateStream goroutine can leak on early close
File: go/pkg/genx/openai.go
Description:
GenerateStream spawns a goroutine reading the OpenAI stream. If the caller closes
the stream early without cancelling the context, the goroutine may continue until
the server ends the stream.
Impact: Potential goroutine/resource leak in long-running sessions.
Suggestion: Tie stream close to context cancellation or add a stop channel.
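One way to tie stream close to cancellation, as suggested for GX-022, is sketched below. The `chunkStream` type and its helpers are hypothetical and only model the pattern; the real OpenAI stream reader is replaced by a `produce` callback.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// chunkStream is a hypothetical stream whose Close cancels the producer.
type chunkStream struct {
	ch     chan string
	cancel context.CancelFunc
	done   chan struct{} // closed when the producer goroutine has exited
	once   sync.Once
}

func newChunkStream(parent context.Context, produce func(i int) string) *chunkStream {
	ctx, cancel := context.WithCancel(parent)
	s := &chunkStream{ch: make(chan string), cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(s.done)
		defer close(s.ch)
		for i := 0; ; i++ {
			select {
			case s.ch <- produce(i):
			case <-ctx.Done(): // Close or parent cancellation stops the producer
				return
			}
		}
	}()
	return s
}

// Close cancels the derived context and waits for the producer to exit.
func (s *chunkStream) Close() {
	s.once.Do(s.cancel)
	<-s.done
}

// firstChunkThenClose reads one chunk, closes early, and reports whether
// the producer goroutine has shut the channel down.
func firstChunkThenClose() (string, bool) {
	s := newChunkStream(context.Background(), func(i int) string {
		return fmt.Sprintf("chunk-%d", i)
	})
	first := <-s.ch
	s.Close()
	_, open := <-s.ch
	return first, !open
}

func main() {
	first, stopped := firstChunkThenClose()
	fmt.Println(first, stopped)
}
```

Because `Close` waits on `done`, the caller knows the goroutine is gone when `Close` returns, even if the parent context is never cancelled.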
GX-023: hexString ignores rand.Read error
File: go/pkg/genx/json.go
Description:
rand.Read errors are ignored when generating IDs.
Impact: On RNG failure, ID may be all-zero without error signal.
Suggestion: Check rand.Read error and fall back or return error.
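A minimal sketch of the error-returning variant suggested for GX-023. The name `hexStringChecked` and its signature are assumptions for illustration; the actual helper in go/pkg/genx/json.go may look different.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// hexStringChecked returns n random bytes encoded as 2n hex characters.
// Hypothetical helper: it surfaces the rand.Read error instead of ignoring it.
func hexStringChecked(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate random id: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	id, err := hexStringChecked(8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(id), id)
}
```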
🔵 Enhancements
GX-007: Add more provider adapters
Description:
GenX currently supports OpenAI and Gemini. Candidates to add:
- Anthropic (Claude)
- Mistral
- Local models (Ollama)
GX-008: Add retry logic to generators
Description:
No built-in retry for transient failures.
Suggestion: Add configurable retry with backoff.
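The retry suggestion in GX-008 could take roughly the following shape. This is a sketch under stated assumptions: `retry`, `errTransient`, and the doubling delay are illustrative choices, not an existing genx API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient marks failures worth retrying (hypothetical sentinel).
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times, doubling the delay after each
// transient failure. Non-transient errors are returned immediately.
func retry(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		err = fn()
		if err == nil || !errors.Is(err, errTransient) {
			return err
		}
		if i == attempts-1 {
			break // no point sleeping after the last attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// callsUntilSuccess runs retry against a function that fails the first
// `failures` times and reports how many calls were made.
func callsUntilSuccess(failures int) (int, error) {
	calls := 0
	err := retry(context.Background(), 5, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errTransient
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := callsUntilSuccess(2)
	fmt.Println(calls, err)
}
```

A production version would likely add jitter and a maximum delay; both are left out here to keep the control flow visible.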
GX-009: Add request/response logging
Description:
No debug logging for API calls.
Suggestion: Add optional verbose mode.
GX-010: Document match pattern syntax
Description:
Match patterns have complex syntax; documentation exists but should stay in sync.
Files:
- docs/genx/match/
- go/pkg/genx/match/
GX-011: Add validation for agent configs
Description:
YAML configs could have invalid references ($ref). No validation until runtime.
Suggestion: Add config validation command/function.
GX-012: Add configuration schema generation
Description:
No JSON Schema is provided for agent/tool configuration.
Impact: No IDE auto-complete or static validation.
Suggestion: Generate JSON Schema from agentcfg types and publish under docs/.
GX-013: Add stream test coverage for tool calls (Rust)
Description:
Tool-call streaming helpers exist, but end-to-end tests are limited or missing.
Impact: Potential regressions in streamed tool-call parsing.
Suggestion: Add tests that simulate incremental chunks and verify parsed tool calls.
GX-024: StreamBuilder::new ignores tools (Rust)
File: rust/genx/src/stream.rs
Description:
StreamBuilder::new ignores tools from ModelContext, leaving func_tools empty.
Impact: Tool-call metadata cannot be linked unless callers use with_tools.
Suggestion: Provide a way to downcast tools or pass tool list explicitly in generator code.
⚪ Notes
GX-014: Well-structured Go implementation
Description:
The Go genx package is well-organized:
- Clear separation of concerns
- Extensive test coverage
- Comprehensive agent framework
- YAML/JSON configuration support
GX-015: Event-based agent API
Description:
The agent event system is well-designed:
for {
evt, err := ag.Next()
switch evt.Type {
case EventChunk: ...
case EventToolStart: ...
case EventClosed: ...
}
}
Provides fine-grained control over agent execution.
GX-016: Quit tool pattern
Description:
Tools can be marked as "quit tools" to signal agent completion:
tools:
- $ref: tool:goodbye
quit: true
Useful for conversational agents with explicit exit.
GX-017: $ref system for configuration
Description:
Configuration supports references for reuse:
tools:
- $ref: tool:search # References registered tool
- $ref: agent:helper # References registered agent
Good for modular configuration.
GX-018: Multi-skill assistant pattern
Description:
MatchAgent enables router pattern:
Router (Match) → Weather Agent (ReAct)
→ Music Agent (ReAct)
→ Chat Agent
Well-documented in agent/doc.go.
GX-025: Language idioms differ (Go vs Rust)
Description:
Go uses iter.Seq, while Rust uses iterators/streams. This is idiomatic for each language
and not a functional issue.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| GX-001 | 🔴 Major | Open | Rust |
| GX-002 | 🔴 Major | Open | Rust |
| GX-003 | 🟡 Minor | Open | Go |
| GX-004 | 🟡 Minor | Note | Go |
| GX-006 | 🟡 Minor | Open | Rust |
| GX-019 | 🟡 Minor | Open | Go |
| GX-020 | 🟡 Minor | Open | Go |
| GX-021 | 🟡 Minor | Open | Go |
| GX-022 | 🟡 Minor | Open | Go |
| GX-023 | 🟡 Minor | Open | Go |
| GX-007 | 🔵 Enhancement | Open | Both |
| GX-008 | 🔵 Enhancement | Open | Both |
| GX-009 | 🔵 Enhancement | Open | Both |
| GX-010 | 🔵 Enhancement | Open | Go |
| GX-011 | 🔵 Enhancement | Open | Go |
| GX-012 | 🔵 Enhancement | Open | Go |
| GX-013 | 🔵 Enhancement | Open | Rust |
| GX-024 | 🔵 Enhancement | Open | Rust |
| GX-014 | ⚪ Note | N/A | Go |
| GX-015 | ⚪ Note | N/A | Go |
| GX-016 | ⚪ Note | N/A | Go |
| GX-017 | ⚪ Note | N/A | Go |
| GX-018 | ⚪ Note | N/A | Go |
| GX-025 | ⚪ Note | N/A | Both |
Overall: The Go implementation is mature and feature-rich, with a comprehensive agent framework. The Rust implementation provides a basic LLM abstraction but lacks the agent framework, so it is suitable only for simple use cases. Reaching feature parity with Go will take major effort.
speech
Overview
The speech module defines interfaces for voice and speech processing. It separates pure audio streams (Voice) from speech streams that include transcriptions (Speech). It also provides multiplexers for ASR (speech-to-text) and TTS (text-to-speech) implementations.
Design Goals
- Unified interfaces for ASR/TTS backends
- Stream-first APIs for long-running audio
- Clear separation between audio-only and audio+text
- Pluggable providers via multiplexer registration
Key Concepts
- Voice: audio-only stream of PCM segments
- Speech: audio stream with text transcription per segment
- ASR: Opus input -> Speech/SpeechStream
- TTS: text input -> Speech
- Sentence segmentation: split long text into manageable chunks
Components
- Voice/Speech interfaces
- ASR/TTS muxers
- Sentence segmentation utilities
- Speech collection and copy helpers
Related Modules
- docs/lib/audio/pcm for PCM formats
- docs/lib/audio/opusrt for Opus streaming input
- Provider SDKs in docs/lib/minimax, docs/lib/doubaospeech
speech (Go)
Package Layout
- voice.go: Voice/VoiceSegment interfaces
- speech.go: Speech/SpeechSegment interfaces
- asr.go: ASR multiplexer and interfaces
- tts.go: TTS multiplexer and interfaces
- segment.go: default sentence segmentation
- util.go: collectors and copy helpers
- Provider implementations: asr_doubao_sauc.go, tts_doubao_v1.go, tts_doubao_v2.go, tts_minimax.go
Public Interfaces
- Voice: Voice, VoiceSegment, VoiceStream
- Speech: Speech, SpeechSegment, SpeechStream
- ASR: StreamTranscriber, Transcriber, ASR mux + helpers
- TTS: Synthesizer, TTS mux + helpers
- Segmentation: SentenceSegmenter, SentenceIterator
Design Notes
- Global muxes ASRMux and TTSMux provide default routing.
- ASR uses Opus frame streams (opusrt.FrameReader).
- DefaultSentenceSegmenter splits by punctuation with a rune cap.
- CollectSpeech and CopySpeech help aggregate or export streams.
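To illustrate what "splits by punctuation with a rune cap" can mean, here is a small sketch. It is not the DefaultSentenceSegmenter implementation: the function name, the punctuation set, and the trimming behavior are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSentences cuts text after sentence-ending punctuation, and also
// whenever the current piece reaches maxRunes runes (no cap if maxRunes < 1).
// The punctuation set below is an assumption made for this sketch.
func splitSentences(text string, maxRunes int) []string {
	var out []string
	var cur []rune
	flush := func() {
		if s := strings.TrimSpace(string(cur)); s != "" {
			out = append(out, s)
		}
		cur = cur[:0]
	}
	for _, r := range text {
		cur = append(cur, r)
		atPunct := strings.ContainsRune(".!?。！？", r)
		atCap := maxRunes > 0 && len(cur) >= maxRunes
		if atPunct || atCap {
			flush()
		}
	}
	flush() // emit any trailing text without closing punctuation
	return out
}

func main() {
	for _, s := range splitSentences("Hello there. How are you? Fine", 100) {
		fmt.Println(s)
	}
}
```

Counting runes rather than bytes keeps the cap meaningful for multi-byte text such as Chinese, which matters when the chunks are sent to a TTS backend with a character limit.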
Usage Notes
- Transcribe falls back to streaming when the backend does not implement the Transcriber interface.
- Revoice streams existing speech into a TTS backend via a pipe.
speech (Rust)
Crate Layout
- voice.rs: Voice/VoiceSegment interfaces
- speech.rs: Speech/SpeechSegment interfaces
- asr.rs: ASR multiplexer and traits
- tts.rs: TTS multiplexer and traits
- segment.rs: sentence segmentation utilities
- util.rs: speech collector and iterator helpers
Public Interfaces
- Voice: Voice, VoiceSegment, VoiceStream
- Speech: Speech, SpeechSegment, SpeechStream
- ASR: StreamTranscriber, Transcriber (async), ASR
- TTS: Synthesizer, TTS (async)
- Segmentation: SentenceSegmenter, SentenceIterator
Design Notes
- Async traits are used across ASR/TTS and stream interfaces.
- ASR and TTS use a trie-based mux with async read/write locks.
- SpeechCollector composes a SpeechStream into a single Speech.
Differences vs Go
- No global mux singletons; callers construct ASR/TTS explicitly.
- AsyncRead is used for text input in TTS.
speech - Known Issues
🟡 Minor Issues
SPT-001: Go Revoice has no cancellation propagation
File: go/pkg/speech/tts.go
Description:
Revoice spawns a goroutine that copies the entire input speech into an
io.Pipe, but the goroutine is not tied to the caller's context. If the
synthesizer returns early or the context is canceled, the copy goroutine may
continue doing work until completion.
Impact: Wasted CPU or lingering goroutines on early cancellation.
Suggestion:
Honor context cancellation inside the copy loop or use a pipe that is closed
when ctx.Done() fires.
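The second option in the SPT-001 suggestion, a pipe that is closed when the context is done, might look like the sketch below. `pipeFrom` is a hypothetical helper, not the Revoice code. Note that it only unblocks a copy that is stuck writing into the pipe; a source whose Read blocks would need its own cancellation.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// pipeFrom copies src into a pipe and returns the read side plus a channel
// that reports how the copy ended. Cancelling ctx closes the read side,
// which makes the pending pipe write fail and lets the copy goroutine exit.
func pipeFrom(ctx context.Context, src io.Reader) (io.Reader, <-chan error) {
	pr, pw := io.Pipe()
	result := make(chan error, 1)
	copyDone := make(chan struct{})
	go func() {
		defer close(copyDone)
		_, err := io.Copy(pw, src)
		pw.CloseWithError(err) // a nil error gives readers a plain io.EOF
		result <- err
	}()
	go func() {
		select {
		case <-ctx.Done():
			pr.CloseWithError(ctx.Err())
		case <-copyDone:
		}
	}()
	return pr, result
}

// cancelMidCopy reads a few bytes, cancels, and returns the copy's error.
func cancelMidCopy() error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	r, result := pipeFrom(ctx, strings.NewReader(strings.Repeat("a", 1<<20)))
	buf := make([]byte, 16)
	if _, err := io.ReadFull(r, buf); err != nil {
		return err
	}
	cancel()
	return <-result
}

func main() {
	fmt.Println(cancelMidCopy())
}
```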
SPT-002: Rust ASR ignores full-transcribe implementations
File: rust/speech/src/asr.rs
Description:
ASR::transcribe always falls back to the streaming path, even though a
Transcriber trait exists for full transcription.
Impact: Backends that can provide a more efficient full-transcribe path cannot use it.
Suggestion:
Detect and use Transcriber implementations when available.
chatgear
Overview
chatgear defines the core protocol types for device-to-server communication: commands, state events, statistics, and audio streaming metadata. It focuses on interface design rather than transport implementation, and provides an in-process pipe for testing.
Design Goals
- Stable, typed protocol for device state and control
- Clear separation between uplink (device -> server) and downlink (server -> device)
- Explicit metadata for timestamps and command issuance
- Audio streaming with Opus frame stamping
- Support both Go and Rust with comparable API surfaces
Key Concepts
- Session commands: device control commands with typed payloads
- State events: gear state transitions with causes
- Stats events: telemetry snapshots and incremental changes
- Uplink/Downlink: split interfaces for bidirectional streams
- Ports: higher-level client/server port abstraction
Submodules
- transport: uplink/downlink connection traits and pipe helpers
- port: client/server port traits and audio track controls
External Reference
- /Users/idy/Work/haivivi/x/docs/chatgear (original protocol/design notes)
Related Modules
- docs/lib/audio/opusrt for Opus frame handling
- docs/lib/jsontime for millisecond timestamps
chatgear/transport
Overview
The transport layer defines bidirectional streaming interfaces for chatgear. It splits data flow into uplink (device -> server) and downlink (server -> device) and provides a test-friendly in-process pipe.
Design Goals
- Separate uplink/downlink responsibilities
- Provide a minimal interface that can be implemented by different transports
- Keep Opus framing metadata explicit
Key Concepts
- UplinkTx/UplinkRx: device -> server
- DownlinkTx/DownlinkRx: server -> device
- Stamped Opus frames: carry timestamp for playback alignment
- Pipe connection for in-process testing
chatgear/port
Overview
Port interfaces represent higher-level client/server roles built on top of the transport layer. They combine audio streaming, state/stats telemetry, and command control into a single abstraction.
Design Goals
- Provide a symmetric client/server API surface
- Hide transport details while preserving real-time audio controls
- Expose device control commands alongside audio output
Key Concepts
- ClientPort: device-side send/receive split (Tx/Rx)
- ServerPort: server-side send/receive split (Tx/Rx)
- Audio tracks: background/foreground/overlay output streams
- Device commands: volume, brightness, WiFi, OTA, power, etc.
chatgear (Go)
Package Layout
- state.go: gear state enum, state events
- stats.go: telemetry structs, merge logic
- command.go: session command types and JSON mapping
- conn.go: uplink/downlink interfaces
- port.go: client/server port interfaces
- conn_pipe.go: in-process pipe connection for tests
Public Interfaces
- State: GearState, GearStateEvent, GearStateChangeCause
- Stats: GearStatsEvent, GearStatsChanges and related structs
- Commands: SessionCommand with SessionCommandEvent
- Uplink/Downlink: UplinkTx, UplinkRx, DownlinkTx, DownlinkRx
- Ports: ClientPortTx/Rx, ServerPortTx/Rx
- Pipe: NewPipe for test or in-process wiring
Design Notes
- JSON encoding is typed via commandType() and a tagged event wrapper.
- All time fields use jsontime.Milli for millisecond epoch values.
- Opus frames are stamped with opusrt.EpochMillis during transport.
- Stats merge logic performs partial updates with GearStatsChanges.
Usage Notes
- UplinkRx/DownlinkRx expose iterators (iter.Seq2) instead of channels.
- ServerPortTx exposes track creation for background/foreground/overlay audio.
chatgear (Rust)
Crate Layout
- state.rs: gear state enum and events
- stats.rs: telemetry structs and merge logic
- command.rs: session commands and JSON helpers
- conn.rs: uplink/downlink async traits
- port.rs: client/server port traits
- conn_pipe.rs: in-process pipe helper
Public Interfaces
- State: GearState, GearStateEvent, GearStateChangeCause
- Stats: GearStatsEvent, GearStatsChanges
- Commands: SessionCommand, SessionCommandEvent, Command enum
- Uplink/Downlink: UplinkTx, UplinkRx, DownlinkTx, DownlinkRx
- Ports: ClientPortTx/Rx, ServerPortTx/Rx
- Pipe: new_pipe
Design Notes
- Async traits are used for most IO-facing APIs.
- Commands serialize into JSON value payloads; a typed Command enum can parse from (type, payload) pairs.
- Port traits split audio track control and device command APIs.
Differences vs Go
- Rust favors async traits and owned Vec<u8> payloads.
- Command event uses serde_json::Value instead of typed interface payloads.
chatgear - Known Issues
🟡 Minor Issues
CG-001: Go ReadNFCTag equality ignores tag data changes
File: go/pkg/chatgear/stats.go
Description:
ReadNFCTag.Equal compares only tag UIDs. If a tag's payload or metadata
changes but the UID remains the same, the merge logic will treat it as unchanged.
Impact: Telemetry updates can be silently skipped.
Suggestion:
Include additional fields (e.g., RawData, DataFormat, UpdateAt) in equality
or document UID-only matching as a deliberate choice.
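A sketch of the broader equality check suggested for CG-001. The field names come from the suggestion above, but their types (string UID, byte-slice payload, integer timestamp) are assumptions; the real struct in go/pkg/chatgear/stats.go may differ.

```go
package main

import (
	"bytes"
	"fmt"
)

// ReadNFCTag is a simplified stand-in; the field types are assumptions.
type ReadNFCTag struct {
	UID        string
	RawData    []byte
	DataFormat string
	UpdateAt   int64 // millisecond epoch
}

// Equal reports whether two tags match on identity and on content, so that a
// payload change under the same UID is treated as an update.
func (t ReadNFCTag) Equal(o ReadNFCTag) bool {
	return t.UID == o.UID &&
		t.DataFormat == o.DataFormat &&
		t.UpdateAt == o.UpdateAt &&
		bytes.Equal(t.RawData, o.RawData)
}

func main() {
	a := ReadNFCTag{UID: "04:a2:19", RawData: []byte("v1"), DataFormat: "ndef", UpdateAt: 1000}
	b := a
	b.RawData = []byte("v2")
	fmt.Println(a.Equal(a), a.Equal(b))
}
```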
CG-002: Rust SessionCommandEvent swallows serialization errors
File: rust/chatgear/src/command.rs
Description:
SessionCommandEvent::new uses serde_json::to_value(cmd) and replaces errors
with Value::Null, losing the original error context.
Impact: Serialization failures are silently ignored, making debugging difficult.
Suggestion:
Return a Result or log/report serialization failures explicitly.
CG-003: Go Pipe connections can block indefinitely on backpressure
File: go/pkg/chatgear/conn_pipe.go
Description:
NewPipe uses bounded channels. If the receiver stops reading, senders will
block in SendOpusFrames / SendState / SendStats without a timeout unless
the caller provides a cancellable context.
Impact: Potential goroutine leaks in tests or in-process usage.
Suggestion: Document this behavior and recommend context timeouts for pipe usage.
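The blocking behavior described in CG-003, and the effect of a context timeout, can be reproduced with a plain bounded channel. This is a generic sketch: `sendWithContext` is a hypothetical helper and does not use the chatgear pipe types.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sendWithContext blocks until the value is accepted or ctx is done.
func sendWithContext[T any](ctx context.Context, ch chan<- T, v T) error {
	select {
	case ch <- v:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoBackpressure fills a one-slot channel that nobody reads from. The
// first send fits in the buffer; the second waits until the deadline.
func demoBackpressure() (first, second error) {
	ch := make(chan int, 1)
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	first = sendWithContext(ctx, ch, 1)
	second = sendWithContext(ctx, ch, 2)
	return first, second
}

func main() {
	first, second := demoBackpressure()
	fmt.Println(first, second)
}
```

Without the timeout, the second send would block for as long as the receiver stays idle, which is the goroutine-leak scenario the issue describes.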
Audio Package
Audio processing framework for speech and multimedia applications.
Design Goals
- Real-time Processing: Low-latency audio mixing, encoding, and streaming
- Format Flexibility: Support common audio formats (PCM, Opus, MP3, OGG)
- Cross-platform: FFI bindings to native libraries (libopus, libsoxr, lame)
- Streaming-first: Designed for continuous audio streams, not just files
Architecture
graph TB
subgraph audio["audio/"]
subgraph row1[" "]
pcm["pcm/<br/>- Format<br/>- Chunk<br/>- Mixer"]
codec["codec/<br/>- opus/<br/>- mp3/<br/>- ogg/"]
resampler["resampler/<br/>- soxr<br/>- Format<br/>- Convert"]
end
subgraph row2[" "]
opusrt["opusrt/<br/>- Buffer<br/>- Realtime<br/>- OGG R/W"]
songs["songs/<br/>- Catalog<br/>- Notes<br/>- PCM gen"]
portaudio["portaudio/<br/>(Go only)<br/>- Stream<br/>- Device"]
end
end
Submodules
| Module | Description | Go | Rust |
|---|---|---|---|
| pcm/ | PCM format, chunks, mixing | ✅ | ✅ |
| codec/ | Audio codecs (Opus, MP3, OGG) | ✅ | ✅ |
| resampler/ | Sample rate conversion (soxr) | ✅ | ✅ |
| opusrt/ | Realtime Opus streaming | ✅ | ⚠️ |
| songs/ | Built-in melodies | ✅ | ✅ |
| portaudio/ | Audio I/O devices | ✅ | ❌ |
Audio Formats
PCM Formats (Predefined)
| Format | Sample Rate | Channels | Bit Depth |
|---|---|---|---|
| L16Mono16K | 16000 Hz | 1 | 16-bit |
| L16Mono24K | 24000 Hz | 1 | 16-bit |
| L16Mono48K | 48000 Hz | 1 | 16-bit |
Codec Support
| Codec | Encode | Decode | Container |
|---|---|---|---|
| Opus | ✅ | ✅ | Raw, OGG |
| MP3 | ✅ | ✅ | Raw |
| OGG | N/A | N/A | Container only |
Common Workflows
Voice Chat (Low Latency)
flowchart LR
A[Microphone] --> B[PCM 16kHz]
B --> C[Opus Encode]
C --> D[Network]
D --> E[Opus Decode]
E --> F[Mixer]
F --> G[Speaker]
Speech Synthesis Playback
flowchart LR
A[API Response<br/>Base64 MP3] --> B[MP3 Decode]
B --> C[Resample<br/>24K→16K]
C --> D[Mixer]
D --> E[Speaker]
Audio Recording
flowchart LR
A[PCM Stream] --> B[Opus Encode]
B --> C[OGG Writer]
C --> D[File]
Native Dependencies
| Library | Purpose | Build System |
|---|---|---|
| libopus | Opus codec | pkg-config / Bazel |
| libsoxr | Resampling | pkg-config / Bazel |
| lame | MP3 encoding | Bazel (bundled) |
| minimp3 | MP3 decoding | Bazel (bundled) |
| libogg | OGG container | pkg-config / Bazel |
| portaudio | Audio I/O | pkg-config / Bazel |
Examples Directory
- examples/go/audio/ - Go audio examples
- examples/rust/audio/ - Rust audio examples
Related Packages
- buffer - Used for audio data buffering
- speech - High-level speech synthesis/recognition
- minimax, doubaospeech - TTS/ASR APIs returning audio
Audio Codec Module
Audio encoding and decoding for Opus, MP3, and OGG formats.
Design Goals
- Native Performance: FFI bindings to proven C libraries
- Streaming Support: Process audio in chunks, not full files
- VoIP Optimized: Low-latency Opus encoding for voice chat
Codec Support Matrix
| Codec | Encode | Decode | Library | Use Case |
|---|---|---|---|---|
| Opus | ✅ | ✅ | libopus | Voice chat, streaming |
| MP3 | ✅ | ✅ | LAME / minimp3 | File storage, compatibility |
| OGG | N/A | N/A | libogg | Container format |
Sub-modules
opus/
Opus codec implementation for voice and audio.
Features:
- Encoder with VoIP/Audio/LowDelay modes
- Decoder with PLC (Packet Loss Concealment)
- TOC (Table of Contents) parsing
- Frame duration detection
Key Types:
- Encoder, Decoder
- Frame, TOC, FrameDuration
mp3/
MP3 codec for compatibility with legacy systems.
Features:
- LAME-based encoding with quality presets
- minimp3-based decoding (header-only library)
Key Types:
- Encoder, Decoder
ogg/
OGG container format for packaging Opus/Vorbis streams.
Features:
- Page-based streaming
- Bitstream management
- Synchronization recovery
Key Types:
- Encoder, Stream, Sync, Page
Opus Frame Durations
| Duration | Samples@16K | Samples@48K |
|---|---|---|
| 2.5ms | 40 | 120 |
| 5ms | 80 | 240 |
| 10ms | 160 | 480 |
| 20ms | 320 | 960 |
| 40ms | 640 | 1920 |
| 60ms | 960 | 2880 |
Recommended: 20ms frames balance latency and compression.
Common Opus Bitrates
| Application | Bitrate | Quality |
|---|---|---|
| Voice (narrow) | 8-12 kbps | Intelligible |
| Voice (wide) | 16-24 kbps | Good |
| Voice (HD) | 32-48 kbps | Excellent |
| Music | 64-128 kbps | Hi-Fi |
Native Library Versions
| Library | Minimum Version | Notes |
|---|---|---|
| libopus | 1.3.0 | Opus encoder/decoder |
| libogg | 1.3.0 | OGG container |
| LAME | 3.100 | MP3 encoder |
| minimp3 | - | Header-only decoder |
Examples
See parent audio/ documentation for usage examples.
Related Modules
- opusrt/ - Realtime Opus streaming with OGG container
- resampler/ - Sample rate conversion before encoding
Audio PCM Module
PCM (Pulse Code Modulation) audio format handling, chunks, and multi-track mixing.
Design Goals
- Standard Formats: Predefined configurations for common use cases
- Chunk Abstraction: Unified interface for audio data and silence
- Real-time Mixing: Low-latency multi-track audio mixing with gain control
- Streaming Interface: Compatible with io.Reader/io.Writer patterns
Predefined Formats
| Format | Sample Rate | Channels | Bit Depth | Bytes/sec |
|---|---|---|---|---|
| L16Mono16K | 16000 Hz | 1 | 16-bit | 32,000 |
| L16Mono24K | 24000 Hz | 1 | 16-bit | 48,000 |
| L16Mono48K | 48000 Hz | 1 | 16-bit | 96,000 |
Duration/Bytes Calculations
For L16Mono16K (16kHz, 16-bit mono):
| Duration | Samples | Bytes |
|---|---|---|
| 20ms | 320 | 640 |
| 50ms | 800 | 1,600 |
| 100ms | 1,600 | 3,200 |
| 1s | 16,000 | 32,000 |
Formula: bytes = samples × channels × (bit_depth / 8)
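The formula can be checked against the table with a few lines of Go. `pcmBytes` is a hypothetical helper written for this illustration, not part of the pcm package.

```go
package main

import "fmt"

// pcmBytes returns the buffer size for a duration given in milliseconds:
// samples = sampleRate × millis / 1000, bytes = samples × channels × (bitDepth / 8).
func pcmBytes(sampleRate, channels, bitDepth, millis int) int {
	samples := sampleRate * millis / 1000
	return samples * channels * (bitDepth / 8)
}

func main() {
	// L16Mono16K: 16 kHz, mono, 16-bit
	for _, ms := range []int{20, 50, 100, 1000} {
		fmt.Printf("%4d ms -> %d bytes\n", ms, pcmBytes(16000, 1, 16, ms))
	}
}
```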
Chunk Types
DataChunk
Raw audio data with format metadata.
classDiagram
class DataChunk {
Data: []byte
Format: Format
}
SilenceChunk
Generates silence (zeros) of specified duration without allocating.
classDiagram
class SilenceChunk {
Duration: time
Format: Format
}
Mixer Architecture
Multi-track audio mixer with real-time mixing and gain control:
flowchart LR
subgraph mixer["Mixer"]
t1["Track 1<br/>(gain=1)"]
t2["Track 2<br/>(gain=0.5)"]
tn["Track N"]
mix["Mix Buffer<br/>(float32)"]
out["Output<br/>(int16)"]
t1 --> mix
t2 --> mix
tn --> mix
mix --> out
end
Features:
- Dynamic track creation/removal
- Per-track gain control (0.0 - 1.0+)
- Silence gap detection
- Auto-close when all tracks done
- Thread-safe operations
Mixing Algorithm
- Convert int16 PCM to float32 (-1.0 to 1.0)
- Apply per-track gain
- Sum all track samples
- Clip to [-1.0, 1.0]
- Convert back to int16
output = clip(Σ(track[i] × gain[i]), -1.0, 1.0) × 32767
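The five steps and the formula above can be sketched as a single function. This is an illustration only, not the Mixer implementation: it assumes int16 samples are normalized by dividing by 32768 and that `gains` has one entry per track.

```go
package main

import "fmt"

// mix sums gain-scaled tracks in float32, clips to [-1, 1], and converts
// back to int16. Shorter tracks contribute silence past their end.
// Assumes len(gains) == len(tracks).
func mix(tracks [][]int16, gains []float32) []int16 {
	n := 0
	for _, t := range tracks {
		if len(t) > n {
			n = len(t)
		}
	}
	out := make([]int16, n)
	for i := range out {
		var sum float32
		for k, t := range tracks {
			if i < len(t) {
				sum += float32(t[i]) / 32768 * gains[k] // normalize, then apply gain
			}
		}
		if sum > 1 {
			sum = 1
		} else if sum < -1 {
			sum = -1
		}
		out[i] = int16(sum * 32767)
	}
	return out
}

func main() {
	voice := []int16{16384, 32767}
	music := []int16{16384, 32767}
	fmt.Println(mix([][]int16{voice, music}, []float32{1, 0.5}))
}
```

In the example the first sample mixes to 0.75 of full scale, while the second exceeds 1.0 before clipping and is held at the int16 maximum instead of wrapping around.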
Use Cases
Voice Chat Mixing
Multiple participants' audio mixed into single output stream.
Background Music
Mix background music track with voice at lower gain.
Audio Ducking
Reduce music volume when voice is detected.
Examples
See parent audio/ documentation for usage examples.
Related Modules
- audio/codec/ - Encode/decode before/after mixing
- audio/resampler/ - Convert sample rates before mixing
- buffer/ - Buffer audio data between processing stages
Audio Resampler Module
Sample rate conversion using libsoxr (SoX Resampler Library).
Design Goals
- High Quality: Professional-grade resampling via libsoxr
- Streaming: Process audio as continuous stream, not files
- Channel Conversion: Support mono↔stereo conversion
- io.Reader Interface: Drop-in replacement for audio sources
Supported Conversions
Sample Rate
Any integer sample rate to any other integer sample rate:
- 8000 Hz ↔ 16000 Hz ↔ 24000 Hz ↔ 48000 Hz
- Non-standard rates supported
Channel Conversion
| From | To | Method |
|---|---|---|
| Mono | Stereo | Duplicate |
| Stereo | Mono | Average (L+R)/2 |
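Both channel conversions in the table are simple enough to show directly. The helpers below are hypothetical and operate on interleaved int16 samples; they are not the resampler's internal code.

```go
package main

import "fmt"

// monoToStereo duplicates each sample into interleaved left/right pairs.
func monoToStereo(mono []int16) []int16 {
	out := make([]int16, 0, 2*len(mono))
	for _, s := range mono {
		out = append(out, s, s)
	}
	return out
}

// stereoToMono averages interleaved left/right pairs as (L+R)/2.
// The sum is taken in int32 so two large samples cannot overflow int16.
func stereoToMono(stereo []int16) []int16 {
	out := make([]int16, 0, len(stereo)/2)
	for i := 0; i+1 < len(stereo); i += 2 {
		out = append(out, int16((int32(stereo[i])+int32(stereo[i+1]))/2))
	}
	return out
}

func main() {
	fmt.Println(monoToStereo([]int16{1, 2}))
	fmt.Println(stereoToMono([]int16{100, 200, -32768, -32768}))
}
```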
Quality Levels
libsoxr supports multiple quality presets:
| Level | Name | Description |
|---|---|---|
| 0 | Quick | Low quality, fast |
| 1 | Low | Better than quick |
| 2 | Medium | Balance of quality/speed |
| 3 | High | Good quality (default) |
| 4 | Very High | Best quality |
Note: Current implementation uses High quality by default.
Algorithm
libsoxr uses polyphase filter banks with configurable:
- Passband rolloff
- Stop-band attenuation
- Linear/minimum phase
The High quality preset provides:
- Passband: 0-0.91 Nyquist
- Stop-band attenuation: -100 dB
- Linear phase
Common Resampling Scenarios
Speech API to Local Playback
flowchart LR
A["API Output<br/>(24kHz)"] --> B[Resample]
B --> C["Mixer Input<br/>(16kHz)"]
Local Capture to Speech API
flowchart LR
A["Microphone<br/>(48kHz)"] --> B[Resample]
B --> C["API Input<br/>(16kHz)"]
Multi-source Mixing
flowchart LR
s1["Source 1<br/>(24kHz)"] --> r1[Resample]
r1 --> m1["16kHz"]
s2["Source 2<br/>(16kHz)"] --> m2["16kHz"]
s3["Source 3<br/>(48kHz)"] --> r3[Resample]
r3 --> m3["16kHz"]
m1 --> mixer[Mixer]
m2 --> mixer
m3 --> mixer
Performance Characteristics
Approximate cycles per sample (on modern CPU):
- Quick: ~10
- High: ~50-100
- Very High: ~200-500
For real-time 16kHz mono:
- 16000 samples/sec × 100 cycles ≈ 1.6M cycles/sec
- Negligible CPU load on modern hardware
Memory Usage
libsoxr maintains internal buffers for filter state:
- ~10-50KB per resampler instance
- More for higher quality settings
Examples
See parent audio/ documentation for usage examples.
Related Modules
- audio/pcm/ - Format definitions, use with resampler
- audio/codec/ - Often resample before/after encoding
Audio OpusRT Module
Real-time Opus stream processing with jitter buffering and packet loss handling.
Design Goals
- Out-of-Order Handling: Reorder packets that arrive out of sequence
- Packet Loss Detection: Detect and report gaps for PLC (Packet Loss Concealment)
- Real-time Simulation: Playback timing based on timestamps, not arrival time
- OGG Container Support: Read/write Opus in OGG format (Go only)
Core Concepts
Jitter Buffer
Network packets may arrive out of order or with variable delay (jitter). The jitter buffer collects packets and outputs them in correct order:
sequenceDiagram
participant N as Network
participant B as Buffer (Heap)
participant O as Output
N->>B: PKT 3
Note over B: [3]
N->>B: PKT 1
Note over B: [1, 3]
B->>O: PKT 1
N->>B: PKT 2
Note over B: [2, 3]
B->>O: PKT 2
B->>O: PKT 3
Packet Loss Detection
Gaps between consecutive frame timestamps indicate lost packets:
Frame 1: 0ms - 20ms
Frame 2: 20ms - 40ms ✓ No gap
Frame 4: 60ms - 80ms ✗ 20ms gap (Frame 3 lost)
When loss is detected, the caller should use decoder PLC:
frame, loss, _ := buffer.Frame()
if loss > 0 {
// Generate PLC audio for 'loss' duration
plcAudio := decoder.DecodePLC(...)
}
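The gap rule itself can be written down as a small function. This is a sketch, not the opusrt code: `span` and `lossBefore` are hypothetical names, and the 2 ms tolerance is borrowed from the Timestamp Epsilon section of this document.

```go
package main

import "fmt"

// span describes one frame by start time and duration, in milliseconds.
type span struct {
	startMs int64
	durMs   int64
}

// epsilonMs is the tolerance below which a gap is not treated as loss.
const epsilonMs = 2

// lossBefore returns how many milliseconds of audio are missing between
// the end of prev and the start of next, or 0 if they are contiguous.
func lossBefore(prev, next span) int64 {
	gap := next.startMs - (prev.startMs + prev.durMs)
	if gap <= epsilonMs {
		return 0
	}
	return gap
}

func main() {
	f1, f2, f4 := span{0, 20}, span{20, 20}, span{60, 20}
	fmt.Println(lossBefore(f1, f2)) // contiguous frames
	fmt.Println(lossBefore(f2, f4)) // Frame 3 missing
}
```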
Timestamped Frames
Frames are timestamped with epoch milliseconds:
classDiagram
class StampedFrame {
Timestamp: int64 (8 bytes, big-endian)
OpusFrame: []byte (variable)
}
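Read literally, the diagram describes an 8-byte big-endian timestamp followed by the raw Opus payload. The sketch below encodes and decodes that layout; it assumes there are no additional header fields, which the actual opusrt wire format may or may not have, and the function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeStamped prepends an 8-byte big-endian millisecond timestamp.
func encodeStamped(ts int64, frame []byte) []byte {
	out := make([]byte, 8+len(frame))
	binary.BigEndian.PutUint64(out[:8], uint64(ts))
	copy(out[8:], frame)
	return out
}

// decodeStamped splits a stamped frame back into timestamp and payload.
func decodeStamped(b []byte) (int64, []byte, error) {
	if len(b) < 8 {
		return 0, nil, errors.New("stamped frame shorter than 8-byte timestamp")
	}
	return int64(binary.BigEndian.Uint64(b[:8])), b[8:], nil
}

func main() {
	packet := encodeStamped(1700000000000, []byte{0xfc, 0x01, 0x02})
	ts, frame, err := decodeStamped(packet)
	fmt.Println(len(packet), ts, len(frame), err)
}
```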
Components
Buffer
Simple jitter buffer with min-heap ordering:
- Append frames in any order
- Read frames in timestamp order
- Max duration limit (oldest dropped)
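The min-heap ordering can be illustrated with the standard container/heap package. This is a sketch of the idea only: the type names are hypothetical, and the max-duration limit that drops the oldest frames is left out.

```go
package main

import (
	"container/heap"
	"fmt"
)

// stamped is a frame tagged with its timestamp in epoch milliseconds.
type stamped struct {
	ts   int64
	data []byte
}

// frameHeap orders frames by timestamp, earliest first.
type frameHeap []stamped

func (h frameHeap) Len() int           { return len(h) }
func (h frameHeap) Less(i, j int) bool { return h[i].ts < h[j].ts }
func (h frameHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *frameHeap) Push(x any)        { *h = append(*h, x.(stamped)) }
func (h *frameHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// jitterBuffer accepts frames in any order and yields them by timestamp.
type jitterBuffer struct{ frames frameHeap }

func (b *jitterBuffer) Append(ts int64, data []byte) {
	heap.Push(&b.frames, stamped{ts: ts, data: data})
}

func (b *jitterBuffer) Next() (stamped, bool) {
	if b.frames.Len() == 0 {
		return stamped{}, false
	}
	return heap.Pop(&b.frames).(stamped), true
}

// drainOrder appends the given timestamps and returns the read order.
func drainOrder(in []int64) []int64 {
	var b jitterBuffer
	for _, ts := range in {
		b.Append(ts, nil)
	}
	var out []int64
	for f, ok := b.Next(); ok; f, ok = b.Next() {
		out = append(out, f.ts)
	}
	return out
}

func main() {
	fmt.Println(drainOrder([]int64{60, 0, 40, 20}))
}
```

A heap keeps both append and read at O(log n), which suits a buffer where most frames arrive nearly in order but a few arrive late.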
RealtimeBuffer
Wraps Buffer for real-time playback simulation:
- Background goroutine pulls frames at correct time
- Generates loss events when data not available
- Handles clock synchronization
OGG Reader/Writer (Go only)
Read/write Opus streams in OGG container format:
- OggReader: Read Opus frames from OGG file
- OggWriter: Write Opus frames to OGG container
Timing
EpochMillis
All timestamps are milliseconds since Unix epoch:
type EpochMillis int64
// Convert from time.Time
stamp := EpochMillis(time.Now().UnixMilli())
// Convert to duration
duration := stamp.Duration() // time.Duration
Timestamp Epsilon
A 2ms tolerance for timestamp comparisons accounts for clock drift:
const timestampEpsilon = 2 // milliseconds
Use Cases
WebRTC Audio
flowchart LR
A[WebRTC] --> B[RTP Packets]
B --> C[Jitter Buffer]
C --> D[Opus Decode]
D --> E[Playback]
Speech API Streaming
flowchart LR
A[API Response] --> B[Stamped Frames]
B --> C[RealtimeBuffer]
C --> D[Decode]
D --> E[Mixer]
Audio Recording
flowchart LR
A[Opus Frames] --> B[OGG Writer]
B --> C[File]
Examples
See parent audio/ documentation for usage examples.
Related Modules
- audio/codec/opus/ - Opus encoder/decoder
- audio/codec/ogg/ - OGG container primitives
- buffer/ - Used internally by RealtimeBuffer
Audio Package - Go Implementation
Import: github.com/haivivi/giztoy/pkg/audio
The main audio package is an umbrella for sub-packages. Import specific packages directly.
Sub-packages
pcm (PCM Audio)
import "github.com/haivivi/giztoy/pkg/audio/pcm"
Key Types:
| Type | Description |
|---|---|
| Format | Audio format (sample rate, channels, depth) |
| Chunk | Interface for audio data chunks |
| DataChunk | Raw audio data chunk |
| SilenceChunk | Silence generator |
| Mixer | Multi-track audio mixer |
| Track | Single audio track in mixer |
| TrackCtrl | Track control (gain, play/stop) |
codec/opus (Opus Codec)
import "github.com/haivivi/giztoy/pkg/audio/codec/opus"
Key Types:
| Type | Description |
|---|---|
| Encoder | Opus encoder (wraps libopus) |
| Decoder | Opus decoder (wraps libopus) |
| Frame | Raw Opus frame data ([]byte) |
| TOC | Table of Contents byte parser |
| FrameDuration | Frame duration enum |
codec/mp3 (MP3 Codec)
import "github.com/haivivi/giztoy/pkg/audio/codec/mp3"
Key Types:
| Type | Description |
|---|---|
| Encoder | MP3 encoder (wraps LAME) |
| Decoder | MP3 decoder (wraps minimp3) |
codec/ogg (OGG Container)
import "github.com/haivivi/giztoy/pkg/audio/codec/ogg"
Key Types:
| Type | Description |
|---|---|
| Encoder | OGG page encoder |
| Stream | OGG logical bitstream |
| Sync | OGG page synchronizer |
resampler (Sample Rate Conversion)
import "github.com/haivivi/giztoy/pkg/audio/resampler"
Key Types:
| Type | Description |
|---|---|
| Resampler | Interface for sample rate conversion |
| Soxr | libsoxr-based resampler |
| Format | Source/destination format |
opusrt (Realtime Opus)
import "github.com/haivivi/giztoy/pkg/audio/opusrt"
Key Types:
| Type | Description |
|---|---|
| Buffer | Jitter buffer for out-of-order frames |
| RealtimeBuffer | Real-time playback simulation |
| StampedFrame | Opus frame with timestamp |
| OggReader | Read Opus from OGG container |
| OggWriter | Write Opus to OGG container |
portaudio (Audio I/O)
import "github.com/haivivi/giztoy/pkg/audio/portaudio"
Key Types:
| Type | Description |
|---|---|
| Stream | Audio input/output stream |
songs (Built-in Melodies)
import "github.com/haivivi/giztoy/pkg/audio/songs"
Key Types:
| Type | Description |
|---|---|
| Song | Melody definition |
| Note | Musical note |
Usage Examples
PCM Mixer
import "github.com/haivivi/giztoy/pkg/audio/pcm"
// Create mixer
mixer := pcm.NewMixer(pcm.L16Mono16K, pcm.WithAutoClose())
// Create track
track, ctrl, _ := mixer.CreateTrack(pcm.WithTrackLabel("voice"))
// Write audio to track
track.Write(audioData)
// Adjust gain
ctrl.SetGain(0.8)
// Read mixed output
buf := make([]byte, 1600) // 50ms at 16kHz
mixer.Read(buf)
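The 1600-byte buffer above follows from the format: bytes = sample rate × channels × bytes per sample × duration. A minimal sketch of that arithmetic, using only the standard library; bytesInDuration is a hypothetical helper written for illustration and is not part of the pcm package.

```go
package main

import (
	"fmt"
	"time"
)

// bytesInDuration is a hypothetical helper (not part of giztoy's pcm package).
// It returns how many bytes of interleaved PCM cover duration d.
func bytesInDuration(sampleRate, channels, bitDepth int, d time.Duration) int {
	bytesPerSecond := int64(sampleRate * channels * bitDepth / 8)
	return int(bytesPerSecond * int64(d) / int64(time.Second))
}

func main() {
	// 16 kHz, mono, 16-bit PCM is 32000 bytes per second.
	fmt.Println(bytesInDuration(16000, 1, 16, 50*time.Millisecond))  // 1600
	fmt.Println(bytesInDuration(16000, 1, 16, 100*time.Millisecond)) // 3200
	fmt.Println(bytesInDuration(16000, 1, 16, 20*time.Millisecond))  // 640, i.e. 320 samples
}
```

The last line matches the 320-sample, 20 ms frame used in the Opus encoding example.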
Opus Encoding
import "github.com/haivivi/giztoy/pkg/audio/codec/opus"
// Create encoder
enc, _ := opus.NewVoIPEncoder(16000, 1)
defer enc.Close()
enc.SetBitrate(24000)
// Encode PCM to Opus
pcmData := make([]int16, 320) // 20ms at 16kHz
frame, _ := enc.Encode(pcmData, 320)
Sample Rate Conversion
import "github.com/haivivi/giztoy/pkg/audio/resampler"
srcFmt := resampler.Format{SampleRate: 24000, Stereo: false}
dstFmt := resampler.Format{SampleRate: 16000, Stereo: false}
rs, _ := resampler.New(audioReader, srcFmt, dstFmt)
defer rs.Close()
// Read resampled data
io.Copy(output, rs)
Realtime Opus Buffer
import "github.com/haivivi/giztoy/pkg/audio/opusrt"
// Create jitter buffer (2 minute capacity)
buf := opusrt.NewBuffer(2 * time.Minute)
// Write stamped frames (can arrive out of order)
buf.Write(stampedFrameData)
// Read in order
frame, loss, _ := buf.Frame()
if loss > 0 {
// Use decoder PLC for lost frames
}
CGO Dependencies
All codec packages use CGO to bind native libraries:
// Example: opus encoder
/*
#cgo pkg-config: opus
#include <opus.h>
*/
import "C"
Build requirements:
- pkg-config for native builds
- Bazel cdeps for Bazel builds
Audio Package - Rust Implementation
Crate: giztoy-audio
Modules
pcm (PCM Audio)
use giztoy_audio::pcm::{Format, FormatExt, Chunk, DataChunk, SilenceChunk, Mixer};
Key Types:
| Type | Description |
|---|---|
| Format | Audio format enum (re-exported from resampler) |
| FormatExt | Extension trait for chunk creation |
| Chunk | Trait for audio data chunks |
| DataChunk | Raw audio data chunk |
| SilenceChunk | Silence generator |
| Mixer | Multi-track audio mixer |
| Track | Audio track writer |
| TrackCtrl | Track control |
| AtomicF32 | Atomic float for gain control |
codec::opus (Opus Codec)
use giztoy_audio::codec::opus::{Encoder, Decoder, Application, Frame, TOC};
Key Types:
| Type | Description |
|---|---|
| Encoder | Opus encoder (wraps libopus) |
| Decoder | Opus decoder |
| Application | Encoder application type enum |
| Frame | Raw Opus frame data |
| TOC | Table of Contents parser |
| FrameDuration | Frame duration enum |
codec::mp3 (MP3 Codec)
use giztoy_audio::codec::mp3::{Encoder, Decoder};
Key Types:
| Type | Description |
|---|---|
| Encoder | MP3 encoder (wraps LAME) |
| Decoder | MP3 decoder (wraps minimp3) |
codec::ogg (OGG Container)
use giztoy_audio::codec::ogg::{Encoder, Stream, Sync, Page};
Key Types:
| Type | Description |
|---|---|
| Encoder | OGG page encoder |
| Stream | OGG logical bitstream |
| Sync | OGG page synchronizer |
| Page | OGG page data |
resampler (Sample Rate Conversion)
use giztoy_audio::resampler::{Soxr, Format};
Key Types:
| Type | Description |
|---|---|
| Soxr | libsoxr-based resampler |
| Format | Audio format (sample rate, stereo flag) |
opusrt (Realtime Opus)
use giztoy_audio::opusrt::{Buffer, StampedFrame, EpochMillis};
Key Types:
| Type | Description |
|---|---|
| Buffer | Jitter buffer for frame reordering |
| StampedFrame | Opus frame with timestamp |
| EpochMillis | Millisecond timestamp |
⚠️ Note: Rust opusrt is missing OGG Reader/Writer compared to Go.
songs (Built-in Melodies)
use giztoy_audio::songs::{Song, Note, Catalog};
Key Types:
| Type | Description |
|---|---|
| Song | Melody definition |
| Note | Musical note |
| Catalog | Built-in song collection |
Usage Examples
PCM Format
use giztoy_audio::pcm::{Format, FormatExt};
use std::time::Duration;
let format = Format::L16Mono16K;
// Calculate bytes for duration
let bytes = format.bytes_in_duration(Duration::from_millis(100));
assert_eq!(bytes, 3200); // 1600 samples * 2 bytes
// Create chunks
let silence = format.silence_chunk(Duration::from_millis(100));
let data = format.data_chunk(vec![0u8; 3200]);
Opus Encoding
use giztoy_audio::codec::opus::{Encoder, Application};
let mut encoder = Encoder::new(16000, 1, Application::VoIP)?;
encoder.set_bitrate(24000)?;
// Encode PCM to Opus
let pcm: Vec<i16> = vec![0; 320]; // 20ms at 16kHz
let frame = encoder.encode(&pcm, 320)?;
Sample Rate Conversion
use giztoy_audio::resampler::{Soxr, Format};
let src_fmt = Format { sample_rate: 24000, stereo: false };
let dst_fmt = Format { sample_rate: 16000, stereo: false };
let mut resampler = Soxr::new(src_fmt, dst_fmt)?;
// Process audio data
let output = resampler.process(&input_pcm)?;
Mixer
use giztoy_audio::pcm::{Format, Mixer, MixerOptions};
let mut mixer = Mixer::new(Format::L16Mono16K, MixerOptions::default());
// Create track
let (track, ctrl) = mixer.create_track(None)?;
// Write audio
track.write(&audio_data)?;
// Adjust gain
ctrl.set_gain(0.8);
// Read mixed output
let mut buf = vec![0u8; 3200];
mixer.read(&mut buf)?;
FFI Bindings
Rust uses custom FFI modules for native library bindings:
// Example: codec/opus/ffi.rs
extern "C" {
    fn opus_encoder_create(
        fs: i32,
        channels: i32,
        application: i32,
        error: *mut i32,
    ) -> *mut OpusEncoder;
}
Differences from Go
| Feature | Go | Rust |
|---|---|---|
| Format definition | In pcm/ | In resampler/, re-exported by pcm/ |
| opusrt OGG R/W | ✅ | ❌ Missing |
| portaudio | ✅ | ❌ Not implemented |
| Mixer thread-safety | sync.Mutex | std::sync::Mutex |
| FFI error handling | CGO strerror | Custom error types |
Audio Package - Known Issues
🟠 Major Issues
AUD-001: Go Mixer uses unsafe pointer casting
File: go/pkg/audio/pcm/mixer.go:226
Description:
The mixer uses unsafe.Slice and unsafe.Pointer to cast between []byte and []int16:
i16 := unsafe.Slice((*int16)(unsafe.Pointer(&p[0])), len(p)/2)
Risk:
- Platform-dependent endianness (assumes little-endian)
- Potential undefined behavior if buffer alignment is wrong
Impact: May produce incorrect audio on big-endian systems.
Suggestion: Add explicit little-endian encoding/decoding or document platform requirements.
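One way to follow this suggestion is to convert samples with encoding/binary, which reads the same byte order on every platform and has no alignment requirement. The sketch below is illustrative only; decodeInt16LE and encodeInt16LE are hypothetical names, not functions from the mixer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeInt16LE converts little-endian 16-bit PCM bytes into samples.
// A trailing odd byte, if any, is ignored. (Hypothetical helper.)
func decodeInt16LE(p []byte) []int16 {
	out := make([]int16, len(p)/2)
	for i := range out {
		out[i] = int16(binary.LittleEndian.Uint16(p[2*i:]))
	}
	return out
}

// encodeInt16LE writes samples back as little-endian bytes. (Hypothetical helper.)
func encodeInt16LE(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	raw := []byte{0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80}
	samples := decodeInt16LE(raw)
	fmt.Println(samples)                // [1 -1 -32768]
	fmt.Println(encodeInt16LE(samples)) // [1 0 255 255 0 128]
}
```

Unlike the unsafe cast, this copies the data, so it trades some throughput for portability.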
AUD-002: Rust opusrt missing OGG Reader/Writer
Description:
Go opusrt has OggReader and OggWriter for reading/writing Opus in OGG containers. Rust implementation is missing these.
Impact: Cannot read/write Opus files in OGG format in Rust.
Status: ⚠️ Partial implementation.
AUD-003: Rust missing portaudio module
Description:
Go has audio/portaudio for audio device I/O. Rust has no equivalent.
Impact: Cannot capture/play audio from hardware devices in Rust.
Status: ❌ Not implemented.
🟡 Minor Issues
AUD-004: Go Format panics on invalid value
File: go/pkg/audio/pcm/pcm.go:36-38
Description:
Format.SampleRate(), Channels(), Depth() all panic on invalid format:
func (f Format) SampleRate() int {
	switch f {
	// ...
	}
	panic("pcm: invalid audio type")
}
Impact: Runtime panic instead of error return.
Suggestion: Return (int, error) or use MustXxx naming convention for panicking versions.
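A minimal sketch of what that suggestion could look like. The Format type here is a stand-in defined for the example; L16Mono16K appears in this document, while L16Mono24K, L16Mono48K, ErrInvalidFormat, and MustSampleRate are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Format is a stand-in for pcm.Format, defined here so the example runs alone.
type Format int

const (
	L16Mono16K Format = iota
	L16Mono24K // assumed variant
	L16Mono48K // assumed variant
)

// ErrInvalidFormat is a hypothetical sentinel error.
var ErrInvalidFormat = errors.New("pcm: invalid audio type")

// SampleRate returns an error instead of panicking on an unknown format.
func (f Format) SampleRate() (int, error) {
	switch f {
	case L16Mono16K:
		return 16000, nil
	case L16Mono24K:
		return 24000, nil
	case L16Mono48K:
		return 48000, nil
	}
	return 0, ErrInvalidFormat
}

// MustSampleRate keeps the panicking behaviour under an explicit name.
func (f Format) MustSampleRate() int {
	rate, err := f.SampleRate()
	if err != nil {
		panic(err)
	}
	return rate
}

func main() {
	fmt.Println(L16Mono16K.SampleRate()) // 16000 <nil>
	fmt.Println(Format(99).SampleRate()) // 0 pcm: invalid audio type
}
```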
AUD-005: Go SilenceChunk uses fixed global buffer
File: go/pkg/audio/pcm/pcm.go:177
Description:
Uses a shared fixed-size zero buffer and loops for long durations.
Impact: None functionally; avoids repeated allocations.
Status: Not a bug. Keep as implementation note only.
AUD-006: Go Opus encoder max frame size hardcoded
File: go/pkg/audio/codec/opus/encoder.go:95
Description:
Encode function allocates fixed 4000 byte buffer:
buf := make([]byte, 4000)
Impact: Allocation on every encode call.
Suggestion: Use buffer pool or allow caller to provide buffer.
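The buffer-pool idea can be sketched with sync.Pool. Everything below is hypothetical: encodeWith and the callback signature are invented for the example, and the real encoder calls into libopus rather than a Go closure.

```go
package main

import (
	"fmt"
	"sync"
)

const maxFrameSize = 4000 // same scratch size as the hardcoded buffer

// framePool hands out reusable scratch buffers. Storing *[]byte avoids an
// extra allocation when the slice header is boxed into an interface.
var framePool = sync.Pool{
	New: func() any {
		b := make([]byte, maxFrameSize)
		return &b
	},
}

// encodeWith borrows a scratch buffer, lets encode fill it, and returns a
// right-sized copy of the n bytes that were written. (Hypothetical helper.)
func encodeWith(encode func(dst []byte) int) []byte {
	bp := framePool.Get().(*[]byte)
	defer framePool.Put(bp)
	n := encode(*bp)
	out := make([]byte, n)
	copy(out, (*bp)[:n])
	return out
}

func main() {
	frame := encodeWith(func(dst []byte) int {
		return copy(dst, []byte{0xDE, 0xAD, 0xBE, 0xEF})
	})
	fmt.Println(len(frame), frame) // 4 [222 173 190 239]
}
```

This still allocates the returned frame, but at its actual size instead of 4000 bytes; letting the caller pass the destination buffer would remove that allocation as well.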
AUD-007: Rust Format re-export from resampler is confusing
File: rust/audio/src/pcm/format.rs:7
Description:
pcm::Format is actually re-exported from resampler::Format:
pub use crate::resampler::format::Format;
Impact: Confusing import paths, circular dependency appearance.
Suggestion: Define Format once at top level and import in both modules.
AUD-008: Go mixer notifyWrite spawns goroutine every call
File: go/pkg/audio/pcm/mixer.go:391-405
Description:
notifyWrite() spawns a new goroutine each time:
func (mx *Mixer) notifyWrite() {
	go func() {
		// ...
	}()
}
Impact: Goroutine overhead for every write notification.
Suggestion: Use single dedicated notification goroutine or avoid goroutine.
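One goroutine-free alternative is a buffered channel of capacity one with a non-blocking send, so that bursts of writes coalesce into a single pending signal. The body of notifyWrite is elided above, so this is a sketch of the general pattern under that assumption, not a drop-in replacement; writeNotifier and its methods are hypothetical.

```go
package main

import "fmt"

// writeNotifier is a hypothetical stand-in for the mixer's notification path.
type writeNotifier struct {
	ch chan struct{}
}

func newWriteNotifier() *writeNotifier {
	return &writeNotifier{ch: make(chan struct{}, 1)}
}

// notify never blocks and never spawns a goroutine. Repeated calls made
// before the reader wakes up collapse into one pending signal.
func (n *writeNotifier) notify() {
	select {
	case n.ch <- struct{}{}:
	default: // a signal is already pending
	}
}

// pending reports how many signals are queued (0 or 1).
func (n *writeNotifier) pending() int { return len(n.ch) }

// wait blocks until at least one write has been signalled.
func (n *writeNotifier) wait() { <-n.ch }

func main() {
	n := newWriteNotifier()
	for i := 0; i < 3; i++ {
		n.notify()
	}
	fmt.Println(n.pending()) // 1
	n.wait()
	fmt.Println(n.pending()) // 0
}
```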
🔵 Enhancements
AUD-009: No stereo format support in predefined formats
Description:
Only mono formats are predefined (L16Mono16K, etc.). No stereo formats.
Suggestion: Add L16Stereo16K, L16Stereo24K, L16Stereo48K.
AUD-010: No 8-bit or 24-bit PCM support
Description:
Only 16-bit PCM is supported. Some audio sources use 8-bit (low quality) or 24-bit (high quality).
Suggestion: Add format variants for different bit depths.
AUD-011: Resampler quality not configurable
File: go/pkg/audio/resampler/soxr.go:52
Description:
Quality is hardcoded to SOXR_HQ:
qSpec := C.soxr_quality_spec(C.SOXR_HQ, 0)
Impact: Cannot trade quality for performance when needed.
Suggestion: Add quality parameter to New().
AUD-012: No WAV file support
Description:
No utilities for reading/writing WAV files, only raw PCM.
Suggestion: Add WAV header parsing/writing for file I/O.
⚪ Notes
AUD-013: CGO/FFI dependency complexity
Description:
Both Go and Rust rely heavily on CGO/FFI for native codec libraries. This adds:
- Build complexity (pkg-config, Bazel rules)
- Platform-specific issues
- Memory management concerns
Status: Necessary for performance, but increases maintenance burden.
AUD-014: Mixer uses float32 internally
Description:
Mixer converts int16 PCM to float32 for mixing, then back to int16:
// int16 → float32
s := float32(trackI16[i])
// ... mix ...
// float32 → int16
i16[i] = int16(t * 32767)
Impact: Slight precision loss during mixing, but standard practice.
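To make the round trip concrete, here is a small self-contained mixer that normalizes int16 samples to float32, applies a per-track gain, sums, clips to [-1, 1], and converts back. It illustrates the general technique only; mixInt16 is a hypothetical function, and the real mixer's normalization and clipping details are not shown in this document.

```go
package main

import "fmt"

// mixInt16 mixes n samples from the given tracks. gains must have one
// entry per track. (Illustrative only, not giztoy's implementation.)
func mixInt16(tracks [][]int16, gains []float32, n int) []int16 {
	out := make([]int16, n)
	for i := range out {
		var sum float32
		for t, track := range tracks {
			if i < len(track) {
				// int16 -> float32 in [-1, 1), scaled by the track gain
				sum += float32(track[i]) / 32768 * gains[t]
			}
		}
		// Clip so the conversion back to int16 cannot overflow.
		if sum > 1 {
			sum = 1
		} else if sum < -1 {
			sum = -1
		}
		// float32 -> int16 (truncates toward zero)
		out[i] = int16(sum * 32767)
	}
	return out
}

func main() {
	a := []int16{16384, 32767, -32768}
	b := []int16{16384, 32767, -32768}
	fmt.Println(mixInt16([][]int16{a, b}, []float32{1, 1}, 3)) // [32767 32767 -32767]
}
```

The asymmetric scale factors (32768 on the way in, 32767 on the way out) are one source of the slight precision loss mentioned above.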
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| AUD-001 | 🟠 Major | Open | Go Mixer |
| AUD-002 | 🟠 Major | Open | Rust opusrt |
| AUD-003 | 🟠 Major | Open | Rust |
| AUD-004 | 🟡 Minor | Open | Go Format |
| AUD-005 | ⚪ Note | N/A | Go SilenceChunk |
| AUD-006 | 🟡 Minor | Open | Go Opus |
| AUD-007 | 🟡 Minor | Open | Rust Format |
| AUD-008 | 🟡 Minor | Open | Go Mixer |
| AUD-009 | 🔵 Enhancement | Open | Both |
| AUD-010 | 🔵 Enhancement | Open | Both |
| AUD-011 | 🔵 Enhancement | Open | Go |
| AUD-012 | 🔵 Enhancement | Open | Both |
| AUD-013 | ⚪ Note | N/A | Both |
| AUD-014 | ⚪ Note | N/A | Go |
Overall: Functional audio processing with significant native library integration. Main gaps are Rust parity (opusrt OGG, portaudio) and some unsafe code patterns.
MiniMax SDK
Go and Rust SDKs for the MiniMax AI platform API.
Official API Documentation: api/README.md
Design Goals
- Full API Coverage: Support all MiniMax API capabilities
- Idiomatic Language Design: Natural Go/Rust patterns
- Streaming Support: First-class support for streaming responses
- Async Task Handling: Convenient polling for long-running operations
API Coverage
| API Feature | Go | Rust | Official Doc |
|---|---|---|---|
| Text Generation (Chat) | ✅ | ✅ | api/text.md |
| Sync Speech (T2A) | ✅ | ✅ | api/speech-t2a.md |
| Async Speech (Long Text) | ✅ | ✅ | api/speech-t2a-async.md |
| Voice Cloning | ✅ | ✅ | api/voice-cloning.md |
| Voice Design | ✅ | ✅ | api/voice-design.md |
| Voice Management | ✅ | ✅ | api/voice-management.md |
| Video Generation | ✅ | ✅ | api/video.md |
| Video Agent | ✅ | ✅ | api/video-agent.md |
| Image Generation | ✅ | ✅ | api/image.md |
| Music Generation | ✅ | ✅ | api/music.md |
| File Management | ✅ | ✅ | api/file.md |
Architecture
graph TB
subgraph client["Client"]
subgraph services1[" "]
text[Text Service]
speech[Speech Service]
voice[Voice Service]
video[Video Service]
end
subgraph services2[" "]
image[Image Service]
music[Music Service]
file[File Service]
end
end
subgraph http["HTTP Client"]
retry[Retry]
auth[Auth]
error[Error Handling]
end
client --> http
http --> api["https://api.minimaxi.com"]
Services
| Service | Description |
|---|---|
| Text | Chat completion, streaming, tool calls |
| Speech | TTS sync/stream, async long-text |
| Voice | List voices, clone, design |
| Video | Text-to-video, image-to-video, agent |
| Image | Text-to-image, image reference |
| Music | Music generation from lyrics |
| File | Upload, list, retrieve, delete files |
Authentication
Uses Bearer token authentication:
Authorization: Bearer <api_key>
API keys are obtained from MiniMax Platform.
Base URLs
| Region | URL |
|---|---|
| China (Default) | https://api.minimaxi.com |
| Global | https://api.minimaxi.chat |
Response Patterns
Synchronous
Direct response with data.
Streaming
SSE (Server-Sent Events) for real-time data:
- Text: Token-by-token chat responses
- Speech: Audio chunk streaming
Async Tasks
For long-running operations (video, async speech):
sequenceDiagram
participant C as Client
participant S as Server
C->>S: Create Task
S-->>C: task_id
loop Poll Status
C->>S: Query Status
S-->>C: Pending/Running
end
S-->>C: Success + Result
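The polling loop in the diagram can be sketched in a few lines of standard-library Go. This is a generic illustration, not the SDK's task type: the Status strings, the Failed state, QueryFunc, and pollUntilDone are all assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Status values mirror the diagram; the exact strings are assumptions.
type Status string

const (
	StatusPending Status = "Pending"
	StatusRunning Status = "Running"
	StatusSuccess Status = "Success"
	StatusFailed  Status = "Failed" // assumed terminal failure state
)

// QueryFunc stands in for one "Query Status" request (hypothetical signature).
type QueryFunc func(ctx context.Context) (Status, error)

// pollUntilDone queries immediately, then once per interval, until the task
// reaches a terminal state or the context is cancelled.
func pollUntilDone(ctx context.Context, interval time.Duration, query QueryFunc) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		status, err := query(ctx)
		if err != nil {
			return err
		}
		switch status {
		case StatusSuccess:
			return nil
		case StatusFailed:
			return errors.New("task failed")
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demoPoll simulates a server that reports Running twice, then Success.
func demoPoll() (calls int, err error) {
	query := func(context.Context) (Status, error) {
		calls++
		if calls < 3 {
			return StatusRunning, nil
		}
		return StatusSuccess, nil
	}
	err = pollUntilDone(context.Background(), 5*time.Millisecond, query)
	return calls, err
}

func main() {
	calls, err := demoPoll()
	fmt.Println(calls, err) // 3 <nil>
}
```

Passing a context with a deadline bounds the total wait, which matters for video jobs that may run for minutes.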
Error Handling
All errors include:
- status_code: Numeric error code
- status_msg: Human-readable message
Common error codes:
- 1000: General error
- 1001: Rate limit exceeded
- 1002: Invalid parameters
- 1004: Authentication failed
Examples Directory
- examples/go/minimax/ - Go SDK examples
- examples/rust/minimax/ - Rust SDK examples
- examples/cmd/minimax/ - CLI test scripts
Related
- CLI tool: go/cmd/minimax/
- CLI tests: examples/cmd/minimax/
MiniMax Open Platform API Documentation
Official documentation: MiniMax Open Platform Documentation Center
Last updated: 2026-01-19
Note: This document is compiled from the official documentation. Where the two differ, the official documentation is authoritative.
Official Documentation Index
If this document is incomplete or you need the latest information, visit the following official links:
| Module | Official Documentation Link |
|---|---|
| API Overview | https://platform.minimaxi.com/docs/api-reference/api-overview |
| Text Generation (Anthropic) | https://platform.minimaxi.com/docs/api-reference/text-anthropic-api |
| Text Generation (OpenAI) | https://platform.minimaxi.com/docs/api-reference/text-openai-api |
| Sync Speech Synthesis (HTTP) | https://platform.minimaxi.com/docs/api-reference/speech-t2a-http |
| Sync Speech Synthesis (WebSocket) | https://platform.minimaxi.com/docs/api-reference/speech-t2a-ws |
| Async Long-Text Speech Synthesis | https://platform.minimaxi.com/docs/api-reference/speech-t2a-async |
| Rapid Voice Cloning | https://platform.minimaxi.com/docs/api-reference/speech-voice-cloning |
| Voice Design | https://platform.minimaxi.com/docs/api-reference/speech-voice-design |
| Voice Management | https://platform.minimaxi.com/docs/api-reference/speech-voice-management |
| Video Generation | https://platform.minimaxi.com/docs/api-reference/video-generation |
| Video Generation Agent | https://platform.minimaxi.com/docs/api-reference/video-generation-agent |
| Image Generation | https://platform.minimaxi.com/docs/api-reference/image-generation |
| Music Generation | https://platform.minimaxi.com/docs/api-reference/music-generation |
| File Management | https://platform.minimaxi.com/docs/api-reference/file-management |
| Error Code Reference | https://platform.minimaxi.com/docs/api-reference/error-code |
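How to Get the Latest Documentation
Method 1: Visit the official site directly
Visit the MiniMax Open Platform Documentation Center. The left navigation bar links to detailed documentation for every API.
Method 2: Read with an AI tool
If you use an AI tool with browser capabilities (such as Cursor), you can:
- Use the browser_navigate tool to open the official documentation page
- Use browser_snapshot to capture the page content
- Parse the API parameters, request/response formats, and other details from the page
Example:
Visit: https://platform.minimaxi.com/docs/api-reference/speech-t2a-http
Method 3: Check the official MCP servers
MiniMax provides official MCP (Model Context Protocol) server implementations that include complete API call examples:
- Python version: https://github.com/MiniMax-AI/MiniMax-MCP
- JavaScript version: https://github.com/MiniMax-AI/MiniMax-MCP-JS
About OpenAPI/Swagger
MiniMax does not currently publish an OpenAPI/Swagger specification. To obtain one, you can:
- Contact official technical support: api-support@minimaxi.com
- Compile one manually from the official documentation
Overview
The MiniMax Open Platform provides multimodal AI capabilities, including text generation, speech synthesis, video generation, image generation, and music generation.
API Capability Overview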
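| Capability | Description | Document |
|---|---|---|
| Text Generation | Conversational content generation, tool calls | text.md |
| Sync Speech Synthesis (T2A) | Short-text speech synthesis over HTTP/WebSocket | speech-t2a.md |
| Async Long-Text Speech Synthesis | Long-text speech synthesis in async task mode | speech-t2a-async.md |
| Rapid Voice Cloning | Clone a voice from uploaded audio | voice-cloning.md |
| Voice Design | Generate a personalized voice from a description | voice-design.md |
| Voice Management | Query and manage available voices | voice-management.md |
| Video Generation | Text-to-video, image-to-video | video.md |
| Video Generation Agent | Template-based video generation | video-agent.md |
| Image Generation | Text-to-image, image-to-image | image.md |
| Music Generation | Generate music from a description and lyrics | music.md |
| File Management | File upload, download, and management | file.md |
Authentication
All APIs use Bearer token authentication:
Authorization: Bearer <your_api_key>
Obtaining an API Key
- Pay-as-you-go: Create an API key under "Account Management > API Keys". It supports models of all modalities.
- Coding Plan: Create a Coding Plan key. It supports text models only.
Base URLs
| Address Type | URL |
|---|---|
| Primary | https://api.minimaxi.com |
| Backup | https://api-bj.minimaxi.com |
Request Headers
| Parameter | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer <api_key> |
| Content-Type | string | Yes | application/json |
Error Handling
Error responses returned by the API have this format: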
{
"base_resp": {
"status_code": 1000,
"status_msg": "error message"
}
}
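A client can turn this envelope into a Go error with encoding/json. The sketch below assumes that a status_code of 0 means success and that a missing base_resp is not an error; both are assumptions, and checkBaseResp and APIError are hypothetical names rather than SDK types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BaseResp mirrors the base_resp object shown above.
type BaseResp struct {
	StatusCode int    `json:"status_code"`
	StatusMsg  string `json:"status_msg"`
}

// envelope captures only the part of a response needed for error checks.
type envelope struct {
	BaseResp *BaseResp `json:"base_resp"`
}

// APIError is a hypothetical error type carrying the two documented fields.
type APIError struct {
	StatusCode int
	StatusMsg  string
}

func (e *APIError) Error() string {
	return fmt.Sprintf("api error %d: %s", e.StatusCode, e.StatusMsg)
}

// checkBaseResp returns nil for a success body and *APIError otherwise.
// Assumption: status_code 0 signals success.
func checkBaseResp(body []byte) error {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return fmt.Errorf("decode response: %w", err)
	}
	if env.BaseResp == nil || env.BaseResp.StatusCode == 0 {
		return nil
	}
	return &APIError{
		StatusCode: env.BaseResp.StatusCode,
		StatusMsg:  env.BaseResp.StatusMsg,
	}
}

func main() {
	body := []byte(`{"base_resp":{"status_code":1000,"status_msg":"error message"}}`)
	fmt.Println(checkBaseResp(body)) // api error 1000: error message
}
```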
Official Resources
- MiniMax Open Platform
- Official MCP server (Python)
- Official MCP server (JavaScript)
- Technical support email: api-support@minimaxi.com
MiniMax SDK - Go Implementation
Import: github.com/haivivi/giztoy/pkg/minimax
Client
type Client struct {
	Text   *TextService
	Speech *SpeechService
	Voice  *VoiceService
	Video  *VideoService
	Image  *ImageService
	Music  *MusicService
	File   *FileService
}
Constructor:
// Basic
client := minimax.NewClient("api-key")
// With options
client := minimax.NewClient("api-key",
	minimax.WithBaseURL(minimax.BaseURLGlobal),
	minimax.WithRetry(5),
	minimax.WithHTTPClient(&http.Client{Timeout: 60 * time.Second}),
)
Options:
| Option | Description |
|---|---|
| WithBaseURL(url) | Custom API base URL |
| WithRetry(n) | Max retry count (default: 3) |
| WithHTTPClient(c) | Custom http.Client |
Services
TextService
// Synchronous
resp, err := client.Text.CreateChatCompletion(ctx, &minimax.ChatCompletionRequest{
	Model: "MiniMax-M2.1",
	Messages: []minimax.Message{
		{Role: "user", Content: "Hello!"},
	},
})
// Streaming (Go 1.23+ iter.Seq2)
for chunk, err := range client.Text.CreateChatCompletionStream(ctx, req) {
	if err != nil {
		return err
	}
	fmt.Print(chunk.Choices[0].Delta.Content)
}
SpeechService
// Synchronous
resp, err := client.Speech.Synthesize(ctx, &minimax.SpeechRequest{
	Model: "speech-2.6-hd",
	Text:  "Hello, world!",
	VoiceSetting: &minimax.VoiceSetting{
		VoiceID: "male-qn-qingse",
	},
})
// resp.Audio contains decoded audio bytes
// Streaming
for chunk, err := range client.Speech.SynthesizeStream(ctx, req) {
	if err != nil {
		return err
	}
	buf.Write(chunk.Audio)
}
// Async (long text)
task, err := client.Speech.CreateAsyncTask(ctx, &minimax.AsyncSpeechRequest{
	Model: "speech-2.6-hd",
	Text:  longText,
	// ...
})
result, err := task.Wait(ctx)
VoiceService
// List voices
voices, err := client.Voice.List(ctx)
// Clone voice
resp, err := client.Voice.Clone(ctx, &minimax.VoiceCloneRequest{
	FileID:  "uploaded-file-id",
	VoiceID: "my-cloned-voice",
})
// Design voice (returns its own response type, so it gets its own variable)
designResp, err := client.Voice.Design(ctx, &minimax.VoiceDesignRequest{
	Prompt:      "A warm female voice...",
	PreviewText: "Hello, how can I help?",
})
VideoService
// Text to video
task, err := client.Video.CreateTextToVideo(ctx, &minimax.TextToVideoRequest{
	Model:  "video-01",
	Prompt: "A cat playing piano",
})
result, err := task.Wait(ctx)
// result.FileID contains the video file ID
// Image to video (task and err are already declared above, so assign with =)
task, err = client.Video.CreateImageToVideo(ctx, &minimax.ImageToVideoRequest{
	Model:           "video-01",
	FirstFrameImage: "https://...",
})
ImageService
resp, err := client.Image.Generate(ctx, &minimax.ImageGenerateRequest{
	Model:  "image-01",
	Prompt: "A beautiful sunset",
})
// resp.Data[0].URL or resp.Data[0].B64JSON
MusicService
task, err := client.Music.Generate(ctx, &minimax.MusicRequest{
	Prompt: "upbeat pop song",
	Lyrics: "[Verse]\nHello world...",
})
result, err := task.Wait(ctx)
FileService
// Upload
resp, err := client.File.Upload(ctx, filePath, minimax.FilePurposeVoiceClone)
// List
files, err := client.File.List(ctx, &minimax.FileListRequest{
	Purpose: minimax.FilePurposeVoiceClone,
})
// Download
data, err := client.File.Download(ctx, fileID)
// Delete (err is already declared above, so assign with =)
err = client.File.Delete(ctx, fileID)
Task Polling
task, err := client.Video.CreateTextToVideo(ctx, req)
if err != nil {
	return err
}
// Default 5s interval
result, err := task.Wait(ctx)
// Custom interval (result and err are already declared, so assign with =)
result, err = task.WaitWithInterval(ctx, 10*time.Second)
// Manual polling
status, err := task.Query(ctx)
if status.Status == minimax.TaskStatusSuccess {
	// ...
}
Error Handling
resp, err := client.Text.CreateChatCompletion(ctx, req)
if err != nil {
	if e, ok := minimax.AsError(err); ok {
		fmt.Printf("API Error: %d - %s\n", e.StatusCode, e.StatusMsg)
		if e.IsRateLimit() {
			// Wait and retry
		}
	}
	return err
}
Streaming Internals
Uses SSE (Server-Sent Events):
- iter.Seq2[T, error] for Go 1.23+ range loops
- Auto-reconnect on transient errors (based on retry config)
- Hex audio decoding for speech streams
MiniMax SDK - Rust Implementation
Crate: giztoy-minimax
Client
pub struct Client {
    http: Arc<HttpClient>,
    config: ClientConfig,
}
Constructor:
use giztoy_minimax::{Client, BASE_URL_GLOBAL};
// Basic
let client = Client::new("api-key")?;
// With builder
let client = Client::builder("api-key")
    .base_url(BASE_URL_GLOBAL)
    .max_retries(5)
    .build()?;
Builder Methods:
| Method | Description |
|---|---|
| base_url(url) | Custom API base URL |
| max_retries(n) | Max retry count (default: 3) |
Services
Services are accessed via getter methods; each call returns a new service instance:
client.text()   // TextService
client.speech() // SpeechService
client.voice()  // VoiceService
client.video()  // VideoService
client.image()  // ImageService
client.music()  // MusicService
client.file()   // FileService
TextService
use giztoy_minimax::{ChatCompletionRequest, Message};
// Synchronous
let resp = client.text().create_chat_completion(&ChatCompletionRequest {
    model: "MiniMax-M2.1".to_string(),
    messages: vec![
        Message { role: "user".to_string(), content: "Hello!".to_string() },
    ],
    ..Default::default()
}).await?;
// Streaming (the stream must be mutable to call next())
let mut stream = client.text().create_chat_completion_stream(&req).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        print!("{}", choice.delta.content);
    }
}
SpeechService
use giztoy_minimax::{SpeechRequest, VoiceSetting};
// Synchronous
let resp = client.speech().synthesize(&SpeechRequest {
    model: "speech-2.6-hd".to_string(),
    text: "Hello, world!".to_string(),
    voice_setting: Some(VoiceSetting {
        voice_id: "male-qn-qingse".to_string(),
        ..Default::default()
    }),
    ..Default::default()
}).await?;
// resp.audio contains decoded bytes
// Streaming (the stream must be mutable to call next())
let mut stream = client.speech().synthesize_stream(&req).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(audio) = chunk.audio {
        buf.extend(&audio);
    }
}
// Async (long text)
let task = client.speech().create_async_task(&AsyncSpeechRequest {
    // ...
}).await?;
let result = task.wait().await?;
VoiceService
// List voices
let voices = client.voice().list().await?;
// Clone voice
let resp = client.voice().clone(&VoiceCloneRequest {
    file_id: "uploaded-file-id".to_string(),
    voice_id: "my-cloned-voice".to_string(),
}).await?;
// Design voice
let resp = client.voice().design(&VoiceDesignRequest {
    prompt: "A warm female voice...".to_string(),
    preview_text: "Hello, how can I help?".to_string(),
    ..Default::default()
}).await?;
VideoService
// Text to video
let task = client.video().create_text_to_video(&TextToVideoRequest {
    model: "video-01".to_string(),
    prompt: "A cat playing piano".to_string(),
    ..Default::default()
}).await?;
let result = task.wait().await?;
// Image to video
let task = client.video().create_image_to_video(&ImageToVideoRequest {
    model: "video-01".to_string(),
    first_frame_image: "https://...".to_string(),
    ..Default::default()
}).await?;
ImageService
let resp = client.image().generate(&ImageGenerateRequest {
    model: "image-01".to_string(),
    prompt: "A beautiful sunset".to_string(),
    ..Default::default()
}).await?;
MusicService
let task = client.music().generate(&MusicRequest {
    prompt: "upbeat pop song".to_string(),
    lyrics: "[Verse]\nHello world...".to_string(),
    ..Default::default()
}).await?;
let result = task.wait().await?;
FileService
// Upload
let resp = client.file().upload(file_path, FilePurpose::VoiceClone).await?;
// List
let files = client.file().list(Some(FilePurpose::VoiceClone)).await?;
// Download
let data = client.file().download(&file_id).await?;
// Delete
client.file().delete(&file_id).await?;
Task Polling
let task = client.video().create_text_to_video(&req).await?;
// Default interval
let result = task.wait().await?;
// Custom interval
let result = task.wait_with_interval(Duration::from_secs(10)).await?;
// Manual polling
let status = task.query().await?;
if status.status == TaskStatus::Success {
    // ...
}
Error Handling
use giztoy_minimax::{Error, Result};
match client.text().create_chat_completion(&req).await {
    Ok(resp) => { /* ... */ }
    Err(Error::Api { status_code, status_msg }) => {
        eprintln!("API Error: {} - {}", status_code, status_msg);
    }
    Err(Error::Http(e)) => {
        eprintln!("HTTP Error: {}", e);
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}
HasModel Trait
For default model handling:
pub trait HasModel {
    fn model(&self) -> &str;
    fn set_model(&mut self, model: impl Into<String>);
    fn default_model() -> &'static str;
    fn apply_default_model(&mut self);
}
Differences from Go
| Feature | Go | Rust |
|---|---|---|
| Client construction | NewClient() (panic on empty key) | Client::new() (returns Result) |
| Service access | Direct fields (client.Text) | Getter methods (client.text()) |
| Streaming | iter.Seq2[T, error] | Stream<Item=Result<T>> |
| Options | Functional options | Builder pattern |
| Error type | *Error with helper methods | Error enum |
MiniMax SDK - Known Issues
🟡 Minor Issues
MMX-001: Go NewClient panics on empty API key
File: go/pkg/minimax/client.go:100-102
Description:
NewClient panics instead of returning an error:
func NewClient(apiKey string, opts ...Option) *Client {
if apiKey == "" {
panic("minimax: apiKey must be non-empty")
}
Impact: Unrecoverable error at construction time.
Suggestion: Return (*Client, error) or use builder pattern like Rust.
MMX-002: Rust services created on each call
File: rust/minimax/src/client.rs:91-123
Description:
Service getters create new instances each call:
pub fn speech(&self) -> SpeechService {
    SpeechService::new(self.http.clone())
}
Impact: Arc clone overhead on each service access.
Suggestion: Cache services or use &self references.
MMX-003: Go streaming uses hex encoding
File: go/pkg/minimax/speech.go:51-56
Description:
Audio data comes hex-encoded from API, decoded in SDK:
if apiResp.Data.Audio != "" {
audio, err := decodeHexAudio(apiResp.Data.Audio)
Impact: CPU overhead for decoding, 2x memory during decode.
Note: This is API design, not SDK issue, but worth documenting.
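For illustration, hex decoding with the standard library looks like the sketch below. The SDK's own decodeHexAudio body is not shown in this document, so hexToAudio here is a hypothetical stand-in and the sample string is made up.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexToAudio decodes a hex string into raw audio bytes. The input uses two
// characters per output byte, which is where the 2x memory cost comes from.
// (Hypothetical stand-in for the SDK's decoder.)
func hexToAudio(s string) ([]byte, error) {
	audio, err := hex.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("decode hex audio: %w", err)
	}
	return audio, nil
}

func main() {
	audio, err := hexToAudio("fffb9064")
	fmt.Println(audio, err) // [255 251 144 100] <nil>

	_, err = hexToAudio("zz")
	fmt.Println(err != nil) // true
}
```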
MMX-004: No request timeout option
Description:
Neither the Go nor the Rust SDK has a request-level timeout option. The Go SDK suggests using context.WithTimeout; the Rust SDK does not document timeout handling.
Suggestion: Add timeout option or document clearly.
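Until such an option exists, the context.WithTimeout approach on the Go side can be sketched as follows. slowCall is a hypothetical stand-in for any SDK method that accepts a context; it is not part of the SDK.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall stands in for an SDK method that honours its context (hypothetical).
func slowCall(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// callWithTimeout bounds a single call with a per-request deadline.
func callWithTimeout(timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer even when the call finishes early
	return slowCall(ctx, work)
}

// timedOut reports whether err is a deadline expiry.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut(callWithTimeout(20*time.Millisecond, 2*time.Second))) // true
	fmt.Println(timedOut(callWithTimeout(2*time.Second, 20*time.Millisecond))) // false
}
```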
MMX-005: Go iter.Seq2 requires Go 1.23+
File: go/pkg/minimax/speech.go:78
Description:
Streaming uses iter.Seq2 which requires Go 1.23:
func (s *SpeechService) SynthesizeStream(ctx context.Context, req *SpeechRequest) iter.Seq2[*SpeechChunk, error]
Impact: Not compatible with older Go versions.
Note: Modern API choice, acceptable trade-off.
MMX-006: Error handling inconsistency
Description:
Go uses the AsError() helper function, while Rust matches on the error enum.
Go:
if e, ok := minimax.AsError(err); ok {
	if e.IsRateLimit() { ... }
}
Rust:
match err {
    Error::Api { status_code, .. } => { ... }
}
Impact: Different patterns between languages.
🔵 Enhancements
MMX-007: No WebSocket TTS support
Description:
Official API supports WebSocket for TTS (/v1/t2a_ws), but SDK only implements HTTP.
Suggestion: Add WebSocket-based streaming TTS for lower latency.
MMX-008: No request validation
Description:
No client-side validation before sending requests. Invalid parameters only fail after API call.
Suggestion: Add validation for known constraints (text length, model names, etc.).
MMX-009: No retry backoff configuration
Description:
Retry count is configurable, but backoff strategy is hardcoded.
Suggestion: Add configurable backoff (exponential, jitter).
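As a sketch of what a configurable strategy might compute, the functions below implement capped exponential backoff with "full jitter" (a uniform random delay between zero and the exponential ceiling). The names and parameters are assumptions for illustration and do not describe the SDK's current hardcoded behaviour.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling returns base * 2^attempt, capped at max. Doubling in a
// loop avoids the overflow a plain shift could hit on large attempts.
func backoffCeiling(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// backoffWithJitter picks a uniform random delay in [0, ceiling].
func backoffWithJitter(attempt int, base, max time.Duration) time.Duration {
	ceiling := backoffCeiling(attempt, base, max)
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	base, max := 100*time.Millisecond, 5*time.Second
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: ceiling %v, delay %v\n",
			attempt, backoffCeiling(attempt, base, max), backoffWithJitter(attempt, base, max))
	}
}
```

Jitter spreads retries from many clients over time, which helps when a rate limit trips for all of them at once.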
MMX-010: No request/response logging
Description:
No built-in debug logging for API requests/responses.
Suggestion: Add optional logging middleware or debug mode.
MMX-011: No rate limit handling
Description:
Rate limit errors are returned but not automatically handled (e.g., exponential backoff, queue).
Suggestion: Add optional rate limit handling.
⚪ Notes
MMX-012: Full API coverage achieved
Description:
Both Go and Rust SDKs implement all documented MiniMax API endpoints:
- Text generation
- Speech synthesis (sync, stream, async)
- Voice management (list, clone, design)
- Video generation (text-to-video, image-to-video, agent)
- Image generation
- Music generation
- File management
MMX-013: Async task pattern
Description:
Long-running operations (video, async speech, music) use a consistent pattern:
- Create task → returns Task[T]
- Call task.Wait() for automatic polling
- Or manual task.Query() for custom logic
This is a well-designed abstraction.
MMX-014: Base URL handling
Description:
Both SDKs support China and Global endpoints:
- China: https://api.minimaxi.com
- Global: https://api.minimaxi.chat
Correctly defaulting to China URL.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| MMX-001 | 🟡 Minor | Open | Go Client |
| MMX-002 | 🟡 Minor | Open | Rust Client |
| MMX-003 | 🟡 Minor | Note | Both |
| MMX-004 | 🟡 Minor | Open | Both |
| MMX-005 | 🟡 Minor | Note | Go |
| MMX-006 | 🟡 Minor | Open | Both |
| MMX-007 | 🔵 Enhancement | Open | Both |
| MMX-008 | 🔵 Enhancement | Open | Both |
| MMX-009 | 🔵 Enhancement | Open | Both |
| MMX-010 | 🔵 Enhancement | Open | Both |
| MMX-011 | 🔵 Enhancement | Open | Both |
| MMX-012 | ⚪ Note | N/A | Both |
| MMX-013 | ⚪ Note | N/A | Both |
| MMX-014 | ⚪ Note | N/A | Both |
Overall: Well-implemented SDK with full API coverage. Both Go and Rust implementations are feature-complete and production-ready. Minor issues are mostly design choices rather than bugs.
DashScope SDK
Go and Rust SDKs for Aliyun DashScope (百炼 Model Studio) APIs.
Official API Documentation: api/README.md
Design Goals
- Realtime Focus: Primarily implement Qwen-Omni-Realtime WebSocket API
- OpenAI Compatibility: Text/chat APIs use OpenAI-compatible SDK
- Native WebSocket: Direct WebSocket implementation, not polling
Scope
This SDK focuses on Qwen-Omni-Realtime API for real-time multimodal conversation. For standard text APIs, use OpenAI-compatible SDKs.
| API | SDK Coverage | Alternative |
|---|---|---|
| Text Chat | ❌ | OpenAI SDK with custom base URL |
| App/Agent | ❌ | Direct HTTP calls |
| Realtime | ✅ | This SDK |
API Coverage
| Feature | Go | Rust | Official Doc |
|---|---|---|---|
| Realtime Session | ✅ | ✅ | api/realtime/ |
| Audio Input/Output | ✅ | ✅ | |
| Function Calls | ✅ | ✅ | |
| Text Input | ✅ | ✅ | |
| Video Input | ⚠️ | ⚠️ | Limited support |
Architecture
graph TB
subgraph client["Client"]
subgraph realtime["RealtimeService"]
session["RealtimeSession"]
send["Send Events"]
recv["Receive Events"]
end
end
client --> ws["wss://dashscope.aliyuncs.com<br/>/api-ws/v1/realtime"]
Authentication
Authorization: Bearer <api_key>
Optional workspace isolation:
X-DashScope-WorkSpace: <workspace_id>
Base URLs
| Region | WebSocket URL |
|---|---|
| China (Beijing) | wss://dashscope.aliyuncs.com/api-ws/v1/realtime |
| International (Singapore) | wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Models
| Model | Input | Output | Sample Rate |
|---|---|---|---|
| qwen-omni-turbo-realtime | audio/text | audio/text | 16kHz |
| qwen3-omni-flash-realtime | audio/text/video | audio/text | 24kHz |
Event Flow
sequenceDiagram
participant C as Client
participant S as Server
C->>S: session.update
C->>S: input_audio_buffer.append
C->>S: response.create
S-->>C: response.audio.delta
S-->>C: response.audio.delta
S-->>C: response.done
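The client half of this flow can be sketched as three JSON messages. The event type strings come from the diagram; everything else is an assumption made for illustration: the `clientEvent` struct, the `session` wrapper object, and carrying audio as a base64 string in an `audio` field follow the OpenAI Realtime convention and should be checked against the official event reference.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// clientEvent is a hypothetical wire shape, not the SDK's own type.
type clientEvent struct {
	Type    string         `json:"type"`
	Session map[string]any `json:"session,omitempty"`
	Audio   string         `json:"audio,omitempty"` // assumed: base64-encoded PCM
}

// encodeEvent serializes one event to its JSON text frame.
func encodeEvent(ev clientEvent) string {
	b, err := json.Marshal(ev)
	if err != nil {
		panic(err) // not expected for these plain field types
	}
	return string(b)
}

// turn returns the three client messages of one request, in send order.
func turn(pcm []byte) []string {
	return []string{
		encodeEvent(clientEvent{Type: "session.update", Session: map[string]any{"voice": "Cherry"}}),
		encodeEvent(clientEvent{Type: "input_audio_buffer.append", Audio: base64.StdEncoding.EncodeToString(pcm)}),
		encodeEvent(clientEvent{Type: "response.create"}),
	}
}

func main() {
	for _, msg := range turn([]byte{0x01, 0x02}) {
		fmt.Println(msg)
	}
}
```

After the last message the server is expected to stream `response.audio.delta` events and finish with `response.done`, as in the diagram.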
Examples Directory
- examples/go/dashscope/ - Go SDK examples
- examples/cmd/dashscope/ - CLI test scripts
For Text/Chat APIs
Use OpenAI-compatible SDK:
Go:
import "github.com/sashabaranov/go-openai"
config := openai.DefaultConfig(apiKey)
config.BaseURL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
client := openai.NewClientWithConfig(config)
Rust:
// Use async-openai with custom base URL
Related
- CLI tool: go/cmd/dashscope/
- CLI tests: examples/cmd/dashscope/
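DashScope (Alibaba Cloud Model Studio, 百炼) API Documentation
Original Documentation
| Document | Link |
|---|---|
| Model Studio home page | https://help.aliyun.com/zh/model-studio/ |
| API reference | https://help.aliyun.com/zh/model-studio/qwen-api-reference |
| Activating the service | https://help.aliyun.com/zh/dashscope/opening-service |
| Getting an API key | https://help.aliyun.com/zh/model-studio/get-api-key |
| Model list | https://help.aliyun.com/zh/model-studio/model-list |
If this document is incomplete, follow the links above for the latest information.
Overview
DashScope is the API service of Model Studio (百炼), Alibaba Cloud's large-model platform. It supports:
- Text generation - Qwen (通义千问) large language models, compatible with the OpenAI API
- Multimodal - image understanding and audio understanding
- Realtime conversation - Qwen-Omni-Realtime audio/video conversation
- Agent applications - calling preconfigured Agent/workflow applications
- Knowledge base - document upload, indexing, and retrieval-augmented generation (RAG)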
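Directory Structure
docs/dashscope/
├── README.md # This document (overview)
├── auth.md # Authentication and authorization
├── text.md # Text model API (Qwen)
├── app.md # Application invocation API
└── realtime/ # Realtime multimodal API
    ├── README.md # Overview
    ├── client-events.md # Client events
    └── server-events.md # Server events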
Service Endpoints
HTTP API
| Region | Endpoint | Purpose |
|---|---|---|
| Beijing (mainland China) | https://dashscope.aliyuncs.com/compatible-mode/v1 | OpenAI-compatible |
| Singapore (international) | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | OpenAI-compatible |
| Virginia (United States) | https://dashscope-us.aliyuncs.com/compatible-mode/v1 | OpenAI-compatible |
WebSocket API
| Region | Endpoint | Purpose |
|---|---|---|
| Beijing | wss://dashscope.aliyuncs.com/api-ws/v1/realtime | Realtime conversation |
| Singapore | wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime | Realtime conversation |
Application API
POST https://dashscope.aliyuncs.com/api/v1/apps/{APP_ID}/completion
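Supported Models
Text Models (Qwen)
| Model | Context | Characteristics |
|---|---|---|
| qwen-turbo | 128K | Fast responses, cost-effective |
| qwen-plus | 128K | Balances capability and cost |
| qwen-max | 32K | Most capable |
| qwen-long | 1M | Very long context |
Multimodal Models
| Model | Capability |
|---|---|
| qwen-vl-plus | Visual understanding |
| qwen-vl-max | Visual understanding (enhanced) |
| qwen-audio-turbo | Audio understanding |
Realtime Multimodal Models
| Model | Output Format | Default Voice |
|---|---|---|
| Qwen3-Omni-Flash-Realtime | pcm24 | Cherry |
| Qwen-Omni-Turbo-Realtime | pcm16 | Chelsie |
Quick Start
1. Get an API Key
- Log in to the Model Studio (百炼) console
- Open "Key Management"
- Create an API key
2. Set the Environment Variable
export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxx"
3. Example Call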
curl https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-turbo",
"messages": [{"role": "user", "content": "Hello!"}]
}'
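Detailed Documentation
| Document | Description |
|---|---|
| Authentication and authorization | API key management, access control, workspaces |
| Text model API | Qwen model family, OpenAI-compatible interface |
| Application invocation API | Agent applications, workflows, knowledge-base retrieval |
| Realtime multimodal | Qwen-Omni-Realtime voice conversation |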
SDK
Python (OpenAI SDK)
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)
response = client.chat.completions.create(
model="qwen-turbo",
messages=[{"role": "user", "content": "Hello!"}]
)
Go (go-openai)
import "github.com/sashabaranov/go-openai"
config := openai.DefaultConfig(os.Getenv("DASHSCOPE_API_KEY"))
config.BaseURL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
client := openai.NewClientWithConfig(config)
Go (giztoy/dashscope) - Realtime API
This project provides a native Go SDK for the Qwen-Omni-Realtime API:
import "github.com/haivivi/giztoy/pkg/dashscope"
client := dashscope.NewClient(os.Getenv("DASHSCOPE_API_KEY"))
session, err := client.Realtime.Connect(ctx, &dashscope.RealtimeConfig{
Model: dashscope.ModelQwenOmniTurboRealtimeLatest,
})
// Send audio, receive events...
CLI tool: bazel run //go/cmd/dashscope -- omni chat
Official SDKs
- Python: pip install dashscope
- Java: Maven dependency com.alibaba:dashscope-sdk
DashScope SDK - Go Implementation
Import: github.com/haivivi/giztoy/pkg/dashscope
Client
type Client struct {
Realtime *RealtimeService
}
Constructor:
// Basic
client := dashscope.NewClient("sk-xxxxxxxx")
// With workspace
client := dashscope.NewClient("sk-xxxxxxxx",
dashscope.WithWorkspace("ws-xxxxxxxx"),
)
// Custom endpoint (international)
client := dashscope.NewClient("sk-xxxxxxxx",
dashscope.WithBaseURL("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"),
)
Options:
| Option | Description |
|---|---|
| WithWorkspace(id) | Workspace ID for isolation |
| WithBaseURL(url) | Custom WebSocket URL |
| WithHTTPBaseURL(url) | Custom HTTP URL |
| WithHTTPClient(client) | Custom HTTP client |
RealtimeService
Connect Session
session, err := client.Realtime.Connect(ctx, &dashscope.RealtimeConfig{
Model: dashscope.ModelQwenOmniTurboRealtimeLatest,
})
if err != nil {
log.Fatal(err)
}
defer session.Close()
Send Events
// Update session configuration
session.UpdateSession(&dashscope.SessionUpdate{
Modalities: []string{"text", "audio"},
Voice: "Cherry",
InputAudioFormat: "pcm16",
OutputAudioFormat: "pcm16",
})
// Append audio data
session.AppendAudio(audioData)
// Commit audio (finalize input)
session.CommitAudio()
// Create response (start inference)
session.CreateResponse()
// Send text
session.AppendText("Hello!")
// Cancel response
session.CancelResponse()
Receive Events
// Using Go 1.23+ iter.Seq2
for event, err := range session.Events() {
if err != nil {
log.Fatal(err)
}
switch event.Type {
case dashscope.EventResponseAudioDelta:
// Audio chunk received
play(event.Delta)
case dashscope.EventResponseTextDelta:
// Text chunk received
fmt.Print(event.Delta)
case dashscope.EventResponseDone:
// Response complete
case dashscope.EventError:
// Error occurred
log.Printf("Error: %s", event.Error.Message)
}
}
Events
Client Events (Send)
| Event Type | Description |
|---|---|
| session.update | Update session configuration |
| input_audio_buffer.append | Append audio data |
| input_audio_buffer.commit | Finalize audio input |
| response.create | Request response |
| response.cancel | Cancel current response |
Server Events (Receive)
| Event Type | Description |
|---|---|
| session.created | Session established |
| session.updated | Configuration updated |
| response.created | Response started |
| response.audio.delta | Audio chunk |
| response.text.delta | Text chunk |
| response.done | Response complete |
| error | Error occurred |
Models
const (
ModelQwenOmniTurboRealtimeLatest = "qwen-omni-turbo-realtime-latest"
ModelQwen3OmniFlashRealtimeLatest = "qwen3-omni-flash-realtime-latest"
)
Error Handling
for event, err := range session.Events() {
if err != nil {
// Connection error
log.Fatal(err)
}
if event.Type == dashscope.EventError {
// API error
log.Printf("API Error [%s]: %s", event.Error.Code, event.Error.Message)
}
}
Complete Example
func main() {
client := dashscope.NewClient(os.Getenv("DASHSCOPE_API_KEY"))
session, err := client.Realtime.Connect(context.Background(), &dashscope.RealtimeConfig{
Model: dashscope.ModelQwenOmniTurboRealtimeLatest,
})
if err != nil {
log.Fatal(err)
}
defer session.Close()
// Configure session
session.UpdateSession(&dashscope.SessionUpdate{
Voice: "Cherry",
})
// Send audio (from microphone, etc.)
session.AppendAudio(audioData)
session.CommitAudio()
session.CreateResponse()
// Receive and play response
for event, err := range session.Events() {
if err != nil {
break
}
if event.Type == dashscope.EventResponseAudioDelta {
player.Write(event.Delta)
}
}
}
DashScope SDK - Rust Implementation
Crate: giztoy-dashscope
Client
pub struct Client {
    // Internal configuration
}
impl Client {
    pub fn realtime(&self) -> RealtimeService;
}
Constructor:
use giztoy_dashscope::{Client, DEFAULT_REALTIME_URL};
// Basic
let client = Client::new("sk-xxxxxxxx")?;
// With builder
let client = Client::builder("sk-xxxxxxxx")
    .workspace("ws-xxxxxxxx")
    .base_url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
    .build()?;
RealtimeService
Connect Session
use giztoy_dashscope::{RealtimeConfig, MODEL_QWEN_OMNI_TURBO_REALTIME_LATEST};
let session = client.realtime().connect(&RealtimeConfig {
    model: MODEL_QWEN_OMNI_TURBO_REALTIME_LATEST.to_string(),
    ..Default::default()
}).await?;
Send Events
// Update session
session.update_session(&SessionUpdate {
    modalities: vec!["text".to_string(), "audio".to_string()],
    voice: Some("Cherry".to_string()),
    ..Default::default()
}).await?;
// Append audio
session.append_audio(&audio_data).await?;
// Commit audio
session.commit_audio().await?;
// Create response
session.create_response().await?;
Receive Events
use giztoy_dashscope::ServerEvent;
while let Some(event) = session.recv().await {
    let event = event?;
    match event {
        ServerEvent::ResponseAudioDelta { delta, .. } => {
            // Play audio
            player.write(&delta)?;
        }
        ServerEvent::ResponseTextDelta { delta, .. } => {
            // Print text
            print!("{}", delta);
        }
        ServerEvent::ResponseDone { .. } => {
            // Complete
            break;
        }
        ServerEvent::Error { error } => {
            eprintln!("Error: {}", error.message);
        }
        _ => {}
    }
}
Events
Client Events (Send)
pub enum ClientEvent {
    SessionUpdate(SessionUpdate),
    InputAudioBufferAppend { audio: Vec<u8> },
    InputAudioBufferCommit,
    ResponseCreate(ResponseCreateOptions),
    ResponseCancel,
}
Server Events (Receive)
pub enum ServerEvent {
    SessionCreated { session: SessionInfo },
    SessionUpdated { session: SessionInfo },
    ResponseCreated { response: ResponseInfo },
    ResponseAudioDelta { delta: Vec<u8> },
    ResponseTextDelta { delta: String },
    ResponseDone { response: ResponseInfo },
    Error { error: ErrorInfo },
    // ... more events
}
Models
pub const MODEL_QWEN_OMNI_TURBO_REALTIME_LATEST: &str = "qwen-omni-turbo-realtime-latest";
pub const MODEL_QWEN3_OMNI_FLASH_REALTIME_LATEST: &str = "qwen3-omni-flash-realtime-latest";
Error Handling
use giztoy_dashscope::{Error, Result};
match session.recv().await {
    Some(Ok(event)) => {
        // Process event
    }
    Some(Err(Error::WebSocket(e))) => {
        eprintln!("WebSocket error: {}", e);
    }
    Some(Err(Error::Api { code, message })) => {
        eprintln!("API error [{}]: {}", code, message);
    }
    None => {
        // Connection closed
    }
}
Complete Example
use giztoy_dashscope::{Client, RealtimeConfig, ServerEvent, SessionUpdate};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("DASHSCOPE_API_KEY")?;
    let client = Client::new(&api_key)?;
    let session = client.realtime().connect(&RealtimeConfig {
        model: "qwen-omni-turbo-realtime-latest".to_string(),
        ..Default::default()
    }).await?;
    // Configure
    session.update_session(&SessionUpdate {
        voice: Some("Cherry".to_string()),
        ..Default::default()
    }).await?;
    // Send audio
    session.append_audio(&audio_data).await?;
    session.commit_audio().await?;
    session.create_response().await?;
    // Receive response
    while let Some(event) = session.recv().await {
        match event? {
            ServerEvent::ResponseAudioDelta { delta, .. } => {
                player.write(&delta)?;
            }
            ServerEvent::ResponseDone { .. } => break,
            _ => {}
        }
    }
    Ok(())
}
Differences from Go
| Feature | Go | Rust |
|---|---|---|
| Event receiving | iter.Seq2 (sync-like) | async Stream |
| Session lifetime | Manual defer Close() | Drop trait |
| Audio encoding | []byte | Vec<u8> |
| WebSocket | gorilla/websocket | tokio-tungstenite |
DashScope SDK - Known Issues
🟡 Minor Issues
DS-001: Go NewClient panics on empty API key
File: go/pkg/dashscope/client.go:39-41
Description:
NewClient panics instead of returning an error:
func NewClient(apiKey string, opts ...Option) *Client {
if apiKey == "" {
panic("dashscope: API key is required")
}
Impact: Unrecoverable error at construction time.
Suggestion: Return (*Client, error) like Rust version.
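A constructor along the lines of this suggestion might look like the sketch below. The `Client` type here is a stand-in defined only for the example, and the names `NewClientE` and `ErrMissingAPIKey` are hypothetical; the real SDK may choose a different shape.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingAPIKey is a hypothetical sentinel error for this sketch.
var ErrMissingAPIKey = errors.New("dashscope: API key is required")

// Client is a stand-in for the SDK client, reduced to one field.
type Client struct{ apiKey string }

// NewClientE reports a missing key as an error instead of panicking,
// so the caller decides how to recover.
func NewClientE(apiKey string) (*Client, error) {
	if apiKey == "" {
		return nil, ErrMissingAPIKey
	}
	return &Client{apiKey: apiKey}, nil
}

func main() {
	if _, err := NewClientE(""); err != nil {
		fmt.Println("recoverable:", err)
	}
}
```

Callers that prefer the current behavior can still wrap such a constructor in a `Must`-style helper that panics on error.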
DS-002: Limited to Realtime API only
Description:
The SDK implements only the Realtime API. Text/Chat APIs require a separate OpenAI-compatible SDK.
Impact: Users need two SDKs for full DashScope usage.
Note: This is an intentional design choice, since the text APIs are OpenAI-compatible.
DS-003: No HTTP API implementation
Description:
No HTTP client for non-realtime operations (file upload, app calls, etc.).
Suggestion: Add HTTP service for app/agent API calls.
DS-004: Video input support limited
Description:
Qwen3-Omni-Flash supports video input, but SDK support may be incomplete.
Status: ⚠️ Needs verification.
🔵 Enhancements
DS-005: No automatic reconnection
Description:
WebSocket sessions don't auto-reconnect on disconnection.
Suggestion: Add reconnection with backoff for long-running sessions.
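One way such reconnection could be structured is an exponential backoff loop around the dial step. The sketch below is independent of the SDK: `connectWithBackoff` and `runDemo` are hypothetical names, and the `connect` callback stands in for whatever call re-establishes the session.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// connectWithBackoff calls connect until it succeeds, the context is
// cancelled, or maxAttempts is reached. The wait doubles after each failure
// and is capped at maxDelay. It returns the number of attempts made.
func connectWithBackoff(ctx context.Context, connect func(context.Context) error,
	maxAttempts int, baseDelay, maxDelay time.Duration) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := baseDelay
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = connect(ctx); lastErr == nil {
			return attempt, nil
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-ctx.Done():
			return attempt, ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

// runDemo simulates a dial that fails twice and then succeeds.
func runDemo() (int, error) {
	calls := 0
	flaky := func(context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("dial failed")
		}
		return nil
	}
	return connectWithBackoff(context.Background(), flaky, 5, time.Millisecond, 8*time.Millisecond)
}

func main() {
	attempts, err := runDemo()
	fmt.Println(attempts, err)
}
```

A production version would likely add jitter to the delay and would need to replay session configuration after reconnecting; neither is shown here.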
DS-006: No audio transcoding
Description:
Audio must be in correct format (PCM16/PCM24). No built-in transcoding.
Suggestion: Add optional audio format conversion.
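Until conversion is built in, callers have to prepare the bytes themselves. As a small example of the kind of conversion meant here, the sketch below turns floating-point samples into 16-bit PCM. It assumes signed little-endian samples, which should be confirmed against the official audio format documentation; `floatToPCM16` is a hypothetical helper, and resampling between sample rates is not covered.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// floatToPCM16 converts samples in [-1, 1] to 16-bit signed PCM bytes.
// Assumption: little-endian byte order. Out-of-range input is clamped.
func floatToPCM16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		v := math.Max(-1, math.Min(1, float64(s)))
		q := int16(math.Round(v * 32767))
		binary.LittleEndian.PutUint16(out[2*i:], uint16(q))
	}
	return out
}

func main() {
	fmt.Println(floatToPCM16([]float32{0, 1, -1}))
}
```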
DS-007: No VAD (Voice Activity Detection) integration
Description:
Manual audio buffer management. No built-in VAD for automatic speech detection.
Suggestion: Integrate with audio/pcm for silence detection.
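A very simple form of silence detection is an energy threshold on each frame. The sketch below is standalone and does not use the project's audio/pcm package; `rmsPCM16`, `isSilent`, and the threshold value are illustrative assumptions, and the input is assumed to be 16-bit little-endian PCM.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// rmsPCM16 returns the root-mean-square amplitude of a frame of
// 16-bit little-endian PCM, scaled to the range [0, 1].
func rmsPCM16(frame []byte) float64 {
	n := len(frame) / 2
	if n == 0 {
		return 0
	}
	var sum float64
	for i := 0; i < n; i++ {
		s := float64(int16(binary.LittleEndian.Uint16(frame[2*i:]))) / 32768
		sum += s * s
	}
	return math.Sqrt(sum / float64(n))
}

// isSilent reports whether the frame's energy is below the threshold.
func isSilent(frame []byte, threshold float64) bool {
	return rmsPCM16(frame) < threshold
}

func main() {
	quiet := make([]byte, 320)              // 160 zero samples
	loud := []byte{0xFF, 0x7F, 0xFF, 0x7F} // two full-scale samples
	fmt.Println(isSilent(quiet, 0.01), isSilent(loud, 0.01))
}
```

A client could use a run of silent frames as the cue to commit the audio buffer and request a response. Real VAD is more robust than a fixed energy threshold, so this is only a starting point.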
DS-008: Missing tool call examples
Description:
Function/tool calling is supported but not well documented with examples.
⚪ Notes
DS-009: Clean WebSocket event model
Description:
Both Go and Rust implement a clean event-based model that matches OpenAI Realtime API patterns. This is well-designed.
DS-010: Model constants provided
Description:
Both SDKs provide model name constants:
const ModelQwenOmniTurboRealtimeLatest = "qwen-omni-turbo-realtime-latest"
Good for discoverability and avoiding typos.
DS-011: Workspace support
Description:
Both SDKs support workspace isolation via WithWorkspace() option:
client := dashscope.NewClient(apiKey, dashscope.WithWorkspace("ws-xxx"))
Useful for enterprise environments.
DS-012: International endpoint support
Description:
SDKs support both China and international endpoints:
- China: wss://dashscope.aliyuncs.com/...
- International: wss://dashscope-intl.aliyuncs.com/...
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| DS-001 | 🟡 Minor | Open | Go Client |
| DS-002 | 🟡 Minor | Note | Both |
| DS-003 | 🟡 Minor | Open | Both |
| DS-004 | 🟡 Minor | Open | Both |
| DS-005 | 🔵 Enhancement | Open | Both |
| DS-006 | 🔵 Enhancement | Open | Both |
| DS-007 | 🔵 Enhancement | Open | Both |
| DS-008 | 🔵 Enhancement | Open | Both |
| DS-009 | ⚪ Note | N/A | Both |
| DS-010 | ⚪ Note | N/A | Both |
| DS-011 | ⚪ Note | N/A | Both |
| DS-012 | ⚪ Note | N/A | Both |
Overall: Focused SDK for Realtime API with clean design. Main limitation is narrow scope (Realtime only), which is intentional since text APIs are OpenAI-compatible. Both Go and Rust implementations are feature-complete for their scope.
Doubao Speech SDK
Go and Rust SDK for Volcengine Doubao Speech API (豆包语音).
Official API Documentation: api/README.md
Design Goals
- Dual API Version Support: V1 (Classic) and V2/V3 (BigModel) APIs
- Multiple Auth Methods: Bearer Token, API Key, V2 API Key
- Comprehensive Coverage: TTS, ASR, Voice Clone, Realtime, Meeting, Podcast, etc.
- Streaming-first: WebSocket-based streaming for real-time scenarios
API Versions
Doubao Speech has two API generations:
| Version | Name | Features | Recommended |
|---|---|---|---|
| V1 | Classic | Basic TTS/ASR | Legacy use |
| V2/V3 | BigModel | Advanced TTS/ASR, Realtime | ✅ New projects |
API Coverage
| Feature | V1 (Classic) | V2 (BigModel) | Go | Rust |
|---|---|---|---|---|
| TTS Sync | ✅ | ✅ | ✅ | ✅ |
| TTS Stream | ✅ | ✅ | ✅ | ✅ |
| TTS Async (Long Text) | ✅ | ✅ | ✅ | ⚠️ |
| ASR One-sentence | ✅ | ✅ | ✅ | ✅ |
| ASR Stream | ✅ | ✅ | ✅ | ✅ |
| ASR File | ✅ | ✅ | ✅ | ⚠️ |
| Voice Clone | N/A | ✅ | ✅ | ✅ |
| Realtime Dialogue | N/A | ✅ | ✅ | ✅ |
| Meeting Transcription | N/A | ✅ | ✅ | ✅ |
| Podcast Synthesis | N/A | ✅ | ✅ | ✅ |
| Translation (SIMT) | N/A | ✅ | ✅ | ✅ |
| Media Subtitle | N/A | ✅ | ✅ | ✅ |
| Console API | N/A | ✅ | ✅ | ✅ |
Architecture
graph TB
subgraph client["Client"]
subgraph v1["V1 Services (Classic)"]
tts1[TTS]
asr1[ASR]
end
subgraph v2["V2 Services (BigModel)"]
tts2[TTSV2]
asr2[ASRV2]
advanced["VoiceClone<br/>Realtime<br/>Meeting<br/>Podcast<br/>Translation<br/>Media"]
end
end
subgraph console["Console Client"]
aksig["AK/SK Signature<br/>Authentication"]
end
client --> api["Volcengine API"]
console --> api
Authentication Methods
Speech API Client
| Method | Header | Use Case |
|---|---|---|
| API Key | x-api-key: {key} | Simplest, recommended |
| Bearer Token | Authorization: Bearer;{token} | V1 APIs |
| V2 API Key | X-Api-Access-Key, X-Api-App-Key | V2/V3 APIs |
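To make the table concrete, the sketch below applies each method to a set of HTTP headers. The header names and the `Bearer;` prefix are taken from the table; the `credentials` type, its fields, and the helper names are hypothetical and do not reflect the SDK's internal structure.

```go
package main

import (
	"fmt"
	"net/http"
)

type authMethod int

const (
	authAPIKey authMethod = iota
	authBearer
	authV2APIKey
)

// credentials is a hypothetical container for the three methods above.
type credentials struct {
	Method    authMethod
	APIKey    string // x-api-key
	Token     string // Bearer token (V1 APIs)
	AccessKey string // X-Api-Access-Key (V2/V3 APIs)
	AppKey    string // X-Api-App-Key (V2/V3 APIs)
}

// apply writes the headers for the selected method.
func (c credentials) apply(h http.Header) {
	switch c.Method {
	case authAPIKey:
		h.Set("x-api-key", c.APIKey)
	case authBearer:
		// Note the semicolon: the table lists "Bearer;" rather than "Bearer ".
		h.Set("Authorization", "Bearer;"+c.Token)
	case authV2APIKey:
		h.Set("X-Api-Access-Key", c.AccessKey)
		h.Set("X-Api-App-Key", c.AppKey)
	}
}

// headerFor applies c to a fresh header set and returns one header value.
func headerFor(c credentials, name string) string {
	h := http.Header{}
	c.apply(h)
	return h.Get(name)
}

func main() {
	c := credentials{Method: authBearer, Token: "your-token"}
	fmt.Println(headerFor(c, "Authorization"))
}
```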
Console Client
Uses Volcengine OpenAPI AK/SK signature (HMAC-SHA256).
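The building block of this scheme is a keyed HMAC-SHA256 digest, which the Go standard library provides directly. The sketch below shows only that primitive: what goes into the string to sign and how the signing key is derived are defined by Volcengine and are not reproduced here, and `hmacSHA256Hex` is a hypothetical helper name.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hmacSHA256Hex returns the hex-encoded HMAC-SHA256 of msg under key.
func hmacSHA256Hex(key, msg []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg) // hash.Hash writes never return an error
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	// Placeholder inputs; a real request signs a canonical string
	// built according to the Volcengine OpenAPI signing rules.
	fmt.Println(hmacSHA256Hex([]byte("secret-key"), []byte("string-to-sign")))
}
```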
Resource IDs (V2/V3)
| Service | Resource ID |
|---|---|
| TTS 2.0 | seed-tts-2.0 |
| TTS 2.0 Concurrent | seed-tts-2.0-concurr |
| ASR Stream | volc.bigasr.sauc.duration |
| ASR File | volc.bigasr.auc.duration |
| Realtime | volc.speech.dialog |
| Podcast | volc.service_type.10050 |
| Translation | volc.megatts.simt |
| Voice Clone | seed-icl-2.0 |
Clusters (V1)
| Cluster | Service |
|---|---|
| volcano_tts | TTS Standard |
| volcano_mega | TTS BigModel |
| volcano_icl | Voice Clone |
| volcengine_streaming_common | ASR Streaming |
Examples Directory
- examples/go/doubaospeech/ - Go SDK examples
- examples/cmd/doubaospeech/ - CLI test scripts
Related
- CLI tool: go/cmd/doubaospeech/
- CLI tests: examples/cmd/doubaospeech/
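Doubao Speech (豆包语音) API Documentation
Original Documentation
- Documentation home: https://www.volcengine.com/docs/6561/162929
- Console: https://console.volcengine.com/speech/app
If this document is incomplete, follow the links above for the latest information.
Product Lineup
Doubao Speech comes in two product generations: BigModel (2.0) and Classic (1.0). The BigModel generation is recommended.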
Speech Synthesis (TTS)
BigModel Speech Synthesis 2.0
| Interface | Endpoint | Resource ID | Document |
|---|---|---|---|
| Unidirectional streaming HTTP V3 | POST /api/v3/tts/unidirectional | seed-tts-2.0 | stream-http.md |
| Unidirectional streaming WebSocket V3 | WSS /api/v3/tts/unidirectional | seed-tts-2.0 | stream-ws.md |
| Bidirectional streaming WebSocket V3 | WSS /api/v3/tts/bidirection | seed-tts-2.0 | duplex-ws.md |
| Async long text | POST /api/v3/tts/async/submit | seed-tts-2.0-concurr | async.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1598757 (Unidirectional streaming HTTP V3)
- https://www.volcengine.com/docs/6561/1719100 (Unidirectional streaming WebSocket V3)
- https://www.volcengine.com/docs/6561/1329505 (Bidirectional streaming WebSocket V3)
- https://www.volcengine.com/docs/6561/1330194 (Async long text)
Classic Speech Synthesis 1.0
| Interface | Endpoint | Cluster | Document |
|---|---|---|---|
| HTTP one-shot synthesis | POST /api/v1/tts | volcano_tts | http.md |
| WebSocket streaming | WSS /api/v1/tts/ws_binary | volcano_tts | websocket.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/79820 (HTTP interface)
- https://www.volcengine.com/docs/6561/79821 (WebSocket interface)
- https://www.volcengine.com/docs/6561/97465 (Parameter reference)
Premium Long-Text Speech Synthesis
| Interface | Endpoint | Document |
|---|---|---|
| Async long text | POST /api/v1/long_tts/submit | long-tts.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1096680
Speech Recognition (ASR)
BigModel Speech Recognition 2.0
| Interface | Endpoint | Resource ID | Document |
|---|---|---|---|
| Streaming recognition WebSocket | WSS /api/v3/sauc/bigmodel | volc.bigasr.sauc.duration | streaming.md |
| Audio file recognition (standard) | POST /api/v3/asr/bigmodel/submit | volc.bigasr.auc.duration | file-standard.md |
| Audio file recognition (fast) | POST /api/v3/asr/bigmodel_async/submit | volc.bigasr.auc.duration | file-fast.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1354869 (BigModel streaming speech recognition)
- https://www.volcengine.com/docs/6561/1354868 (BigModel audio file recognition, standard)
- https://www.volcengine.com/docs/6561/1631584 (BigModel audio file recognition, fast)
- https://www.volcengine.com/docs/6561/1840838 (BigModel audio file recognition, off-peak)
Classic Speech Recognition 1.0
| Interface | Endpoint | Cluster | Document |
|---|---|---|---|
| One-sentence recognition | POST /api/v1/asr | volcengine_input_common | one-sentence.md |
| Streaming recognition | WSS /api/v2/asr | volcengine_streaming_common | streaming.md |
| Audio file recognition (standard) | POST /api/v1/asr/submit | volc.megatts.default | file-standard.md |
| Audio file recognition (fast) | POST /api/v1/asr/async/submit | volc.megatts.default | file-fast.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/104897 (One-sentence recognition)
- https://www.volcengine.com/docs/6561/80816 (Streaming speech recognition)
- https://www.volcengine.com/docs/6561/80818 (Audio file recognition, standard)
- https://www.volcengine.com/docs/6561/80820 (Audio file recognition, fast)
Voice Cloning
| Interface | Endpoint | Cluster | Document |
|---|---|---|---|
| Submit training | POST /api/v1/mega_tts/audio/upload | volcano_icl | api.md |
| Query status | POST /api/v1/mega_tts/status | volcano_icl | api.md |
| Activate voice | POST /api/v1/mega_tts/audio/activate | volcano_icl | api.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1305191 (Voice cloning API)
- https://www.volcengine.com/docs/6561/1829010 (Voice cloning ordering and usage guide)
Realtime Speech Model
| Interface | Endpoint | Resource ID | Document |
|---|---|---|---|
| Realtime dialogue | WSS /api/v3/realtime/dialogue | volc.speech.dialog | api.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1257584 (End-to-end realtime speech model API)
Podcast Synthesis
| Interface | Endpoint | Resource ID | Document |
|---|---|---|---|
| WebSocket V3 | WSS /api/v3/sami/podcasttts | volc.megatts.podcast | api.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1668014 (Podcast API, WebSocket V3 protocol)
Simultaneous Interpretation
| Interface | Endpoint | Resource ID | Document |
|---|---|---|---|
| WebSocket V3 | WSS /api/v3/saas/simt | volc.megatts.simt | api.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/xxx (Simultaneous interpretation 2.0 API)
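Speech Notes (Meeting Minutes)
| Interface | Endpoint | Document |
|---|---|---|
| Async submit | POST /api/v1/meeting/submit | api.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/xxx (Doubao speech notes API)
Audio/Video Subtitles
| Interface | Endpoint | Document |
|---|---|---|
| Subtitle generation | POST /api/v1/subtitle/submit | subtitle.md |
| Subtitle alignment | POST /api/v1/subtitle/align | align.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/192519 (Audio/video subtitle generation)
- https://www.volcengine.com/docs/6561/113635 (Automatic subtitle alignment)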
Console Management API
| Interface | Endpoint | Authentication | Document |
|---|---|---|---|
| BigModel voice list | POST /ListBigModelTTSTimbres | AK/SK | timbre.md |
| BigModel voice list (new) | POST /ListSpeakers | AK/SK | timbre.md |
| API key management | POST /ListAPIKeys | AK/SK | apikey.md |
| Service status management | POST /ServiceStatus | AK/SK | service.md |
| Quota monitoring | POST /QuotaMonitoring | AK/SK | monitoring.md |
| Voice cloning status | POST /ListMegaTTSTrainStatus | AK/SK | voice-clone-status.md |
Original documentation links:
- https://www.volcengine.com/docs/6561/1770994 (ListBigModelTTSTimbres)
- https://www.volcengine.com/docs/6561/2160690 (ListSpeakers)
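Authentication Methods
Speech API (speech services)
The speech services use the following authentication methods:
| Method | Header | Applies To |
|---|---|---|
| Access Token | Authorization: Bearer; {token} | HTTP/WebSocket V1-V2 |
| X-Api authentication | X-Api-App-Id, X-Api-Access-Key | WebSocket V3 |
| Request Body | app.token | Some HTTP interfaces |
Console API (console services)
The console API uses Volcengine OpenAPI AK/SK signature authentication:
Authorization: HMAC-SHA256 Credential={AccessKeyId}/...
See auth.md for details.
Quick Selection
| Need | Recommended Interface | Document |
|---|---|---|
| Realtime synthesis of short text | TTS 2.0 unidirectional streaming HTTP V3 | stream-http.md |
| Batch synthesis of long text | TTS 2.0 async interface | async.md |
| Realtime voice interaction | Realtime dialogue API | realtime/api.md |
| Custom voice | Voice cloning API | voice-clone/api.md |
| Realtime speech recognition | ASR 2.0 streaming | asr2.0/streaming.md |
| Audio file transcription | ASR 2.0 file recognition | asr2.0/file-standard.md |
| Podcast generation | Podcast API | podcast/api.md |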
Doubao Speech SDK - Go Implementation
Import: github.com/haivivi/giztoy/pkg/doubaospeech
Clients
Speech API Client
type Client struct {
// V1 Services (Classic)
TTS *TTSService
ASR *ASRService
// V2 Services (BigModel)
TTSV2 *TTSServiceV2
ASRV2 *ASRServiceV2
// Shared Services
VoiceClone *VoiceCloneService
Realtime *RealtimeService
Meeting *MeetingService
Podcast *PodcastService
Translation *TranslationService
Media *MediaService
}
Constructor:
// With API Key (recommended)
client := doubaospeech.NewClient("app-id",
doubaospeech.WithAPIKey("your-api-key"),
doubaospeech.WithCluster("volcano_tts"),
)
// With Bearer Token
client := doubaospeech.NewClient("app-id",
doubaospeech.WithBearerToken("your-token"),
)
// With V2 API Key (for BigModel APIs)
client := doubaospeech.NewClient("app-id",
doubaospeech.WithV2APIKey("access-key", "app-key"),
doubaospeech.WithResourceID("seed-tts-2.0"),
)
Console API Client
console := doubaospeech.NewConsole("access-key", "secret-key")
Services
TTS V1 (Classic)
// Synchronous
resp, err := client.TTS.Synthesize(ctx, &doubaospeech.TTSRequest{
Text: "你好,世界!",
VoiceType: "zh_female_cancan",
})
// resp.Audio contains audio bytes
// Streaming (Go 1.23+ iter.Seq2)
for chunk, err := range client.TTS.SynthesizeStream(ctx, req) {
if err != nil {
return err
}
buf.Write(chunk.Audio)
}
TTS V2 (BigModel)
// Streaming HTTP
for chunk, err := range client.TTSV2.SynthesizeStream(ctx, &doubaospeech.TTSV2Request{
Text: "你好,世界!",
VoiceType: "zh_female_cancan",
ResourceID: "seed-tts-2.0",
}) {
// Process chunk
}
// Async (long text)
task, err := client.TTSV2.SubmitAsync(ctx, &doubaospeech.AsyncTTSRequest{
Text: longText,
})
result, err := task.Wait(ctx)
ASR (Speech Recognition)
// One-sentence (V1)
resp, err := client.ASR.Recognize(ctx, &doubaospeech.ASRRequest{
Audio: audioData,
Format: "pcm",
Language: "zh-CN",
})
// Streaming (WebSocket)
session, err := client.ASR.OpenStreamSession(ctx, &doubaospeech.StreamASRConfig{
Format: "pcm",
SampleRate: 16000,
})
defer session.Close()
// Send audio chunks
session.SendAudio(ctx, audioData, false)
session.SendAudio(ctx, lastData, true)
// Receive results
for chunk, err := range session.Recv() {
if err != nil {
break
}
fmt.Println(chunk.Text)
}
Voice Clone
// Upload audio for training
result, err := client.VoiceClone.Upload(ctx, &doubaospeech.VoiceCloneRequest{
AudioData: audioData,
VoiceID: "my-custom-voice",
})
// Check status
status, err := client.VoiceClone.GetStatus(ctx, "my-custom-voice")
// Activate voice
err = client.VoiceClone.Activate(ctx, "my-custom-voice")
Realtime Dialogue
session, err := client.Realtime.Connect(ctx, &doubaospeech.RealtimeConfig{
Model: "speech-dialog-001",
})
defer session.Close()
// Send audio
session.SendAudio(audioData)
// Receive events
for event := range session.Events() {
switch event.Type {
case "asr_result":
fmt.Println("User:", event.AsrResult.Text)
case "tts_audio":
play(event.TtsAudio)
}
}
Console API
// List available voices
voices, err := console.ListSpeakers(ctx, &doubaospeech.ListSpeakersRequest{})
// List timbres
timbres, err := console.ListTimbres(ctx, &doubaospeech.ListTimbresRequest{})
// Check voice clone status
status, err := console.ListVoiceCloneStatus(ctx, &doubaospeech.ListVoiceCloneStatusRequest{
VoiceID: "my-custom-voice",
})
Options
| Option | Description |
|---|---|
| WithAPIKey(key) | x-api-key authentication |
| WithBearerToken(token) | Bearer token authentication |
| WithV2APIKey(access, app) | V2/V3 API authentication |
| WithCluster(cluster) | Set cluster name (V1) |
| WithResourceID(id) | Set resource ID (V2) |
| WithBaseURL(url) | Custom HTTP base URL |
| WithWebSocketURL(url) | Custom WebSocket URL |
| WithHTTPClient(client) | Custom HTTP client |
| WithTimeout(duration) | Request timeout |
| WithUserID(id) | User identifier |
Error Handling
if err != nil {
if e, ok := doubaospeech.AsError(err); ok {
fmt.Printf("Error %d: %s\n", e.Code, e.Message)
if e.IsRateLimit() {
// Handle rate limiting
}
}
}
Doubao Speech SDK - Rust Implementation
Crate: giztoy-doubaospeech
Clients
Speech API Client
pub struct Client {
    // Internal HTTP/WebSocket clients
}
impl Client {
    pub fn tts(&self) -> TtsService;
    pub fn asr(&self) -> AsrService;
    pub fn voice_clone(&self) -> VoiceCloneService;
    pub fn realtime(&self) -> RealtimeService;
    pub fn meeting(&self) -> MeetingService;
    pub fn podcast(&self) -> PodcastService;
    pub fn translation(&self) -> TranslationService;
    pub fn media(&self) -> MediaService;
}
Constructor:
use giztoy_doubaospeech::Client;
// With API Key (recommended)
let client = Client::builder("app-id")
    .api_key("your-api-key")
    .cluster("volcano_tts")
    .build()?;
// With Bearer Token
let client = Client::builder("app-id")
    .bearer_token("your-token")
    .build()?;
// With V2 API Key
let client = Client::builder("app-id")
    .v2_api_key("access-key", "app-key")
    .resource_id("seed-tts-2.0")
    .build()?;
Console Client
use giztoy_doubaospeech::Console;
let console = Console::new("access-key", "secret-key");
Services
TTS Service
use giztoy_doubaospeech::{TtsRequest, TtsService};
// Synchronous
let response = client.tts().synthesize(&TtsRequest {
    text: "你好,世界!".to_string(),
    voice_type: "zh_female_cancan".to_string(),
    ..Default::default()
}).await?;
// response.audio contains bytes
// Streaming
let stream = client.tts().synthesize_stream(&req).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(audio) = chunk.audio {
        buf.extend(&audio);
    }
}
ASR Service
use giztoy_doubaospeech::{OneSentenceRequest, StreamAsrConfig};
// One-sentence
let result = client.asr().recognize(&OneSentenceRequest {
    audio: audio_data,
    format: "pcm".to_string(),
    language: "zh-CN".to_string(),
    ..Default::default()
}).await?;
// Streaming
let session = client.asr().open_stream_session(&StreamAsrConfig {
    format: "pcm".to_string(),
    sample_rate: 16000,
    ..Default::default()
}).await?;
// Send audio
session.send_audio(&audio_data, false).await?;
session.send_audio(&last_data, true).await?;
// Receive results
while let Some(result) = session.recv().await {
    let chunk = result?;
    println!("Text: {}", chunk.text);
}
Voice Clone Service
// Upload for training
let result = client.voice_clone().upload(&VoiceCloneTrainRequest {
    audio_data: audio_bytes,
    voice_id: "my-custom-voice".to_string(),
    ..Default::default()
}).await?;
// Check status
let status = client.voice_clone().get_status("my-custom-voice").await?;
Realtime Service
use giztoy_doubaospeech::{RealtimeConfig, RealtimeEventType};
let session = client.realtime().connect(&RealtimeConfig {
    model: "speech-dialog-001".to_string(),
    ..Default::default()
}).await?;
// Send audio
session.send_audio(&audio_data).await?;
// Receive events
while let Some(event) = session.recv().await {
    let event = event?;
    match event.event_type {
        RealtimeEventType::AsrResult => {
            println!("User: {}", event.asr_result.text);
        }
        RealtimeEventType::TtsAudio => {
            play(&event.tts_audio);
        }
        _ => {}
    }
}
Console API
use giztoy_doubaospeech::{Console, ListSpeakersRequest, ListTimbresRequest};
let console = Console::new("access-key", "secret-key");
// List speakers
let speakers = console.list_speakers(&ListSpeakersRequest::default()).await?;
// List timbres
let timbres = console.list_timbres(&ListTimbresRequest::default()).await?;
Builder Options
| Method | Description |
|---|---|
| api_key(key) | x-api-key authentication |
| bearer_token(token) | Bearer token authentication |
| v2_api_key(access, app) | V2/V3 API authentication |
| cluster(cluster) | Set cluster name (V1) |
| resource_id(id) | Set resource ID (V2) |
| base_url(url) | Custom HTTP base URL |
| ws_url(url) | Custom WebSocket URL |
| timeout(duration) | Request timeout |
| user_id(id) | User identifier |
Error Handling
use giztoy_doubaospeech::{Error, Result};
match client.tts().synthesize(&req).await {
    Ok(resp) => { /* ... */ }
    Err(Error::Api { code, message }) => {
        eprintln!("API Error {}: {}", code, message);
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}
Differences from Go
| Feature | Go | Rust |
|---|---|---|
| V1/V2 service access | Separate fields (TTS, TTSV2) | Single service with version param |
| Streaming | iter.Seq2 | Stream<Item=Result<T>> |
| Session management | Manual close | Drop trait |
| WebSocket | gorilla/websocket | tokio-tungstenite |
Doubao Speech SDK - Known Issues
🟡 Minor Issues
DBS-001: Go auth header format unusual
File: go/pkg/doubaospeech/client.go:238-239
Description:
Bearer token format is Bearer;{token} instead of standard Bearer {token}:
req.Header.Set("Authorization", "Bearer;"+c.config.accessToken)
Impact: Non-standard, but required by the Volcengine API.
Note: This is an API requirement, not an SDK issue.
DBS-002: Multiple auth method complexity
Description:
The SDK supports four or more authentication methods:
- API Key (x-api-key)
- Bearer Token (Authorization: Bearer;)
- V2 API Key (X-Api-Access-Key, X-Api-App-Key)
- Resource-specific fixed keys
Impact: Users may be unsure which method to use for which service.
Suggestion: Add helper methods like NewTTSClient(), NewRealtimeClient() with correct defaults.
DBS-003: Resource ID vs Cluster confusion
Description:
V1 uses "cluster", V2 uses "resource_id" for service selection:
- V1: WithCluster("volcano_tts")
- V2: WithResourceID("seed-tts-2.0")
Impact: Easy to mix up, unclear which to use when.
DBS-004: Rust async TTS incomplete
Description:
Rust implementation for async long-text TTS may be incomplete or missing compared to Go.
Status: ⚠️ Needs verification.
DBS-005: Rust file ASR incomplete
Description:
Rust implementation for file-based ASR may be incomplete compared to Go.
Status: ⚠️ Needs verification.
DBS-006: Fixed app keys hardcoded
File: go/pkg/doubaospeech/client.go:17-24
Description:
Some V3 APIs use fixed app keys from documentation:
const (
	AppKeyRealtime = "PlgvMymc7f3tQnJ6"
	AppKeyPodcast  = "aGjiRDfUWi"
)
Impact: If Volcengine changes these, SDK breaks until updated.
Note: This is documented API behavior.
🔵 Enhancements
DBS-007: No automatic service version selection
Description:
User must manually choose between V1 and V2 services. No automatic selection based on features needed.
Suggestion: Add unified service that routes to correct version.
DBS-008: No connection pooling documentation
Description:
WebSocket connections for streaming services could benefit from pooling documentation.
DBS-009: No retry for WebSocket connections
Description:
HTTP requests have retry, but WebSocket connections don't auto-reconnect on failure.
Suggestion: Add reconnection logic for streaming sessions.
DBS-010: Console API missing some endpoints
Description:
Console client may not cover all management APIs available on Volcengine.
⚪ Notes
DBS-011: Dual API version design
Description:
Having both V1 (Classic) and V2 (BigModel) services in same client reflects Volcengine's actual API structure. This is intentional, not a flaw.
DBS-012: Protocol module for WebSocket
Description:
Both Go and Rust have a protocol module for WebSocket message serialization. This is well-structured for the binary protocol requirements.
DBS-013: Comprehensive service coverage
Description:
SDK covers nearly all Doubao Speech services:
- TTS (sync, stream, async)
- ASR (one-sentence, stream, file)
- Voice Clone
- Realtime Dialogue
- Meeting Transcription
- Podcast Synthesis
- Translation
- Media Subtitle
This is impressive coverage.
DBS-014: Console uses AK/SK signature
Description:
Console API uses Volcengine OpenAPI signature (HMAC-SHA256), not simple token. This is standard for Volcengine management APIs.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| DBS-001 | 🟡 Minor | Note | Go Auth |
| DBS-002 | 🟡 Minor | Open | Both |
| DBS-003 | 🟡 Minor | Open | Both |
| DBS-004 | 🟡 Minor | Open | Rust |
| DBS-005 | 🟡 Minor | Open | Rust |
| DBS-006 | 🟡 Minor | Note | Both |
| DBS-007 | 🔵 Enhancement | Open | Both |
| DBS-008 | 🔵 Enhancement | Open | Both |
| DBS-009 | 🔵 Enhancement | Open | Both |
| DBS-010 | 🔵 Enhancement | Open | Both |
| DBS-011 | ⚪ Note | N/A | Both |
| DBS-012 | ⚪ Note | N/A | Both |
| DBS-013 | ⚪ Note | N/A | Both |
| DBS-014 | ⚪ Note | N/A | Both |
Overall: Comprehensive SDK with excellent API coverage. Main complexity is from Volcengine's dual API version system and multiple authentication methods. Rust implementation may have some gaps compared to Go.
Jiutian API Documentation
九天大模型 (Jiutian) - China Mobile's AI Service Platform.
Official API Documentation: api/README.md
Overview
Jiutian is China Mobile's terminal intelligent agent service management platform for AI/LLM cloud integration.
Note: This is API documentation only. No Go/Rust SDK implementation exists in this repository.
API Features
- Chat Completions: OpenAI-compatible chat API
- Device Integration: Device registration and heartbeat protocols
- Assistant Management: Configure AI assistants
Integration Notes
For integration with Jiutian API:
- Use OpenAI-compatible SDK with custom base URL
- Follow device authentication protocols
- See api/tutorial.md for quick start
Authentication
Requires:
- AI Token (obtained via email application)
- IP whitelist registration
- Product ID from management platform
Environment
| Environment | URL |
|---|---|
| Test | https://z5f3vhk2.cxzfdm.com:30101 |
| Production | https://ivs.chinamobiledevice.com:30100 |
SDK Implementation Status
| Language | Status |
|---|---|
| Go | ❌ Not implemented |
| Rust | ❌ Not implemented |
For basic integration, use OpenAI-compatible SDK:
Go:
config := openai.DefaultConfig(jiutianToken)
config.BaseURL = "https://ivs.chinamobiledevice.com:30100/v1"
client := openai.NewClientWithConfig(config)
Related Documentation
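Jiutian Large Model (九天大模型) API Documentation
Terminal Intelligent Agent Service Management Platform: AI Large Model Cloud-to-Cloud Integration Document
Document Index
| Document | Description |
|---|---|
| tutorial.md | 🚀 Quick-start tutorial (recommended first read) |
| concepts.md | Key terms (text generation model, assistant, token) |
| auth.md | Authentication |
| chat.md | Chat Completions API |
| device.md | Device access protocol (fetching device information, heartbeat reporting) |
| faq.md | Frequently asked questions |
Document Change Log
| Date | Version | Change | Author |
|---|---|---|---|
| 25.02.28 | V1.0 | Defined the interface protocol by which an AI hardware vendor's central control platform connects to the large model service of the terminal intelligent agent service management platform | 邹益强 |
| 25.04.30 | v1.0.1 | Adjusted document formatting | 邹益强 |
| 25.11.14 | v1.0.2 | Added instructions for the application email | 邹益强 |
| 25.12.29 | v1.0.3 | Added instructions for non-generative AI access | 邹益强 |
AI Service Access Process
- To apply for an AI token, provide the vendor server IP to be added to the access whitelist, together with the product ID (productId) requested from the management platform
- Send the email to:
- zouyiqiang_fx@cmdc.chinamobile.com
- zhucaiwen_fx@cmdc.chinamobile.com
- CC: zhengzhongwei_fx@cmdc.chinamobile.com
- Once the whitelist entry is active, the vendor server can call the Jiutian large model API with the AI token
Environment Configuration
| Environment | URL |
|---|---|
| Test | https://z5f3vhk2.cxzfdm.com:30101 |
| Production | https://ivs.chinamobiledevice.com:30100 |
Test Token: sk-Y73NAU0tArvGRlpUE9060529470b42Ac8bA34d40F48b0564
System prompt: 您好,我是中国移动的智能助理灵犀。如果您询问我的身份,我会回答:"您好,我是中国移动智能助理灵犀"。 (English: Hello, I am Lingxi, China Mobile's intelligent assistant. If you ask about my identity, I will answer: "Hello, I am China Mobile's intelligent assistant Lingxi.")
Model context length: 8K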
Jiutian - Go Implementation
Status: Not Implemented
No native Go SDK for Jiutian API exists in this repository.
Recommendation
Use OpenAI-compatible SDK since Jiutian API follows OpenAI chat completions format:
import "github.com/sashabaranov/go-openai"

config := openai.DefaultConfig("sk-your-jiutian-token")
config.BaseURL = "https://ivs.chinamobiledevice.com:30100/v1"
client := openai.NewClientWithConfig(config)

resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model: "jiutian",
	Messages: []openai.ChatCompletionMessage{
		{
			Role: "system",
			// "Hello, I am Lingxi, China Mobile's intelligent assistant."
			Content: "您好,我是中国移动的智能助理灵犀。",
		},
		{
			Role: "user",
			// "Who are you?"
			Content: "你是谁?",
		},
	},
})
Device Protocol
For device-specific features (registration, heartbeat), implement HTTP client directly:
// Device heartbeat
type HeartbeatRequest struct {
DeviceID string `json:"device_id"`
ProductID string `json:"product_id"`
Timestamp int64 `json:"timestamp"`
}
func sendHeartbeat(ctx context.Context, client *http.Client, req *HeartbeatRequest) error {
// POST to /api/device/heartbeat
}
Future Work
A native SDK could provide:
- Device registration/heartbeat management
- Token refresh handling
- Jiutian-specific features
See api/device.md for device protocol details.
Jiutian - Rust Implementation
Status: Not Implemented
No native Rust SDK for Jiutian API exists in this repository.
Recommendation
Use OpenAI-compatible SDK since Jiutian API follows OpenAI chat completions format:
use async_openai::{
    config::OpenAIConfig,
    types::{
        ChatCompletionRequestMessage, ChatCompletionRequestSystemMessageArgs,
        ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs,
    },
    Client,
};

let config = OpenAIConfig::new()
    .with_api_key("sk-your-jiutian-token")
    .with_api_base("https://ivs.chinamobiledevice.com:30100/v1");
let client = Client::with_config(config);

let request = CreateChatCompletionRequestArgs::default()
    .model("jiutian")
    .messages([
        ChatCompletionRequestMessage::System(
            ChatCompletionRequestSystemMessageArgs::default()
                // "Hello, I am Lingxi, China Mobile's intelligent assistant."
                .content("您好,我是中国移动的智能助理灵犀。")
                .build()?
        ),
        ChatCompletionRequestMessage::User(
            ChatCompletionRequestUserMessageArgs::default()
                // "Who are you?"
                .content("你是谁?")
                .build()?
        ),
    ])
    .build()?;

let response = client.chat().create(request).await?;
Device Protocol
For device-specific features, implement HTTP client using reqwest:
use serde::Serialize;

#[derive(Serialize)]
struct HeartbeatRequest {
    device_id: String,
    product_id: String,
    timestamp: i64,
}

async fn send_heartbeat(
    client: &reqwest::Client,
    base_url: &str,
    req: &HeartbeatRequest,
) -> Result<(), reqwest::Error> {
    client
        .post(format!("{}/api/device/heartbeat", base_url))
        .json(req)
        .send()
        .await?;
    Ok(())
}
Future Work
A native SDK could provide:
- Device registration/heartbeat management
- Token refresh handling
- Jiutian-specific features
See api/device.md for device protocol details.
Jiutian - Known Issues
🔴 Major Issues
JT-001: No SDK implementation
Description:
No Go or Rust SDK implementation exists for Jiutian API.
Impact: Users must use OpenAI-compatible SDK or implement HTTP calls directly.
Recommendation:
- For chat completions: Use OpenAI SDK with custom base URL
- For device features: Implement direct HTTP calls
🔵 Enhancements
JT-002: Native SDK desired
Description:
A native SDK would be useful for:
- Device registration/heartbeat protocols
- Token management
- Jiutian-specific error handling
Priority: Low - OpenAI SDK covers main use case.
JT-003: Device protocol documentation only
Description:
Device registration and heartbeat protocols are documented but not implemented.
Files affected:
⚪ Notes
JT-004: OpenAI-compatible API
Description:
Jiutian chat API is OpenAI-compatible, so existing OpenAI SDKs work:
- Go: github.com/sashabaranov/go-openai
- Rust: async-openai
- Python: openai
Just set custom base URL and use Jiutian token.
JT-005: Access requirements
Description:
Jiutian API requires:
- IP whitelist registration
- Product ID from management platform
- AI token (obtained via email application)
This is documented in api/README.md.
JT-006: China Mobile specific
Description:
This API is specific to China Mobile's terminal intelligent agent service management platform. May not be relevant for all users of this repository.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| JT-001 | 🔴 Major | Open | Both |
| JT-002 | 🔵 Enhancement | Open | Both |
| JT-003 | 🔵 Enhancement | Open | Both |
| JT-004 | ⚪ Note | N/A | Both |
| JT-005 | ⚪ Note | N/A | Both |
| JT-006 | ⚪ Note | N/A | Both |
Overall: Documentation-only module without SDK implementation. OpenAI-compatible SDK recommended for chat functionality.
mqtt0
Overview
mqtt0 is a lightweight MQTT implementation focused on QoS 0. It provides both client and broker components with explicit control over authentication and ACL. The Go implementation is synchronous and net.Conn-based, while the Rust implementation is async (Tokio) with optional TLS/WebSocket transport features.
Design Goals
- Minimal MQTT feature set with strong QoS 0 focus
- Explicit ACL/auth hooks for connect/publish/subscribe
- Simple broker suitable for embedded or internal services
- Support MQTT 3.1.1 (v4) and MQTT 5.0 (v5)
- Provide transport flexibility (TCP/TLS/WebSocket)
Key Concepts
- Client: QoS 0 publish/subscribe, keepalive, protocol v4/v5
- Broker: connection lifecycle, ACL checks, topic routing
- Shared subscriptions: $share/{group}/{topic}
- Topic alias (v5): reduce bandwidth by reusing alias per client
- Transports: TCP/TLS/WebSocket based on URL scheme or feature flags
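The shared-subscription form $share/{group}/{topic} listed above can be split into its parts as sketched below. parseSharedSubscription is a hypothetical helper written for illustration; it is not taken from the mqtt0 source.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSharedSubscription splits a filter of the form $share/{group}/{topic}.
// It returns ok=false for ordinary (non-shared) filters and malformed ones.
func parseSharedSubscription(filter string) (group, topic string, ok bool) {
	const prefix = "$share/"
	if !strings.HasPrefix(filter, prefix) {
		return "", "", false
	}
	rest := strings.TrimPrefix(filter, prefix)
	group, topic, found := strings.Cut(rest, "/")
	if !found || group == "" || topic == "" {
		return "", "", false
	}
	return group, topic, true
}

func main() {
	group, topic, ok := parseSharedSubscription("$share/workers/sensors/+/temp")
	fmt.Println(group, topic, ok) // workers sensors/+/temp true
}
```

A broker would then route each matching message to one member of the group rather than to every subscriber.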
Components
- Client
- Broker
- Protocol parser/encoder
- Topic trie (subscription routing)
- Transport layer
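Subscription routing comes down to deciding whether a topic filter matches a published topic. The sketch below shows that rule for MQTT's + and # wildcards on a single filter; a trie applies the same comparison one level at a time. matchTopic is a hypothetical helper, not the package's API, and it ignores the special handling of topics that begin with $.

```go
package main

import (
	"fmt"
	"strings"
)

// matchTopic reports whether topic matches filter.
// "+" matches exactly one level; "#" matches the remaining levels
// (including none) and is only valid as the last level of the filter.
func matchTopic(filter, topic string) bool {
	f := strings.Split(filter, "/")
	t := strings.Split(topic, "/")
	for i, level := range f {
		if level == "#" {
			return i == len(f)-1
		}
		if i >= len(t) {
			return false // filter has more levels than the topic
		}
		if level != "+" && level != t[i] {
			return false
		}
	}
	return len(f) == len(t)
}

func main() {
	fmt.Println(matchTopic("sensors/+/temp", "sensors/kitchen/temp")) // true
	fmt.Println(matchTopic("sensors/#", "sensors/kitchen/temp"))      // true
	fmt.Println(matchTopic("sensors/+", "sensors/kitchen/temp"))      // false
}
```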
Protocol and Transport Support
- MQTT 3.1.1 and MQTT 5.0 for client and broker
- TCP and TLS by default
- WebSocket/WSS when enabled (Rust feature flags)
Examples
- Go: use Connect, Subscribe, Publish, Recv
- Rust: use Client::connect, client.subscribe, client.publish, client.recv
Related Modules
- docs/lib/trie (topic routing)
- docs/lib/encoding (protocol helpers)
mqtt0 (Go)
Package Layout
- doc.go: high-level overview and usage examples
- client.go: QoS 0 client implementation
- broker.go: broker implementation with ACL hooks
- packet_v4.go, packet_v5.go, packet.go: protocol encode/decode
- listener.go, dialer.go: transport helpers
- trie.go: subscription routing
Public Interfaces
- ClientConfig: broker address, protocol version, TLS config, keepalive, etc.
- Client: Connect, Subscribe, Unsubscribe, Publish, Recv, Close
- Broker: Serve, ServeConn, ACL hooks, callbacks
- Authenticator: access control on connect/publish/subscribe
- Handler: callback for inbound broker messages
- Message, ProtocolVersion, QoS
Design Notes
- Single connection with separate read/write locks to guard concurrent access.
- Request/response operations (SUBSCRIBE/UNSUBSCRIBE) read from the same stream as inbound PUBLISH messages.
- Keepalive runs in a goroutine when AutoKeepalive is enabled.
- Shared subscriptions and topic aliasing are handled in the broker.
Transport
- URL-based address parsing: tcp://, tls://, ws://, wss://
- Dialer hook allows custom connection logic
- TLS config supported via ClientConfig.TLSConfig
Notable Behaviors
- QoS 0 only; no packet persistence or retransmission.
- Broker drops messages when per-client channel is full (non-blocking send).
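The drop-on-full behavior comes from a non-blocking channel send. A minimal sketch of that pattern follows; trySend and the queue size are illustrative and not copied from broker.go.

```go
package main

import "fmt"

// trySend delivers msg to a client's bounded queue without blocking.
// It reports false when the queue is full and the message was dropped.
func trySend(queue chan<- string, msg string) bool {
	select {
	case queue <- msg:
		return true
	default:
		return false // queue full: drop instead of stalling the broker
	}
}

func main() {
	queue := make(chan string, 2) // per-client buffer of 2 messages
	dropped := 0
	for i := 0; i < 5; i++ {
		if !trySend(queue, fmt.Sprintf("msg-%d", i)) {
			dropped++
		}
	}
	fmt.Println("queued:", len(queue), "dropped:", dropped) // queued: 2 dropped: 3
}
```

The trade-off is that one slow subscriber cannot stall delivery to the others, at the cost of silent loss for that subscriber.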
mqtt0 (Rust)
Crate Layout
- lib.rs: public exports, crate overview
- client.rs: async QoS 0 client
- broker.rs: async broker with ACL hooks
- protocol.rs: MQTT encode/decode
- transport.rs: TCP/TLS/WebSocket transport abstraction
- trie.rs: subscription routing
- types.rs: public types and traits
Public Interfaces
- Client, ClientConfig: async connect/subscribe/publish/recv
- Broker, BrokerConfig, BrokerBuilder: broker setup and lifecycle
- Authenticator, Handler: ACL and message handling
- Message, ProtocolVersion, QoS
- TransportType, Transport (feature-gated TLS/WebSocket)
Design Notes
- Fully async, based on Tokio and mpsc channels.
- Builder pattern for broker configuration and hooks.
- Transport features are behind Cargo feature flags (TLS, WebSocket).
Differences vs Go
- Rust uses async traits for client/broker operations.
- TLS/WebSocket support is feature-gated.
- Broker construction encourages builder configuration.
mqtt0 - Known Issues
🟠 Major Issues
MQTT0-001: Go Client read path is not demultiplexed
File: go/pkg/mqtt0/client.go
Description:
Subscribe, Unsubscribe, and other request/response operations read directly
from the same stream as Recv(). If callers run Recv() concurrently with
Subscribe() or Unsubscribe(), whichever acquires readMu first may consume
packets that belong to the other operation, causing unexpected packet errors.
Impact: Hard-to-debug race between subscription changes and inbound message handling.
Suggestion:
Introduce a single read loop with protocol demuxing, or document that
Recv() must not run concurrently with subscribe/unsubscribe calls.
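The single read loop with demultiplexing could be structured as in the sketch below. Everything here is hypothetical: the packet type is a simplified stand-in for a decoded MQTT packet, and a channel replaces the network connection so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// packet is a simplified stand-in for a decoded MQTT packet.
type packet struct {
	kind     string // "PUBLISH" or "SUBACK"
	packetID uint16 // set for SUBACK
	payload  string // set for PUBLISH
}

// demux owns the read side of the connection. Exactly one goroutine runs
// readLoop; every other caller receives packets through channels.
type demux struct {
	mu       sync.Mutex
	pending  map[uint16]chan packet // waiters keyed by packet ID
	messages chan packet            // inbound PUBLISH packets
}

func newDemux() *demux {
	return &demux{pending: make(map[uint16]chan packet), messages: make(chan packet, 16)}
}

// expect registers interest in the acknowledgement for id.
func (d *demux) expect(id uint16) <-chan packet {
	ch := make(chan packet, 1)
	d.mu.Lock()
	d.pending[id] = ch
	d.mu.Unlock()
	return ch
}

// readLoop routes each packet from src to the right consumer.
func (d *demux) readLoop(src <-chan packet) {
	for p := range src {
		switch p.kind {
		case "PUBLISH":
			d.messages <- p
		case "SUBACK":
			d.mu.Lock()
			ch, ok := d.pending[p.packetID]
			delete(d.pending, p.packetID)
			d.mu.Unlock()
			if ok {
				ch <- p
			}
		}
	}
	close(d.messages)
}

func main() {
	d := newDemux()
	ack := d.expect(7) // what Subscribe would do before sending SUBSCRIBE

	src := make(chan packet, 3)
	src <- packet{kind: "PUBLISH", payload: "hello"}
	src <- packet{kind: "SUBACK", packetID: 7}
	src <- packet{kind: "PUBLISH", payload: "world"}
	close(src)
	go d.readLoop(src)

	fmt.Println("ack for packet", (<-ack).packetID)
	for m := range d.messages {
		fmt.Println("message:", m.payload)
	}
}
```

With this layout Recv() would read from the messages channel and Subscribe() would wait on its own acknowledgement channel, so neither can consume the other's packets.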
🟡 Minor Issues
MQTT0-002: Go Broker drops messages on backpressure
File: go/pkg/mqtt0/broker.go
Description: The broker uses a bounded channel for each client. When the channel is full, messages are dropped with a debug log.
Impact: Message loss under bursty load beyond QoS 0 expectations; may surprise users.
Suggestion: Document the drop behavior clearly or make buffer size configurable.
MQTT0-003: Rust WebSocket transport requires special handling
File: rust/mqtt0/src/transport.rs
Description:
Transport::WebSocket implements AsyncRead/AsyncWrite by returning
Unsupported errors. If code treats Transport uniformly, WebSocket
connections will fail at runtime.
Impact: Surprising runtime errors for WebSocket clients if not handled explicitly.
Suggestion: Expose dedicated websocket read/write APIs or document the required handling.
Buffer Package
Thread-safe streaming buffer implementations for producer-consumer patterns.
Design Goals
- Type-safe: Generic buffers for any element type (not limited to bytes)
- Thread-safe: All operations support concurrent access
- Blocking Semantics: Support blocking read/write with proper close mechanics
- Flow Control: Different policies for handling full buffers
Buffer Types
| Type | Full Behavior | Empty Behavior | Use Case |
|---|---|---|---|
| Buffer | Grow | Block | Variable-size data, unknown total size |
| BlockBuffer | Block | Block | Flow control, bounded memory |
| RingBuffer | Overwrite | Block | Sliding window, latest data only |
Buffer (Growable)
A dynamically growing buffer that never blocks writes. Ideal for scenarios where:
- Data size is not known in advance
- Memory is not constrained
- Writer should never block
flowchart LR
W[Writer] --> B["[Grows on demand]"]
B --> R["Reader<br/>(blocks when empty)"]
BlockBuffer (Fixed, Blocking)
A fixed-size circular buffer that blocks on both read and write. Provides backpressure for flow control:
- Writer blocks when buffer is full
- Reader blocks when buffer is empty
- Predictable memory usage
flowchart LR
W["Writer<br/>(blocks when full)"] --> B["[Fixed Size]"]
B --> R["Reader<br/>(blocks when empty)"]
RingBuffer (Fixed, Overwriting)
A fixed-size circular buffer that overwrites oldest data when full. Ideal for:
- Maintaining a sliding window of latest data
- Real-time data where older samples are stale
- Memory-bounded with freshness priority
flowchart LR
W[Writer] --> B["[Overwrites oldest]"]
B --> R["Reader<br/>(blocks when empty)"]
Common Interface
All buffer types share a consistent interface:
| Operation | Description |
|---|---|
| Write([]T) | Write slice of elements |
| Read([]T) | Read into slice |
| Add(T) | Add single element |
| Next() | Read single element (iterator pattern) |
| Discard(n) | Skip n elements without reading |
| Len() | Current element count |
| Reset() | Clear all data |
| CloseWrite() | Graceful close (allow drain) |
| CloseWithError(err) | Immediate close with error |
| Close() | Close both ends |
| Error() | Get close error (if any) |
Close Semantics
CloseWrite() - Graceful
sequenceDiagram
participant W as Writer
participant B as Buffer
participant R as Reader
W->>B: CloseWrite()
Note over B: No new writes
R->>B: Read()
B-->>R: Remaining data
R->>B: Read()
B-->>R: EOF
CloseWithError(err) - Immediate
sequenceDiagram
participant W as Writer
participant B as Buffer
participant R as Reader
W->>B: CloseWithError(err)
Note over B: Both ends closed
R->>B: Read()
B-->>R: err
Examples Directory
- examples/go/buffer/ - Go usage examples
- examples/rust/buffer/ - Rust usage examples
Implementation Notes
Memory Layout
| Type | Layout |
|---|---|
| Buffer | Dynamic slice → grows via append |
| BlockBuffer | Fixed circular → head/tail pointers wrap |
| RingBuffer | Fixed circular → overwrites when head catches tail |
Notification Mechanism
- Go: Channel-based (writeNotify chan struct{}) or Cond variables
- Rust: Condvar-based (Condvar::notify_one/all)
Thread Safety
- Go: sync.Mutex + sync.Cond / channels
- Rust: Mutex<State> + Condvar, wrapped in Arc for cloning
Related Packages
- audio/pcm - Uses buffers for PCM audio streams
- chatgear - Uses buffers for audio frame transmission
- opusrt - Uses RingBuffer for jitter buffering
Buffer Package - Go Implementation
Import: github.com/haivivi/giztoy/pkg/buffer
Types
Buffer[T]
Growable buffer with generic type support.
type Buffer[T any] struct {
	writeNotify chan struct{}
	mu          sync.Mutex
	closeWrite  bool
	closeErr    error
	buf         []T
}
Key Methods:
| Method | Signature | Description |
|---|---|---|
| N | func N[T any](n int) *Buffer[T] | Create with initial capacity |
| Write | (b *Buffer[T]) Write(p []T) (int, error) | Append elements |
| Read | (b *Buffer[T]) Read(p []T) (int, error) | Read elements (blocks) |
| Add | (b *Buffer[T]) Add(t T) error | Add single element |
| Next | (b *Buffer[T]) Next() (T, error) | Pop from end (LIFO) |
| Bytes | (b *Buffer[T]) Bytes() []T | Get internal slice (unsafe) |
BlockBuffer[T]
Fixed-size circular buffer with blocking semantics.
type BlockBuffer[T any] struct {
	cond       *sync.Cond
	mu         sync.Mutex
	buf        []T
	head, tail int64
	closeWrite bool
	closeErr   error
}
Key Methods:
| Method | Signature | Description |
|---|---|---|
| Block | func Block[T any](buf []T) *BlockBuffer[T] | Create from existing slice |
| BlockN | func BlockN[T any](size int) *BlockBuffer[T] | Create with size |
| Write | (bb *BlockBuffer[T]) Write(p []T) (int, error) | Write (blocks when full) |
| Read | (bb *BlockBuffer[T]) Read(p []T) (int, error) | Read (blocks when empty) |
| Next | (bb *BlockBuffer[T]) Next() (T, error) | Read single (FIFO) |
RingBuffer[T]
Fixed-size circular buffer with overwrite semantics.
type RingBuffer[T any] struct {
	writeNotify chan struct{}
	mu          sync.Mutex
	buf         []T
	head, tail  int64
	closeWrite  bool
	closeErr    error
}
Key Methods:
| Method | Signature | Description |
|---|---|---|
| RingN | func RingN[T any](size int) *RingBuffer[T] | Create with size |
| Write | (rb *RingBuffer[T]) Write(p []T) (int, error) | Write (overwrites oldest) |
| Add | (rb *RingBuffer[T]) Add(t T) error | Add single (overwrites) |
BytesBuffer Interface
Common interface for byte buffers:
type BytesBuffer interface {
	Write(p []byte) (n int, err error)
	Read(p []byte) (n int, err error)
	Discard(n int) (err error)
	Close() error
	CloseWrite() error
	CloseWithError(err error) error
	Error() error
	Reset()
	Bytes() []byte
	Len() int
}
Convenience Functions
func Bytes16KB() *BlockBuffer[byte] // 16KB blocking buffer
func Bytes4KB() *BlockBuffer[byte] // 4KB blocking buffer
func Bytes1KB() *BlockBuffer[byte] // 1KB blocking buffer
func Bytes256B() *BlockBuffer[byte] // 256B blocking buffer
func Bytes() *Buffer[byte] // 1KB growable buffer
func BytesRing(size int) *RingBuffer[byte] // Ring buffer
Error Handling
var ErrIteratorDone = errors.New("iterator done")
- ErrIteratorDone: Returned by Next() when buffer is closed and empty
- io.EOF: Returned by Read() when buffer is closed and empty
- io.ErrClosedPipe: Default error for closed buffers
Usage Patterns
Producer-Consumer with BlockBuffer
buf := buffer.Bytes4KB()

// Producer goroutine
go func() {
	for data := range source {
		if _, err := buf.Write(data); err != nil {
			return
		}
	}
	buf.CloseWrite()
}()

// Consumer: read until the producer closes the buffer
tmp := make([]byte, 1024)
for {
	n, err := buf.Read(tmp)
	if n > 0 {
		process(tmp[:n])
	}
	if err != nil {
		break // io.EOF after a graceful close, or the close error
	}
}
Sliding Window with RingBuffer
buf := buffer.RingN[float64](100) // Keep last 100 samples

// Streaming producer
go func() {
	for sample := range stream {
		buf.Add(sample) // Overwrites oldest when full
	}
	buf.CloseWrite()
}()

// Periodic consumer
ticker := time.NewTicker(time.Second)
defer ticker.Stop()
for range ticker.C {
	samples := buf.Bytes() // Get current window
	average := computeAverage(samples)
	fmt.Println("average:", average)
}
Iterator Pattern
buf := buffer.N[Event](100)

// Using Next() for iteration
for {
	event, err := buf.Next()
	if errors.Is(err, buffer.ErrIteratorDone) {
		break
	}
	if err != nil {
		log.Println(err)
		break
	}
	handleEvent(event)
}
Implementation Details
Circular Buffer Arithmetic
BlockBuffer and RingBuffer use virtual counters for head/tail:
// Position in physical buffer
pos := head % int64(len(buf))
// Available data
available := tail - head
// Check if full (BlockBuffer only)
isFull := tail - head == int64(len(buf))
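To see the virtual-counter arithmetic in motion, the sketch below runs it on a three-slot ring with the overwrite policy. The ring type is a single-goroutine illustration written for this example; it omits the locking, blocking, and close handling of the real buffers.

```go
package main

import "fmt"

// ring is a single-goroutine illustration of the head/tail arithmetic.
type ring struct {
	buf        []int
	head, tail int64 // virtual counters: they only ever increase
}

// add appends v, overwriting the oldest element when full (RingBuffer policy).
func (r *ring) add(v int) {
	size := int64(len(r.buf))
	r.buf[r.tail%size] = v // physical position = counter mod capacity
	r.tail++
	if r.tail-r.head > size {
		r.head = r.tail - size // the oldest element was just overwritten
	}
}

// snapshot copies the live elements, oldest first.
func (r *ring) snapshot() []int {
	size := int64(len(r.buf))
	out := make([]int, 0, r.tail-r.head)
	for i := r.head; i < r.tail; i++ {
		out = append(out, r.buf[i%size])
	}
	return out
}

func main() {
	r := &ring{buf: make([]int, 3)}
	for v := 1; v <= 5; v++ {
		r.add(v)
	}
	fmt.Println(r.snapshot()) // [3 4 5]
}
```

A BlockBuffer would instead wait while tail - head equals the capacity rather than advancing head.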
Notification Mechanism
- Buffer: Uses buffered channel make(chan struct{}, 1) for non-blocking notification
- BlockBuffer: Uses sync.Cond for precise signal/broadcast control
- RingBuffer: Uses buffered channel (same as Buffer)
Lock Patterns
All types use sync.Mutex with deferred unlock:
func (b *Buffer[T]) Read(p []T) (n int, err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	// Wait loop with unlock/relock
	for len(b.buf) == 0 {
		if b.closeWrite {
			return 0, io.EOF
		}
		b.mu.Unlock()
		<-b.writeNotify // Wait for notification
		b.mu.Lock()
		// Re-check state after relock
	}
	// ... read logic
}
Buffer Package - Rust Implementation
Crate: giztoy-buffer
Types
Buffer
Growable buffer using VecDeque<T> for O(1) front operations.
pub struct Buffer<T> {
    inner: Arc<BufferInner<T>>,
}

struct BufferInner<T> {
    state: Mutex<BufferState<T>>,
    write_notify: Condvar,
}

struct BufferState<T> {
    buf: VecDeque<T>,
    close_write: bool,
    close_err: Option<Arc<dyn Error + Send + Sync>>,
}
Key Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new() -> Self | Create empty buffer |
| with_capacity | fn with_capacity(capacity: usize) -> Self | Create with capacity hint |
| write | fn write(&self, data: &[T]) -> Result<usize, BufferError> | Append elements |
| read | fn read(&self, buf: &mut [T]) -> Result<usize, BufferError> | Read elements (blocks) |
| add | fn add(&self, item: T) -> Result<(), BufferError> | Add single element |
| next | fn next(&self) -> Result<T, Done> | Pop from front (FIFO) |
| to_vec | fn to_vec(&self) -> Vec<T> | Copy to Vec |
BlockBuffer
Fixed-size circular buffer with blocking semantics.
pub struct BlockBuffer<T> {
    inner: Arc<BlockBufferInner<T>>,
}

struct BlockBufferInner<T> {
    state: Mutex<BlockBufferState<T>>,
    not_full: Condvar,
    not_empty: Condvar,
}

struct BlockBufferState<T> {
    buf: Vec<Option<T>>,
    head: usize,
    tail: usize,
    count: usize,
    close_write: bool,
    close_err: Option<Arc<dyn Error + Send + Sync>>,
}
Key Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(capacity: usize) -> Self | Create with capacity |
| from_vec | fn from_vec(data: Vec<T>) -> Self | Create from Vec (full) |
| write | fn write(&self, data: &[T]) -> Result<usize, BufferError> | Write (blocks when full) |
| read | fn read(&self, buf: &mut [T]) -> Result<usize, BufferError> | Read (blocks when empty) |
| capacity | fn capacity(&self) -> usize | Get capacity |
| is_full | fn is_full(&self) -> bool | Check if full |
RingBuffer
Fixed-size circular buffer with overwrite semantics.
pub struct RingBuffer<T> {
    inner: Arc<RingBufferInner<T>>,
}

struct RingBufferInner<T> {
    state: Mutex<RingBufferState<T>>,
    write_notify: Condvar,
}

struct RingBufferState<T> {
    buf: Vec<Option<T>>,
    head: usize, // virtual counter (wraps)
    tail: usize, // virtual counter (wraps)
    close_write: bool,
    close_err: Option<Arc<dyn Error + Send + Sync>>,
}
Key Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(capacity: usize) -> Self | Create with capacity |
| write | fn write(&self, data: &[T]) -> Result<usize, BufferError> | Write (overwrites oldest) |
| add | fn add(&self, item: T) -> Result<(), BufferError> | Add single (overwrites) |
Error Types
#[derive(Debug, Clone)]
pub enum BufferError {
    Closed,
    ClosedWithError(Arc<dyn Error + Send + Sync>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Done;
Convenience Functions
// Growable buffers
pub fn bytes() -> Buffer<u8>        // 1KB
pub fn bytes_1kb() -> Buffer<u8>    // 1KB
pub fn bytes_4kb() -> Buffer<u8>    // 4KB
pub fn bytes_16kb() -> Buffer<u8>   // 16KB
pub fn bytes_64kb() -> Buffer<u8>   // 64KB
pub fn bytes_256b() -> Buffer<u8>   // 256B

// Blocking buffers
pub fn block_bytes() -> BlockBuffer<u8>        // 1KB
pub fn block_bytes_1kb() -> BlockBuffer<u8>    // 1KB
pub fn block_bytes_4kb() -> BlockBuffer<u8>    // 4KB
pub fn block_bytes_16kb() -> BlockBuffer<u8>   // 16KB
pub fn block_bytes_64kb() -> BlockBuffer<u8>   // 64KB

// Ring buffers
pub fn ring_bytes(size: usize) -> RingBuffer<u8>
pub fn ring_bytes_1kb() -> RingBuffer<u8>      // 1KB
pub fn ring_bytes_4kb() -> RingBuffer<u8>      // 4KB
pub fn ring_bytes_16kb() -> RingBuffer<u8>     // 16KB
pub fn ring_bytes_64kb() -> RingBuffer<u8>     // 64KB
Thread Safety
All types implement Send + Sync and support Clone:
// Clone shares the underlying buffer via Arc
let buf = Buffer::<i32>::new();
let buf_clone = buf.clone(); // Same underlying buffer

// Safe to send to other threads
std::thread::spawn(move || {
    buf_clone.add(42).unwrap();
});
Usage Patterns
Producer-Consumer
use giztoy_buffer::{BlockBuffer, Done};
use std::thread;

let buf = BlockBuffer::<i32>::new(4);
let producer_buf = buf.clone();

let producer = thread::spawn(move || {
    for i in 0..100 {
        producer_buf.add(i).unwrap();
    }
    producer_buf.close_write().unwrap();
});

let mut collected = Vec::new();
loop {
    match buf.next() {
        Ok(item) => collected.push(item),
        Err(Done) => break,
    }
}

producer.join().unwrap();
Sliding Window
use giztoy_buffer::RingBuffer;

let buf = RingBuffer::<f32>::new(100);

// Write more than capacity - old data overwritten
for i in 0..200 {
    buf.add(i as f32).unwrap();
}

// Buffer contains only last 100 values
assert_eq!(buf.len(), 100);
let window = buf.to_vec(); // [100.0, 101.0, ..., 199.0]
Implementation Details
VecDeque vs Vec
- Buffer: Uses VecDeque<T> for O(1) pop_front()
- BlockBuffer/RingBuffer: Use Vec<Option<T>> for circular buffer
Wrapping Arithmetic
RingBuffer uses wrapping_add for counters to handle overflow:
state.tail = state.tail.wrapping_add(1);
if state.tail.wrapping_sub(state.head) > capacity {
    state.head = state.head.wrapping_add(1);
}
Dual Condvar Pattern (BlockBuffer)
BlockBuffer uses two Condvars for precise signaling:
not_full: Condvar,  // Signals writers when space available
not_empty: Condvar, // Signals readers when data available
Differences from Go Implementation
| Aspect | Go | Rust |
|---|---|---|
| Internal storage | []T slice | Vec<Option<T>> or VecDeque<T> |
| Buffer.Next() | LIFO (pops from end) | FIFO (pops from front) |
| Bytes() / to_vec() | Returns internal slice | Returns copy |
| Cloning | Not supported | Via Arc (shared) |
| Error type | error interface | BufferError enum |
| Default impl | Via interface | Via Default trait |
Buffer Package - Known Issues
🟠 Major Issues
BUF-001: Go Buffer.Next() uses LIFO instead of FIFO
File: go/pkg/buffer/buffer.go:259-262
Description:
The Next() method reads from the END of the buffer (LIFO behavior), while Read() reads from the front (FIFO). This inconsistency is confusing and likely unintentional.
// Current implementation (LIFO)
head := len(b.buf) - 1
t = b.buf[head]
b.buf = b.buf[:head]
Expected: Should read from b.buf[0] for FIFO consistency with Read().
Impact: Users expecting iterator-style sequential access get reversed order.
Status: ⚠️ Documented in code comment but should be fixed.
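A FIFO version of the pop would take the element at index 0 rather than the last one. The helper below is a standalone sketch of that change, not a patch to buffer.go; popFront is a hypothetical name and the locking is omitted.

```go
package main

import "fmt"

// popFront removes and returns the oldest element, which is the order
// Read() already uses. ok is false when buf is empty.
func popFront[T any](buf []T) (head T, rest []T, ok bool) {
	var zero T
	if len(buf) == 0 {
		return zero, buf, false
	}
	head = buf[0]
	buf[0] = zero // release the reference so it can be collected
	return head, buf[1:], true
}

func main() {
	buf := []string{"a", "b", "c"}
	var got []string
	for {
		v, rest, ok := popFront(buf)
		if !ok {
			break
		}
		got, buf = append(got, v), rest
	}
	fmt.Println(got) // [a b c]
}
```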
BUF-002: Go Buffer.Add() missing write notification
File: go/pkg/buffer/buffer.go:280-291
Description:
The Add() method appends an element but does NOT send a notification on writeNotify. If a reader is blocked waiting and only Add() is used for writing, the reader may block indefinitely.
func (b *Buffer[T]) Add(t T) error {
	// ... error checks ...
	b.buf = append(b.buf, t)
	return nil // Missing: select { case b.writeNotify <- struct{}{}: default: }
}
Impact: Potential deadlock when using Add() exclusively.
Status: 🔴 Bug - needs fix.
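The missing notification is a non-blocking send on the buffered channel. The sketch below shows the pattern on a reduced, hypothetical queue type with a single reader; it is meant to illustrate the fix, not to reproduce Buffer[T].

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a cut-down stand-in for Buffer[T]; only the pieces relevant
// to the notification are kept.
type queue struct {
	mu          sync.Mutex
	buf         []int
	writeNotify chan struct{}
}

func newQueue() *queue {
	return &queue{writeNotify: make(chan struct{}, 1)}
}

// Add appends v and then wakes a waiting reader. The send is non-blocking:
// if a notification is already pending, another one is unnecessary.
func (q *queue) Add(v int) {
	q.mu.Lock()
	q.buf = append(q.buf, v)
	q.mu.Unlock()
	select {
	case q.writeNotify <- struct{}{}:
	default:
	}
}

// Next blocks until an element is available, then pops it from the front.
func (q *queue) Next() int {
	for {
		q.mu.Lock()
		if len(q.buf) > 0 {
			v := q.buf[0]
			q.buf = q.buf[1:]
			q.mu.Unlock()
			return v
		}
		q.mu.Unlock()
		<-q.writeNotify // without the send in Add, this would wait forever
	}
}

func main() {
	q := newQueue()
	done := make(chan int)
	go func() { done <- q.Next() }() // the reader may block before the write
	q.Add(42)
	fmt.Println(<-done) // 42
}
```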
🟡 Minor Issues
BUF-003: Go Buffer.Bytes() returns internal slice reference
File: go/pkg/buffer/buffer.go:335-339
Description:
Bytes() returns the internal slice directly, not a copy. Modifications to the returned slice will corrupt the buffer state.
func (b *Buffer[T]) Bytes() []T {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf // Returns internal reference!
}
Impact: Data corruption if caller modifies the returned slice.
Workaround: Document clearly or change to return a copy.
BUF-004: Go BlockBuffer.Bytes() inconsistent copy behavior
File: go/pkg/buffer/block_buffer.go:356-365
Description:
Documentation says "returned slice is a copy" but when h < t, it returns a subslice of the internal buffer directly:
if h < t {
	return bb.buf[h:t] // Not a copy!
}
return slices.Concat(bb.buf[h:], bb.buf[:t]) // This is a copy
Impact: Inconsistent behavior depending on buffer state.
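One way to make both branches behave the same is to copy in the contiguous case as well. The function below is a sketch of that idea under assumptions: snapshot is a hypothetical helper, h and t are physical indices, and h == t is treated as a full buffer, as in the excerpt above.

```go
package main

import "fmt"

// snapshot returns the live region of a circular buffer as a fresh slice.
// When h < t the data is contiguous; otherwise it wraps around the end
// of buf. Either way the result never aliases buf.
func snapshot[T any](buf []T, h, t int) []T {
	if h < t {
		out := make([]T, t-h)
		copy(out, buf[h:t])
		return out
	}
	out := make([]T, 0, len(buf)-h+t)
	out = append(out, buf[h:]...)
	out = append(out, buf[:t]...)
	return out
}

func main() {
	buf := []int{1, 2, 3, 4}
	s := snapshot(buf, 1, 3)
	s[0] = 99           // modifying the snapshot...
	fmt.Println(buf, s) // ...leaves buf intact: [1 2 3 4] [99 3]

	fmt.Println(snapshot(buf, 3, 1)) // wrapped region: [4 1]
}
```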
BUF-005: Go RingBuffer.Bytes() same issue as BUF-004
File: go/pkg/buffer/ring_buffer.go:306-315
Description:
Same inconsistent copy behavior as BlockBuffer.
🔵 Enhancements
BUF-006: Rust BlockBuffer uses Vec<Option<T>> overhead
File: rust/buffer/src/block_buffer.rs:82
Description:
Rust implementation uses Vec<Option<T>> which adds memory overhead (size of discriminant per element) compared to Go's direct slice approach.
Suggestion: Consider using MaybeUninit<T> with careful initialization tracking for zero-cost abstraction.
BUF-007: Go/Rust Buffer.Next() semantic difference
Description:
- Go: Next() is LIFO (pops from end)
- Rust: next() is FIFO (pops from front via VecDeque)
This API inconsistency could cause bugs when porting code between languages.
Suggestion: Align Go implementation to match Rust (FIFO).
⚪ Notes
BUF-008: No io.Reader/io.Writer implementation in Rust
Description:
Go buffers implement io.Reader and io.Writer interfaces. Rust buffers don't implement std::io::Read and std::io::Write traits.
Reason: Rust buffers are generic over T: Clone, not just bytes.
Suggestion: Add byte-specific wrapper types that implement std::io traits.
BUF-009: Go BytesBuffer interface exposes Bytes()
File: go/pkg/buffer/bytes.go:12-23
Description:
The BytesBuffer interface includes Bytes() []byte but this is dangerous given BUF-003/004/005.
Suggestion: Consider removing from interface or ensuring all implementations return copies.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| BUF-001 | 🟠 Major | Open | Go Buffer |
| BUF-002 | 🟠 Major | Open | Go Buffer |
| BUF-003 | 🟡 Minor | Open | Go Buffer |
| BUF-004 | 🟡 Minor | Open | Go BlockBuffer |
| BUF-005 | 🟡 Minor | Open | Go RingBuffer |
| BUF-006 | 🔵 Enhancement | Open | Rust BlockBuffer |
| BUF-007 | 🔵 Enhancement | Open | Go/Rust parity |
| BUF-008 | ⚪ Note | N/A | Rust |
| BUF-009 | ⚪ Note | N/A | Go Interface |
Encoding Package
JSON-serializable encoding types for binary data.
Design Goals
- Seamless JSON Integration: Binary data that automatically serializes to human-readable formats
- Type Safety: Distinct types for different encodings prevent mixing
- Zero-Copy Where Possible: Minimal allocations during serialization
Types
| Type | Encoding | JSON Example | Use Case |
|---|---|---|---|
| StdBase64Data | Standard Base64 | "aGVsbG8=" | Binary payloads, files |
| HexData | Hexadecimal | "deadbeef" | Hashes, IDs, debugging |
Features
JSON Serialization
Both types implement JSON marshal/unmarshal:
{
"payload": "aGVsbG8gd29ybGQ=",
"hash": "a1b2c3d4"
}
Null Handling
- JSON null deserializes to an empty/nil slice
- Empty string "" deserializes to an empty slice
String Representation
Both types implement String() / Display for easy logging:
StdBase64Data("hello") -> "aGVsbG8="
HexData([0xde, 0xad]) -> "dead"
Use Cases
API Payloads
Many APIs return binary data as Base64-encoded JSON strings:
{
"audio_data": "UklGRi4AAABXQVZFZm10IBAAAAABAAEA..."
}
Hash Values
Cryptographic hashes are typically represented as hex:
{
"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
Binary Protocol Debugging
Hex encoding is useful for debugging binary protocols:
{
"raw_frame": "0102030405"
}
Examples Directory
- examples/go/encoding/ - Go usage examples (if any)
- examples/rust/encoding/ - Rust usage examples (if any)
Related Packages
- minimax - Uses Base64 for audio data in API responses
- doubaospeech - Uses Base64 for audio payloads
- dashscope - Uses Base64 for binary data
Encoding Package - Go Implementation
Import: github.com/haivivi/giztoy/pkg/encoding
Types
StdBase64Data
type StdBase64Data []byte
A byte slice that serializes to/from standard Base64 in JSON.
Methods:
| Method | Signature | Description |
|---|---|---|
| MarshalJSON | () ([]byte, error) | Encode to JSON Base64 string |
| UnmarshalJSON | (data []byte) error | Decode from JSON Base64 string |
| String | () string | Return Base64-encoded string |
HexData
type HexData []byte
A byte slice that serializes to/from hexadecimal in JSON.
Methods:
| Method | Signature | Description |
|---|---|---|
| MarshalJSON | () ([]byte, error) | Encode to JSON hex string |
| UnmarshalJSON | (data []byte) error | Decode from JSON hex string |
| String | () string | Return hex-encoded string |
Usage
In Struct Fields
type Message struct {
	ID      string        `json:"id"`
	Payload StdBase64Data `json:"payload"`
	Hash    HexData       `json:"hash"`
}
msg := Message{
	ID:      "msg-123",
	Payload: StdBase64Data([]byte("hello world")),
	Hash:    HexData([]byte{0xab, 0xcd, 0xef}),
}
// Marshals to:
// {"id":"msg-123","payload":"aGVsbG8gd29ybGQ=","hash":"abcdef"}
data, _ := json.Marshal(msg)
Standalone Encoding
// Base64
data := StdBase64Data([]byte("hello"))
fmt.Println(data.String()) // "aGVsbG8="
// Hex
hash := HexData([]byte{0xde, 0xad})
fmt.Println(hash.String()) // "dead"
Null Handling
var data StdBase64Data
json.Unmarshal([]byte(`null`), &data) // data is nil
json.Unmarshal([]byte(`""`), &data) // data is []byte{}
Implementation Details
UnmarshalJSON Logic
Both types handle multiple JSON input types:
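// Sketch (uses encoding/base64, encoding/json, fmt); encoding/json always passes a valid, non-empty JSON value, so data[0] is safe.
func (b *StdBase64Data) UnmarshalJSON(data []byte) error {
	switch data[0] {
	case 'n': // null: leave the value unchanged
		return nil
	case '"': // string: unquote, then decode Base64
		var s string
		if err := json.Unmarshal(data, &s); err != nil {
			return err
		}
		decoded, err := base64.StdEncoding.DecodeString(s)
		if err != nil {
			return err
		}
		*b = decoded
		return nil
	default:
		return fmt.Errorf("expected JSON string or null, got %s", data)
	}
}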
Direct Slice Alias
The Go implementation declares a defined type over the slice, type StdBase64Data []byte (a named type, not a Go alias declaration), which means:
- No wrapper overhead
- Can be converted directly to/from []byte
- Shares the underlying array with the original slice
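The sharing behavior is easy to overlook, so here is a small illustration. StdBase64Data is redeclared locally to keep the example self-contained. The explicit copy at the end is one possible defensive pattern, not something the package is known to do.

```go
package main

import "fmt"

// StdBase64Data is redeclared locally so the example compiles on its own.
type StdBase64Data []byte

func main() {
	raw := []byte("hello")
	data := StdBase64Data(raw) // conversion only, no copy
	data[0] = 'j'              // visible through raw as well
	fmt.Println(string(raw))   // jello

	// Copy explicitly when the two values must be independent.
	safe := StdBase64Data(append([]byte(nil), raw...))
	safe[0] = 'c'
	fmt.Println(string(raw), string(safe)) // jello cello
}
```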
Dependencies
- encoding/base64 (stdlib)
- encoding/hex (stdlib)
Encoding Package - Rust Implementation
Crate: giztoy-encoding
Types
StdBase64Data
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct StdBase64Data(Vec<u8>);
A newtype wrapper around Vec<u8> that serializes to/from standard Base64.
Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(data: Vec<u8>) -> Self | Create from Vec |
| empty | fn empty() -> Self | Create empty |
| as_bytes | fn as_bytes(&self) -> &[u8] | Get byte slice reference |
| as_bytes_mut | fn as_bytes_mut(&mut self) -> &mut Vec<u8> | Get mutable reference |
| into_bytes | fn into_bytes(self) -> Vec<u8> | Consume and return Vec |
| is_empty | fn is_empty(&self) -> bool | Check if empty |
| len | fn len(&self) -> usize | Get length |
| encode | fn encode(&self) -> String | Encode to Base64 string |
| decode | fn decode(s: &str) -> Result<Self, DecodeError> | Decode from Base64 |
Trait Implementations:
- Serialize / Deserialize (serde)
- Display (formats as Base64)
- Deref<Target=[u8]> / DerefMut
- From<Vec<u8>>, From<&[u8]>, From<[u8; N]>
- AsRef<[u8]>
HexData
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct HexData(Vec<u8>);
A newtype wrapper around Vec<u8> that serializes to/from hexadecimal.
Methods:
Same API as StdBase64Data, but with hex encoding:
| Method | Signature | Description |
|---|---|---|
| encode | fn encode(&self) -> String | Encode to hex string |
| decode | fn decode(s: &str) -> Result<Self, FromHexError> | Decode from hex |
Usage
In Struct Fields
use giztoy_encoding::{StdBase64Data, HexData};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Message {
    id: String,
    payload: StdBase64Data,
    hash: HexData,
}

let msg = Message {
    id: "msg-123".to_string(),
    payload: StdBase64Data::from(b"hello world".as_slice()),
    hash: HexData::from(vec![0xab, 0xcd, 0xef]),
};

// Serializes to:
// {"id":"msg-123","payload":"aGVsbG8gd29ybGQ=","hash":"abcdef"}
let json = serde_json::to_string(&msg).unwrap();
Standalone Encoding
// Base64
let data = StdBase64Data::from(b"hello".as_slice());
println!("{}", data);          // "aGVsbG8="
println!("{}", data.encode()); // "aGVsbG8="

// Hex
let hash = HexData::from(vec![0xde, 0xad]);
println!("{}", hash); // "dead"
Deref Coercion
let data = StdBase64Data::from(vec![1, 2, 3]);

// Can use as &[u8] directly
fn process(bytes: &[u8]) { /* ... */ }
process(&data); // Deref coercion

// Access slice methods
println!("len: {}", data.len());
println!("first: {:?}", data.first());
Null Handling
// Null deserializes to empty
let data: StdBase64Data = serde_json::from_str("null").unwrap();
assert!(data.is_empty());

// Empty string is also empty
let data: StdBase64Data = serde_json::from_str(r#""""#).unwrap();
assert!(data.is_empty());
Dependencies
- base64 crate (for Base64 encoding)
- hex crate (for hex encoding)
- serde crate (for serialization)
Differences from Go
| Aspect | Go | Rust |
|---|---|---|
| Type structure | Defined type over []byte | Newtype struct(Vec<u8>) |
| Conversion to bytes | Direct conversion | .as_bytes() or Deref |
| Additional methods | None | is_empty(), len(), encode(), decode() |
| Hash/Eq traits | N/A (slice) | Implemented |
| Clone | Implicit | Explicit (implemented) |
Encoding Package - Known Issues
⚪ Notes
ENC-001: Go/Rust type structure difference
Description:
Go uses a defined type over the slice (type StdBase64Data []byte), while Rust uses a newtype wrapper (struct StdBase64Data(Vec<u8>)).
Impact:
- Go: Direct conversion to []byte, shares memory
- Rust: Requires .as_bytes() or deref coercion, owns memory
Status: By design - idiomatic in each language.
ENC-002: Rust has more utility methods
Description:
Rust implementation has additional methods not present in Go:
- is_empty() / len()
- encode() / decode() (standalone, not just JSON)
- empty() constructor
- as_bytes_mut() for mutation
Suggestion: Consider adding these to Go for parity.
ENC-003: Error handling difference
Description:
- Go: Returns error on unmarshal failure
- Rust: Returns Result<T, E> with specific error types (base64::DecodeError, hex::FromHexError)
Impact: Different error inspection patterns in each language.
🔵 Enhancements
ENC-004: Missing URL-safe Base64 variant
Description:
Only standard Base64 is implemented. URL-safe Base64 (base64.URLEncoding / URL_SAFE) is commonly needed for:
- JWT tokens
- URL parameters
- Filename-safe identifiers
Suggestion: Add UrlBase64Data type.
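A possible shape for the suggested type in Go, using base64.URLEncoding from the standard library. The name UrlBase64Data comes from the suggestion above. The method bodies are an assumption modeled on the documented behavior of StdBase64Data, not existing code.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// UrlBase64Data is a hypothetical type: a byte slice that serializes as
// URL-safe Base64 ("-" and "_" instead of "+" and "/").
type UrlBase64Data []byte

func (b UrlBase64Data) MarshalJSON() ([]byte, error) {
	return json.Marshal(base64.URLEncoding.EncodeToString(b))
}

func (b *UrlBase64Data) UnmarshalJSON(data []byte) error {
	var s *string
	if err := json.Unmarshal(data, &s); err != nil {
		return err // not a JSON string or null
	}
	if s == nil { // JSON null
		*b = nil
		return nil
	}
	decoded, err := base64.URLEncoding.DecodeString(*s)
	if err != nil {
		return err
	}
	*b = decoded
	return nil
}

func main() {
	// 0xfb 0xff encodes to "+/8=" in standard Base64.
	out, err := json.Marshal(UrlBase64Data{0xfb, 0xff})
	fmt.Println(string(out), err) // "-_8=" <nil>

	var back UrlBase64Data
	err = json.Unmarshal(out, &back)
	fmt.Println(back, err) // [251 255] <nil>
}
```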
ENC-005: No raw Base64 (no padding) variant
Description:
Some APIs use raw Base64 without = padding. Neither implementation supports this variant.
Suggestion: Add RawBase64Data or add encoding options.
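For reference, Go's standard library already exposes the unpadded variant as base64.RawStdEncoding. The helper decodeLenient below is hypothetical. It shows one way an "encoding options" approach could accept both padded and unpadded input.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeLenient is a hypothetical helper: it accepts standard Base64 with
// or without "=" padding by trying the padded decoder first.
func decodeLenient(s string) ([]byte, error) {
	if b, err := base64.StdEncoding.DecodeString(s); err == nil {
		return b, nil
	}
	return base64.RawStdEncoding.DecodeString(s)
}

func main() {
	fmt.Println(base64.StdEncoding.EncodeToString([]byte("hello")))    // aGVsbG8=
	fmt.Println(base64.RawStdEncoding.EncodeToString([]byte("hello"))) // aGVsbG8

	b, err := decodeLenient("aGVsbG8")
	fmt.Println(string(b), err) // hello <nil>
}
```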
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| ENC-001 | ⚪ Note | By design | Go/Rust |
| ENC-002 | ⚪ Note | Open | Go |
| ENC-003 | ⚪ Note | By design | Go/Rust |
| ENC-004 | 🔵 Enhancement | Open | Both |
| ENC-005 | 🔵 Enhancement | Open | Both |
Overall: Clean implementation with no bugs found. Minor parity differences between Go and Rust.
JsonTime Package
JSON-serializable time types for API integrations.
Design Goals
- API Compatibility: Many APIs use Unix timestamps instead of ISO 8601 strings
- Type Safety: Distinct types prevent mixing seconds/milliseconds
- Bidirectional: Both serialization and deserialization supported
Types
| Type | JSON Format | Example | Use Case |
|---|---|---|---|
| Unix | Integer (seconds) | 1705315800 | General timestamps |
| Milli | Integer (milliseconds) | 1705315800000 | High-precision timestamps |
| Duration | String or Integer | "1h30m" or 5400000000000 | Time intervals |
Features
Unix Timestamps
Many APIs use Unix epoch timestamps rather than ISO 8601:
{
"created_at": 1705315800,
"updated_at": 1705316000
}
Millisecond Precision
JavaScript/browser APIs often use milliseconds:
{
"timestamp": 1705315800000,
"expires_at": 1705316000000
}
Flexible Duration Parsing
Duration supports both human-readable strings and raw nanoseconds:
{
"timeout": "30s",
"interval": "1h30m",
"precise_delay": 5000000000
}
Time Operations
Both Unix and Milli support common time operations:
| Operation | Description |
|---|---|
| Before(t) | Is this time before t? |
| After(t) | Is this time after t? |
| Equal(t) | Are these times equal? |
| Sub(t) | Duration between times |
| Add(d) | Add duration to time |
| IsZero() | Is this the zero time? |
Duration String Format
The Duration type uses Go-style duration strings:
| Unit | Symbol | Example |
|---|---|---|
| Hours | h | 2h |
| Minutes | m | 30m |
| Seconds | s | 45s |
| Combined | h, m, s | 1h30m45s |
Examples Directory
- examples/go/jsontime/ - Go usage examples (if any)
- examples/rust/jsontime/ - Rust usage examples (if any)
Related Packages
- minimax - Uses Milli for timestamps in API responses
- doubaospeech - Uses Unix/Milli for audio timestamps
- dashscope - Uses Duration for timeout configuration
JsonTime Package - Go Implementation
Import: github.com/haivivi/giztoy/pkg/jsontime
Types
Unix
type Unix time.Time
A time.Time that serializes to/from Unix seconds in JSON.
Methods:
| Method | Signature | Description |
|---|---|---|
| NowEpoch | func NowEpoch() Unix | Current time as Unix |
| Time | (ep Unix) Time() time.Time | Get underlying time.Time |
| Before | (ep Unix) Before(t Unix) bool | Is ep before t? |
| After | (ep Unix) After(t Unix) bool | Is ep after t? |
| Equal | (ep Unix) Equal(t Unix) bool | Are times equal? |
| Sub | (ep Unix) Sub(t Unix) time.Duration | Duration ep-t |
| Add | (ep Unix) Add(d time.Duration) Unix | Return ep+d |
| IsZero | (ep Unix) IsZero() bool | Is zero time? |
| String | (ep Unix) String() string | Formatted string |
Milli
type Milli time.Time
A time.Time that serializes to/from Unix milliseconds in JSON.
Methods: Same as Unix.
| Method | Signature | Description |
|---|---|---|
| NowEpochMilli | func NowEpochMilli() Milli | Current time as Milli |
| Time | (ep Milli) Time() time.Time | Get underlying time.Time |
| ... | ... | (same operations as Unix) |
Duration
type Duration time.Duration
A time.Duration that serializes to string (e.g., "1h30m") in JSON.
Methods:
| Method | Signature | Description |
|---|---|---|
| FromDuration | func FromDuration(d time.Duration) *Duration | Create Duration pointer |
| Duration | (d *Duration) Duration() time.Duration | Get underlying duration |
| String | (d Duration) String() string | Formatted string (e.g., "1h30m") |
| Seconds | (d Duration) Seconds() float64 | As floating point seconds |
| Milliseconds | (d Duration) Milliseconds() int64 | As integer milliseconds |
Usage
In Struct Fields
type Event struct {
	ID        string   `json:"id"`
	CreatedAt Unix     `json:"created_at"`
	ExpiresAt Milli    `json:"expires_at"`
	Timeout   Duration `json:"timeout"`
}
event := Event{
	ID:        "evt-123",
	CreatedAt: NowEpoch(),
	ExpiresAt: NowEpochMilli(),
	Timeout:   Duration(30 * time.Second),
}
// Marshals to:
// {"id":"evt-123","created_at":1705315800,"expires_at":1705315800000,"timeout":"30s"}
Duration Parsing
Duration accepts both string and integer (nanoseconds) when unmarshaling:
type Config struct {
	Timeout Duration `json:"timeout"`
}
var cfg Config
// String format
json.Unmarshal([]byte(`{"timeout":"1h30m"}`), &cfg)
fmt.Println(cfg.Timeout.Duration()) // 1h30m0s
// Integer format (nanoseconds)
json.Unmarshal([]byte(`{"timeout":5400000000000}`), &cfg)
fmt.Println(cfg.Timeout.Duration()) // 1h30m0s
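The dual-format behavior can be implemented roughly as follows. This is a sketch under the assumption that strings go through time.ParseDuration and integers are taken as nanoseconds. The package's actual method bodies may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration mirrors the documented type; the method bodies are a sketch.
type Duration time.Duration

// MarshalJSON writes the Go-style duration string, e.g. "1m30s".
func (d Duration) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(d).String())
}

// UnmarshalJSON accepts a duration string, an integer nanosecond count,
// or null (which leaves the value unchanged).
func (d *Duration) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		return nil
	}
	if len(data) > 0 && data[0] == '"' {
		var s string
		if err := json.Unmarshal(data, &s); err != nil {
			return err
		}
		parsed, err := time.ParseDuration(s)
		if err != nil {
			return err
		}
		*d = Duration(parsed)
		return nil
	}
	var ns int64
	if err := json.Unmarshal(data, &ns); err != nil {
		return err
	}
	*d = Duration(ns)
	return nil
}

func main() {
	var cfg struct {
		Timeout Duration `json:"timeout"`
	}
	for _, in := range []string{`{"timeout":"1h30m"}`, `{"timeout":5400000000000}`} {
		if err := json.Unmarshal([]byte(in), &cfg); err != nil {
			panic(err)
		}
		fmt.Println(time.Duration(cfg.Timeout)) // 1h30m0s for both inputs
	}
	out, _ := json.Marshal(cfg)
	fmt.Println(string(out)) // {"timeout":"1h30m0s"}
}
```

Decoding the integer form into int64 rather than float64 keeps large nanosecond counts exact.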
Time Arithmetic
now := NowEpoch()
later := now.Add(24 * time.Hour)
if later.After(now) {
	diff := later.Sub(now)
	fmt.Println(diff) // 24h0m0s
}
Null Handling
var d Duration
json.Unmarshal([]byte(`null`), &d) // d remains zero value
Implementation Details
Type Aliases
All types are defined types over time.Time or time.Duration (not Go alias declarations), so explicit conversions work in both directions:
// Unix -> time.Time
t := time.Time(myUnix)
// time.Time -> Unix
u := Unix(time.Now())
// Duration -> time.Duration
d := time.Duration(myDuration)
JSON Marshal Output
| Type | Go Value | JSON Output |
|---|---|---|
| Unix | Unix(time.Now()) | 1705315800 |
| Milli | Milli(time.Now()) | 1705315800000 |
| Duration | Duration(90*time.Second) | "1m30s" |
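The integer output for Unix can be produced with a pair of methods like the ones below. This is an illustrative sketch, not the package source. A Milli equivalent would presumably use time.Time.UnixMilli and time.UnixMilli in the same places.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// Unix mirrors the documented type; the method bodies are a sketch.
type Unix time.Time

// MarshalJSON writes the timestamp as a bare integer of Unix seconds.
func (u Unix) MarshalJSON() ([]byte, error) {
	return strconv.AppendInt(nil, time.Time(u).Unix(), 10), nil
}

// UnmarshalJSON reads an integer of Unix seconds; null is a no-op.
func (u *Unix) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		return nil
	}
	secs, err := strconv.ParseInt(string(data), 10, 64)
	if err != nil {
		return err
	}
	*u = Unix(time.Unix(secs, 0))
	return nil
}

func main() {
	out, _ := json.Marshal(struct {
		CreatedAt Unix `json:"created_at"`
	}{Unix(time.Unix(1705315800, 0))})
	fmt.Println(string(out)) // {"created_at":1705315800}
}
```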
Dependencies
- time (stdlib)
- encoding/json (stdlib)
JsonTime Package - Rust Implementation
Crate: giztoy-jsontime
Types
Unix
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct Unix(DateTime<Utc>);
A timestamp that serializes to/from Unix seconds in JSON.
Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(dt: DateTime<Utc>) -> Self | Create from DateTime |
| now | fn now() -> Self | Current time |
| from_secs | fn from_secs(secs: i64) -> Self | Create from seconds |
| as_secs | fn as_secs(&self) -> i64 | Get seconds value |
| datetime | fn datetime(&self) -> DateTime<Utc> | Get underlying DateTime |
| before | fn before(&self, other: &Self) -> bool | Is this before other? |
| after | fn after(&self, other: &Self) -> bool | Is this after other? |
| is_zero | fn is_zero(&self) -> bool | Is zero time? |
| sub | fn sub(&self, other: &Self) -> Duration | Duration between times |
| add | fn add(&self, d: Duration) -> Self | Return self+d |
Trait Implementations:
- Serialize / Deserialize (serde)
- Display
- From<DateTime<Utc>>, From<i64>
- PartialOrd, Ord, Hash
Milli
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct Milli(DateTime<Utc>);
A timestamp that serializes to/from Unix milliseconds in JSON.
Methods: Same as Unix, but with milliseconds.
| Method | Signature | Description |
|---|---|---|
| from_millis | fn from_millis(ms: i64) -> Self | Create from milliseconds |
| as_millis | fn as_millis(&self) -> i64 | Get milliseconds value |
| ... | ... | (same operations as Unix) |
Duration
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct Duration(StdDuration);
A duration that serializes to string (e.g., "1h30m") and deserializes from string or nanoseconds.
Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(d: StdDuration) -> Self | Create from std Duration |
| from_secs | fn from_secs(secs: u64) -> Self | Create from seconds |
| from_millis | fn from_millis(ms: u64) -> Self | Create from milliseconds |
| from_nanos | fn from_nanos(nanos: u64) -> Self | Create from nanoseconds |
| as_std | fn as_std(&self) -> StdDuration | Get std Duration |
| as_secs | fn as_secs(&self) -> u64 | Get whole seconds |
| as_secs_f64 | fn as_secs_f64(&self) -> f64 | Get floating seconds |
| as_millis | fn as_millis(&self) -> u128 | Get milliseconds |
| as_nanos | fn as_nanos(&self) -> u128 | Get nanoseconds |
| is_zero | fn is_zero(&self) -> bool | Is zero duration? |
Usage
In Struct Fields
use giztoy_jsontime::{Unix, Milli, Duration};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Event {
    id: String,
    created_at: Unix,
    expires_at: Milli,
    timeout: Duration,
}

let event = Event {
    id: "evt-123".to_string(),
    created_at: Unix::now(),
    expires_at: Milli::now(),
    timeout: Duration::from_secs(30),
};

// Serializes to:
// {"id":"evt-123","created_at":1705315800,"expires_at":1705315800000,"timeout":"30s"}
Duration Parsing
use giztoy_jsontime::Duration;

// String format
let d: Duration = serde_json::from_str(r#""1h30m""#).unwrap();
assert_eq!(d.as_secs(), 5400);

// Integer format (nanoseconds)
let d: Duration = serde_json::from_str("5400000000000").unwrap();
assert_eq!(d.as_secs(), 5400);
Time Arithmetic
use giztoy_jsontime::Unix;
use std::time::Duration;

let now = Unix::now();
let later = now.add(Duration::from_secs(86400));

if later.after(&now) {
    let diff = later.sub(&now);
    println!("{:?}", diff); // 86400s
}
From Conversions
use chrono::Utc;
use giztoy_jsontime::{Duration, Milli, Unix};

// From i64
let unix = Unix::from(1705315800i64);
let milli = Milli::from(1705315800000i64);

// From DateTime<Utc>
let unix = Unix::from(Utc::now());

// From std::time::Duration
let dur = Duration::from(std::time::Duration::from_secs(60));
Duration String Format
The parser supports Go-style duration strings:
| Input | Parsed As |
|---|---|
| "1h" | 3600 seconds |
| "30m" | 1800 seconds |
| "45s" | 45 seconds |
| "1h30m" | 5400 seconds |
| "1h30m45s" | 5445 seconds |
| "" | 0 seconds |
Dependencies
- chrono crate (for DateTime handling)
- serde crate (for serialization)
Differences from Go
| Aspect | Go | Rust |
|---|---|---|
| Time type | Defined type over time.Time | Newtype over DateTime<Utc> |
| Duration range | Signed (int64 ns) | Unsigned (u64 + u32 ns) |
| Ordering | Via method calls | Via Ord trait |
| Hash support | N/A | Implemented |
| sub() return | Signed duration | Unsigned duration |
JsonTime Package - Known Issues
🟡 Minor Issues
JT-001: Rust Milli.sub() loses sign information
File: rust/jsontime/src/milli.rs:54-57
Description:
The sub() method returns an unsigned Duration, losing the sign when the result would be negative:
pub fn sub(&self, other: &Self) -> Duration {
    let diff = self.0.signed_duration_since(other.0);
    Duration::from_millis(diff.num_milliseconds().unsigned_abs())
}
Impact: Cannot determine if self is before or after other from the result alone.
Workaround: Use before() or after() methods to check ordering first.
JT-002: Rust Unix.sub() same issue
File: rust/jsontime/src/unix.rs:54-57
Description:
Same issue as JT-001 - loses sign information.
JT-003: Rust Duration parsing more restrictive than Go
File: rust/jsontime/src/duration.rs:123-158
Description:
Go's time.ParseDuration supports more units:
- ns (nanoseconds)
- us / µs (microseconds)
- ms (milliseconds)
Rust implementation only supports h, m, s.
Impact: Duration strings with sub-second units fail to parse in Rust.
Example:
// Go - works
d, _ := time.ParseDuration("100ms")
// Rust - fails
let d: Duration = serde_json::from_str(r#""100ms""#)?; // Error!
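To make the parity target concrete, the following Go program prints how the standard library's time.ParseDuration treats a few inputs, including sub-second units and fractional values. The helper mustParse is only a convenience for this example.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse is a small example helper that panics on invalid input.
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

func main() {
	// Every string below is valid for Go; per JT-003, only the last one
	// uses nothing beyond the h/m/s units the Rust parser supports.
	for _, s := range []string{"100ms", "250us", "1.5h", "1h30m45s"} {
		fmt.Printf("%s -> %v\n", s, mustParse(s))
	}
}
```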
JT-004: Rust Duration cannot be negative
Description:
Go's time.Duration is signed (int64), Rust's std::time::Duration is unsigned.
Impact: Cannot represent negative durations in Rust.
Status: By design (Rust stdlib limitation).
🔵 Enhancements
JT-005: Missing microsecond timestamp type
Description:
Some APIs (particularly high-frequency systems) use microsecond timestamps. Neither Go nor Rust implementation provides a Micro type.
Suggestion: Add Micro type for microsecond precision.
JT-006: Missing nanosecond timestamp type
Description:
Some APIs use nanosecond timestamps. No Nano type provided.
Suggestion: Add Nano type for nanosecond precision.
JT-007: Go Duration lacks explicit constructors
Description:
Go implementation lacks explicit constructors like Rust has:
- from_secs()
- from_millis()
Current Go usage:
d := Duration(30 * time.Second)
Suggested addition:
func DurationFromSeconds(s int64) Duration
func DurationFromMillis(ms int64) Duration
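The suggested constructors are thin wrappers. A minimal sketch, with Duration redeclared locally so the snippet compiles on its own:

```go
package main

import (
	"fmt"
	"time"
)

// Duration is redeclared here to keep the example self-contained.
type Duration time.Duration

// DurationFromSeconds builds a Duration from whole seconds.
func DurationFromSeconds(s int64) Duration {
	return Duration(time.Duration(s) * time.Second)
}

// DurationFromMillis builds a Duration from whole milliseconds.
func DurationFromMillis(ms int64) Duration {
	return Duration(time.Duration(ms) * time.Millisecond)
}

func main() {
	fmt.Println(time.Duration(DurationFromSeconds(30)))  // 30s
	fmt.Println(time.Duration(DurationFromMillis(1500))) // 1.5s
}
```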
⚪ Notes
JT-008: Different underlying time libraries
Description:
- Go: Uses stdlib time.Time
- Rust: Uses chrono::DateTime<Utc>
Impact: Rust has hard dependency on chrono crate.
JT-009: Rust types implement more traits
Description:
Rust types implement PartialOrd, Ord, Hash which enables use in collections:
use std::collections::HashSet;

let mut times: HashSet<Unix> = HashSet::new();
times.insert(Unix::now());
Go types don't have equivalent functionality.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| JT-001 | 🟡 Minor | Open | Rust Milli |
| JT-002 | 🟡 Minor | Open | Rust Unix |
| JT-003 | 🟡 Minor | Open | Rust Duration |
| JT-004 | 🟡 Minor | By design | Rust Duration |
| JT-005 | 🔵 Enhancement | Open | Both |
| JT-006 | 🔵 Enhancement | Open | Both |
| JT-007 | 🔵 Enhancement | Open | Go |
| JT-008 | ⚪ Note | N/A | Rust |
| JT-009 | ⚪ Note | N/A | Rust |
Overall: Functional implementation. Main concern is duration parsing parity between Go and Rust.
Trie Package
Generic trie data structure for efficient path-based storage and retrieval with MQTT-style wildcard support.
Design Goals
- Efficient Path Matching: O(k) lookup where k is path depth
- Wildcard Support: MQTT-style single-level (+) and multi-level (#) wildcards
- Generic Storage: Store any value type at path nodes
- Zero-Copy Lookups: Minimize allocations during get operations
Wildcard Patterns
The trie supports MQTT-style topic patterns:
| Pattern | Description | Example Match |
|---|---|---|
| a/b/c | Exact path | a/b/c only |
| a/+/c | Single-level wildcard | a/X/c, a/Y/c |
| a/# | Multi-level wildcard | a/b, a/b/c/d |
Pattern Rules
- + (Plus): Matches exactly one path segment
  - device/+/state matches device/gear-001/state
  - Does NOT match device/gear-001/sub/state
- # (Hash): Matches zero or more path segments
  - Must be the last segment in the pattern
  - logs/# matches logs, logs/app, logs/app/debug/line1
- Priority: Exact matches take precedence over wildcards
  - If both device/gear-001/state and device/+/state exist, exact wins
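The + and # rules can be checked with a standalone function. The sketch below matches a single pattern against a single topic. It is not how the trie walks its nodes, and matchTopic is a name made up for this example. Priority between competing patterns is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// matchTopic reports whether topic matches an MQTT-style pattern.
// "+" matches exactly one segment; "#" matches zero or more trailing
// segments and is only honored as the last pattern segment.
func matchTopic(pattern, topic string) bool {
	p := strings.Split(pattern, "/")
	t := strings.Split(topic, "/")
	for i, seg := range p {
		if seg == "#" {
			return i == len(p)-1
		}
		if i >= len(t) {
			return false // topic is shorter than the pattern
		}
		if seg != "+" && seg != t[i] {
			return false
		}
	}
	return len(p) == len(t) // no unmatched trailing topic segments
}

func main() {
	fmt.Println(matchTopic("device/+/state", "device/gear-001/state"))     // true
	fmt.Println(matchTopic("device/+/state", "device/gear-001/sub/state")) // false
	fmt.Println(matchTopic("logs/#", "logs"))                              // true
	fmt.Println(matchTopic("logs/#", "logs/app/debug/line1"))              // true
}
```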
Use Cases
MQTT Topic Routing
device/+/state -> state_handler
device/+/command -> command_handler
logs/# -> log_handler
API Path Routing
/users/{id}/profile -> profile_handler
/users/{id}/posts -> posts_handler
/admin/# -> admin_handler
Hierarchical Configuration
app/database/host -> "localhost"
app/database/port -> 5432
app/cache/# -> cache_config
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Set | O(k) | k = path depth |
| Get | O(k) | Zero allocation in Rust |
| Walk | O(n) | n = total nodes |
| Len | O(n) | Counts all values |
Examples Directory
- examples/go/trie/ - Go usage examples (if any)
- examples/rust/trie/ - Rust usage examples (if any)
Related Packages
- mqtt0 - Uses trie for topic subscription matching
- chatgear - Uses trie for message routing
Trie Package - Go Implementation
Import: github.com/haivivi/giztoy/pkg/trie
Types
Trie[T]
type Trie[T any] struct {
	children map[string]*Trie[T] // exact path segment matches
	matchAny *Trie[T]            // single-level wildcard (+)
	matchAll *Trie[T]            // multi-level wildcard (#)
	set      bool                // whether this node has a value
	value    T                   // the value stored
}
Methods:
| Method | Signature | Description |
|---|---|---|
| New | func New[T any]() *Trie[T] | Create empty trie |
| Set | (t *Trie[T]) Set(path string, setFunc func(*T, bool) error) error | Set with custom setter |
| SetValue | (t *Trie[T]) SetValue(path string, value T) error | Set value directly |
| Get | (t *Trie[T]) Get(path string) (*T, bool) | Get value pointer |
| GetValue | (t *Trie[T]) GetValue(path string) (T, bool) | Get value copy |
| Match | (t *Trie[T]) Match(path string) (route string, value *T, ok bool) | Get with matched route |
| Walk | (t *Trie[T]) Walk(f func(path string, value T, set bool)) | Visit all nodes |
| Len | (t *Trie[T]) Len() int | Count values |
| String | (t *Trie[T]) String() string | Debug representation |
ErrInvalidPattern
var ErrInvalidPattern = errors.New("invalid path pattern...")
Returned when # wildcard is not at the end of the path.
Usage
Basic Set/Get
tr := trie.New[string]()
// Set exact path
tr.SetValue("device/gear-001/state", "online")
// Get value
val, ok := tr.GetValue("device/gear-001/state")
// val = "online", ok = true
Wildcard Patterns
tr := trie.New[string]()
// Single-level wildcard
tr.SetValue("device/+/state", "state_handler")
// Multi-level wildcard
tr.SetValue("logs/#", "log_handler")
// Match against patterns
val, _ := tr.GetValue("device/any-device/state")
// val = "state_handler"
val, _ = tr.GetValue("logs/app/debug/line1")
// val = "log_handler"
Custom Set Function
tr := trie.New[[]string]()
// Append to existing value
tr.Set("handlers/events", func(ptr *[]string, existed bool) error {
	if !existed {
		*ptr = []string{"handler1"}
	} else {
		*ptr = append(*ptr, "handler2")
	}
	return nil
})
Walk All Nodes
tr.Walk(func(path string, value string, set bool) {
	if set {
		fmt.Printf("%s: %s\n", path, value)
	}
})
Match with Route
tr := trie.New[string]()
tr.SetValue("device/+/state", "handler")
route, value, ok := tr.Match("device/gear-001/state")
// route = "/device/+/state"
// value = "handler"
// ok = true
Implementation Details
Path Splitting
Paths are split by / and processed segment by segment:
// "device/gear-001/state" splits into:
// first="device", subseq="gear-001/state"
Value Storage
Uses a set boolean flag to distinguish between:
- Value not set (default zero value)
- Value explicitly set to zero value
Match Priority
1. Exact child match
2. Single-level wildcard (+)
3. Multi-level wildcard (#)
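The priority order can be illustrated with a reduced, non-generic trie. The node type and its insert and lookup methods below are stand-ins written for this example. They assume that a failed exact branch falls back to + and then to #, which the real Trie[T] may or may not do in exactly this way.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a simplified, string-valued stand-in for Trie[T].
type node struct {
	children map[string]*node
	matchAny *node // "+"
	matchAll *node // "#"
	set      bool
	value    string
}

func (n *node) insert(path, value string) {
	if path == "" {
		n.set, n.value = true, value
		return
	}
	first, rest, _ := strings.Cut(path, "/")
	switch first {
	case "+":
		if n.matchAny == nil {
			n.matchAny = &node{}
		}
		n.matchAny.insert(rest, value)
	case "#":
		// "#" must be the last segment; validation is omitted in this sketch.
		if n.matchAll == nil {
			n.matchAll = &node{}
		}
		n.matchAll.insert("", value)
	default:
		if n.children == nil {
			n.children = map[string]*node{}
		}
		if n.children[first] == nil {
			n.children[first] = &node{}
		}
		n.children[first].insert(rest, value)
	}
}

func (n *node) lookup(path string) (string, bool) {
	if path == "" {
		if n.set {
			return n.value, true
		}
		// "a/#" also matches "a" itself (zero trailing segments).
		if n.matchAll != nil && n.matchAll.set {
			return n.matchAll.value, true
		}
		return "", false
	}
	first, rest, _ := strings.Cut(path, "/")
	if child := n.children[first]; child != nil { // 1. exact child
		if v, ok := child.lookup(rest); ok {
			return v, true
		}
	}
	if n.matchAny != nil { // 2. single-level wildcard
		if v, ok := n.matchAny.lookup(rest); ok {
			return v, true
		}
	}
	if n.matchAll != nil && n.matchAll.set { // 3. multi-level wildcard
		return n.matchAll.value, true
	}
	return "", false
}

func main() {
	root := &node{}
	root.insert("device/gear-001/state", "exact")
	root.insert("device/+/state", "plus")
	root.insert("device/#", "hash")
	for _, p := range []string{"device/gear-001/state", "device/gear-002/state", "device/gear-002/a/b"} {
		v, _ := root.lookup(p)
		fmt.Println(p, "->", v) // exact, plus, hash respectively
	}
}
```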
Benchmarks
Typical performance (from benchmarks):
| Operation | 100 paths | 1000 paths | 10000 paths |
|---|---|---|---|
| Set all | ~50µs | ~500µs | ~5ms |
| Get (exact) | ~10µs | ~100µs | ~1ms |
| Walk | ~5µs | ~50µs | ~500µs |
Trie Package - Rust Implementation
Crate: giztoy-trie
Types
Trie
#[derive(Debug, Clone)]
pub struct Trie<T> {
    children: HashMap<String, Trie<T>>,
    match_any: Option<Box<Trie<T>>>, // single-level wildcard (+)
    match_all: Option<Box<Trie<T>>>, // multi-level wildcard (#)
    value: Option<T>,
}
Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new() -> Self | Create empty trie |
| set | fn set<F, E>(&mut self, path: &str, setter: F) -> Result<(), E> | Set with custom setter |
| set_value | fn set_value(&mut self, path: &str, value: T) -> Result<(), InvalidPatternError> | Set value directly |
| get | fn get(&self, path: &str) -> Option<&T> | Get value reference (zero-alloc) |
| get_value | fn get_value(&self, path: &str) -> Option<T> | Get cloned value |
| match_path | fn match_path(&self, path: &str) -> (String, Option<&T>) | Get with matched route |
| walk | fn walk<F>(&self, f: F) | Visit all nodes |
| len | fn len(&self) -> usize | Count values |
| is_empty | fn is_empty(&self) -> bool | Check if empty |
Trait Implementations:
- Default
- Clone
- Debug
- Display (when T: Display)
InvalidPatternError
#[derive(Debug, Clone, PartialEq, Eq, thiserror::Error)]
#[error("invalid path pattern: path should be /a/b/c or /a/+/c or /a/#")]
pub struct InvalidPatternError;
Usage
Basic Set/Get
use giztoy_trie::Trie;

let mut trie = Trie::<String>::new();

// Set exact path
trie.set_value("device/gear-001/state", "online".to_string()).unwrap();

// Get value (zero allocation)
let val: Option<&String> = trie.get("device/gear-001/state");

// Get cloned value
let val: Option<String> = trie.get_value("device/gear-001/state");
Wildcard Patterns
let mut trie = Trie::<String>::new();

// Single-level wildcard
trie.set_value("device/+/state", "state_handler".to_string()).unwrap();

// Multi-level wildcard
trie.set_value("logs/#", "log_handler".to_string()).unwrap();

// Match against patterns
let val = trie.get("device/any-device/state");
assert_eq!(val, Some(&"state_handler".to_string()));

let val = trie.get("logs/app/debug/line1");
assert_eq!(val, Some(&"log_handler".to_string()));
Custom Set Function
let mut trie = Trie::<Vec<String>>::new();

// Set with custom logic
trie.set("handlers/events", |existing| {
    match existing {
        Some(vec) => {
            vec.push("handler2".to_string());
            Ok(vec.clone())
        }
        None => Ok(vec!["handler1".to_string()]),
    }
}).unwrap();
Walk All Nodes
trie.walk(|path, value| {
    println!("{}: {}", path, value);
});
Match with Route
let mut trie = Trie::<String>::new();
trie.set_value("device/+/state", "handler".to_string()).unwrap();

let (route, value) = trie.match_path("device/gear-001/state");
// route = "/device/+/state"
// value = Some(&"handler")
Implementation Details
Zero-Allocation Lookup
The get() method performs zero allocations by:
- Using string slices for path splitting
- Returning references instead of cloned values
#[inline]
fn split_path(path: &str) -> (&str, &str) {
    match path.find('/') {
        Some(idx) => (&path[..idx], &path[idx + 1..]),
        None => (path, ""),
    }
}
Value Storage
Uses Option<T> instead of a separate flag:
- None = value not set
- Some(T) = value set
Wildcard Storage
- match_any: Option<Box<Trie<T>>> for + wildcard
- match_all: Option<Box<Trie<T>>> for # wildcard
Boxed to avoid recursive type sizing issues.
Differences from Go
| Aspect | Go | Rust |
|---|---|---|
| Value storage | set bool + value T | Option<T> |
| Child storage | map[string]*Trie[T] | HashMap<String, Trie<T>> |
| Wildcard storage | *Trie[T] (pointer) | Option<Box<Trie<T>>> |
| Get return | (*T, bool) | Option<&T> |
| Clone support | Implicit (pointer) | Explicit Clone derive |
| Zero-alloc get | No (returns route string) | Yes (get() method) |
Trie Package - Known Issues
🟡 Minor Issues
TRI-001: Go Walk visits unset nodes
File: go/pkg/trie/trie.go:175-179
Description:
The Walk function visits ALL nodes including those without values set, passing the zero value:
func (t *Trie[T]) Walk(f func(path string, value T, set bool)) {
	t.walkWithPath(nil, func(path []string, node *Trie[T]) {
		f(strings.Join(path, "/"), node.value, node.set) // value may be zero
	})
}
Impact: Callers must check the set boolean to filter actual values.
Suggestion: Consider only visiting nodes where set == true by default.
TRI-002: Go Len() is O(n) not O(1)
File: go/pkg/trie/trie.go:211-219
Description:
Len() walks the entire trie to count values:
func (t *Trie[T]) Len() int {
	count := 0
	t.Walk(func(_ string, _ T, set bool) {
		if set {
			count++
		}
	})
	return count
}
Impact: Performance issue for large tries with frequent Len() calls.
Suggestion: Maintain a counter that increments on Set and decrements on Delete.
TRI-003: Rust Len() same O(n) issue
File: rust/trie/src/lib.rs:292-296
Description:
Same issue as Go - walks entire trie to count.
TRI-004: No Delete operation
Description:
Neither Go nor Rust implementation provides a way to delete/remove values from the trie.
Impact: Cannot remove stale subscriptions or routes without rebuilding.
Suggestion: Add Delete(path string) bool method.
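One possible shape for such a method, sketched on a hypothetical minimal node type without wildcards (not the giztoy Trie). It clears the value and prunes branches that end up empty, which is a design choice of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical minimal trie node (no wildcards).
type node[T any] struct {
	children map[string]*node[T]
	value    T
	set      bool
}

// Set stores value at path, creating intermediate nodes as needed.
func (n *node[T]) Set(path string, value T) {
	for _, seg := range strings.Split(path, "/") {
		if n.children == nil {
			n.children = make(map[string]*node[T])
		}
		child, ok := n.children[seg]
		if !ok {
			child = &node[T]{}
			n.children[seg] = child
		}
		n = child
	}
	n.value, n.set = value, true
}

// Delete clears the value at path and reports whether one was stored.
func (n *node[T]) Delete(path string) bool {
	return n.remove(strings.Split(path, "/"))
}

func (n *node[T]) remove(segs []string) bool {
	if len(segs) == 0 {
		if !n.set {
			return false
		}
		var zero T
		n.value, n.set = zero, false
		return true
	}
	child, ok := n.children[segs[0]] // reading a nil map is safe
	if !ok || !child.remove(segs[1:]) {
		return false
	}
	if !child.set && len(child.children) == 0 {
		delete(n.children, segs[0]) // prune the now-empty branch
	}
	return true
}

func main() {
	root := &node[string]{}
	root.Set("device/a/state", "on")
	fmt.Println(root.Delete("device/a/state")) // true
	fmt.Println(root.Delete("device/a/state")) // false
	fmt.Println(len(root.children))            // 0
}
```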
TRI-005: Go Match returns route with leading slash inconsistency
File: go/pkg/trie/trie.go:142-172
Description:
When building the matched route string, it prepends "/" to each segment:
ch.match(matched+"/"+first, subseq) // Results in "/device/+/state"
But the root path returns empty string, creating inconsistency.
🔵 Enhancements
TRI-006: No thread safety
Description:
Neither implementation is thread-safe. Concurrent read/write will cause data races.
Go:
// UNSAFE: concurrent access
go trie.Set("a/b", value1)
go trie.Set("a/c", value2)
Suggestion: Add sync.RWMutex wrapper or document thread-safety requirements.
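The wrapper option could look roughly like this. The store interface and the mapStore backend are hypothetical stand-ins so the sketch runs on its own; the real Trie method set may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical minimal interface for the wrapped structure.
type store[T any] interface {
	Set(path string, value T)
	Get(path string) (T, bool)
}

// mapStore is a stand-in backend used only to make the sketch runnable.
type mapStore[T any] map[string]T

func (m mapStore[T]) Set(path string, value T) { m[path] = value }

func (m mapStore[T]) Get(path string) (T, bool) {
	v, ok := m[path]
	return v, ok
}

// safeStore serializes writers and lets readers proceed in parallel.
type safeStore[T any] struct {
	mu    sync.RWMutex
	inner store[T]
}

func (s *safeStore[T]) Set(path string, value T) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inner.Set(path, value)
}

func (s *safeStore[T]) Get(path string) (T, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.inner.Get(path)
}

func main() {
	s := &safeStore[int]{inner: mapStore[int]{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Set(fmt.Sprintf("a/%d", i), i) // concurrent writers are now safe
		}(i)
	}
	wg.Wait()
	v, ok := s.Get("a/3")
	fmt.Println(v, ok) // prints: 3 true
}
```

A wrapper keeps the lock-free type available for single-goroutine callers, at the cost of a second type to maintain.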
TRI-007: No path parameter extraction
Description:
When matching device/+/state against device/gear-001/state, there's no way to extract gear-001 as a parameter.
Current: Only returns the matched route pattern and value.
Suggestion: Add MatchParams(path) (params map[string]string, value *T, ok bool).
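As a rough illustration of the extraction step only, the hypothetical helper below compares a pattern with a concrete path and returns the segments captured by each wildcard. It assumes MQTT-style semantics (+ matches exactly one segment, # matches the remainder) and returns captures positionally, since the patterns above carry no parameter names.

```go
package main

import (
	"fmt"
	"strings"
)

// extractWildcards (hypothetical) matches path against pattern and returns
// the text captured by each wildcard, in order of appearance.
func extractWildcards(pattern, path string) ([]string, bool) {
	pat := strings.Split(pattern, "/")
	segs := strings.Split(path, "/")
	var captured []string
	for i, p := range pat {
		if p == "#" {
			// Earlier iterations guarantee i <= len(segs), so segs[i:] is valid.
			return append(captured, strings.Join(segs[i:], "/")), true
		}
		if i >= len(segs) {
			return nil, false // path is shorter than the pattern
		}
		if p == "+" {
			captured = append(captured, segs[i])
		} else if p != segs[i] {
			return nil, false // literal segment mismatch
		}
	}
	if len(segs) != len(pat) {
		return nil, false // path is longer than the pattern
	}
	return captured, true
}

func main() {
	params, ok := extractWildcards("device/+/state", "device/gear-001/state")
	fmt.Println(params, ok) // prints: [gear-001] true
}
```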
TRI-008: No prefix listing
Description:
Cannot list all paths under a prefix efficiently.
Example use case: List all devices under device/ prefix.
Suggestion: Add List(prefix string) []string method.
⚪ Notes
TRI-009: Different value storage approaches
Description:
- Go: Uses set bool flag with zero value
- Rust: Uses Option<T>
Both approaches work but have different trade-offs:
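- Go: The explicit flag distinguishes "set to the zero value" from "not set"
- Rust: Option<T> makes the same distinction in the type itself, which is more idiomatic and needs no extra space for types with a niche (such as Box or references)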
TRI-010: Path leading slash handling
Description:
Paths can start with or without /:
- "/a/b/c" and "a/b/c" are NOT equivalent
- Leading / creates an empty string segment
trie.SetValue("/a/b", "val1") // path segments: ["", "a", "b"]
trie.SetValue("a/b", "val2") // path segments: ["a", "b"]
Status: Documented behavior but may be confusing.
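The empty leading segment follows directly from splitting on every "/". The sketch below reproduces it with the standard library; segments and trimLeading are hypothetical helpers, and trimming is shown only as something a caller could do before calling into the trie.

```go
package main

import (
	"fmt"
	"strings"
)

// segments splits a path on every "/", as the examples above describe.
func segments(path string) []string { return strings.Split(path, "/") }

// trimLeading (hypothetical) is a normalization callers could apply themselves.
func trimLeading(path string) string { return strings.TrimPrefix(path, "/") }

func main() {
	fmt.Printf("%q\n", segments("/a/b"))              // ["" "a" "b"]
	fmt.Printf("%q\n", segments("a/b"))               // ["a" "b"]
	fmt.Printf("%q\n", segments(trimLeading("/a/b"))) // ["a" "b"]
}
```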
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| TRI-001 | 🟡 Minor | Open | Go Walk |
| TRI-002 | 🟡 Minor | Open | Go Len |
| TRI-003 | 🟡 Minor | Open | Rust Len |
| TRI-004 | 🟡 Minor | Open | Both |
| TRI-005 | 🟡 Minor | Open | Go Match |
| TRI-006 | 🔵 Enhancement | Open | Both |
| TRI-007 | 🔵 Enhancement | Open | Both |
| TRI-008 | 🔵 Enhancement | Open | Both |
| TRI-009 | ⚪ Note | N/A | Both |
| TRI-010 | ⚪ Note | N/A | Both |
Overall: Solid implementation for basic trie operations. Missing Delete and thread-safety for production use.
CLI Package
Common CLI utilities for giztoy command-line tools.
Design Goals
- Consistent UX: Shared patterns across all giztoy CLI tools
- kubectl-style Contexts: Multiple API configurations with context switching
- Flexible Output: Support JSON, YAML, and raw output formats
- Cross-Platform Paths: Standard directory structure for config/cache/logs
Components
| Component | Description |
|---|---|
| Config | Multi-context configuration management |
| Output | Output formatting (JSON, YAML, raw) |
| Paths | Directory structure (~/.giztoy/ |
| Request | Load request data from YAML/JSON files |
| LogWriter | Capture logs for TUI display |
Directory Structure
graph LR
    subgraph giztoy["~/.giztoy/"]
        subgraph minimax["minimax/"]
            m1[config.yaml]
            m2[cache/]
            m3[logs/]
            m4[data/]
        end
        subgraph doubao["doubao/"]
            d1[config.yaml]
            d2[cache/]
        end
        subgraph dashscope["dashscope/"]
            s1[...]
        end
    end
Configuration Format
current_context: production
contexts:
  production:
    name: production
    api_key: "sk-..."
    base_url: "https://api.example.com"
    timeout: 30
    extra:
      region: "us-west"
  development:
    name: development
    api_key: "sk-dev-..."
    base_url: "https://dev-api.example.com"
Context System
Similar to kubectl, supports multiple API contexts:
# List contexts
myapp config list
# Use a context
myapp config use production
# Add a context
myapp config add staging --api-key=sk-...
# Delete a context
myapp config delete staging
Output Formats
| Format | Flag | Description |
|---|---|---|
| YAML | --output=yaml | Default, human-readable |
| JSON | --output=json | Machine-readable |
| Raw | --output=raw | Binary/raw data |
Use Cases
API CLI Tools
# minimax CLI
minimax chat "Hello" --context=production --output=json
# doubao CLI
doubao tts "Hello world" --output=audio.mp3
Configuration Management
# View current config
myapp config show
# Set default voice
myapp config set default_voice "zh-CN-Standard-A"
Examples Directory
- go/cmd/minimax/ - MiniMax CLI using this package
- go/cmd/doubaospeech/ - Doubao Speech CLI
- rust/cmd/minimax/ - Rust MiniMax CLI
Related Packages
- Used by all CLI tools in the project
- Provides consistent user experience across Go and Rust implementations
CLI Package - Go Implementation
Import: github.com/haivivi/giztoy/pkg/cli
Types
Config
type Config struct {
    AppName        string              `yaml:"-"`
    CurrentContext string              `yaml:"current_context,omitempty"`
    Contexts       map[string]*Context `yaml:"contexts,omitempty"`
}
Methods:
| Method | Signature | Description |
|---|---|---|
| LoadConfig | func LoadConfig(appName string) (*Config, error) | Load from default path |
| LoadConfigWithPath | func LoadConfigWithPath(appName, path string) (*Config, error) | Load from custom path |
| Save | (c *Config) Save() error | Save to disk |
| Path | (c *Config) Path() string | Get config file path |
| Dir | (c *Config) Dir() string | Get config directory |
| AddContext | (c *Config) AddContext(name string, ctx *Context) error | Add context |
| DeleteContext | (c *Config) DeleteContext(name string) error | Delete context |
| UseContext | (c *Config) UseContext(name string) error | Set current context |
| GetContext | (c *Config) GetContext(name string) (*Context, error) | Get specific context |
| GetCurrentContext | (c *Config) GetCurrentContext() (*Context, error) | Get current context |
| ResolveContext | (c *Config) ResolveContext(name string) (*Context, error) | Resolve by name or current |
| ListContexts | (c *Config) ListContexts() []string | List all context names |
Context
type Context struct {
    Name         string              `yaml:"name"`
    Client       *ClientCredentials  `yaml:"client,omitempty"`
    Console      *ConsoleCredentials `yaml:"console,omitempty"`
    APIKey       string              `yaml:"api_key,omitempty"`
    BaseURL      string              `yaml:"base_url,omitempty"`
    Timeout      int                 `yaml:"timeout,omitempty"`
    MaxRetries   int                 `yaml:"max_retries,omitempty"`
    DefaultVoice string              `yaml:"default_voice,omitempty"`
    Extra        map[string]string   `yaml:"extra,omitempty"`
}
Output
type OutputFormat string

const (
    FormatYAML  OutputFormat = "yaml"
    FormatJSON  OutputFormat = "json"
    FormatTable OutputFormat = "table"
    FormatRaw   OutputFormat = "raw"
)

type OutputOptions struct {
    Format OutputFormat
    File   string
    Indent string
    Writer io.Writer
}
Functions:
| Function | Signature | Description |
|---|---|---|
| Output | func Output(result any, opts OutputOptions) error | Write formatted output |
| OutputBytes | func OutputBytes(data []byte, path string) error | Write binary data |
| PrintSuccess | func PrintSuccess(format string, args ...any) | Print ✓ message |
| PrintError | func PrintError(format string, args ...any) | Print error to stderr |
| PrintInfo | func PrintInfo(format string, args ...any) | Print ℹ message |
| PrintWarning | func PrintWarning(format string, args ...any) | Print ⚠ message |
| PrintVerbose | func PrintVerbose(verbose bool, format string, args ...any) | Conditional verbose |
Paths
type Paths struct {
    AppName string
    HomeDir string
}
Methods:
| Method | Signature | Description |
|---|---|---|
| NewPaths | func NewPaths(appName string) (*Paths, error) | Create paths instance |
| BaseDir | (p *Paths) BaseDir() string | ~/.giztoy |
| AppDir | (p *Paths) AppDir() string | ~/.giztoy/ |
| ConfigFile | (p *Paths) ConfigFile() string | ~/.giztoy/ |
| CacheDir | (p *Paths) CacheDir() string | ~/.giztoy/ |
| LogDir | (p *Paths) LogDir() string | ~/.giztoy/ |
| DataDir | (p *Paths) DataDir() string | ~/.giztoy/ |
| EnsureAppDir | (p *Paths) EnsureAppDir() error | Create app dir |
| CachePath | (p *Paths) CachePath(name string) string | Path in cache |
| LogPath | (p *Paths) LogPath(name string) string | Path in logs |
| DataPath | (p *Paths) DataPath(name string) string | Path in data |
Usage
Load Configuration
cfg, err := cli.LoadConfig("minimax")
if err != nil {
    log.Fatal(err)
}
// Get current context
ctx, err := cfg.GetCurrentContext()
if err != nil {
    log.Fatal(err)
}
fmt.Println("API Key:", cli.MaskAPIKey(ctx.APIKey))
Output Results
result := map[string]string{"status": "ok", "message": "done"}
// Output as JSON to stdout; Output returns an error, so check it
if err := cli.Output(result, cli.OutputOptions{
    Format: cli.FormatJSON,
}); err != nil {
    log.Fatal(err)
}
// Output as YAML to file
if err := cli.Output(result, cli.OutputOptions{
    Format: cli.FormatYAML,
    File:   "output.yaml",
}); err != nil {
    log.Fatal(err)
}
Print Helpers
cli.PrintSuccess("Created context %q", "production")
cli.PrintError("Failed to connect: %v", err)
cli.PrintInfo("Using API endpoint: %s", baseURL)
cli.PrintWarning("Rate limit approaching")
cli.PrintVerbose(verbose, "Request: %+v", req)
Path Management
paths, err := cli.NewPaths("minimax")
if err != nil {
    log.Fatal(err)
}
// Ensure directories exist
paths.EnsureCacheDir()
paths.EnsureLogDir()
// Get paths
cachePath := paths.CachePath("response.json")
logPath := paths.LogPath("2024-01-15.log")
Dependencies
- github.com/goccy/go-yaml - YAML parsing
- encoding/json (stdlib) - JSON parsing
CLI Package - Rust Implementation
Crate: giztoy-cli
Types
Config
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct Config {
    #[serde(skip)]
    pub app_name: String,
    #[serde(default, skip_serializing_if = "String::is_empty")]
    pub current_context: String,
    #[serde(default, skip_serializing_if = "HashMap::is_empty")]
    pub contexts: HashMap<String, Context>,
    #[serde(skip)]
    config_path: PathBuf,
}
Methods:
| Method | Signature | Description |
|---|---|---|
| default_config_dir | fn default_config_dir(app_name: &str) -> Option<PathBuf> | Get default config dir |
| default_config_path | fn default_config_path(app_name: &str) -> Option<PathBuf> | Get default config path |
| path | fn path(&self) -> &PathBuf | Get config file path |
| dir | fn dir(&self) -> Option<&Path> | Get config directory |
| save | fn save(&self) -> Result<()> | Save to disk |
| add_context | fn add_context(&mut self, name: &str, ctx: Context) -> Result<()> | Add context |
| delete_context | fn delete_context(&mut self, name: &str) -> Result<()> | Delete context |
| use_context | fn use_context(&mut self, name: &str) -> Result<()> | Set current context |
| get_context | fn get_context(&self, name: &str) -> Option<&Context> | Get specific context |
| get_current_context | fn get_current_context(&self) -> Option<&Context> | Get current context |
| resolve_context | fn resolve_context(&self, name: Option<&str>) -> Option<&Context> | Resolve by name or current |
| list_contexts | fn list_contexts(&self) -> Vec<&str> | List all context names |
Free Functions:
| Function | Signature | Description |
|---|---|---|
| load_config | fn load_config(app_name: &str, custom_path: Option<&str>) -> Result<Config> | Load config |
| save_config | fn save_config(app_name: &str, config: &Config, custom_path: Option<&str>) -> Result<()> | Save config |
| mask_api_key | fn mask_api_key(key: &str) -> String | Mask API key |
Context
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct Context {
    pub name: String,
    pub client: Option<ClientCredentials>,
    pub console: Option<ConsoleCredentials>,
    pub api_key: String,
    pub base_url: String,
    pub timeout: i32,
    pub max_retries: i32,
    pub default_voice: String,
    pub extra: HashMap<String, String>,
}
Output
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OutputFormat {
    #[default]
    Yaml,
    Json,
}

pub struct Output {
    pub format: OutputFormat,
    pub file: Option<String>,
}
Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(format: OutputFormat, file: Option<String>) -> Self | Create output config |
| write | fn write<T: Serialize>(&self, value: &T) -> Result<()> | Write formatted output |
| write_binary | fn write_binary(&self, data: &[u8], path: &str) -> Result<()> | Write binary data |
Free Functions:
| Function | Signature | Description |
|---|---|---|
| print_verbose | fn print_verbose(enabled: bool, message: &str) | Print verbose message |
| guess_extension | fn guess_extension(format: &str) -> &str | Guess file extension |
Paths
#[derive(Debug, Clone)]
pub struct Paths {
    pub app_name: String,
    pub home_dir: PathBuf,
}
Methods:
| Method | Signature | Description |
|---|---|---|
| new | fn new(app_name: impl Into<String>) -> io::Result<Self> | Create paths instance |
| base_dir | fn base_dir(&self) -> PathBuf | ~/.giztoy |
| app_dir | fn app_dir(&self) -> PathBuf | ~/.giztoy/ |
| config_file | fn config_file(&self) -> PathBuf | ~/.giztoy/ |
| cache_dir | fn cache_dir(&self) -> PathBuf | ~/.giztoy/ |
| log_dir | fn log_dir(&self) -> PathBuf | ~/.giztoy/ |
| data_dir | fn data_dir(&self) -> PathBuf | ~/.giztoy/ |
| ensure_app_dir | fn ensure_app_dir(&self) -> io::Result<()> | Create app dir |
| ensure_cache_dir | fn ensure_cache_dir(&self) -> io::Result<()> | Create cache dir |
| cache_path | fn cache_path(&self, name: &str) -> PathBuf | Path in cache |
| log_path | fn log_path(&self, name: &str) -> PathBuf | Path in logs |
| data_path | fn data_path(&self, name: &str) -> PathBuf | Path in data |
Usage
Load Configuration
use giztoy_cli::{load_config, mask_api_key};

let cfg = load_config("minimax", None)?;
if let Some(ctx) = cfg.get_current_context() {
    println!("API Key: {}", mask_api_key(&ctx.api_key));
}
Output Results
use giztoy_cli::{Output, OutputFormat};
use serde::Serialize;

#[derive(Serialize)]
struct Result {
    status: String,
    message: String,
}

let result = Result {
    status: "ok".to_string(),
    message: "done".to_string(),
};

// Output as JSON to stdout
let output = Output::new(OutputFormat::Json, None);
output.write(&result)?;

// Output to file
let output = Output::new(OutputFormat::Yaml, Some("output.yaml".to_string()));
output.write(&result)?;
Path Management
use giztoy_cli::Paths;

let paths = Paths::new("minimax")?;

// Ensure directories exist
paths.ensure_cache_dir()?;
paths.ensure_log_dir()?;

// Get paths
let cache_path = paths.cache_path("response.json");
let log_path = paths.log_path("2024-01-15.log");
Dependencies
- serde + serde_yaml + serde_json - Serialization
- dirs - Home directory detection
- anyhow - Error handling
Differences from Go
| Aspect | Go | Rust |
|---|---|---|
| Error handling | error return | anyhow::Result |
| Config loading | LoadConfig(app) | load_config(app, None) |
| Output formats | yaml, json, table, raw | yaml, json only |
| Print helpers | PrintSuccess, PrintError, etc. | print_verbose only |
| Path returns | string | PathBuf |
CLI Package - Known Issues
🟡 Minor Issues
CLI-001: Rust missing output formats
Description:
Go supports 4 output formats: yaml, json, table, raw.
Rust only supports: yaml, json.
Impact: Rust CLI tools cannot write raw binary data to stdout or render table output.
Suggestion: Add Raw and Table format support to Rust.
CLI-002: Rust missing print helpers
Description:
Go has multiple print helpers with icons:
- PrintSuccess (✓)
- PrintError
- PrintInfo (ℹ)
- PrintWarning (⚠)
- PrintVerbose
Rust only has print_verbose.
Impact: Inconsistent user experience between Go and Rust CLIs.
Suggestion: Add print helper functions to Rust.
CLI-003: Config file permissions
File: go/pkg/cli/config.go:143
Description:
Config file is created with 0600 permissions (owner read/write only), which is good. But the config directory is created with 0755 (world-readable).
os.MkdirAll(dir, 0755) // Directory readable by others
os.WriteFile(c.configPath, data, 0600) // File only owner
Impact: Directory structure is visible to other users, though file content is protected.
Suggestion: Consider 0700 for the config directory.
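A minimal sketch of the suggested permissions, using only the standard library. writePrivate and demoDirPerm are hypothetical helpers, and the path under the temporary directory is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writePrivate (hypothetical) creates the parent directory as owner-only
// (0700) and writes the file as owner read/write (0600).
func writePrivate(path string, data []byte) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o700); err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// demoDirPerm writes a config under a temporary directory and reports the
// permission bits of the directory that writePrivate created.
func demoDirPerm() (os.FileMode, error) {
	base, err := os.MkdirTemp("", "giztoy-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)

	path := filepath.Join(base, "minimax", "config.yaml")
	if err := writePrivate(path, []byte("current_context: production\n")); err != nil {
		return 0, err
	}
	info, err := os.Stat(filepath.Dir(path))
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	perm, err := demoDirPerm()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%o\n", perm)
}
```

Note that os.MkdirAll leaves the mode of an existing directory untouched, so directories already created with 0755 would need an explicit os.Chmod.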
CLI-004: Go Context.Extra returns empty string for missing keys
File: go/pkg/cli/config.go:224-228
Description:
GetExtra returns empty string "" for missing keys, making it impossible to distinguish between "key exists with empty value" and "key doesn't exist".
func (ctx *Context) GetExtra(key string) string {
    if ctx.Extra == nil {
        return ""
    }
    return ctx.Extra[key] // Returns "" for missing key
}
Impact: Cannot differentiate missing vs empty extra values.
Suggestion: Add HasExtra(key string) bool or return (string, bool).
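The (string, bool) variant could be sketched as follows. LookupExtra is a hypothetical name, and the Context type here mirrors only the Extra field of the struct documented earlier.

```go
package main

import "fmt"

// Context mirrors only the Extra field of the documented struct.
type Context struct {
	Extra map[string]string
}

// LookupExtra (hypothetical) is a comma-ok variant of GetExtra.
func (ctx *Context) LookupExtra(key string) (string, bool) {
	v, ok := ctx.Extra[key] // indexing a nil map is safe and yields ("", false)
	return v, ok
}

func main() {
	ctx := &Context{Extra: map[string]string{"region": ""}}

	_, ok := ctx.LookupExtra("region")
	fmt.Println(ok) // true: the key exists with an empty value

	_, ok = ctx.LookupExtra("zone")
	fmt.Println(ok) // false: the key is missing
}
```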
🔵 Enhancements
CLI-005: No config file locking
Description:
Neither Go nor Rust implementation locks the config file during read/write operations. Concurrent CLI processes could corrupt the config.
Suggestion: Implement file locking for Save operations.
CLI-006: No config validation
Description:
Config is loaded without validation. Invalid URLs, negative timeouts, etc. are not detected until runtime errors occur.
Suggestion: Add Validate() error method to Config/Context.
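A rough sketch of what such a method might check. The rules below (non-empty name, non-negative timeout, http or https base URL with a host) are illustrative assumptions, and the Context type is reduced to the fields being validated.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Context is reduced to the fields this sketch validates.
type Context struct {
	Name    string
	BaseURL string
	Timeout int
}

// Validate (hypothetical) reports the first problem found; the rules are
// assumptions made for illustration.
func (c *Context) Validate() error {
	if c.Name == "" {
		return errors.New("context name is empty")
	}
	if c.Timeout < 0 {
		return fmt.Errorf("context %q: negative timeout %d", c.Name, c.Timeout)
	}
	if c.BaseURL != "" {
		u, err := url.Parse(c.BaseURL)
		if err != nil {
			return fmt.Errorf("context %q: invalid base_url: %w", c.Name, err)
		}
		if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return fmt.Errorf("context %q: base_url %q needs an http(s) scheme and a host", c.Name, c.BaseURL)
		}
	}
	return nil
}

func main() {
	good := &Context{Name: "production", BaseURL: "https://api.example.com", Timeout: 30}
	bad := &Context{Name: "development", BaseURL: "dev-api.example.com"}
	fmt.Println(good.Validate()) // <nil>
	fmt.Println(bad.Validate())  // reports the missing scheme and host
}
```

Calling Validate right after loading would surface these problems before the first API request.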
CLI-007: Missing config migration
Description:
No mechanism to handle config format changes between versions. If schema changes, old configs may fail to load.
Suggestion: Add version field and migration support.
CLI-008: No environment variable support
Description:
API keys and other credentials must be stored in config file. No support for environment variable overrides.
Example:
# Desired behavior
export MINIMAX_API_KEY="sk-..."
minimax chat "Hello" # Uses env var instead of config
Suggestion: Add env var lookup with config fallback.
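The lookup-with-fallback idea can be sketched in a few lines. resolveAPIKey is a hypothetical helper; the lookup function is injected so the logic can be exercised without touching the real environment, and os.LookupEnv fits that parameter directly.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey (hypothetical) returns the value of envName when it is set
// and non-empty, otherwise the key that came from the config file.
func resolveAPIKey(envName, fromConfig string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(envName); ok && v != "" {
		return v
	}
	return fromConfig
}

func main() {
	// "sk-from-config" stands in for the api_key of the active context.
	key := resolveAPIKey("MINIMAX_API_KEY", "sk-from-config", os.LookupEnv)
	fmt.Println(len(key) > 0) // true: either source yields a non-empty key here
}
```

Treating an empty variable as unset is a choice of this sketch; a tool could equally let an empty value override the config.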
⚪ Notes
CLI-009: Different YAML libraries
Description:
- Go: Uses github.com/goccy/go-yaml
- Rust: Uses serde_yaml
Both produce compatible output but may have minor formatting differences.
CLI-010: MaskAPIKey behavior for short keys
Description:
Both implementations mask entire key if length <= 8:
if len(key) <= 8 {
    return strings.Repeat("*", len(key))
}
This means very short keys (e.g., "test") show as "****" with no visible characters.
CLI-011: Paths use dirs crate in Rust
Description:
- Go: Uses os.UserHomeDir() (stdlib)
- Rust: Uses dirs::home_dir() (external crate)
Both handle cross-platform home directory detection correctly.
Summary
| ID | Severity | Status | Component |
|---|---|---|---|
| CLI-001 | 🟡 Minor | Open | Rust Output |
| CLI-002 | 🟡 Minor | Open | Rust Print |
| CLI-003 | 🟡 Minor | Open | Go Config |
| CLI-004 | 🟡 Minor | Open | Go Context |
| CLI-005 | 🔵 Enhancement | Open | Both |
| CLI-006 | 🔵 Enhancement | Open | Both |
| CLI-007 | 🔵 Enhancement | Open | Both |
| CLI-008 | 🔵 Enhancement | Open | Both |
| CLI-009 | ⚪ Note | N/A | Both |
| CLI-010 | ⚪ Note | N/A | Both |
| CLI-011 | ⚪ Note | N/A | Rust |
Overall: Functional CLI utilities. Main gaps are feature parity between Go and Rust implementations.