API Reference¶
Complete API documentation for all Mullama types, traits, and functions. This reference covers the full public API surface of the library, including core types, feature-gated modules, and utility functions.
How to Read This Reference¶
Each API page follows a consistent structure:
- Type signatures are shown as Rust code blocks with full generic bounds
- Parameter tables use the format: Name | Type | Default | Description
- Error conditions are listed under each method with the specific
MullamaErrorvariant returned - Feature gates are indicated with admonitions at the top of each section
- Examples demonstrate typical usage patterns with compilable code
Conventions¶
impl AsRef<Path>indicates any type convertible to a filesystem path (e.g.,&str,String,PathBuf)Arc<Model>indicates shared ownership -- the caller retains a reference&mut selfindicates the method mutates internal stateResult<T, MullamaError>is the standard return type for fallible operations
Core Types Diagram¶
+------------------+
| MullamaConfig |
| (ModelConfig, |
| ContextConfig, |
| SamplerConfig) |
+--------+---------+
|
v
+------------+ +-------+--------+ +----------------+
| ModelParams| ----> | Model | <---- | ModelBuilder |
+------------+ | (Arc<Inner>) | +----------------+
| Send + Sync |
+---+-----+------+
| |
tokenize() | | create_context()
v v
+----------+ +------------+ +---------------+
| Vec<Token> | Context | <---- | ContextParams |
+----------+ | (!Send) | +---------------+
+---+----+---+
| |
decode() | | generate()
v v
+-----------+ +----+------+
| Batch | | String |
| (SmallVec)| +-----------+
+-----------+
+---------------+
| SamplerChain |
| (Send + Sync) |
+---+-----------+
|
v
+--------+--------+
| Sampler |
| (Send + Sync) |
+-----------------+
Module Organization¶
| Module | Description | Feature Gate |
|---|---|---|
model |
Model loading, metadata, and tokenization | Core |
context |
Inference context, generation, and KV-cache | Core |
sampling |
Sampling strategies and composable chains | Core |
batch |
Efficient multi-token batch processing | Core |
embeddings |
Text embedding generation and similarity | Core |
config |
Configuration management with serde | Core |
error |
Error types and handling patterns | Core |
multimodal |
Text, image, and audio processing | multimodal |
async_support |
Async model, context, and runtime | async |
streaming |
Real-time token streaming | streaming |
Core Types Quick Reference¶
| Type | Module | Description |
|---|---|---|
Model |
model |
Loaded LLM model (Arc-based, Clone, Send+Sync) |
ModelParams |
model |
Parameters for model loading |
ModelBuilder |
model |
Fluent API for model configuration |
Context |
context |
Inference context with KV-cache (!Send) |
ContextParams |
context |
Context configuration parameters |
KvCacheType |
context |
KV-cache quantization level |
SamplerParams |
sampling |
High-level sampling configuration |
SamplerChain |
sampling |
Composable chain of samplers |
Sampler |
sampling |
Individual sampling strategy |
Batch |
batch |
Token batch with SmallVec optimization |
Embeddings |
embedding |
Generated embedding vectors |
EmbeddingGenerator |
embedding |
Embedding generation utility |
EmbeddingConfig |
embedding |
Embedding configuration |
PoolingStrategy |
embedding |
Embedding pooling method |
MullamaConfig |
config |
Top-level configuration struct |
MullamaError |
error |
Unified error type |
Feature-Gated Types¶
Types that require specific Cargo features to be enabled:
| Type | Required Feature | Description |
|---|---|---|
AsyncModel |
async |
Non-blocking model wrapper |
AsyncContext |
async |
Non-blocking context wrapper |
MullamaRuntime |
tokio-runtime |
Tokio runtime manager |
ModelPool |
tokio-runtime |
Connection pool for models |
TaskManager |
tokio-runtime |
Async task coordinator |
TokenStream |
streaming |
Real-time token stream |
StreamConfig |
streaming |
Streaming configuration |
MultimodalProcessor |
multimodal |
Cross-modal processing |
VisionEncoder |
multimodal |
Image encoding pipeline |
ImageInput |
multimodal |
Image data container |
AudioInput |
multimodal |
Audio data container |
StreamingAudioProcessor |
streaming-audio |
Real-time audio capture |
AudioStreamConfig |
streaming-audio |
Audio stream settings |
AppState |
web |
Axum application state |
RouterBuilder |
web |
Web route configuration |
WebSocketServer |
websockets |
WebSocket server |
WebSocketConfig |
websockets |
WebSocket settings |
Thread Safety Overview¶
Mullama types are designed with clear thread safety semantics:
| Type | Send | Sync | Clone | Notes |
|---|---|---|---|---|
Model |
Yes | Yes | Yes (cheap) | Arc-based sharing, reference count increment |
Context |
No | No | No | Bound to creating thread; use per-thread instances |
SamplerChain |
Yes | Yes | No | Can be moved between threads |
Sampler |
Yes | Yes | No | Can be moved between threads |
Batch |
Yes | No | Yes | Move between threads, not shared references |
Embeddings |
Yes | Yes | Yes | Plain data, freely shareable |
AsyncModel |
Yes | Yes | Yes (cheap) | Arc-based, designed for concurrent use |
Context Thread Safety
Context is intentionally not Send. It holds mutable state (KV-cache, position counters) that is not safe for concurrent access. Each thread must create its own Context from a shared Arc<Model>. For async contexts, use AsyncContext which handles thread-safety internally via spawn_blocking.
Typical Multi-Threaded Pattern¶
use mullama::{Model, Context, ContextParams};
use std::sync::Arc;
use std::thread;
let model = Arc::new(Model::load("model.gguf")?);
let handles: Vec<_> = (0..4).map(|i| {
let model = model.clone(); // Cheap Arc clone
thread::spawn(move || {
// Each thread creates its own context
let mut ctx = Context::new(model, ContextParams::default()).unwrap();
ctx.generate(&[1, 2, 3], 50).unwrap()
})
}).collect();
for handle in handles {
let result = handle.join().unwrap();
println!("{}", result);
}
Versioning and Stability Guarantees¶
Mullama follows Semantic Versioning:
- Major version (0.x): The library is in pre-1.0 development. Breaking changes may occur between minor versions.
- Minor version: May include new features and non-breaking API additions.
- Patch version: Bug fixes and documentation improvements only.
Stability Tiers¶
| Tier | Guarantee | Types |
|---|---|---|
| Stable | No breaking changes without major bump | Model, Context, SamplerParams, Batch, Embeddings, MullamaError |
| Unstable | May change between minor versions | Feature-gated types (AsyncModel, TokenStream, MultimodalProcessor) |
| Internal | No stability guarantee | Types in sys module, FFI bindings |
Feature Flags¶
[dependencies.mullama]
version = "0.3"
features = [
"async", # AsyncModel, AsyncContext
"streaming", # TokenStream, StreamConfig (requires "async")
"multimodal", # MultimodalProcessor, ImageInput, AudioInput
"streaming-audio", # StreamingAudioProcessor (requires "multimodal")
"format-conversion",# AudioConverter, ImageConverter (requires "multimodal")
"web", # Axum integration (requires "async")
"websockets", # WebSocket server (requires "async")
"parallel", # ParallelProcessor, batch operations
"tokio-runtime", # MullamaRuntime, TaskManager
"late-interaction", # ColBERT-style multi-vector embeddings
"daemon", # Background daemon mode
"full", # All features enabled
]
Dependency Chains¶
When enabling features, these dependencies are automatically resolved:
streamingrequiresasyncstreaming-audiorequiresmultimodalformat-conversionrequiresmultimodalwebrequiresasyncwebsocketsrequiresasyncfullenables all features
Cross-Language Equivalents¶
Where applicable, API pages note equivalent APIs in other language bindings:
| Rust | Node.js | Python |
|---|---|---|
Model::load(path) |
Model.load(path) |
Model.load(path) |
model.tokenize(text, true, false) |
model.tokenize(text) |
model.tokenize(text) |
ctx.generate(&tokens, max) |
ctx.generate(tokens, max) |
ctx.generate(tokens, max) |
SamplerParams::default() |
new SamplerParams() |
SamplerParams() |
See the Bindings documentation for complete language-specific guides.