Advanced Features
Mullama's advanced features add production capabilities for web services, real-time audio, parallel processing, and fine-grained retrieval. Each feature is gated behind its own Cargo feature flag.
Prerequisites
Before exploring advanced features, ensure you are comfortable with:
- Loading models and creating contexts
- Text generation and sampling parameters
- Async support for non-blocking operations
- Streaming for real-time token delivery
Feature Dependency Diagram
Advanced features build on core capabilities. Some features transitively enable others:
```mermaid
graph TD
    Core[Core Library] --> Async[async]
    Core --> Multimodal[multimodal]
    Core --> Parallel[parallel]
    Core --> LateInteraction[late-interaction]
    Core --> ControlVectors[control-vectors]
    Async --> Web[web]
    Async --> WebSockets[websockets]
    Multimodal --> StreamingAudio[streaming-audio]
    Multimodal --> FormatConversion[format-conversion]
    Web -.-> WebSockets
    StreamingAudio -.-> WebSockets
    style Core fill:#6b46c1,color:#fff
    style Async fill:#805ad5,color:#fff
    style Multimodal fill:#805ad5,color:#fff
    style Web fill:#9f7aea,color:#fff
    style WebSockets fill:#9f7aea,color:#fff
    style StreamingAudio fill:#9f7aea,color:#fff
    style FormatConversion fill:#9f7aea,color:#fff
    style Parallel fill:#9f7aea,color:#fff
    style LateInteraction fill:#9f7aea,color:#fff
    style ControlVectors fill:#9f7aea,color:#fff
```
Key dependency chains:
- `web` and `websockets` require `async` (Tokio runtime)
- `streaming-audio` and `format-conversion` require `multimodal`
- `parallel`, `late-interaction`, and `control-vectors` are standalone
- SIMD optimizations are always available (no feature flag needed)
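The dependency chains above can be sketched as a small resolver that expands a requested set of flags into everything that ends up enabled. This is an illustration in plain Rust, not part of Mullama: the function names `direct_deps` and `resolve` are hypothetical, and the dotted edges in the diagram are assumed to be optional integrations, so they are left out.

```rust
use std::collections::BTreeSet;

/// Direct prerequisites of each feature flag, transcribed from the
/// dependency chains listed above. Flags without prerequisites
/// (for example `parallel`) fall through to the empty slice.
fn direct_deps(feature: &str) -> &'static [&'static str] {
    match feature {
        "web" | "websockets" => &["async"],
        "streaming-audio" | "format-conversion" => &["multimodal"],
        _ => &[],
    }
}

/// Expand the requested flags into the full set that is enabled,
/// following prerequisites transitively.
fn resolve(requested: &[&'static str]) -> BTreeSet<&'static str> {
    let mut enabled = BTreeSet::new();
    let mut stack: Vec<&'static str> = requested.to_vec();
    while let Some(flag) = stack.pop() {
        // `insert` returns true only the first time a flag is seen,
        // so each flag's prerequisites are pushed at most once.
        if enabled.insert(flag) {
            stack.extend_from_slice(direct_deps(flag));
        }
    }
    enabled
}
```

Under these assumptions, `resolve(&["web", "streaming-audio"])` contains `async`, `multimodal`, `streaming-audio`, and `web`, while `resolve(&["parallel"])` contains only `parallel`.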
When to Use Advanced Features
| Feature | Use Case | Complexity |
|---|---|---|
| Web Framework | REST API server for LLM inference | Medium |
| WebSockets | Real-time chat, multi-user streaming | Medium |
| Streaming Audio | Voice assistants, live transcription | High |
| Format Conversion | Audio/image preprocessing pipelines | Low |
| Parallel Processing | Batch inference, bulk embeddings | Medium |
| Late Interaction | High-precision semantic search (ColBERT) | Medium |
| SIMD Optimizations | Faster sampling (automatic) | None |
| Control Vectors | Steering model behavior without fine-tuning | Low |
Performance vs Complexity Tradeoffs
Relative performance gain (vertical axis) against complexity (horizontal axis, Low to High), listed from highest gain to lowest:
- Parallel
- SIMD
- Web, Late Interaction
- WebSockets
- Format Conv, Streaming Audio
- Control Vectors
Start Simple
Begin with the core library for prototyping. Add advanced features incrementally as your application requirements grow. Each feature flag increases compile time and binary size.
Enabling Multiple Features
Combine features as needed in your Cargo.toml:
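A minimal sketch of such a manifest entry. The feature names come from the reference table on this page; the crate name `mullama` and the version string are assumptions, so substitute the values for the release you actually depend on.

```toml
[dependencies]
# Assumption: the crate is published as `mullama`. "*" is a placeholder;
# pin the version you actually use.
# `web` and `websockets` both require `async`, per the dependency chains above.
mullama = { version = "*", features = ["web", "websockets", "parallel"] }
```

Listing only the flags you need keeps compile time and binary size down; the `full` flag is the alternative when you want everything.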
Advanced Topics
- Web Framework: Build production REST APIs with routing, middleware, SSE streaming, and metrics.
- WebSockets: Real-time bidirectional communication with rooms, audio streaming, and compression.
- Streaming Audio: Low-latency audio capture with voice activity detection and ring buffer architecture.
- Format Conversion: Convert between audio formats (WAV/MP3/FLAC) and image formats (JPEG/PNG/WebP).
- Parallel Processing: Rayon-powered batch inference with work-stealing, NUMA awareness, and CPU pinning.
- Late Interaction: Per-token embeddings with MaxSim scoring for high-precision semantic retrieval.
- SIMD Optimizations: Hardware-accelerated sampling with AVX2, AVX-512, and NEON support.
- Control Vectors: Steer model behavior (style, safety, personality) without fine-tuning.
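To make the MaxSim scoring mentioned for late interaction concrete, here is a minimal sketch in plain Rust. It does not use Mullama's API: `dot` and `max_sim` are hypothetical helper names, and the sketch assumes the per-token embeddings are already normalized so that a dot product acts as cosine similarity.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim score: for every query token embedding, take its best match
/// among the document token embeddings, then sum those maxima.
/// Returns 0.0 when the document has no token embeddings.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

For a query with token embeddings `[1, 0]` and `[0, 1]` and a document with `[1, 0]` and `[0.5, 0.5]`, the per-token maxima are 1.0 and 0.5, so the score is 1.5. Because every query token is matched independently, a document can score well by covering each query term somewhere, which is what makes this scheme more precise than comparing a single pooled vector.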
Feature Flags Reference
| Feature Flag | Dependencies | Description |
|---|---|---|
| `async` | tokio | Async/await support |
| `streaming` | - | TokenStream with backpressure |
| `multimodal` | - | Text, image, audio processing |
| `web` | `async` | Axum web framework integration |
| `websockets` | `async` | WebSocket server and client |
| `streaming-audio` | `multimodal` | Real-time audio capture |
| `format-conversion` | `multimodal` | Audio/image format conversion |
| `parallel` | rayon | Parallel batch processing |
| `late-interaction` | - | ColBERT-style multi-vector embeddings |
| `control-vectors` | - | Behavior steering vectors |
| `full` | all above | Enable everything |
See Also
- Library Guide - Core library usage
- API Reference - Complete API documentation
- Examples - Full working examples