Tutorials & Examples¶
Build real applications with Mullama through step-by-step tutorials. Each tutorial includes complete, runnable code in Node.js and Python (our primary bindings), with Rust examples where native access is required.
Getting Started¶
Foundational tutorials for developers new to local LLM inference.
-
Build a Chatbot
Build a multi-turn conversational chatbot with streaming responses and conversation memory.
Difficulty: Beginner | Languages: Node.js, Python, Rust Features: Core API, Chat Templates
-
Streaming Generation
Display text token-by-token as it generates. Console streaming, SSE, and WebSocket patterns.
Difficulty: Beginner | Languages: Node.js, Python, Rust Features: Streaming, Async
Applications¶
Build complete applications using Mullama as the inference engine.
-
RAG Pipeline
Retrieval-Augmented Generation with document embeddings, vector search, and grounded answers.
Difficulty: Intermediate | Languages: Node.js, Python, Rust Features: Embeddings, Batch
-
API Server
Production API server with OpenAI-compatible endpoints, streaming SSE, and rate limiting.
Difficulty: Intermediate | Languages: Node.js (Express), Python (FastAPI) Features: Async, Streaming
-
Semantic Search
Build a semantic search engine with embedding-based retrieval and similarity ranking.
Difficulty: Intermediate | Languages: Node.js, Python Features: Embeddings, Batch
-
Batch Processing
Process multiple prompts efficiently with parallel execution and progress reporting.
Difficulty: Intermediate | Languages: Node.js, Python, Rust Features: Parallel, Batch
-
Multimodal Processing
Process images alongside text for captioning, visual QA, and multi-format understanding.
Difficulty: Intermediate | Languages: Rust (primary), Python Features: Multimodal
Advanced¶
Complex integrations requiring multiple features and deeper system knowledge.
-
Voice Assistant
Real-time voice-to-text-to-response pipeline with VAD and streaming output.
Difficulty: Advanced | Languages: Rust (primary) Features: Streaming Audio, Multimodal, Async
-
Edge Deployment
Deploy Mullama on Raspberry Pi, Jetson Nano, and other resource-constrained devices.
Difficulty: Advanced | Languages: Python, Bash Features: CPU Optimization, Quantization
Prerequisites¶
Before starting any tutorial, ensure you have:
-
Mullama installed for your language of choice:
-
A GGUF model file -- Download or pull via the daemon:
-
System dependencies -- See Platform Setup for your OS.
Difficulty Guide¶
| Level | Description | Time Estimate |
|---|---|---|
| Beginner | Core API usage, minimal configuration | 15-30 minutes |
| Intermediate | Multiple features, application architecture | 30-60 minutes |
| Advanced | System integration, performance tuning, hardware-specific | 60+ minutes |
Running Examples¶
Using the Daemon (Simplest)¶
The Mullama daemon provides an OpenAI-compatible API without writing any code:
# Start the daemon with a model
mullama run llama3.2:1b "Hello, world!"
# Or start the server for API access
mullama serve --model llama3.2:1b
Using Language Bindings¶
Feature Dependencies¶
Understanding which features each tutorial uses:
graph TD
A[Core - No Features] --> B[async]
B --> C[streaming]
B --> D[web]
D --> E[websockets]
A --> F[multimodal]
F --> G[streaming-audio]
F --> H[format-conversion]
A --> I[parallel]
What's Next¶
- New to Mullama? Start with Build a Chatbot
- Want an API? Jump to API Server
- Need search? Try Semantic Search
- Explore Language Bindings for API details