Tutorials & Examples¶

Build real applications with Mullama through step-by-step tutorials. Each tutorial includes complete, runnable code in Node.js and Python (our primary bindings), with Rust examples where native access is required.

Getting Started¶

Foundational tutorials for developers new to local LLM inference.

Build a Chatbot

Build a multi-turn conversational chatbot with streaming responses and conversation memory.

Difficulty: Beginner | Languages: Node.js, Python, Rust Features: Core API, Chat Templates

Start Tutorial
Streaming Generation

Display text token-by-token as it generates. Console streaming, SSE, and WebSocket patterns.

Difficulty: Beginner | Languages: Node.js, Python, Rust Features: Streaming, Async

Start Tutorial

Applications¶

Build complete applications using Mullama as the inference engine.

RAG Pipeline

Retrieval-Augmented Generation with document embeddings, vector search, and grounded answers.

Difficulty: Intermediate | Languages: Node.js, Python, Rust Features: Embeddings, Batch

Start Tutorial
API Server

Production API server with OpenAI-compatible endpoints, streaming SSE, and rate limiting.

Difficulty: Intermediate | Languages: Node.js (Express), Python (FastAPI) Features: Async, Streaming

Start Tutorial
Semantic Search

Build a semantic search engine with embedding-based retrieval and similarity ranking.

Difficulty: Intermediate | Languages: Node.js, Python Features: Embeddings, Batch

Start Tutorial
Batch Processing

Process multiple prompts efficiently with parallel execution and progress reporting.

Difficulty: Intermediate | Languages: Node.js, Python, Rust Features: Parallel, Batch

Start Tutorial
Multimodal Processing

Process images alongside text for captioning, visual QA, and multi-format understanding.

Difficulty: Intermediate | Languages: Rust (primary), Python Features: Multimodal

Start Tutorial

Advanced¶

Complex integrations requiring multiple features and deeper system knowledge.

Voice Assistant

Real-time voice-to-text-to-response pipeline with VAD and streaming output.

Difficulty: Advanced | Languages: Rust (primary) Features: Streaming Audio, Multimodal, Async

Start Tutorial
Edge Deployment

Deploy Mullama on Raspberry Pi, Jetson Nano, and other resource-constrained devices.

Difficulty: Advanced | Languages: Python, Bash Features: CPU Optimization, Quantization

Start Tutorial

Prerequisites¶

Before starting any tutorial, ensure you have:

Mullama installed for your language of choice:

Node.jsPythonRust

npm install mullama

pip install mullama

[dependencies]
mullama = { version = "0.3", features = ["full"] }

A GGUF model file -- Download or pull via the daemon:
```
mullama pull llama3.2:1b
```
System dependencies -- See Platform Setup for your OS.

Difficulty Guide¶

Level	Description	Time Estimate
Beginner	Core API usage, minimal configuration	15-30 minutes
Intermediate	Multiple features, application architecture	30-60 minutes
Advanced	System integration, performance tuning, hardware-specific	60+ minutes

Running Examples¶

Using the Daemon (Simplest)¶

The Mullama daemon provides an OpenAI-compatible API without writing any code:

# Start the daemon with a model
mullama run llama3.2:1b "Hello, world!"

# Or start the server for API access
mullama serve --model llama3.2:1b

Using Language Bindings¶

Node.jsPythonRust

node chatbot.js

python chatbot.py

cargo run --example chatbot --features full

Feature Dependencies¶

Understanding which features each tutorial uses:

graph TD
    A[Core - No Features] --> B[async]
    B --> C[streaming]
    B --> D[web]
    D --> E[websockets]
    A --> F[multimodal]
    F --> G[streaming-audio]
    F --> H[format-conversion]
    A --> I[parallel]

What's Next¶

New to Mullama? Start with Build a Chatbot
Want an API? Jump to API Server
Need search? Try Semantic Search
Explore Language Bindings for API details