Skip to content

Tutorials & Examples

Build real applications with Mullama through step-by-step tutorials. Each tutorial includes complete, runnable code in Node.js and Python (our primary bindings), with Rust examples where native access is required.


Getting Started

Foundational tutorials for developers new to local LLM inference.

  • Build a Chatbot


    Build a multi-turn conversational chatbot with streaming responses and conversation memory.

    Difficulty: Beginner | Languages: Node.js, Python, Rust Features: Core API, Chat Templates

    Start Tutorial

  • Streaming Generation


    Display text token-by-token as it generates. Console streaming, SSE, and WebSocket patterns.

    Difficulty: Beginner | Languages: Node.js, Python, Rust Features: Streaming, Async

    Start Tutorial


Applications

Build complete applications using Mullama as the inference engine.

  • RAG Pipeline


    Retrieval-Augmented Generation with document embeddings, vector search, and grounded answers.

    Difficulty: Intermediate | Languages: Node.js, Python, Rust Features: Embeddings, Batch

    Start Tutorial

  • API Server


    Production API server with OpenAI-compatible endpoints, streaming SSE, and rate limiting.

    Difficulty: Intermediate | Languages: Node.js (Express), Python (FastAPI) Features: Async, Streaming

    Start Tutorial

  • Semantic Search


    Build a semantic search engine with embedding-based retrieval and similarity ranking.

    Difficulty: Intermediate | Languages: Node.js, Python Features: Embeddings, Batch

    Start Tutorial

  • Batch Processing


    Process multiple prompts efficiently with parallel execution and progress reporting.

    Difficulty: Intermediate | Languages: Node.js, Python, Rust Features: Parallel, Batch

    Start Tutorial

  • Multimodal Processing


    Process images alongside text for captioning, visual QA, and multi-format understanding.

    Difficulty: Intermediate | Languages: Rust (primary), Python Features: Multimodal

    Start Tutorial


Advanced

Complex integrations requiring multiple features and deeper system knowledge.

  • Voice Assistant


    Real-time voice-to-text-to-response pipeline with VAD and streaming output.

    Difficulty: Advanced | Languages: Rust (primary) Features: Streaming Audio, Multimodal, Async

    Start Tutorial

  • Edge Deployment


    Deploy Mullama on Raspberry Pi, Jetson Nano, and other resource-constrained devices.

    Difficulty: Advanced | Languages: Python, Bash Features: CPU Optimization, Quantization

    Start Tutorial


Prerequisites

Before starting any tutorial, ensure you have:

  1. Mullama installed for your language of choice:

    npm install mullama
    
    pip install mullama
    
    [dependencies]
    mullama = { version = "0.3", features = ["full"] }
    
  2. A GGUF model file -- Download or pull via the daemon:

    mullama pull llama3.2:1b
    

  3. System dependencies -- See Platform Setup for your OS.


Difficulty Guide

Level Description Time Estimate
Beginner Core API usage, minimal configuration 15-30 minutes
Intermediate Multiple features, application architecture 30-60 minutes
Advanced System integration, performance tuning, hardware-specific 60+ minutes

Running Examples

Using the Daemon (Simplest)

The Mullama daemon provides an OpenAI-compatible API without writing any code:

# Start the daemon with a model
mullama run llama3.2:1b "Hello, world!"

# Or start the server for API access
mullama serve --model llama3.2:1b

Using Language Bindings

node chatbot.js
python chatbot.py
cargo run --example chatbot --features full

Feature Dependencies

Understanding which features each tutorial uses:

graph TD
    A[Core - No Features] --> B[async]
    B --> C[streaming]
    B --> D[web]
    D --> E[websockets]
    A --> F[multimodal]
    F --> G[streaming-audio]
    F --> H[format-conversion]
    A --> I[parallel]

What's Next