Installation
Install Mullama for your language or as a standalone CLI/daemon. Pre-built binaries are available for all major platforms.
Install the Package
Install via npm (prebuilt native binaries included):
Or with yarn:
Or with pnpm:
Native Addon
The Node.js package includes prebuilt binaries for Linux (x86_64, aarch64), macOS (Apple Silicon, Intel), and Windows (x86_64). If no prebuilt binary is available for your platform, the package compiles from source during installation, so make sure a C++ compiler and CMake are installed.
Install via pip:
Or with a virtual environment (recommended):
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
pip install mullama
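After installing into a virtual environment, it can help to confirm that the active interpreter actually sees the package. The following is a minimal sketch; it assumes the import name matches the package name `mullama`, which this page does not state, and the helper `py_has_module` is a hypothetical name used only for illustration.

```shell
# Return success if the named Python module can be located by the
# interpreter that is currently first on PATH (python3 is assumed here).
py_has_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Assumption: the importable module is called "mullama".
if py_has_module mullama; then
  echo "mullama is importable"
else
  echo "mullama not found in this environment"
fi
```

If the check fails right after a successful `pip install`, the usual cause is that `pip` and `python3` belong to different environments.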
Or with conda:
Add to your Cargo.toml:
Or manually specify features:
CGo Required
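The Go binding uses cgo to interface with the native library. Ensure a C compiler is available and CGO_ENABLED=1. This is the default for native builds when a C compiler is on PATH; cross-compiling disables cgo unless you set it explicitly.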
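A quick way to confirm the cgo requirement is to ask the Go toolchain directly. This sketch relies only on the standard `go env` command; the function name `check_cgo` is hypothetical.

```shell
# Report whether cgo is enabled for the Go toolchain on PATH.
check_cgo() {
  if ! command -v go >/dev/null 2>&1; then
    echo "go not found on PATH"
    return 1
  fi
  if [ "$(go env CGO_ENABLED)" = "1" ]; then
    # CC is the C compiler that cgo will invoke.
    echo "cgo enabled (CC=$(go env CC))"
  else
    echo "cgo disabled; set CGO_ENABLED=1"
    return 1
  fi
}

check_cgo || true
```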
FFI Extension
Requires PHP 7.4+ with the FFI extension enabled. Add extension=ffi to your php.ini if not already enabled.
Build the shared library from source:
git clone --recurse-submodules https://github.com/cognisoc/mullama.git
cd mullama/bindings/ffi
cargo build --release
The compiled library will be located at:
| Platform | Path |
|---|---|
| Linux | target/release/libmullama_ffi.so |
| macOS | target/release/libmullama_ffi.dylib |
| Windows | target/release/mullama_ffi.dll |
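In build scripts, the table above can be turned into a small lookup keyed on the output of `uname -s`. This is a sketch only: the function name `ffi_lib_path` is hypothetical, and the `MINGW*`/`MSYS*`/`CYGWIN*` patterns assume a Unix-like shell on Windows such as Git Bash.

```shell
# Map an OS name (as printed by `uname -s`) to the library path
# listed in the table above.
ffi_lib_path() {
  case "$1" in
    Linux)  echo "target/release/libmullama_ffi.so" ;;
    Darwin) echo "target/release/libmullama_ffi.dylib" ;;
    MINGW*|MSYS*|CYGWIN*) echo "target/release/mullama_ffi.dll" ;;
    *) echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print the path for the current machine.
ffi_lib_path "$(uname -s)" || true
```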
Copy the header file and library to your project:
Feature Flags (Rust)
Mullama uses Cargo feature flags to control which capabilities are compiled. This keeps binary size minimal and avoids unnecessary dependencies.
Common Presets
[dependencies]
# Choose ONE of the presets below; Cargo rejects duplicate `mullama` keys.
# Web API service with streaming responses
mullama = { version = "0.3.0", features = ["web", "websockets", "streaming"] }
# Multimodal AI (text + image + audio)
mullama = { version = "0.3.0", features = ["multimodal", "streaming-audio", "format-conversion"] }
# High-throughput batch processing
mullama = { version = "0.3.0", features = ["parallel", "async", "tokio-runtime"] }
# Semantic search / RAG pipeline
mullama = { version = "0.3.0", features = ["late-interaction", "parallel", "async"] }
# Everything enabled
mullama = { version = "0.3.0", features = ["full"] }
Feature Reference
| Feature | Description | Key Dependencies |
|---|---|---|
| async | Tokio-based non-blocking operations | tokio, futures |
| streaming | Real-time token-by-token generation | async, async-stream |
| web | Axum REST API framework integration | async, axum, tower |
| websockets | Bidirectional real-time communication | async, tokio-tungstenite |
| multimodal | Text, image, and audio processing | image, hound, symphonia |
| streaming-audio | Real-time audio capture with VAD | multimodal, cpal, ringbuf |
| format-conversion | Audio/image format conversion | multimodal, ffmpeg-next |
| parallel | Rayon work-stealing parallelism | rayon |
| late-interaction | ColBERT-style semantic search | Core only |
| tokio-runtime | Advanced Tokio runtime management | tokio, tokio-util |
| daemon | Full CLI, daemon, TUI, and REST API | Multiple (see below) |
| embedded-ui | Embed Web UI in daemon binary | include_dir, mime_guess |
| full | All features enabled | All of the above |
Feature Dependency Chain
full
|-- async .............. tokio, futures, tokio-util
|-- streaming .......... async + async-stream
|-- web ................ async + axum, tower, tower-http
|-- websockets ......... async + axum, tokio-tungstenite
|-- multimodal ......... image, hound, symphonia, rubato, dasp
|-- streaming-audio .... multimodal + cpal, ringbuf
|-- format-conversion .. multimodal + ffmpeg-next
|-- parallel ........... rayon
|-- late-interaction ... (core only)
|-- tokio-runtime ...... tokio, tokio-util
|-- daemon ............. async, tokio-runtime, web + clap, ratatui, nng, ...
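Because some features enable others, listing an implied feature next to its parent is redundant. The lookup below restates the feature-to-feature edges of the chain above as a sketch; `implied_features` is a hypothetical helper, external crates are omitted, and the `daemon` entry covers only the features named before the elided remainder.

```shell
# Print a feature followed by the Mullama features it enables,
# according to the dependency chain above.
implied_features() {
  case "$1" in
    streaming|web|websockets)          echo "$1 async" ;;
    streaming-audio|format-conversion) echo "$1 multimodal" ;;
    daemon)                            echo "daemon async tokio-runtime web" ;;
    *)                                 echo "$1" ;;
  esac
}

implied_features streaming-audio
```

For example, requesting `web` already brings in `async`, so the two do not both need to appear in a features list.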
Minimal Builds
For the smallest possible binary, use no features:
This gives you synchronous model loading, inference, and sampling with zero additional dependencies.
Building from Source
For the latest features or custom builds, compile Mullama from source.
Step 1: Clone the Repository
Submodules are Required
Mullama includes llama.cpp as a git submodule. If you cloned without --recurse-submodules:
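One way to detect and repair a clone that is missing its submodules is sketched below. It uses only standard git commands: `git submodule status` prefixes uninitialized entries with `-`, and `git submodule update --init --recursive` fetches them. The function names are hypothetical.

```shell
# Count the entries that `git submodule status` marks as uninitialized.
# Reads the status output on stdin; uninitialized lines start with "-".
count_uninitialized() {
  grep -c '^-' || true
}

# Fetch submodules only when at least one is uninitialized.
# Call this from the repository root.
ensure_submodules() {
  if [ "$(git submodule status | count_uninitialized)" -gt 0 ]; then
    git submodule update --init --recursive
  fi
}
```

Running `ensure_submodules` in a fresh clone is harmless when the submodules are already present, since it then does nothing.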
Step 2: Install Platform Dependencies
See Platform Setup for OS-specific packages. At minimum you need:
# Ubuntu/Debian
sudo apt install -y build-essential cmake pkg-config git
# macOS
xcode-select --install && brew install cmake pkg-config
# Windows: Install Visual Studio Build Tools + CMake
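Before building, it can save time to check that the basic tools are on PATH. The sketch below applies to Linux and macOS shells; `missing_tools` is a hypothetical helper, and `cc` is assumed as the compiler command, which may differ on your system.

```shell
# Print each named tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools drawn from the package lists above; no output means all were found.
missing_tools git cmake pkg-config cc
```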
Step 3: Build
# Debug build (faster compilation, slower runtime)
cargo build --features full
# Release build (slower compilation, optimized runtime)
cargo build --release --features full
# Build only the daemon
cargo build --release --features daemon
Step 4: Run Tests
# Run all tests
cargo test --features full
# Run specific test modules
cargo test --features async test_async_generation
From Git (Latest Unreleased)
Use the git dependency for bleeding-edge features:
[dependencies]
mullama = { git = "https://github.com/cognisoc/mullama.git", features = ["full"] }
Verifying Installation
Confirm that Mullama is correctly installed for your language.
Expected output:
Or in code:
Docker
Run Mullama in a container for reproducible deployments.
CPU-Only
FROM rust:1.77-slim AS builder
RUN apt-get update && apt-get install -y \
build-essential cmake pkg-config git \
libasound2-dev libpulse-dev libflac-dev libvorbis-dev libopus-dev \
libpng-dev libjpeg-dev libtiff-dev libwebp-dev \
ffmpeg libavcodec-dev libavformat-dev libavutil-dev \
libssl-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN git clone --recurse-submodules https://github.com/cognisoc/mullama.git .
RUN cargo build --release --features daemon
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y \
libasound2 libpulse0 libflac12 libvorbis0a libopus0 \
libpng16-16 libjpeg62-turbo libtiff6 libwebp7 \
ffmpeg ca-certificates \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/mullama /usr/local/bin/mullama
EXPOSE 8080
VOLUME ["/models"]
ENTRYPOINT ["mullama"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8080", "--model-dir", "/models"]
With NVIDIA GPU
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y \
build-essential cmake pkg-config git curl \
libasound2-dev libpulse-dev libflac-dev libvorbis-dev libopus-dev \
libpng-dev libjpeg-dev libtiff-dev libwebp-dev \
ffmpeg libavcodec-dev libavformat-dev libavutil-dev \
libssl-dev \
&& rm -rf /var/lib/apt/lists/*
# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
WORKDIR /app
RUN git clone --recurse-submodules https://github.com/cognisoc/mullama.git .
ENV LLAMA_CUDA=1
RUN cargo build --release --features daemon
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y \
libasound2 libpulse0 libflac8 libvorbis0a libopus0 \
libpng16-16 libjpeg-turbo8 libtiff5 libwebp7 \
ffmpeg ca-certificates \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/mullama /usr/local/bin/mullama
EXPOSE 8080
VOLUME ["/models"]
ENTRYPOINT ["mullama"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8080", "--model-dir", "/models"]
Running the Container
# CPU-only
docker build -t mullama .
docker run -p 8080:8080 -v ./models:/models mullama
# With NVIDIA GPU
docker build -f Dockerfile.cuda -t mullama-cuda .
docker run --gpus all -p 8080:8080 -v ./models:/models mullama-cuda
Docker Compose
services:
  mullama:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
    environment:
      - MULLAMA_MODEL=llama3.2:1b
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
Pre-built Binaries
Download pre-built daemon binaries from the GitHub Releases page.
Troubleshooting
npm install fails with compilation errors
Ensure you have build tools installed:
pip install fails on Linux
Install the Python development headers:
cargo build fails with 'llama.cpp not found'
Initialize git submodules:
git submodule update --init --recursive
Linking errors on macOS
Ensure Homebrew paths are configured:
Next Steps
- Platform Setup: Install OS-specific audio, image, and video dependencies
- GPU Acceleration: Configure GPU for faster inference
- Your First Project: Build a chatbot from scratch