Skip to content

Web UI

Mullama includes an embedded Web UI built with Vue 3, Vite, and Tailwind CSS that provides a browser-based interface for managing models, chatting, testing APIs, and monitoring the daemon.

Access URL: http://localhost:8080/ui/

Overview

The Web UI provides five main views:

View Path Description
Dashboard /ui/ System status, loaded models, quick actions
Models /ui/models Browse, download, load, and manage models
Chat /ui/chat Multi-conversation chat with streaming
Playground /ui/playground API testing and curl generation
Settings /ui/settings Theme, defaults, and display options

Building the Web UI

The Web UI is a Vue.js single-page application located in the ui/ directory. It must be built before being embedded into the daemon binary.

Prerequisites

  • Node.js (v18+)
  • npm

Build Steps

# Navigate to the UI directory
cd ui

# Install dependencies
npm install

# Build the production bundle
npm run build

# Return to project root
cd ..

Build Daemon with Embedded UI

After building the UI, compile the daemon binary with the embedded-ui feature flag:

cargo build --release --features daemon,embedded-ui

Build Order

The UI must be built before compiling the Rust binary. The embedded-ui feature uses Rust's include_dir! macro to embed the built UI assets at compile time.

Development Mode

For UI development with hot-reload:

cd ui
npm run dev

This starts a Vite dev server (typically on port 5173) that proxies API requests to the Mullama daemon running on port 8080. Make sure the daemon is running separately:

# In another terminal
mullama serve --model llama3.2:1b

Accessing the Web UI

Once the daemon is running with the embedded UI:

# Start the daemon
mullama serve --model llama3.2:1b

# Open in browser
open http://localhost:8080/ui/

The UI is served at /ui/ and all sub-paths (/ui/*) route to the Vue.js SPA using client-side routing.


Dashboard

The Dashboard provides an at-a-glance overview of the daemon state:

System Status Card

  • Uptime -- How long the daemon has been running
  • Version -- Mullama daemon version
  • HTTP Endpoint -- Address and port of the API server
  • IPC Socket -- Path to the IPC socket

Loaded Models Card

  • List of all currently loaded models
  • Each model shows: name, parameter count, GPU layers, context size
  • Default model indicator
  • Active request count per model
  • Quick actions: unload, set as default

Quick Actions

  • Load Default Model -- One-click to download and load a recommended model
  • Open Chat -- Navigate to the chat interface
  • View Metrics -- Link to raw Prometheus metrics

Statistics

  • Total Requests -- Lifetime request count
  • Tokens Generated -- Total tokens produced
  • Active Requests -- Currently processing
  • GPU Available -- Whether GPU acceleration is active

Models Page

The Models view provides comprehensive model management.

Browse Available Models

A card grid of pre-configured default models with:

  • Model name and description
  • Size indicator (1B, 7B, 14B, etc.)
  • Capability tags (chat, reasoning, code, vision, embeddings)
  • Download button with size estimate
  • Status indicator (not downloaded, downloading, available, loaded)

Download Progress

When pulling models, the UI displays:

  • File name and total size
  • Progress bar with percentage
  • Download speed (MB/s)
  • Estimated time remaining
  • Cancel button

Loaded Models Management

For each loaded model:

  • Unload -- Free memory by removing from inference engine
  • Set Default -- Make this the default model for API requests
  • Details -- View parameters, context size, GPU layers, file path
  • Active Requests -- Count of in-flight requests

Custom Model Loading

Form to load a custom model:

  • Model alias (text input)
  • GGUF file path (file picker or text input)
  • GPU layers (slider: 0-99)
  • Context size (dropdown: 2048, 4096, 8192, 16384, 32768)
  • Set as default (checkbox)

Chat

The Chat view provides a rich conversational interface.

Features

  • Real-time Streaming -- Tokens appear as they are generated via Server-Sent Events
  • Markdown Rendering -- Full markdown support including headings, lists, bold, italic, links
  • Code Highlighting -- Syntax-highlighted code blocks for 50+ languages with copy button
  • Thinking Display -- Collapsible reasoning blocks for models with thinking tokens
  • Model Selection -- Dropdown to switch between loaded models mid-conversation
  • Conversation History -- Sidebar with multiple conversations
  • System Prompts -- Configure system prompts per conversation
  • Stop Generation -- Button to halt streaming generation
  • Token Counter -- Shows prompt and completion token counts
  • Speed Display -- Tokens per second during generation

Conversation Management

  • New Conversation -- Start fresh with a clean context
  • Rename -- Give conversations meaningful names
  • Delete -- Remove conversations
  • Export -- Download as markdown

Thinking Display

For models configured with thinking tokens (e.g., DeepSeek-R1):

[Thinking] (click to expand)
  Let me work through this step by step...
  First, I need to consider the problem...
  The key insight is...

[Response]
The answer is 42.

The thinking section is collapsed by default and can be expanded with a click.

Code Blocks

Code blocks in responses feature:

  • Language detection and label
  • Syntax highlighting (highlight.js)
  • One-click copy to clipboard button
  • Line numbers (optional, toggled in settings)

Playground

The Playground provides direct API testing capabilities with a form-based interface.

Request Builder

  • Endpoint Selector -- Choose between:

    • Chat Completions (/v1/chat/completions)
    • Text Completions (/v1/completions)
    • Embeddings (/v1/embeddings)
    • Anthropic Messages (/v1/messages)
    • Raw Generate (/api/generate)
  • Model Selector -- Dropdown of loaded models

  • Messages Editor -- Add/remove/edit messages with role selection (system, user, assistant)

  • Prompt Input -- For text completion endpoint

Parameter Tuning

Adjustable parameters with sliders and inputs:

Parameter Control Range
Temperature Slider 0.0 - 2.0
Top P Slider 0.0 - 1.0
Top K Number input 1 - 100
Max Tokens Number input 1 - 32768
Presence Penalty Slider -2.0 - 2.0
Frequency Penalty Slider -2.0 - 2.0
Stream Toggle on/off

Response Viewer

  • Formatted JSON response with syntax highlighting
  • Expandable/collapsible JSON tree
  • Response metadata (status code, timing, token counts)
  • Raw text view for generated content

curl Generation

Every request configuration can be exported as a curl command:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": false
  }'

Copy-to-clipboard button for immediate terminal use.


Settings

The Settings view manages UI preferences and defaults.

Theme

  • Auto -- Follow system preference (light/dark)
  • Light -- Light color scheme
  • Dark -- Dark color scheme

Generation Defaults

Default values for the Playground and Chat:

  • Default temperature
  • Default max tokens
  • Default top_p
  • Default model (when multiple are loaded)

Display Options

  • Code highlighting theme (vs-dark, github, monokai, etc.)
  • Show line numbers in code blocks
  • Markdown rendering (on/off)
  • Show timestamps on messages
  • Show token counts

Connection

  • API endpoint URL (default: auto-detect from page URL)
  • API key (if configured on the server)

Technology Stack

Component Technology Purpose
Framework Vue 3 (Composition API) Reactive UI components
Build Tool Vite Fast builds, HMR in development
Styling Tailwind CSS Utility-first CSS framework
Icons Heroicons UI icons
Markdown markdown-it Markdown rendering
Code Highlighting highlight.js Syntax highlighting
HTTP Client Fetch API API communication
Streaming EventSource / fetch + ReadableStream SSE and NDJSON streaming
State Management Vue reactivity Application state
Routing Vue Router Client-side routing

API Communication

The Web UI communicates with the daemon through the same REST API documented in the REST API, OpenAI API, and Anthropic API pages.

UI Action API Endpoint Method
Dashboard status /api/system/status GET
List models /api/models GET
Load model /api/models/load POST
Unload model /api/models/:name/unload POST
Pull model /api/models/pull POST
Default models /api/defaults GET
Use default model /api/defaults/:name/use POST
Chat (streaming) /v1/chat/completions POST
Embeddings /v1/embeddings POST

Building Without Embedded UI

If you do not need the Web UI, build the daemon without the embedded-ui feature:

cargo build --release --features daemon

In this case, accessing /ui/ will return a message indicating the UI is not available:

{
  "error": "Web UI not available. Build with --features embedded-ui"
}

URL Routes

Route View Description
/ui/ Dashboard System overview and quick actions
/ui/chat Chat Conversational interface
/ui/chat/:id Chat Specific conversation
/ui/models Models Model management
/ui/playground Playground API testing
/ui/settings Settings UI configuration

All routes use client-side routing. Refreshing any page works correctly as the server returns the SPA for all /ui/* paths.


Troubleshooting

UI Not Loading

Common Issues

  • Ensure the daemon was built with --features daemon,embedded-ui
  • Check that npm run build completed successfully in the ui/ directory before compiling
  • Verify the daemon is running: mullama daemon status
  • Check the correct port: curl http://localhost:8080/ui/

API Connection Errors

  • The UI connects to the same host/port it is served from
  • Ensure no firewall rules block the HTTP port (default: 8080)
  • Check CORS is not being blocked (the daemon enables permissive CORS by default)
  • If using a reverse proxy, ensure WebSocket/SSE passthrough is configured

Stale UI Build

If the UI shows outdated content after code changes:

cd ui
rm -rf dist node_modules
npm install
npm run build
cd ..
cargo build --release --features daemon,embedded-ui

Streaming Not Working

If chat responses appear all at once instead of streaming:

  • Check that proxy_buffering off is set in your nginx config
  • Ensure chunked_transfer_encoding off is not blocking SSE
  • Verify the daemon's streaming endpoint works directly: curl -N http://localhost:8080/v1/chat/completions ...

Dark Mode Issues

If the theme does not match your system preference:

  • Use the Settings page to explicitly set light or dark mode
  • Check that your browser supports the prefers-color-scheme media query
  • Hard refresh the page (Ctrl+Shift+R) after changing system theme