Llamafu¶

On-device LLM inference for Flutter applications

Llamafu is a Flutter FFI plugin that brings the power of llama.cpp to mobile and desktop applications. Run large language models directly on-device with no cloud dependency.

Features¶

Text Generation - Generate text completions with customizable parameters
Multimodal Support - Process images and audio with vision-language models
Chat Sessions - Manage conversations with proper chat templates
LoRA Adapters - Load and apply fine-tuned adapters at runtime
Streaming - Real-time token-by-token output
Cross-Platform - Android, iOS, macOS, Linux, and Windows

Quick Example¶

import 'package:llamafu/llamafu.dart';

void main() async {
  // Initialize with a GGUF model
  final llamafu = await Llamafu.init(
    modelPath: 'models/llama-3.2-1b-q4.gguf',
    contextSize: 2048,
  );

  // Generate text
  final response = await llamafu.complete(
    'Explain quantum computing in simple terms:',
    maxTokens: 256,
    temperature: 0.7,
  );

  print(response);

  // Clean up
  llamafu.dispose();
}

Supported Models¶

Llamafu supports any model in GGUF format compatible with llama.cpp:

Model Type	Examples
Text LLMs	Llama 3.x, Mistral, Phi, Qwen, SmolLM
Vision LLMs	LLaVA, nanoLLaVA, Qwen2-VL, InternVL
Audio LLMs	Ultravox, Qwen2-Audio

Requirements¶

Flutter 3.16+
Dart 3.2+
GGUF model files (quantized recommended for mobile)

Installation¶

Add to your pubspec.yaml:

dependencies:
  llamafu: ^1.0.0

See the Installation Guide for detailed setup instructions.

Platform Support¶

Platform	Status	GPU Acceleration
Android	:white_check_mark: Supported	CPU only (NNAPI planned)
iOS	:white_check_mark: Supported	Metal
macOS	:white_check_mark: Supported	Metal
Linux	:white_check_mark: Supported	CPU (CUDA optional)
Windows	:white_check_mark: Supported	CPU (CUDA optional)

Getting Help¶

GitHub Issues - Bug reports and feature requests
API Reference - Complete API documentation
Examples - Working code examples