User Guide

This guide covers everything you need to know to use UniLLM effectively.

Overview

UniLLM provides a unified interface for working with large language models. The guide is organized into four topics:

  • Loading Models - How to load models from different formats
  • Running Inference - Forward pass and batching
  • Text Generation - Autoregressive generation with sampling
  • Configuration - Customizing model and generation settings

Core Concepts

The Model Trait

All models in UniLLM implement the Model trait:

pub trait Model: Send + Sync {
    type Config: ModelConfig;

    fn new(config: Self::Config) -> Result<Self>;
    fn from_weights(config: Self::Config, weights: ModelWeights) -> Result<Self>;
    fn forward(&self, inputs: &ModelInputs) -> Result<ModelOutputs>;
    fn generate(&self, prompt: &str, config: &GenerationConfig) -> Result<String>;
    fn to_device(&mut self, device: &Device) -> Result<()>;
}

The trait provides a consistent interface across all 47 supported architectures, so code written against Model does not depend on any one of them.
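As a rough illustration of what implementing the trait involves, here is a minimal sketch with a toy model. Everything except the trait's method names is an assumption: Error, Result, ModelWeights, ModelInputs, ModelOutputs, GenerationConfig and Device are reduced stand-ins rather than UniLLM's real definitions, and EchoConfig, EchoModel and complete are hypothetical names invented for the example. The `where Self: Sized` bounds on the constructors are also an addition, needed for this standalone version to compile.

```rust
// Hypothetical, deliberately reduced stand-ins for UniLLM's types.
// The real definitions are not shown in this guide.
#[derive(Debug)]
pub struct Error(pub String);
pub type Result<T> = std::result::Result<T, Error>;

pub trait ModelConfig {}
pub struct ModelWeights;
pub struct ModelInputs { pub input_ids: Vec<u32> }
pub struct ModelOutputs { pub logits: Vec<f32> }
pub struct GenerationConfig { pub max_new_tokens: usize }

// `CUDA(index)` follows the quick reference; `Cpu` is an assumed variant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Device { Cpu, CUDA(usize) }

// The trait as listed above, with `where Self: Sized` added on the
// constructors (an assumption made for this standalone sketch).
pub trait Model: Send + Sync {
    type Config: ModelConfig;

    fn new(config: Self::Config) -> Result<Self> where Self: Sized;
    fn from_weights(config: Self::Config, weights: ModelWeights) -> Result<Self>
    where Self: Sized;
    fn forward(&self, inputs: &ModelInputs) -> Result<ModelOutputs>;
    fn generate(&self, prompt: &str, config: &GenerationConfig) -> Result<String>;
    fn to_device(&mut self, device: &Device) -> Result<()>;
}

// A toy "architecture" with no learned parameters.
pub struct EchoConfig { pub vocab_size: usize }
impl ModelConfig for EchoConfig {}

pub struct EchoModel { pub vocab_size: usize, pub device: Device }

impl Model for EchoModel {
    type Config = EchoConfig;

    fn new(config: EchoConfig) -> Result<Self> {
        if config.vocab_size == 0 {
            return Err(Error("vocab_size must be positive".into()));
        }
        Ok(EchoModel { vocab_size: config.vocab_size, device: Device::Cpu })
    }

    fn from_weights(config: EchoConfig, _weights: ModelWeights) -> Result<Self> {
        // The toy model has nothing to load, so the weights are ignored.
        Self::new(config)
    }

    fn forward(&self, inputs: &ModelInputs) -> Result<ModelOutputs> {
        // Return one-hot "logits" that point at the last input token.
        let last = inputs.input_ids.last().ok_or_else(|| Error("empty input".into()))?;
        let last = *last as usize;
        if last >= self.vocab_size {
            return Err(Error("token id out of range".into()));
        }
        let mut logits = vec![0.0; self.vocab_size];
        logits[last] = 1.0;
        Ok(ModelOutputs { logits })
    }

    fn generate(&self, prompt: &str, config: &GenerationConfig) -> Result<String> {
        // "Generate" by repeating the last word of the prompt.
        let last_word = prompt.split_whitespace().last().unwrap_or("");
        let mut out = prompt.to_string();
        for _ in 0..config.max_new_tokens {
            out.push(' ');
            out.push_str(last_word);
        }
        Ok(out)
    }

    fn to_device(&mut self, device: &Device) -> Result<()> {
        // Only records the target device; a real model would move its tensors.
        self.device = *device;
        Ok(())
    }
}

// Generic code depends only on the trait, not on a concrete architecture.
pub fn complete<M: Model>(model: &M, prompt: &str, max_new_tokens: usize) -> Result<String> {
    model.generate(prompt, &GenerationConfig { max_new_tokens })
}
```

Because complete is generic over M: Model, the same call site would work for any type implementing the trait, which is the point of having a single interface.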

Input/Output Types

UniLLM uses unified input/output types:

// Input types
enum ModelInputs {
    Text { input_ids, attention_mask, position_ids },
    Image { pixel_values, image_mask },
    Multimodal { input_ids, pixel_values, ... },
    Audio { input_features, attention_mask },
}

// Output types
enum ModelOutputs {
    Logits { logits, hidden_states },
    Embeddings { embeddings, pooled },
    Multimodal { text_logits, image_features, ... },
}
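The listing above gives only field names, so the following sketch fills in types purely as assumptions (Vec<u32> for token ids, Vec<u8> for masks, Vec<f32> for dense values, Option for fields guessed to be optional) to show how code might branch on the variants. The Multimodal variants are left out because their field lists are elided in the guide, and the helpers modality and argmax_logit are hypothetical, not part of UniLLM.

```rust
// Assumed field types; the guide lists only the field names.
// The Multimodal variants are omitted because their fields are elided.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelInputs {
    Text { input_ids: Vec<u32>, attention_mask: Vec<u8>, position_ids: Option<Vec<u32>> },
    Image { pixel_values: Vec<f32>, image_mask: Option<Vec<u8>> },
    Audio { input_features: Vec<f32>, attention_mask: Vec<u8> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum ModelOutputs {
    Logits { logits: Vec<f32>, hidden_states: Option<Vec<f32>> },
    Embeddings { embeddings: Vec<f32>, pooled: Option<Vec<f32>> },
}

/// Hypothetical helper: names the modality carried by an input value.
pub fn modality(inputs: &ModelInputs) -> &'static str {
    match inputs {
        ModelInputs::Text { .. } => "text",
        ModelInputs::Image { .. } => "image",
        ModelInputs::Audio { .. } => "audio",
    }
}

/// Hypothetical helper: index of the largest logit, or None when the
/// output holds no logits (an embeddings output or an empty vector).
pub fn argmax_logit(outputs: &ModelOutputs) -> Option<usize> {
    match outputs {
        ModelOutputs::Logits { logits, .. } => logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(index, _)| index),
        ModelOutputs::Embeddings { .. } => None,
    }
}
```

Matching exhaustively on the enum means the compiler flags every call site that needs updating when a new input or output kind is added, which is one reason a unified enum can be preferable to separate per-modality structs.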

Quick Reference

Task            API
Create model    Model::new(config)
Load weights    WeightLoader::from_gguf(path)
Forward pass    model.forward(&inputs)
Generate text   model.generate(prompt, &gen_config)
Move to GPU     model.to_device(&Device::CUDA(0))

Sections