
Model Catalog

UniLLM supports 47 model architectures across 10 categories. This catalog lists the parameter sizes, context lengths, and key features of each supported model.

Overview

| Category | Count | Description |
|---|---|---|
| Core LLMs | 5 | Modern decoder-only language models |
| GPT Family | 4 | Original GPT architecture variants |
| Code Models | 3 | Code generation specialists |
| MoE Models | 6 | Mixture-of-Experts architectures |
| RWKV Family | 3 | Linear attention / RNN models |
| Embedding Models | 4 | Encoder-only models for embeddings |
| Vision-Language | 8 | Multimodal vision-language models |
| Multimodal | 3 | Multi-input multimodal models |
| Audio | 4 | Speech and audio processing |
| Specialized | 7 | Encoder-decoder and unique architectures |

Core LLMs

Modern decoder-only transformer architectures.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| LLaMA | 7B-70B | 4K-128K | RoPE, GQA, SwiGLU |
| Qwen | 0.5B-72B | 8K-128K | Strong multilingual support |
| Gemma | 2B-7B | 8K | Google's open model |
| Phi | 1.3B-3.8B | 2K-128K | Efficient, high quality |
| Mistral | 7B | 8K-32K | Sliding-window attention |
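Two of the key features above reduce to simple index arithmetic. The sketch below (function names are hypothetical, not part of UniLLM's API) shows which key positions a query may attend to under Mistral-style sliding-window attention, and which key/value head a query head reads under grouped-query attention (GQA).

```rust
/// Returns true if query position `q` may attend to key position `k`
/// under causal attention limited to a window of `window` positions
/// (the window includes the query's own position).
fn sliding_window_allowed(q: usize, k: usize, window: usize) -> bool {
    // `k <= q` is checked first, so `q - k` cannot underflow
    k <= q && q - k < window
}

/// Maps a query head to the key/value head it shares under grouped-query
/// attention: each group of `n_q_heads / n_kv_heads` consecutive query
/// heads reads the same K/V head.
fn gqa_kv_head(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    assert!(n_kv_heads > 0 && n_q_heads % n_kv_heads == 0);
    q_head / (n_q_heads / n_kv_heads)
}
```

With Mistral 7B's published configuration (32 query heads, 8 KV heads, 4096-token window), query heads 0-3 share KV head 0, and a token at position 10,000 attends only to positions 5,905 through 10,000.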

GPT Family

Original GPT-style decoder architectures.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| GPT-2 | 124M-1.5B | 1K | Original architecture |
| GPT-J | 6B | 2K | Open-source GPT |
| GPT-NeoX | 20B | 2K | Rotary embeddings |
| OPT | 125M-175B | 2K | Meta's open model |
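Rotary position embeddings (RoPE), listed for GPT-NeoX and also used by LLaMA, encode position by rotating pairs of query/key components through a position-dependent angle. A minimal sketch, assuming adjacent-pair rotation and the common base of 10000; some implementations (including GPT-NeoX) instead pair dimension `i` with `i + d/2` and rotate only part of each head, so the pairing here is an assumption. `apply_rope` is a hypothetical name.

```rust
/// Applies rotary position embedding in place to one head vector `x`
/// at position `pos`. Pair (x[2i], x[2i+1]) is rotated by the angle
/// pos * base^(-2i/d), where d = x.len() must be even.
fn apply_rope(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    for i in 0..d / 2 {
        // Lower pairs rotate fastest; higher pairs rotate more slowly
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // Standard 2-D rotation of the pair
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}
```

Because each pair is rotated rather than scaled, the vector's norm is unchanged, and the dot product between a rotated query and a rotated key depends only on their relative offset.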

Code Models

Specialized for code generation and understanding.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| StarCoder | 1B-15B | 8K | Multi-language code |
| CodeLlama | 7B-34B | 16K | Code-specialized LLaMA |
| CodeGen | 350M-16B | 2K | Salesforce code model |

MoE Models

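Mixture-of-Experts architectures, which route each token to a small subset of expert feed-forward networks so that only a fraction of the total parameters is active per token.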

| Model | Total Parameters | Context | Key Features |
|---|---|---|---|
| Mixtral | 8x7B | 32K | Top-2 gating |
| DeepSeek-MoE | 16B | 4K | Fine-grained experts |
| DBRX | 132B | 32K | Databricks MoE |
| Grok | 314B | 8K | xAI MoE |
| Arctic | 480B | 4K | Snowflake MoE |
| Jamba | 52B | 256K | Mamba + attention hybrid |
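Mixtral's top-2 gating means a router scores every expert for each token, keeps the two highest-scoring experts, and mixes their outputs using a softmax taken over only those two scores. A minimal sketch of the routing step; `top_k_gating` is a hypothetical name, and the expert computation itself is omitted.

```rust
/// Top-k expert routing: keep the k largest router logits, then apply a
/// softmax over just those k to get the mixing weights.
/// Returns (expert_index, weight) pairs in descending order of logit.
fn top_k_gating(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    // Stable sort by descending logit; ties keep the lower expert index first
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    idx.truncate(k);
    // Numerically stable softmax over the selected logits only
    let max = idx.iter().map(|&i| logits[i]).fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = idx.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    idx.into_iter().zip(exps).map(|(i, e)| (i, e / sum)).collect()
}
```

For Mixtral, `logits` would have 8 entries (one per expert) and `k = 2`; the token's output is the weighted sum of the two selected experts' outputs.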

RWKV Family

Linear attention and RNN-based models.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| RWKV-4 | 169M-14B | No fixed limit | Time-mixing mechanism |
| RWKV-6 | 1.6B-14B | No fixed limit | Improved architecture |
| RecurrentGemma | 2B | 8K | Griffin architecture |
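RWKV has no fixed context limit because its time-mixing (WKV) operator can be computed recurrently with a fixed-size state. A minimal single-channel sketch of the RWKV-4-style recurrence, assuming a positive decay `w` and a current-token bonus `u`; it omits the overflow-avoiding max-subtraction used by real kernels and does not model RWKV-6's data-dependent decay. `wkv_recurrent` is a hypothetical name, not UniLLM's kernel.

```rust
/// Simplified RWKV-4 WKV operator for one channel in recurrent form.
/// Output at step t is a weighted average of past values (decayed by
/// e^{-w} per step) plus the current value boosted by e^{u}.
/// The state is two running sums, so memory does not grow with length.
fn wkv_recurrent(w: f64, u: f64, k: &[f64], v: &[f64]) -> Vec<f64> {
    assert_eq!(k.len(), v.len());
    let decay = (-w).exp();
    let (mut num, mut den) = (0.0_f64, 0.0_f64);
    let mut out = Vec::with_capacity(k.len());
    for (&kt, &vt) in k.iter().zip(v) {
        // The current token gets the extra bonus u
        let cur = (u + kt).exp();
        out.push((num + cur * vt) / (den + cur));
        // Fold the current token into the decayed state for later steps
        num = decay * num + kt.exp() * vt;
        den = decay * den + kt.exp();
    }
    out
}
```

Each step costs the same regardless of position, which is what lets RWKV process arbitrarily long sequences; how much earlier context it actually retains still depends on the learned decay.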

Embedding Models

Encoder-only models for text embeddings.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| BERT | 110M-340M | 512 | Bidirectional encoder |
| RoBERTa | 125M-355M | 512 | Optimized BERT |
| XLM-RoBERTa | 270M-550M | 512 | Multilingual |
| MPNet | 110M | 512 | Permuted language modeling |
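Encoder-only models output one vector per token, so producing a single text embedding needs a pooling step. A minimal sketch, assuming mean pooling over non-padding tokens (some models use the `[CLS]` vector instead) followed by cosine similarity to compare two embeddings; `mean_pool` and `cosine` are hypothetical names.

```rust
/// Mean-pools per-token embeddings into one sentence vector,
/// skipping positions where `mask` is false (e.g. padding).
fn mean_pool(tokens: &[Vec<f32>], mask: &[bool]) -> Vec<f32> {
    let dim = tokens.first().map_or(0, |t| t.len());
    let mut sum = vec![0.0_f32; dim];
    let mut count = 0usize;
    for (tok, &keep) in tokens.iter().zip(mask) {
        if keep {
            for (s, x) in sum.iter_mut().zip(tok) {
                *s += x;
            }
            count += 1;
        }
    }
    if count > 0 {
        for s in &mut sum {
            *s /= count as f32;
        }
    }
    sum
}

/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```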

Vision-Language

Models that understand both images and text.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| CLIP | 400M | 77 (text tokens) | Contrastive learning |
| LLaVA | 7B-13B | 4K | Visual instruction tuning |
| Qwen2-VL | 2B-72B | 32K | Native multimodal |
| Phi-3-Vision | 4B | 128K | Efficient VLM |
| InternVL | 1B-40B | 4K | Strong vision encoder |
| CogVLM | 17B | 4K | Visual expert modules |
| Idefics | 9B-80B | 4K | Open reproduction of Flamingo |
| Florence | 230M-770M | N/A | Microsoft vision foundation model |
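CLIP's contrastive learning trains an image encoder and a text encoder so that matching image/text pairs score higher than every mismatched pair in the batch. A minimal sketch of the symmetric cross-entropy objective, assuming L2-normalized embeddings and a given logit scale (CLIP learns this scale as a temperature); function names are hypothetical and this is a training objective, not an inference API.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cross-entropy of one row of logits against `target`, via stable log-sum-exp.
fn cross_entropy(row: &[f32], target: usize) -> f32 {
    let m = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let lse = m + row.iter().map(|x| (x - m).exp()).sum::<f32>().ln();
    lse - row[target]
}

/// Symmetric contrastive loss in the style of CLIP. `img[i]` and `txt[i]`
/// are a matching pair; all other pairings in the batch act as negatives.
fn clip_loss(img: &[Vec<f32>], txt: &[Vec<f32>], logit_scale: f32) -> f32 {
    let n = img.len();
    assert_eq!(n, txt.len());
    // logits[i][j] = scale * <img_i, txt_j>
    let logits: Vec<Vec<f32>> = img
        .iter()
        .map(|a| txt.iter().map(|b| logit_scale * dot(a, b)).collect())
        .collect();
    let mut loss = 0.0;
    for i in 0..n {
        // Column i holds every image's score against text i
        let col: Vec<f32> = (0..n).map(|j| logits[j][i]).collect();
        // Image-to-text loss (row) plus text-to-image loss (column)
        loss += cross_entropy(&logits[i], i) + cross_entropy(&col, i);
    }
    loss / (2.0 * n as f32)
}
```

Zero-shot classification reuses the same similarity scores: embed one caption per class and pick the class whose text embedding has the highest dot product with the image embedding.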

Multimodal

Multimodal architectures that connect a pretrained vision encoder to a language model.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| Flamingo | 3B-80B | 4K | Few-shot multimodal |
| BLIP-2 | 2.7B-12B | 512 | Q-Former architecture |
| PaLI | 3B-17B | N/A | Pathways multimodal |

Audio

Speech and audio processing models.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| Whisper | 39M-1.5B | 30s | Speech recognition |
| Wav2Vec2 | 95M-1B | N/A | Self-supervised ASR |
| HuBERT | 95M-1B | N/A | Hidden unit BERT |
| Encodec | 15M-80M | N/A | Neural audio codec |
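Whisper's 30s context means it consumes audio in fixed 30-second windows of 16 kHz mono samples (480,000 samples, converted to a log-Mel spectrogram before the encoder). The sketch below shows only a naive split-and-pad step with hypothetical names; Whisper's own long-form transcription shifts windows based on predicted timestamps rather than cutting at fixed boundaries.

```rust
/// Whisper expects 16 kHz mono audio in 30-second windows.
const SAMPLE_RATE: usize = 16_000;
const WINDOW_SECONDS: usize = 30;
const WINDOW_SAMPLES: usize = SAMPLE_RATE * WINDOW_SECONDS; // 480,000

/// Splits mono PCM samples into consecutive 30 s windows,
/// zero-padding the final window to full length.
fn whisper_windows(samples: &[f32]) -> Vec<Vec<f32>> {
    samples
        .chunks(WINDOW_SAMPLES)
        .map(|chunk| {
            let mut w = chunk.to_vec();
            w.resize(WINDOW_SAMPLES, 0.0);
            w
        })
        .collect()
}
```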

Specialized

Unique architectures and encoder-decoder models.

| Model | Parameters | Context | Key Features |
|---|---|---|---|
| T5 | 60M-11B | 512 | Text-to-text |
| BART | 140M-400M | 1K | Denoising autoencoder |
| Falcon | 7B-180B | 2K | FlashAttention optimized |
| Bloom | 560M-176B | 2K | Multilingual (46 languages) |
| StableLM | 1.6B-7B | 4K | Stability AI |
| MusicGen | 300M-3.3B | N/A | Music generation |

Quick Start

Loading a Model

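```rust
use unillm::models_v2::llama::{LlamaConfig, LlamaModelV2};
use unillm::weight_loader_core::WeightLoader;
use unillm::Model;
// `GenerationConfig` must also be in scope; its module path is not shown here.

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load tensors and metadata from a GGUF file
    let weights = WeightLoader::from_gguf("llama-7b.gguf")?;
    // Build the model configuration from the GGUF metadata
    let config = LlamaConfig::from_gguf_metadata(weights.metadata())?;
    // Construct the model (takes ownership of `weights`)
    let model = LlamaModelV2::from_weights(config, weights)?;

    // Generate text with default generation settings
    let response = model.generate("Hello", &GenerationConfig::default())?;
    Ok(())
}
```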

Loading from Ollama

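```rust
use unillm::ollama::OllamaRegistry;
use unillm::weight_loader_core::WeightLoader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download from the Ollama registry, then load the local GGUF file
    let path = OllamaRegistry::pull("llama2:7b")?;
    let weights = WeightLoader::from_gguf(&path)?;
    Ok(())
}
```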

Supported Formats

All models support the following weight formats:

| Format | Extension | Quantization | Notes |
|---|---|---|---|
| GGUF | .gguf | Q2-Q8, F16, F32 | Recommended |
| SafeTensors | .safetensors | F16, F32 | Hugging Face format |
| PyTorch | .bin, .pt | F16, F32 | Legacy support |
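Picking a loader usually starts from the file extension, and the first bytes of the file can confirm the guess: GGUF files begin with the ASCII magic `GGUF`, and SafeTensors files begin with an 8-byte little-endian unsigned integer giving the length of the JSON header that follows. A minimal sketch; `WeightFormat` and the helper functions are hypothetical and not part of UniLLM's API, and the extension match is case-sensitive.

```rust
use std::path::Path;

/// Hypothetical enum for the three formats listed in the table.
#[derive(Debug, PartialEq)]
enum WeightFormat {
    Gguf,
    SafeTensors,
    PyTorch,
}

/// Guesses the weight format from the file extension.
fn format_from_extension(path: &Path) -> Option<WeightFormat> {
    match path.extension()?.to_str()? {
        "gguf" => Some(WeightFormat::Gguf),
        "safetensors" => Some(WeightFormat::SafeTensors),
        "bin" | "pt" => Some(WeightFormat::PyTorch),
        _ => None,
    }
}

/// GGUF files start with the 4-byte magic "GGUF".
fn has_gguf_magic(header: &[u8]) -> bool {
    header.starts_with(b"GGUF")
}

/// Reads the JSON header length from the first 8 bytes of a SafeTensors file.
fn safetensors_header_len(bytes: &[u8]) -> Option<u64> {
    let prefix = bytes.get(..8)?;
    let mut arr = [0u8; 8];
    arr.copy_from_slice(prefix);
    Some(u64::from_le_bytes(arr))
}
```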

Model Selection Guide

By Use Case

| Use Case | Recommended Models |
|---|---|
| General chat | LLaMA 3, Qwen 2, Mistral |
| Code generation | CodeLlama, StarCoder |
| Creative writing | Mixtral, LLaMA 3 70B |
| Embeddings | BERT, MPNet |
| Image understanding | LLaVA, Qwen2-VL |
| Speech transcription | Whisper |

By Hardware

| Hardware | Recommended Models |
|---|---|
| Consumer GPU (8GB) | Phi-3, Gemma 2B, Q4 quantized 7B |
| Gaming GPU (16GB) | 7B models |
| Pro GPU (24GB+) | 13B-34B models (Q4); Mixtral Q4 on cards with about 32GB or more |
| Multi-GPU | 70B+ models |
| CPU only | Phi-2, Q4 quantized 7B |
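The hardware tiers follow from a back-of-the-envelope estimate: weight memory is roughly the parameter count times the bits stored per weight, divided by 8. A minimal sketch with a hypothetical `weight_gb` helper; it covers weights only, and the KV cache, activations, and runtime overhead all add to the real requirement.

```rust
/// Approximate size of model weights in gigabytes (10^9 bytes):
/// billions of parameters x bits per weight / 8.
/// Excludes KV cache, activations, and runtime overhead.
fn weight_gb(params_billions: f64, bits_per_weight: f64) -> f64 {
    params_billions * bits_per_weight / 8.0
}
```

By this estimate a 7B model needs about 14 GB at F16 and about 3.9 GB at GGUF Q4_0, which averages 4.5 bits per weight because each block of 32 weights also stores a 16-bit scale. Mixtral 8x7B (about 46.7B total parameters) needs roughly 26 GB at Q4, more than a 16GB or 24GB card holds.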