Qwen¶

Qwen (Tongyi Qianwen) is Alibaba's family of large language models, known for strong multilingual capabilities and competitive performance.

Overview¶

Property	Value
Architecture	Decoder-only Transformer
Parameters	0.5B, 1.8B, 4B, 7B, 14B, 72B
Context Length	8K-128K (version dependent)
Attention	Grouped Query Attention
Position Encoding	RoPE with NTK-aware scaling
Activation	SwiGLU
Normalization	RMSNorm

Quick Start¶

use unillm::models_v2::qwen::{QwenModelV2, QwenConfig};
use unillm::weight_loader_core::WeightLoader;
use unillm::{Model, GenerationConfig};

// Load model
let weights = WeightLoader::from_gguf("qwen-7b.gguf")?;
let config = QwenConfig::from_gguf_metadata(weights.metadata())?;
let model = QwenModelV2::from_weights(config, weights)?;

// Generate
let response = model.generate(
    "Explain machine learning:",
    &GenerationConfig::default(),
)?;

Configuration¶

model_config!(QwenConfig {
    vocab_size: usize = 151936,
    hidden_size: usize = 4096,
    intermediate_size: usize = 11008,
    num_hidden_layers: usize = 32,
    num_attention_heads: usize = 32,
    num_key_value_heads: usize = 32,
    max_position_embeddings: usize = 8192,
    rope_theta: f32 = 10000.0,
    rms_norm_eps: f32 = 1e-6,
    use_sliding_window: bool = false,
    sliding_window: usize = 32768,
});

Model Sizes¶

Variant	hidden_size	num_layers	num_heads	kv_heads
Qwen 0.5B	1024	24	16	16
Qwen 1.8B	2048	24	16	16
Qwen 4B	2560	40	20	20
Qwen 7B	4096	32	32	32
Qwen 14B	5120	40	40	40
Qwen 72B	8192	80	64	8

Features¶

Extended Vocabulary¶

Qwen uses a large vocabulary optimized for multilingual support:

let config = QwenConfig {
    vocab_size: 151936,  // Large vocab for multilingual
    ..Default::default()
};

NTK-Aware RoPE Scaling¶

Dynamic position interpolation for extended context:

let config = QwenConfig {
    rope_theta: 1000000.0,  // High theta for long context
    max_position_embeddings: 32768,
    ..Default::default()
};

Sliding Window Attention¶

Optional sliding window for efficiency:

let config = QwenConfig {
    use_sliding_window: true,
    sliding_window: 32768,
    ..Default::default()
};

Qwen Versions¶

Qwen 1.5¶

Improved base model
Better instruction following
Extended context to 32K

Qwen 2¶

Latest release (2024)
Sizes: 0.5B to 72B
128K context for larger models
Strong coding abilities

Qwen 2.5¶

Most recent version
Improved reasoning
Better multilingual support

Loading from Ollama¶

use unillm::ollama::OllamaRegistry;

// Qwen 2
let path = OllamaRegistry::pull("qwen2:7b")?;

// Qwen 2.5
let path = OllamaRegistry::pull("qwen2.5:7b")?;

// Quantized
let path = OllamaRegistry::pull("qwen2:7b-q4_0")?;

Generation Examples¶

Chat Format¶

let prompt = r#"<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
"#;

let config = GenerationConfig {
    max_new_tokens: 256,
    temperature: 0.7,
    stop_sequences: vec!["<|im_end|>".to_string()],
    ..Default::default()
};

let response = model.generate(prompt, &config)?;

Multilingual¶

// Chinese
let response = model.generate("用中文解释人工智能", &config)?;

// Japanese
let response = model.generate("人工知能について説明してください", &config)?;

Memory Requirements¶

Model	F32	F16	Q8_0	Q4_K_M
0.5B	2 GB	1 GB	0.5 GB	0.3 GB
7B	28 GB	14 GB	7 GB	4 GB
14B	56 GB	28 GB	14 GB	8 GB
72B	288 GB	144 GB	72 GB	40 GB