Phi¶
Phi is Microsoft's family of small language models. They achieve strong benchmark performance for their size, which makes them well suited to edge deployment.
Overview¶
| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer |
| Parameters | 1.3B (Phi-1), 2.7B (Phi-2), 3.8B-14B (Phi-3) |
| Context Length | 2K-128K (version dependent) |
| Attention | Multi-head (Phi-1/2, Phi-3 Mini) / Grouped Query (Phi-3 Small, Medium) |
| Position Encoding | RoPE |
| Activation | GELU (Phi-1/2), SwiGLU (Phi-3) |
| Normalization | LayerNorm (Phi-1/2), RMSNorm (Phi-3) |
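The two normalization schemes in the table differ only in whether the mean is subtracted. The sketch below is a standalone illustration, not unillm code: the function names are hypothetical, and the learned scale and bias parameters are omitted for brevity.

```rust
/// LayerNorm without learned scale/bias: subtract the mean, then divide by
/// the standard deviation.
fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter().map(|v| (v - mean) * inv_std).collect()
}

/// RMSNorm without learned scale: divide by the root mean square.
/// There is no mean subtraction, which saves one pass over the data.
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / n;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * inv_rms).collect()
}
```

The `eps` argument plays the same role as `rms_norm_eps` in the configuration: it keeps the division stable when the input is close to zero.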
Quick Start¶
```rust
use unillm::models_v2::phi::{PhiModelV2, PhiConfig};
use unillm::weight_loader_core::WeightLoader;
use unillm::{Model, GenerationConfig};

// Load model
let weights = WeightLoader::from_gguf("phi-3-mini.gguf")?;
let config = PhiConfig::from_gguf_metadata(weights.metadata())?;
let model = PhiModelV2::from_weights(config, weights)?;

// Generate
let response = model.generate(
    "Write a Python function to calculate factorial:",
    &GenerationConfig::default(),
)?;
```
Configuration¶
```rust
model_config!(PhiConfig {
    vocab_size: usize = 32064,
    hidden_size: usize = 3072,
    intermediate_size: usize = 8192,
    num_hidden_layers: usize = 32,
    num_attention_heads: usize = 32,
    num_key_value_heads: usize = 32,
    max_position_embeddings: usize = 4096,
    rope_theta: f32 = 10000.0,
    rms_norm_eps: f32 = 1e-5,
    partial_rotary_factor: f32 = 0.5,
});
```
Model Sizes¶
| Variant | hidden_size | num_layers | num_heads | Context |
|---|---|---|---|---|
| Phi-1 | 2048 | 24 | 32 | 2K |
| Phi-1.5 | 2048 | 24 | 32 | 2K |
| Phi-2 | 2560 | 32 | 32 | 2K |
| Phi-3 Mini | 3072 | 32 | 32 | 4K or 128K |
| Phi-3 Small | 4096 | 32 | 32 | 8K or 128K |
| Phi-3 Medium | 5120 | 40 | 40 | 4K or 128K |
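The per-head dimension is not listed in the table, but it follows from hidden_size / num_heads. The sketch below derives it for each row; `head_dim`, `VARIANTS`, and `head_dims` are illustrative names, not part of unillm.

```rust
/// Per-head dimension, or None when hidden_size does not divide evenly.
fn head_dim(hidden_size: usize, num_heads: usize) -> Option<usize> {
    if num_heads == 0 || hidden_size % num_heads != 0 {
        return None;
    }
    Some(hidden_size / num_heads)
}

/// (variant, hidden_size, num_heads) rows copied from the table above.
const VARIANTS: [(&str, usize, usize); 6] = [
    ("Phi-1", 2048, 32),
    ("Phi-1.5", 2048, 32),
    ("Phi-2", 2560, 32),
    ("Phi-3 Mini", 3072, 32),
    ("Phi-3 Small", 4096, 32),
    ("Phi-3 Medium", 5120, 40),
];

/// Head dimension for every variant in the table.
fn head_dims() -> Vec<(&'static str, Option<usize>)> {
    VARIANTS
        .iter()
        .map(|&(name, hidden, heads)| (name, head_dim(hidden, heads)))
        .collect()
}
```

For the rows above this gives 64 (Phi-1, Phi-1.5), 80 (Phi-2), 96 (Phi-3 Mini), and 128 (Phi-3 Small and Medium).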
Features¶
Partial Rotary Embedding¶
Phi can apply RoPE to only a fraction of each attention head's dimensions; the remaining dimensions pass through unrotated:
```rust
let config = PhiConfig {
    partial_rotary_factor: 0.5, // Apply RoPE to half of head dim
    ..Default::default()
};
```
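To make the effect of partial_rotary_factor concrete, here is a minimal standalone sketch of partial RoPE on a single head vector. It assumes the "rotate-half" pairing (element i is paired with element i + rotary_dim / 2); the function name and signature are hypothetical, and unillm's internal implementation may differ.

```rust
/// Rotate the first `rotary_dim` entries of one attention-head vector and
/// leave the remaining entries untouched.
fn apply_partial_rope(
    head: &[f32],
    position: usize,
    partial_rotary_factor: f32,
    rope_theta: f32,
) -> Vec<f32> {
    let head_dim = head.len();
    // Number of rotated dimensions, capped at head_dim and rounded down to even.
    let scaled = (head_dim as f32 * partial_rotary_factor) as usize;
    let rotary_dim = (scaled.min(head_dim) / 2) * 2;
    let half = rotary_dim / 2;

    let mut out = head.to_vec();
    for i in 0..half {
        // Frequency of pair i: rope_theta^(-2i / rotary_dim).
        let inv_freq = rope_theta.powf(-(2.0 * i as f32) / rotary_dim as f32);
        let (sin, cos) = (position as f32 * inv_freq).sin_cos();
        let (a, b) = (head[i], head[i + half]);
        // 2-D rotation of the pair (a, b) by the position-dependent angle.
        out[i] = a * cos - b * sin;
        out[i + half] = b * cos + a * sin;
    }
    out
}
```

With a factor of 0.5 and a head dimension of 96, only the first 48 entries carry positional information; the other 48 are copied through unchanged.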
Extended Context (Phi-3)¶
The 128K Phi-3 variants extend the context window with RoPE scaling. Raise max_position_embeddings to match:
```rust
let config = PhiConfig {
    max_position_embeddings: 131072, // 128K context
    rope_theta: 10000.0,
    ..Default::default()
};
```
Phi Versions¶
Phi-1 (1.3B)¶
- Code-focused training
- Strong at Python generation
- 2K context
Phi-2 (2.7B)¶
- General purpose
- Textbooks and synthetic data
- Better reasoning than Phi-1
Phi-3 (3.8B-14B)¶
- Released in 2024
- Up to 128K context
- Competitive with much larger models
- Available in Mini (3.8B), Small (7B), and Medium (14B)
Phi-3-Vision¶
Multimodal variant. See Phi-3-Vision documentation.
Loading from Ollama¶
```rust
use unillm::ollama::OllamaRegistry;

// Phi-2
let path = OllamaRegistry::pull("phi:2.7b")?;

// Phi-3
let path = OllamaRegistry::pull("phi3:mini")?;
let path = OllamaRegistry::pull("phi3:medium")?;

// Quantized
let path = OllamaRegistry::pull("phi3:mini-q4_0")?;
```
Generation Examples¶
Code Generation¶
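```rust
let config = GenerationConfig {
    temperature: 0.2, // Low temperature keeps code output focused
    max_new_tokens: 256,
    ..Default::default()
};

// The docstring is indented so the prompt is valid Python for the model to continue.
let prompt = "def fibonacci(n):\n    '''Return the nth Fibonacci number'''\n";
let response = model.generate(prompt, &config)?;
```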
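A low temperature such as 0.2 suits code generation because it concentrates probability on the top-scoring tokens. The sketch below shows the usual way temperature is applied, dividing the logits before the softmax. It is a standalone illustration with a hypothetical function name, and it makes no claim about how unillm's sampler is implemented.

```rust
/// Softmax over logits divided by `temperature` (must be > 0).
/// Lower temperatures sharpen the distribution; higher ones flatten it.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the largest logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

For logits of 2.0, 1.0, and 0.0, the top token receives roughly two thirds of the probability at temperature 1.0 and more than 99% at temperature 0.2.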
Instruction Following (Phi-3)¶
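```rust
// Phi-3 chat layout: the role tag on its own line, then the message
// followed directly by <|end|>, then an open assistant turn.
let prompt = "<|user|>\nExplain quantum entanglement in simple terms.<|end|>\n<|assistant|>\n";

let config = GenerationConfig {
    stop_sequences: vec!["<|end|>".to_string()], // Stop when the assistant turn closes
    ..Default::default()
};
let response = model.generate(prompt, &config)?;
```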
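For multi-turn conversations, the prompt string can be assembled programmatically. The sketch below assumes the chat layout published with the Phi-3 instruct models, in which `<|end|>` follows the message text directly. `Turn`, `format_phi3_prompt`, and `truncate_at_stop` are hypothetical helpers, not unillm APIs. The second helper shows what a stop sequence amounts to on the output side; whether unillm already strips the stop sequence from the response is not covered here.

```rust
/// One chat turn: a role name ("system", "user", "assistant") and its text.
struct Turn<'a> {
    role: &'a str,
    content: &'a str,
}

/// Render the turns in the Phi-3 chat layout and open an assistant turn
/// for the model to complete.
fn format_phi3_prompt(turns: &[Turn<'_>]) -> String {
    let mut prompt = String::new();
    for turn in turns {
        prompt.push_str(&format!("<|{}|>\n{}<|end|>\n", turn.role, turn.content));
    }
    prompt.push_str("<|assistant|>\n");
    prompt
}

/// Cut generated text at the earliest occurrence of any stop sequence.
/// Returns the text unchanged when no stop sequence is present.
fn truncate_at_stop<'a>(text: &'a str, stop_sequences: &[&str]) -> &'a str {
    let cut = stop_sequences
        .iter()
        .filter_map(|stop| text.find(*stop))
        .min()
        .unwrap_or(text.len());
    &text[..cut]
}
```

Passing a single user turn reproduces the shape of the prompt shown above, and the resulting `String` can be handed to `model.generate` as `&str`.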
Memory Requirements¶
| Model | F32 | F16 | Q8_0 | Q4_K_M |
|---|---|---|---|---|
| Phi-2 | 11 GB | 5.5 GB | 2.8 GB | 1.6 GB |
| Phi-3 Mini | 15 GB | 7.5 GB | 3.8 GB | 2.2 GB |
| Phi-3 Medium | 56 GB | 28 GB | 14 GB | 8 GB |
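The figures above can be approximated as parameter count times bits per weight. The sketch below is a rough estimator for the weights alone, excluding the KV cache and activations. The function names are hypothetical; the Q8_0 value follows from its block layout (32 int8 values plus one f16 scale, 34 bytes per block), while the Q4_K_M value is an assumed average, since that scheme mixes block types.

```rust
/// Approximate bits stored per weight for each format.
fn bits_per_weight(format: &str) -> Option<f64> {
    match format {
        "F32" => Some(32.0),
        "F16" => Some(16.0),
        // 34 bytes per block of 32 weights = 8.5 bits per weight.
        "Q8_0" => Some(8.5),
        // Assumed average; the exact value depends on the tensor mix.
        "Q4_K_M" => Some(4.85),
        _ => None,
    }
}

/// Estimated size of the weights in GB (10^9 bytes), or None for an
/// unknown format.
fn weight_memory_gb(num_params: f64, format: &str) -> Option<f64> {
    let bits = bits_per_weight(format)?;
    Some(num_params * bits / 8.0 / 1e9)
}
```

For a 3.8B-parameter model this gives about 7.6 GB at F16 and about 2.3 GB at Q4_K_M, close to the Phi-3 Mini row. Treat the results as estimates; actual file sizes vary with the quantization mix and metadata.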
Use Cases¶
Ideal For¶
- Edge/mobile deployment - Very small footprint
- Code completion - Strong coding ability
- Resource-constrained environments - Runs on CPU
- Quick prototyping - Fast inference
Performance Comparison¶
On these benchmarks, Phi-3 Mini (3.8B) outscores 7B models nearly twice its size (scores in %):
| Task | Phi-3 Mini | LLaMA 2 7B | Mistral 7B |
|---|---|---|---|
| MMLU | 68.8 | 45.3 | 60.1 |
| HumanEval | 59.1 | 12.8 | 30.5 |
| GSM8K | 75.6 | 14.6 | 52.2 |
Best Practices¶
- Use Phi-3 for latest capabilities
- Lower the temperature (for example, 0.2) for code generation
- Use the 128K context only when needed; KV-cache memory grows linearly with sequence length
- Quantize aggressively - Phi handles Q4 well
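The memory cost of long contexts comes mostly from the key/value cache. The sketch below estimates it for a single sequence, assuming an unquantized, unpaged cache that stores one key and one value vector per layer, KV head, and token. The function names are hypothetical and not part of unillm.

```rust
/// Bytes needed for the key/value cache of one sequence.
fn kv_cache_bytes(
    num_layers: u64,
    num_kv_heads: u64,
    head_dim: u64,
    context_len: u64,
    bytes_per_element: u64,
) -> u64 {
    // Factor of 2: one tensor for keys and one for values in every layer.
    2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_element
}

/// Convert bytes to GiB (2^30 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

With the Phi-3 Mini shape from the tables above (32 layers, 32 KV heads, head dimension 3072 / 32 = 96) and an F16 cache (2 bytes per element), `kv_cache_bytes(32, 32, 96, 4096, 2)` comes to 1.5 GiB, while `kv_cache_bytes(32, 32, 96, 131_072, 2)` comes to 48 GiB, several times the F16 weights themselves.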