Model Parameters¶
Parameters for model initialization and configuration.
ModelParams¶
Parameters passed to Llamafu.init().
class ModelParams {
  final String modelPath;
  final String? mmprojPath;
  final int contextSize;
  final int threads;
  final int threadsBatch;
  final int gpuLayers;
  final bool useMmap;
  final bool useMlock;
  final int seed;
}
Parameter Details¶
modelPath¶
Path to the GGUF model file.
- Type: String
- Required: Yes
- Example: 'models/llama-3.2-1b-q4.gguf'
Supports:
- Absolute paths: /home/user/models/model.gguf
- Relative paths: models/model.gguf
- Asset paths (Flutter): assets/models/model.gguf
mmprojPath¶
Path to multimodal projector file for vision/audio models.
- Type: String?
- Default: null
- Example: 'models/mmproj.gguf'
Required for:
- LLaVA models
- nanoLLaVA
- Qwen2-VL
- Ultravox (audio)
contextSize¶
Maximum context window size in tokens.
- Type: int
- Default: 2048
- Range: 64 to model maximum
| Use Case | Recommended Size |
|---|---|
| Short Q&A | 512 |
| Chat | 2048 |
| Long documents | 4096-8192 |
| Full context | Model maximum |
Memory usage scales roughly linearly with context size, because the KV cache stores keys and values for every token in the window.
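As a rough illustration of that scaling, the sketch below estimates the size of the KV cache, which is usually the part of memory that grows with context size. The formula is the standard one for transformer KV caches. The helper names `kvCacheBytes` and `toMiB` and the example layer and head counts are assumptions for illustration; they are not part of Llamafu and are not read from any model file.

```dart
/// Estimates KV cache size in bytes for a given context size.
///
/// Hypothetical helper, not part of Llamafu. Assumes one key vector and one
/// value vector per layer per token, stored at [bytesPerElement] bytes per
/// element (2 for 16-bit floats).
int kvCacheBytes({
  required int contextSize,
  required int layers,
  required int kvHeads,
  required int headDim,
  int bytesPerElement = 2,
}) {
  const keysAndValues = 2; // one K and one V tensor per layer
  return keysAndValues *
      layers *
      contextSize *
      kvHeads *
      headDim *
      bytesPerElement;
}

/// Converts a byte count to MiB for display.
double toMiB(int bytes) => bytes / (1024 * 1024);
```

With illustrative values of 16 layers, 8 KV heads, and a head dimension of 64, a context size of 2048 comes to 64 MiB, and doubling the context size to 4096 doubles the estimate to 128 MiB. Model weights and compute buffers are not included.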
threads¶
Number of CPU threads for inference.
- Type: int
- Default: 0 (auto-detect)
- Range: 1 to CPU core count
Recommendations:
- Mobile: 2-4
- Desktop: 4-8
- Auto (0): Uses physical core count
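The recommendations above can be turned into a small helper when an explicit value is preferred over auto-detection. This is a minimal sketch: `recommendThreads` and `recommendThreadsForThisDevice` are hypothetical names, not Llamafu APIs, and the clamping policy is one reasonable reading of the ranges listed above.

```dart
import 'dart:io' show Platform;
import 'dart:math' show max, min;

/// Picks a thread count within the recommended ranges.
///
/// Hypothetical helper, not part of Llamafu. [cores] is the number of cores
/// to plan for; [isMobile] selects the 2-4 range instead of 4-8.
int recommendThreads({required int cores, required bool isMobile}) {
  final upper = isMobile ? 4 : 8;
  // Never exceed the available cores, and never go below 1.
  return max(1, min(cores, upper));
}

/// Convenience wrapper for the current device.
///
/// Platform.numberOfProcessors reports logical processors, which can be
/// higher than the physical core count on CPUs with SMT.
int recommendThreadsForThisDevice() => recommendThreads(
      cores: Platform.numberOfProcessors,
      isMobile: Platform.isAndroid || Platform.isIOS,
    );
```

The result could then be passed as the `threads` argument to `Llamafu.init()`. Because `dart:io` is not available on the web, this sketch applies to mobile and desktop targets only.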
threadsBatch¶
Number of threads for batch operations.
- Type: int
- Default: 0 (same as threads)
Usually set equal to threads unless doing batch inference.
gpuLayers¶
Number of layers to offload to GPU.
- Type: int
- Default: 0 (CPU only)
- Range: 0 to layer count
| Value | Behavior |
|---|---|
| 0 | CPU only |
| 1-n | Partial GPU offload |
| 99 | Full GPU offload (all layers) |
Requirements:
- macOS/iOS: Metal-capable device
- Linux/Windows: CUDA toolkit + NVIDIA GPU
useMmap¶
Use memory mapping for model file.
- Type: bool
- Default: true
Benefits:
- Faster model loading
- Reduced RAM usage (OS manages pages)
- Shared memory across processes
Disable if:
- Model is on network drive
- Experiencing stability issues
useMlock¶
Lock model in RAM (prevent swapping).
- Type: bool
- Default: false
Benefits:
- Consistent performance
- No page faults during inference
Requirements:
- Sufficient RAM for entire model
- May require elevated privileges
seed¶
Random seed for reproducibility.
- Type: int
- Default: 0 (random)
The same seed with the same parameters produces the same output. The seed only matters when temperature is non-zero, since sampling is what introduces randomness.
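The idea behind seeding can be shown without loading a model. The sketch below uses `Random` from `dart:math` as a stand-in for a sampler; `sampleIndices` and `sameSequence` are hypothetical helpers, and nothing here calls Llamafu or reflects how it samples internally.

```dart
import 'dart:math' show Random;

/// Draws [count] pseudo-random token indices from a vocabulary of [vocabSize].
///
/// Stand-in for a sampler: it uses dart:math's Random, not Llamafu.
List<int> sampleIndices({
  required int seed,
  int count = 5,
  int vocabSize = 32000,
}) {
  final rng = Random(seed); // same seed gives the same generator state
  return List.generate(count, (_) => rng.nextInt(vocabSize));
}

/// Element-wise comparison, since == on Dart lists compares identity.
bool sameSequence(List<int> a, List<int> b) {
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```

Calling `sampleIndices(seed: 42)` twice yields sequences for which `sameSequence` returns true, which is the property a fixed seed is meant to provide for sampled generations.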
Configuration Examples¶
Mobile (Low Memory)¶
final llamafu = await Llamafu.init(
  modelPath: 'models/smollm-135m-q4.gguf',
  contextSize: 512,
  threads: 2,
  useMmap: true,
  useMlock: false,
);
Desktop (Balanced)¶
final llamafu = await Llamafu.init(
  modelPath: 'models/llama-3.2-1b-q4.gguf',
  contextSize: 4096,
  threads: 6,
  useMmap: true,
);
Desktop (GPU)¶
final llamafu = await Llamafu.init(
  modelPath: 'models/llama-3.2-7b-q4.gguf',
  contextSize: 8192,
  gpuLayers: 99,
  threads: 4,
);
Vision Model¶
final llamafu = await Llamafu.init(
  modelPath: 'models/nanollava.gguf',
  mmprojPath: 'models/nanollava-mmproj.gguf',
  contextSize: 2048,
);
Reproducible Output¶
const prompt = 'Once upon a time'; // Example prompt
final llamafu = await Llamafu.init(
  modelPath: 'models/model.gguf',
  seed: 42,
);
// Same prompt + seed = same output
final response1 = await llamafu.complete(prompt, seed: 42);
final response2 = await llamafu.complete(prompt, seed: 42);
// response1 == response2
Runtime Configuration¶
Some parameters can be adjusted after initialization:
// Adjust threads
llamafu.setThreadCount(threads: 4, threadsBatch: 2);
// Cannot change after init:
// - modelPath
// - mmprojPath
// - contextSize
// - gpuLayers
// - useMmap
// - useMlock
Validation¶
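Invalid parameters throw a LlamafuError. The example below catches LlamafuModelLoadError, which is raised when the model file cannot be loaded: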
try {
  await Llamafu.init(
    modelPath: 'nonexistent.gguf', // Throws
  );
} on LlamafuModelLoadError catch (e) {
  print('Failed to load: ${e.message}');
}
Common validation errors:
| Error | Cause |
|---|---|
| INVALID_MODEL_PATH | File doesn't exist |
| INVALID_MODEL_FORMAT | Not a valid GGUF file |
| CONTEXT_TOO_LARGE | Context exceeds model maximum |
| OUT_OF_MEMORY | Insufficient memory |
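The first two errors in the table can often be caught before calling `Llamafu.init()`. The sketch below is a hypothetical pre-flight check, not a Llamafu API: `precheckModel` is an invented name, and returning the error codes as plain strings is an assumption made for illustration. It relies only on the fact that GGUF files begin with the 4-byte ASCII magic "GGUF", and it applies to filesystem paths rather than bundled Flutter assets.

```dart
import 'dart:io';

/// Returns an error code from the table above, or null if basic checks pass.
///
/// Hypothetical helper, not part of Llamafu. It only verifies that the file
/// exists and starts with the GGUF magic bytes; it does not parse the file.
String? precheckModel(String modelPath) {
  final file = File(modelPath);
  if (!file.existsSync()) return 'INVALID_MODEL_PATH';

  final handle = file.openSync();
  try {
    final magic = handle.readSync(4);
    const gguf = [0x47, 0x47, 0x55, 0x46]; // ASCII "GGUF"
    if (magic.length != 4) return 'INVALID_MODEL_FORMAT';
    for (var i = 0; i < 4; i++) {
      if (magic[i] != gguf[i]) return 'INVALID_MODEL_FORMAT';
    }
  } finally {
    handle.closeSync(); // always release the file handle
  }
  return null;
}
```

A null result means only that the file exists and looks like GGUF. Loading can still fail for other reasons, such as a context size above the model maximum or insufficient memory, so the try/catch around `Llamafu.init()` remains necessary.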