Examples and Tutorials
ZigLlama ships twelve self-contained example programs in the examples/ directory. Each can be compiled and executed independently; together they form a progressive curriculum that mirrors the six architectural layers.
Example Programs
| # | File | Layer | Description | Est. Time |
|---|---|---|---|---|
| 1 | simple_demo.zig | All | End-to-end summary of every layer; prints architecture stats. | 2 min |
| 2 | educational_demo.zig | All | Progressive walkthrough from tensors to generation. | 5 min |
| 3 | benchmark_demo.zig | 2 | Matrix-multiplication benchmarks: naive vs SIMD, varying sizes. | 3 min |
| 4 | model_architectures_demo.zig | 5 | Instantiate and inspect LLaMA, Mistral, GPT-2, Falcon, etc. | 3 min |
| 5 | gguf_demo.zig | 1, 5 | Parse a GGUF file header, list tensors, read metadata. | 2 min |
| 6 | parity_demo.zig | All | Compare ZigLlama outputs against llama.cpp reference values. | 5 min |
| 7 | multi_modal_demo.zig | 5 | Vision-language pipeline: image encoder + text decoder. | 4 min |
| 8 | multi_modal_concepts_demo.zig | 5 | Conceptual overview of multi-modal fusion strategies. | 3 min |
| 9 | threading_demo.zig | 1 | Thread-pool creation, parallel matmul, NUMA awareness. | 3 min |
| 10 | chat_templates_demo.zig | 5, 6 | Apply LLaMA 2, ChatML, Mistral templates to a conversation. | 2 min |
| 11 | perplexity_demo.zig | Eval | Configure evaluator, run benchmarks, compare quantisations. | 4 min |
| 12 | main.zig | All | Master demo that invokes highlights from every other example. | 5 min |
Running an example
Tutorials
The tutorials below provide annotated, step-by-step walkthroughs that go deeper than the standalone examples.
| Tutorial | What you will build |
|---|---|
| Your First Inference | Load a model, tokenise a prompt, generate text, and decode the output -- all in ~40 lines of Zig. |
| Understanding Attention | Construct Q, K, V tensors from scratch, compute scaled dot-product attention, and visualise multi-head splits. |
| Quantization in Practice | Take an FP32 model, quantise it to Q4_K, compare outputs, and measure the perplexity delta. |
| Building a Chatbot | Wire the HTTP server, chat templates, and streaming together into an interactive chatbot you can test with curl. |
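As a taste of what the attention tutorial covers, the following is a minimal sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d))V. It is an illustration under stated assumptions, not ZigLlama's implementation: the function name `attention`, the comptime-sized array shapes, and the absence of a causal mask are all choices made here to keep the example short and allocation-free.

```zig
const std = @import("std");

/// Hypothetical helper (not part of ZigLlama's API): single-head scaled
/// dot-product attention over fixed-size arrays, with no causal mask.
fn attention(
    comptime seq: usize,
    comptime d: usize,
    q: [seq][d]f32,
    k: [seq][d]f32,
    v: [seq][d]f32,
) [seq][d]f32 {
    const scale = 1.0 / @sqrt(@as(f32, @floatFromInt(d)));
    var out: [seq][d]f32 = undefined;

    for (0..seq) |i| {
        // Scores of query i against every key, scaled by 1/sqrt(d).
        var scores: [seq]f32 = undefined;
        var max_score = -std.math.inf(f32);
        for (0..seq) |j| {
            var dot: f32 = 0;
            for (0..d) |c| dot += q[i][c] * k[j][c];
            scores[j] = dot * scale;
            max_score = @max(max_score, scores[j]);
        }

        // Numerically stable softmax: subtract the row maximum before exp.
        var sum: f32 = 0;
        for (&scores) |*s| {
            s.* = @exp(s.* - max_score);
            sum += s.*;
        }

        // Output row i is the softmax-weighted sum of the value rows.
        for (0..d) |c| {
            var acc: f32 = 0;
            for (0..seq) |j| acc += (scores[j] / sum) * v[j][c];
            out[i][c] = acc;
        }
    }
    return out;
}

pub fn main() void {
    // Toy input: Q, K and V are all the 2x2 identity matrix.
    const x = [2][2]f32{ .{ 1, 0 }, .{ 0, 1 } };
    const out = attention(2, 2, x, x, x);
    for (out) |row| std.debug.print("{d:.3} {d:.3}\n", .{ row[0], row[1] });
}
```

Because V is the identity in this toy input, each printed row is simply the softmax weight vector for that query, so each row sums to 1. A multi-head version would apply the same routine to each head's slice of the model dimension.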
Demo Walkthroughs
For a narrated, line-by-line walkthrough of every example program, see Demo Walkthroughs. Each section explains what the example demonstrates, how to run it, the key concepts it illustrates, and the expected terminal output.
Prerequisites
All examples assume you have:
- A working Zig toolchain (0.13+). See Installation.
- The ZigLlama repository cloned and the build system verified (zig build test).
- For model-loading examples: a GGUF model file. The tutorials indicate where to download one when needed.
No GPU required
Every example runs on CPU. SIMD acceleration (AVX2 / NEON) is used automatically when available but is not mandatory.
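To check which of these instruction sets your own build targets, a small standalone program along the following lines can help. This is a sketch using only the Zig standard library. The `simdLabel` helper is a hypothetical name, and the check is a compile-time query of the build target; it does not claim to reflect how ZigLlama selects its SIMD paths.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: names the SIMD instruction set enabled for the
/// compilation target. Only AVX2 (x86_64) and NEON (aarch64) are checked.
fn simdLabel() []const u8 {
    const features = builtin.cpu.features;
    return switch (builtin.cpu.arch) {
        .x86_64 => if (std.Target.x86.featureSetHas(features, .avx2)) "AVX2" else "scalar",
        .aarch64 => if (std.Target.aarch64.featureSetHas(features, .neon)) "NEON" else "scalar",
        else => "scalar",
    };
}

pub fn main() void {
    std.debug.print("SIMD path for this build target: {s}\n", .{simdLabel()});
}
```

By default Zig compiles for the native CPU, so the result normally describes the host machine. Cross-compiling or overriding the CPU model with the -Dcpu or -mcpu options changes the reported feature set.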