
Architecture

This section provides a comprehensive treatment of ZigLlama's software architecture -- the design decisions, structural invariants, and module boundaries that make the project both an effective educational resource and a capable inference engine.


Section Map

Page                                  Focus
Design Principles                     The philosophical and engineering tenets that guided every line of code -- educational clarity, progressive complexity, test-driven development, and documentation-as-code.
The 6-Layer Progressive Architecture  A detailed walkthrough of each architectural layer, from low-level tensor storage through linear algebra, neural primitives, transformer blocks, model definitions, and finally the inference engine. Includes dependency diagrams, data-flow sequences, and per-layer component inventories.
Module Dependencies                   The full import graph, public API surface (26 re-exported modules), internal modules, and the strict layering rule that keeps the dependency DAG acyclic.
Comparison with llama.cpp             A side-by-side feature, quantization, model-coverage, and performance analysis against the industry-standard C++ implementation, together with ZigLlama's unique value proposition.

Quick Orientation

ZigLlama is structured as a six-layer progressive stack. Each layer depends only on layers below it, producing a clean directed acyclic graph (DAG) of imports:

Layer 6  Inference         -- generation, caching, streaming, batching
Layer 5  Models            -- LLaMA + 17 other architectures, GGUF, tokenizers
Layer 4  Transformers      -- attention, feed-forward, transformer blocks
Layer 3  Neural Primitives -- activations, normalization, embeddings
Layer 2  Linear Algebra    -- SIMD matmul, quantization (Q/K/IQ)
Layer 1  Foundation        -- Tensor(T), MemoryMap, GGUF reader, BLAS, threading
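
The layering rule in the stack above can be made concrete with a small sketch. The `Layer` enum and the `crossLayerImportAllowed` function are hypothetical names introduced for illustration only; they are not part of ZigLlama's API, and the project may enforce the rule differently (or by convention alone). The sketch assumes Zig 0.11 or later, where `@intFromEnum` is available.

```zig
const std = @import("std");

// Hypothetical enum for illustration: one tag per layer, numbered as in the stack above.
const Layer = enum(u8) {
    foundation = 1,
    linear_algebra = 2,
    neural_primitives = 3,
    transformers = 4,
    models = 5,
    inference = 6,
};

// Returns true when a module in `importer` may import a module in `imported`,
// i.e. when the imported layer sits strictly below the importing one.
// Same-layer pairs are outside the scope of this sketch and simply return false.
fn crossLayerImportAllowed(importer: Layer, imported: Layer) bool {
    return @intFromEnum(imported) < @intFromEnum(importer);
}

test "imports may only point downward" {
    // Inference (6) may use Models (5); Transformers (4) may use Foundation (1).
    try std.testing.expect(crossLayerImportAllowed(.inference, .models));
    try std.testing.expect(crossLayerImportAllowed(.transformers, .foundation));
    // Linear Algebra (2) must not reach up into Transformers (4).
    try std.testing.expect(!crossLayerImportAllowed(.linear_algebra, .transformers));
}
```

Saved to a file (for example a hypothetical `layering.zig`), the check runs with `zig test layering.zig`. Because every permitted edge points from a higher layer number to a lower one, no chain of imports can return to its starting layer, which is what keeps the graph acyclic.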

Reading Order

If you are new to the project, start with Design Principles to understand why the architecture looks the way it does, then proceed through The 6-Layer Progressive Architecture for the what, and finally consult Module Dependencies for the precise how of the import graph.


Key Metrics at a Glance

Metric                         Value
Architectural layers           6
Public modules (re-exported)   26
Internal modules               12
Model architectures supported  18
Quantization formats           18+ (Q4_0 through IQ4_NL)
Test count                     285+
Lines of Zig (approx.)         15 000+