ZigLlama
cognisoc/zigllm
Home
Getting Started
Installation
Quick Start
Building from Source
Project Structure
Architecture
Design Principles
6-Layer Overview
Module Dependencies
Comparison with llama.cpp
Layer 1: Foundations
Tensor Operations
Memory Management
Memory Mapping
GGUF Binary Format
BLAS Integration
CPU Threading & NUMA
Layer 2: Linear Algebra
SIMD Matrix Operations
Basic Quantization
K-Quantization
Importance Quantization
Performance Analysis
Layer 3: Neural Primitives
Activation Functions
Normalization Layers
Embedding Systems
Layer 4: Transformers
Attention Mechanisms
Feed-Forward Networks
Transformer Blocks
Layer 5: Models
Model Configuration
Tokenization
Chat Templates
GGUF Model Loading
LLaMA / LLaMA 2
Mistral
GPT-2
Falcon
Qwen
Phi
GPT-J
GPT-NeoX
BLOOM
Mamba
BERT
Gemma
StarCoder
Mixture of Experts
Multi-Modal
Layer 6: Inference
Text Generation
Sampling Strategies
Advanced Sampling
KV Cache
Streaming
Batch Processing
Grammar Constraints
Performance Profiling
Tools & Server
HTTP Server
CLI Interface
Model Converter
Perplexity Evaluation
API Reference
foundation.tensor
foundation.memory_mapping
foundation.gguf_format
foundation.blas_integration
foundation.threading
linear_algebra.matrix_ops
linear_algebra.quantization
linear_algebra.k_quantization
linear_algebra.iq_quantization
neural_primitives.activations
neural_primitives.normalization
neural_primitives.embeddings
transformers.attention
transformers.feed_forward
transformers.transformer_block
models.llama
models.config
models.tokenizer
models.gguf
inference.generation
inference.kv_cache
inference.streaming
inference.batching
inference.profiling
inference.advanced_sampling
inference.grammar_constraints
Examples & Tutorials
Tutorial: Your First Inference
Tutorial: Understanding Attention
Tutorial: Quantization in Practice
Tutorial: Building a Chatbot
Demo Walkthroughs
Performance
Benchmarks
Optimization Guide
Parity Analysis
Memory Profiling
References
Academic Papers
Glossary
Contributing
Changelog
Tags