Tools and Server¶

This section covers the production-oriented tooling that transforms ZigLlama from an educational library into a deployable inference system. Each tool is built on top of the six foundational layers and exposes their capabilities through ergonomic interfaces.

Section Overview¶

graph LR
    A[CLI Interface] --> B[HTTP Server]
    B --> C[Inference Engine]
    D[Model Converter] --> C
    E[Perplexity Evaluator] --> C
    style B fill:#7c4dff,color:#fff

Page	Description
HTTP Server	OpenAI-compatible REST API with streaming, authentication, and CORS support.
CLI Interface	Command-line launcher for the server with argument parsing, environment configuration, and interactive mode.
Model Converter	Convert between PyTorch, GGUF, SafeTensors, ONNX, and TensorFlow formats with optional quantization.
Perplexity Evaluation	Measure model quality using sliding-window perplexity, bits-per-token, and benchmark suites.

Quick Reference¶

Typical workflow

Convert a HuggingFace model to GGUF with the Model Converter.
Evaluate quantization quality using the Perplexity Evaluator.
Serve the model through the HTTP Server launched via the CLI.
Query the /v1/chat/completions endpoint from any OpenAI-compatible client.

All tools share ZigLlama's allocation-explicit design: memory is managed through Zig's GeneralPurposeAllocator, enabling deterministic cleanup and leak detection in debug builds.

Source Layout¶

src/
  server/
    http_server.zig   # ZigLlamaServer, OpenAI-compatible endpoints
    cli.zig           # Argument parsing, startup banner, help text
  tools/
    model_converter.zig   # ModelConverter, ConversionConfig, format I/O
    converter_cli.zig     # CLI wrapper for the converter
  evaluation/
    perplexity.zig        # PerplexityEvaluator, BenchmarkSuite