Architecture Overview¶

This document provides a high-level overview of the CLLM unikernel architecture.

Components¶

+-----------------------------------------------------------+
|  QEMU / Bare Metal  (x86, Multiboot)                     |
+-----------------------------------------------------------+
|  boot.S             Multiboot entry, stack, serial init   |
|  kernel.c           Kernel main, VGA terminal, serial I/O |
|  memory.c           Heap allocator (malloc/free)          |
|  string.c           libc subset (snprintf, memcpy, ...)   |
|  network.c          PCI enumeration + e1000 NIC driver    |
|  http.c / api.c     HTTP server, request routing          |
|  api_v1.c           llama.cpp-compatible REST API         |
|  llm.c              Model loading and inference interface |
+-----------------------------------------------------------+

Unikernel Core -- Minimal operating system services (boot, memory, I/O)
LLM Engine -- Integration of llama.cpp for model inference
HTTP Server -- REST API for model interaction
Configuration System -- JSON-based build and runtime configuration
GPU Support -- CUDA and other backend integrations (planned)

Data Flow¶

Configuration is loaded at build time or startup
Models are loaded into memory (or baked into the kernel binary)
HTTP requests are processed by the server
LLM engine performs inference
Results are returned via HTTP responses

Memory Management¶

The unikernel implements a custom heap allocator optimized for LLM workloads, providing malloc and free within a 4 MB statically-allocated arena.