Architecture Overview
This document provides a high-level overview of the CLLM unikernel architecture.
Components
+-----------------------------------------------------------+
|  QEMU / Bare Metal (x86, Multiboot)                       |
+-----------------------------------------------------------+
|  boot.S           Multiboot entry, stack, serial init     |
|  kernel.c         Kernel main, VGA terminal, serial I/O   |
|  memory.c         Heap allocator (malloc/free)            |
|  string.c         libc subset (snprintf, memcpy, ...)     |
|  network.c        PCI enumeration + e1000 NIC driver      |
|  http.c / api.c   HTTP server, request routing            |
|  api_v1.c         llama.cpp-compatible REST API           |
|  llm.c            Model loading and inference interface   |
+-----------------------------------------------------------+
- Unikernel Core -- Minimal operating system services (boot, memory, I/O)
- LLM Engine -- Integration of llama.cpp for model inference
- HTTP Server -- REST API for model interaction
- Configuration System -- JSON-based build and runtime configuration
- GPU Support -- CUDA and other backend integrations (planned)
Data Flow
- Configuration is loaded at build time or startup
- Models are loaded into memory (or baked into the kernel binary)
- The HTTP server receives and routes incoming requests
- The LLM engine performs inference
- Results are returned via HTTP responses
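The steps above can be sketched as a small C pipeline. This is only an illustration under stated assumptions: every type and function name here (config_t, model_t, config_load, model_load, llm_infer, handle_request) is hypothetical and not taken from the CLLM sources, the port value is a placeholder, and the inference step is a stub that echoes the prompt.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All types and function names below are hypothetical placeholders. */
typedef struct {
    const char *model_path;  /* where the weights come from */
    int port;                /* HTTP listen port (placeholder value) */
} config_t;

typedef struct {
    int loaded;              /* nonzero once weights are available */
} model_t;

/* Step 1: configuration is fixed at build time or read at startup. */
static config_t config_load(void) {
    config_t cfg = { "embedded", 8080 };
    return cfg;
}

/* Step 2: load the model, or point at weights baked into the binary. */
static int model_load(model_t *m, const config_t *cfg) {
    m->loaded = (cfg->model_path != NULL);
    return m->loaded ? 0 : -1;
}

/* Step 4: stand-in for inference; it only echoes the prompt. */
static int llm_infer(const model_t *m, const char *prompt,
                     char *out, size_t cap) {
    if (!m->loaded) return -1;
    int n = snprintf(out, cap, "echo: %s", prompt);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Steps 3 and 5: take a request body, run inference, and write an
 * HTTP response into resp. Returns the snprintf result. */
static int handle_request(const model_t *m, const char *prompt,
                          char *resp, size_t cap) {
    char text[256];
    if (llm_infer(m, prompt, text, sizeof text) != 0) {
        return snprintf(resp, cap,
            "HTTP/1.1 503 Service Unavailable\r\n"
            "Content-Length: 0\r\n\r\n");
    }
    return snprintf(resp, cap,
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: %zu\r\n\r\n%s",
        strlen(text), text);
}
```

A caller would run config_load and model_load once at boot, then call handle_request for each incoming request. Keeping model loading outside the request path means a request never pays the load cost, which matches the order of the steps listed above.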
Memory Management
The unikernel implements a custom heap allocator optimized for LLM workloads. It provides malloc and free within a statically allocated 4 MB arena.
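To make the idea of a heap inside a fixed arena concrete, here is a minimal first-fit free-list allocator in C. It is a sketch, not the code in memory.c: the names arena_malloc and arena_free, the block header layout, the 16-byte alignment, and the first-fit strategy are all assumptions chosen for illustration. Only the 4 MB arena size comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE (4u * 1024u * 1024u)  /* 4 MB, as stated in the text */
#define ALIGNMENT  16u                   /* assumed payload alignment */

/* Header stored in front of every block (hypothetical layout). */
typedef struct block {
    size_t size;         /* payload size in bytes */
    int free;            /* 1 if the block is available */
    struct block *next;  /* next block in address order */
} block_t;

/* Header size rounded up so payloads stay aligned. */
#define HEADER_SIZE \
    ((sizeof(block_t) + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1))

static _Alignas(16) uint8_t arena[ARENA_SIZE];
static block_t *head = NULL;

static size_t align_up(size_t n) {
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

void *arena_malloc(size_t size) {
    if (size == 0 || size > ARENA_SIZE) return NULL;
    size = align_up(size);

    if (head == NULL) {  /* first call: one free block spans the arena */
        head = (block_t *)arena;
        head->size = ARENA_SIZE - HEADER_SIZE;
        head->free = 1;
        head->next = NULL;
    }

    for (block_t *b = head; b != NULL; b = b->next) {
        if (!b->free || b->size < size) continue;
        /* Split only if the remainder can hold a header plus payload. */
        if (b->size >= size + HEADER_SIZE + ALIGNMENT) {
            block_t *rest = (block_t *)((uint8_t *)b + HEADER_SIZE + size);
            rest->size = b->size - size - HEADER_SIZE;
            rest->free = 1;
            rest->next = b->next;
            b->size = size;
            b->next = rest;
        }
        b->free = 0;
        return (uint8_t *)b + HEADER_SIZE;
    }
    return NULL;  /* no block large enough */
}

void arena_free(void *ptr) {
    if (ptr == NULL) return;
    block_t *b = (block_t *)((uint8_t *)ptr - HEADER_SIZE);
    b->free = 1;
    /* Merge neighbouring free blocks to limit fragmentation. */
    for (block_t *c = head; c != NULL; ) {
        if (c->free && c->next != NULL && c->next->free) {
            c->size += HEADER_SIZE + c->next->size;
            c->next = c->next->next;
        } else {
            c = c->next;
        }
    }
}
```

First fit with coalescing is one of the simplest designs that still reuses freed memory. An allocator tuned for LLM workloads might instead favour a few large, long-lived allocations for weights and buffers, but the source does not say which strategy is used.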