HTTP Server Design

Overview

The unikernel includes a minimal HTTP server for serving LLM inference requests. Because standard networking libraries are not available in a unikernel environment, the server is built on a custom network stack that performs PCI enumeration and drives an Intel e1000 NIC.

Implementation Approach

Network Stack

Ethernet Frame Handling -- Raw packet processing via e1000 driver
IP Protocol -- Basic IPv4 implementation
TCP Protocol -- Connection-oriented transport
HTTP/1.1 Subset -- Only the methods the API uses (GET and POST)
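
Both the IPv4 and TCP layers listed above rely on the Internet checksum defined in RFC 1071, so a custom stack needs its own routine for it. The following is a minimal sketch, not the project's actual code; the function name `inet_checksum` is a hypothetical helper chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RFC 1071 Internet checksum over `len` bytes.
 * The buffer is read as big-endian 16-bit words, so the result does
 * not depend on host byte order. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add up the buffer two bytes at a time. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold any carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

When building an outgoing IPv4 header, the checksum field is zeroed before calling the routine and the result is then stored in that field in network byte order. Running the same routine over a received header, checksum field included, returns 0 when the header is intact.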

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/models/load`	POST	Load a model
`/models/unload`	POST	Unload current model
`/completion`	POST	Generate text completion
`/embedding`	POST	Generate embeddings
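
One way to express the endpoint table in code is a static route array that the router scans linearly. This is a sketch under assumptions: the type names, the `handler_fn` signature, the placeholder handlers, and the choice of 404 and 405 for unmatched requests are all hypothetical and not taken from `api.c`.

```c
#include <stddef.h>
#include <string.h>

typedef enum { METHOD_GET, METHOD_POST } http_method_t;

/* Hypothetical handler signature: receives the request body and
 * returns the HTTP status code to send back. */
typedef int (*handler_fn)(const char *body);

typedef struct {
    http_method_t method;
    const char   *path;
    handler_fn    handler;
} route_t;

/* Placeholder handlers; real ones would call into the model engine. */
static int handle_health(const char *body)    { (void)body; return 200; }
static int handle_not_ready(const char *body) { (void)body; return 501; }

/* Mirrors the endpoint table above, one entry per row. */
static const route_t routes[] = {
    { METHOD_GET,  "/health",        handle_health    },
    { METHOD_POST, "/models/load",   handle_not_ready },
    { METHOD_POST, "/models/unload", handle_not_ready },
    { METHOD_POST, "/completion",    handle_not_ready },
    { METHOD_POST, "/embedding",     handle_not_ready },
};

/* Returns the matching route and sets *status to 200. If the path is
 * known but the method differs, returns NULL with *status = 405;
 * if the path is unknown, returns NULL with *status = 404. */
const route_t *route_lookup(http_method_t method, const char *path, int *status)
{
    size_t i;
    int path_seen = 0;

    for (i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        if (strcmp(routes[i].path, path) != 0)
            continue;
        if (routes[i].method == method) {
            *status = 200;
            return &routes[i];
        }
        path_seen = 1;
    }

    *status = path_seen ? 405 : 404;
    return NULL;
}
```

With only five routes, a linear scan over a constant table is likely sufficient and needs no dynamic allocation, which suits a statically configured image.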

Key Components

Network Interface (network.c) -- PCI device discovery, e1000 driver, packet I/O
HTTP Parser (http.c) -- Parse incoming HTTP/1.1 requests
Router (api.c) -- Route requests to appropriate handlers
JSON Handler (json.c) -- Parse and generate JSON request/response bodies
Connection Manager -- Handle connections via the network loop
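
To illustrate the parser component, here is a minimal sketch of request-line parsing that works on a length-delimited buffer, as received from the TCP layer, rather than a NUL-terminated string. The struct, the function name `http_parse_request_line`, the buffer sizes, and the decision to reject anything other than `HTTP/1.1` are assumptions made for this example, not the contents of `http.c`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of "METHOD SP PATH SP HTTP/1.1 CRLF". */
typedef struct {
    char method[8];
    char path[64];
} http_request_line_t;

/* Returns 0 on success, -1 if the line is incomplete, malformed,
 * or does not fit the fixed-size fields. */
int http_parse_request_line(const char *buf, size_t len, http_request_line_t *out)
{
    const char *cr, *sp1, *sp2;
    size_t method_len, path_len, version_len;

    /* The request line ends at the first CRLF. */
    cr = memchr(buf, '\r', len);
    if (cr == NULL || (size_t)(cr - buf) + 1 >= len || cr[1] != '\n')
        return -1;

    /* Both separating spaces must lie within the request line. */
    sp1 = memchr(buf, ' ', (size_t)(cr - buf));
    if (sp1 == NULL)
        return -1;
    sp2 = memchr(sp1 + 1, ' ', (size_t)(cr - (sp1 + 1)));
    if (sp2 == NULL)
        return -1;

    method_len  = (size_t)(sp1 - buf);
    path_len    = (size_t)(sp2 - (sp1 + 1));
    version_len = (size_t)(cr - (sp2 + 1));

    if (method_len == 0 || method_len >= sizeof out->method)
        return -1;
    if (path_len == 0 || path_len >= sizeof out->path || sp1[1] != '/')
        return -1;
    if (version_len != 8 || memcmp(sp2 + 1, "HTTP/1.1", 8) != 0)
        return -1;

    memcpy(out->method, buf, method_len);
    out->method[method_len] = '\0';
    memcpy(out->path, sp1 + 1, path_len);
    out->path[path_len] = '\0';
    return 0;
}
```

A return value of -1 does not distinguish a malformed line from one that has not fully arrived yet; a caller reading from the network loop would need to track that separately.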

Simplifications for Unikernel

Single-Threaded -- No complex threading; one request is handled at a time
Limited Connections -- Connections are served sequentially, never concurrently
No File System -- Serve responses directly from memory
Static Configuration -- All configuration baked in at build time

Implementation Order

Basic network interface -- get packets in and out
HTTP request parser -- parse incoming requests
Simple response generator -- generate HTTP responses
API endpoint handlers -- implement each endpoint
Integration with llama.cpp -- connect to LLM engine
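
The response generator step in the order above can be sketched as a single function that formats the status line, headers, and a JSON body into a caller-supplied buffer. This is an illustrative sketch: the function names, the fixed header set, and the `Connection: close` policy are assumptions, and it presumes a `snprintf` implementation is linked into the unikernel image.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Reason phrases for the status codes this sketch knows about. */
static const char *reason_phrase(int status)
{
    switch (status) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 404: return "Not Found";
    case 405: return "Method Not Allowed";
    default:  return "Internal Server Error";
    }
}

/* Hypothetical helper: writes a complete HTTP/1.1 response carrying a
 * JSON body into `out`. Returns the number of bytes written (excluding
 * the terminating NUL), or -1 if the response does not fit in `cap`. */
int http_build_response(char *out, size_t cap, int status, const char *json_body)
{
    size_t body_len = strlen(json_body);
    int n = snprintf(out, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Content-Type: application/json\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"
                     "%s",
                     status, reason_phrase(status), body_len, json_body);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Closing the connection after each response fits the one-request-at-a-time model described earlier, since the server never has to track keep-alive state between requests.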