Project Specification¶

The goal of this project is to accomplish the following in phases:

Have a working C unikernel in QEMU targeting x86
Compile llama.cpp into the unikernel to serve models over HTTP, with a JSON-based configuration provided for the unikernel build
Add GPU support to the unikernel, supporting all backends (starting with CUDA) as done by llama.cpp
Port optimizations from vLLM into llama.cpp to accelerate serving of transformer models

Each phase is important and needs to be done correctly. The project builds the right code abstractions with detailed comments throughout.