Skip to content

inference.profiling

Module Path

zigllama.inference.profiling

Source file: src/inference/profiling.zig


Public Types

MeasurementPoint

pub const MeasurementPoint = struct {
    name: []const u8,
    start_ns: u64,
    end_ns: ?u64,
    parent: ?*MeasurementPoint,
    children: std.ArrayList(*MeasurementPoint),
};

A single timed region. Measurement points form a tree so nested operations (e.g., attention inside a transformer block) are tracked hierarchically.

PerformanceStats

pub const PerformanceStats = struct {
    min: f64,
    max: f64,
    avg: f64,
    median: f64,
    p95: f64,
    p99: f64,
    std_dev: f64,
    count: usize,
    total: f64,
};

Aggregate statistics computed from a series of measurements. All timing values are in milliseconds.

Profiler

pub const Profiler = struct {
    measurements: std.StringHashMap(std.ArrayList(f64)),
    active_points: std.ArrayList(*MeasurementPoint),
    enabled: bool,
    allocator: std.mem.Allocator,
};

Collects timing data for named code regions. Thread-safe when each thread uses its own Profiler instance.

BenchmarkRunner

pub const BenchmarkRunner = struct {
    name: []const u8,
    warmup_iterations: usize,
    profiler: Profiler,
    allocator: std.mem.Allocator,
};

Runs a function multiple times with warm-up iterations and produces PerformanceStats.


Public Functions

Profiler.init

pub fn init(allocator: std.mem.Allocator) Profiler

Create a new profiler with no recorded measurements.

Profiler.deinit

pub fn deinit(self: *Profiler) void

Free all stored measurements.

Profiler.measureBlock

pub fn measureBlock(self: *Profiler, name: []const u8) MeasurementGuard

Begin timing a named block. Returns a guard object; timing stops when the guard goes out of scope (via defer guard.end()).

{
    var guard = profiler.measureBlock("attention");
    defer guard.end();
    // ... attention computation ...
}

Profiler.getStats

pub fn getStats(self: *Profiler, name: []const u8) ?PerformanceStats

Compute aggregate statistics for the named block. Returns null if no measurements have been recorded under that name.

Profiler.report

pub fn report(self: *Profiler, writer: anytype) !void

Print a formatted table of all recorded blocks with their statistics.

BenchmarkRunner.init

pub fn init(
    name: []const u8,
    warmup_iterations: usize,
    allocator: std.mem.Allocator,
) BenchmarkRunner

Create a benchmark runner.

BenchmarkRunner.run

pub fn run(
    self: *BenchmarkRunner,
    func: *const fn () void,
    iterations: usize,
) !PerformanceStats

Execute func for warmup_iterations + iterations times, discarding the warm-up measurements, and return statistics for the measured iterations.


Error Types

  • error{OutOfMemory}

Usage Example

const prof = @import("zigllama").inference.profiling;

var profiler = prof.Profiler.init(allocator);
defer profiler.deinit();

// Profile a forward pass
{
    var guard = profiler.measureBlock("forward_pass");
    defer guard.end();
    _ = try model.forward(tokens, allocator);
}

// Get statistics
if (profiler.getStats("forward_pass")) |stats| {
    std.debug.print("Forward pass: avg={d:.2}ms, p99={d:.2}ms\n", .{
        stats.avg, stats.p99,
    });
}

// Or run a benchmark
var bench = prof.BenchmarkRunner.init("inference", 5, allocator);
const stats = try bench.run(inferenceFunc, 100);
std.debug.print("Throughput: {d:.1} tokens/sec\n", .{1000.0 / stats.avg});