linear_algebra.k_quantization¶
Module Path¶
Source file: src/linear_algebra/k_quantization.zig
Internal module. This API may change between releases.
Public Constants¶
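QK_K = 256. Super-block size shared by all K-quantization formats. Every 256 elements are quantized together as one super-block, which stores a single f16 scale plus compact per-sub-block scales. This keeps scale granularity at the sub-block level without paying for a separate f16 scale on every 32-element block, as the older Q4_0/Q8_0 formats do.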
Public Types¶
KQuantType¶
The three K-quantization bit widths.
| Variant | Bits/Weight | Bytes per 256-element block |
|---|---|---|
| Q4_K | 4.5 | 144 |
| Q5_K | 5.5 | 176 |
| Q6_K | 6.5625 | 210 |
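The byte counts in the table follow from summing the field sizes of the block structs documented on this page, and the bits-per-weight figure is that byte count times 8 divided by 256. The sketch below checks the arithmetic. It assumes the blocks carry no padding, and `bitsPerWeightE4` is a hypothetical helper introduced only for illustration, not part of the module.

```zig
const std = @import("std");

// Assumed to equal the module's QK_K constant.
const QK_K = 256;

// Field sizes in bytes, summed in the order the structs declare them.
const q4k_bytes = 2 + 2 + 12 + QK_K / 2; // d, dmin, scales, qs
const q5k_bytes = 2 + 2 + 12 + QK_K / 8 + QK_K / 2; // d, dmin, scales, qh, qs
const q6k_bytes = QK_K / 2 + QK_K / 4 + QK_K / 16 + 2; // ql, qh, scales, d

/// Hypothetical helper: bits per weight scaled by 10_000, so 4.5 bits is 45_000.
/// Integer arithmetic keeps the comparison exact.
fn bitsPerWeightE4(block_bytes: usize) usize {
    return block_bytes * 8 * 10_000 / QK_K;
}

test "block sizes and bits per weight" {
    try std.testing.expect(q4k_bytes == 144);
    try std.testing.expect(q5k_bytes == 176);
    try std.testing.expect(q6k_bytes == 210);
    try std.testing.expect(bitsPerWeightE4(q4k_bytes) == 45_000);
    try std.testing.expect(bitsPerWeightE4(q5k_bytes) == 55_000);
    try std.testing.expect(bitsPerWeightE4(q6k_bytes) == 65_625);
}
```

Saved to a file, this can be run with `zig test`.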
BlockQ4K¶
pub const BlockQ4K = struct {
    d: f16, // super-block scale
    dmin: f16, // super-block minimum
    scales: [12]u8, // sub-block scales (packed)
    qs: [QK_K / 2]u8, // quantized nibbles
};
On-disk and in-memory layout of a single Q4_K block.
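The `qs` field has `QK_K / 2` entries because each byte holds two 4-bit quantized values. The following is a minimal sketch of nibble packing. The helper names are hypothetical, and the low-nibble-first order is an assumption; which element of the block lands in which nibble is not specified on this page and may differ in the module.

```zig
const std = @import("std");

/// Hypothetical helper: pack two 4-bit values (0..15) into one byte,
/// with `lo` in bits 0..3 and `hi` in bits 4..7.
fn packNibbles(lo: u8, hi: u8) u8 {
    std.debug.assert(lo < 16 and hi < 16);
    return (hi << 4) | lo;
}

/// Extract bits 0..3.
fn lowNibble(b: u8) u8 {
    return b & 0x0F;
}

/// Extract bits 4..7.
fn highNibble(b: u8) u8 {
    return b >> 4;
}

test "nibble round trip" {
    const b = packNibbles(3, 12);
    try std.testing.expect(b == 0xC3);
    try std.testing.expect(lowNibble(b) == 3);
    try std.testing.expect(highNibble(b) == 12);
}
```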
BlockQ5K¶
pub const BlockQ5K = struct {
    d: f16, // super-block scale
    dmin: f16, // super-block minimum
    scales: [12]u8, // sub-block scales (packed)
    qh: [QK_K / 8]u8, // high bits, one per value
    qs: [QK_K / 2]u8, // low nibbles
};
BlockQ6K¶
pub const BlockQ6K = struct {
    ql: [QK_K / 2]u8, // low 4 bits
    qh: [QK_K / 4]u8, // high 2 bits
    scales: [QK_K / 16]i8, // sub-block scales
    d: f16, // super-block scale
};
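Each Q6_K value is a 6-bit code split across two arrays: 4 low bits in `ql` and 2 high bits in `qh`. The sketch below shows how the two parts could be recombined once they have been extracted from their bytes. `combineQ6` is a hypothetical helper, and the offset of 32 (which centres the code around zero) is an assumption borrowed from the Q6_K format in llama.cpp; this page does not confirm that the module applies it.

```zig
const std = @import("std");

/// Hypothetical helper: rebuild a signed quant from its two stored parts.
/// `low4` carries bits 0..3 (from `ql`), `high2` carries bits 4..5 (from `qh`).
fn combineQ6(low4: u8, high2: u8) i16 {
    std.debug.assert(low4 < 16 and high2 < 4);
    const code: i16 = @as(i16, low4) | (@as(i16, high2) << 4); // range 0..63
    return code - 32; // assumed bias, giving the range -32..31
}

test "6-bit code range" {
    try std.testing.expect(combineQ6(0, 0) == -32);
    try std.testing.expect(combineQ6(0, 2) == 0);
    try std.testing.expect(combineQ6(15, 3) == 31);
}
```

Under these assumptions, a dequantized weight would be the block's `d` times the sub-block scale times this signed value.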
KQuantizer¶
Stateless quantizer/dequantizer for K-quant formats.
Public Functions¶
KQuantizer.quantize¶
Quantize a contiguous f32 buffer into packed K-quant bytes. The input length must be a multiple of QK_K (256).
Returns: caller-owned byte slice.
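The length requirement also fixes the size of the returned slice: one block per 256 inputs. A minimal sketch of that calculation follows. `packedSize` is a hypothetical helper, not part of the documented API, and the per-block byte counts are taken from the `KQuantType` table.

```zig
const std = @import("std");

// Assumed to equal the module's QK_K constant.
const QK_K: usize = 256;

/// Hypothetical helper: packed size in bytes for `n` f32 inputs,
/// given the byte count of one block (144, 176, or 210).
fn packedSize(n: usize, block_bytes: usize) error{InvalidAlignment}!usize {
    if (n % QK_K != 0) return error.InvalidAlignment;
    return (n / QK_K) * block_bytes;
}

test "packed size" {
    // 4096 inputs form 16 blocks; 16 * 144 = 2304 bytes for Q4_K.
    try std.testing.expect((try packedSize(4096, 144)) == 2304);
    // 100 is not a multiple of 256.
    try std.testing.expectError(error.InvalidAlignment, packedSize(100, 144));
}
```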
KQuantizer.dequantize¶
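Dequantize packed K-quant bytes back to f32. Quantization is lossy, so the result approximates the original values.
Returns: caller-owned f32 slice.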
Error Types¶
- error{InvalidAlignment} -- input length is not a multiple of QK_K.
- error{OutOfMemory}
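Because both errors are plain Zig error values, a caller can handle them with an exhaustive `switch`. The sketch below assumes the two errors form one error set; the `QuantError` and `describe` names are hypothetical, and the OutOfMemory message is a guess at the cause, which this page leaves undescribed.

```zig
const std = @import("std");

// Error set assumed from the two errors listed on this page.
const QuantError = error{ InvalidAlignment, OutOfMemory };

/// Hypothetical helper: map each error to a message suitable for logging.
fn describe(err: QuantError) []const u8 {
    return switch (err) {
        error.InvalidAlignment => "input length is not a multiple of QK_K",
        error.OutOfMemory => "allocator could not provide the output buffer",
    };
}

test "every error has a message" {
    const msg = describe(error.InvalidAlignment);
    try std.testing.expect(std.mem.eql(u8, msg, "input length is not a multiple of QK_K"));
    try std.testing.expect(describe(error.OutOfMemory).len > 0);
}
```

The compiler rejects the `switch` if a member of the error set is left unhandled, so adding an error to the set forces every such call site to be updated.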
Usage Example¶
const kq = @import("zigllama").linear_algebra.k_quantization;
// `allocator`, `model_data`, and `offset` are assumed to be defined by the caller.
var quantizer = kq.KQuantizer{ .allocator = allocator };
// 4096 is a multiple of QK_K (256), so the length check passes.
const raw_weights: []const f32 = model_data[offset..][0..4096];
const packed_bytes = try quantizer.quantize(raw_weights, .Q4_K);
defer allocator.free(packed_bytes);
const restored = try quantizer.dequantize(packed_bytes, .Q4_K);
defer allocator.free(restored);
Related Modules¶
- linear_algebra.quantization -- Higher-level quantization API.
- linear_algebra.iq_quantization -- Importance quantization, a complementary family.
- foundation.gguf_format -- GGMLType.Q4_K etc. map to these block formats.