transformers.transformer_block¶
Module Path¶
Source file: src/transformers/transformer_block.zig
Public Types¶
NormPlacement¶
| Variant | Description |
|---|---|
PreNorm | Normalize before attention/FFN (LLaMA, GPT-2 style) |
PostNorm | Normalize after attention/FFN (original Transformer) |
Pre-normalization provides better gradient flow in deep networks and is used by all LLaMA variants.
TransformerBlock¶
pub const TransformerBlock = struct {
attention: MultiHeadAttention,
ffn: FeedForward,
norm1: Tensor(f32), // attention sub-layer norm weights
norm2: Tensor(f32), // FFN sub-layer norm weights
norm_placement: NormPlacement,
norm_eps: f32,
allocator: std.mem.Allocator,
};
A single transformer layer containing attention, feed-forward, and normalization sub-layers with residual connections.
Transformer¶
pub const Transformer = struct {
blocks: []TransformerBlock,
d_model: usize,
num_layers: usize,
allocator: std.mem.Allocator,
};
Stack of TransformerBlock layers. The full encoder or decoder body of a transformer model.
Public Functions¶
TransformerBlock.init¶
pub fn init(
num_heads: usize,
d_model: usize,
d_ff: usize,
ffn_type: FFNType,
norm_placement: NormPlacement,
norm_eps: f32,
allocator: std.mem.Allocator,
) !TransformerBlock
Allocate a single transformer block with all sub-layer weights.
TransformerBlock.deinit¶
Free all owned tensors.
TransformerBlock.forward¶
pub fn forward(
self: TransformerBlock,
input: Tensor(f32),
mask: ?Tensor(f32),
allocator: std.mem.Allocator,
) !Tensor(f32)
Forward pass through one transformer layer.
For PreNorm (LLaMA):
Parameters:
| Name | Type | Description |
|---|---|---|
input | Tensor(f32) | Shape [seq_len, d_model] |
mask | ?Tensor(f32) | Optional causal or padding mask |
allocator | Allocator | Memory allocator |
Returns: !Tensor(f32) with shape [seq_len, d_model].
Transformer.init¶
pub fn init(
num_layers: usize,
num_heads: usize,
d_model: usize,
d_ff: usize,
ffn_type: FFNType,
norm_placement: NormPlacement,
norm_eps: f32,
allocator: std.mem.Allocator,
) !Transformer
Allocate the full stack of transformer blocks.
Transformer.deinit¶
Free all blocks.
Transformer.forward¶
pub fn forward(
self: Transformer,
input: Tensor(f32),
mask: ?Tensor(f32),
allocator: std.mem.Allocator,
) !Tensor(f32)
Run the input through every layer sequentially. Each layer receives the output of the previous one.
Error Types¶
TensorError.IncompatibleShapesTensorError.OutOfMemory
Usage Example¶
const tb = @import("zigllama").transformers.transformer_block;
const Tensor = @import("zigllama").foundation.tensor.Tensor;
// Build a 32-layer LLaMA-style transformer body
var transformer = try tb.Transformer.init(
32, // num_layers
32, // num_heads
4096, // d_model
11008, // d_ff
.SwiGLU,
.PreNorm,
1e-5,
allocator,
);
defer transformer.deinit();
var input = try Tensor(f32).init(allocator, &[_]usize{ 128, 4096 });
defer input.deinit();
var output = try transformer.forward(input, null, allocator);
defer output.deinit();
Related Modules¶
transformers.attention-- The attention sub-layer.transformers.feed_forward-- The FFN sub-layer.neural_primitives.normalization-- Normalization applied within each block.models.llama-- WrapsTransformerwith embeddings and an output head.