neural_primitives.activations¶
Module Path¶
Source file: src/neural_primitives/activations.zig
Public Functions¶
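All activation functions are scalar: they accept f32 inputs and return an f32, so applying one to a tensor means calling it once per element. The gated variants (swiglu, geglu, glu) take two f32 inputs. They are designed for use in neural network layers.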
relu¶
Rectified Linear Unit: max(0, x). One of the most widely used activations in deep learning; LLaMA models use SiLU/SwiGLU instead.
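The formula can be sketched in a few lines of Zig. This is a minimal illustration, not the module's source: `reluSketch` is a hypothetical name, and the sketch assumes a recent Zig toolchain where the `@max` builtin is available.

```zig
const std = @import("std");

/// Hypothetical stand-in for `relu`: returns x when x > 0, otherwise 0.
fn reluSketch(x: f32) f32 {
    return @max(@as(f32, 0.0), x);
}

test "relu sketch clips negatives and passes positives through" {
    try std.testing.expectEqual(@as(f32, 0.0), reluSketch(-0.5));
    try std.testing.expectEqual(@as(f32, 2.5), reluSketch(2.5));
}
```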
gelu¶
Gaussian Error Linear Unit, computed with the tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). Used in BERT and GPT-2.
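As a rough sketch of how the tanh approximation translates into code, the following transcribes the formula term by term. `geluSketch` is a hypothetical name, and the constant is sqrt(2/pi) written out as a literal; the module's actual implementation may differ.

```zig
const std = @import("std");

/// Hypothetical stand-in for `gelu`, using the tanh approximation.
fn geluSketch(x: f32) f32 {
    // sqrt(2 / pi), written as a literal to keep the sketch simple.
    const sqrt_2_over_pi: f32 = 0.7978845608;
    const inner = sqrt_2_over_pi * (x + 0.044715 * x * x * x);
    return 0.5 * x * (1.0 + std.math.tanh(inner));
}

test "gelu sketch matches hand-computed values" {
    try std.testing.expectEqual(@as(f32, 0.0), geluSketch(0.0));
    try std.testing.expectApproxEqAbs(@as(f32, -0.1543), geluSketch(-0.5), 1e-4);
}
```

For large positive x the tanh term saturates at 1, so the result approaches x; for large negative x it approaches 0.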
silu¶
Sigmoid Linear Unit (also called Swish): x * sigmoid(x). The primary activation in LLaMA feed-forward layers.
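A minimal sketch of the definition, assuming the plain logistic sigmoid as the inner function. Both `sigmoidSketch` and `siluSketch` are hypothetical names used only for illustration.

```zig
const std = @import("std");

/// Hypothetical plain sigmoid: 1 / (1 + exp(-x)).
fn sigmoidSketch(x: f32) f32 {
    return 1.0 / (1.0 + @exp(-x));
}

/// Hypothetical stand-in for `silu`: x * sigmoid(x).
fn siluSketch(x: f32) f32 {
    return x * sigmoidSketch(x);
}

test "silu sketch is zero at the origin and near-identity for large x" {
    try std.testing.expectEqual(@as(f32, 0.0), siluSketch(0.0));
    try std.testing.expectApproxEqAbs(@as(f32, 9.9995), siluSketch(10.0), 1e-3);
}
```

Unlike ReLU, SiLU lets small negative values through: negative inputs produce small negative outputs rather than exactly zero.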
swiglu¶
SwiGLU gated activation: silu(a) * b. The gating mechanism used in LLaMA feed-forward networks. Takes two inputs: a (the gate input) passes through SiLU, and the result scales b, which passes through unchanged.
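The scalar formula extends to whole buffers by pairing elements. The sketch below is an assumption about how a caller might do that, not part of the module: `siluSketch`, `swigluSketch`, and `swigluSlices` are hypothetical names, and the slices stand in for the gate and up projections of a feed-forward layer.

```zig
const std = @import("std");

fn siluSketch(x: f32) f32 {
    return x / (1.0 + @exp(-x));
}

/// Hypothetical scalar gate matching the formula: silu(a) * b.
fn swigluSketch(a: f32, b: f32) f32 {
    return siluSketch(a) * b;
}

/// Hypothetical helper: out[i] = silu(gate[i]) * up[i]. All slices must have equal length.
fn swigluSlices(gate: []const f32, up: []const f32, out: []f32) void {
    std.debug.assert(gate.len == up.len and up.len == out.len);
    var i: usize = 0;
    while (i < out.len) : (i += 1) {
        out[i] = swigluSketch(gate[i], up[i]);
    }
}

test "swiglu sketch over two short buffers" {
    const gate = [_]f32{ 0.0, 1.2 };
    const up = [_]f32{ 5.0, 0.8 };
    var out: [2]f32 = undefined;
    swigluSlices(&gate, &up, &out);
    // silu(0) is 0, so a zero gate input blocks the second input entirely.
    try std.testing.expectEqual(@as(f32, 0.0), out[0]);
    try std.testing.expectApproxEqAbs(@as(f32, 0.7378), out[1], 1e-3);
}
```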
geglu¶
GeGLU gated activation: gelu(a) * b. Variant of SwiGLU using GELU instead of SiLU.
glu¶
Gated Linear Unit: sigmoid(a) * b. The original gating mechanism from Dauphin et al. (2017).
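GLU and GeGLU differ only in the activation applied to the first input, which the following sketch makes explicit with a comptime function parameter. The helper `gated` and the `*Sketch` functions are hypothetical names for illustration; the module may well implement each variant directly.

```zig
const std = @import("std");

fn sigmoidSketch(x: f32) f32 {
    return 1.0 / (1.0 + @exp(-x));
}

fn geluSketch(x: f32) f32 {
    const sqrt_2_over_pi: f32 = 0.7978845608;
    return 0.5 * x * (1.0 + std.math.tanh(sqrt_2_over_pi * (x + 0.044715 * x * x * x)));
}

/// Hypothetical generic gate: apply `act` to `a`, then scale `b` by the result.
fn gated(comptime act: fn (f32) f32, a: f32, b: f32) f32 {
    return act(a) * b;
}

fn gluSketch(a: f32, b: f32) f32 {
    return gated(sigmoidSketch, a, b);
}

fn gegluSketch(a: f32, b: f32) f32 {
    return gated(geluSketch, a, b);
}

test "glu halves b at a = 0, geglu zeroes it" {
    try std.testing.expectEqual(@as(f32, 2.0), gluSketch(0.0, 4.0));
    try std.testing.expectEqual(@as(f32, 0.0), gegluSketch(0.0, 4.0));
}
```

The test highlights a practical difference: at a = 0 the sigmoid gate is half open, while the GELU gate is fully closed.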
sigmoid¶
Logistic sigmoid: 1 / (1 + exp(-x)). Maps any real number to the range (0, 1).
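The open interval (0, 1) is the mathematical range; in f32 arithmetic the result can round to exactly 1.0 for moderately large inputs. The sketch below illustrates this with a plain transcription of the formula, where `sigmoidSketch` is a hypothetical name and not the module's implementation.

```zig
const std = @import("std");

/// Hypothetical plain sigmoid: 1 / (1 + exp(-x)).
fn sigmoidSketch(x: f32) f32 {
    return 1.0 / (1.0 + @exp(-x));
}

test "sigmoid sketch: midpoint, symmetry, and f32 saturation" {
    try std.testing.expectEqual(@as(f32, 0.5), sigmoidSketch(0.0));
    // sigmoid(-x) equals 1 - sigmoid(x), up to rounding.
    try std.testing.expectApproxEqAbs(1.0 - sigmoidSketch(2.0), sigmoidSketch(-2.0), 1e-6);
    // exp(-20) is far below f32 epsilon, so 1 + exp(-20) rounds to 1.0.
    try std.testing.expectEqual(@as(f32, 1.0), sigmoidSketch(20.0));
}
```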
tanh_activation¶
Hyperbolic tangent: (exp(x) - exp(-x)) / (exp(x) + exp(-x)). Maps to the range (-1, 1). Named tanh_activation to avoid shadowing std.math.tanh.
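Transcribing the quotient literally is fragile in f32: once |x| exceeds roughly 88.7, one of the exponentials overflows to inf and the quotient becomes NaN. The sketch below contrasts that with the algebraically equivalent form 1 - 2 / (exp(2x) + 1). Both function names are hypothetical, and nothing here claims to be how `tanh_activation` is implemented.

```zig
const std = @import("std");

/// Direct transcription of the formula; yields NaN when exp overflows.
fn tanhNaive(x: f32) f32 {
    const ep = @exp(x);
    const en = @exp(-x);
    return (ep - en) / (ep + en);
}

/// Equivalent form that saturates to +1 or -1 when exp(2x) overflows or underflows.
fn tanhStable(x: f32) f32 {
    return 1.0 - 2.0 / (@exp(2.0 * x) + 1.0);
}

test "naive tanh breaks at large inputs, the rearranged form saturates" {
    try std.testing.expect(std.math.isNan(tanhNaive(100.0)));
    try std.testing.expectEqual(@as(f32, 1.0), tanhStable(100.0));
    try std.testing.expectEqual(@as(f32, -1.0), tanhStable(-100.0));
    try std.testing.expectApproxEqAbs(std.math.tanh(@as(f32, 0.5)), tanhStable(0.5), 1e-5);
}
```

The rearranged form trades overflow safety for some loss of relative precision very close to zero, so a library routine such as `std.math.tanh` is usually the better choice in practice.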
Error Types¶
Activation functions are pure scalar operations and do not return errors. Overflow/underflow is handled gracefully -- sigmoid clamps to avoid inf, and gelu uses the tanh approximation for numerical stability.
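One way such clamping could look is sketched below. The function name and the bound of 88.0 are assumptions made for illustration: the bound is chosen so that exp(88), about 1.65e38, stays below the f32 maximum of about 3.40e38. The module's actual threshold is not stated here.

```zig
const std = @import("std");

/// Hypothetical clamped sigmoid: the input is limited to [-88, 88]
/// so that the intermediate exp() never overflows to inf in f32.
fn sigmoidClamped(x: f32) f32 {
    const limit: f32 = 88.0;
    const xc = std.math.clamp(x, -limit, limit);
    return 1.0 / (1.0 + @exp(-xc));
}

test "clamped sigmoid stays finite at extreme inputs" {
    const lo = sigmoidClamped(-1000.0);
    const hi = sigmoidClamped(1000.0);
    try std.testing.expect(std.math.isFinite(lo) and lo >= 0.0);
    try std.testing.expectEqual(@as(f32, 1.0), hi);
}
```

Without the clamp, the final result for very negative inputs would still be 0 (since 1 / inf is 0), but the intermediate value would be inf, which can be undesirable when floating-point exceptions or debugging checks are enabled.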
Usage Example¶
const act = @import("zigllama").neural_primitives.activations;
// Single-element activations
const x: f32 = -0.5;
const r = act.relu(x); // 0.0
const s = act.silu(x); // -0.1888
const g = act.gelu(x); // -0.1543
// SwiGLU gating (used in LLaMA FFN)
const gate_input: f32 = 1.2;
const up_input: f32 = 0.8;
const result = act.swiglu(gate_input, up_input);
// Apply to a tensor manually
for (tensor.data) |*elem| {
    elem.* = act.silu(elem.*);
}
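The manual loop can be wrapped in a small reusable helper. The following is a sketch under stated assumptions: `applyInPlace` and `siluSketch` are hypothetical names, and a plain `[]f32` slice stands in for `tensor.data`, whose actual type is not shown here.

```zig
const std = @import("std");

fn siluSketch(x: f32) f32 {
    return x / (1.0 + @exp(-x));
}

/// Hypothetical helper: overwrite each element of `data` with `act(element)`.
fn applyInPlace(comptime act: fn (f32) f32, data: []f32) void {
    for (data) |*elem| {
        elem.* = act(elem.*);
    }
}

test "applyInPlace maps an activation over a buffer" {
    var buf = [_]f32{ 0.0, 2.0 };
    applyInPlace(siluSketch, &buf);
    try std.testing.expectEqual(@as(f32, 0.0), buf[0]);
    try std.testing.expectApproxEqAbs(@as(f32, 1.7616), buf[1], 1e-4);
}
```

Passing the activation as a comptime parameter lets the compiler specialize the loop per function, so the helper should cost no more than the hand-written loop.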
Related Modules¶
transformers.feed_forward -- Feed-forward networks that use these activations.
neural_primitives.normalization -- Often applied before or after activation layers.