Chat Templates

Overview

Chat templates define how multi-turn conversations are formatted into the flat token sequences that language models consume. Different model families use different formatting conventions -- special delimiter tokens, role prefixes, system message placement -- and using the wrong template for a given model produces degraded or nonsensical output.

ZigLLM implements 10 built-in chat template formats, plus a user-defined Custom type, in src/models/chat_templates.zig. Together they cover the major model families deployed in practice.

ChatMessage Struct

Every message in a conversation is represented as a ChatMessage:

pub const ChatMessage = struct {
    role: []const u8,       // "system", "user", "assistant", or "function"
    content: []const u8,    // The message text
    name: ?[]const u8 = null, // Optional sender name
};

Valid Roles

system: Instructions that set the assistant's behavior for the entire conversation.
user: Messages from the human interacting with the model.
assistant: Messages from the model (previous turns in multi-turn conversations).
function: Tool/function call results (used in agentic workflows).
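
As a minimal sketch, a role string can be checked against the four values listed above. The helper name isValidRole is hypothetical and is not part of the documented API; the comparison is assumed to be case-sensitive.

```zig
const std = @import("std");

// The four roles listed above. Matching is assumed to be case-sensitive.
const valid_roles = [_][]const u8{ "system", "user", "assistant", "function" };

// Hypothetical helper: returns true when `role` is one of the known roles.
fn isValidRole(role: []const u8) bool {
    for (valid_roles) |known| {
        if (std.mem.eql(u8, role, known)) return true;
    }
    return false;
}
```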

Supported Templates

The TemplateType enum lists all supported formats:

pub const TemplateType = enum {
    Llama2,      // Meta LLaMA 2 chat format
    CodeLlama,   // Code-specific LLaMA format
    Llama3,      // Meta LLaMA 3 Instruct format
    Mistral,     // Mistral Instruct format
    ChatML,      // OpenAI ChatML format (used by Qwen, Yi, etc.)
    Alpaca,      // Stanford Alpaca instruction format
    Vicuna,      // LMSYS Vicuna format
    Orca,        // Microsoft Orca format
    GPT4,        // OpenAI GPT-4 style
    Claude,      // Anthropic Claude style
    Custom,      // User-defined template
};

Template Structure

Each ChatTemplate instance defines the delimiters for every component:

pub const ChatTemplate = struct {
    template_type: TemplateType,
    system_prefix: []const u8,
    system_suffix: []const u8,
    user_prefix: []const u8,
    user_suffix: []const u8,
    assistant_prefix: []const u8,
    assistant_suffix: []const u8,
    bos_token: []const u8,
    eos_token: []const u8,
    separator: []const u8,
    stop_sequences: [][]const u8,
    add_generation_prompt: bool,
};

Template Reference Table

The following table summarizes the key delimiters for each template.

Template	System Prefix	User Prefix	User Suffix	Assistant Prefix	BOS	EOS
Llama2	`[INST] <<SYS>>\n`	`[INST]`	`[/INST]`	(empty)	`<s>`	`</s>`
CodeLlama	`[INST] <<SYS>>\n`	`[INST]`	`[/INST]`	(empty)	`<s>`	`</s>`
Llama3	`<|start_header_id|>system...`	`<|start_header_id|>user...`	`<|eot_id|>`	`<|start_header_id|>asst...`	`<|begin_of_text|>`	`<|end_of_text|>`
Mistral	(none)	`[INST]`	`[/INST]`	(empty)	`<s>`	`</s>`
ChatML	`<|im_start|>system\n`	`<|im_start|>user\n`	`<|im_end|>`	`<|im_start|>assistant\n`	(empty)	`<|endoftext|>`
Alpaca	`### System:\n`	`### Human:\n`	`\n\n`	`### Assistant:\n`	(empty)	(empty)
Vicuna	(none)	`USER:`	`\n`	`ASSISTANT:`	(empty)	`</s>`
Orca	`System:\n`	`User:\n`	`\n`	`Assistant:\n`	(empty)	`<|im_end|>`
GPT4	(none)	(empty)	(empty)	(empty)	(empty)	`<|endoftext|>`
Claude	(none)	`\n\nHuman:`	(empty)	`\n\nAssistant:`	(empty)	(empty)
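
To illustrate how one row of the table maps onto delimiter fields, the sketch below wraps a single user message with the ChatML user prefix and suffix. The Delimiters struct and wrapUserTurn function are hypothetical stand-ins for illustration, not the library's ChatTemplate type.

```zig
const std = @import("std");

// Hypothetical, trimmed-down stand-in for ChatTemplate.
const Delimiters = struct {
    user_prefix: []const u8,
    user_suffix: []const u8,
};

// Values copied from the ChatML row of the table above.
const chatml = Delimiters{
    .user_prefix = "<|im_start|>user\n",
    .user_suffix = "<|im_end|>",
};

// Returns prefix + content + suffix in a new buffer owned by the caller.
fn wrapUserTurn(allocator: std.mem.Allocator, d: Delimiters, content: []const u8) ![]u8 {
    return std.mem.concat(allocator, u8, &[_][]const u8{
        d.user_prefix, content, d.user_suffix,
    });
}
```

The caller frees the returned slice with the same allocator, matching the ownership convention used by apply() later in this page.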

Template Examples

Llama 2 Format

Given messages: system="Be helpful", user="Hello", assistant="Hi!", user="How are you?"

Formatted output:

<s>[INST] <<SYS>>
Be helpful
<</SYS>>

Hello [/INST] Hi! </s><s>[INST] How are you? [/INST]
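
The opening turn above folds the system message into the first user instruction. A minimal sketch of that assembly, using std.fmt.bufPrint and a hypothetical helper named firstLlama2Turn (not part of ZigLLM), could look like this; the layout is copied from the formatted output shown above.

```zig
const std = @import("std");

// Hypothetical helper: writes the system-plus-first-user opening of a
// Llama 2 prompt into `buf` and returns the written slice.
fn firstLlama2Turn(buf: []u8, system: []const u8, user: []const u8) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "<s>[INST] <<SYS>>\n{s}\n<</SYS>>\n\n{s} [/INST]",
        .{ system, user },
    );
}
```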

Llama 3 Format

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Be helpful<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hi!<|eot_id|><|start_header_id|>user<|end_header_id|>

How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

ChatML Format

<|im_start|>system
Be helpful<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi!<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant

Claude Format

\n\nHuman: Hello\n\nAssistant: Hi!\n\nHuman: How are you?\n\nAssistant:

The formatMessages API

Creating a Template

Templates are created via the ChatTemplate.create() factory method:

const template = try ChatTemplate.create(.Llama2, allocator);
defer template.deinit(allocator);

Applying a Template

The apply() method formats a slice of ChatMessage into a single string:

const messages = [_]ChatMessage{
    .{ .role = "system", .content = "You are a helpful assistant." },
    .{ .role = "user", .content = "What is 2+2?" },
};

const formatted = try template.apply(&messages, allocator);
defer allocator.free(formatted);
// formatted now contains the properly delimited conversation string

Apply Algorithm

1. Append the BOS token.
2. Separate system messages from conversation messages.
3. If a system message exists, wrap it with system_prefix and system_suffix.
4. For each conversation message, wrap with the appropriate role prefix/suffix.
5. Insert separator between messages.
6. If add_generation_prompt is true and the last message is from the user, append separator + assistant_prefix to prompt the model.

flowchart LR
    A["ChatMessage[]"] --> B["Separate\nsystem msg"]
    B --> C["Wrap system"]
    C --> D["Wrap user/asst\nmessages"]
    D --> E{"Last msg\nis user?"}
    E -->|Yes| F["Append\nasst prefix"]
    E -->|No| G["Done"]
    F --> G

    style A fill:#f0f0f0,color:#333
    style G fill:#4a9eff,color:#fff
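
The apply algorithm can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the library's implementation: Template, Builder, and applySketch are hypothetical names, the delimiter values are the ChatML ones from the reference table, the "\n" separator is inferred from the ChatML example, and any non-user, non-system role is treated as an assistant turn.

```zig
const std = @import("std");

const ChatMessage = struct {
    role: []const u8,
    content: []const u8,
};

// Hypothetical reduced template with ChatML-style defaults.
// The "\n" separator is an assumption inferred from the ChatML example.
const Template = struct {
    bos_token: []const u8 = "",
    system_prefix: []const u8 = "<|im_start|>system\n",
    system_suffix: []const u8 = "<|im_end|>",
    user_prefix: []const u8 = "<|im_start|>user\n",
    user_suffix: []const u8 = "<|im_end|>",
    assistant_prefix: []const u8 = "<|im_start|>assistant\n",
    assistant_suffix: []const u8 = "<|im_end|>",
    separator: []const u8 = "\n",
    add_generation_prompt: bool = true,
};

// Appends into a caller-provided buffer so the sketch needs no allocator.
const Builder = struct {
    buf: []u8,
    len: usize = 0,

    fn append(self: *Builder, s: []const u8) !void {
        if (self.len + s.len > self.buf.len) return error.NoSpaceLeft;
        @memcpy(self.buf[self.len..][0..s.len], s);
        self.len += s.len;
    }
};

fn isRole(msg: ChatMessage, role: []const u8) bool {
    return std.mem.eql(u8, msg.role, role);
}

fn applySketch(t: Template, messages: []const ChatMessage, buf: []u8) ![]const u8 {
    var out = Builder{ .buf = buf };
    var emitted = false; // true once any message has been written

    // Step 1: BOS token.
    try out.append(t.bos_token);

    // Steps 2-3: the first system message, if any, is written first.
    for (messages) |msg| {
        if (isRole(msg, "system")) {
            try out.append(t.system_prefix);
            try out.append(msg.content);
            try out.append(t.system_suffix);
            emitted = true;
            break;
        }
    }

    // Steps 4-5: wrap the remaining messages, with a separator between them.
    // Simplification: every non-user role uses the assistant delimiters.
    for (messages) |msg| {
        if (isRole(msg, "system")) continue;
        const is_user = isRole(msg, "user");
        if (emitted) try out.append(t.separator);
        try out.append(if (is_user) t.user_prefix else t.assistant_prefix);
        try out.append(msg.content);
        try out.append(if (is_user) t.user_suffix else t.assistant_suffix);
        emitted = true;
    }

    // Step 6: cue the model when the conversation ends on a user turn.
    if (t.add_generation_prompt and messages.len > 0 and
        isRole(messages[messages.len - 1], "user"))
    {
        try out.append(t.separator);
        try out.append(t.assistant_prefix);
    }
    return out.buf[0..out.len];
}
```

Writing into a fixed buffer keeps the sketch independent of any particular allocator or list API; the documented apply() returns an allocated string instead.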

ChatTemplateManager

The ChatTemplateManager provides a higher-level API for managing multiple templates and auto-detecting the correct template from a model name.

var manager = ChatTemplateManager.init(allocator);
defer manager.deinit();

// Apply template directly (loads on demand)
const formatted = try manager.applyTemplate(.ChatML, &messages);

// Auto-detect template from model name
const detected = ChatTemplateManager.detectTemplate("meta-llama/Llama-2-7b-chat-hf");
// Returns .Llama2

Auto-Detection Rules

Model Name Contains	Detected Template
`llama-3`, `llama3`	Llama3
`code-llama`, `codellama`	CodeLlama
`llama-2`, `llama2`, `llama`	Llama2
`mistral`, `mixtral`	Mistral
`gpt-4`, `gpt4`	GPT4
`alpaca`	Alpaca
`vicuna`	Vicuna
`orca`	Orca
`claude`	Claude
(default)	ChatML

Default to ChatML

When the model name does not match any known pattern, the manager defaults to ChatML. This is a reasonable default because ChatML is widely adopted by community fine-tuned models and has clear, unambiguous delimiters.
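
Because `llama` is a substring of both `codellama` and `llama-3`, the order in which patterns are tried matters. The sketch below shows one way to express the table as an ordered rule list with a ChatML fallback. It is an illustration only: detectSketch and Rule are hypothetical names, and case-insensitive matching is an assumption inferred from the `Llama-2` example above.

```zig
const std = @import("std");

const TemplateType = enum {
    Llama2, CodeLlama, Llama3, Mistral, ChatML,
    Alpaca, Vicuna, Orca, GPT4, Claude, Custom,
};

const Rule = struct { needle: []const u8, template: TemplateType };

// Ordered rules: more specific patterns must come before the bare "llama".
// "llama-2" and "llama2" are covered by the "llama" rule.
const rules = [_]Rule{
    .{ .needle = "llama-3", .template = .Llama3 },
    .{ .needle = "llama3", .template = .Llama3 },
    .{ .needle = "code-llama", .template = .CodeLlama },
    .{ .needle = "codellama", .template = .CodeLlama },
    .{ .needle = "llama", .template = .Llama2 },
    .{ .needle = "mistral", .template = .Mistral },
    .{ .needle = "mixtral", .template = .Mistral },
    .{ .needle = "gpt-4", .template = .GPT4 },
    .{ .needle = "gpt4", .template = .GPT4 },
    .{ .needle = "alpaca", .template = .Alpaca },
    .{ .needle = "vicuna", .template = .Vicuna },
    .{ .needle = "orca", .template = .Orca },
    .{ .needle = "claude", .template = .Claude },
};

// Hypothetical sketch: first matching rule wins, ChatML is the fallback.
fn detectSketch(model_name: []const u8) TemplateType {
    for (rules) |rule| {
        if (std.ascii.indexOfIgnoreCase(model_name, rule.needle) != null) {
            return rule.template;
        }
    }
    return .ChatML;
}
```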

Conversation Validation

The ChatTemplateUtils module provides utilities for validating and manipulating conversations before template application.

// Validate a single message
const is_valid = ChatTemplateUtils.validateMessage(message);

// Validate entire conversation (checks alternating user/assistant pattern)
const conv_valid = ChatTemplateUtils.validateConversation(&messages);

// Estimate token count (rough: 1 token per 4 characters)
const est_tokens = ChatTemplateUtils.estimateTokenCount(formatted_text);

// Truncate conversation to fit context window
const truncated = try ChatTemplateUtils.truncateConversation(
    &messages, max_tokens, allocator, template,
);
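
The 4-characters-per-token heuristic and a simple truncation policy can be sketched as follows. The source does not specify how truncateConversation chooses what to drop, so the keep-the-newest-messages policy here is an assumption, and estimateTokens and firstKeptIndex are hypothetical names. The sketch also ignores delimiter overhead and gives system messages no special treatment.

```zig
const std = @import("std");

const ChatMessage = struct {
    role: []const u8,
    content: []const u8,
};

// Rough estimate: one token per four characters, rounded up.
fn estimateTokens(text: []const u8) usize {
    return (text.len + 3) / 4;
}

// Walks backward from the newest message and returns the index of the
// oldest message that still fits in `max_tokens`. Messages before that
// index would be dropped. Assumed policy, not the library's.
fn firstKeptIndex(messages: []const ChatMessage, max_tokens: usize) usize {
    var total: usize = 0;
    var start = messages.len;
    while (start > 0) {
        const cost = estimateTokens(messages[start - 1].content);
        if (total + cost > max_tokens) break;
        total += cost;
        start -= 1;
    }
    return start;
}
```

A caller would then keep `messages[firstKeptIndex(messages, budget)..]`.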

Conversation Structure

A valid conversation should follow the pattern `[system]? (user assistant)* user`: an optional system message, followed by alternating user/assistant pairs, ending with a user message that the model will respond to.
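
A check for this pattern might look like the sketch below. The function name followsPattern is hypothetical; it illustrates the rule and makes no claim about how validateConversation is implemented.

```zig
const std = @import("std");

const ChatMessage = struct {
    role: []const u8,
    content: []const u8,
};

// Returns true when the roles match: [system]? (user assistant)* user
fn followsPattern(messages: []const ChatMessage) bool {
    var rest = messages;

    // Optional leading system message.
    if (rest.len > 0 and std.mem.eql(u8, rest[0].role, "system")) {
        rest = rest[1..];
    }

    // Zero or more pairs plus a final user turn means an odd count.
    if (rest.len % 2 == 0) return false;

    // Even positions must be user turns, odd positions assistant turns.
    for (rest, 0..) |msg, i| {
        const expected: []const u8 = if (i % 2 == 0) "user" else "assistant";
        if (!std.mem.eql(u8, msg.role, expected)) return false;
    }
    return true;
}
```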

Custom Templates

To define a custom template, use the Custom template type and modify its fields:

var custom = try ChatTemplate.create(.Custom, allocator);
defer custom.deinit(allocator); // Release template resources, as for built-in templates

// All fields start empty -- customize as needed
custom.system_prefix = "[SYS]";
custom.system_suffix = "[/SYS]";
custom.user_prefix = "[USER]";
custom.user_suffix = "[/USER]";
custom.assistant_prefix = "[BOT]";
custom.assistant_suffix = "[/BOT]";
custom.add_generation_prompt = true;

const formatted = try custom.apply(&messages, allocator);
defer allocator.free(formatted); // apply() returns an allocated string

Stop Sequences

Each template defines stop sequences that signal the end of generation. These are critical for chat applications -- without proper stop sequences, the model may generate text beyond its intended turn.

Template	Stop Sequences
Llama2	`</s>`, `[/INST]`
CodeLlama	`</s>`, `[/INST]`, `` ``` ``
Llama3	`<|end_of_text|>`, `<|eot_id|>`, `<|start_header_id|>`
Mistral	`</s>`, `[/INST]`
ChatML	`<|im_end|>`, `<|im_start|>`, `<|endoftext|>`
Claude	`\n\nHuman:`, `\n\nAssistant:`
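
When generation is post-processed on text rather than token IDs, one way to honor stop sequences is to cut the output at the earliest match. The sketch below assumes that approach; trimAtStop is a hypothetical helper and not part of the documented API.

```zig
const std = @import("std");

// Returns the part of `text` before the earliest stop sequence, or all of
// `text` when none occurs. Empty stop strings are ignored.
fn trimAtStop(text: []const u8, stops: []const []const u8) []const u8 {
    var cut = text.len;
    for (stops) |stop| {
        if (stop.len == 0) continue;
        if (std.mem.indexOf(u8, text, stop)) |pos| {
            if (pos < cut) cut = pos;
        }
    }
    return text[0..cut];
}
```

For example, with the ChatML stop sequences from the table, a raw completion that runs on into a fabricated user turn is cut back to the assistant's own reply.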

References

Touvron, H. et al. "Llama 2: Open Foundation and Fine-Tuned Chat Models." arXiv:2307.09288, 2023.
Zheng, L. et al. "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena." NeurIPS, 2023.