Skip to content

Chat Sessions

Build conversational applications with proper context management and chat templates.

Chat Templates

Modern LLMs use specific prompt formats. Llamafu automatically applies the correct template.

Automatic Template Detection

// Uses model's built-in template
final formatted = llamafu.applyChatTemplate(
  '',  // Empty string = use model's default
  [
    'user: Hello!',
    'assistant: Hi there! How can I help?',
    'user: What is the weather like?',
  ],
  addAssistant: true,
);

Common Template Formats

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
[INST] Hello! [/INST]

Custom Templates

final customTemplate = '''
{{#each messages}}
{{#ifEquals role "user"}}User: {{content}}
{{else}}Assistant: {{content}}
{{/ifEquals}}
{{/each}}
''';

final formatted = llamafu.applyChatTemplate(
  customTemplate,
  messages,
);

Creating Chat Sessions

Basic Session

final session = llamafu.createChatSession(
  systemPrompt: 'You are a helpful assistant.',
);

// Add user message and get response
session.addMessage('user', 'What is Python?');
final response = await session.generate(maxTokens: 200);

// Continue the conversation
session.addMessage('user', 'Show me an example');
final followUp = await session.generate(maxTokens: 300);

Session with History

final session = llamafu.createChatSession(
  systemPrompt: 'You are a coding assistant.',
  history: [
    ChatMessage(role: 'user', content: 'What is a function?'),
    ChatMessage(role: 'assistant', content: 'A function is...'),
  ],
);

Managing Context

Context Window

The context window limits conversation length:

final session = llamafu.createChatSession();

// Check remaining context
print('Used tokens: ${session.usedTokens}');
print('Remaining: ${session.remainingTokens}');

// Conversation will auto-truncate old messages when full

Manual Truncation

// Keep only the last N messages
session.truncateHistory(keepLast: 10);

// Or keep messages within token budget
session.truncateToFit(maxTokens: 1500);

Sliding Window

final session = llamafu.createChatSession(
  contextStrategy: ContextStrategy.slidingWindow,
  windowSize: 2048,
);

Streaming Responses

session.addMessage('user', 'Tell me a story');

await for (final token in session.generateStream(maxTokens: 500)) {
  stdout.write(token);
}

// Message is automatically added to history after completion

System Prompts

Setting the System Prompt

final session = llamafu.createChatSession(
  systemPrompt: '''You are a helpful coding assistant.
You write clean, well-documented code.
You explain your reasoning step by step.''',
);

Updating System Prompt

session.setSystemPrompt('You are now a creative writer.');

Multi-turn Example

class ChatBot {
  late final Llamafu _llamafu;
  late final ChatSession _session;

  Future<void> init() async {
    _llamafu = await Llamafu.init(
      modelPath: 'models/llama-3.2-1b.gguf',
      contextSize: 4096,
    );

    _session = _llamafu.createChatSession(
      systemPrompt: 'You are a friendly assistant.',
    );
  }

  Future<String> chat(String userMessage) async {
    _session.addMessage('user', userMessage);

    final response = await _session.generate(
      maxTokens: 500,
      temperature: 0.7,
    );

    return response;
  }

  List<ChatMessage> get history => _session.messages;

  void clearHistory() {
    _session.clear();
  }

  void dispose() {
    _llamafu.dispose();
  }
}

Role-Playing

final session = llamafu.createChatSession(
  systemPrompt: '''You are Sherlock Holmes, the famous detective.
You speak in Victorian English and love solving mysteries.
You often reference your past cases and your friend Watson.''',
);

session.addMessage('user', 'Mr. Holmes, I need your help!');
final response = await session.generate();
// "Ah, do come in and have a seat by the fire..."

Function Calling Pattern

Implement tool use with structured output:

final session = llamafu.createChatSession(
  systemPrompt: '''You are an assistant with access to tools.
When you need to use a tool, respond with JSON:
{"tool": "tool_name", "args": {...}}

Available tools:
- weather: Get weather for a location. Args: {"location": "city"}
- calculate: Do math. Args: {"expression": "2+2"}
''',
);

session.addMessage('user', 'What is the weather in Paris?');
final response = await session.generate();
// {"tool": "weather", "args": {"location": "Paris"}}

// Parse and execute tool, then continue
final toolResult = await executeWeatherTool('Paris');
session.addMessage('assistant', response);
session.addMessage('user', 'Tool result: $toolResult');
final finalResponse = await session.generate();

Saving and Restoring Sessions

Export History

final historyJson = session.toJson();
await File('chat_history.json').writeAsString(historyJson);

Restore Session

final historyJson = await File('chat_history.json').readAsString();
final session = llamafu.createChatSession();
session.fromJson(historyJson);

Best Practices

1. Keep System Prompts Concise

// Good: Focused instructions
systemPrompt: 'You are a helpful coding assistant. Be concise.';

// Avoid: Lengthy instructions that consume context

2. Handle Long Conversations

if (session.usedTokens > session.maxTokens * 0.8) {
  // Approaching limit, summarize or truncate
  session.truncateHistory(keepLast: 5);
}

3. Use Temperature Appropriately

// Lower for consistent responses
await session.generate(temperature: 0.3);

// Higher for creative responses
await session.generate(temperature: 0.9);

Next Steps