Chat Sessions¶
Build conversational applications with proper context management and chat templates.
Chat Templates¶
Modern LLMs use specific prompt formats. Llamafu automatically applies the correct template.
Automatic Template Detection¶
// Uses model's built-in template
final formatted = llamafu.applyChatTemplate(
'', // Empty string = use model's default
[
'user: Hello!',
'assistant: Hi there! How can I help?',
'user: What is the weather like?',
],
addAssistant: true,
);
Common Template Formats¶
Custom Templates¶
final customTemplate = '''
{{#each messages}}
{{#ifEquals role "user"}}User: {{content}}
{{else}}Assistant: {{content}}
{{/ifEquals}}
{{/each}}
''';
final formatted = llamafu.applyChatTemplate(
customTemplate,
messages,
);
Creating Chat Sessions¶
Basic Session¶
final session = llamafu.createChatSession(
systemPrompt: 'You are a helpful assistant.',
);
// Add user message and get response
session.addMessage('user', 'What is Python?');
final response = await session.generate(maxTokens: 200);
// Continue the conversation
session.addMessage('user', 'Show me an example');
final followUp = await session.generate(maxTokens: 300);
Session with History¶
final session = llamafu.createChatSession(
systemPrompt: 'You are a coding assistant.',
history: [
ChatMessage(role: 'user', content: 'What is a function?'),
ChatMessage(role: 'assistant', content: 'A function is...'),
],
);
Managing Context¶
Context Window¶
The context window limits conversation length:
final session = llamafu.createChatSession();
// Check remaining context
print('Used tokens: ${session.usedTokens}');
print('Remaining: ${session.remainingTokens}');
// Conversation will auto-truncate old messages when full
Manual Truncation¶
// Keep only the last N messages
session.truncateHistory(keepLast: 10);
// Or keep messages within token budget
session.truncateToFit(maxTokens: 1500);
Sliding Window¶
final session = llamafu.createChatSession(
contextStrategy: ContextStrategy.slidingWindow,
windowSize: 2048,
);
Streaming Responses¶
session.addMessage('user', 'Tell me a story');
await for (final token in session.generateStream(maxTokens: 500)) {
stdout.write(token);
}
// Message is automatically added to history after completion
System Prompts¶
Setting the System Prompt¶
final session = llamafu.createChatSession(
systemPrompt: '''You are a helpful coding assistant.
You write clean, well-documented code.
You explain your reasoning step by step.''',
);
Updating System Prompt¶
Multi-turn Example¶
class ChatBot {
late final Llamafu _llamafu;
late final ChatSession _session;
Future<void> init() async {
_llamafu = await Llamafu.init(
modelPath: 'models/llama-3.2-1b.gguf',
contextSize: 4096,
);
_session = _llamafu.createChatSession(
systemPrompt: 'You are a friendly assistant.',
);
}
Future<String> chat(String userMessage) async {
_session.addMessage('user', userMessage);
final response = await _session.generate(
maxTokens: 500,
temperature: 0.7,
);
return response;
}
List<ChatMessage> get history => _session.messages;
void clearHistory() {
_session.clear();
}
void dispose() {
_llamafu.dispose();
}
}
Role-Playing¶
final session = llamafu.createChatSession(
systemPrompt: '''You are Sherlock Holmes, the famous detective.
You speak in Victorian English and love solving mysteries.
You often reference your past cases and your friend Watson.''',
);
session.addMessage('user', 'Mr. Holmes, I need your help!');
final response = await session.generate();
// "Ah, do come in and have a seat by the fire..."
Function Calling Pattern¶
Implement tool use with structured output:
final session = llamafu.createChatSession(
systemPrompt: '''You are an assistant with access to tools.
When you need to use a tool, respond with JSON:
{"tool": "tool_name", "args": {...}}
Available tools:
- weather: Get weather for a location. Args: {"location": "city"}
- calculate: Do math. Args: {"expression": "2+2"}
''',
);
session.addMessage('user', 'What is the weather in Paris?');
final response = await session.generate();
// {"tool": "weather", "args": {"location": "Paris"}}
// Parse and execute tool, then continue
final toolResult = await executeWeatherTool('Paris');
session.addMessage('assistant', response);
session.addMessage('user', 'Tool result: $toolResult');
final finalResponse = await session.generate();
Saving and Restoring Sessions¶
Export History¶
Restore Session¶
final historyJson = await File('chat_history.json').readAsString();
final session = llamafu.createChatSession();
session.fromJson(historyJson);
Best Practices¶
1. Keep System Prompts Concise¶
// Good: Focused instructions
systemPrompt: 'You are a helpful coding assistant. Be concise.';
// Avoid: Lengthy instructions that consume context
2. Handle Long Conversations¶
if (session.usedTokens > session.maxTokens * 0.8) {
// Approaching limit, summarize or truncate
session.truncateHistory(keepLast: 5);
}
3. Use Temperature Appropriately¶
// Lower for consistent responses
await session.generate(temperature: 0.3);
// Higher for creative responses
await session.generate(temperature: 0.9);
Next Steps¶
- Text Generation - Advanced generation options
- Examples: Chatbot - Complete chatbot example
- API: Chat Sessions