Text Generation¶
Comprehensive guide to text generation with Llamafu.
Generation Parameters¶
Temperature¶
Controls randomness in output. Higher values produce more creative but less predictable text.
// Deterministic output
final response = await llamafu.complete(prompt, temperature: 0.0);
// Balanced creativity
final response = await llamafu.complete(prompt, temperature: 0.7);
// Maximum creativity
final response = await llamafu.complete(prompt, temperature: 1.5);
| Value | Use Case |
|---|---|
| 0.0 | Factual Q&A, code generation |
| 0.3-0.5 | Summarization, translation |
| 0.7-0.9 | Creative writing, chat |
| 1.0+ | Brainstorming, poetry |
Top-K Sampling¶
Limits token selection to the K most likely tokens.
Top-P (Nucleus) Sampling¶
Selects from the smallest set of tokens whose cumulative probability exceeds P.
Combining Samplers¶
Samplers are applied in order: Top-K → Top-P → Temperature
Repetition Control¶
Repeat Penalty¶
Penalizes tokens that have appeared recently.
final response = await llamafu.complete(
prompt,
repeatPenalty: 1.1, // Slight penalty (1.0 = no penalty)
);
Frequency and Presence Penalties¶
Fine-grained control over repetition:
final sampler = llamafu.createPenaltySampler(
penaltyLastN: 64, // Look back 64 tokens
penaltyRepeat: 1.1, // Repeat penalty
penaltyFreq: 0.1, // Frequency penalty
penaltyPresent: 0.1, // Presence penalty
);
Structured Output¶
JSON Output¶
final json = llamafu.generateStructured(
'Generate a user profile with name and age',
format: OutputFormat.json,
prettyPrint: true,
);
// Output: {"name": "Alice", "age": 28}
Grammar-Constrained Generation¶
Use GBNF grammar to constrain output format:
final grammar = '''
root ::= object
object ::= "{" pair ("," pair)* "}"
pair ::= string ":" value
string ::= "\\"" [a-zA-Z]+ "\\""
value ::= string | number
number ::= [0-9]+
''';
final response = await llamafu.completeWithGrammar(
'Generate a key-value pair:',
grammar: grammar,
maxTokens: 50,
);
Stop Sequences¶
Stop generation when specific sequences are encountered:
final response = await llamafu.complete(
'List three fruits:\n1.',
maxTokens: 100,
stopSequences: ['\n4.', '\n\n'], // Stop after 3 items
);
Streaming Generation¶
Basic Streaming¶
With Progress Callback¶
int tokenCount = 0;
await for (final token in llamafu.completeStream(prompt)) {
tokenCount++;
updateUI(token, tokenCount);
}
Cancellable Streaming¶
bool cancelled = false;
llamafu.setAbortCallback(() => cancelled);
final stream = llamafu.completeStream(prompt);
// In UI
void onCancelPressed() {
cancelled = true;
}
Batch Processing¶
Generate multiple completions efficiently:
final prompts = [
'Translate to French: Hello',
'Translate to French: Goodbye',
'Translate to French: Thank you',
];
final responses = <String>[];
for (final prompt in prompts) {
responses.add(await llamafu.complete(prompt, maxTokens: 20));
}
Embeddings¶
Extract text embeddings for similarity search:
final embeddings = llamafu.getEmbeddings('Hello, world!');
print('Embedding dimension: ${embeddings.length}');
print('First 5 values: ${embeddings.take(5)}');
Performance Tips¶
1. Set Appropriate Max Tokens¶
// Don't set higher than needed
final response = await llamafu.complete(
'What is 2+2?',
maxTokens: 10, // Short answer expected
);
2. Use Seed for Reproducibility¶
final response = await llamafu.complete(
prompt,
seed: 42,
temperature: 0.7,
);
// Same seed + temperature = same output
3. Warm Up Before Benchmarking¶
await llamafu.warmup();
final stats = llamafu.benchmark(promptTokens: 128, generatedTokens: 128);
print('Tokens/second: ${stats.tokensPerSecond}');
Common Patterns¶
Question Answering¶
final qa = '''
Context: The Eiffel Tower is located in Paris, France.
Question: Where is the Eiffel Tower?
Answer:''';
final answer = await llamafu.complete(qa, maxTokens: 50, temperature: 0.1);
Text Summarization¶
Code Generation¶
final code = await llamafu.complete(
'# Python function to calculate fibonacci\ndef fibonacci(n):',
maxTokens: 200,
temperature: 0.2,
stopSequences: ['\n\n', '# '],
);
Next Steps¶
- Chat Sessions - Multi-turn conversations
- Samplers - Advanced sampling configuration
- Performance - Optimization techniques