PHP Bindings¶
PHP bindings for the Mullama LLM library. They use PHP's built-in FFI extension to call the native shared library directly, and provide model loading, text generation, embeddings, and sampler presets, with framework integrations for Laravel and Symfony.
Installation¶
Via Composer¶
Prerequisites¶
The PHP bindings require:
- PHP >= 8.1 with the FFI extension enabled
- The pre-built libmullama_ffi shared library
- The mullama.h header file
Building the Shared Library¶
# From the mullama source directory
cargo build --release -p mullama-ffi
# Output locations:
# Linux: target/release/libmullama_ffi.so
# macOS: target/release/libmullama_ffi.dylib
# Windows: target/release/mullama_ffi.dll
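Because the file name differs per platform, code that builds a path to the library needs to pick the right one. A minimal sketch, assuming a hypothetical helper named mullamaLibraryFileName() (it is not part of the bindings) and assuming that other Unix-like systems use the Linux name:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map an OS family to the library file name
 * produced by the cargo build (names taken from the list above).
 */
function mullamaLibraryFileName(string $osFamily = PHP_OS_FAMILY): string
{
    return match ($osFamily) {
        'Windows' => 'mullama_ffi.dll',
        'Darwin'  => 'libmullama_ffi.dylib',
        // Linux; other Unix-like systems are assumed to use the same name
        default   => 'libmullama_ffi.so',
    };
}

echo 'target/release/' . mullamaLibraryFileName(), "\n";
```

The result can be combined with a directory of your choice and handed to Mullama::setLibraryPath().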
Enabling PHP FFI¶
Ensure the FFI extension is enabled in your php.ini:
extension=ffi
; For development (allows FFI::cdef)
ffi.enable=true
; For production (preloaded only)
; ffi.enable=preload
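To confirm the settings took effect, a short script can report the FFI state of the running PHP process. This is a sketch using only core PHP functions; ffiStatus() is a hypothetical helper name, and the exact string reported for ffi.enable depends on how the directive was written in php.ini.

```php
<?php
declare(strict_types=1);

/**
 * Report whether the FFI extension is loaded and what ffi.enable is set to.
 *
 * @return array{loaded: bool, enable: string}
 */
function ffiStatus(): array
{
    $loaded = extension_loaded('ffi');
    // ini_get() returns false for unknown directives, so only query when loaded
    $enable = $loaded ? (string) ini_get('ffi.enable') : '';

    return ['loaded' => $loaded, 'enable' => $enable];
}

$status = ffiStatus();
if (!$status['loaded']) {
    echo "FFI extension is not loaded; check extension=ffi in php.ini\n";
} else {
    echo "FFI loaded, ffi.enable={$status['enable']}\n";
}
```

Run it with the same SAPI that will host the application (CLI and FPM can read different php.ini files).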
Library Placement¶
Place the shared library and header file where the PHP bindings can find them. The bindings search in this order:
- Path set via Mullama::setLibraryPath()
- bindings/ffi/include/mullama.h (relative to package)
- /usr/local/include/mullama.h and /usr/local/lib/libmullama_ffi.so
- /usr/include/mullama.h and /usr/lib/libmullama_ffi.so
# System-wide installation (Linux)
sudo cp target/release/libmullama_ffi.so /usr/local/lib/
sudo cp bindings/ffi/include/mullama.h /usr/local/include/
sudo ldconfig
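The search order can be mimicked in application code when you want to fail early with a clear message. The sketch below is an illustration, not the bindings' actual lookup code: firstExistingFile() and the MULLAMA_LIB environment variable are hypothetical, and only the Linux library paths from the list above are checked.

```php
<?php
declare(strict_types=1);

/**
 * Return the first candidate path that exists as a regular file, or null.
 * Hypothetical helper; the bindings' real lookup code may differ.
 */
function firstExistingFile(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_string($path) && $path !== '' && is_file($path)) {
            return $path;
        }
    }
    return null;
}

// Candidate library locations, most specific first (Linux file names).
$custom = getenv('MULLAMA_LIB') ?: null; // hypothetical environment variable
$library = firstExistingFile(array_filter([
    $custom,
    '/usr/local/lib/libmullama_ffi.so',
    '/usr/lib/libmullama_ffi.so',
]));

echo $library ?? 'libmullama_ffi.so not found in the searched locations', "\n";
```

A non-null result could then be passed to Mullama::setLibraryPath() before any other Mullama call.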
Quick Start¶
<?php
require_once 'vendor/autoload.php';
use Mullama\Model;
use Mullama\Context;
use Mullama\SamplerParams;
// Load a model
$model = Model::load('./model.gguf', ['nGpuLayers' => 32]);
// Create a context
$ctx = new Context($model, ['nCtx' => 2048]);
// Generate text
$text = $ctx->generate("Once upon a time", 100, SamplerParams::greedy());
echo $text . "\n";
API Reference¶
Mullama (Main Class)¶
The Mullama class manages the FFI library loading and backend lifecycle.
Mullama::initialize()¶
Initialize the backend. Called automatically on first use.
Mullama::shutdown()¶
Free backend resources. Call before process exit.
Mullama::setLibraryPath(path)¶
Set a custom path to the libmullama_ffi library before initialization.
// Must be called before any other Mullama operations
Mullama\Mullama::setLibraryPath('/opt/mullama/lib/libmullama_ffi.so');
Mullama::supportsGpuOffload()¶
Check if GPU offloading is available.
Mullama::systemInfo()¶
Get system information about the backend.
Mullama::maxDevices()¶
Get the maximum number of compute devices.
Mullama::version()¶
Get the library version.
Mullama::getLastError()¶
Get the last error message from the FFI layer.
Model¶
The Model class handles model loading, tokenization, and model information.
Model::load(path, params)¶
Load a model from a GGUF file.
Parameters (via $params array):
| Key | Type | Default | Description |
|---|---|---|---|
| nGpuLayers | int | 0 | Layers to offload to GPU (0 = CPU, -1 = all) |
| useMmap | bool | true | Use memory mapping |
| useMlock | bool | false | Lock model in memory |
| vocabOnly | bool | false | Only load vocabulary |
Throws: RuntimeException if loading fails
// CPU only
$model = Model::load('./model.gguf');
// GPU accelerated
$model = Model::load('./model.gguf', ['nGpuLayers' => -1]);
// Vocabulary only
$model = Model::load('./model.gguf', ['vocabOnly' => true]);
$model->free()¶
Release model resources. Called automatically by the destructor.
$model->tokenize(text, addBos, special)¶
Convert text to token IDs.
Returns: int[] array of token IDs
$model->detokenize(tokens)¶
Convert token IDs back to text.
Model Properties¶
public function nCtxTrain(): int // Training context size
public function nEmbd(): int // Embedding dimension
public function nVocab(): int // Vocabulary size
public function nLayer(): int // Number of layers
public function nHead(): int // Number of attention heads
public function tokenBos(): int // BOS token ID
public function tokenEos(): int // EOS token ID
public function size(): int // Model size in bytes
public function nParams(): int // Number of parameters
public function description(): string // Model description
public function tokenIsEog(int $token): bool // Check if token is EOG
$model = Model::load('./model.gguf');
echo "Description: " . $model->description() . "\n";
echo "Parameters: " . number_format($model->nParams()) . "\n";
echo "Layers: " . $model->nLayer() . "\n";
echo "Embedding dim: " . $model->nEmbd() . "\n";
echo "Size: " . round($model->size() / 1e9, 2) . " GB\n";
Context¶
The Context class provides text generation capabilities.
new Context(model, params)¶
Create a new inference context.
Parameters (via $params array):
| Key | Type | Default | Description |
|---|---|---|---|
| nCtx | int | 0 | Context size (0 = model default) |
| nBatch | int | 2048 | Batch size |
| nThreads | int | 0 | Thread count (0 = auto) |
| embeddings | bool | false | Enable embeddings mode |
Throws: RuntimeException if creation fails
$ctx->generate(prompt, maxTokens, params)¶
Generate text from a prompt.
public function generate(string $prompt, int $maxTokens = 100, ?SamplerParams $params = null): string
| Parameter | Type | Default | Description |
|---|---|---|---|
| $prompt | string | (required) | Text prompt |
| $maxTokens | int | 100 | Maximum tokens to generate |
| $params | ?SamplerParams | null | Sampling parameters (null = defaults) |
$text = $ctx->generate("Hello, AI!", 100);
echo $text;
// With custom params
$text = $ctx->generate("Write a poem:", 200, new SamplerParams([
'temperature' => 0.9,
'topP' => 0.95,
]));
$ctx->generateFromTokens(tokens, maxTokens, params)¶
Generate text from pre-tokenized input.
public function generateFromTokens(array $tokens, int $maxTokens = 100, ?SamplerParams $params = null): string
$ctx->generateStream(prompt, maxTokens, params)¶
Generate text and return token pieces as an array.
public function generateStream(string $prompt, int $maxTokens = 100, ?SamplerParams $params = null): array
Returns: string[] array of generated text segments
$pieces = $ctx->generateStream("Once upon a time", 100);
foreach ($pieces as $piece) {
echo $piece;
flush();
}
Note
The current PHP implementation returns the full result as a single-element array, so the loop above runs once and prints the whole text. True token-by-token streaming requires the C-level callback mechanism, which is not directly exposed through PHP FFI.
$ctx->clearCache()¶
Clear the KV cache.
Context Properties¶
SamplerParams¶
The SamplerParams class configures text generation sampling.
Constructor¶
Properties:
| Property | Type | Default | Description |
|---|---|---|---|
| $temperature | float | 0.8 | Randomness (0.0 = deterministic) |
| $topK | int | 40 | Top-k sampling (0 = disabled) |
| $topP | float | 0.95 | Nucleus sampling (1.0 = disabled) |
| $minP | float | 0.05 | Min-p sampling (0.0 = disabled) |
| $typicalP | float | 1.0 | Typical sampling (1.0 = disabled) |
| $penaltyRepeat | float | 1.1 | Repeat penalty (1.0 = disabled) |
| $penaltyFreq | float | 0.0 | Frequency penalty |
| $penaltyPresent | float | 0.0 | Presence penalty |
| $penaltyLastN | int | 64 | Token window for penalties |
| $seed | int | 0 | Random seed (0 = random) |
// Default parameters
$params = new SamplerParams();
// Custom parameters
$params = new SamplerParams([
'temperature' => 0.7,
'topK' => 50,
'topP' => 0.9,
]);
// Direct property access
$params->temperature = 0.5;
echo $params->topK; // 50
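The array form only needs the keys you want to change; everything else keeps its default. How the real class applies the array is not shown here, so the following is a stand-in sketch: SamplerParamsSketch is a hypothetical class that copies four of the documented defaults and rejects unknown keys, which is one reasonable way to catch typos such as top_k.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in for an array-configured parameter object.
 * NOT the library's class: it only mirrors some documented defaults.
 */
final class SamplerParamsSketch
{
    public float $temperature = 0.8;
    public int $topK = 40;
    public float $topP = 0.95;
    public float $minP = 0.05;

    public function __construct(array $overrides = [])
    {
        foreach ($overrides as $key => $value) {
            $name = (string) $key;
            if (!property_exists($this, $name)) {
                // Reject unknown keys instead of silently ignoring them
                throw new InvalidArgumentException("Unknown sampler option: {$name}");
            }
            $this->{$name} = $value;
        }
    }
}

$params = new SamplerParamsSketch(['temperature' => 0.7, 'topK' => 50]);
printf("temperature=%.1f topK=%d topP=%.2f\n", $params->temperature, $params->topK, $params->topP);
```

Whether the actual SamplerParams constructor validates keys this way is an assumption; check its source before relying on that behavior.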
Preset Methods¶
// Deterministic generation
$params = SamplerParams::greedy();
// temperature=0.0, topK=1
// Creative generation
$params = SamplerParams::creative();
// temperature=1.2, topK=100
// Focused generation
$params = SamplerParams::precise();
// temperature=0.3, topK=20
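The presets differ mainly in temperature and top-k, and a small numeric example shows what those two knobs do to a token distribution. This is a generic illustration of top-k filtering followed by a temperature-scaled softmax, not the library's sampler; the function name and the sample logits are made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Turn raw logits into sampling probabilities using top-k filtering and
 * temperature scaling. Generic illustration, not the library's sampler.
 *
 * @param array<int, float> $logits token ID => logit
 * @return array<int, float> token ID => probability (kept tokens only)
 */
function topKTemperatureProbs(array $logits, float $temperature, int $topK): array
{
    if ($logits === []) {
        return [];
    }
    arsort($logits); // highest logit first, token IDs preserved as keys
    if ($topK > 0) {
        $logits = array_slice($logits, 0, $topK, true);
    }
    if ($temperature <= 0.0) {
        // Treat zero temperature as greedy: all mass on the best token
        return [array_key_first($logits) => 1.0];
    }
    $max = max($logits); // subtract the max for numerical stability
    $weights = array_map(fn (float $l): float => exp(($l - $max) / $temperature), $logits);
    $sum = array_sum($weights);

    return array_map(fn (float $w): float => $w / $sum, $weights);
}

$logits = [0 => 2.0, 1 => 1.0, 2 => 0.5, 3 => -1.0];
print_r(topKTemperatureProbs($logits, 0.3, 2)); // sharper, two candidates
print_r(topKTemperatureProbs($logits, 1.2, 4)); // flatter, four candidates
```

With temperature 0.0 and topK 1, as in the greedy preset, only the highest-scoring token can be chosen, which is why that preset is deterministic.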
EmbeddingGenerator¶
The EmbeddingGenerator class creates text embeddings.
Constructor¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| $model | Model | (required) | Model for embeddings |
| $nCtx | int | 512 | Context size |
| $normalize | bool | true | Normalize to unit length |
$gen->embed(text)¶
Generate an embedding vector for text.
Returns: float[] embedding vector
$gen->embedBatch(texts)¶
Generate embeddings for multiple texts.
Returns: float[][] array of embedding vectors
$texts = ['Hello', 'World', 'Test'];
$embeddings = $gen->embedBatch($texts);
echo "Count: " . count($embeddings) . "\n";
$gen->nEmbd()¶
Get the embedding dimension.
EmbeddingGenerator::cosineSimilarity(a, b)¶
Compute cosine similarity between two vectors.
Throws: RuntimeException if vectors have different lengths.
$sim = EmbeddingGenerator::cosineSimilarity($emb1, $emb2);
echo "Similarity: " . round($sim, 4) . "\n";
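For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The standalone sketch below shows the computation in plain PHP; cosineSimilaritySketch() is a hypothetical name, and returning 0.0 for a zero vector is a convention chosen here, not documented behavior of the library.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity: dot(a, b) / (|a| * |b|).
 * Standalone sketch; mirrors the documented length check.
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineSimilaritySketch(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new RuntimeException('Vectors must have the same length');
    }
    $a = array_values($a);
    $b = array_values($b);
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0; // convention chosen here for zero vectors (assumption)
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

printf("%.4f\n", cosineSimilaritySketch([1.0, 0.0], [1.0, 1.0])); // prints 0.7071
```

When both vectors are already normalized to unit length, the denominator is 1 and the similarity reduces to the dot product.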
Examples¶
Basic Text Generation¶
<?php
require_once 'vendor/autoload.php';
use Mullama\Model;
use Mullama\Context;
use Mullama\SamplerParams;
$model = Model::load('./model.gguf', ['nGpuLayers' => -1]);
$ctx = new Context($model, ['nCtx' => 2048]);
$response = $ctx->generate(
"Explain PHP in one paragraph:",
150,
new SamplerParams(['temperature' => 0.7, 'topP' => 0.9])
);
echo $response . "\n";
Embeddings and Similarity¶
<?php
require_once 'vendor/autoload.php';
use Mullama\Model;
use Mullama\EmbeddingGenerator;
$model = Model::load('./embedding-model.gguf');
$gen = new EmbeddingGenerator($model);
// Index documents
$documents = [
'PHP is a server-side scripting language',
'JavaScript runs in web browsers',
'Python is popular for machine learning',
'The weather is sunny today',
];
$docEmbeddings = $gen->embedBatch($documents);
// Query
$queryEmb = $gen->embed("What language is used for AI?");
// Rank by similarity
$results = [];
foreach ($docEmbeddings as $i => $docEmb) {
$score = EmbeddingGenerator::cosineSimilarity($queryEmb, $docEmb);
$results[] = ['text' => $documents[$i], 'score' => $score];
}
usort($results, fn($a, $b) => $b['score'] <=> $a['score']);
echo "Search results:\n";
foreach ($results as $result) {
printf(" [%.4f] %s\n", $result['score'], $result['text']);
}
Tokenization¶
<?php
require_once 'vendor/autoload.php';
use Mullama\Model;
$model = Model::load('./model.gguf');
// Tokenize
$tokens = $model->tokenize("Hello, world!");
echo "Tokens: " . implode(', ', $tokens) . "\n";
echo "Token count: " . count($tokens) . "\n";
// Detokenize
$text = $model->detokenize($tokens);
echo "Text: {$text}\n";
// Model info
echo "Vocab size: " . $model->nVocab() . "\n";
echo "BOS token: " . $model->tokenBos() . "\n";
echo "EOS token: " . $model->tokenEos() . "\n";
Laravel Service Provider¶
<?php
namespace App\Providers;
use Illuminate\Support\ServiceProvider;
use Mullama\Model;
use Mullama\Context;
class MullamaServiceProvider extends ServiceProvider
{
public function register(): void
{
$this->app->singleton(Model::class, function () {
return Model::load(
config('mullama.model_path'),
['nGpuLayers' => config('mullama.gpu_layers', 0)]
);
});
$this->app->bind(Context::class, function ($app) {
return new Context($app->make(Model::class), [
'nCtx' => config('mullama.context_size', 2048),
]);
});
}
}
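The provider reads three keys through config(): mullama.model_path, mullama.gpu_layers, and mullama.context_size. A matching configuration file might look like the fragment below. The file location follows Laravel convention, but the environment variable names and the default model path are assumptions made for illustration.

```php
<?php
// config/mullama.php (hypothetical file; env variable names are assumptions)

return [
    // Path to the GGUF model file
    'model_path' => env('MULLAMA_MODEL_PATH', storage_path('models/model.gguf')),

    // Layers to offload to the GPU (0 = CPU only, -1 = all)
    'gpu_layers' => (int) env('MULLAMA_GPU_LAYERS', 0),

    // Context size passed to Context as nCtx
    'context_size' => (int) env('MULLAMA_CONTEXT_SIZE', 2048),
];
```

env() and storage_path() are Laravel helpers, so this fragment only runs inside a Laravel application.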
Usage in a controller:
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
use Mullama\Context;
use Mullama\SamplerParams;
class GenerateController extends Controller
{
public function __construct(
private Context $context,
) {}
public function generate(Request $request)
{
$validated = $request->validate([
'prompt' => 'required|string|max:4096',
'max_tokens' => 'integer|min:1|max:2000',
'temperature' => 'numeric|min:0|max:2',
]);
// Form input may arrive as strings, so cast before passing typed values
$params = new SamplerParams([
'temperature' => (float) ($validated['temperature'] ?? 0.8),
]);
$text = $this->context->generate(
$validated['prompt'],
(int) ($validated['max_tokens'] ?? 100),
$params
);
return response()->json(['text' => $text]);
}
}
Symfony Bundle Integration¶
<?php
// config/services.yaml equivalent in PHP
namespace App;
use Mullama\Model;
use Mullama\Context;
use Mullama\EmbeddingGenerator;
// Service definitions
class MullamaFactory
{
private ?Model $model = null;
public function __construct(
private string $modelPath,
private int $gpuLayers = 0,
private int $contextSize = 2048,
) {}
public function getModel(): Model
{
if ($this->model === null) {
$this->model = Model::load($this->modelPath, [
'nGpuLayers' => $this->gpuLayers,
]);
}
return $this->model;
}
public function createContext(): Context
{
return new Context($this->getModel(), [
'nCtx' => $this->contextSize,
]);
}
public function createEmbeddingGenerator(): EmbeddingGenerator
{
return new EmbeddingGenerator($this->getModel());
}
}
Error Handling¶
PHP bindings throw RuntimeException on errors:
use Mullama\Model;
use Mullama\Context;
// RuntimeException is in the global namespace; add `use RuntimeException;` only inside a namespaced file
try {
$model = Model::load('./nonexistent.gguf');
} catch (RuntimeException $e) {
echo "Load failed: " . $e->getMessage() . "\n";
}
try {
$model = Model::load('./model.gguf');
$ctx = new Context($model, ['nCtx' => 2048]);
$text = $ctx->generate("Hello", 100);
} catch (RuntimeException $e) {
echo "Error: " . $e->getMessage() . "\n";
}
Errors from the FFI layer are automatically retrieved and included in the exception message.
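Because failures surface as RuntimeException, a caller can retry with more conservative options, for example falling back from GPU offload to CPU. The sketch below shows the pattern with a stub loader so that it runs without the native library; loadWithFallback() is a hypothetical helper, and in real code the callable would wrap Model::load().

```php
<?php
declare(strict_types=1);

/**
 * Try each option set in order and return the first successful result.
 * $load stands in for a loader such as fn (array $o) => Model::load($path, $o).
 *
 * @param callable(array): mixed $load
 * @param array<int, array> $attempts option arrays, most preferred first
 */
function loadWithFallback(callable $load, array $attempts): mixed
{
    $last = null;
    foreach ($attempts as $options) {
        try {
            return $load($options);
        } catch (RuntimeException $e) {
            $last = $e; // remember the failure and try the next option set
        }
    }
    throw new RuntimeException('All load attempts failed', 0, $last);
}

// Stub loader for demonstration: pretends GPU offload is unavailable.
$fakeLoad = function (array $options): string {
    if (($options['nGpuLayers'] ?? 0) !== 0) {
        throw new RuntimeException('GPU offload not available');
    }
    return 'model loaded on CPU';
};

echo loadWithFallback($fakeLoad, [['nGpuLayers' => -1], ['nGpuLayers' => 0]]), "\n";
```

The last underlying exception is attached as the previous exception, so its message remains available through getPrevious().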
Configuration¶
php.ini Settings¶
; Required: enable the FFI extension
extension=ffi
; FFI access level:
; "true" - allow FFI::cdef() (development)
; "preload" - only allow preloaded FFI (production)
; "false" - disable FFI
ffi.enable=true
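; Preload the mullama FFI definitions (production).
; ffi.preload expects C header file(s); a PHP preload script is configured via opcache.preload.
; opcache.preload=/path/to/mullama_preload.php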
Custom Library Path¶
<?php
require_once 'vendor/autoload.php';
use Mullama\Mullama;
use Mullama\Model;
// Set before any other Mullama operations
Mullama::setLibraryPath('/opt/mullama/lib/libmullama_ffi.so');
// Now load models as usual
$model = Model::load('./model.gguf');
Requirements¶
| Requirement | Version |
|---|---|
| PHP | >= 8.1 |
| FFI extension | enabled |
| libmullama_ffi | shared library |
| mullama.h | header file |
| Composer | >= 2.0 (for installation) |
Performance Tips¶
- Reuse models -- model loading is expensive. Use a singleton pattern or dependency injection to share a single model instance.
- GPU offloading -- set nGpuLayers to -1 to use all GPU layers when available.
- Context lifecycle -- create contexts as needed and let them be garbage collected, or call free() explicitly in long-running processes.
- Batch embeddings -- use embedBatch() rather than calling embed() in a loop.
- FFI preloading -- in production, use ffi.enable=preload and preload the FFI definitions for better security and performance.
- Memory limits -- large models may exceed PHP's default memory limit. Increase with ini_set('memory_limit', '4G') or in php.ini.
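For the memory limit tip, it can help to compare the configured limit with the amount of memory you expect to need before loading anything. The helpers below are a sketch in plain PHP; iniSizeToBytes() and memoryLimitAllows() are hypothetical names, and the required size is something you would estimate yourself.

```php
<?php
declare(strict_types=1);

/**
 * Convert a php.ini size value such as "512M" or "4G" to bytes.
 * "-1" (unlimited) is returned as -1.
 */
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // leading digits; the cast ignores a trailing suffix
    return match (strtoupper(substr($value, -1))) {
        'G' => $number * 1024 ** 3,
        'M' => $number * 1024 ** 2,
        'K' => $number * 1024,
        default => $number,
    };
}

/** True when the limit is unlimited or at least $requiredBytes. */
function memoryLimitAllows(int $requiredBytes, ?string $limit = null): bool
{
    $bytes = iniSizeToBytes($limit ?? (string) ini_get('memory_limit'));
    return $bytes < 0 || $bytes >= $requiredBytes;
}

var_dump(memoryLimitAllows(3 * 1024 ** 3, '4G'));   // bool(true)
var_dump(memoryLimitAllows(3 * 1024 ** 3, '128M')); // bool(false)
```

Calling memoryLimitAllows() without the second argument checks the limit of the running process.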