+++
disableToc = false
title = "Fine-Tuning"
weight = 18
url = '/features/fine-tuning/'
+++
LocalAI supports fine-tuning LLMs directly through the API and Web UI. Fine-tuning is powered by pluggable backends that implement a generic gRPC interface, allowing support for different training frameworks and model types.
## Supported Backends

| Backend | Domain | GPU Required | Training Methods | Adapter Types |
|---|---|---|---|---|
| `trl` | LLM fine-tuning | No (CPU or GPU) | SFT, DPO, GRPO, RLOO, Reward, KTO, ORPO | LoRA, Full |
## Availability
Fine-tuning is always enabled. When authentication is enabled, fine-tuning is a per-user feature (default OFF). Admins can enable it for specific users via the user management API.
{{% notice note %}} This feature is experimental and may change in future releases. {{% /notice %}}
## Quick Start
### 1. Start a fine-tuning job

```bash
curl -X POST http://localhost:8080/api/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "backend": "trl",
    "training_method": "sft",
    "training_type": "lora",
    "dataset_source": "yahma/alpaca-cleaned",
    "num_epochs": 1,
    "batch_size": 2,
    "learning_rate": 0.0002,
    "adapter_rank": 16,
    "adapter_alpha": 16,
    "extra_options": {
      "max_seq_length": "512"
    }
  }'
```
### 2. Monitor progress (SSE stream)

```bash
curl -N http://localhost:8080/api/fine-tuning/jobs/{job_id}/progress
```
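The progress endpoint emits Server-Sent Events, which can also be consumed programmatically. Below is a minimal sketch of parsing SSE lines into JSON payloads; the field names in the sample events (`step`, `loss`) are illustrative assumptions, not LocalAI's documented progress schema.

```python
import json

def parse_sse(lines):
    """Collect decoded JSON payloads from SSE `data:` lines.

    Assumes each event carries a JSON object in a `data:` field, a
    common SSE convention; the exact event schema of LocalAI's
    progress stream is an assumption here.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Feed it lines as they would arrive from the stream.
sample = [
    'data: {"step": 10, "loss": 1.92}',
    "",
    'data: {"step": 20, "loss": 1.74}',
]
progress = parse_sse(sample)
print(progress[-1]["loss"])  # → 1.74
```

With the `requests` library, the same parser can be fed from `resp.iter_lines(decode_unicode=True)` on a streaming GET of the `/progress` endpoint.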
### 3. List checkpoints

```bash
curl http://localhost:8080/api/fine-tuning/jobs/{job_id}/checkpoints
```
### 4. Export model

```bash
curl -X POST http://localhost:8080/api/fine-tuning/jobs/{job_id}/export \
  -H "Content-Type: application/json" \
  -d '{
    "export_format": "gguf",
    "quantization_method": "q4_k_m",
    "output_path": "/models/my-finetuned-model"
  }'
```
## API Reference

### Endpoints
| Method | Path | Description |
|---|---|---|
| POST | `/api/fine-tuning/jobs` | Start a fine-tuning job |
| GET | `/api/fine-tuning/jobs` | List all jobs |
| GET | `/api/fine-tuning/jobs/:id` | Get job details |
| DELETE | `/api/fine-tuning/jobs/:id` | Stop a running job |
| GET | `/api/fine-tuning/jobs/:id/progress` | SSE progress stream |
| GET | `/api/fine-tuning/jobs/:id/checkpoints` | List checkpoints |
| POST | `/api/fine-tuning/jobs/:id/export` | Export model |
| POST | `/api/fine-tuning/datasets` | Upload dataset file |
### Job Request Fields

| Field | Type | Description |
|---|---|---|
| `model` | string | HuggingFace model ID or local path (required) |
| `backend` | string | Backend name (default: `trl`) |
| `training_method` | string | `sft`, `dpo`, `grpo`, `rloo`, `reward`, `kto`, `orpo` |
| `training_type` | string | `lora` or `full` |
| `dataset_source` | string | HuggingFace dataset ID or local file path (required) |
| `adapter_rank` | int | LoRA rank (default: 16) |
| `adapter_alpha` | int | LoRA alpha (default: 16) |
| `num_epochs` | int | Number of training epochs (default: 3) |
| `batch_size` | int | Per-device batch size (default: 2) |
| `learning_rate` | float | Learning rate (default: 2e-4) |
| `gradient_accumulation_steps` | int | Gradient accumulation steps (default: 4) |
| `warmup_steps` | int | Warmup steps (default: 5) |
| `optimizer` | string | `adamw_torch`, `adamw_8bit`, `sgd`, `adafactor`, `prodigy` |
| `extra_options` | map | Backend-specific options (see below) |
## Backend-Specific Options (extra_options)

### TRL

| Key | Description | Default |
|---|---|---|
| `max_seq_length` | Maximum sequence length | 512 |
| `packing` | Enable sequence packing | false |
| `trust_remote_code` | Trust remote code in model | false |
| `load_in_4bit` | Enable 4-bit quantization (GPU only) | false |
### DPO-specific (training_method=dpo)

| Key | Description | Default |
|---|---|---|
| `beta` | KL penalty coefficient | 0.1 |
| `loss_type` | Loss type: `sigmoid`, `hinge`, `ipo` | sigmoid |
| `max_length` | Maximum sequence length | 512 |
### GRPO-specific (training_method=grpo)

| Key | Description | Default |
|---|---|---|
| `num_generations` | Number of generations per prompt | 4 |
| `max_completion_length` | Max completion token length | 256 |
## GRPO Reward Functions

GRPO training requires reward functions to evaluate model completions. Specify them via the `reward_functions` field (a typed array) or via `extra_options["reward_funcs"]` (a JSON string).
### Built-in Reward Functions

| Name | Description | Parameters |
|---|---|---|
| `format_reward` | Checks `<think>...</think>` then answer format (1.0/0.0) | — |
| `reasoning_accuracy_reward` | Extracts `<answer>` content, compares to dataset's `answer` column | — |
| `length_reward` | Score based on proximity to target length [0, 1] | `target_length` (default: 200) |
| `xml_tag_reward` | Scores properly opened/closed `<think>` and `<answer>` tags | — |
| `no_repetition_reward` | Penalizes n-gram repetition [0, 1] | — |
| `code_execution_reward` | Checks Python code block syntax validity (1.0/0.0) | — |
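To make the scoring idea concrete, here is a sketch of how a proximity-based reward like `length_reward` could work. This is an illustration, not the built-in's actual code: the character-based length measure and the linear falloff are assumptions, and the real implementation may count tokens instead.

```python
def length_reward(completions, target_length=200, **kwargs):
    """Score each completion in [0, 1] by proximity to target_length.

    Sketch only: linear falloff and character counting are
    illustrative assumptions about the built-in's behavior.
    """
    scores = []
    for c in completions:
        distance = abs(len(c) - target_length)
        scores.append(max(0.0, 1.0 - distance / target_length))
    return scores

print(length_reward(["x" * 200, "x" * 100, "x" * 500]))  # → [1.0, 0.5, 0.0]
```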
### Inline Custom Reward Functions

You can provide custom reward function code as a Python function body. The function receives `completions` (a list of strings) and `**kwargs`, and must return `list[float]`.

Security restrictions for inline code:

- Allowed builtins: `len`, `int`, `float`, `str`, `list`, `dict`, `range`, `enumerate`, `zip`, `map`, `filter`, `sorted`, `min`, `max`, `sum`, `abs`, `round`, `any`, `all`, `isinstance`, `print`, `True`, `False`, `None`
- Available modules: `re`, `math`, `json`, `string`
- Blocked: `open`, `__import__`, `exec`, `eval`, `compile`, `os`, `subprocess`, `getattr`, `setattr`, `delattr`, `globals`, `locals`
- Functions are compiled and validated at job start (fail-fast on syntax errors)
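A multi-line custom reward that stays within these restrictions might look like the following. It is written here as a complete function so it can be run locally; when submitted via the API, only the function body goes in the `code` field. The function name and tag pattern are illustrative.

```python
import re  # `re` is one of the modules available to inline code

def answer_tag_reward(completions, **kwargs):
    # Reward 1.0 when a completion contains exactly one well-formed
    # <answer>...</answer> block, 0.0 otherwise. Uses only builtins
    # and modules from the allowed lists above.
    scores = []
    for c in completions:
        matches = re.findall(r"<answer>.*?</answer>", c, re.DOTALL)
        scores.append(1.0 if len(matches) == 1 else 0.0)
    return scores

print(answer_tag_reward(["<answer>42</answer>", "no tags here"]))  # → [1.0, 0.0]
```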
### Example API Request

```bash
curl -X POST http://localhost:8080/api/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "backend": "trl",
    "training_method": "grpo",
    "training_type": "lora",
    "dataset_source": "my-reasoning-dataset",
    "num_epochs": 1,
    "batch_size": 2,
    "learning_rate": 5e-6,
    "reward_functions": [
      {"type": "builtin", "name": "reasoning_accuracy_reward"},
      {"type": "builtin", "name": "format_reward"},
      {"type": "builtin", "name": "length_reward", "params": {"target_length": "200"}},
      {"type": "inline", "name": "think_presence", "code": "return [1.0 if \"<think>\" in c else 0.0 for c in completions]"}
    ],
    "extra_options": {
      "num_generations": "4",
      "max_completion_length": "256"
    }
  }'
```
## Export Formats

| Format | Description | Notes |
|---|---|---|
| `lora` | LoRA adapter files | Smallest, requires base model |
| `merged_16bit` | Full model in 16-bit | Large but standalone |
| `merged_4bit` | Full model in 4-bit | Smaller, standalone |
| `gguf` | GGUF format | For llama.cpp; requires `quantization_method` |
### GGUF Quantization Methods

`q4_k_m`, `q5_k_m`, `q8_0`, `f16`, `q4_0`, `q5_0`
## Web UI

When fine-tuning is enabled, a "Fine-Tune" page appears in the sidebar under the Agents section. The UI provides:

- **Job Configuration** — Select backend, model, training method, adapter type, and hyperparameters
- **Dataset Upload** — Upload local datasets or reference HuggingFace datasets
- **Training Monitor** — Real-time loss chart, progress bar, metrics display
- **Export** — Export trained models in various formats
## Dataset Formats

Datasets should follow standard HuggingFace formats:

- **SFT**: Alpaca format (`instruction`, `input`, `output` fields) or ChatML/ShareGPT
- **DPO**: Preference pairs (`prompt`, `chosen`, `rejected` fields)
- **GRPO**: Prompts with reward signals

Supported file formats: `.json`, `.jsonl`, `.csv`
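As a sketch, an Alpaca-format `.jsonl` dataset for SFT can be produced like this; the record contents and the `train.jsonl` filename are made up for illustration, while the field names come from the list above.

```python
import json

# Alpaca-format SFT records: instruction / input / output fields.
records = [
    {
        "instruction": "Translate to French.",
        "input": "Good morning",
        "output": "Bonjour",
    },
    {
        "instruction": "Summarize in one sentence.",
        "input": "LocalAI supports fine-tuning through its API.",
        "output": "LocalAI exposes fine-tuning via an API.",
    },
]

# .jsonl means one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for r in records:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

The resulting file can be uploaded via `POST /api/fine-tuning/datasets` or referenced by local path in `dataset_source`.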
## Architecture

Fine-tuning uses the same gRPC backend architecture as inference:

- **Proto layer**: `FineTuneRequest`, `FineTuneProgress` (streaming), `StopFineTune`, `ListCheckpoints`, `ExportModel`
- **Python backends**: Each backend implements the gRPC interface with its specific training framework
- **Go service**: Manages job lifecycle, routes API requests to backends
- **REST API**: HTTP endpoints with SSE progress streaming
- **React UI**: Configuration form, real-time training monitor, export panel