mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-18 05:33:09 -04:00
* fix(agentpool): close truncate-then-read race in agent_jobs.json persistence
Three call sites wrote and read agent_jobs.json (and agent_tasks.json)
through three independent mutexes:
- AgentJobService.ExecuteJob spawns go saveJobs(job) -> fileJobPersister
holding p.mu
- AgentJobService.SaveJobsToFile holding service.fileMutex
- AgentJobService.LoadJobsFromFile on a separate service instance holding
a different service.fileMutex
Nothing serialized those mutexes, and both writers used os.WriteFile, which
opens O_TRUNC. A reader landing between the truncate and the write saw a
zero-byte file and surfaced as `unexpected end of JSON input` at offset 0.
The macOS tests-apple job started hitting this consistently once the path
filter was removed from .github/workflows/test.yml and the file-mode race
test ran on every push (run 25823124797 was the first observed failure).
Two changes close the window:
1. fileJobPersister.saveTasksToFile / saveJobsToFile now write to a
same-directory temp file and os.Rename to the final path. rename(2) is
atomic on POSIX, so concurrent readers see either the prior contents or
the new contents and never a zero-byte window. The helper Syncs before
close so a crash mid-write leaves either the old file intact or the temp
behind (cleaned up on next save).
2. AgentJobService.{Load,Save}{Tasks,Jobs}{FromFile,ToFile} are collapsed
to thin wrappers around fileJobPersister, removing the duplicate write
path and the redundant service.fileMutex / service.tasksFile /
service.jobsFile fields. Within a single service all task/job I/O now
serializes on the persister's mutex; the atomic rename handles the
cross-instance case the tests exercise.
Adds a regression test that hammers SaveJobsToFile and LoadJobsFromFile
concurrently for 500ms across two service instances on the same paths.
On master this reproduces `unexpected end of JSON input` on Linux within
~500ms; with the fix the suite ran -until-it-fails for 30s (54 attempts,
all green).
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(agentpool): route service flush/load through JobPersister interface
The first cut of the race fix made AgentJobService.{Save,Load}{Tasks,Jobs}*
type-assert s.persister to *fileJobPersister so they could reach the
unexported saveTasksToFile / saveJobsToFile helpers. That defeats the
JobPersister interface: the service is back to reasoning about a concrete
implementation instead of an abstraction.
Promote the bulk-flush operations to the interface as FlushTasks / FlushJobs:
- fileJobPersister.FlushTasks/FlushJobs call the existing private helpers
(atomic temp+rename writes from the prior commit).
- dbJobPersister.FlushTasks/FlushJobs are no-ops because SaveTask/SaveJob
are already write-through to the database.
The service's four file-named methods now talk only to the interface:
LoadTasks/LoadJobs read through s.persister.LoadTasks/LoadJobs, and the
Save side calls FlushTasks/FlushJobs. The "FromFile"/"ToFile" suffixes
stay for backward compat with user_services.go and the existing tests,
but they no longer claim a file-only contract.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
36 lines
1.2 KiB
Go
36 lines
1.2 KiB
Go
package agentpool
|
|
|
|
import (
|
|
"time"
|
|
|
|
"github.com/mudler/LocalAI/core/schema"
|
|
)
|
|
|
|
// JobPersister abstracts task/job persistence between file-backed and DB-backed modes.
|
|
// The in-memory syncmap remains the primary store; the persister provides secondary
|
|
// persistence and authoritative reads (DB mode only).
|
|
type JobPersister interface {
|
|
// Write-through persistence (called after in-memory mutation)
|
|
SaveTask(userID string, task schema.Task) error
|
|
DeleteTask(taskID string) error
|
|
SaveJob(userID string, job schema.Job) error
|
|
DeleteJob(jobID string) error
|
|
|
|
// Bulk flush of the current in-memory state. File-backed persister
|
|
// rewrites the whole JSON file; DB-backed persister no-ops because
|
|
// SaveTask/SaveJob are already write-through.
|
|
FlushTasks() error
|
|
FlushJobs() error
|
|
|
|
// Authoritative reads — DB returns fresh data; file returns nil, nil
|
|
GetJob(jobID string) (*schema.Job, error)
|
|
ListJobs(userID, taskID, status string, limit int) ([]schema.Job, error)
|
|
|
|
// Bootstrap (load all persisted data at startup)
|
|
LoadTasks(userID string) ([]schema.Task, error)
|
|
LoadJobs(userID string) ([]schema.Job, error)
|
|
|
|
// Maintenance
|
|
CleanupOldJobs(retention time.Duration) (int64, error)
|
|
}
|