James Pine
2026-03-24 16:37:09 -07:00
parent 53b030c492
commit 73ec35656b
4 changed files with 991 additions and 1 deletions


@@ -70,6 +70,22 @@ Shipped adapters: Gmail, Apple Notes, Chrome Bookmarks, Chrome History, Safari H
Spacedrive integrates with [Spacebot](https://github.com/spacedriveapp/spacebot), an open source AI agent runtime. Spacebot runs as a separate process alongside Spacedrive, communicating over APIs. Spacedrive provides the data layer. Spacebot provides the intelligence layer. Neither depends on the other. Together, they form an operating surface where humans and agents work side by side.
### File System Intelligence
Spacedrive adds a layer of file system intelligence on top of the native filesystem. It does not just expose files and folders. It understands what they are, why they exist, how they are organized, and what agents are allowed to do with them.
File System Intelligence combines three things:
- **File intelligence** — derivative data for individual files such as OCR, transcripts, extracted metadata, thumbnails, previews, classifications, and sidecars.
- **Directory intelligence** — contextual knowledge attached to folders and subtrees, like "this is where I keep active projects" or "this archive contains dormant repositories".
- **Access intelligence** — universal permissions and policy that apply across devices and clouds, so agents can be granted structured access through Spacedrive instead of raw shell access.
This makes the filesystem legible to AI. When an agent navigates a path through Spacedrive, it does not walk blind. It receives the listing, the relevant context for that subtree, the effective permissions, and summaries of what lives there. A projects folder is not just a folder. It is an active workspace. An archive is not just another directory. It carries historical meaning and policy.
That context evolves over time. Users can explain how they organize their system. Agents can add notes and observations with attribution. Jobs can generate summaries from structure and activity. Spacedrive keeps that intelligence attached to the filesystem itself instead of burying it inside temporary session memory.
The result is a file system that feels native to both humans and agents. Finder and Explorer show you where files are. Spacedrive adds the intelligence layer that explains what they are, how they relate, and how automation should interact with them.
---
## Architecture


@@ -0,0 +1,526 @@
# File System Intelligence
## Purpose
Define `File System Intelligence` as a first-class Spacedrive capability.
File System Intelligence is the intelligence layer that sits on top of the native filesystem and the VDFS. It turns files, directories, clouds, and devices into a machine-readable, agent-readable, human-readable system with derived knowledge, layered context, and universal policy.
This is one of the clearest explanations for why Spacedrive exists beyond being a file manager.
## Definition
`File System Intelligence` is Spacedrive's cross-platform intelligence layer for filesystems.
It includes:
1. derived knowledge about individual files
2. contextual knowledge attached to directories and subtrees
3. universal permissions and policies for agents and automation
Native operating systems expose paths, files, folders, metadata, and OS permissions.
Spacedrive adds:
- meaning
- structure-aware summaries
- derivative data
- context that evolves over time
- cross-device continuity
- agent-readable policy
## Why This Exists
Agents walking a filesystem through shell commands are effectively walking blind.
They can list directories, open files, and infer structure, but they do not naturally understand:
- why a folder exists
- how a user organizes work
- what a directory is for
- what files are important inside a subtree
- what workflows apply there
- what the agent is allowed to do there
File System Intelligence gives the filesystem a context layer that can be surfaced as the agent navigates.
The goal is to make a filesystem legible to AI without relying on fragile session memory or a monolithic root instruction file.
## Relationship to the VDFS
The VDFS remains the storage and identity substrate.
File System Intelligence is a layer on top of it.
The VDFS gives us:
- content identity
- path abstraction
- sidecars and derivatives
- cross-device addressing
- jobs
- sync
- permissions infrastructure
File System Intelligence uses that substrate to attach context and policy to files and subtrees in a way that is portable across devices and storage backends.
## Relationship to Spacebot
Spacedrive owns File System Intelligence.
Spacebot is the first major producer and consumer of it.
This is important because the intelligence layer should not be framed as only a Spacebot feature. It is a core Spacedrive capability that any agent or automation system can use.
Spacebot can:
- write user-informed context into the filesystem intelligence layer
- read that context while navigating files and directories
- update summaries and policies over time
## Product Framing
This is the short product framing:
- Finder and Explorer show you where files are.
- Spacedrive understands what they are, why they exist, how they relate, and what agents are allowed to do with them.
This is the platform framing:
- Spacedrive adds File System Intelligence: derived knowledge, contextual understanding, and universal permissions across every device and cloud.
## Core Pillars
File System Intelligence has three pillars.
### 1. File Intelligence
Per-file derived knowledge.
Examples:
- extracted metadata
- OCR text
- transcripts and subtitles
- thumbnails and previews
- classifications
- sidecars and derivative artifacts
- extracted structure from documents or media
This intelligence is usually deterministic or pipeline-driven.
### 2. Directory Intelligence
Contextual knowledge attached to directories and subtrees.
Examples:
- "This is where I keep active projects"
- "Archive contains dormant repositories"
- "This folder contains scanned personal records"
- "This area is client work, do not modify without approval"
- summaries of what a directory contains and how it is used
This intelligence can come from both users and agents and should be inherited through the subtree where appropriate.
### 3. Access Intelligence
Universal permissions and policy that sit above OS-native permissions.
Examples:
- which folders an agent may read
- which folders an agent may write to
- whether deletion is allowed
- whether a subtree is sensitive
- whether a cloud source is accessible to a given automation
This allows a user to grant access once through Spacedrive and have that policy apply consistently across devices, clouds, and operating systems.
## What It Is Not
File System Intelligence is not:
- a replacement for the native filesystem
- a monolithic prompt file like a giant `AGENTS.md`
- only vector embeddings
- only tags
- only sidecars
- only agent memory
It is a structured context and policy layer that can be queried, updated, inherited, and observed over time.
## Design Principles
### Context should be hierarchical
Context must attach at multiple levels of the filesystem and follow the tree.
If a user explains what `~/Projects` is for, that context should be available when an agent explores `~/Projects/foo/bar` unless something more specific overrides or narrows it.
### Context should be scoped
Only the relevant context for the current subtree should be surfaced.
This avoids the context pollution problem of large root-level instruction files.
### Context should be observable
The system should preserve who said what, when it changed, and how the understanding of a subtree evolved over time.
### Context should be atomic
The source of truth should not be a single mutable paragraph.
Facts, policies, and notes should be stored as atomic records. Summaries should be generated views over those records.
### Context should be portable
The same model should work for:
- local filesystems
- removable volumes
- NAS storage
- cloud providers
- future repository-backed archival sources where relevant
## Recommended Data Model
Do not model File System Intelligence as tags alone.
Tags are useful, but they are too narrow to carry the full meaning of filesystem context.
Instead, use a richer context-layer model.
### Context Node
A `ContextNode` is the core primitive.
It attaches to a file, directory, subtree, or virtual filesystem object and stores one piece of meaning, policy, or generated understanding.
Suggested fields:
```text
id
library_id
target_kind # file | directory | subtree | volume | cloud_location
target_id # VDFS identity or location-scoped identifier
scope # exact | inherited
node_kind # fact | summary | policy | note | tag
title
content
structured_payload
source_kind # user | agent | job | system
source_id
confidence
visibility # user_only | agent_visible | private | synced
created_at
updated_at
supersedes_id
archived_at
```
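As a rough illustration, the suggested fields could be typed as follows. This is only a sketch: the concrete types (string IDs, ISO 8601 timestamp strings, a `confidence` between 0 and 1, a JSON-like `structured_payload`) and the `isActive` helper are assumptions that the field list above does not specify.

```typescript
// Hypothetical typing of the suggested ContextNode fields.
type TargetKind = "file" | "directory" | "subtree" | "volume" | "cloud_location";
type Scope = "exact" | "inherited";
type NodeKind = "fact" | "summary" | "policy" | "note" | "tag";
type SourceKind = "user" | "agent" | "job" | "system";
type Visibility = "user_only" | "agent_visible" | "private" | "synced";

interface ContextNode {
  id: string;
  library_id: string;
  target_kind: TargetKind;
  target_id: string; // VDFS identity or location-scoped identifier
  scope: Scope;
  node_kind: NodeKind;
  title: string;
  content: string;
  structured_payload: Record<string, unknown> | null; // assumed JSON-like
  source_kind: SourceKind;
  source_id: string;
  confidence: number; // assumed range: 0 to 1
  visibility: Visibility;
  created_at: string; // assumed ISO 8601 timestamp
  updated_at: string;
  supersedes_id: string | null; // previous record this one replaces, if any
  archived_at: string | null; // null while the record is still active
}

// Hypothetical helper: a node counts as active until it is archived.
function isActive(node: ContextNode): boolean {
  return node.archived_at === null;
}

const example: ContextNode = {
  id: "ctx-1",
  library_id: "lib-1",
  target_kind: "directory",
  target_id: "dir-projects",
  scope: "inherited",
  node_kind: "fact",
  title: "Active projects",
  content: "This is where I keep active projects",
  structured_payload: null,
  source_kind: "user",
  source_id: "user-1",
  confidence: 1,
  visibility: "agent_visible",
  created_at: "2026-03-24T16:37:09-07:00",
  updated_at: "2026-03-24T16:37:09-07:00",
  supersedes_id: null,
  archived_at: null,
};
```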
### Why This Shape
- atomic facts can accumulate over time
- generated summaries can be refreshed without destroying history
- policies can be stored separately from descriptive context
- tags can remain lightweight labels rather than carrying every semantic burden
## Facts vs Summaries
This distinction is critical.
### Atomic Facts
Examples:
- "User keeps active repositories in this directory"
- "Archive subfolder contains inactive projects"
- "This subtree contains financial documents"
- "Agent may edit files here but may not delete them"
Facts are durable, attributable, and versionable.
### Generated Summaries
Examples:
- "This directory mostly contains Rust and TypeScript repositories updated recently"
- "This subtree appears to be an archive of completed client projects"
Summaries are synthesized views over facts, file structure, and activity.
The source of truth is the atomic layer, not the summary text.
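The relationship can be sketched as a pure function from atomic records to a summary that remembers which records it was built from. Everything here is illustrative: `AtomicFact`, `GeneratedSummary`, and `generateSummary` are hypothetical names, and joining fact text is only a stand-in for whatever summarizer a real job would use.

```typescript
// Hypothetical minimal shapes, not the actual storage model.
interface AtomicFact {
  id: string;
  targetPath: string;
  content: string;
  archived: boolean;
}

interface GeneratedSummary {
  targetPath: string;
  text: string;
  derivedFrom: string[]; // ids of the facts this view was built from
}

// Regenerates the summary from the atomic layer every time it is called,
// so the summary text never becomes the source of truth.
function generateSummary(targetPath: string, facts: AtomicFact[]): GeneratedSummary {
  const active = facts.filter((f) => f.targetPath === targetPath && !f.archived);
  return {
    targetPath,
    // Placeholder synthesis: a real job would also use structure and activity.
    text: active.map((f) => f.content).join("; "),
    derivedFrom: active.map((f) => f.id),
  };
}
```

Because the summary records `derivedFrom`, it can be refreshed or discarded without losing any fact.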
## Tags
Tags still matter.
They are one expression of intelligence, useful when the system needs lightweight labels that can optionally carry metadata.
But they should not be the only model.
Recommended role for tags:
- lightweight labels on files or directories
- optional metadata carriers
- one output of the broader context system
Possible future direction:
- allow tags to carry rich text and version history
- allow tags to be generated from or backed by context nodes
## Permissions and Policy
Universal permissions are a major part of File System Intelligence.
These permissions should live above the OS layer and be enforced when agents access files through Spacedrive.
Examples:
- read-only subtree
- writable subtree
- safe workspace subtree
- no-delete policy
- user-confirmation-required policy
- hidden subtree for sensitive data
This gives the user one consistent interface for granting agent access across:
- macOS
- Windows
- Linux
- cloud providers
- remote devices
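One way to resolve an effective policy for a path is to apply every rule on its ancestor chain from shallowest to deepest, so the most specific subtree wins. The sketch below assumes several things this document does not decide: the `PolicyRule` shape, a deny-by-default baseline, and plain string paths instead of VDFS identities.

```typescript
// Hypothetical rule shape; unset fields mean "inherit from the parent subtree".
interface PolicyRule {
  path: string; // subtree root the rule is attached to
  read?: boolean;
  write?: boolean;
  delete?: boolean;
  requireConfirmation?: boolean;
}

interface EffectivePolicy {
  read: boolean;
  write: boolean;
  delete: boolean;
  requireConfirmation: boolean;
}

// Assumption: nothing is allowed unless a rule grants it.
const DENY_ALL: EffectivePolicy = { read: false, write: false, delete: false, requireConfirmation: true };

function normalize(path: string): string {
  return path.replace(/\/+$/, "");
}

// True when `path` is the subtree root itself or lives underneath it.
function isWithin(root: string, path: string): boolean {
  const r = normalize(root);
  const p = normalize(path);
  return p === r || p.indexOf(r + "/") === 0;
}

function resolveEffectivePolicy(path: string, rules: PolicyRule[]): EffectivePolicy {
  // All applicable rules are ancestors of `path`, so a longer root is a deeper one.
  const applicable = rules
    .filter((rule) => isWithin(rule.path, path))
    .sort((a, b) => normalize(a.path).length - normalize(b.path).length);
  const result: EffectivePolicy = { ...DENY_ALL };
  for (const rule of applicable) {
    if (rule.read !== undefined) result.read = rule.read;
    if (rule.write !== undefined) result.write = rule.write;
    if (rule.delete !== undefined) result.delete = rule.delete;
    if (rule.requireConfirmation !== undefined) result.requireConfirmation = rule.requireConfirmation;
  }
  return result;
}
```

With a rule granting read and write on a projects folder and a second rule setting `write: false` on its archive, a path inside the archive resolves to readable but not writable.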
## Agent Experience
When an agent accesses a path through Spacedrive, it should not only receive the raw directory listing.
It should receive:
- the listing itself
- relevant inherited context
- relevant local context
- active permissions and policy
- important summaries of subtree contents
- optionally recent changes or historical notes
This turns navigation from blind traversal into informed traversal.
## Query Surface
At the VDFS and API layer, the system should support queries such as:
- get context for this path
- get inherited context for this subtree
- list context nodes attached here
- generate summary for this subtree
- add fact to this path
- add policy to this subtree
- resolve effective policy for this path
- show context history for this directory
The system should be able to answer both human-facing and agent-facing forms of the same question.
## Sources of Intelligence
There are multiple sources of intelligence.
### Deterministic Jobs
Best for:
- metadata extraction
- media derivatives
- content statistics
- directory composition summaries
- language and file type distribution
### Agent-Written Context
Best for:
- user workflow explanations
- organizational semantics
- safe workspace semantics
- intent captured during normal conversation
### User-Written Context
Best for:
- explicit corrections
- durable preferences
- policy decisions
- sensitive or authoritative context
## Jobs vs Agent Interaction
The first implementation should not rely only on a background job that tries to infer the meaning of the whole filesystem from structure alone.
That approach risks weak summaries and invented semantics.
Instead:
- jobs produce deterministic observations and refresh generated summaries
- agents and users add meaning over time
This lets the intelligence layer evolve incrementally and honestly.
## Example
Given a home directory with:
```text
~/Projects
~/Projects/Archive
~/Documents
```
The user tells Spacebot:
- "I keep active repositories in Projects"
- "Archive contains repos I'm not actively working on"
The system stores these as atomic context nodes.
Later, a summary job produces:
- `~/Projects`: "Primary software workspace containing active repositories, mostly Rust and TypeScript"
- `~/Projects/Archive`: "Inactive or historical repositories, lower write priority"
Then when an agent enters `~/Projects/foo`, it inherits:
- that it is inside the active projects subtree
- that agent write access may be allowed there
- that archive semantics do not apply, because the path is outside `~/Projects/Archive`
This is the intended user and agent experience.
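The inheritance in this example can be sketched as a walk over the ancestor chain of a path. The code is a simplified illustration: `ContextRecord` and `contextFor` are hypothetical names, and `/home/user` stands in for `~` so that paths can be split on `/`.

```typescript
// Hypothetical record shape: just enough to show inheritance.
interface ContextRecord {
  path: string;
  scope: "exact" | "inherited";
  content: string;
}

// "/a/b/c" -> ["/a", "/a/b", "/a/b/c"]
function ancestorChain(path: string): string[] {
  const parts = path.split("/").filter((part) => part.length > 0);
  const chain: string[] = [];
  let current = "";
  for (const part of parts) {
    current += "/" + part;
    chain.push(current);
  }
  return chain;
}

// Collects context from the outermost ancestor down to the path itself.
function contextFor(path: string, records: ContextRecord[]): string[] {
  const chain = ancestorChain(path);
  const target = chain.length > 0 ? chain[chain.length - 1] : "";
  const result: string[] = [];
  for (const dir of chain) {
    for (const record of records) {
      if (record.path !== dir) continue;
      // Exact-scoped records apply only to the directory they are attached to.
      if (record.scope === "exact" && dir !== target) continue;
      result.push(record.content);
    }
  }
  return result;
}

const records: ContextRecord[] = [
  { path: "/home/user/Projects", scope: "inherited", content: "I keep active repositories in Projects" },
  { path: "/home/user/Projects/Archive", scope: "inherited", content: "Archive contains repos I'm not actively working on" },
];
```

Here `contextFor("/home/user/Projects/foo", records)` yields only the Projects statement, while a path under `Archive` receives both statements in outermost-first order.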
## Storage Strategy
The exact persistence model is open, but the design should support:
- attachment to VDFS identities and locations
- revision history
- sync across devices when appropriate
- efficient subtree lookup
- policy inheritance resolution
Possible implementation shapes:
1. dedicated context tables in the library database
2. sidecar-style storage indexed into the library
3. tag-backed records with richer metadata and versioning
Recommended direction:
- use dedicated context records as the real model
- integrate tags as one expression layer, not the underlying substrate
## Observability and History
This system should preserve how understanding changes over time.
That means:
- revision history for facts and policies
- superseded summaries rather than silent overwrite
- attribution to user, job, or agent
- optional inspection of context evolution
This is important both for trust and for future agent behavior.
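Superseding instead of overwriting can be sketched as an append-only history in which each new record points at the one it replaces. The `Revision` shape and helper names below are assumptions for illustration; only the `supersedes_id` idea comes from the suggested `ContextNode` fields.

```typescript
// Hypothetical revision record with attribution.
interface Revision {
  id: string;
  content: string;
  sourceKind: "user" | "agent" | "job" | "system";
  createdAt: string;
  supersedesId: string | null;
}

// Appends a new revision that points at the one it replaces; nothing is mutated.
function supersede(history: Revision[], previousId: string, next: Omit<Revision, "supersedesId">): Revision[] {
  if (!history.some((rev) => rev.id === previousId)) {
    throw new Error("unknown revision: " + previousId);
  }
  return history.concat([{ ...next, supersedesId: previousId }]);
}

// Current revisions are the ones no other revision supersedes.
function currentRevisions(history: Revision[]): Revision[] {
  return history.filter((rev) => !history.some((other) => other.supersedesId === rev.id));
}

// Walks from one revision back through everything it replaced, newest first.
function lineage(history: Revision[], id: string): Revision[] {
  const chain: Revision[] = [];
  let cursor: string | null = id;
  while (cursor !== null) {
    const wanted: string = cursor;
    const rev: Revision | undefined = history.filter((r) => r.id === wanted)[0];
    if (rev === undefined) break;
    chain.push(rev);
    cursor = rev.supersedesId;
  }
  return chain;
}
```

Inspecting how a summary evolved is then a matter of calling `lineage` on its latest id.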
## Search and Retrieval
Vector embeddings may help in some cases, but they are not the primary abstraction for File System Intelligence.
The first retrieval model should be structure-aware and direct.
Examples:
- retrieve context by exact path
- retrieve inherited context by walking ancestors
- retrieve effective policy by path
- retrieve summaries for the current subtree
Embeddings can be added later for semantic recall over large bodies of context, but they should not replace the explicit hierarchical model.
## MVP Recommendation
Start with four primitives.
### 1. Folder Context
Attach rich context to a directory or subtree.
### 2. Atomic Facts
Store user or agent assertions as discrete records.
### 3. Agent Policy
Store subtree-level read/write/modify rules.
### 4. Generated Summary
Generate refreshable summaries from file structure and facts.
This is enough to demonstrate the full value of File System Intelligence without solving every future problem first.
## Integration Path
### Phase 1: Product Language and UI Surface
- adopt `File System Intelligence` as the product term
- expose a basic UI for enabling it per location or subtree
- allow users to add and inspect context
### Phase 2: Context Data Model
- add context node storage
- add effective-context queries
- add policy resolution
### Phase 3: Agent Integration
- Spacebot reads context while navigating via Spacedrive
- Spacebot can write facts and notes with attribution
### Phase 4: Summary Jobs
- generate structure-aware summaries
- refresh them on indexing or change events where appropriate
### Phase 5: Cross-Device Policy and Sync
- sync context and policy across devices at the library level
- apply universal permissions through the VDFS
## Open Questions
1. Should the first storage implementation use dedicated context records or evolve the existing tag model first?
2. How should effective-context inheritance be surfaced in the UI so it is understandable?
3. Which parts of the context layer should sync automatically and which should stay local?
4. How should user-authored policy interact with existing OS-level permission failures?
5. How much agent-written context should require confirmation before becoming durable?
## Recommendation
Adopt `File System Intelligence` as the name for Spacedrive's filesystem context and policy layer.
Implement it as:
- atomic context records
- generated summaries built over those records
- subtree-aware policy and permission resolution
- agent-readable context surfaced during navigation
This gives Spacedrive a clear answer to a fundamental product question:
- why should an agent use Spacedrive instead of raw shell access?
Because Spacedrive does not just expose files. It exposes file systems with intelligence.


@@ -0,0 +1,448 @@
# Spacebot Integration Design
## Purpose
Add first-class Spacebot support to Spacedrive without collapsing the two products into one process model.
Spacedrive should be able to:
1. manage a local Spacebot instance for the user
2. connect to an already running local Spacebot instance
3. connect to a remote Spacebot instance
This keeps Spacebot as a separate runtime while making it feel native inside Spacedrive.
## Decision
Spacedrive will treat Spacebot as a companion service, not as an embedded subsystem inside the VDFS daemon.
The integration boundary is HTTP plus SSE, using Spacebot's existing API.
Spacedrive will support three connection modes:
1. **Managed Local** — Spacedrive launches and supervises a foreground Spacebot child process.
2. **External Local** — Spacedrive connects to an existing localhost Spacebot instance.
3. **Remote** — Spacedrive connects to a Spacebot instance over HTTPS with bearer auth.
## Why This Shape
- Spacebot already has a real control plane: HTTP API, health endpoints, status endpoints, SSE, and a stable instance directory model.
- Spacedrive already treats Spacebot as a separate process in the README, which is the right long-term boundary.
- Embedding Spacebot directly into `sd-core` would couple two daemon models too early.
- Spacebot works cleanly as a child process because it has explicit foreground mode and local file-backed state.
- The same client model can serve local managed, local external, and remote connections.
## Non-Goals
- Do not merge Spacebot into the Spacedrive daemon process.
- Do not proxy every Spacebot API through Spacedrive in v1.
- Do not require Spacedrive core to understand Spacebot internals like channels, branches, workers, or memory schemas.
- Do not design a brand-new agent API when Spacebot already has one.
## Existing Spacebot Capabilities
Spacebot already exposes the pieces Spacedrive needs.
### Runtime
- single binary
- foreground mode for supervised child-process execution
- daemon mode with PID file and Unix socket for native CLI control
- configurable instance directory
Relevant files:
- `spacebot/src/main.rs`
- `spacebot/src/daemon.rs`
- `spacebot/src/config/types.rs`
### HTTP API
Default API behavior:
- bind: `127.0.0.1`
- port: `19898`
- optional bearer token auth
Relevant files:
- `spacebot/src/api/server.rs`
- `spacebot/src/api/system.rs`
- `spacebot/docs/docker.md`
### Minimal endpoints Spacedrive can rely on
- `GET /api/health` — liveness
- `GET /api/status` — version, pid, uptime
- `GET /api/idle` — worker and branch activity
- `GET /api/agents/warmup` — work readiness
- `POST /api/webchat/send` — inject a message
- `GET /api/webchat/history` — fetch conversation history
- `GET /api/events` — global SSE event stream
## Integration Modes
### 1. Managed Local
Spacedrive starts Spacebot as a child process in foreground mode and talks to it over localhost HTTP.
Recommended command shape:
```text
spacebot start --foreground --config <path>
```
Recommended ownership:
- process lifecycle owned by the desktop shell layer, not by `sd-core`
- status mirrored into Spacedrive config and UI
- health and warmup polled over HTTP
Why this is the recommended default:
- easiest onboarding
- strongest first-class user experience
- least invasive to Spacedrive core
- preserves Spacebot as a separate product and runtime
### 2. External Local
Spacedrive connects to an already running local Spacebot instance.
Expected user inputs:
- base URL, usually `http://127.0.0.1:19898`
- optional bearer token
This mode is important for:
- developers already running Spacebot manually
- advanced users with custom instance directories or configs
- system-service installs managed outside Spacedrive
### 3. Remote
Spacedrive connects to a remote Spacebot instance over HTTPS.
Expected user inputs:
- base URL
- bearer token
- optional instance label
This mode is important for:
- self-hosted NAS or server deployments
- hosted Spacebot instances
- team or shared deployments
## Recommended V1 Scope
The smallest honest first-class integration is:
1. support Managed Local and External Local first
2. design the client so Remote works with the same abstraction
3. use the existing Spacebot webchat and SSE APIs instead of inventing a new protocol
4. keep Spacebot lifecycle in the app layer
5. keep Spacebot connection metadata in app config
## Architecture Boundary
### Spacedrive Core responsibilities
- persist Spacebot connection settings in app config
- expose typed config get/update operations
- expose lightweight status and health queries if the UI should stay transport-agnostic
- optionally publish Spacebot connection events onto Spacedrive's own event system later
### Desktop shell responsibilities
- spawn and stop managed local Spacebot processes
- supervise child process lifecycle
- detect existing local process connectivity
- surface launch and crash diagnostics
### Interface responsibilities
- Spacebot settings page
- connection mode selection
- status display and diagnostics
- embedded chat and activity surfaces
### Spacebot responsibilities
- own agent runtime, messaging, memory, tools, and control API
- remain independently deployable and independently upgradeable
## Why Not Manage Spacebot in `sd-core`
`sd-core` is the VDFS daemon. Spacebot is its own daemon-like runtime with its own process lifecycle, logs, warmup state, secrets, agent graph, and HTTP UI model.
Putting child-process management directly into `sd-core` would:
- blur product boundaries
- complicate server and mobile targets unnecessarily
- make local-only process concerns leak into the core library
The right split is:
- config lives in core
- process supervision lives in the platform shell
## Proposed Spacedrive Config Shape
Add a new block to `AppConfig` in `spacedrive/core/src/config/app_config.rs`.
Suggested shape:
```text
spacebot:
  enabled
  mode              # managed_local | external_local | remote
  base_url
  auth_token
  manage_process
  binary_path
  config_path
  instance_dir
  auto_start
  connect_on_launch
  last_known_status
```
Notes:
- `auth_token` should not stay in plain app config long-term if we already have a stronger secret storage primitive available.
- v1 can store token in config only if necessary, but the preferred direction is secure storage.
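For the shared TypeScript layer, the same block could be mirrored as a type with a small validation pass. This is a sketch under assumptions: the field types and `validateSpacebotConfig` are hypothetical, the HTTPS and token checks follow the Remote mode description, and the `manage_process` check is an inferred rule rather than something this design states.

```typescript
type SpacebotMode = "managed_local" | "external_local" | "remote";

// Assumed field types; the config shape above only lists names.
interface SpacebotConfig {
  enabled: boolean;
  mode: SpacebotMode;
  base_url: string;
  auth_token: string | null;
  manage_process: boolean;
  binary_path: string | null;
  config_path: string | null;
  instance_dir: string | null;
  auto_start: boolean;
  connect_on_launch: boolean;
  last_known_status: string | null;
}

// Returns a list of human-readable problems; an empty list means the config looks usable.
function validateSpacebotConfig(config: SpacebotConfig): string[] {
  const problems: string[] = [];
  if (config.mode === "remote") {
    if (config.base_url.indexOf("https://") !== 0) {
      problems.push("remote mode expects an https:// base_url");
    }
    if (!config.auth_token) {
      problems.push("remote mode expects a bearer token");
    }
  }
  // Assumption: only Managed Local lets Spacedrive own the process.
  if (config.mode !== "managed_local" && config.manage_process) {
    problems.push("manage_process only applies to managed_local");
  }
  return problems;
}

const managedDefault: SpacebotConfig = {
  enabled: true,
  mode: "managed_local",
  base_url: "http://127.0.0.1:19898",
  auth_token: null,
  manage_process: true,
  binary_path: null,
  config_path: null,
  instance_dir: null,
  auto_start: true,
  connect_on_launch: true,
  last_known_status: null,
};
```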
## Proposed Core Operations
Add config-backed operations for Spacebot.
### Core Queries
- `spacebot.config.get`
- `spacebot.status.get`
- `spacebot.health.get`
### Core Actions
- `spacebot.config.update`
- `spacebot.connect`
- `spacebot.disconnect`
- `spacebot.start_managed`
- `spacebot.stop_managed`
These can begin as thin wrappers around app config and platform commands.
## Proposed Desktop Platform Commands
The Tauri layer already manages `sd-daemon`. Reuse that pattern for Spacebot.
Recommended commands:
- `spacebot_start`
- `spacebot_stop`
- `spacebot_restart`
- `spacebot_status`
- `spacebot_logs_path`
Managed Local should:
- launch Spacebot in foreground mode
- inject or point to a dedicated config path
- set `SPACEBOT_DIR` or equivalent instance path
- wait for `GET /api/health`
- then wait for `GET /api/agents/warmup` if chat UI depends on readiness
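The two-stage wait at the end of that list could look roughly like this. The sketch injects a `probe` function and a `sleep` function so it makes no claims about transport or about what a successful warmup response contains; `Probe`, `Sleep`, `WaitOptions`, and the result labels are all hypothetical.

```typescript
// The caller decides what "ok" means for each endpoint (for example an HTTP 200,
// or a warmup payload that reports ready).
type Probe = (path: string) => Promise<boolean>;
type Sleep = (ms: number) => Promise<void>;

interface WaitOptions {
  intervalMs: number;
  maxAttempts: number;
}

type Readiness = "ready" | "not_live" | "not_warm";

// Polls one endpoint until the probe succeeds or attempts run out.
async function waitUntil(probe: Probe, path: string, sleep: Sleep, options: WaitOptions): Promise<boolean> {
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    if (await probe(path)) return true;
    await sleep(options.intervalMs);
  }
  return false;
}

// Liveness first, then warmup: a live HTTP server does not imply a ready agent.
async function waitForSpacebot(probe: Probe, sleep: Sleep, options: WaitOptions): Promise<Readiness> {
  if (!(await waitUntil(probe, "/api/health", sleep, options))) return "not_live";
  if (!(await waitUntil(probe, "/api/agents/warmup", sleep, options))) return "not_warm";
  return "ready";
}
```

Separating `not_live` from `not_warm` lets the UI show launch diagnostics in one case and a warming-up state in the other.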
## Connection Client Abstraction
Add a lightweight Spacebot client in the app layer or shared TypeScript layer.
Recommended methods:
- `health()`
- `status()`
- `warmupStatus()`
- `sendWebchatMessage(agentId, sessionId, senderName, message)`
- `getWebchatHistory(agentId, sessionId)`
- `subscribeEvents()`
This should be HTTP plus SSE based, independent of whether the instance is local or remote.
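A minimal sketch of such a client follows, with the transport injected so the same code serves local and remote instances. The endpoint paths come from the list of minimal endpoints, but the JSON body fields (`agent_id`, `session_id`, `sender_name`, `message`) and the history query parameters are assumptions, not Spacebot's confirmed request schema. SSE subscription is left out and only the events URL is exposed.

```typescript
interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Injected transport, for example a thin wrapper around fetch.
type Transport = (request: HttpRequest) => Promise<unknown>;

function createSpacebotClient(baseUrl: string, token: string | null, transport: Transport) {
  const root = baseUrl.replace(/\/+$/, "");

  function build(method: "GET" | "POST", path: string, body?: unknown): HttpRequest {
    const headers: Record<string, string> = { Accept: "application/json" };
    if (token) headers["Authorization"] = "Bearer " + token;
    const request: HttpRequest = { method, url: root + path, headers };
    if (body !== undefined) {
      headers["Content-Type"] = "application/json";
      request.body = JSON.stringify(body);
    }
    return request;
  }

  return {
    health: () => transport(build("GET", "/api/health")),
    status: () => transport(build("GET", "/api/status")),
    warmupStatus: () => transport(build("GET", "/api/agents/warmup")),
    // Body field names are assumed for illustration.
    sendWebchatMessage: (agentId: string, sessionId: string, senderName: string, message: string) =>
      transport(build("POST", "/api/webchat/send", {
        agent_id: agentId,
        session_id: sessionId,
        sender_name: senderName,
        message,
      })),
    // Query parameter names are assumed for illustration.
    getWebchatHistory: (agentId: string, sessionId: string) =>
      transport(build(
        "GET",
        "/api/webchat/history?agent_id=" + encodeURIComponent(agentId) +
          "&session_id=" + encodeURIComponent(sessionId),
      )),
    eventsUrl: () => root + "/api/events",
  };
}
```

Switching between Managed Local, External Local, and Remote then only changes `baseUrl` and `token`.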
## UI Placement
### Settings
Best fit:
- extend `spacedrive/packages/interface/src/Settings/pages/ServicesSettings.tsx`
Add a Spacebot section with:
- mode selector
- managed-local start on launch toggle
- local URL / remote URL
- auth token input
- connection test button
- health, status, and warmup indicators
### Chat Surface
Recommended first placement:
- a dedicated Spacebot route or panel in the interface
The first slice does not need to fully replicate the Spacebot dashboard. It only needs a clean embedded chat surface plus basic runtime status.
## Data and Security Model
### Local managed instance
Recommended default:
- store Spacebot instance data under Spacedrive's data root but in a separate subtree
Example:
```text
<spacedrive-data-dir>/spacebot/
  instance/
  config.toml
  logs/
```
This keeps ownership clear while preserving process separation.
### Auth
- local managed can run without auth if strictly loopback bound
- local external should support optional bearer token
- remote should require bearer token in practice
### Secret storage
Preferred direction:
- store remote bearer tokens outside plain JSON config when possible
## Event Model
Spacebot already emits a global SSE stream from `/api/events`.
V1 recommendation:
- consume it directly in the UI client
- filter by `agent_id` and `channel_id` client-side
- do not mirror all Spacebot events into Spacedrive core yet
Why:
- less duplication
- less coupling
- fewer translation bugs
Future:
- if the rest of Spacedrive needs Spacebot events, add a narrow translated event layer later
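Client-side filtering of the global stream might be sketched as below. The `data:` line handling follows the standard Server-Sent Events framing, but the event payload is an assumption: the sketch supposes each event is a JSON object carrying `agent_id` and `channel_id`, which this design implies without specifying.

```typescript
// Assumed payload shape; any other fields are passed through untouched.
interface SpacebotEvent {
  agent_id?: string;
  channel_id?: string;
  [key: string]: unknown;
}

// Parses one SSE frame (the text between blank lines) into an event, or null.
function parseSseFrame(frame: string): SpacebotEvent | null {
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.indexOf("data:") !== 0) continue; // skip event:, id:, and comment lines
    let value = line.slice(5);
    if (value.charAt(0) === " ") value = value.slice(1); // one optional leading space
    dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  try {
    const parsed: unknown = JSON.parse(dataLines.join("\n"));
    return typeof parsed === "object" && parsed !== null ? (parsed as SpacebotEvent) : null;
  } catch {
    return null; // not JSON: ignore rather than break the stream
  }
}

// Keeps only the events for one agent and channel.
function eventsFor(frames: string[], agentId: string, channelId: string): SpacebotEvent[] {
  const matched: SpacebotEvent[] = [];
  for (const frame of frames) {
    const event = parseSseFrame(frame);
    if (event !== null && event.agent_id === agentId && event.channel_id === channelId) {
      matched.push(event);
    }
  }
  return matched;
}
```

Keeping the filter in the UI client means Spacedrive core never has to understand the event payloads.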
## Session Model
Use Spacebot's webchat model as the first integration path.
Suggested mapping:
- one Spacedrive user session or panel maps to one `session_id`
- one chosen Spacebot agent maps to `agent_id`
- user input goes to `/api/webchat/send`
- UI state is hydrated from `/api/webchat/history`
- live output comes from `/api/events`
This is enough to ship first-class chat without adopting the full Spacebot dashboard API surface.
## Risks
### Mode complexity
Supporting managed local, external local, and remote is correct, but the UX can get confusing fast.
Mitigation:
- make Managed Local the recommended default
- place External Local and Remote behind an explicit advanced setup flow
### Readiness mismatch
`/api/health` only means the HTTP server is up. It does not mean the agent is ready.
Mitigation:
- gate chat UX on warmup status, not liveness alone
### Secrets in app config
Remote bearer tokens should not live forever in plain JSON.
Mitigation:
- v1 can be pragmatic
- v2 should move tokens to secure storage
### Tight product coupling
If Spacedrive starts depending on too much of Spacebot's internal API surface, upgrades get harder.
Mitigation:
- define a narrow Spacedrive-facing client contract
- start with webchat, status, health, and SSE only
## Phased Plan
### Phase 1: Config and Discovery
- add Spacebot config block to app config
- add settings UI for connection mode and endpoint
- add lightweight client for health/status/warmup
### Phase 2: Managed Local
- add Tauri platform commands to start and stop Spacebot
- add supervised child-process support
- create dedicated Spacebot instance directory under Spacedrive data
### Phase 3: Embedded Chat
- add Spacebot panel or route
- send messages via `/api/webchat/send`
- show history via `/api/webchat/history`
- stream updates via `/api/events`
The first prototype can ship with a config-gated chat route, handwritten request types for the narrow webchat surface, and polling for history before the SSE layer is wired in.
### Phase 4: Deeper Integration
- agent picker
- worker status and live activity
- memory and task views if useful
- cross-link Spacebot with Spacedrive repository and file contexts
### Phase 5: Remote Hardening
- secure token storage
- richer diagnostics
- better reconnect behavior
## Recommendation
Ship first-class Spacebot support as a companion-runtime integration.
Start with:
- **Managed Local** as the default
- **External Local** as the easy advanced path
- **Remote** as the same client abstraction with a different base URL
Keep the boundary at HTTP plus SSE. Keep process supervision in the desktop shell. Keep settings in Spacedrive core config. Use Spacebot's webchat model first.
That gives Spacedrive deep native Spacebot support without pretending the two runtimes should already be one.


@@ -7,7 +7,7 @@ setup:
# Run the daemon (default dev workflow: just dev-daemon + just dev-desktop)
dev-daemon *ARGS:
-    cargo run --bin sd-daemon {{ARGS}}
+    cargo run --features ffmpeg,heif --bin sd-daemon {{ARGS}}
# Run the desktop app in dev mode
dev-desktop: