docs and plans

2026-04-29 02:42:47 -04:00 · 2022-04-11 21:42:32 -07:00
parent 9eb7c89591
commit e648dd2e2b
5 changed files with 54 additions and 33 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -24,6 +24,7 @@ packages/*/data
 apps/*/data
 docs/public/*.st
 docs/public/*.toml
+dev.db

 !cli/cmd/turbo
 cli/npm/turbo-android-arm64/bin
--- a/docs/architecture/distributed-data-sync.md
+++ b/docs/architecture/distributed-data-sync.md
@@ -20,8 +20,8 @@ mod sync {
  // we can now impl specific CRDT traits to given resources
  enum SyncResource {
    FilePath(dyn Replicate),
-    File(dyn OperationalTransform),
-    Tag(dyn OperationalTransform),
+    File(dyn PropertyOperation),
+    Tag(dyn PropertyOperation),
    TagOnFile(dyn LastWriteWin),
    Jobs(dyn Replicate + OperationalTransform)
  }
@@ -31,29 +31,19 @@ mod sync {


 ## Data Types
-Data is divided into several kinds, Shared, Relational and Owned.
+Data is divided into two kinds: Shared and Owned.
 - **Shared data** - Can be created and modified by any client. Has a `uuid`.

-  *Sync Method:* `Operational transform*`
+  *Sync Method:* `Property operation*`

  > Shared resources could be `files`, `tags`, `comments`, `albums` and `labels`, since these can be created, updated or deleted by any client at any time.

- **Relational data** - Can be created and modified by any client. Links two UUIDs by local IDs.
-
-  *Sync Method:* `Last write wins (LWW)`
-
-  > Any many-to-many tables do not store UUIDs, we have to handle this data specifically. Querying for the resources local IDs before creating or deleting the relation.
-
 - **Owned data** - Can only be modified by the client that created it. Has a `client_id` and `uuid`.

  *Sync Method:* `Replicate`

  > Owned resources would be `file_paths`, `jobs`, `locations` and `media_data`, since a client is the single source of truth for this data. This means we can perform conflict free synchronization.

- **Offline data** - Not synchronized at all.
-
-  > For example `logs`, `pending_operations` and `_migrations`. These are static and not part of this system.
-


 **Shared data doesn't always use this method, in some cases we can create shared resources in bulk, where conflicts are handled by simply merging. More on that in [Synchronization Strategy]()*.
@@ -120,17 +110,20 @@ Owned data → Bulk shared data →  Shared data → Relational data
 ### Types of CRDT:

 ```rust
-trait OperationalTransform;
-
-trait LastWriteWin;
+trait PropertyOperation;

 trait Replicate;
 ```

- **Operational Transform** - Update Shared resources at a property level. Operations stored in `pending_operations` table. 
- **Last Write Win** - The most recent event will always be applied, used for many-to-many datasets.
+- **PropertyOperation** - Update Shared resources at a property level. Operations stored in `pending_operations` table. 
 - **Replicate** - Used exclusively for Owned data, clients will replicate with no questions asked.

+- ~~**Last Write Win** - The most recent event will always be applied, used for many-to-many datasets.~~
+
+
+
+
+


 ## Operations
@@ -139,7 +132,7 @@ Operations perform a Shared data change, they are cached in the database as `pen
 Operations are removed once all online clients have received the payload.

 ```rust
-struct OperationalTransform<V> {
+struct PropertyOperation<V> {
  method: OperationMethod,
  // the name of the database table
  resource_type: String,
@@ -272,18 +265,6 @@ Files also impempent `OperationalMerge` would use



-## Ingesting Sync Events
-
-
-
-
-
-
-
-
-
-
-



--- a/docs/architecture/virtual-filesystem.md
+++ b/docs/architecture/virtual-filesystem.md
@@ -36,7 +36,7 @@ struct File {
 }
 ```

- `partial_checksum ` - A SHA256 checksum generated from 5 samples of 10,000 bytes throughout the file data, including the begining and end. This is used to identify a file as *likely* unique in under 100µs. 
+- `partial_checksum` - A SHA256 checksum generated from 5 samples of 10,000 bytes throughout the file data, including the beginning and end, plus the total byte count. This is used to identify a file as *likely* unique in under 100µs.

 > ~~It is impossible to have a unique constraint at a database level for the `partial_checksum`; however, we can asynchronously resolve conflicts by querying for duplicates and generating full checksums at a later date.~~
 >
--- a/extensions/apple-photos/README.md
+++ b/extensions/apple-photos/README.md
@@ -0,0 +1,39 @@
+This extension must first register an indexer context to prevent the indexer from scanning the photo library.
+
+```rust
+struct IndexerContext {
+    key: String,
+    is_dir: bool,
+    extension: Option<String>,
+    must_contain: Vec<String>,
+    always_ignored: Option<Vec<String>>, // a list, since several names may be ignored
+    scan: bool,
+}
+```
+
+```rust
+core.register_context(IndexerContext {
+    key: "apple-photo-library".into(),
+    is_dir: false,
+    extension: Some(".photoslibrary".into()),
+    must_contain: vec!["database".into(), "originals".into()],
+    always_ignored: None,
+    scan: false, // apple-photos extension takes care of scan
+});
+
+core.register_context(IndexerContext {
+    key: "github-repository".into(),
+    is_dir: true,
+    extension: None,
+    must_contain: vec![".git".into()],
+    always_ignored: Some(vec!["node_modules".into(), "target".into()]),
+    scan: true,
+});
+```
+
+For Apple Photos we need:
+- Hidden/Favorite items
+- Live photo support
+- Original creation date
+- Edited photos
+- Albums
--- a/extensions/twitter-history/Cargo.toml
+++ b/extensions/twitter-history/Cargo.toml