Storage Format
This document describes how backups generated by kopia are stored and how encryption is used to secure them.
The bulk of the backup data is stored in a Repository, while meta-information about backups (typically orders of magnitude smaller) is stored in a Vault.
This design allows the Repository and the Vault to be stored in two physically different locations. The Repository is typically stored in highly durable and available cloud storage, such as Google Cloud Storage, while the Vault is small enough to be carried on a thumb drive, although it is typically also stored in the cloud alongside the Repository.
Repositories can be shared among users, while Vaults are typically owned by a single user.
BLOB Storage
Kopia stores all data (both vault and repository) in BLOB storage, which stores Blocks of unstructured binary data. Currently supported storage mechanisms include:
- filesystem
- Google Cloud Storage
Repository
The Repository is content-addressable storage that holds arbitrary Objects (files, directory listings, etc.), each referenced by an Object Identifier, which is a human-readable string.
When an object is stored in a repository, the user does not pick its identifier; instead, the repository computes the identifier as a function of the object's contents. This gives the repository data deduplication: identical objects have the same object ID, so they are stored only once, saving upload time and storage space.
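The deduplication property can be illustrated with a toy in-memory store. This is only a sketch: `ToyRepository` is a hypothetical name, and an unkeyed SHA-256 truncated to 128 bits stands in for whatever hash the repository is actually configured with.

```python
import hashlib


class ToyRepository:
    """In-memory content-addressable store (illustration only)."""

    def __init__(self):
        self.blocks = {}  # identifier -> contents
        self.writes = 0   # number of blocks actually stored

    def put(self, content: bytes) -> str:
        # The identifier is a function of the contents alone.
        # ASSUMPTION: plain SHA-256 truncated to 128 bits, for illustration.
        object_id = hashlib.sha256(content).hexdigest()[:32]
        if object_id not in self.blocks:
            self.blocks[object_id] = content
            self.writes += 1
        return object_id

    def get(self, object_id: str) -> bytes:
        return self.blocks[object_id]


repo = ToyRepository()
first = repo.put(b"same bytes")
second = repo.put(b"same bytes")      # identical content: nothing new is stored
other = repo.put(b"different bytes")  # new content: a second block is stored
```

Storing the same bytes twice yields the same identifier and a single stored block, which is where the savings in upload time and space come from.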
The form of the identifier depends on the size and type of the object:

- For very small binary objects (up to a few kilobytes), the object identifier is `B` followed by the base-64 encoded contents of the object.

  EXAMPLE: An object with the 4 bytes `01 02 03 04` has the identifier `BAQIDBA`.

- For very small text objects (ASCII files, up to a few kilobytes), the object identifier is `T` followed by the contents of the object.

  EXAMPLE: The text `quick brown fox` has the identifier `Tquick brown fox`.

- For medium-sized objects (less than about 20 MB), Kopia applies a cryptographic hash function to compute the message digest (typically 128-512 bits long) and stores the object contents in a Block whose name is the digest. The object identifier for medium-sized objects is `D{digest}`.

  EXAMPLE: `Da76999788386641a3ec798554f1fe7e6` could be the identifier of an object whose cryptographic hash is `a76999788386641a3ec798554f1fe7e6`. In certain formats (see below) the object identifier is followed by a base-16 encoded per-object encryption key, such as `Da76999788386641a3ec798554f1fe7e6.82f948a549f7b791a5b41915ee4d1ec3935357e4e2317250d0372afa2ebeeb3a`.

- Larger objects are split into chunks of 20 MB, which are stored as medium-sized objects. The list of object IDs representing the chunks is then stored in a JSON object, which is also stored in the repository. The resulting object ID is `L1{digest of list object}`.

- Sometimes smaller objects are combined into bundles. This allows all small files in a directory to be stored in a single medium-sized object. To refer to a section of a bundle object, `S{offset},{length},{bundle}` is used.

  EXAMPLE: `S2000,5000,Da76999788386641a3ec798554f1fe7e6` refers to the section of `Da76999788386641a3ec798554f1fe7e6` from byte 2000 to byte 6999.
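The identifier forms above can be sketched in a few helper functions. This is a minimal illustration, not kopia's implementation: the function names are hypothetical, and the standard base-64 alphabet with `=` padding removed is an assumption inferred from the `BAQIDBA` example.

```python
import base64


def inline_binary_id(data: bytes) -> str:
    # "B" + base-64 contents. ASSUMPTION: standard alphabet, padding stripped,
    # which is what the BAQIDBA example suggests.
    return "B" + base64.b64encode(data).decode("ascii").rstrip("=")


def inline_text_id(text: str) -> str:
    # "T" + the text itself.
    return "T" + text


def decode_inline(object_id: str) -> bytes:
    # Recover the contents of a "B" or "T" identifier without any storage access.
    kind, body = object_id[0], object_id[1:]
    if kind == "T":
        return body.encode("ascii")
    if kind == "B":
        padded = body + "=" * (-len(body) % 4)  # restore stripped padding
        return base64.b64decode(padded)
    raise ValueError("not an inline object identifier")


def split_digest_id(object_id: str):
    # "D{digest}" or "D{digest}.{key}" -> (digest, key or None)
    if not object_id.startswith("D"):
        raise ValueError("not a digest object identifier")
    digest, _, key = object_id[1:].partition(".")
    return digest, (key or None)


def parse_section_id(object_id: str):
    # "S{offset},{length},{bundle}" -> (offset, length, bundle identifier)
    if not object_id.startswith("S"):
        raise ValueError("not a section object identifier")
    offset, length, bundle = object_id[1:].split(",", 2)
    return int(offset), int(length), bundle


offset, length, bundle = parse_section_id(
    "S2000,5000,Da76999788386641a3ec798554f1fe7e6")
last_byte = offset + length - 1  # last byte covered by the section
```

For the section example, `offset + length - 1` gives the last byte of the section, matching the range of 2000 to 6999 stated above.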
Sharing and Encryption
Depending on data sharing and encryption needs, Kopia supports three encryption modes:
- Unencrypted - object data in the repository is stored unencrypted
  - This mode is recommended for repositories that are stored in trusted locations.
  - Any user with access to the repository can see the contents of all the files, although they can't easily find the names of files.
  - Single-user and shared repositories are supported.
- Single-key encryption - object data is encrypted using a shared key
  - The encryption key is shared among users and stored in each user's Vault.
  - Any user with access to the repository and the shared key can decrypt all files, although they can't easily find the names of files.
  - A per-object initialization vector (synthetic IV, or SIV) is derived from the object contents and a secret also stored in the Vault.
  - Object identifiers are typically short, for example: `Da76999788386641a3ec798554f1fe7e6`.
  - This mode is recommended for data owned by a single user or a set of trusted users, where key sharing is possible.
- Per-object encryption - object data is encrypted using a per-object key
  - The per-object encryption key is derived from the object contents and a secret stored in the users' Vaults.
  - The encryption key is stored as part of the object identifier and is required to decrypt the object.
  - Object identifiers are relatively long because they include 256-bit encryption keys, for example: `Da76999788386641a3ec798554f1fe7e6.82f948a549f7b791a5b41915ee4d1ec3935357e4e2317250d0372afa2ebeeb3a`.
  - Access to the repository and knowledge of the shared secret are not enough to decrypt files; the per-object encryption key is also required.
  - This mode is recommended for cases where more than one user shares a repository and sharing an encryption key is not feasible.
Object Formats
The following formats are supported:
| ID | Mode | ObjectID Length | Encryption |
|---|---|---|---|
| `UNENCRYPTED_HMAC_SHA256` | Unencrypted | 65 | |
| `UNENCRYPTED_HMAC_SHA256_128` | Unencrypted | 33 | |
| `ENCRYPTED_HMAC_SHA256_AES256_SIV` | Single-key | 33 | AES-256 |
| `ENCRYPTED_HMAC_SHA512_384_AES256` | Per-object-key | 98 | AES-256 |
| `ENCRYPTED_HMAC_SHA512_AES256` | Per-object-key | 130 | AES-256 |
The default format is `ENCRYPTED_HMAC_SHA256_AES256_SIV`, which is best suited for single-user deployments.
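The ObjectID lengths in the table follow from simple arithmetic: one prefix character, two base-16 characters per digest byte and, for per-object-key formats, a `.` separator plus two characters per key byte. The sketch below reproduces them; the byte counts come from the per-format descriptions in this section, and `object_id_length` is a hypothetical helper.

```python
def object_id_length(digest_bytes: int, key_bytes: int = 0) -> int:
    length = 1 + 2 * digest_bytes    # "D" prefix + base-16 block ID
    if key_bytes:
        length += 1 + 2 * key_bytes  # "." separator + base-16 key
    return length


# (block ID bytes, key bytes carried in the identifier) for each format
FORMATS = {
    "UNENCRYPTED_HMAC_SHA256": (32, 0),
    "UNENCRYPTED_HMAC_SHA256_128": (16, 0),
    "ENCRYPTED_HMAC_SHA256_AES256_SIV": (16, 0),
    "ENCRYPTED_HMAC_SHA512_384_AES256": (16, 32),
    "ENCRYPTED_HMAC_SHA512_AES256": (32, 32),
}

lengths = {name: object_id_length(*sizes) for name, sizes in FORMATS.items()}
```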
- `UNENCRYPTED_HMAC_SHA256`:
  - contents are not encrypted and are stored in a block named:

        blockID := BASE16(HMACSHA256(secret,content))

  - per-repository `secret` is stored in the Vault
- `UNENCRYPTED_HMAC_SHA256_128`:
  - contents are not encrypted and are stored in a block named:

        blockID := BASE16(TRUNCATE(HMACSHA256(secret,content),16))

  - per-repository `secret` is stored in the Vault
- `ENCRYPTED_HMAC_SHA256_AES256_SIV`:
  - block contents are encrypted with AES-256 in CTR mode with a synthetic IV derived from the content:

        iv := TRUNCATE(HMACSHA256(secret,content),16)
        cipherText := AES256CTR(encryptionKey,iv,content)
        blockID := BASE16(iv)

  - per-repository `encryptionKey` and `secret` are stored in the Vault
- `ENCRYPTED_HMAC_SHA512_384_AES256`:
  - block contents are encrypted with AES-256 in CTR mode with a key derived from the content and a constant IV:

        digest := HMACSHA512384(secret,content)
        blockID := BASE16(digest[0:16])
        encryptionKey := digest[16:48]
        iv := "kopiakopiakopiak"
        cipherText := AES256CTR(encryptionKey,iv,content)

  - per-repository `secret` is stored in the Vault; `encryptionKey` is derived per object and carried in the object identifier
- `ENCRYPTED_HMAC_SHA512_AES256`:
  - block contents are encrypted with AES-256 in CTR mode with a key derived from the content and a constant IV:

        digest := HMACSHA512(secret,content)
        blockID := BASE16(digest[0:32])
        encryptionKey := digest[32:64]
        iv := "kopiakopiakopiak"
        cipherText := AES256CTR(encryptionKey,iv,content)

  - per-repository `secret` is stored in the Vault; `encryptionKey` is derived per object and carried in the object identifier
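The derivation steps in the formats above can be sketched with the Python standard library. The sketch covers only the HMAC-based parts (block ID, IV and per-object key); the AES-256-CTR step is left out because the standard library has no AES. The function names are hypothetical, and reading `HMACSHA512384` as HMAC over SHA-384 is an assumption; it produces the 48-byte digest that the `digest[16:48]` slice requires.

```python
import hashlib
import hmac

CONSTANT_IV = b"kopiakopiakopiak"  # 16 bytes, one AES block


def unencrypted_hmac_sha256(secret: bytes, content: bytes) -> str:
    # blockID := BASE16(HMACSHA256(secret, content))
    return hmac.new(secret, content, hashlib.sha256).hexdigest()


def unencrypted_hmac_sha256_128(secret: bytes, content: bytes) -> str:
    # blockID := BASE16(TRUNCATE(HMACSHA256(secret, content), 16))
    return hmac.new(secret, content, hashlib.sha256).digest()[:16].hex()


def siv_params(secret: bytes, content: bytes):
    # Synthetic IV: the truncated HMAC is both the IV and the block ID.
    iv = hmac.new(secret, content, hashlib.sha256).digest()[:16]
    return iv.hex(), iv


def sha512_384_params(secret: bytes, content: bytes):
    # ASSUMPTION: HMACSHA512384 means HMAC with SHA-384 (48-byte digest).
    digest = hmac.new(secret, content, hashlib.sha384).digest()
    return digest[0:16].hex(), digest[16:48]  # blockID, encryptionKey


def sha512_params(secret: bytes, content: bytes):
    digest = hmac.new(secret, content, hashlib.sha512).digest()
    return digest[0:32].hex(), digest[32:64]  # blockID, encryptionKey


secret = b"example-secret"  # placeholder, not a real repository secret
block_id, key = sha512_384_params(secret, b"hello")
object_id = "D" + block_id + "." + key.hex()
```

Because every value is derived from `secret` and `content` only, the same content always maps to the same block ID, which is what keeps deduplication working when encryption is enabled.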
Vault Format
The Vault provides storage for backup metadata, which is typically encrypted with a per-user key.
Each vault contains an unencrypted block named `format` that describes the vault encryption format and key derivation algorithm:
{
"version": "1",
"uniqueID": "Rig5PvhA5HxHcfBV7MwY7US6XXwm40Sz5RzL1hEc4LM=",
"keyAlgo": "scrypt-65536-8-1",
"encryption": "AES256_GCM"
}
All other vault blocks are encrypted using AES-256 in Galois/Counter Mode. The encryption key and authenticated data are derived from a master key. The master key is either user-provided or derived from a password using scrypt.
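The password-based path can be sketched with `hashlib.scrypt`. Several things here are assumptions rather than documented behavior: that `scrypt-65536-8-1` encodes the scrypt parameters N, r and p in that order, that the decoded `uniqueID` serves as the salt, and that the master key is 32 bytes long. The function names are hypothetical.

```python
import hashlib


def parse_key_algo(key_algo: str):
    # ASSUMPTION: "scrypt-{N}-{r}-{p}"
    name, n, r, p = key_algo.split("-")
    if name != "scrypt":
        raise ValueError("unsupported key derivation algorithm: " + name)
    return int(n), int(r), int(p)


def derive_master_key(password: str, salt: bytes, key_algo: str,
                      key_len: int = 32) -> bytes:
    n, r, p = parse_key_algo(key_algo)
    # scrypt needs roughly 128 * r * N bytes; raise the memory cap accordingly,
    # since the default cap is too small for N = 65536, r = 8.
    maxmem = 128 * r * (n + p + 2) + 1024 * 1024
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=n, r=r, p=p, maxmem=maxmem, dklen=key_len)
```

With the `format` block shown earlier, a call might look like `derive_master_key(password, base64.b64decode(unique_id), "scrypt-65536-8-1")`, which uses about 64 MiB of memory for the derivation.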
One encrypted block is of particular importance: the block named `repo`, which describes the location and format of the repository:
```json
{
  "connection": {
    "type": "filesystem",
    "config": {
      "path": "/tmp/kopia-test-repo"
    }
  },
  "format": {
    "version": 1,
    "objectFormat": "ENCRYPTED_HMAC_SHA512_384_AES256",
    "secret": "TzQzQDQ7jfBf6/RGNJAIXYZMRbc4Ty8270wiLTfBUHU=",
    "maxInlineContentLength": 32768,
    "maxBlockSize": 20971520,
    "masterKey": "h1jU2A+tSnzRot2Me5ZQNdjjox6KUTqd8H9TqZvtypw="
  }
}
```
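As a quick sanity check, the decrypted `repo` block can be loaded and its fields inspected. The sketch below assumes that `secret` and `masterKey` are standard base-64 strings, which their `=` padding suggests; the variable names are illustrative.

```python
import base64
import json

REPO_BLOCK = """
{
  "connection": {
    "type": "filesystem",
    "config": {"path": "/tmp/kopia-test-repo"}
  },
  "format": {
    "version": 1,
    "objectFormat": "ENCRYPTED_HMAC_SHA512_384_AES256",
    "secret": "TzQzQDQ7jfBf6/RGNJAIXYZMRbc4Ty8270wiLTfBUHU=",
    "maxInlineContentLength": 32768,
    "maxBlockSize": 20971520,
    "masterKey": "h1jU2A+tSnzRot2Me5ZQNdjjox6KUTqd8H9TqZvtypw="
  }
}
"""

repo = json.loads(REPO_BLOCK)
fmt = repo["format"]

# ASSUMPTION: both values are standard base-64.
secret = base64.b64decode(fmt["secret"])
master_key = base64.b64decode(fmt["masterKey"])

max_block_mib = fmt["maxBlockSize"] / (1024 * 1024)
max_inline_kib = fmt["maxInlineContentLength"] / 1024
```

Both values decode to 32 bytes, and `maxBlockSize` works out to the 20 MB chunk size used for splitting large objects.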
Directory Format
A directory is represented as a JSON object, which can be examined using:
$ kopia show <object id>
It lists all directory entries, sorted lexicographically, along with entry attributes such as length and permissions. Each entry holds the identifier of an object (`obj`) that contains the file contents or, in the case of a directory, the JSON object with the subdirectory entries.
Note that the directory name is not stored as part of the object; this preserves the object IDs of directories that have been moved around but not modified.
{
"stream":"kopia:directory",
"entries":[
{"name":"IMG_0032.JPG","type":"f","mode":"0600","size":1690375,
"mtime":"2016-11-06T00:01:05Z","uid":501,"gid":20,
"obj":"D38861041c27cfeb5fb2b03b69579b3ce"},
{"name":"IMG_0032.MOV","type":"f","mode":"0600","size":3325165,
"mtime":"2016-11-06T00:01:05Z","uid":501,"gid":20,
"obj":"Dd1ed2787f0c3f975afd4cbd733f79533"},
{"name":"IMG_0033.JPG","type":"f","mode":"0600","size":1591460,
"mtime":"2016-11-06T00:01:05Z","uid":501,"gid":20,
"obj":"D6f6c202a0074074bbfe49bbf69d8a1bf"},
...
{"name":"bundle-1","type":"b","size":"465",
"mtime":"2016-11-06T00:01:06Z","obj":"D4cb4013f0cb66d24e6569119e0a122aa",
"bundled":[
{"name":"IMG_0130.JPG","type":"f","mode":"0600","size":"124",
"mtime":"2016-11-06T00:01:06Z","uid":501,"gid":20},
{"name":"IMG_0131.JPG","type":"f","mode":"0600","size":"341",
"mtime":"2016-11-06T00:01:06Z","uid":501,"gid":20}
]}
]}
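A listing like the one above can be walked with a few lines of code. The sketch below uses a trimmed copy of the sample and expands bundle entries into section identifiers. It assumes that bundled files are laid out back to back in listing order, which the sample suggests (124 + 341 = 465) but this document does not state; `flatten` is a hypothetical helper.

```python
import json

LISTING = json.loads("""
{
  "stream": "kopia:directory",
  "entries": [
    {"name": "IMG_0032.JPG", "type": "f", "mode": "0600", "size": 1690375,
     "mtime": "2016-11-06T00:01:05Z", "uid": 501, "gid": 20,
     "obj": "D38861041c27cfeb5fb2b03b69579b3ce"},
    {"name": "bundle-1", "type": "b", "size": "465",
     "mtime": "2016-11-06T00:01:06Z",
     "obj": "D4cb4013f0cb66d24e6569119e0a122aa",
     "bundled": [
       {"name": "IMG_0130.JPG", "type": "f", "mode": "0600", "size": "124",
        "mtime": "2016-11-06T00:01:06Z", "uid": 501, "gid": 20},
       {"name": "IMG_0131.JPG", "type": "f", "mode": "0600", "size": "341",
        "mtime": "2016-11-06T00:01:06Z", "uid": 501, "gid": 20}
     ]}
  ]
}
""")


def flatten(listing):
    """Yield (name, size, object identifier) for every file in the listing."""
    for entry in listing["entries"]:
        if entry["type"] == "b":
            offset = 0
            for item in entry["bundled"]:
                size = int(item["size"])  # sizes may be strings in the sample
                # ASSUMPTION: bundled files are stored contiguously, in order.
                yield item["name"], size, f"S{offset},{size},{entry['obj']}"
                offset += size
        else:
            yield entry["name"], int(entry["size"]), entry["obj"]


files = list(flatten(LISTING))
```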