File system layout

dataset provides two ways to organize your JSON objects. The original was a “buckets”-oriented layout. The newer layout is a pairtree. Both are managed and described by the collection.json document located in the root folder of the collection. Both file layouts currently support “attachments”, stored as a tar ball with the same basename as the JSON object document (e.g. hello-world.json would have its attachments stored as hello-world.tar). Attachments are experimental, and how they are handled will likely change in the future. If so, the repair and analyzer abilities of dataset should ease the migration process.
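The basename rule for attachments can be illustrated with a short Python sketch. The function name `attachment_name` is hypothetical and not part of dataset; it only restates the naming convention described above.

```python
from pathlib import PurePosixPath


def attachment_name(json_doc: str) -> str:
    """Return the tar ball name sharing the JSON document's basename.

    Hypothetical helper for illustration only; it swaps the .json
    suffix for .tar and leaves any directory part untouched.
    """
    return str(PurePosixPath(json_doc).with_suffix(".tar"))


print(attachment_name("hello-world.json"))  # hello-world.tar
```

Because attachments are experimental, this naming rule may not hold in later versions.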

Pairtree

The directory layout looks like:
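As a rough illustration of the pairtree idea, the Python sketch below derives a nested directory path from an object key by splitting the key into two-character segments. It is a simplified assumption, not dataset's actual implementation: the function name `pairtree_path`, the root folder name `pairtree`, and the placement of the JSON file in the innermost directory are all hypothetical, and the character escaping that the Pairtree convention applies to special characters is left out.

```python
def pairtree_path(key: str, root: str = "pairtree") -> str:
    """Build an illustrative pairtree-style path for a JSON object key.

    Assumptions (not confirmed by the dataset documentation):
    - the key is split into two-character directory names, with a
      final one-character directory if the key length is odd;
    - the JSON document sits in the innermost directory;
    - no escaping of special characters is performed.
    """
    segments = [key[i:i + 2] for i in range(0, len(key), 2)]
    return "/".join([root, *segments, key + ".json"])


print(pairtree_path("hello-world"))
# pairtree/he/ll/o-/wo/rl/d/hello-world.json
```

The appeal of this kind of layout is that no single directory accumulates a very large number of entries, since the depth of the tree grows with the length of the key.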

Buckets

The directory layout looks like:

BUCKETS are names without meaning, normally using alphabetic characters. A dataset defined with four buckets might look like aa, ab, ba, bb. These directories will contain JSON documents, plus a tar file for each document that has attachments.
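Bucket names of this shape can be generated as every fixed-length combination of letters from a small alphabet. The sketch below is only an illustration; the function name `bucket_names` and its parameters are hypothetical, and it does not show how dataset assigns a document to a bucket.

```python
from itertools import product


def bucket_names(alphabet: str, length: int) -> list[str]:
    """List every name of the given length built from the alphabet.

    Hypothetical helper: with alphabet "ab" and length 2 it yields
    the four bucket names used in the example above.
    """
    return ["".join(chars) for chars in product(alphabet, repeat=length)]


print(bucket_names("ab", 2))  # ['aa', 'ab', 'ba', 'bb']
```

The number of buckets is the alphabet size raised to the name length, so a two-letter alphabet with names of length two gives exactly four buckets.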