dataset provides two ways to organize your JSON objects. The original is a “buckets” oriented layout. Newer versions use a pairtree layout. Both are managed and described by the collection.json document in the root folder of the collection. Both file layouts currently support “attachments”, stored as a tarball with the same basename as the JSON object document (e.g. hello-world.json would have its attachments stored as hello-world.tar). Attachments are experimental, and how they are handled will likely change in the future. If so, the repair/analyzer abilities of dataset should ease the migration process.
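The attachment naming convention can be sketched in a few lines of Python. This is only an illustration of the basename rule described above, not code from dataset itself; the function name `attachment_name` is hypothetical.

```python
from pathlib import PurePosixPath


def attachment_name(json_doc: str) -> str:
    """Return the tarball name that shares the JSON document's basename.

    Hypothetical helper: it only swaps the final ".json" suffix for ".tar"
    and keeps any leading directory unchanged.
    """
    return str(PurePosixPath(json_doc).with_suffix(".tar"))


print(attachment_name("hello-world.json"))  # hello-world.tar
```

Because only the suffix changes, the tarball always sits next to the JSON document it belongs to, whichever layout the collection uses.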
The directory layout looks like:
BUCKETS are names without meaning, normally made up of alphabetic characters. A dataset defined with four buckets might look like aa, ab, ba, bb. These directories will contain JSON documents, plus a tar file for any document that has attachments.
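As a rough illustration of how such names could be produced, the sketch below builds every fixed-width combination of a small alphabet. It assumes bucket names are simply all combinations of the chosen letters; the function `bucket_names` and its parameters are hypothetical, and dataset may generate its bucket names differently.

```python
from itertools import product


def bucket_names(alphabet: str, width: int) -> list[str]:
    """Return every name of the given width built from the alphabet.

    Hypothetical helper: names carry no meaning, they only spread
    documents across a fixed set of directories.
    """
    return ["".join(chars) for chars in product(alphabet, repeat=width)]


print(bucket_names("ab", 2))  # ['aa', 'ab', 'ba', 'bb']
```

With a two-letter alphabet and a width of two this yields the four buckets named above. A larger alphabet or width gives more buckets, and so fewer documents per directory.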