Documentation for dataset
The documentation is organized around the command line options
and presented as a series of “how to” style examples.
Command line program documentation
- dataset - usage page for managing collections with dataset
Internal project concepts
dataset Operations
The basic operations supported by dataset are listed below, organized
by collection and JSON document level.
Collection Level
- init creates a collection
- import (csv) JSON documents from rows of a CSV file
- import (gsheet) JSON documents from rows of a Google Sheet
- export (csv) JSON documents from a collection into a CSV file
- export (gsheet) JSON documents from a collection into a Google Sheet
- keys list keys of JSON documents in a collection, supports filtering and sorting
- haskey returns true if key is found in collection, false otherwise
- count returns the number of documents in a collection, supports filtering for subsets
- extract unique JSON attribute values from a collection
- grid creates a 2D grid of data from keys and dot paths in a collection
- data frame support provides a persistent grid plus metadata associated with the collection
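
To make the grid idea concrete, here is a minimal Python sketch of how a 2D grid could be assembled from a list of keys and a list of dot paths. It is an illustration only and not dataset's implementation: the helper names `get_dot_path` and `build_grid`, the in-memory `docs` dictionary, and the choice to fill missing values with `None` are all assumptions made for the example.

```python
from typing import Any


def get_dot_path(doc: dict, dot_path: str) -> Any:
    """Follow a dot path such as '.name.family' into a nested dict.

    Returns None when any step of the path is missing (an assumption
    made for this sketch).
    """
    current: Any = doc
    for part in dot_path.strip(".").split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def build_grid(docs: dict, keys: list, dot_paths: list) -> list:
    """Return one row per key and one column per dot path."""
    return [[get_dot_path(docs[key], path) for path in dot_paths] for key in keys]


# Hypothetical documents, keyed the way a collection keys its JSON documents.
docs = {
    "freda": {"name": {"given": "Freda", "family": "Jones"}, "year": 2001},
    "mojo": {"name": {"given": "Mojo", "family": "Sam"}},
}

grid = build_grid(docs, ["freda", "mojo"], [".name.family", ".year"])
print(grid)  # [['Jones', 2001], ['Sam', None]]
```

Each row corresponds to one key and each column to one dot path, so the result can be handed to anything that expects tabular data.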
JSON Document level
- create a JSON document in a collection
- read back a JSON document in a collection
- update a JSON document in a collection
- delete a JSON document in a collection
- join a JSON document with a document in a collection
- list lists JSON records as an array for the supplied keys
- path list the file path for a JSON document in a collection
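
The document-level operations above can be sketched with a toy collection that stores one JSON file per key. This is a hedged illustration of the concepts, not how dataset lays out or manages a collection: the `ToyCollection` class, its method names, the flat `<key>.json` layout, and the append-versus-overwrite behavior of `join` are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


class ToyCollection:
    """Hypothetical stand-in for a collection: one JSON file per key."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def path(self, key: str) -> Path:
        """Return the file path used for the document stored under key."""
        return self.root / f"{key}.json"

    def create(self, key: str, doc: dict) -> None:
        if self.path(key).exists():
            raise KeyError(f"{key} already exists")
        self.path(key).write_text(json.dumps(doc))

    def read(self, key: str) -> dict:
        return json.loads(self.path(key).read_text())

    def update(self, key: str, doc: dict) -> None:
        if not self.path(key).exists():
            raise KeyError(f"{key} not found")
        self.path(key).write_text(json.dumps(doc))

    def delete(self, key: str) -> None:
        self.path(key).unlink()

    def join(self, key: str, extra: dict, overwrite: bool = False) -> None:
        """Merge extra into the stored document.

        By default only attributes missing from the stored document are
        added; with overwrite=True existing attributes are replaced.
        """
        doc = self.read(key)
        for name, value in extra.items():
            if overwrite or name not in doc:
                doc[name] = value
        self.update(key, doc)

    def list_docs(self, keys):
        """Return the documents for the supplied keys as an array."""
        return [self.read(key) for key in keys]


with tempfile.TemporaryDirectory() as tmp:
    collection = ToyCollection(Path(tmp) / "toy.ds")
    collection.create("freda", {"name": "Freda"})
    collection.join("freda", {"name": "ignored", "email": "freda@example.org"})
    records = collection.list_docs(["freda"])
    collection.delete("freda")
    still_there = collection.path("freda").exists()

print(records)      # [{'name': 'Freda', 'email': 'freda@example.org'}]
print(still_there)  # False
```

The sketch keeps every operation keyed by a string, which mirrors the way the list above describes documents as living "in a collection" and being addressed by key.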
JSON Document Attachments
- attach a file to a JSON document in a collection
- attachments lists the files attached to a JSON document in a collection
- detach retrieve an attached file associated with a JSON document in a collection
- prune delete one or more attached files of a JSON document in a collection
Search
- indexer indexes JSON documents in a collection for searching with find
- deindexer de-indexes (removes) JSON documents from an index
- find provides an indexed full text search interface into a collection
Samples and cloning
- sample - getting a random sample of keys
- clone - clone a repository
- clone-sample - cloning a repository into training and test collections
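
The idea behind sampling keys and splitting a collection into training and test sets can be sketched in a few lines of Python. This is an assumption-laden illustration rather than dataset's algorithm: the function names `sample_keys` and `split_keys`, the optional `seed` parameter, and the rule that every key not drawn into the training sample goes to the test set are all choices made for the example.

```python
import random


def sample_keys(keys, size, seed=None):
    """Return a random sample of at most `size` keys, sorted for readability."""
    rng = random.Random(seed)
    return sorted(rng.sample(keys, min(size, len(keys))))


def split_keys(keys, training_size, seed=None):
    """Split keys into a random training sample and the remaining test keys."""
    training = set(sample_keys(keys, training_size, seed))
    test = [key for key in keys if key not in training]
    return sorted(training), test


# Hypothetical keys standing in for the keys of a collection.
keys = [f"doc-{n:02d}" for n in range(10)]
training, test = split_keys(keys, 3, seed=42)

print(len(training), len(test))  # 3 7
```

Passing a fixed seed makes the split repeatable, which is useful when the training and test collections need to be regenerated later.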
Collection health
- check - checks a collection against the current version of tools
- repair - repairs/upgrades a collection based on the current version of the tool
- migrate - migrates from one file layout to another (e.g. buckets and pairtree)