dataset COLLECT_NAME init
init creates a collection. Collections can be created on local disk, on Amazon’s S3 and in Google Cloud Storage. If you are initializing a collection on S3 or in Google Cloud Storage then the bucket (where the collection will reside) needs to already exist and you need to have been authenticated.
To store your collection in S3 prefix the path with s3://, likewise for Google Cloud Storage the prefix is gs://.
The following three example commands create a dataset collection named “data”. The first creates it on local disk in the current directory, the second in S3 and the third in Google Cloud Storage. In the case of S3 and Google Cloud Storage the buckets exist and are named “stuff.example.org”. For both remote storage options it is also assumed you’ve authenticated and have your environment set up correctly.
dataset data init
dataset s3://stuff.example.org/data init
dataset gs://stuff.example.org/data init
NOTE: After each invocation of dataset init, if all went well you will be shown an OK. If you want to save typing you can set the environment variable DATASET. For our examples above that would look like
dataset data init
export DATASET="data"
or for the Amazon S3 example
dataset s3://stuff.example.org/data init
export DATASET="s3://stuff.example.org/data"
or for the Google storage example
dataset gs://stuff.example.org/data init
export DATASET="gs://stuff.example.org/data"
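Since the storage location is selected purely by the path prefix, a script can inspect DATASET to tell which kind of collection it points at. The following is a minimal sketch only; the function name collection_storage is hypothetical and is not part of dataset.

```shell
#!/bin/sh
# Hypothetical helper (not part of dataset): report which storage a
# collection path refers to, based only on its prefix.
collection_storage() {
    case "$1" in
        s3://*) echo "s3" ;;     # Amazon S3
        gs://*) echo "gcs" ;;    # Google Cloud Storage
        *)      echo "local" ;;  # anything else is treated as local disk
    esac
}

# Example: classify the collection named by DATASET.
DATASET="s3://stuff.example.org/data"
export DATASET
collection_storage "$DATASET"
```

A check like this could be used to decide whether the remote authentication steps apply before running dataset against the collection.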
If you previously set up AWS S3 access with the AWS SDK tools, you can load that environment by exporting the “AWS_SDK_LOAD_CONFIG” environment variable with a value of “1”.
export AWS_SDK_LOAD_CONFIG=1
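A script that works with an S3 collection may want to confirm this variable is set before going further. The sketch below is one possible pre-flight check, assuming DATASET holds the collection path as described above; the function name check_s3_env is hypothetical and not part of dataset.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of dataset): fail when DATASET
# points at S3 but AWS_SDK_LOAD_CONFIG is not set to 1.
check_s3_env() {
    case "${DATASET:-}" in
        s3://*)
            if [ "${AWS_SDK_LOAD_CONFIG:-}" != "1" ]; then
                echo "warning: AWS_SDK_LOAD_CONFIG is not set to 1" >&2
                return 1
            fi
            ;;
    esac
    # Local and gs:// collections do not need this variable.
    return 0
}
```

The check only looks at environment variables; it does not verify that the credentials themselves are valid or that the bucket exists.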
Google Cloud Platform authentication can be done via the gsutil command, which is available once the Google Cloud SDK is set up.