dataset is designed to make it easy to manage collections of JSON objects. Each object is associated with a unique key you provide. The objects themselves are stored on disc in a bucket folder inside the collection folder. The collection folder also contains a JSON document called collection.json, which stores metadata about the collection, including the association of keys with paths to their objects. dataset comes in several flavors: a command line program called dataset, a Go language package also called dataset, a shared library called libdataset and a Python 3.6 package called dataset. This tutorial covers the command line program and the Python package.
To create a collection you use the init verb. In the following examples you will see how to do this with both the command line tool called dataset as well as the Python module of the same name.
Let’s create a collection called friends.ds. At the command line type the following.
dataset init friends.ds
Notice that after typing this you see an “OK” response. If there had been an error you would have seen an error message instead.
Working in Python is similar to the command line, but we do need to import some modules. For these exercises we’ll import sys, os, json and of course dataset.
import sys
import os
import json
import dataset
# stop is a convenience function: print the message and exit with an error status
def stop(msg):
    print(msg)
    sys.exit(1)

err = dataset.init("friends.ds")
if err != "":
    stop(err)
In Python the error message is an empty string if everything is ok, otherwise we call stop which prints the message and exits. You will see this pattern followed in a number of upcoming Python examples.
There is no dataset verb to remove a collection. A collection is just a folder with some files in it. You can delete the collection by throwing the folder in the trash (Mac OS X and Windows) or using a recursive remove in the Unix shell.
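If you are following along in Python, the recursive remove can be done with the standard library. The sketch below is not part of dataset: the helper name remove_collection is hypothetical, and it only assumes what is stated above, namely that a collection is an ordinary folder. It returns an empty string on success so it fits the error pattern used in this tutorial.

```python
import os
import shutil
import tempfile


def remove_collection(collection_path):
    """Recursively delete a collection folder.

    Hypothetical helper, not part of the dataset package. Returns an empty
    string on success or an error message otherwise.
    """
    if not os.path.isdir(collection_path):
        return f"{collection_path} is not a folder"
    try:
        shutil.rmtree(collection_path)
    except OSError as e:
        return str(e)
    return ""


# Try it on a throwaway folder rather than a real collection.
scratch = tempfile.mkdtemp()
demo = os.path.join(scratch, "demo.ds")
os.makedirs(demo)
err = remove_collection(demo)
if err != "":
    print(err)
shutil.rmtree(scratch)
```

Because the delete cannot be undone, it is worth checking the path carefully before calling a helper like this on a real collection.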
As with many systems that store information, dataset provides basic operations for creating, reading, updating and deleting objects. In the following sections we will work with the friends.ds collection we created previously.
I have some friends who are characters in ZBS radio plays. I am going to create and save some of their info in our collection called friends.ds. I am going to store their name and email address so I can contact them. Their names are Little Frieda, Mojo Sam and Jack Flanders.
dataset friends.ds create frieda '{"name":"Little Frieda","email":"frieda@inverness.example.org"}'
Notice the “OK”. Just like init, the create verb returns a status. “OK” means everything is good; otherwise an error is shown. Doing the same thing in Python looks like this.
err = dataset.create("friends.ds", "frieda", {"name":"Little Frieda","email":"frieda@inverness.example.org"})
if err != "":
    stop(err)
With create we need to provide a collection name, a key (e.g. “frieda”) and a Python dict (which becomes our JSON object). Now let’s add records for Mojo Sam and Jack Flanders.
command line –
dataset friends.ds create "mojo" '{"name": "Mojo Sam, the Yudoo Man", "email": "mojosam@cosmic-cafe.example.org"}'
dataset friends.ds create "jack" '{"name": "Jack Flanders", "email": "capt-jack@cosmic-voyager.example.org"}'
In Python –
err = dataset.create("friends.ds", "mojo", {"name": "Mojo Sam, the Yudoo Man", "email": "mojosam@cosmic-cafe.example.org"})
if err != "":
    stop(err)
err = dataset.create("friends.ds", "jack", {"name": "Jack Flanders", "email": "capt-jack@cosmic-voyager.example.org"})
if err != "":
    stop(err)
We have three records in our friends.ds collection — “frieda”, “mojo”, and “jack”. Let’s see what they look like with the read verb.
command line –
dataset friends.ds read frieda
This command emits a JSON object. The JSON is somewhat hard to read. To get a pretty version of the JSON object use the “-p” option.
dataset -p friends.ds read frieda
On the command line you can easily redirect the results to a file for later modification. Let’s do this for each of the records we have created so far.
dataset -p friends.ds read frieda > frieda-profile.json
dataset -p friends.ds read mojo > mojo-profile.json
dataset -p friends.ds read jack > jack-profile.json
Working in Python is similar, but rather than write our JSON structures out to files we’re going to keep them in memory as Python dicts.
In Python –
(frieda_profile, err) = dataset.read("friends.ds", "frieda")
if err != "":
    stop(err)
(mojo_profile, err) = dataset.read("friends.ds", "mojo")
if err != "":
    stop(err)
(jack_profile, err) = dataset.read("friends.ds", "jack")
if err != "":
    stop(err)
In Python the read verb returns a tuple of the value and err. As with init and create, an empty err string means success. Notice a pattern?
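Since the same check follows every call, it can be folded into a small helper. This is only a sketch: value_or_stop is a hypothetical name, and fake_read is a stand-in that mimics the (value, err) return shape described above so the example runs without a collection. It is not the dataset API.

```python
import sys


def stop(msg):
    # Same convenience function as earlier in this tutorial.
    print(msg)
    sys.exit(1)


def value_or_stop(result):
    """Unpack a (value, err) tuple and stop if err is not an empty string."""
    value, err = result
    if err != "":
        stop(err)
    return value


def fake_read(key):
    # Stand-in that returns (value, err); NOT part of the dataset package.
    records = {"frieda": {"name": "Little Frieda"}}
    if key in records:
        return (records[key], "")
    return ({}, f"{key} not found")


profile = value_or_stop(fake_read("frieda"))
print(profile["name"])
```

Whether a helper like this is worth it is a matter of taste. The explicit if statements used in the rest of this tutorial keep each step visible.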
Next we can modify the profiles (the *.json files for the command line version). We’re going to add a key/value pair for “catch_phrase” to each JSON object in friends.ds. For Little Frieda edit frieda-profile.json to look like –
{
    "_Key": "frieda",
    "email": "frieda@inverness.example.org",
    "name": "Little Frieda",
    "catch_phrase": "Woweee Zoweee"
}
For Mojo’s mojo-profile.json –
{
    "_Key": "mojo",
    "email": "mojosam@cosmic-cafe.example.org",
    "name": "Mojo Sam, the Yudoo Man",
    "catch_phrase": "Feet Don't Fail Me Now!"
}
And Jack’s jack-profile.json –
{
    "_Key": "jack",
    "email": "capt-jack@cosmic-voyager.example.org",
    "name": "Jack Flanders",
    "catch_phrase": "What is coming at you is coming from you"
}
On the command line we can read in the updated JSON objects and save the results in the collection with the update verb. Like with init and create the update verb will return an “OK” or error message. Let’s update each of our JSON objects.
dataset friends.ds update frieda frieda-profile.json
dataset friends.ds update mojo mojo-profile.json
dataset friends.ds update jack jack-profile.json
TIP: By providing a filename ending in “.json” the dataset command knows to read the JSON object from disc. If the argument had started with a “{” and ended with a “}” it would assume you were using an explicit JSON expression.
In Python we can work with each of the dictionary variables we saved from our previous read example. We add our “catch_phrase” attribute then update each record.
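The rule in the tip can be sketched in a few lines of Python. This is an illustration of the idea only, not how the dataset command is implemented. The helper name load_json_argument is hypothetical, and what to do when an argument matches neither case is an assumption made here.

```python
import json
import os
import tempfile


def load_json_argument(arg):
    """Return a dict from either an inline JSON object or a .json filename.

    Hypothetical helper illustrating the rule in the tip above.
    """
    text = arg.strip()
    if text.startswith("{") and text.endswith("}"):
        # Looks like an explicit JSON expression.
        return json.loads(text)
    if text.endswith(".json"):
        # Looks like a filename, so read the object from disc.
        with open(text, "r", encoding="utf-8") as f:
            return json.load(f)
    # Assumption: anything else is treated as an error.
    raise ValueError(f"cannot tell whether {arg!r} is a file or a JSON object")


# Inline JSON expression
print(load_json_argument('{"name": "Jack Flanders"}'))

# JSON read from a temporary file
path = os.path.join(tempfile.mkdtemp(), "jack-profile.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"name": "Jack Flanders"}, f)
print(load_json_argument(path))
```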
frieda_profile["catch_phrase"] = "Woweee Zoweee"
mojo_profile["catch_phrase"] = "Feet Don't Fail Me Now!"
jack_profile["catch_phrase"] = "What is coming at you is coming from you"
err = dataset.update("friends.ds", "frieda", frieda_profile)
if err != "":
    stop(err)
err = dataset.update("friends.ds", "mojo", mojo_profile)
if err != "":
    stop(err)
err = dataset.update("friends.ds", "jack", jack_profile)
if err != "":
    stop(err)
As an exercise how would you read back the updated version on the command line or in Python?
Eventually you might want to remove a JSON object from the collection. Let’s remove Jack Flanders’ record for now.
command line –
dataset friends.ds delete jack
Notice the “OK”. In this case it means we’ve successfully deleted the JSON object from the collection.
And, as you may have guessed, working in Python looks like –
err = dataset.delete("friends.ds", "jack")
if err != "":
    stop(err)
Eventually you have lots of objects in your collection. You are not going to be able to remember all the keys. dataset provides a keys function for getting a list of keys as well as a count to give you a total number of keys.
Now that we’ve deleted a record let’s see how many keys are in friends.ds. We can do that with the count verb.
Command line –
dataset friends.ds count
In Python –
cnt = dataset.count("friends.ds")
print(f"Total Records Now: {cnt}")
Likewise we can get a list of the keys with the keys verb.
dataset friends.ds keys
If you are following along in Python then you can just save the keys to a variable called keys.
keys = dataset.keys("friends.ds")
print("\n".join(keys))
One of the challenges in working with JSON objects is their tree-like structure. When tabulating or comparing values it is often easier to work in a spreadsheet-like grid. grid is dataset’s verb for taking a list of keys and a list of dot paths into the JSON objects and returning a 2D grid of the results. This is handy when generating reports. A grid, unlike a frame (which we will see shortly), doesn’t enforce any specifics on the columns and rows. It only contains the values you specify.
Let’s create a grid from our friends.ds collection.
dataset friends.ds keys > friends.keys
dataset friends.ds grid friends.keys .name .email .catch_phrase
As with read the grid verb can take the “-p” option to make the JSON grid a little easier to read.
dataset -p friends.ds grid friends.keys .name .email .catch_phrase
Notice we make a list of keys first and save those to a file. Then we use that list of keys to create our grid. The grid output is in JSON notation. In Python making a grid follows a similar pattern: generate a list of keys, then use those keys and a list of dot paths to define the grid.
keys = dataset.keys("friends.ds")
(g, err) = dataset.grid("friends.ds", keys, [".name", ".email", ".catch_phrase"])
if err != "":
    stop(err)
print(json.dumps(g, indent = 4))
In Python grid, like read, returns a tuple that has your result and an error status. Finally we print our result using the json module’s dumps function.
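To make the idea of a grid concrete, here is a minimal pure-Python sketch of turning keys and dot paths into rows and columns. It is not dataset’s implementation. The function names dot_path_value and make_grid are hypothetical, the records are held in a plain dict rather than a collection, and only simple paths through nested objects (no array indexes) are handled.

```python
import json


def dot_path_value(obj, dot_path):
    """Follow a simple dot path such as ".name" into nested dicts.

    Returns None when any step of the path is missing.
    """
    current = obj
    for part in dot_path.strip(".").split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def make_grid(records, keys, dot_paths):
    """Build one row per key and one column per dot path."""
    return [[dot_path_value(records[key], path) for path in dot_paths]
            for key in keys]


# Stand-in for a collection: a dict of key -> JSON object.
records = {
    "frieda": {"name": "Little Frieda", "email": "frieda@inverness.example.org"},
    "mojo": {"name": "Mojo Sam, the Yudoo Man", "email": "mojosam@cosmic-cafe.example.org"},
}
grid = make_grid(records, ["frieda", "mojo"], [".name", ".email", ".catch_phrase"])
print(json.dumps(grid, indent=4))
```

In this sketch a path that does not exist in an object yields None, which json.dumps writes as null. How dataset itself represents missing values is not covered here.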
dataset also comes with a frame verb. A frame is like a grid plus additional metadata. It enforces a structure on its grid. Column 1 of the frame’s internal grid element always has the keys associated with the collection. A frame will also derive heading labels from the dot paths used to define the frame and will include metadata about the collection, the keys used to define the frame and the default types of data in the columns. The extra information in a frame stays with the collection. Frames are persistent and can be easily recalculated based on collection updates. Finally, frames are used by more complex verbs such as export-csv, export-gsheet, and indexer, which we’ll be covering later.
To define a frame we only need one additional piece of information besides what we used for a grid. We need a name for the frame.
Working from our previous grid example, let’s call this frame “name-and-email”.
dataset friends.ds frame "name-and-email" friends.keys .name .email .catch_phrase
In Python it would look like
keys = dataset.keys("friends.ds")
err = dataset.frame("friends.ds", "name-and-email", keys, [ ".name", ".email", ".catch_phrase"])
if err != "":
    stop(err)
To see the contents of a frame we only need to supply the collection name and frame name.
dataset friends.ds frame "name-and-email"
In Python it’d look like
(f, err) = dataset.frame("friends.ds", "name-and-email")
if err != "":
    stop(err)
print(json.dumps(f, indent = 4))
Looking at the resulting JSON object you see many other attributes beyond the grid of values. These are what simplify some of dataset’s more complex interactions.
Let’s add back the Jack record we deleted a few sections ago and “reframe” our “name-and-email” frame.
# Adding back Jack
dataset friends.ds create jack jack-profile.json
# Save all the keys in the collection
dataset friends.ds keys > friends.keys
# Now reframe "name-and-email" with the updated friends.keys
dataset friends.ds reframe "name-and-email" friends.keys
# Now let's take a look at the frame
dataset -p friends.ds frame "name-and-email"
Like with grid and read before it the “-p” option will cause the JSON representation of the frame to be pretty printed.
Let’s try the same thing in Python
err = dataset.create("friends.ds", "jack", jack_profile)
if err != "":
    stop(err)
keys = dataset.keys("friends.ds")
err = dataset.reframe("friends.ds", "name-and-email", keys)
if err != "":
    stop(err)
(f, err) = dataset.frame("friends.ds", "name-and-email")
if err != "":
    stop(err)
print(json.dumps(f, indent = 4))
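One way to see why reframe needs only a frame name and a list of keys is to think of a frame as a grid that remembers its own dot paths. The sketch below illustrates that idea in plain Python. The field names (frame_name, keys, dot_paths, labels, grid), the labelling rule and the helper names are all assumptions made for illustration. They do not describe dataset’s actual frame format or implementation.

```python
def dot_path_value(obj, dot_path):
    """Follow a simple dot path such as ".name" into nested dicts."""
    current = obj
    for part in dot_path.strip(".").split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def make_frame(name, records, keys, dot_paths):
    """Hypothetical in-memory frame: a grid plus what is needed to rebuild it."""
    return {
        "frame_name": name,
        "keys": list(keys),
        "dot_paths": list(dot_paths),
        # Assumed labelling rule: the last segment of each dot path.
        "labels": [p.strip(".").split(".")[-1] for p in dot_paths],
        # Column 1 holds the key, followed by one column per dot path.
        "grid": [[k] + [dot_path_value(records[k], p) for p in dot_paths]
                 for k in keys],
    }


def reframe(frame, records, keys):
    """Rebuild a frame for a new key list, reusing its stored dot paths."""
    return make_frame(frame["frame_name"], records, keys, frame["dot_paths"])


# Stand-in for a collection: a dict of key -> JSON object.
records = {
    "frieda": {"name": "Little Frieda"},
    "mojo": {"name": "Mojo Sam, the Yudoo Man"},
}
frame = make_frame("name-only", records, ["frieda", "mojo"], [".name"])

# Add Jack back, then rebuild the frame with the updated key list.
records["jack"] = {"name": "Jack Flanders"}
frame = reframe(frame, records, ["frieda", "jack", "mojo"])
print(frame["grid"])
```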
We can list the frames in the collection using the frames verb.
dataset friends.ds frames
In Python
frame_names = dataset.frames("friends.ds")
print("\n".join(frame_names))
In our frame we previously defined three columns. Looking at the JSON representation of the frame we also see three “labels” and three “types”. These properties are for future use. Our plan is to leverage them when exporting to spreadsheets as well as when defining search indexes. For example, labels could be used as column names when synchronizing a spreadsheet with collection values, and the types fields would be helpful when defining search indexes.
Finally, the last thing we need to be able to do is delete a frame. Deleting a frame works much like deleting a JSON record.
dataset friends.ds delete-frame "name-and-email"
Or in Python
err = dataset.delete_frame("friends.ds", "name-and-email")
if err != "":
    stop(err)
TIP: Frames like collections have a number of operations. Here’s the list
Continue exploring dataset with