CSV files are commonly used to share data. Most spreadsheets and many database systems can export and import from CSV files. datatset can use a spreadsheet in CSV format to populate JSON objects in a collection. The header row of the CSV file will become the object attribute names and the rows will become their values. dataset requires a column of unique values to become the keys for the JSON objects stored in the collection.
You can export CSV directly from a collection too. The paths to the elements in the objects become the header row and the values exported from the objects become the subsequent rows.
NOTE: In an upcoming release of data the specific command line
parameters and Python method definitions may change. Now that
_dataset_ supports the concept of frames import and export to
a tabular structure (or even synchronizing with a tabular
structure) can be simplified. This document describes the current
release method for working with CSV files.
You can import rows of a CSV document as JSON documents. This is useful when you have a large number of simple structures.
In this example we have a simple CSV file containing the following
name,email
Zowie,zowie@zbs.example.org
Ralph Rolf,ralph.rolf@zbs.example.org
Mojo Sam,mojo.sam@zbs.example.org
Little Frieda,little.frieda@zbs.example.org
Far Seeing Art,old.art@zbs.example.org
Jack Flanders,captain.jack@zbs.example.org
Save this file as characters.csv. To import this let’s create a collection named characters.
dataset init characters-v1.ds
Now we can populate our characters collection by importing characters.csv. Then look at the keys.
dataset import characters-v1.ds characters.csv 2
dataset keys characeter-v1.ds
Notice the assigned ids. We used the second column, the one with th email heading to be our keys.
ralph.rolf@zbs.example.org
zowie@zbs.example.org
captain.jack@zbs.example.org
little.frieda@zbs.example.org
mojo.sam@zbs.example.org
old.art@zbs.example.org
dataset keys characters-v1.ds |\
while read ID; do
dataset read characters-v1.ds "${ID}";
done
Now let’s make a new version of our characters collection but this time we’ll column one (the name column) as the key.
dataset init characters-v2.ds
dataset import characters-v2.ds characters.csv 1
dataset keys
Now our keys look a little different.
Reading the records back we see we have the JSON same document structure.
dataset keys characters-v2.ds | \
while read ID; do
dataset read characters-v2.ds "${ID}";
done
Our records look like…
{"email":"captain.jack@zbs.example.org","name":"Jack Flanders"}
{"email":"zowie@zbs.example.org","name":"Zowie"}
{"email":"ralph.rolf@zbs.example.org","name":"Ralph Rolf"}
{"email":"mojo.sam@zbs.example.org","name":"Mojo Sam"}
{"email":"little.frieda@zbs.example.org","name":"Little Frieda"}
Let’s create a new collection and try the “-use-header-row=false” option.
dataset init characters-v4.ds
dataset import -skip-header-row=false characters-v4.ds characters.csv
dataset keys characters-v4.ds
dataset keys characters-v4.ds | \
while read ID; do
dataset read characters-v4.ds "${ID}";
done
Our ids are like in our first example because we chose to use the default JSON document key.
characters.csv_2
characters.csv_3
characters.csv_4
characters.csv_5
characters.csv_6
characters.csv_7
characters.csv_1
Now take a look at the records output
{"col1":"Zowie","col2":"zowie@zbs.example.org"}
{"col1":"Ralph Rolf","col2":"ralph.rolf@zbs.example.org"}
{"col1":"Mojo Sam","col2":"mojo.sam@zbs.example.org"}
{"col1":"Little Frieda","col2":"little.frieda@zbs.example.org"}
{"col1":"Far Seeing Art","col2":"old.art@zbs.example.org"}
{"col1":"Jack Flanders","col2":"captain.jack@zbs.example.org"}
{"col1":"name","col2":"email"}
Instead of a name and email property name we have col1 and col2.
Setting “-use-header-row” to false can be used with column numbers and or
“-uuid” option. Give it a try with this final collection.
dataset import -use-header-row=false characters-v5.ds characters.csv 2
dataset keys characters-v5.ds
dataset keys characters-v5.ds | \
while read ID; do
dataset read characters-v5.ds "${ID}";
done
Explore what you see.
We can import data from a CSV file and store each row as a JSON document in dataset. You need to pick a column with unique values to be the key for each record in the collection. In this example we assume column one has the key value.
dataset init mydata.ds
dataset import mydata.ds my-data.csv 1
You can create a CSV export by providing the dot paths for each column and then givening columns a name.
dataset frame -all mydata.ds export-frame '.id' '.title' '.pubDate'
dataset frame-labels mydata.ds export-frame 'id,title,publication date'
dataset export mydata.ds export-frame
If you wanted to restrict to a subset (e.g. publication in year 2016) You just need to create a frame with that restriction.
dataset keys mydata.ds '(eq 2016 (year .pubDate))' | \
dataset frame mydata.ds published-2016 \
'.id' '.title' '.pubDate'
dataset frame-labels mydata.ds published-2016 \
'id' 'title' 'publication' 'date'
dataset export mydata.published-2016 ds