Caltech Library logo

How to import CSV rows as JSON documents

You can import rows of a CSV document as JSON documents. This is useful when you have a large number of simple structures.

In this example we have a simple CSV file containing the following

    name,email
    Zowie,zowie@zbs.example.org
    Ralph Rolf,ralph.rolf@zbs.example.org
    Mojo Sam,mojo.sam@zbs.example.org
    Little Frieda,little.frieda@zbs.example.org
    Far Seeing Art,old.art@zbs.example.org
    Jack Flanders,captain.jack@zbs.example.org

Save this file as characters.csv. To import this let’s create a collection named characters.

    dataset characters-v1.ds init

Now we can populate our characters collection by importing characters.csv. Then look at the keys.

    dataset characters-v1.ds import characters.csv 2
    dataset characeter-v1.ds keys

Notice the assigned ids. We used the second column, the one with th email heading to be our keys.

    ralph.rolf@zbs.example.org
    zowie@zbs.example.org
    captain.jack@zbs.example.org
    little.frieda@zbs.example.org
    mojo.sam@zbs.example.org
    old.art@zbs.example.org
    dataset characters-v1.ds keys | while read ID; do dataset characters-v1.ds read "${ID}"; done

In out next example we will use the environment variable DATASET to save us some typing. Now let’s make a new version of our characters collection but this time we’ll column one (the name column) as the key.

    dataset init characters-v2.ds
    export DATASET="characters-v2.ds"
    dataset import characters.csv 1
    dataset keys

Now our keys look a little different.

Reading the records back we see we have the JSON same document structure.

    dataset list keys | while read ID; do dataset read "${ID}"; done

Our records look like…

    {"email":"captain.jack@zbs.example.org","name":"Jack Flanders"}
    {"email":"zowie@zbs.example.org","name":"Zowie"}
    {"email":"ralph.rolf@zbs.example.org","name":"Ralph Rolf"}
    {"email":"mojo.sam@zbs.example.org","name":"Mojo Sam"}
    {"email":"little.frieda@zbs.example.org","name":"Little Frieda"}

Again the header row becomes the property names of the JSON document. But what if you don’t have a unique ID and don’t like the filename/row number in our first example? You can generate a UUID for each record by using the “-uuid” option. Let’s create a third version of characters and step through the results as before.

    dataset init characters-v3
    export DATASET=characters-v3
    dataset -uuid import characters.csv
    dataset list keys
    dataset list keys | while read ID; do dataset read "${ID}"; done

Notice that the UUID is inserted into the result JSON documents. This lets you easily keep records straight even if you rename the keys when moving between collections.

    {"email":"little.frieda@zbs.example.org","name":"Little Frieda","uuid":"27a5295f-4a80-4855-a2d1-e8a3a1a4623f"}
    {"email":"old.art@zbs.example.org","name":"Far Seeing Art","uuid":"872f68fe-f96b-4ce0-83bb-5c255d28cae7"}
    {"email":"captain.jack@zbs.example.org","name":"Jack Flanders","uuid":"fa382371-9a9e-4ade-a63c-7ebf88ef266e"}
    {"email":"zowie@zbs.example.org","name":"Zowie","uuid":"c05fceaa-b5de-460a-9497-f38fd9434cef"}
    {"email":"ralph.rolf@zbs.example.org","name":"Ralph Rolf","uuid":"fb48731d-9da7-4cc0-990d-9a5d1e0b33ac"}
    {"email":"mojo.sam@zbs.example.org","name":"Mojo Sam","uuid":"5aea6f22-390c-4727-8235-b9cab5ea1180"}

What if the CSV file has no header row?

Let’s create a new collection and try the “-skip-header-row=false” option.

    dataset init characters-v4
    export DATASET=characters-v4
    dataset -skip-header-row=false import characters.csv
    dataset list keys
    dataset list keys | while read ID; do dataset read "${ID}"; done

Our ids are like in our first example because we chose to use the default JSON document key.

    characters.csv_2
    characters.csv_3
    characters.csv_4
    characters.csv_5
    characters.csv_6
    characters.csv_7
    characters.csv_1

Now take a look at the records output

    {"col1":"Zowie","col2":"zowie@zbs.example.org"}
    {"col1":"Ralph Rolf","col2":"ralph.rolf@zbs.example.org"}
    {"col1":"Mojo Sam","col2":"mojo.sam@zbs.example.org"}
    {"col1":"Little Frieda","col2":"little.frieda@zbs.example.org"}
    {"col1":"Far Seeing Art","col2":"old.art@zbs.example.org"}
    {"col1":"Jack Flanders","col2":"captain.jack@zbs.example.org"}
    {"col1":"name","col2":"email"}

Instead of a name and email property name we have col1 and col2. Setting “-skip-header-row” to false can be used with column numbers and or “-uuid” option. Give it a try with this final collection.

    dataset -skip-header-row=false import characters.csv 2
    dataset -skip-header-row=false -uuid import characters.csv
    dataset list keys
    dataset list keys | while read ID; do dataset read "${ID}"; done

Explore what you see.