Origin story

The inspiration for dataset grew from the desire to process metadata as collections of JSON objects using simple Unix shell utilities and data pipelines. The core use case evolved at Caltech Library while working with the APIs of various repository systems (e.g., EPrints and Invenio). It has allowed the library to build an aggregated view of heterogeneous content (see https://feeds.library.caltech.edu) and has facilitated ad hoc analysis and data enhancement for a number of internal library projects.