Published March 6, 2024 | Version v2
Dataset Open

Data from "Measuring data rot: an analysis of the continued availability of shared data from a single university"

  • 1. California Institute of Technology

Description

Data files from the article "Measuring Data Rot: An Analysis of the Continued Availability of Shared Data from a Single University" by Kristin Briney.

This research took supplemental data links from publications in CaltechAUTHORS and tested whether the linked data was still available on the web, using web scraping and hand testing in the Chrome browser.
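As a rough illustration of what an automated availability test can look like, the sketch below requests a URL and maps the HTTP status to a coarse label. It is not the scraping code used in the study: the function names, the labels, and the status-code thresholds are all assumptions made for this example. A status code alone does not confirm that the data is actually present on the page, and the study paired web scraping with hand testing in a browser.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a coarse label.

    The labels and cut-offs are illustrative assumptions, not the study's rules.
    """
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "available"
    if status in (401, 403):
        return "restricted"
    return "unavailable"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def check_link(url, fetch=fetch_status):
    """Classify one link; fetch can be swapped out for offline testing."""
    return classify_status(fetch(url))
```

Some servers reject HEAD requests, so a fuller check might fall back to GET before treating a link as unavailable.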

Data in the tables:

  • Table1_ResearchAreas.csv
  • Table2_LinkType.csv
  • Table3_URLwebsites.csv
  • Table4_DOIwebsites.csv
  • Table5_UnavailableByType.csv
  • Table6_UnavailableURLs.csv
  • Table7_UnavailableDOIs.csv

Data in the figures:

  • Figure1_LinksByYear.csv
  • Figure2_UnavailableByYear.csv

Data from the project:

  • DataRot.csv
    • Overall dataset supporting this research, with variables defined in the data dictionary. It contains all of the links tested and lists the results of the web scraping, but not the results of the hand testing.
  • DataRot_dataDictionary.csv
    • Data dictionary defining variable names and values for DataRot.csv
  • DataRot_handTested.csv
    • Subset of supplemental data links from DataRot.csv that were hand tested and the results of the hand testing ("browser_test = TRUE" means the data was available, "browser_test = FALSE" means the data was not available, and "browser_test = LOGIN" means the webpage asked for a login to see the data).
  • DataRot_missingData.csv
    • Subset of DataRot_handTested.csv with fewer variables. This dataset only includes supplemental data links for data that was not available.
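The three browser_test values described above can be tallied with a short script. This is a minimal sketch that assumes DataRot_handTested.csv has a column named exactly browser_test holding TRUE, FALSE, or LOGIN; the function names and the "unrecognized" bucket are introduced here for illustration, and DataRot_dataDictionary.csv should be checked for the actual variable names.

```python
import csv
from collections import Counter

# Meanings of the browser_test values as given in the description.
LABELS = {
    "TRUE": "available",
    "FALSE": "not available",
    "LOGIN": "login required",
}


def summarize_browser_tests(rows):
    """Count hand-test outcomes in an iterable of dict-like rows."""
    counts = Counter()
    for row in rows:
        value = (row.get("browser_test") or "").strip().upper()
        # Anything outside the three documented values is kept visible
        # rather than silently dropped.
        counts[LABELS.get(value, "unrecognized")] += 1
    return counts


def summarize_file(path):
    """Read a CSV file such as DataRot_handTested.csv and tally outcomes."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_browser_tests(csv.DictReader(handle))
```

Calling summarize_file("DataRot_handTested.csv") on a downloaded copy would return a Counter keyed by the three outcome labels.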

CaltechAUTHORS sampling dataset:

  • Sampling.csv
    • Compares what is recorded in CaltechAUTHORS for 450 articles with what is listed in the articles themselves, with respect to shared data and supplementary information.
  • Sampling_dataDictionary.txt
    • Data dictionary defining variable names and values for Sampling.csv

Files

README.txt
Files (683.6 kB)
Checksum Size
md5:7b97c35ec0f75498fb50542a4618bb62 318 Bytes
md5:e1b4ec281601c56dcbf62492139a1557 593 Bytes
md5:8145c7b82da1f0837260b12f27b92c1f 1.5 kB
md5:d32e57df371c0f6e345b900b3855fa02 2.2 kB
md5:de9cfb8afd38053707be81324808e8c1 275 Bytes
md5:8a37b3b4742cf194378faf450d586dcc 56.0 kB
md5:85b62f27931719d9205c91d5508a70a7 523.3 kB
md5:61d79221e29dbb27850358c3764b751f 22.7 kB
md5:5c1db24633f764c56d8c4b6aeb009d9f 60.8 kB
md5:495904f928fc876eba87d6fc8000407e 1.1 kB
md5:b19fcf97af249d7544f57eb0d56ae39b 10.9 kB
md5:bfbb378d2ae021b36f2c3504fd4e43c3 258 Bytes
md5:49e20fbf0e5e42fa074a0048dfb9da51 1.6 kB
md5:972f7b5da1095d1fa34d7e986e867681 1.5 kB
md5:f35bfea6b0bfe21a3a1217f46b44d23c 178 Bytes
md5:9b6de5cb43b8873b1a69d1e537c7238b 258 Bytes
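
The md5 values listed above can be used to confirm that a downloaded file arrived intact. The sketch below computes a file's MD5 digest with Python's standard hashlib module and compares it to a listed value; the function names are hypothetical, and because this listing does not pair checksums with file names, the caller has to supply the matching value for each file.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:7b97...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

MD5 is suitable here as a check against corrupted or truncated downloads; it is not a safeguard against deliberate tampering.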

Additional details

Created: March 6, 2024
Modified: March 6, 2024