Published September 10, 2021 | Version v1
Dataset Open

Discovery of complex oxides via automated experiments and data science

  • 1. Google Research, Google Applied Science
  • 2. Division of Engineering and Applied Science and Joint Center for Artificial Photosynthesis, California Institute of Technology

Description

This dataset is licensed under the Creative Commons Attribution 4.0 license (CC-BY-4.0). See https://creativecommons.org/licenses/by/4.0/ for more information.

 

If using this dataset, please cite https://doi.org/10.1073/pnas.2106042118

 

We've released data from 6 print sessions, comprising 173 plates, 131 quaternary oxide systems, 6,918,024 individual composition samples, and 376,752 distinct compositions. While the tenfold reproductions within each plate are well controlled, uncontrolled variables (printhead age, etc.) may lead to poorer consistency between print sessions.

 

The data is organized into four directories and one metadata file. Each directory contains one type of data, with one *.csv file per printed plate.

 

i. The data in ten_replicas/ consists of optical transmission data, with one row per printed patch on a plate. The column headers are:

ExpID: an integer experiment ID for the printed patch on the plate.

row, col: The row and column coordinates of the printed patch in the microscope image.

signal_#: The measurement of ɑ, the optical transmission spectrum of the printed patch, at a given wavelength. # ranges from 0 to 8, inclusive, indicating transmission spectra at the following wavelengths: 375, 395, 455, 530, 590, 617, 660, 735, and 850 nm.

plate: The integer plate identifier.

line: An integer identifier of the composition gradient that was printed.

line_experiment_id: An integer identifier of the composition sample along the composition gradient.

replica: An integer identifier of the replica # of the printed line.

metal: Each plate will have up to six metal column headers, where the possible metals include: ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']. The metal columns sum to 1, indicating the ratios of metals printed.
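As an illustration of the column layout described above, the following is a minimal sketch of how one row of a ten_replicas/ file might be parsed. The helper name parse_row and the sample row are hypothetical and made up for illustration; they are not part of the dataset or its tooling.

```python
import csv
import io

# Wavelengths (nm) for signal_0 .. signal_8, as listed in the column description.
WAVELENGTHS_NM = [375, 395, 455, 530, 590, 617, 660, 735, 850]
# Metals that may appear as column headers, as listed in the column description.
POSSIBLE_METALS = ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']


def parse_row(row):
    """Split one CSV row (a dict of strings) into a spectrum and a composition.

    Hypothetical helper: returns {wavelength_nm: signal} and {metal: fraction}.
    """
    spectrum = {nm: float(row[f"signal_{i}"]) for i, nm in enumerate(WAVELENGTHS_NM)}
    composition = {m: float(row[m]) for m in POSSIBLE_METALS if m in row}
    return spectrum, composition


# Made-up sample row for illustration only; these are not real measurements.
header = (["ExpID", "row", "col"] + [f"signal_{i}" for i in range(9)]
          + ["plate", "line", "line_experiment_id", "replica", "Fe", "Ni", "Cu"])
values = [0, 1, 2] + [0.1 * i for i in range(9)] + [1, 0, 5, 3, 0.5, 0.25, 0.25]
sample_csv = ",".join(header) + "\n" + ",".join(str(v) for v in values) + "\n"

for row in csv.DictReader(io.StringIO(sample_csv)):
    spectrum, composition = parse_row(row)
    # The metal columns are documented to sum to 1.
    assert abs(sum(composition.values()) - 1.0) < 1e-6
```

Only the metal columns actually present in a given plate file are picked up, since each plate has at most six of the ten possible metals.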

 

ii. The data in aggregated_replicas/ consists of optical transmission data, with one row per tenfold aggregated patch on a plate. The column headers are:

signal_#: The measurement of ɑ, the optical transmission spectrum of the printed patch, at a given wavelength. # ranges from 0 to 8, inclusive, indicating transmission spectra at the following wavelengths: 375, 395, 455, 530, 590, 617, 660, 735, and 850 nm.

plate: The integer plate identifier.

line: An integer identifier of the composition gradient that was printed.

line_experiment_id: An integer identifier of the composition sample along the composition gradient.

metal: Each plate will have up to six metal column headers, where the possible metals include: ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']. The metal columns sum to 1, indicating the ratios of metals printed.
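The relationship between per-patch rows and tenfold aggregated rows can be sketched by grouping on (plate, line, line_experiment_id). This is only an assumption-laden illustration: the statistic used to produce aggregated_replicas/ is not stated here, so the mean below is a stand-in, and the function name aggregate_replicas and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Columns that identify one composition sample across its replicas.
KEY_COLUMNS = ("plate", "line", "line_experiment_id")


def aggregate_replicas(rows, signal_columns):
    """Group per-patch rows by composition sample and average each signal column.

    ASSUMPTION: the mean is used purely for illustration; the statistic that
    produced the released aggregated_replicas/ files is not documented here.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in KEY_COLUMNS)].append(row)

    aggregated = []
    for key, members in groups.items():
        out = dict(zip(KEY_COLUMNS, key))
        for col in signal_columns:
            out[col] = mean(float(m[col]) for m in members)
        aggregated.append(out)
    return aggregated


# Made-up rows: two replicas of the same composition sample.
rows = [
    {"plate": 1, "line": 0, "line_experiment_id": 5, "replica": 0, "signal_0": 0.25},
    {"plate": 1, "line": 0, "line_experiment_id": 5, "replica": 1, "signal_0": 0.75},
]
result = aggregate_replicas(rows, ["signal_0"])
```

For analysis, the released aggregated_replicas/ files should be preferred over any re-aggregation of this kind.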

 

iii. The data in mixture/ represents the outcome of a probabilistic model that a given composition can be explained by a mixture of at most 3 binary signals. There is one row per composition. The column headers are:

log_prob: The log of the probability that this composition is explainable by at most 3 binary signals.

metal: Each plate will have up to six metal column headers, where the possible metals include: ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']. The metal columns sum to 1, indicating the ratios of metals in the composition.
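A short sketch of how log_prob might be used to flag compositions that are poorly explained by a mixture of at most 3 binary signals. It assumes log_prob is a natural logarithm, which is not stated above; the function names, the threshold value, and the sample rows are hypothetical.

```python
import math


def probability_from_log_prob(log_prob):
    """Convert log_prob back to a probability.

    ASSUMPTION: log_prob is a natural logarithm; the base is not documented here.
    """
    return math.exp(log_prob)


def select_unexplained(rows, max_probability):
    """Return rows whose probability of being a simple mixture is below a cutoff.

    max_probability is a hypothetical, user-chosen threshold.
    """
    return [
        row for row in rows
        if probability_from_log_prob(float(row["log_prob"])) < max_probability
    ]


# Made-up rows for illustration only.
rows = [
    {"log_prob": "0.0", "Fe": "0.5", "Ni": "0.5"},
    {"log_prob": "-10.0", "Fe": "0.2", "Ni": "0.8"},
]
candidates = select_unexplained(rows, max_probability=0.01)
```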

 

iv. The data in phase_fits/ represents the outcome of a phase fitting model. There is one row per phase diagram. This data is meant to be read using the example colab. The column headers are:

residual: Float, the residual of the phase fit.

signal_type: This is either 'signal' or 'sigma', indicating the type of the phase fit (see paper).

discretization: The integer number of intervals we discretized the phase space into.

n_points: The number of internal points in the phase diagram. This is an integer between 1 and 5, inclusive.

metal_0, metal_1, metal_2: Three strings identifying the constituent metals of the phase diagram.

point_#_pos_0, point_#_pos_1: The coordinates of a phase point. # ranges between 0 and 7, inclusive. point_#_pos_0 gives the float amount of metal_0, and point_#_pos_1 gives the float amount of metal_1. The float amount of metal_2 can be inferred via 1 - (point_#_pos_0 + point_#_pos_1).

point_#_fitted_channel_X: The fitted optical absorption spectra of point_#. # is an integer between 0 and 7, inclusive. X is an integer between 0 and 8, inclusive, indicating the wavelength of the light absorbed.
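The example colab remains the intended way to read phase_fits/. Purely to illustrate the column naming, the sketch below recovers the full composition of one phase point and its fitted spectrum from a row. It assumes the second coordinate column is named point_#_pos_1, as the description of the coordinates states; the function names and the sample row are hypothetical.

```python
def phase_point_composition(row, point_index):
    """Return the (metal_0, metal_1, metal_2) amounts for one phase point.

    The amount of metal_2 is inferred as 1 - (pos_0 + pos_1).
    """
    x0 = float(row[f"point_{point_index}_pos_0"])
    x1 = float(row[f"point_{point_index}_pos_1"])
    return x0, x1, 1.0 - (x0 + x1)


def fitted_spectrum(row, point_index, n_channels=9):
    """Return the fitted channel values (channels 0..8) for one phase point."""
    return [
        float(row[f"point_{point_index}_fitted_channel_{c}"])
        for c in range(n_channels)
    ]


# Made-up row for illustration only; these are not real fit results.
row = {"metal_0": "Fe", "metal_1": "Ni", "metal_2": "Cu",
       "point_0_pos_0": "0.5", "point_0_pos_1": "0.25"}
row.update({f"point_0_fitted_channel_{c}": "0.0" for c in range(9)})

composition = phase_point_composition(row, 0)
spectrum = fitted_spectrum(row, 0)
```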

 

The files are publicly available for access via:

- the gsutil CLI tool at https://cloud.google.com/storage/docs/gsutil

- the tf.io.gfile APIs at https://www.tensorflow.org/api_docs/python/tf/io/gfile/GFile

- HTTP API: http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/path/to/file

 

This file, the README, is available at:

http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/README.txt

 

The metadata file is available at:

http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/metadata.csv

It lists all the plates available for download.

 

The plate data for each of the four data types listed above can be found at:

http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/data_type_subdir/plate.csv
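The URL pattern above can be filled in programmatically. The sketch below builds a plate URL from a data type directory and a plate file name. The function name plate_url is hypothetical, and the plate file name is left as the plate.csv placeholder because the actual file names should be taken from metadata.csv.

```python
BASE_URL = "http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy"
METADATA_URL = f"{BASE_URL}/metadata.csv"

# The four data type directories described in this README.
DATA_TYPE_SUBDIRS = ("ten_replicas", "aggregated_replicas", "mixture", "phase_fits")


def plate_url(data_type_subdir, plate_file):
    """Build the HTTP URL of one plate file (hypothetical helper).

    plate_file is the CSV file name for a plate; take real names from
    metadata.csv rather than guessing them.
    """
    if data_type_subdir not in DATA_TYPE_SUBDIRS:
        raise ValueError(f"unknown data type directory: {data_type_subdir!r}")
    return f"{BASE_URL}/{data_type_subdir}/{plate_file}"


# "plate.csv" is the placeholder from the URL pattern, not a real file name.
example_url = plate_url("mixture", "plate.csv")
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen from the Python standard library.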

Files

metal-oxide-spectroscopy.zip (809.2 MB)
md5:da839dd4cd4b96af1d0bd0e5442a94f6

Additional details

Created: January 31, 2024
Modified: January 31, 2024