Publication: Principles of Computation by Competitive Protein Dimerization Networks
Authors: Jacob Parres-Gold, Matthew Levine, Ben Emert, Andrew Stuart, Michael B. Elowitz
Primary contact for code & data: Jacob Parres-Gold, jacobparresgold@gmail.com
This dataset is intended to demonstrate how the data were generated and how to re-analyze the data if necessary. All analysis was originally performed on an Amazon Web Services (AWS) c5.4xlarge instance, and the code has since been further annotated for readability. We tried to make as few changes as possible; however, bugs may have been introduced, for example in file-handling commands. See the Methods section for details on the analysis.
Utilities
The following Python script provides functions that are used by other notebooks in the dataset:
dimer_network_utilities.py
Resources
The following resources are available to simulate the input-output functions of arbitrary networks:
simulate_individual_networks.ipynb
This notebook demonstrates simulations of individual one- and two-input networks with schematics of the network architecture.
Interactive Google Colab
Interactive Google Colab notebook for simulating one- and two-input networks.
Notebooks to Replot Figures
The following notebooks were used to plot the figures used in the paper. They were created to make re-plotting of the data as simple as possible. The data for these figures are drawn directly from the archived data folder. These notebooks also include many plots not included in the main or supplementary figures; such plots have not been aesthetically cleaned up.
remake_GraphicalAbstract.ipynb
Plots for graphical abstract
remake_Fig1.ipynb
Plots for Figure 1
remake_Fig2.ipynb
Plots for Figure 2
remake_Fig3.ipynb
Plots for Figure 3
remake_Fig4.ipynb
Plots for Figure 4 and Figure S3
remake_Fig5.ipynb
Plots for Figure 5 and Figure S4
remake_Fig6.ipynb
Plots for Figure 6 and Figure S6
remake_FigS5.ipynb
Plots for Figure S5
remake_FigS7.ipynb
Plots for Figure S7
Notebooks to Demonstrate Analysis Methods
The following notebooks were created from code that was used to perform the majority of the analysis for the paper. Many of these notebooks use the ray package for parallelization, which is not available for Windows. Thus, if you would like to re-run these notebooks, make sure to disable the commands involving ray. It may also be necessary to change code relevant to loading and saving files.
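One way to make the ray-dependent steps optional is to route each parallel call through a switch, as in the minimal sketch below. The function `simulate_one` and the flag `USE_RAY` are hypothetical placeholders that do not appear in the notebooks; only `ray.init`, `ray.remote`, and `ray.get` are actual ray calls.

```python
USE_RAY = False  # hypothetical switch; set to True only where ray is installed


def simulate_one(params):
    """Hypothetical stand-in for one unit of work (e.g., one parameter set)."""
    return sum(params)


def run_all(param_sets, use_ray=USE_RAY):
    """Run simulate_one over all parameter sets, in parallel only if requested."""
    if use_ray:
        import ray  # imported lazily so the serial path works without ray

        ray.init(ignore_reinit_error=True)
        remote_fn = ray.remote(simulate_one)
        return ray.get([remote_fn.remote(p) for p in param_sets])
    # Serial fallback: same results, no ray required
    return [simulate_one(p) for p in param_sets]


results = run_all([(1, 2), (3, 4)])
print(results)
```

With the switch off, the work runs serially, which is slower but leaves the results unchanged.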
analyze_1D_screen.ipynb
This notebook analyzes the responses observed in the parameter screen of one-input networks.
analyze_2D_screen.ipynb
This notebook analyzes the responses observed in the parameter screen of two-input networks.
analyze_connectivity_trends.ipynb
Using the large screen of many networks, this notebook assesses how various network properties vary with network connectivity.
analyze_equilibration_kinetics.ipynb
This notebook uses deterministic simulations, via numerical integration of ordinary differential equations (ODEs), to estimate the time required for network re-equilibration after a perturbation of the input monomer.
analyze_expression_noise_robustness.ipynb
This notebook simulates various networks (one for each unique function) under random perturbations of monomer expression levels (total abundances).
analyze_intrinsic_noise.ipynb
This notebook uses stochastic (Gillespie) simulations of networks at steady state to estimate stochastic fluctuations in dimer concentrations at equilibrium.
boolean_complexity.ipynb
This notebook demonstrates how to calculate the Boolean complexity of the Boolean functions used in this work.
generate_param_screen.ipynb
This notebook performs a parameter screen of networks, considering 1 or 2 monomers as the input.
optimizer_dualannealing.ipynb
This notebook demonstrates how a dual annealing algorithm was used to optimize dimerization networks to perform desired functions.
separation_of_timescales.ipynb
This notebook uses deterministic simulations, via numerical integration of ordinary differential equations (ODEs), to determine whether networks are at equilibrium when an input monomer oscillates over time.
versatility_expression_changes.ipynb
This notebook analyzes, from the results of the optimization trials investigating one-input versatility, the extent to which accessory protein expression levels changed to achieve versatility.
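Two kinds of time-resolved simulation are named in the list above: deterministic ODE integration and stochastic Gillespie simulation. The sketch below illustrates both for a single reversible dimerization, A + B <-> D. It is a toy under stated assumptions and not the notebooks' implementation: the function names, rate constants, and hand-written fixed-step RK4 integrator are all hypothetical, and the notebooks treat full networks and may use different solvers.

```python
import math
import random


def dimer_rate(d, a_tot, b_tot, k_on, k_off):
    """dD/dt for A + B <-> D, with free monomers fixed by conservation."""
    return k_on * (a_tot - d) * (b_tot - d) - k_off * d


def equilibrium_dimer(a_tot, b_tot, k_on, k_off):
    """Closed-form equilibrium dimer level (smaller root of the quadratic)."""
    k_eq = k_on / k_off
    s = k_eq * (a_tot + b_tot) + 1.0
    return (s - math.sqrt(s * s - 4.0 * k_eq * k_eq * a_tot * b_tot)) / (2.0 * k_eq)


def time_to_equilibrate(d0, a_tot, b_tot, k_on, k_off, dt=1e-3, tol=0.01, t_max=100.0):
    """Integrate with fixed-step RK4 until D is within tol (relative) of equilibrium."""
    args = (a_tot, b_tot, k_on, k_off)
    d_eq = equilibrium_dimer(*args)
    d, t = d0, 0.0
    while abs(d - d_eq) > tol * d_eq and t < t_max:
        k1 = dimer_rate(d, *args)
        k2 = dimer_rate(d + 0.5 * dt * k1, *args)
        k3 = dimer_rate(d + 0.5 * dt * k2, *args)
        k4 = dimer_rate(d + dt * k3, *args)
        d += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return t, d


def gillespie_dimer_counts(n_a, n_b, n_d, c_on, c_off, t_end, seed=0):
    """Gillespie (direct method) trajectory of the dimer copy number."""
    rng = random.Random(seed)
    t, times, dimers = 0.0, [0.0], [n_d]
    while True:
        bind, unbind = c_on * n_a * n_b, c_off * n_d
        total = bind + unbind
        if total == 0.0:
            break  # no reaction can fire
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose binding (+1 dimer) or unbinding (-1 dimer) by relative propensity
        step = 1 if rng.random() * total < bind else -1
        n_a, n_b, n_d = n_a - step, n_b - step, n_d + step
        times.append(t)
        dimers.append(n_d)
    return times, dimers


# Deterministic: equilibrate at A_tot = 1, then step the input monomer to A_tot = 2
d_start = equilibrium_dimer(1.0, 1.0, 1.0, 1.0)
t_relax, d_final = time_to_equilibrate(d_start, 2.0, 1.0, 1.0, 1.0)

# Stochastic: copy-number fluctuations of the dimer over time
times, dimers = gillespie_dimer_counts(50, 50, 0, c_on=0.01, c_off=0.5, t_end=20.0)
print(t_relax, d_final, min(dimers), max(dimers))
```

The deterministic part returns the time needed to come within 1% of the new equilibrium after the input step. The stochastic part returns a trajectory from which fluctuation statistics could be computed.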
Data: General
The full dataset is quite large (~332 GB). The data are available as individual files in an AWS S3 bucket hosted by CaltechDATA. We have also created a more user-friendly subset of the data in a zip file called parresgold_2023_dimer_networks_plotting_data.zip. This file contains only the data necessary to re-plot the figures and is significantly smaller in size.
Data from Parameter Screens
param_screen_1D
Raw data from one-input parameter screen.
param_screen_1D_limited_param_range
Raw data from one-input parameter screen with a more limited parameter range.
param_screen_2D
Raw data from two-input parameter screen.
param_screen_analysis_1D
Results of downstream analysis of the one-input parameter screens.
param_screen_analysis_1D_limited_param_range
Results of downstream analysis of the one-input parameter screens with a more limited parameter range.
param_screen_analysis_2D
Results of downstream analysis of the two-input parameter screens.
Data from Annealing Optimizations for Multi-Input Logic Gates
optimization_trials_dualannealing_3D_4D
Optimization results, based on the dual annealing optimizer, for three- and four-input optimization trials.
Data on Transcription Factor Co-Expression
transcription_factor_coexpression
Co-expression data for nuclear receptor (NR) and bZIP transcription factors in mice and humans.
Code and Data for Versatility Analysis
The code for the versatility analysis, in which genetic algorithm optimization was used to optimize accessory expression levels of networks to perform desired functions, was primarily written by Matthew Levine. This code was not condensed like the above notebooks were; instead, it has been directly included as-is. The code can be found in randomK_versatility_optimization_trials, and the results can be found in the folders listed below. This code, rather than being run in a notebook, was run as a series of batch scripts. We understand that this code is not as easily parsed as the rest; if you have questions, please reach out to the authors.
optimization_trials_randomK_1D
Versatility results for the general one-input functions.
targets.csv: This file has columns "m", "targetID", "used", "0", "1", ..., "29", where "m" is the network size, "targetID" is the name of the m-dependent index of the target function, "used" is True if the target was used in the experiment, and the other columns are the 30 values of the target.
summary.csv: This file has columns "m", "targetID", "KID", "dimerID", "Linf", "MSE", "goodenough", where "m" is the network size, "targetID" is the name of the m-dependent index of the target function, "KID" is the name of the m-dependent index of the network parameter K, "dimerID" is the name of the index of the dimer, "Linf" is the Linf error of the target fit, "MSE" is the MSE error of the target fit, and "goodenough" is True if Linf <= 1.0.
a_opt.pkl: This file is a dictionary with keys "m" and values that are 4D numpy arrays of shape (N K's x N Dimers x N targets x N accessories). The first index is the index of the KID, the second index is the index of the dimer, the third index is the index of the target, and the fourth index is the index of the accessory. The value is the optimal accessory parameter a as determined by the optimization experiment.
K_random.pkl: This file is a dictionary with keys "m" and values that are 2D numpy arrays of shape (N K's x N Dimers). The first index is the index of the KID, and the second index is the index of the dimer. The value is the random K value used in the experiment.
name_dict.pkl: This file is a dictionary with keys "K_names" and "a_names", each of which maps to a dictionary with keys "m" and values that are lists of names. In the "K_names" dictionary, the lists hold the names of the KID's. In the "a_names" dictionary, the lists hold the names of the dimerID's.
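The array layouts described above can be indexed as in the following sketch. It uses synthetic stand-ins rather than the archived files. The array sizes and name strings are hypothetical, and it assumes that the "m" keys are the network sizes themselves, which should be checked against the actual files.

```python
import pickle

import numpy as np

# Synthetic stand-ins with the documented layout. A real file would be read with:
#     with open("a_opt.pkl", "rb") as f:
#         a_opt = pickle.load(f)
m = 3  # network size, assumed here to be the dictionary key
n_K, n_dimers, n_targets, n_acc = 4, 6, 5, 2  # hypothetical sizes
rng = np.random.default_rng(0)

a_opt = {m: rng.random((n_K, n_dimers, n_targets, n_acc))}
K_random = {m: rng.random((n_K, n_dimers))}
name_dict = {
    "K_names": {m: [f"K{i}" for i in range(n_K)]},       # hypothetical names
    "a_names": {m: [f"D{i}" for i in range(n_dimers)]},  # hypothetical names
}

# Round-trip through pickle to mimic saving and loading
a_opt = pickle.loads(pickle.dumps(a_opt))

# Optimal accessory values for one (KID, dimer, target) combination
k_idx, dimer_idx, target_idx = 1, 2, 0
a_vec = a_opt[m][k_idx, dimer_idx, target_idx, :]

# Row of K_random for the same KID index
K_row = K_random[m][k_idx]

print(name_dict["K_names"][m][k_idx], a_vec.shape, K_row.shape)
```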
optimization_trials_randomK_1D_testing_connectivity
Versatility results for an experiment in which we tested networks of 8 monomers while explicitly varying network connectivity. The filenames follow the same conventions as above.
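Because both one-input folders share these file conventions, a summary.csv from either one could be read along the lines of the sketch below. The rows are synthetic, and the assumption that booleans are stored as the strings "True" and "False" should be verified against the real files.

```python
import csv
import io

# Synthetic rows with the documented columns. A real file would be opened with
#     open("summary.csv", newline="")
text = """m,targetID,KID,dimerID,Linf,MSE,goodenough
3,0,0,2,0.4,0.05,True
3,1,0,2,1.7,0.90,False
4,0,1,5,0.9,0.20,True
"""


def load_summary(handle):
    """Read summary rows and convert the numeric and boolean columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["m"] = int(row["m"])
        row["Linf"] = float(row["Linf"])
        row["MSE"] = float(row["MSE"])
        row["goodenough"] = row["goodenough"] == "True"  # assumed encoding
        rows.append(row)
    return rows


rows = load_summary(io.StringIO(text))

# Recompute the documented criterion (Linf <= 1.0) and compare to the stored flag
consistent = all((r["Linf"] <= 1.0) == r["goodenough"] for r in rows)

# Count acceptable fits per network size
good_by_m = {}
for r in rows:
    good_by_m[r["m"]] = good_by_m.get(r["m"], 0) + int(r["goodenough"])

print(consistent, good_by_m)  # True {3: 1, 4: 1}
```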
optimization_trials_randomK_2D
Versatility results for a select set of 2-input functions. Note that data are included for networks of 8, 16, and 20 monomers, although the analyses shown in Figure 6 and Figure S6 come from the 20-monomer dataset. The 20-monomer dataset was optimized much more deeply than the other two datasets were, so we caution against directly comparing the results.
Optimization results are provided as a .pkl file containing a dictionary. The first level of keys specifies:
Network size: m
Target name: targetID
Index of the random K: KID
E.g., a first-level key could be 'm-20_targetID-XNOR_KID-9_dimerID-None'
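A first-level key of this form can be split into its fields with a regular expression, as sketched below. The pattern is inferred only from the single example key above, so it is an assumption about the general format. The helper name `parse_key` is hypothetical.

```python
import re

# Pattern inferred from the example key; other keys may deviate from it
KEY_PATTERN = re.compile(
    r"^m-(?P<m>\d+)_targetID-(?P<targetID>.+)_KID-(?P<KID>\d+)_dimerID-(?P<dimerID>.+)$"
)


def parse_key(key):
    """Split a first-level key into its named fields."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"Unrecognized key format: {key!r}")
    fields = match.groupdict()
    fields["m"] = int(fields["m"])
    fields["KID"] = int(fields["KID"])
    return fields


parsed = parse_key("m-20_targetID-XNOR_KID-9_dimerID-None")
print(parsed)  # {'m': 20, 'targetID': 'XNOR', 'KID': 9, 'dimerID': 'None'}
```

The "dimerID" field is left as a string because only the value None appears in the example.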
The second level of keys specifies properties associated with the optimized solution:
['MSE', 'MSE-per-target', 'Linf', 'a', 'theta', 'logK', 'K', 'f_fit', 'f_target']
a gives the optimized accessory concentrations.
theta is a binary vector with many 0's and exactly one 1 (the location of this 1 indicates the optimally selected output dimer ID).
logK and K represent the binding affinities for this KID.
f_fit is the optimized 2D function (in log space) unrolled into a vector. To get the 2D grid, you can do .reshape(12,12).
f_target is the target 2D function (in log space) unrolled into a vector. To get the 2D grid, you can do .reshape(12,12).
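The second-level entries described above could be unpacked as in the sketch below. It uses a synthetic entry, not archived data, and its theta length and values are hypothetical. The text does not state which grid axis corresponds to which input, nor the exact definitions behind the stored 'MSE' and 'Linf' values, so the error metrics here are plain mean-squared and maximum-absolute differences chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for one second-level entry of the results dictionary
rng = np.random.default_rng(0)
theta = np.zeros(10)  # hypothetical length
theta[3] = 1.0        # exactly one 1 marks the selected output dimer
f_target = rng.random(144)
entry = {"theta": theta, "f_target": f_target, "f_fit": f_target + 0.01}

# Recover the 12 x 12 grids from the unrolled vectors
fit_grid = entry["f_fit"].reshape(12, 12)
target_grid = entry["f_target"].reshape(12, 12)

# Location of the single 1 in theta gives the selected output dimer index
output_dimer = int(np.argmax(entry["theta"]))

# Illustrative fit metrics (definitions assumed, see note above)
mse = float(np.mean((fit_grid - target_grid) ** 2))
linf = float(np.max(np.abs(fit_grid - target_grid)))

print(fit_grid.shape, output_dimer, mse, linf)
```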
There is also a summary .csv file that provides several metrics comparing how well f_fit matches f_target.