ABOUT

Description

This repository contains all the data and code needed to generate the figures and support the conclusions in the following works:

"Diversity in Notch ligand-receptor signaling interactions" by Rachael Kuintzle, Leah Santat, and Michael B. Elowitz (submitted to eLife in 2023).
Rachael Kuintzle's PhD thesis: "Diversity in Notch ligand-receptor signaling interactions" (2023). CaltechTHESIS. https://thesis.library.caltech.edu/15235/

Dependencies

This study used Python 3.7.7 and the following packages:

numpy version 1.19.2
scipy version 1.5.2
pandas version 0.24.2
jupyter
- jupyter core : 4.7.0
- jupyter-notebook : 6.1.6
- qtconsole : 4.7.7
- ipython : 7.19.0
- ipykernel : 5.3.4
- jupyter client : 6.1.7
- jupyter lab : 2.2.6
- nbconvert : 6.0.7
- ipywidgets : 7.6.3
- nbformat : 5.1.2
- traitlets : 5.0.5
seaborn version 0.11.1
matplotlib version 3.3.2
FlowCal version 1.3.0
operator
glob
re
time
random
os
sys
importlib

Several custom python scripts are included in this repository and are imported and used by the Jupyter notebooks herein for flow cytometry data analysis:

flow_data_munging.py: import flow_data_munging as munge_flow
global_parameters_flow_analysis.py: import global_parameters_flow_analysis as flow_params
flow_statistical_analysis.py: import flow_statistical_analysis as flow_stat
plot_flow_data.py: import plot_flow_data as flowplot
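
If a notebook cannot find these modules, one option is to add the folder that holds them to `sys.path` before the imports. The following is a minimal sketch only: the helper name `add_code_dir` and the folder `Flow_Cytometry/Code` are assumptions for illustration, so substitute the actual location of the scripts in your copy of the repository.

```python
import sys
from pathlib import Path


def add_code_dir(code_dir):
    """Put code_dir at the front of sys.path (once) and return its absolute path."""
    resolved = str(Path(code_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical relative path; adjust to where the custom scripts actually live.
code_path = add_code_dir("Flow_Cytometry/Code")

# After this, the notebook imports listed above should resolve, e.g.:
# import flow_data_munging as munge_flow
# import global_parameters_flow_analysis as flow_params
```

Calling the helper more than once is harmless, because the path is added only if it is not already present.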

Notes

Some file paths in Jupyter notebooks may differ from the file paths in this repository, although relative paths should be correct. In particular, where custom scripts are imported at the top of each Jupyter notebook file, make sure the path to those scripts is defined correctly.
Some figure attributes (axes labels, legends) were added or cleaned up in Illustrator, so the figures in the below Figure folders may differ in minor ways from figures in the above studies.
Disregard .efl files in the Raw_Data subfolders in Flow Cytometry.
In most cases, commented-out code was not run. However, in some notebooks the number of bootstrap replicates was temporarily set to a low value for quick figure regeneration, with the 10,000-replicate setting used for statistical analysis commented out. In all cases, the bootstrapped means and confidence intervals in the exported files and figures used in the works listed above were computed with 10,000 bootstrap replicates.
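
For readers unfamiliar with the procedure, a percentile bootstrap of a mean and its 95% confidence interval can be sketched as follows. This is a minimal illustration and not the code in flow_statistical_analysis.py, which may differ in detail; the function name `bootstrap_mean_ci`, the seed, and the toy values are assumptions.

```python
import numpy as np


def bootstrap_mean_ci(values, n_replicates=10_000, ci=95, seed=0):
    """Return (bootstrap mean, lower bound, upper bound) for the mean of values."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_replicates, len(values)))
    replicate_means = values[idx].mean(axis=1)
    tail = (100 - ci) / 2
    lower, upper = np.percentile(replicate_means, [tail, 100 - tail])
    return replicate_means.mean(), lower, upper


# Toy data standing in for bulk signaling values from four bioreplicates.
mean, lo, hi = bootstrap_mean_ci([1.0, 1.2, 0.9, 1.1])
```

Lowering `n_replicates` speeds up figure regeneration at the cost of noisier interval estimates, which is the trade-off described in the note above.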

REPOSITORY STRUCTURE

Where to find datasets (raw and processed), code, and figures. For more information about how datasets were generated, see the Methods sections of the written works described above. The Adobe Illustrator file in this repository (.ai extension) contains the figure layouts with figures produced using the data and code described below.

I. Bulk_RNA-seq

This dataset is available at NCBI under GEO accession GSE233573.

A. Code
- C2C12-Nkd_vs_WT_plated_ligand_assay_scatterplotting.ipynb: Run this notebook to generate RNA-seq scatterplot figures.
B. Figures
C. Data (see file record_of_analysis_steps.txt and the Methods sections of the above studies for detailed analysis descriptions)
1. annotations: the Gencode gtf file used to compute transcript abundances.
2. bedgraph: Files for alignment visualization
3. edgeR: Differential expression analysis (not used in the above studies).
4. Fastqs: Raw data.
5. processed_data_for_scatterplots: csv files with data subsets from the Stringtie-generated transcript abundances used for plotting scatterplots included in this study. These files were created by the Jupyter notebook C2C12-Nkd_vs_WT_plated_ligand_assay_scatterplotting.ipynb.
6. Stringtie: Transcript abundances computed by Stringtie.
7. Trim_galore: Read trimming and quality assessment.

II. Flow_Cytometry

A. Raw_Data: Each subfolder listed below contains subfolders of .fcs flow cytometry data files, corresponding sample label files (.xls), and corresponding csv files (one per fcs file subfolder). Each of the following dataset folders also contains a single csv file with the same name as the dataset folder; this csv file contains all combined, pre-processed data and is the only file used by the 'Processing' Jupyter notebooks for analysis and figure generation. NOTE: for experiments with only ONE fcs file subfolder, the corresponding single .csv file was renamed to match the dataset folder.
1. Dataset1_2021_CHO_reporter_vs_plated_ligand: Plated ligand assay with CHO-K1 cells. Notch receiver cells were plated on titrated concentrations of recombinant ligands.
2. Dataset2_2021_CHO_reporter_vs_trans_ligand: Trans-activation assay with CHO-K1 cells. Notch receiver cells were cocultured with an excess of Tet-OFF sender cells with ligand expression titrated using 4-epi-tetracycline.
3. Dataset3_2021_CHO_reporter_vs_cis_ligand_48well: Cis-modulation assay with CHO-K1 cells. Notch receiver cells coexpressing ligands were cocultured with an excess of stable sender cells of each ligand identity. Cis-ligand expression was titrated in receivers with 4-epi-tetracycline.
  - This dataset has raw data partitioned into multiple subfolders because of sample mislabeling that was fixed in the Jupyter notebook files (Processing_Dataset3_2021_CHO_reporter_vs_cis_ligand_48well-Step1.ipynb and Processing_Dataset3_2021_CHO_reporter_vs_cis_ligand_48well-Step2.ipynb)--see the local README.txt file for more info. Raw data subsets are divided into the following subfolders:
    - Data_subset_A
    - Data_subset_B
    - Data_subset_C_with_some_mislabeled_samples
4. Dataset4_2022_CHO_reporter_vs_cis_ligand_24well: Cis-activation and cis-modulation assay with CHO-K1 cells. Notch receiver cells coexpressing ligands were cocultured with an excess of wild-type cells (no ligand) or stable Dll1 sender cells. Cis-ligand and control protein expression were titrated in receivers with 4-epi-tetracycline. Raw data are divided into two csv files: 1) Dll1, Dll4, Jag1, & Jag2 ('_ligands.csv'), and 2) NGFR & H2B-mCh ('_controls.csv').
5. Dataset5_2022_CHO_reporter_vs_cis_ligand_Fng: Cis-activation assay, cis-modulation assay, and cis- + trans-activation assay with CHO-K1 cells. Notch receiver cells coexpressing ligands were cocultured with an excess of wild-type cells (no ligand) or stable sender cells. Cis-ligand expression was titrated in receivers with 4-epi-tetracycline. Endogenous Fringes were knocked down in receivers, and Lfng or dLfng were added by transient plasmid transfection. Raw data are divided into two csv files: 1) experiments from summer 2022 ('_subsetA'), and 2) experiment from winter 2022 ('_subsetB').
6. Dataset6_2022_CHO_trans_Fng_coculture: Trans-activation assay with CHO-K1 cells. Notch receiver cells were cocultured with an excess of wild-type cells (no ligand) or stable sender cells of each ligand type, with two different ligand expression levels per ligand. Endogenous Fringes were knocked down in receivers, and Lfng or dLfng were added by transient plasmid transfection.
7. Dataset7_2022_CHO_trans_binding_Fng: Soluble ligand binding assay with CHO-K1 cells. Endogenous Fringes were knocked down in receiver cells, and Lfng or dLfng were added by transient plasmid transfection. Notch receiver cells were incubated with recombinant ligands preclustered with dye-conjugated secondary antibodies. Receiver cells' surface Notch levels were measured with anti-Notch antibodies.
8. Dataset8_2023_CHO_cis-act_density_optimization: Cis-activation assay with trans-activation assay control (with CHO-K1 cells). Notch receiver cells coexpressing ligands were cocultured with an equal number of sender cells expressing high levels of ligand, at various cell counts. These receiver and sender cells were surrounded by an excess of wild-type cells (no ligand). In the positive control condition, a minority of receivers were surrounded with an excess of the high-ligand senders.
9. Dataset9_2023_CHO_cis-act_control_preinduction_signaling_ruleout: Cis-activation assay with modified pre-culture conditions with CHO-K1 cells. During the ligand preinduction period (prior to the cis-activation assay), receiver cells were cultured sparsely or densely to assess contributions of pre-culture signaling to the measured cis-activation assay signal.
10. Dataset10_2023_CHO_reporter_vs_plated_ligand_Fng: Plated ligand assay with CHO-K1 cells. Notch receiver cells were plated on one or two different concentrations of recombinant ligands after endogenous Fringes were knocked down in receiver cells and Lfng or dLfng were added by transient plasmid transfection.
11. Dataset11_2023_C2C12_cis-act_density_optimization: Cis-activation assay with trans-activation assay control (with C2C12-Nkd cells). Notch receiver cells coexpressing ligands were cocultured with an equal number of sender cells expressing high levels of ligand, at various cell counts. These receiver and sender cells were surrounded by an excess of wild-type cells (no ligand). In the positive control condition, a minority of receivers were surrounded with an excess of the high-ligand senders.
12. Dataset12_2023_C2C12_reporter_vs_cis_ligand_validation: Cis-activation and cis-modulation assay with C2C12-Nkd cells. Notch receiver cells coexpressing ligands were cocultured with an excess of wild-type cells (no ligand) or stable sender cells. Cis-ligand expression was titrated in receivers with 4-epi-tetracycline.
13. Dataset13_2023_C2C12_trans_Fng_coculture: Trans-activation assay with C2C12-Nkd cells. Notch receiver cells were cocultured with an excess of wild-type cells (no ligand) or stable Dll1 or Jag1 sender cells of each ligand type. Receivers were first treated with control or Rfng siRNAs to knock down endogenous mouse Rfng.
B. Processed_Data: This folder contains subfolders with csv data files and figures produced by Jupyter notebooks having filenames indicating the relevant datasets and starting with the 'Processing' prefix. In addition to one subfolder per Dataset (described above in Raw_Data), there are additional subfolders with processed data from multiple combined datasets (indicated in the subfolder name). Processed file names include the names of the dataset(s) plus a descriptive suffix. Common suffixes are defined as follows (see below section 'FLOW CYTOMETRY CODE AND ANALYSIS PIPELINE' for more data analysis methods info):
- '-receivers_and_controls': Single cell data filtered on cell size and selected for mTurq2 expression (cotranslational with receptor). Controls are the wild-type or parental reporter cells with no receptor added--these are only filtered on size and not on mTurq2.
- '-senders_and_controls.csv': Single cell data filtered on cell size and selected for mCherry expression (cotranslational with ligand). Controls are the wild-type cells with no ligand added--these are only filtered on size and not on mCherry.
- '-HillFit_stats.csv' or '-linreg_stats.csv' or '-loglog_linreg_stats': Mean values and 95% confidence intervals for bootstrapped least-squares curve fitting.
- Common suffix extensions:
  - '_transfected': Cells are labeled in the 'gate' column if they express the cotransfection marker IFP2 above background and below the overexpression cutoff. This suffix might not appear on all IFP-gated datasets, and some datasets include IFP2-gated and non IFP2-gated data (but this info is always present in the 'gate' column of these files).
  - '_mCh_binned': Cells are labeled in the 'gate' column with the bin corresponding to the range of mCh fluorescence they fall into. This suffix might not appear on all mCh-binned datasets, and some datasets include mCh-binned and non mCh-binned data (this info is always present in the 'gate' column of these files).
  - '_bulk': Bulk data, averaged across single cell subpopulations after pooling cells by sample label and bioreplicate label or experiment date. In some cases, additional gates were applied that divide cells into multiple subpopulations within the same sample label (such as mCherry gate binning).
  - '_bsub': Some background signal was subtracted from the signal of interest in the bulk dataset, and the background subtracted value was added as a new column. (Sometimes this suffix may be missing from the filename even if the bsub column exists.)
  - '_yNorm': A normalized version of the y-axis value was added as a new column. Sometimes this suffix may be missing from the filename even if the yNorm column exists. The yNorm column usually ends with e.g., '_control_FC' or '_fringe_FC' for a fold-change ('FC') over some control, for example, when the values are divided by some maximum value. The string preceding 'FC' (e.g., 'control') is the column that was used to identify the control to normalize to.
  - '_xy': The dataframe is restructured into a plotting-convenient, 'untidy' dataframe structure such that the x and y variables for plotting each get their own columns. This restructuring is necessary if x and y variables correspond to different fluorescence gates, for example.
    - Usually, but not always, 'xy' is followed by the data category, such as 'fringe' when x is dLfng and y is Lfng.
  - '_mean_95pcCIs': statistics from 10,000 bootstrap replicates of the data. May sometimes alternatively be called "_stats".
  - Other suffixes may appear that are not defined here. Individual data processing Jupyter notebooks list all exported files at the top of the notebook. The code shows how each exported csv file was generated.
C. Code: See description of Jupyter notebook files and custom Python modules in the section 'FLOW CYTOMETRY CODE AND ANALYSIS PIPELINE' below.
- All 'Processing' notebooks have a cell at the top of the notebook listing the names and locations of files exported by the notebook.
- Where there are multiple 'Processing' notebooks for a single dataset, run them in the order indicated by the '-Step' suffix (e.g., Step1, Step2, Step3).
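
The run order implied by the '-Step' suffix can be recovered programmatically, for example when batch-executing notebooks. The sketch below is an assumption-laden illustration: the helper `step_number` is hypothetical, and it relies on the suffix appearing directly before the .ipynb extension, as in the Dataset3 notebook names mentioned above.

```python
import re


def step_number(notebook_name):
    """Return the integer after '-Step' in a notebook filename, or 0 if absent."""
    match = re.search(r"-Step(\d+)\.ipynb$", notebook_name)
    return int(match.group(1)) if match else 0


notebooks = [
    "Processing_Dataset3_2021_CHO_reporter_vs_cis_ligand_48well-Step2.ipynb",
    "Processing_Dataset3_2021_CHO_reporter_vs_cis_ligand_48well-Step1.ipynb",
]

# Sort numerically so that a Step10 would follow Step9 rather than Step1.
ordered = sorted(notebooks, key=step_number)
```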

III. qRT-PCR

Each of the below qRT-PCR subfolders contains the raw data as Excel spreadsheets, a Jupyter notebook (.ipynb file extension) with the same name as the dataset subfolder, and the figure(s) generated by that Jupyter notebook (.svg file extension).

Dataset1_2021_CHO_Tet-OFF_induction_time-series: CHO-K1 senders with integrated Dll1 or Dll4 ligands driven by the Tet-OFF promoter were seeded at a density that would reach confluence at the time of collection. Seeded senders were induced to express ligand by reducing [4-epi-Tc] in the culture medium, and cells were harvested for RNA isolation at 12-24 hour intervals through a 144 hour time course for tracking ligand transcript dynamics after ligand induction.
Dataset2_2021_CHO_Tet-OFF_density_dependence: CHO-K1 Dll1 or Dll4 sender cells were seeded at high, medium, and low densities and induced to maximal ligand expression by removing 4-epi-Tc from the culture medium. Cells were collected 72 hr later for analysis of ligand transcript levels.
Dataset3_2022_CHO_Lfng_Rfng_knockdown_assessment: CHO-K1 cells were treated with either a negative control siRNA, Rfng siRNA, Lfng siRNA, or Rfng and Lfng siRNAs to assess the efficiency of siRNA knockdown of endogenous Rfng and Lfng.
Dataset4_2023_C2C12_residual_Notch_knockdown_assessment: C2C12-Nkd Notch1 and Notch2 receiver cells were treated with either a negative control siRNA; siRNAs against mNotch1, mNotch2, and mNotch3 (“N1-3”); or siRNAs against all three endogenous mouse receptors and mRfng (“N1-3, Rfng”) to assess the efficiency of siRNA knockdown of these endogenous Notch components.

FLOW CYTOMETRY CODE AND ANALYSIS PIPELINE

Data analysis steps

Flow cytometry data analysis steps are described in the proper sequential order with example commands below. A subset of the processing commands was used to process each of the datasets, depending on the cell types, culture schemes, and treatments used in the experiment that generated each dataset.

Cells were gated in a 2D plane of forward scatter (FSC) and side scatter (SSC) to select intact, singlet cells. [All data.]
```
df = munge_flow.apply_fluorescence_gate(df, flow_params.SSC_FSC_gate, 'SSC_v_FSC')
```
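
Conceptually, a 2D gate keeps the events whose coordinates fall inside a polygon drawn in the plane of two channels. The sketch below shows one way to do this with a ray-casting test. It is not the implementation of `munge_flow.apply_fluorescence_gate`; the function `points_in_polygon`, the column names 'FSC' and 'SSC', and the gate vertices are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def points_in_polygon(x, y, vertices):
    """Boolean mask of points (x, y) lying inside the polygon given by vertices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal line through each point?
        straddles = (y1 > y) != (y2 > y)
        # x-coordinate where the edge meets that line (unused for flat edges).
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        # Each crossing to the right of the point flips inside/outside.
        inside ^= straddles & (x < x_cross)
    return inside


# Hypothetical square gate and three toy events.
gate = [(0, 0), (10, 0), (10, 10), (0, 10)]
events = pd.DataFrame({"FSC": [5, 15, -1], "SSC": [5, 5, 5]})
mask = points_in_polygon(events["FSC"], events["SSC"], gate)
gated = events[mask]
```

Only the first toy event lies inside the square, so `gated` contains a single row.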
Cells were gated in a 2D plane of mTurq2 (PB450, A.U.) vs. SSC to separate out the +mTurq2 receiver cells from -mTurq2 senders or ‘blank’ parental cells. [All samples with receiver cells.]
```
df_r = munge_flow.apply_fluorescence_gate(df, flow_params.receiver_gate, 'receivers')
```
```
df_s = munge_flow.apply_fluorescence_gate(df, flow_params.sender_gate, 'senders')
```
Plasmid-transfected cells were gated in the APC700 channel to select cells expressing the cotransfection marker IFP2. [Plasmid-transfected samples only.]
```
df_transfected = munge_flow.apply_fluorescence_gate(df_r, flow_params.transfected_gate, 'transfected')
```
Note: df_transfected would be used in all subsequent steps instead of df_r, if applicable.
Receiver cells coexpressing ligand were gated into six consecutive bins of arbitrary mCherry (ECD) fluorescence units. [All samples with receiver cells coexpressing ligands.]
```
df_r = munge_flow.apply_multiple_1D_gates(df_r, 'mCh', flow_params.mCh_gate_bounds, keep_ungated=True)
```
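
Binning of this kind can be sketched with `pandas.cut`. The bin edges, labels, and the 'ungated' label below are assumptions for illustration; the actual bounds are defined in `flow_params.mCh_gate_bounds`, and the behavior of `keep_ungated=True` is only approximated here by labeling out-of-range cells instead of dropping them.

```python
import pandas as pd

# Hypothetical bin edges (A.U.); seven edges define six consecutive bins.
mch_bin_edges = [1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5]
bin_labels = [f"mCh_bin{i}" for i in range(1, len(mch_bin_edges))]

cells = pd.DataFrame({"mCh": [50, 150, 500, 2000, 5000, 20000, 50000]})

# Assign each cell to the bin containing its mCherry value (NaN if outside all bins).
cells["gate"] = pd.cut(cells["mCh"], bins=mch_bin_edges, labels=bin_labels)

# Keep out-of-range cells, but mark them explicitly.
cells["gate"] = cells["gate"].cat.add_categories("ungated").fillna("ungated")
```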
Compensation was applied to subtract mTurq2 signal leaking into the FITC channel. [All samples with receiver cells.]
```
df_r = munge_flow.compensate(df_r, flow_params.mCit_mTurq_comp)
```
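
A simple form of single-channel compensation subtracts a fixed fraction of the spilling signal from the affected channel. The sketch below assumes a linear model and a made-up spillover fraction; the actual coefficient is stored in `flow_params.mCit_mTurq_comp`, and `munge_flow.compensate` may work differently.

```python
import pandas as pd

# Hypothetical fraction of mTurq2 signal detected in the FITC (mCitrine) channel.
spillover = 0.02

cells = pd.DataFrame({"mCitrine": [120.0, 300.0], "mTurq": [1000.0, 5000.0]})

# Remove the estimated mTurq2 contribution from each cell's mCitrine reading.
cells["mCitrine_comp"] = cells["mCitrine"] - spillover * cells["mTurq"]
```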
If applicable, reporter activity (mCitrine fluorescence; FITC, A.U.) was normalized to co-translational receptor expression by dividing the mCitrine signal by the mTurq2 signal (PB450, A.U.). The resulting mCitrine/mTurq2 ratio is the “signaling activity” (reporter activity per unit receptor). [All samples with receiver cells.]
```
df_r = munge_flow.single_cell_fluor_norm(df_r, numerator_col='mCitrine', denominator_col='mTurq')
```
If applicable, cotranslational cis-ligand expression was normalized to co-translational receptor expression by dividing mCherry by the mTurq2 signal (PB450, A.U.). The resulting mCherry/mTurq2 ratio (cis-ligand expression per unit receptor) controls for slight variations in receptor expression when quantitatively comparing ligands’ cis-inhibition efficiencies (Figure 6). [All samples with receiver cells coexpressing ligands.]
```
df_r = munge_flow.single_cell_fluor_norm(df_r, numerator_col='mCh', denominator_col='mTurq')
```
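
Both normalizations above are per-cell ratios of two fluorescence columns. The sketch below illustrates the idea with a hypothetical helper, `add_ratio`; masking non-positive denominators is an assumption made here to avoid division by zero, not a documented behavior of `munge_flow.single_cell_fluor_norm`.

```python
import pandas as pd


def add_ratio(df, numerator_col, denominator_col):
    """Return a copy of df with a 'numerator/denominator' ratio column added."""
    out = df.copy()
    # Non-positive denominators become NaN so the ratio is undefined, not infinite.
    denom = out[denominator_col].where(out[denominator_col] > 0)
    out[f"{numerator_col}/{denominator_col}"] = out[numerator_col] / denom
    return out


cells = pd.DataFrame({
    "mCitrine": [200.0, 50.0, 10.0],
    "mCh": [400.0, 100.0, 5.0],
    "mTurq": [1000.0, 500.0, 0.0],
})
cells = add_ratio(cells, "mCitrine", "mTurq")  # signaling activity
cells = add_ratio(cells, "mCh", "mTurq")       # cis-ligand per unit receptor
```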

Average bulk measurements for each sample were obtained by computing the mean signal across single-cell data for a given sample (and mCherry bin, if applicable). Cells treated with different 4-epi-tetracycline (4-epi-Tc) levels were pooled as technical replicates after mCherry binning. A minimum of 100 cells were required during averaging; mCherry bins with too few cells did not generate a bulk data point. [All samples.]

```
import pandas as pd

# Define fluorescence values to average in calculation of bulk fluorescence levels.
y_columns = ['mCitrine', 'mTurq', 'mCh', 'mCitrine/mTurq']

# Define x_categories: a list of data attribute names and values to separate
# the samples by, prior to averaging across all single cells in those unique samples.
# The composition of x_categories depends on the relevant data attributes for
# each given dataset. Here are two examples:

# Example 1: without mCh binning
x_categories = [['date', pd.Series.unique(df_r['date'])],
                ['gate', pd.Series.unique(df_r['gate'])],
                ['sample', pd.Series.unique(df_r['sample'])]]

# Example 2: with mCh binning
x_categories = [['date', pd.Series.unique(df_r['date'])],
                ['gate', pd.Series.unique(df_r['gate'])],
                ['receptor', pd.Series.unique(df_r['receptor'])],
                ['cis-ligand', pd.Series.unique(df_r['cis-ligand'])],
                ['sender', pd.Series.unique(df_r['sender'])],
                ['biorep', pd.Series.unique(df_r['biorep'])]]

# Average the specified fluorescence values across single cells for each unique sample.
df_bulk = munge_flow.summarize_multiple_yvals(df_r, y_columns, x_categories,
                                              stats=['median', 'mean'],
                                              min_cell_count=100)
```
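
The averaging step is, in essence, a grouped summary with a minimum cell count. The sketch below shows that idea with a plain pandas groupby; the helper `summarize_bulk` and the toy data are hypothetical and do not reproduce `munge_flow.summarize_multiple_yvals`, although the output column naming ('<column>_mean') is chosen to match the names used in the later steps.

```python
import pandas as pd


def summarize_bulk(df, y_columns, group_columns, min_cell_count=100):
    """Mean and median of y_columns per group, dropping groups with too few cells."""
    grouped = df.groupby(group_columns)
    summary = grouped[y_columns].agg(["mean", "median"])
    # Flatten ('mCitrine', 'mean') into 'mCitrine_mean', and so on.
    summary.columns = [f"{col}_{stat}" for col, stat in summary.columns]
    summary["cell_count"] = grouped.size()
    return summary[summary["cell_count"] >= min_cell_count].reset_index()


# Toy data: sample A has 150 cells, sample B only 20 (below the cutoff).
cells = pd.DataFrame({
    "sample": ["A"] * 150 + ["B"] * 20,
    "mCitrine": [2.0] * 150 + [5.0] * 20,
})
bulk = summarize_bulk(cells, ["mCitrine"], ["sample"])
```

Sample B is dropped from `bulk` because it has fewer than 100 cells.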

Background subtraction was performed by subtracting “leaky” reporter activity of the receiver (with minimal cis-ligand, if applicable) in coculture with “blank” senders (CHO-K1 wt or C2C12-Nkd parental cells, according to the receiver cell type). You must first add a column to the dataframe that designates the 'background' samples with signal to subtract from the corresponding non-background samples.

```
# Define fluorescence values to background subtract.
y_columns = ['mCitrine/mTurq_mean']

# Define x_categories: a list of data attribute names and values to separate
# the samples by, prior to background subtraction. Here, the order of attributes
# in x_categories does matter: the last item in the list must specify the
# column (here, 'control') and its labels, and the label that designates the
# background samples to subtract (here, 'bsub') must come first among those labels.
# The below example is relatively simple - for some datasets, several more data
# attributes should be included in the list.
x_categories = [['receiver', ['Notch1', 'Notch2']],
                ['biorep', [1, 2, 3, 4]],
                ['control', ['bsub', '']]]
# In the above example, including 'biorep' in x_categories ensures that
# background fluorescence is subtracted individually for each biorep. If you
# instead exclude 'biorep' from x_categories, the function below will subtract
# the average value across all bioreps.

# Perform the background subtraction (creates new column).
df_bulk = munge_flow.background_subtract(df_bulk, y_columns, x_categories)
```
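
The logic of per-group background subtraction can be sketched in plain pandas as follows. The helper `subtract_group_background` and the toy values are hypothetical; the sketch only illustrates how including or excluding 'biorep' in the grouping changes which background value is subtracted, and it is not the repository's `background_subtract` function.

```python
import pandas as pd


def subtract_group_background(df, y_col, group_columns,
                              control_col="control", bg_label="bsub"):
    """Subtract each group's mean background value from every row in that group."""
    background = (
        df[df[control_col] == bg_label]
        .groupby(group_columns)[y_col]
        .mean()
        .rename("background")
        .reset_index()
    )
    out = df.merge(background, on=group_columns, how="left")
    out[f"{y_col}_bsub"] = out[y_col] - out["background"]
    return out.drop(columns="background")


bulk = pd.DataFrame({
    "receiver": ["Notch1", "Notch1", "Notch2", "Notch2"],
    "biorep": [1, 1, 1, 1],
    "control": ["bsub", "", "bsub", ""],
    "mCitrine/mTurq_mean": [0.1, 0.6, 0.2, 1.0],
})
# Grouping by receiver and biorep subtracts background separately per biorep.
bulk = subtract_group_background(bulk, "mCitrine/mTurq_mean", ["receiver", "biorep"])
```

Passing only `["receiver"]` as the grouping would instead subtract the background averaged across bioreps.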

Y-axis normalization was performed as described in each figure caption, but most often by dividing background-subtracted fluorescence values by fluorescence in some condition with 'maximal' signaling activity. The process of y-axis normalization is similar to background subtraction: you must first add a column to the dataframe that designates the samples with the signal you'll divide other samples' signal by.

```
# Define fluorescence values to normalize.
y_columns = ['mCitrine/mTurq_mean_bsub']

# Define x_categories: a list of data attribute names and values to separate
# the samples by, prior to y-axis normalization. Here, the order of attributes
# in x_categories does matter: the last item in the list must specify the
# column (here, 'control') and its labels, and the label that designates the
# samples to divide other samples by (here, 'Yeq1') must come first among
# those labels.
# The below example is relatively simple - for some datasets, several more data
# attributes should be included in the list.
x_categories = [['receiver', ['Notch1', 'Notch2']],
                ['control', ['Yeq1', 'bsub', '']]]
# When 'biorep' or 'date' are excluded from x_categories, the function below
# will divide by the average 'Yeq1' fluorescence value across all bioreps (this
# is the recommended approach for y-axis normalization for most data in this study).

# Perform y-axis normalization (creates new column).
df_bulk = munge_flow.compute_fold_change(df_bulk, y_columns, x_categories)
```
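
Y-axis normalization follows the same pattern as background subtraction, with division in place of subtraction. The sketch below is a hypothetical illustration (`fold_change_over_reference` is not a function in this repository); the output column suffix '_control_FC' follows the naming convention described for the '_yNorm' suffix above.

```python
import pandas as pd


def fold_change_over_reference(df, y_col, group_columns,
                               control_col="control", ref_label="Yeq1"):
    """Divide each row by the mean of its group's reference ('Yeq1') samples."""
    reference = (
        df[df[control_col] == ref_label]
        .groupby(group_columns)[y_col]
        .mean()
        .rename("reference")
        .reset_index()
    )
    out = df.merge(reference, on=group_columns, how="left")
    out[f"{y_col}_{control_col}_FC"] = out[y_col] / out["reference"]
    return out.drop(columns="reference")


bulk = pd.DataFrame({
    "receiver": ["Notch1", "Notch1", "Notch2", "Notch2"],
    "control": ["Yeq1", "", "Yeq1", ""],
    "signal": [2.0, 1.0, 4.0, 1.0],
})
# Grouping by receiver only pools the reference samples across bioreps.
bulk = fold_change_over_reference(bulk, "signal", ["receiver"])
```

By construction, the reference samples themselves end up with a fold change of 1.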

Jupyter notebooks

Pre-processing steps were performed first, in order. Processing steps were performed next but are largely parallel to each other. Where applicable, sequential processing steps are denoted with 'A' (first) and 'B' (second).

Pre-processing

These notebooks export files to the Raw_Data folders only.

Pre-processing_step1_convert_fcs_to_csv.ipynb: Import the original flow cytometry .fcs files and export .csv files. This script looks for a folder containing .fcs files, and in the same location, an .xls file with a list of sample labels corresponding to the .fcs files. It exports a single .csv file with the data from all .fcs files in the specified folder. FCS file import and sample labeling is slow, and can take multiple days for some datasets. To use this file, you must update paths and folder names.
Pre-processing_step2_combine_csvs_for_each_experiment.ipynb: Combine flow cytometry data from multiple .csv files (if applicable), add a column specifying the date the data were collected, and fix errors in sample labeling (if applicable). The date is either scraped from the filename or mapped from an Excel file.
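
The combining step can be pictured as concatenating per-experiment tables after tagging each with its collection date. The sketch below assumes a YYYYMMDD token in the filename, which is a guess at the naming pattern and not taken from the notebooks; the helper names and toy tables are likewise hypothetical.

```python
import re

import pandas as pd


def date_from_filename(filename):
    """Return the first YYYYMMDD token in filename as 'YYYY-MM-DD', or None."""
    match = re.search(r"(20\d{2})(\d{2})(\d{2})", filename)
    return "-".join(match.groups()) if match else None


def combine_with_dates(frames_by_filename):
    """Add a 'date' column to each table, then stack all tables into one."""
    labeled = [
        frame.assign(date=date_from_filename(filename))
        for filename, frame in frames_by_filename.items()
    ]
    return pd.concat(labeled, ignore_index=True)


# Toy stand-ins for tables that would normally be loaded with pd.read_csv.
frames = {
    "20210315_plate1.csv": pd.DataFrame({"sample": ["A", "B"], "mCitrine": [1.0, 2.0]}),
    "20210322_plate2.csv": pd.DataFrame({"sample": ["A"], "mCitrine": [3.0]}),
}
combined = combine_with_dates(frames)
```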

Processing

These notebooks export files to the Processed_Data folders only. The name of each Processing notebook file is 'Processing', followed by the name of the folder that it exports data files and figures to.