############### Using Xynergy ############### ******************* What Xynergy does ******************* Xynergy allows you to calculate drug synergy from dose-response matrices with minimal data. It does this by imputing the missing data, then calculating synergy as normal. .. list-table:: - - .. figure:: ./_static/minimal_matrix.png :width: 200px The data you have - .. figure:: ./_static/full_matrix.png :width: 200px The data you want - .. figure:: ./_static/imputed_matrix.png :width: 200px The data Xynergy imputes *********************** Example data overview *********************** Xynergy bundles a small real workbook example at `xynergy/example_data/data.xlsx`. The helper below reads that workbook, renames the columns to Xynergy's canonical names, and converts the worksheet's viability-style percentages into inhibition-style response percentages. .. code:: python from xynergy.example import load_example_data example_data = load_example_data() `example_data` contains a single drug pair measured at six dose levels. Only the off-axis single-agent doses plus the dose-matched diagonal are observed, so the data still have the sparse "minimum combination data" shape shown in the left figure above. The bundled workbook has: - One experiment - One drug pair: Venetoclax + Volasertib - Six dose levels on each axis - Sixteen observed combinations, which expand to a 6 x 6 matrix after tidying The code snippets below now use this bundled workbook example. Some printed output tables later in the page are illustrative summaries from older runs and may not match the workbook values exactly. If you want to work with the raw workbook columns instead, use `load_example_data(raw=True)`. ************** Tidying data ************** The workbook example already has one response per row, so `tidy` mainly does three things for us here: - Checks the input columns - Creates `experiment_id` - Expands the sparse axis-plus-diagonal observations into the full matrix, filling missing combinations with `null` Because `load_example_data()` already returns the preferred column names `dose_a`, `dose_b`, and `response`, we only need to specify the columns that identify an experiment: .. code:: python from xynergy.tidy import tidy clean_data = tidy( example_data, experiment_cols=[ "experiment_source_id", "line", "drug_a", "drug_b", "pair_index", ], ) For the bundled workbook, `tidy` expands 16 observed rows to 36 matrix rows, adding 20 `null` placeholders for the unmeasured interior combinations. If you do not care about preserving the metadata columns, the bundled example also works as a single-experiment dataset: .. code:: python tidy(example_data) Tidying is just the first step, though. Our next step is to pre-impute the missing values. **************** Pre-imputation **************** Let's start simple. We'll use the iterative imputer from `scikit-learn `_ to start. Don't worry about the `use_single_drug_response_data` argument yet. .. code:: python from xynergy.impute import pre_impute, post_impute imputed = pre_impute( clean_data, method="IterativeImputer", use_single_drug_response_data=False, experiment_cols="experiment_id", dose_cols=["dose_a", "dose_b"], response_col="response", ) Actually, since we ran our data through `tidy`, our column names have been set to be the defaults of downstream functions - so as a bonus we can omit many of these arguments: .. code:: python pre_impute( clean_data, method="IterativeImputer", use_single_drug_response_data=False, ) .. code:: ┌───────────────┬────────┬────────┬──────────┬──────────────┐ │ experiment_id ┆ dose_a ┆ dose_b ┆ response ┆ resp_imputed │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ u32 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │ ╞═══════════════╪════════╪════════╪══════════╪══════════════╡ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │ │ … ┆ … ┆ … ┆ … ┆ … │ │ 2 ┆ 1000.0 ┆ 3.16 ┆ null ┆ 75.86 │ │ 2 ┆ 1000.0 ┆ 56.23 ┆ null ┆ 78.35 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 99.99 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 99.99 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 99.99 │ └───────────────┴────────┴────────┴──────────┴──────────────┘ .. NOTE:: I've removed the experimental columns - `line`, `name_a`, `name_b`, and `replicate` - to ensure all the data fit easily on the screen. However, they would normally be returned. You'll notice a new column, `resp_imputed`, with values that are `null` in the same row of the `response` column. These are our (pre-)imputed values! For some algorithms, we can improve imputation accuracy by including columns that contain the response we would get if we JUST added `drug_a` or `drug_b`, for instance (refer to the figure below). .. figure:: ./_static/uncombined_drugs.png Intuition of `use_single_drug_response_data` We can do that automatically by setting `use_single_drug_response_data=True` (the default, so we'll just forgo setting it): .. code:: python pre_impute( clean_data, method="IterativeImputer", ) .. code:: ┌───────────────┬────────┬────────┬──────────┬─────────────┬─────────────┬──────────────┐ │ experiment_id ┆ dose_a ┆ dose_b ┆ response ┆ dose_a_resp ┆ dose_b_resp ┆ resp_imputed │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ u32 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │ ╞═══════════════╪════════╪════════╪══════════╪═════════════╪═════════════╪══════════════╡ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │ │ 2 ┆ 1000.0 ┆ 3.16 ┆ null ┆ 98.92 ┆ 23.23 ┆ 83.92 │ │ 2 ┆ 1000.0 ┆ 56.23 ┆ null ┆ 98.92 ┆ 85.1 ┆ 123.07 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 │ └───────────────┴────────┴────────┴──────────┴─────────────┴─────────────┴──────────────┘ These results are *ok*, and they certainly were quick to get. However, if we have the time we can significantly improve our accuracy of imputation using `XGBoost `_ regression: .. code:: python imputed = pre_impute(clean_data, method="XGBR") imputed .. code:: ┌───────────────┬────────┬────────┬──────────┬─────────────┬─────────────┬──────────────┐ │ experiment_id ┆ dose_a ┆ dose_b ┆ response ┆ dose_a_resp ┆ dose_b_resp ┆ resp_imputed │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ u32 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │ ╞═══════════════╪════════╪════════╪══════════╪═════════════╪═════════════╪══════════════╡ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 │ │ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │ │ 2 ┆ 1000.0 ┆ 3.16 ┆ null ┆ 98.92 ┆ 23.23 ┆ 97.64 │ │ 2 ┆ 1000.0 ┆ 56.23 ┆ null ┆ 98.92 ┆ 85.1 ┆ 97.81 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 │ │ 2 ┆ 1000.0 ┆ 1000.0 ┆ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 │ └───────────────┴────────┴────────┴──────────┴─────────────┴─────────────┴──────────────┘ We note that the `IterativeImputer` response returns above 100 (a telltale sign that something has gone awry), while this is not the case with the `XGBoost` imputation. Due to its increased accuracy, it's generally recommended to use the `XGBR` option when possible. *************** Factorization *************** Now that we've pre-imputed a full matrix, we can feed this matrix to matrix factorization algorithms. There are several algorithms available to us - NMF, SVD, RPCA, and PMF - and we can pick and choose which ones we want. For this example, let's use NMF and SVD. Xynergy makes this relatively painless: .. code:: python from xynergy.factor import matrix_factorize factored = matrix_factorize( imputed, method=["SVD", "NMF"], dose_cols=["dose_a", "dose_b"], response_col="resp_imputed", experiment_cols="experiment_id", ) factored .. code:: ┌──────────┬─────────────┬─────────────┬──────────────┬───┬────────┬────────┬──────────────────┬──────────────────┐ │ response ┆ dose_a_resp ┆ dose_b_resp ┆ resp_imputed ┆ … ┆ dose_a ┆ dose_b ┆ resp_imputed_NMF ┆ resp_imputed_SVD │ │ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │ │ f64 ┆ f64 ┆ f64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 │ ╞══════════╪═════════════╪═════════════╪══════════════╪═══╪════════╪════════╪══════════════════╪══════════════════╡ │ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.01 ┆ -0.04 │ │ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.01 ┆ -0.04 │ │ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.01 ┆ -0.04 │ │ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.01 ┆ -0.04 │ │ 0.0 ┆ 0.04 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.01 ┆ -0.04 │ │ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │ │ null ┆ 98.92 ┆ 23.23 ┆ 97.64 ┆ … ┆ 1000.0 ┆ 3.16 ┆ 97.07 ┆ 97.63 │ │ null ┆ 98.92 ┆ 85.1 ┆ 97.81 ┆ … ┆ 1000.0 ┆ 56.23 ┆ 98.88 ┆ 98.32 │ │ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 ┆ … ┆ 1000.0 ┆ 1000.0 ┆ 98.88 ┆ 99.98 │ │ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 ┆ … ┆ 1000.0 ┆ 1000.0 ┆ 98.88 ┆ 99.98 │ │ 99.99 ┆ 98.92 ┆ 99.08 ┆ 99.99 ┆ … ┆ 1000.0 ┆ 1000.0 ┆ 98.88 ┆ 99.98 │ └──────────┴─────────────┴─────────────┴──────────────┴───┴────────┴────────┴──────────────────┴──────────────────┘ .. NOTE:: Like in the case of `pre_impute`, since we are using the default column names from `tidy`, the default argument names for `dose_cols`, `response_col`, and `experiment_cols` are the same as the one we provided, so this is equivalent: .. code:: python factored = matrix_factorize(imputed, method=["SVD", "NMF"]) Like `pre_impute`, `matrix_factorize` appends new columns to the input - in this case, one for each method. ***************** Post-imputation ***************** Finally, we can use the resultant columns from `matrix_factorize` to predict a final response column: .. code:: python final = post_impute(factored) final .. NOTE:: By default, this function uses any columns that start with `resp_imputed_` to impute the final response, but the columns used - or the prefix searched for - can be manually set. As is the pattern with Xynergy, the default outputs of the previous function are the default inputs of this function, so in this case we don't need to set anything. .. code:: ┌───────────┬─────────────┬─────────────┬──────────────┬───┬────────┬────────┬──────────────────┬──────────────────┐ │ response ┆ dose_a_resp ┆ dose_b_resp ┆ resp_imputed ┆ … ┆ dose_a ┆ dose_b ┆ resp_imputed_NMF ┆ resp_imputed_SVD │ │ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │ │ f32 ┆ f64 ┆ f64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 │ ╞═══════════╪═════════════╪═════════════╪══════════════╪═══╪════════╪════════╪══════════════════╪══════════════════╡ │ 0.0 ┆ 0.11 ┆ 0.04 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.0 │ │ 0.0 ┆ 0.11 ┆ 0.04 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.0 │ │ 0.0 ┆ 0.11 ┆ 0.04 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.0 │ │ 0.0 ┆ 0.11 ┆ 0.04 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.0 │ │ 0.0 ┆ 0.11 ┆ 0.04 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.0 │ │ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │ │ 98.099998 ┆ 85.73 ┆ 98.94 ┆ 98.61 ┆ … ┆ 56.23 ┆ 1000.0 ┆ 99.02 ┆ 98.97 │ │ 98.559998 ┆ 99.01 ┆ 0.1 ┆ 97.71 ┆ … ┆ 1000.0 ┆ 0.01 ┆ 101.08 ┆ 98.45 │ │ 98.580002 ┆ 99.01 ┆ 1.71 ┆ 97.71 ┆ … ┆ 1000.0 ┆ 0.18 ┆ 100.37 ┆ 98.35 │ │ 98.010002 ┆ 99.01 ┆ 23.3 ┆ 97.71 ┆ … ┆ 1000.0 ┆ 3.16 ┆ 94.99 ┆ 97.6 │ │ 98.019997 ┆ 99.01 ┆ 84.16 ┆ 97.81 ┆ … ┆ 1000.0 ┆ 56.23 ┆ 98.73 ┆ 98.2 │ └───────────┴─────────────┴─────────────┴──────────────┴───┴────────┴────────┴──────────────────┴──────────────────┘ Note that unlike most other functions in `Xynergy`, this does not add an additional column, but modified a previously existing column (here `response`). Whereas before our `response` column had `null` values in it, those values have been imputed! ********** Accuracy ********** Normally you won't have the luxury of knowing how close the responses are to ground truth. In this particular case, I removed values from the original dataset before imputing them, so we can compare the original values to the imputed values to see how far off we were .. code:: python # Collapse replicates to a single value for every dose-pair by taking the mean final_summary = final.group_by( ["experiment_id", "dose_a", "dose_b"], ).agg( pl.col("response").mean(), ) # Join it with the original data that doesn't have any missing values with_og = og_data_summary.join(final_summary, on=["dose_a", "dose_b", "experiment_id"]) # As a simple summary, we can take the root mean squared error: # "response" is our prediction, "resp" is the original data rmse = ((with_og["response"] - with_og["resp"]) ** 2).mean() ** 0.5 rmse .. code:: 1.1592251324998666 Pretty good! We can get a better picture - literally - of how predicted and actual responses vary by using a built-in plotting helper function ********** Plotting ********** .. code:: python from xynergy.plot import plot_response_landscape for i, group in with_og.group_by("experiment_id"): plot_response_landscape( df=group, dose_cols=["dose_a", "dose_b"], response_col="response", reference_col="resp", scheme="redblue", color_min=-20, color_mid=0, color_max=20, ).save(f"{i[0]}_vs-og.png", ppi=800) `plot_reponse_landscape` plots the dose-response matrix, with doses as the axis and response as the color. If a `reference_col` argument is supplied, that column is subtracted from the `response_col`. So in this case, we're plotting how far off our predicted values were from the original values. .. list-table:: Deviation of imputed response from true response - - .. figure:: ./_static/1_vs-og.png :width: 350px - .. figure:: ./_static/2_vs-og.png :width: 350px We can see from these plots that Xynergy did a pretty good job estimating the original dataset - almost everything looks gray, implying the difference between imputed response and actual response is near 0. This function can also be useful for plotting differences from a given synergy model. Let's talk about that now. ********* Synergy ********* Simplistically, synergy is when a combination of drugs acts with greater effect combined 'than expected' from their individual effects. Usually, synergy models define what they would expect a no-synergy case (the null or reference), and then see how much the observed deviations vary from that. Xynergy provides several synergy models - Bliss independence, highest single agent (HSA), Loewe additivity, and zero interaction potency (ZIP). Generally, the best way to calculate synergy is with the `add_synergy` function: .. code:: python from xynergy.synergy import add_synergy with_synergy = add_synergy( final, ["dose_a", "dose_b"], "response", "experiment_id", ["bliss", "zip"] ) # Too many columns to show comfortably here - here are the important ones with_synergy[:, [8, 4, 9, 10, 13, 14]] .. code:: ┌───────────────┬───────────┬────────┬────────┬───────────┬─────────┐ │ experiment_id ┆ response ┆ dose_a ┆ dose_b ┆ bliss_syn ┆ zip_syn │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ u32 ┆ f32 ┆ f64 ┆ f64 ┆ f32 ┆ f64 │ ╞═══════════════╪═══════════╪════════╪════════╪═══════════╪═════════╡ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.02 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.02 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.02 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.02 │ │ 1 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ -0.02 │ │ … ┆ … ┆ … ┆ … ┆ … ┆ … │ │ 1 ┆ 99.989998 ┆ 1000.0 ┆ 1000.0 ┆ -0.0 ┆ -0.01 │ │ 1 ┆ 99.989998 ┆ 1000.0 ┆ 1000.0 ┆ 0.0 ┆ -0.01 │ │ 2 ┆ 99.989998 ┆ 1000.0 ┆ 1000.0 ┆ -0.0 ┆ -0.0 │ │ 2 ┆ 99.989998 ┆ 1000.0 ┆ 1000.0 ┆ -0.0 ┆ -0.0 │ │ 2 ┆ 99.989998 ┆ 1000.0 ┆ 1000.0 ┆ 0.0 ┆ -0.0 │ └───────────────┴───────────┴────────┴────────┴───────────┴─────────┘ We notice that a lot of these values are close to 0, implying that our combinations exhibit very little synergy (or antagonism). It's much easier to see these when plotting - let's plot the deviation from the zero-interaction potency reference model (`zip_syn`): .. code:: python for i, group in with_synergy.group_by("experiment_id"): plot_response_landscape( group, ["dose_a", "dose_b"], response_col="zip_syn", scheme="redblue", color_min=-20, color_mid=0, color_max=20, ).save(f"{i[0]}_zip.png", ppi=800) .. list-table:: ZIP synergy landscape for each experiment - - .. figure:: ./_static/1_zip.png :width: 350px - .. figure:: ./_static/2_zip.png :width: 350px You'll note here that pretty much everything looks gray. There's a good reason for this: I simulated these data to have 0 synergy under the Bliss independence model, and ZIP and Bliss are very similar. A smarter person would have created an example with an exciting synergy and antagonism that could be revealed through this process, but I am not that person. .. DANGER:: You may want to calculate just the reference model landscape for a given experiment. Xynergy *will* enable you to do this with `add_reference`, but prefer `add_synergy` if that's what you really want. For most *but not all* synergy models, subtracting the reference model from the observed responses will give you the same synergy score. However, ZIP does **not** do this and if you attempt to calculate synergy scores like this, you will get the wrong value.