Xynergy allows you to calculate drug synergy from dose-response matrices
with minimal data. It does this by imputing the missing data, then
calculating synergy as normal.
Xynergy bundles a small, real example workbook at
xynergy/example_data/data.xlsx. The load_example_data() helper reads
that workbook, renames the columns to Xynergy’s canonical names, and
converts the worksheet’s viability-style percentages into
inhibition-style response percentages.
The example data contain a single drug pair measured at six dose levels
on each axis. Only the single-agent responses (the edges of the dose
matrix, where the other drug is absent) plus the dose-matched diagonal
are observed, so the data still have the sparse “minimum combination
data” shape shown in the left figure above.
The bundled workbook has:
- One experiment
- One drug pair: Venetoclax + Volasertib
- Six dose levels on each axis
- Sixteen observed combinations, which expand to a 6 x 6 matrix after tidying
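To make that shape concrete, the sketch below enumerates which cells of a 6 x 6 dose grid an axis-plus-diagonal design observes. It assumes the lowest of the six dose levels is zero, so row 0 and column 0 hold the single-agent responses; under that assumption the count matches the sixteen observed combinations listed above. The function name is hypothetical and is not part of Xynergy.

```python
def axis_plus_diagonal_cells(n_levels):
    """Return the (dose_a index, dose_b index) cells observed in an
    axis-plus-diagonal design on an n_levels x n_levels dose grid.

    Assumes index 0 is the zero dose, so row 0 and column 0 are the
    single-agent measurements.
    """
    cells = set()
    for i in range(n_levels):
        cells.add((i, 0))  # drug A alone (drug B at zero dose)
        cells.add((0, i))  # drug B alone (drug A at zero dose)
        cells.add((i, i))  # dose-matched diagonal
    return cells


observed = axis_plus_diagonal_cells(6)
print(len(observed), "of", 6 * 6, "cells observed")  # 16 of 36 cells observed

# Draw the layout: "x" = observed, "." = missing (later filled with null)
for a in range(6):
    print("".join("x" if (a, b) in observed else "." for b in range(6)))
```

The zero row and column share the (0, 0) cell, and the diagonal shares it too, which is why 6 + 6 + 6 cells collapse to 16.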
The code snippets below now use this bundled workbook example. Some
printed output tables later in the page are illustrative summaries from
older runs and may not match the workbook values exactly.
If you want to work with the raw workbook columns instead, use
load_example_data(raw=True).
The workbook example already has one response per row, so tidy mainly
does three things for us here:
- Checks the input columns
- Creates experiment_id
- Expands the sparse axis-plus-diagonal observations into the full
  matrix, filling missing combinations with null
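The expansion step can be pictured with a small pure-Python sketch. It is not Xynergy’s tidy implementation (which works on data frames, validates columns, and builds experiment_id); the function name and the list-of-dicts row layout are assumptions made for illustration, with None standing in for null.

```python
from collections import defaultdict


def expand_to_full_grid(rows):
    """Expand sparse dose-response rows to every dose_a x dose_b pair per
    experiment, using None for combinations that were not measured.

    Each row is a dict with experiment_id, dose_a, dose_b and response keys.
    Replicates (several rows for one dose pair) are kept as separate rows.
    """
    measured = defaultdict(list)
    doses_a, doses_b = defaultdict(set), defaultdict(set)
    for r in rows:
        exp = r["experiment_id"]
        measured[(exp, r["dose_a"], r["dose_b"])].append(r["response"])
        doses_a[exp].add(r["dose_a"])
        doses_b[exp].add(r["dose_b"])

    full = []
    for exp in sorted(doses_a):
        for a in sorted(doses_a[exp]):
            for b in sorted(doses_b[exp]):
                # .get() does not trigger the defaultdict factory, so an
                # unmeasured pair yields a single placeholder row
                for resp in measured.get((exp, a, b), [None]):
                    full.append({"experiment_id": exp, "dose_a": a,
                                 "dose_b": b, "response": resp})
    return full


# Hypothetical axis-plus-diagonal data on a 3 x 3 grid (doses 0, 1, 10)
sparse = [
    {"experiment_id": "exp1", "dose_a": a, "dose_b": b, "response": r}
    for (a, b), r in {
        (0, 0): 0.0, (1, 0): 20.0, (10, 0): 60.0,
        (0, 1): 10.0, (0, 10): 50.0,
        (1, 1): 30.0, (10, 10): 85.0,
    }.items()
]
full = expand_to_full_grid(sparse)
print(len(full))  # 9
print([(r["dose_a"], r["dose_b"]) for r in full if r["response"] is None])
# [(1, 10), (10, 1)]
```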
Because load_example_data() already returns the preferred column names
dose_a, dose_b, and response, we only need to specify the columns
that identify an experiment:
Actually, since we ran our data through tidy, our column names already
match the defaults of downstream functions, so as a bonus we can omit
many of these arguments:
I’ve removed the experimental columns - line, name_a, name_b,
and replicate - to ensure all the data fit easily on the screen.
However, they would normally be returned.
You’ll notice a new column, resp_imputed, which holds values even in
the rows where the response column is null. These are our
(pre-)imputed values!
For some algorithms, we can improve imputation accuracy by including
columns that contain the single-agent responses, that is, the response
we would get with JUST drug_a or JUST drug_b at that dose (refer to the
figure below).
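As a minimal sketch of what such columns contain, the code below attaches each row’s single-agent responses by looking up the same dose of one drug with the other drug at zero. It assumes a dose of 0 means the drug is absent and averages any replicates; the column names resp_a_alone and resp_b_alone are made up for this sketch and are not necessarily what Xynergy uses.

```python
from collections import defaultdict
from statistics import mean


def add_single_agent_columns(rows):
    """Attach each row's single-agent responses: the mean response at the
    same dose_a with dose_b == 0, and at the same dose_b with dose_a == 0.

    Assumes a dose of 0 means "drug absent"; None marks a missing response.
    """
    alone_a, alone_b = defaultdict(list), defaultdict(list)
    for r in rows:
        if r["response"] is None:
            continue
        if r["dose_b"] == 0:
            alone_a[(r["experiment_id"], r["dose_a"])].append(r["response"])
        if r["dose_a"] == 0:
            alone_b[(r["experiment_id"], r["dose_b"])].append(r["response"])

    def lookup(table, key):
        values = table.get(key)
        return mean(values) if values else None

    return [
        {
            **r,
            "resp_a_alone": lookup(alone_a, (r["experiment_id"], r["dose_a"])),
            "resp_b_alone": lookup(alone_b, (r["experiment_id"], r["dose_b"])),
        }
        for r in rows
    ]


rows = [
    {"experiment_id": "exp1", "dose_a": 0, "dose_b": 0, "response": 0.0},
    {"experiment_id": "exp1", "dose_a": 1, "dose_b": 0, "response": 20.0},
    {"experiment_id": "exp1", "dose_a": 0, "dose_b": 1, "response": 10.0},
    {"experiment_id": "exp1", "dose_a": 1, "dose_b": 1, "response": None},
]
out = add_single_agent_columns(rows)
print(out[3]["resp_a_alone"], out[3]["resp_b_alone"])  # 20.0 10.0
```

The unmeasured (1, 1) combination now carries both single-agent responses as features, which is the kind of extra signal an imputation model can exploit.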
These results are ok, and they certainly were quick to get. However,
if we have the time, we can significantly improve the accuracy of our
imputation using XGBoost regression:
Note that the IterativeImputer returns responses above 100 (a
telltale sign that something has gone awry), while the XGBoost
imputation does not. Because of its increased accuracy, it’s
generally recommended to use the XGBR option when possible.
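If you want to check for this yourself, a quick scan of the pre-imputed values is enough. The sketch below flags rows whose resp_imputed value exceeds 100, following the rule of thumb above; the helper name, the list-of-dicts row layout, and the optional lower bound are assumptions for illustration.

```python
def flag_out_of_range(rows, col="resp_imputed", high=100.0, low=None):
    """Return rows whose value in `col` lies above `high` (or below `low`,
    if given). Rows with a missing value in `col` are skipped.
    """
    flagged = []
    for r in rows:
        value = r.get(col)
        if value is None:
            continue
        if value > high or (low is not None and value < low):
            flagged.append(r)
    return flagged


# Hypothetical pre-imputed rows
rows = [
    {"dose_a": 1, "dose_b": 10, "resp_imputed": 72.5},
    {"dose_a": 10, "dose_b": 1, "resp_imputed": 104.3},
]
print([(r["dose_a"], r["dose_b"]) for r in flag_out_of_range(rows)])  # [(10, 1)]
```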
Now that we’ve pre-imputed a full matrix, we can feed this matrix to
matrix factorization algorithms. There are several algorithms available
to us - NMF, SVD, RPCA, and PMF - and we can pick and choose which ones
we want. For this example, let’s use NMF and SVD. Xynergy makes this
relatively painless:
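As a generic illustration of how a matrix factorization can refine a pre-imputed matrix (not Xynergy’s own NMF or SVD code), the sketch below uses iterative truncated SVD, sometimes called “hard impute”: starting from a pre-imputed matrix, it repeatedly fits a low-rank approximation and overwrites only the cells that were originally missing. The function name, the rank, and the stopping rule are assumptions; Xynergy’s algorithms may differ, for example in how they choose the rank.

```python
import numpy as np


def low_rank_refine(observed, pre_imputed, rank=1, n_iter=1000, tol=1e-10):
    """Refine the missing cells of a dose-response matrix with a low-rank SVD.

    observed:    2-D array with np.nan wherever the response was not measured
    pre_imputed: same-shaped array with every cell filled (a first-pass guess)
    Measured cells are never changed; only the nan cells are updated.
    """
    observed = np.asarray(observed, dtype=float)
    missing = np.isnan(observed)
    filled = np.where(missing, pre_imputed, observed)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Best rank-`rank` approximation of the current matrix
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        updated = np.where(missing, approx, observed)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled


# A noiseless rank-1 "response" matrix with two cells hidden
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
observed = truth.copy()
observed[1, 1] = observed[2, 0] = np.nan
# First pass: fill the gaps with the mean of the measured cells
first_pass = np.where(np.isnan(observed), np.nanmean(observed), observed)

refined = low_rank_refine(observed, first_pass, rank=1)
print(np.round(refined[[1, 2], [1, 0]], 3))  # should recover the hidden 4 and 3
```

Real dose-response matrices are noisy and not exactly low rank, so in practice the rank is a modeling choice rather than something known in advance.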
As with pre_impute, since we are using the default column names from
tidy, the default values of the dose_cols, response_col, and
experiment_cols arguments are the same as the ones we provided, so
this is equivalent:
Finally, we can use the resultant columns from matrix_factorize to
predict a final response column:
final = post_impute(factored)
final
Note
By default, this function uses any columns that start with
resp_imputed_ to impute the final response, but the columns used -
or the prefix searched for - can be manually set. As is the pattern
with Xynergy, the default outputs of the previous function are the
default inputs of this function, so in this case we don’t need to set
anything.
Note that, unlike most other functions in Xynergy, this does not add a
new column but modifies an existing one (here response). Whereas our
response column previously had null values in it, those values have
now been imputed!
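The fill-in behavior described above can be sketched in plain Python. The combining rule here, a plain average of the columns whose names start with resp_imputed_, is an assumption made for the sketch; post_impute may combine those columns differently. The column names resp_imputed_nmf and resp_imputed_svd are hypothetical.

```python
from statistics import mean


def fill_from_prefixed(rows, prefix="resp_imputed_", target="response"):
    """Fill None values in `target` with the mean of the columns whose names
    start with `prefix`; rows that already have a response are left as-is.
    Returns new dicts and leaves the input rows unchanged.
    """
    out = []
    for r in rows:
        if r[target] is None:
            candidates = [v for k, v in r.items()
                          if k.startswith(prefix) and v is not None]
            r = {**r, target: mean(candidates) if candidates else None}
        out.append(r)
    return out


rows = [
    {"response": 40.0, "resp_imputed": 40.0,
     "resp_imputed_nmf": 41.0, "resp_imputed_svd": 39.0},
    {"response": None, "resp_imputed": 50.0,
     "resp_imputed_nmf": 55.0, "resp_imputed_svd": 57.0},
]
final = fill_from_prefixed(rows)
print([r["response"] for r in final])  # [40.0, 56.0]
```

Note that the plain resp_imputed column does not match the resp_imputed_ prefix, so only the factorization outputs feed the final value.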
Normally you won’t have the luxury of knowing how close the responses
are to ground truth. In this particular case, I removed values from the
original dataset before imputing them, so we can compare the original
values to the imputed values to see how far off we were:
import polars as pl

# Collapse replicates to a single value for every dose pair by taking the mean
final_summary = final.group_by(
    ["experiment_id", "dose_a", "dose_b"],
).agg(
    pl.col("response").mean(),
)
# Join it with the original data that doesn't have any missing values
# (og_data_summary holds the original responses in its "resp" column)
with_og = og_data_summary.join(
    final_summary, on=["dose_a", "dose_b", "experiment_id"]
)
# As a simple summary, we can take the root mean squared error:
# "response" is our prediction, "resp" is the original data
rmse = ((with_og["response"] - with_og["resp"]) ** 2).mean() ** 0.5
rmse
1.1592251324998666
Pretty good! We can get a better picture - literally - of how predicted
and actual responses vary by using a built-in plotting helper function:
plot_reponse_landscape plots the dose-response matrix, with the doses on
the axes and the response as the color. If a reference_col argument is
supplied, that column is subtracted from response_col. So in this
case, we’re plotting how far off our predicted values were from the
original values.
We can see from these plots that Xynergy did a pretty good job
estimating the original dataset - almost everything looks gray, implying
the difference between imputed response and actual response is near 0.
This function can also be useful for plotting differences from a given
synergy model. Let’s talk about that now.
Put simply, synergy is when a combination of drugs has a greater effect
than expected from the drugs’ individual effects. Most synergy models
define what they would expect in the no-synergy case (the null, or
reference, model) and then measure how far the observed responses
deviate from it.
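For the two simplest reference models, the expected no-synergy response can be written directly in terms of the single-agent inhibitions $E_A$ and $E_B$ (in percent) at the same doses, and the synergy score is the deviation of the observed combination response $E_{AB}$ from that reference. These are the standard textbook definitions rather than a description of Xynergy’s internals; Loewe additivity and ZIP rely on fitted dose-response curves and are not shown.

```latex
\begin{aligned}
E_{\text{Bliss}} &= E_A + E_B - \frac{E_A E_B}{100} \\
E_{\text{HSA}}   &= \max(E_A, E_B) \\
S &= E_{AB} - E_{\text{ref}}, \qquad S > 0 \ \text{(synergy)}, \quad S < 0 \ \text{(antagonism)}
\end{aligned}
```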
Xynergy provides several synergy models - Bliss independence, highest
single agent (HSA), Loewe additivity, and zero interaction potency
(ZIP). Generally, the best way to calculate synergy is with the
add_synergy function:
from xynergy.synergy import add_synergy

with_synergy = add_synergy(
    final,
    ["dose_a", "dose_b"],
    "response",
    "experiment_id",
    ["bliss", "zip"],
)
# Too many columns to show comfortably here - here are the important ones
with_synergy[:, [8, 4, 9, 10, 13, 14]]
We notice that a lot of these values are close to 0, implying that our
combinations exhibit very little synergy (or antagonism). It’s much
easier to see these when plotting - let’s plot the deviation from the
zero-interaction potency reference model (zip_syn):
You’ll note here that pretty much everything looks gray. There’s a good
reason for this: I simulated these data to have 0 synergy under the
Bliss independence model, and ZIP and Bliss are very similar. A smarter
person would have created an example with an exciting synergy and
antagonism that could be revealed through this process, but I am not
that person.
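A quick numerical check shows why Bliss-null data look gray, and why the choice of reference model matters. The single-agent inhibitions below are made up; the sketch builds a combination response that follows Bliss independence exactly and then scores it against both the Bliss and HSA references using the textbook formulas, not Xynergy’s code.

```python
def bliss_expected(e_a, e_b):
    """Bliss-independence expected inhibition (%) from single-agent inhibitions (%)."""
    return e_a + e_b - e_a * e_b / 100.0


def hsa_expected(e_a, e_b):
    """Highest-single-agent expected inhibition (%)."""
    return max(e_a, e_b)


# Hypothetical single-agent inhibitions (%) at one dose pair
e_a, e_b = 40.0, 50.0
# A combination simulated to follow Bliss independence exactly
e_ab = bliss_expected(e_a, e_b)  # 40 + 50 - 20 = 70.0

print(e_ab - bliss_expected(e_a, e_b))  # 0.0  -> no Bliss synergy
print(e_ab - hsa_expected(e_a, e_b))    # 20.0 -> looks synergistic under HSA
```

When both inhibitions lie between 0 and 100%, the Bliss expectation is never below the stronger single agent, so data simulated with zero Bliss synergy will generally score as synergistic under HSA.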
Danger
You may want to calculate just the reference model landscape for a
given experiment. Xynergy lets you do this with add_reference, but if
synergy scores are what you really want, use add_synergy. For most,
but not all, synergy models, subtracting the reference model from the
observed responses gives the same synergy score. ZIP is an exception:
if you attempt to calculate ZIP synergy scores this way, you will get
the wrong values.