Xynergy: One-stop shop
- xynergy.xynergy.xynergy(df: DataFrame, dose_cols: list[str], response_col: str, experiment_cols: list[str] | str | None, response_is_percent: bool, complete_response_is_0: bool, pre_impute_method: str = 'RBFSurface', pre_impute_target: str = 'response', pre_impute_reference_for_target: str = 'bliss', pre_impute_clip_response_bounds: tuple[float, float] | None = (0.0, 100.0), factorization_method: list[str] | str = ['NMF', 'SVD', 'RPCA'], synergy_method: list[str] | str = ['bliss', 'hsa', 'loewe', 'zip'], use_single_drug_response_data: bool = True, post_impute_tuning: str = 'Predefined', log: str = 'all') -> DataFrame
Impute missing values and calculate synergy from one or more dose-response matrices.
Parameters
- df: polars.DataFrame
Contains, minimally, one response and two agent doses per row.
- dose_cols: list
A list of exactly two column names that contain untransformed numeric agent doses.
- response_col: string
The name of the column that contains response data. Can be multiple columns, though they will be unpivoted to a single column.
- experiment_cols: list[str] or string, optional
The names of columns that should be used to distinguish one dose pair’s response from another. If none are supplied, two rows with the same doses will be considered replicates. One common application of this might be to provide columns containing the names of drugs used.
- response_is_percent: bool
Is the response a percentage (ranges from 0-100) or is it a probability/ratio (ranges from 0-1)?
- complete_response_is_0: bool
Is the response reported as, for instance, survival, where a complete response would be 0? Or is it something like killing, where a complete response would be 1 (when response_is_percent = False) or 100 (when response_is_percent = True)?
- pre_impute_method: string, default “RBFSurface”
“RBFSurface” (recommended): RBF interpolation of Bliss residuals in log-dose space. Exploits pharmacological smoothness of dose-response surfaces. Very fast and generally the most accurate method.
“GaussianProcessSurface”: Gaussian-process regression in log-dose space. Slower than RBFSurface, but a strong non-parametric surface benchmark.
“MatrixCompletion”: Iterative rank-truncated SVD that exploits the low-rank structure of dose-response matrices.
“XGBR”: Slowest, most accurate of the tabular methods.
“RandomForest”: Roughly medium speed and accuracy.
“LassoCV”: Fast, poor accuracy. Not recommended.
Otherwise: The default sklearn IterativeImputer (fastest, sometimes more accurate than LassoCV).
- pre_impute_target: string, default “response”
Passed through to pre_impute. Options are “response”, “combo_effect”, and “ensemble”.
- pre_impute_reference_for_target: string, default “bliss”
Passed through to pre_impute when pre_impute_target in ["combo_effect", "ensemble"]. Options currently include “bliss” and “hsa”.
- pre_impute_clip_response_bounds: tuple[float, float] | None, default (0.0, 100.0)
Bounds applied by pre_impute to reconstructed responses. Set to None to disable clipping.
- factorization_method: list[str] or str, default [“NMF”, “SVD”, “RPCA”]
The method(s) used for matrix factorization. Options include NMF, SVD, PMF, and RPCA.
- synergy_method: list[str] or str, default [“bliss”, “hsa”, “loewe”, “zip”]
The method(s) used for calculating synergy.
- use_single_drug_response_data: bool, default True
Some methods, like RandomForest, perform better when the dataset contains columns with the responses of, say, ‘drug A’ at ‘dose_a’ (no combination). If this parameter is True, these single-agent responses are calculated automatically and included as data for imputation. You might set this to False if you want to include your own data, in which case you would add the names of those columns to additional_imputation_cols. In general, this step can only help and is relatively quick.
- post_impute_tuning: string, default “Predefined”
Strategy for tuning XGBoost hyperparameters during post-imputation.
“Predefined”: Use fixed hyperparameters (very fast).
“RandomizedSearchCV”: Random subset search (moderate speed).
“GridSearchCV”: Exhaustive grid search (slowest).
- log: string, default “all”
Verbosity of function. Options include “all”, “warn”, and “none”.
If “all”, will emit notes and warnings.
If “warn”, will emit only warnings.
If “none”, will not emit anything (except errors).
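The “bliss” and “hsa” options accepted by pre_impute_reference_for_target and synergy_method name standard reference models for the expected effect of a non-interacting drug pair. The following is a minimal sketch of the arithmetic, not the library's own code. The function names are hypothetical, and it assumes responses are already expressed as % inhibition and that the score is the excess of the observed response over the reference.

```python
def bliss_expected(inh_a, inh_b):
    """Expected % inhibition of a combination under Bliss independence."""
    fa, fb = inh_a / 100.0, inh_b / 100.0
    # Probabilistic independence of the two fractional effects
    return 100.0 * (fa + fb - fa * fb)


def hsa_expected(inh_a, inh_b):
    """Expected % inhibition under the highest-single-agent (HSA) model."""
    return max(inh_a, inh_b)


def excess_over_reference(observed, expected):
    """Assumed scoring rule: positive means more inhibition than expected."""
    return observed - expected


# Hypothetical single-agent and combination responses (% inhibition)
single_a, single_b, observed = 40.0, 30.0, 70.0

bliss_score = excess_over_reference(observed, bliss_expected(single_a, single_b))
hsa_score = excess_over_reference(observed, hsa_expected(single_a, single_b))
print(round(bliss_score, 6), round(hsa_score, 6))
```

Under these assumptions both scores are positive for the made-up numbers, which would read as synergy against either reference.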
Returns
- polars.DataFrame
Data will be ‘tidy’, with each row containing a single response. Additionally, the following columns will be appended:
resp_imputed column (plus [dose_cols]_resp if use_single_drug_response_data = True). Contains the response imputed by pre_impute_method.
response_[factorization_method]. Contains the supplied response values approximated by the indicated method.
[synergy_method]_syn. Contains the synergy score, where positive numbers indicate synergy and negative numbers indicate antagonism.
Additionally, missing response_col values are imputed, and the responses are converted (if necessary) to ‘% inhibition style’ (0 = no inhibition, 100 = complete inhibition).
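To make the ‘% inhibition style’ convention concrete, here is a small sketch of how response_is_percent and complete_response_is_0 could map a raw value onto that scale. The helper name is hypothetical and the conversion is an assumption based on the parameter descriptions, not a copy of what the library does internally.

```python
def to_percent_inhibition(value, response_is_percent, complete_response_is_0):
    """Map a raw response onto 0 = no inhibition, 100 = complete inhibition."""
    if value is None:
        return None  # missing responses stay missing until imputation
    # Put ratios (0-1) on the same 0-100 scale as percentages
    pct = value if response_is_percent else value * 100.0
    # Survival-style data (complete response = 0) is flipped
    return 100.0 - pct if complete_response_is_0 else pct


# Survival reported as a ratio: 0.25 surviving -> 75% inhibition
print(to_percent_inhibition(0.25, False, True))
# Killing reported as a percentage is already in the target style
print(to_percent_inhibition(80.0, True, False))
```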
Notes
This function is essentially several functions in a trenchcoat: it runs tidy, pre_impute, matrix_factorize, post_impute, and add_synergy in sequence. These functions can be called individually if you want a bit more control or need to do something between steps. For additional information, see the documentation of the individual functions.
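A call might look like the following untested sketch. The toy data, the column names (drug_a, drug_b, dose_a, dose_b, inhibition), and the helpers build_toy_rows and run_xynergy are hypothetical; only the keyword arguments and the import path are taken from the signature documented here. It assumes polars and xynergy are installed, and it has not been checked that a grid this small is sufficient for every imputation or factorization method.

```python
# Hypothetical toy data: column names, doses, and responses are made up.
DOSES = [0.0, 1.0, 10.0, 100.0]


def build_toy_rows():
    """Return a 4x4 dose grid as row dicts, with one combination left missing."""
    rows = []
    for dose_a in DOSES:
        for dose_b in DOSES:
            # Made-up single-agent effects from a simple saturating curve
            fa = dose_a / (dose_a + 10.0)
            fb = dose_b / (dose_b + 10.0)
            rows.append(
                {
                    "drug_a": "A",
                    "drug_b": "B",
                    "dose_a": dose_a,
                    "dose_b": dose_b,
                    # % inhibition, so a complete response is 100
                    "inhibition": round(100.0 * (fa + fb - fa * fb), 2),
                }
            )
    rows[10]["inhibition"] = None  # the (10.0, 10.0) well is unmeasured
    return rows


def run_xynergy(rows):
    """Call xynergy on the toy rows; needs polars and xynergy installed."""
    import polars as pl
    from xynergy.xynergy import xynergy  # path taken from the documented name

    df = pl.DataFrame(rows)
    return xynergy(
        df,
        dose_cols=["dose_a", "dose_b"],
        response_col="inhibition",
        experiment_cols=["drug_a", "drug_b"],
        response_is_percent=True,
        complete_response_is_0=False,
        factorization_method="SVD",
        synergy_method=["bliss", "hsa"],
        log="warn",
    )
```

Going by the Returns section, run_xynergy(build_toy_rows()) would be expected to give back a tidy polars.DataFrame with the imputed, factorized, and synergy columns appended.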