Relative lift
This notebook shows how the library supports absolute and relative effects in MDE analysis and experiment analysis with OLS.
The steps are:
- Create a dataframe at the customer level
- Run MDE analysis with absolute and relative effects, and compare the two
- Run experiment analysis with absolute and relative effects, and compare the two
Dataframe creation
```python
import numpy as np
import pandas as pd
from copy import deepcopy

from cluster_experiments import AnalysisPlan, NormalPowerAnalysis


def get_user_df(n_users=10_000):
    df = pd.DataFrame(
        {
            "customer_id": np.arange(n_users),
            "orders_pre": np.random.poisson(10, n_users),
            "_treatment": np.random.rand(n_users) > 0.5,
            "X1": np.random.poisson(1, n_users),
            "X2": np.random.poisson(2, n_users),
        }
    )
    df = df.assign(
        **{
            "_treatment": df["_treatment"].astype(int),
            "orders": lambda x: x["orders_pre"] + 2 * x["X1"] + x["X2"] + 0.1 * x["_treatment"],
        }
    )
    df = df.assign(
        **{
            "center_X1": lambda x: x["X1"] - x["X1"].mean(),
            "center_X2": lambda x: x["X2"] - x["X2"].mean(),
        }
    )
    df["_treatment"] = df["_treatment"].map({0: "A", 1: "B"})
    return df


user_df = get_user_df()
```
MDE analysis with absolute and relative effects
First we create NormalPowerAnalysis objects for absolute and relative effects
```python
config_relative = {
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "relative_effect": True,
    "target_col": "orders",
    "covariates": ["X1"],
}
config_vanilla = deepcopy(config_relative)
config_vanilla["relative_effect"] = False

pw_relative = NormalPowerAnalysis.from_dict(config_relative)
pw_vanilla = NormalPowerAnalysis.from_dict(config_vanilla)
```
Calculate the relative MDE:
```python
pw_relative.mde(
    user_df,
    n_simulations=1,
)
```
0.013935217149223833
Calculate the absolute MDE, which is on a different scale:
```python
pw_vanilla.mde(
    user_df,
    n_simulations=1,
)
```
0.19537228277979887
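For intuition, under a normal approximation the absolute MDE is roughly $(z_{1-\alpha/2} + z_{\text{power}})$ times the standard error of the effect estimate. Below is a minimal sketch of that formula outside the library, using a hypothetical `normal_mde` helper and a plain two-sample standard error with no covariate adjustment, so its output will not match the covariate-adjusted number above:

```python
import numpy as np
from scipy import stats

def normal_mde(y, alpha=0.05, power=0.8):
    # SE of a difference in means under a 50/50 split:
    # sqrt(var/n_t + var/n_c) = sqrt(4 * var / n)
    se = np.sqrt(4 * y.var(ddof=1) / len(y))
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    return z * se

rng = np.random.default_rng(0)
y = rng.poisson(10, 10_000).astype(float)
print(normal_mde(y))  # roughly 0.18 for Poisson(10) outcomes
```

The covariates in the library's config shrink the residual variance, which is why its MDE above is smaller than this unadjusted sketch.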
Dividing the absolute MDE by the baseline mean yields a slightly lower relative MDE, because this naive conversion ignores the variance of the baseline.
```python
float(
    pw_vanilla.mde(
        user_df,
        n_simulations=1,
    )
    / user_df["orders"].mean()
)
```
0.013868501242931779
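Why does the naive division understate uncertainty? If the relative effect is viewed as a ratio of two random means, the delta method keeps a variance term for the control mean that dividing by a "known" baseline drops. A standalone sketch on synthetic data (this is an illustration, not the library's internal computation):

```python
import numpy as np

rng = np.random.default_rng(42)
control = rng.poisson(10, 5_000).astype(float)
treated = rng.poisson(11, 5_000).astype(float)

mu_c, mu_t = control.mean(), treated.mean()
var_c = control.var(ddof=1) / len(control)
var_t = treated.var(ddof=1) / len(treated)

# Naive relative SE: divide the absolute SE by the baseline mean,
# treating the baseline as a known constant.
se_naive = np.sqrt(var_t + var_c) / mu_c

# Delta-method SE for the ratio mu_t / mu_c - 1, which keeps the
# contribution of the baseline's own sampling variance.
se_delta = np.sqrt(var_t / mu_c**2 + (mu_t**2 / mu_c**4) * var_c)

print(se_naive, se_delta)  # the delta-method SE is slightly larger here
```

When the treatment mean exceeds the control mean, the delta-method SE is strictly larger than the naive one, matching the slightly smaller naive relative MDE seen above.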
Experiment analysis with absolute and relative effects
First we create AnalysisPlan objects for absolute and relative effects
```python
relative_plan_config = {
    "metrics": [
        {"alias": "Orders", "name": "orders"},
    ],
    "variants": [
        {"name": "A", "is_control": True},
        {"name": "B", "is_control": False},
    ],
    "analysis_type": "ols",
    "variant_col": "_treatment",
    "analysis_config": {"relative_effect": True, "covariates": ["X1"]},
}
vanilla_plan_config = deepcopy(relative_plan_config)
vanilla_plan_config["analysis_config"] = {"covariates": ["X1"]}

relative_plan = AnalysisPlan.from_metrics_dict(relative_plan_config)
vanilla_plan = AnalysisPlan.from_metrics_dict(vanilla_plan_config)
```
Now we run the analysis for both plans
```python
results_rel = relative_plan.analyze(user_df)
results_vanilla = vanilla_plan.analyze(user_df)
```
The results are on different scales, as one effect is absolute and the other relative.
```python
results_rel.to_dataframe()[["ate", "ate_ci_lower", "ate_ci_upper", "std_error", "p_value"]]
```

|   | ate | ate_ci_lower | ate_ci_upper | std_error | p_value |
|---|---|---|---|---|---|
| 0 | 0.00147 | -0.008271 | 0.011211 | 0.00497 | 0.767389 |
```python
results_vanilla.to_dataframe()[["ate", "ate_ci_lower", "ate_ci_upper", "std_error", "p_value"]]
```

|   | ate | ate_ci_lower | ate_ci_upper | std_error | p_value |
|---|---|---|---|---|---|
| 0 | 0.020644 | -0.116049 | 0.157337 | 0.069743 | 0.767224 |
Dividing the absolute results by the baseline to get a relative effect and confidence intervals ignores the variance of the baseline, leading to slightly narrower intervals.
```python
results_df = results_vanilla.to_dataframe()
results_df[["ate", "ate_ci_lower", "ate_ci_upper", "std_error"]] /= results_df["control_variant_mean"].squeeze()
results_df[["ate", "ate_ci_lower", "ate_ci_upper", "std_error", "p_value"]]
```

|   | ate | ate_ci_lower | ate_ci_upper | std_error | p_value |
|---|---|---|---|---|---|
| 0 | 0.00147 | -0.008264 | 0.011204 | 0.004966 | 0.767224 |