Experiment Analysis¶
This notebook demonstrates how to analyze the results of an experiment and get a useful scorecard as a summary of the analysis.
The experiment is an A/B/C test where we randomise users of a food delivery app to test a new ranking algorithm, and we compare the monetary value and the delivery time of the orders created.
First of all we import the necessary libraries and generate some fake data about our experiment.
import pandas as pd
import numpy as np

from cluster_experiments import (
    AnalysisPlan,
    SimpleMetric,
    Dimension,
    Variant,
    HypothesisTest,
    TargetAggregation,
)
def generate_fake_data():
    # Constants
    NUM_ORDERS = 10_000
    NUM_CUSTOMERS = 3_000
    EXPERIMENT_GROUPS = ['control', 'treatment_1', 'treatment_2']
    GROUP_SIZE = NUM_CUSTOMERS // len(EXPERIMENT_GROUPS)

    # Generate customer ids
    customer_ids = np.arange(1, NUM_CUSTOMERS + 1)

    # Shuffle and split customer ids into experiment groups;
    # any remainder after the even split is assigned at random
    np.random.shuffle(customer_ids)
    experiment_group = np.repeat(EXPERIMENT_GROUPS, GROUP_SIZE)
    experiment_group = np.concatenate(
        (experiment_group,
         np.random.choice(EXPERIMENT_GROUPS, NUM_CUSTOMERS - len(experiment_group)))
    )

    # Assign customers to groups
    customer_group_mapping = dict(zip(customer_ids, experiment_group))

    # Generate orders
    order_ids = np.arange(1, NUM_ORDERS + 1)
    customers = np.random.choice(customer_ids, NUM_ORDERS)
    order_values = np.abs(np.random.normal(loc=10, scale=2, size=NUM_ORDERS))  # positive, centred around 10
    order_delivery_times = np.abs(np.random.normal(loc=30, scale=5, size=NUM_ORDERS))  # positive, centred around 30 minutes
    order_city_codes = np.random.randint(1, 3, NUM_ORDERS)  # random city codes, either 1 or 2

    # Create the experiment DataFrame
    data = {
        'order_id': order_ids,
        'customer_id': customers,
        'experiment_group': [customer_group_mapping[customer_id] for customer_id in customers],
        'order_value': order_values,
        'order_delivery_time_in_minutes': order_delivery_times,
        'order_city_code': order_city_codes
    }
    df = pd.DataFrame(data)
    df.order_city_code = df.order_city_code.astype(str)

    # Build a noisy pre-experiment dataset from a third of the orders
    pre_exp_df = df.assign(
        order_value=lambda df: df['order_value'] + np.random.normal(loc=0, scale=1, size=NUM_ORDERS),
        order_delivery_time_in_minutes=lambda df: df['order_delivery_time_in_minutes'] + np.random.normal(loc=0, scale=2, size=NUM_ORDERS)
    ).sample(int(NUM_ORDERS / 3))

    return df, pre_exp_df


df, pre_exp_df = generate_fake_data()
# Show the first few rows of both DataFrames
display(df.head())
display(pre_exp_df.head())
|   | order_id | customer_id | experiment_group | order_value | order_delivery_time_in_minutes | order_city_code |
|---|---|---|---|---|---|---|
| 0 | 1 | 753 | treatment_1 | 8.044906 | 26.518285 | 1 |
| 1 | 2 | 1477 | treatment_1 | 12.340373 | 35.784601 | 1 |
| 2 | 3 | 2257 | treatment_2 | 8.976782 | 28.306930 | 2 |
| 3 | 4 | 2931 | treatment_2 | 8.718898 | 33.730940 | 2 |
| 4 | 5 | 1550 | treatment_2 | 11.877033 | 33.375767 | 2 |
|   | order_id | customer_id | experiment_group | order_value | order_delivery_time_in_minutes | order_city_code |
|---|---|---|---|---|---|---|
| 3380 | 3381 | 504 | control | 13.536650 | 23.508819 | 1 |
| 3828 | 3829 | 2453 | treatment_2 | 6.571081 | 31.107911 | 1 |
| 2227 | 2228 | 1181 | treatment_1 | 10.060709 | 33.062087 | 2 |
| 820 | 821 | 860 | control | 9.676720 | 21.155648 | 2 |
| 4593 | 4594 | 2440 | treatment_2 | 11.923139 | 34.952969 | 2 |
Now that we have a sample experiment dataset, plus a pre-experiment dataset we can use to showcase cupac-style variance reduction in the analysis flow, we can define the building blocks of the analysis plan: metrics, dimensions and variants.
Metrics:
- AOV (Average Order Value)
- AVG DT (Average Delivery Time)
Dimensions:
- order_city_code
Variants:
- control
- treatment_1
- treatment_2
dimension__city_code = Dimension(
    name='order_city_code',
    values=['1', '2']
)

metric__order_value = SimpleMetric(
    alias='AOV',
    name='order_value'
)

metric__delivery_time = SimpleMetric(
    alias='AVG DT',
    name='order_delivery_time_in_minutes'
)

variants = [
    Variant('control', is_control=True),
    Variant('treatment_1', is_control=False),
    Variant('treatment_2', is_control=False)
]
Now we can define the hypothesis tests that we want to run on the data. We will run two tests:
- A clustered OLS test for the order value:
- no variance reduction
- slice results by the city code of the orders
- A GEE test for the delivery time:
- with variance reduction using cupac (target aggregation)
- no slicing
As you can see, each hypothesis test is flexible enough to carry its own configuration.
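To make the slicing idea concrete before wiring it into the library objects, here is a rough pure-pandas sketch (our own illustration, not the library's implementation): slicing by a dimension simply repeats the same control-vs-treatment comparison within each dimension value. We use naive differences in means here, whereas the library additionally handles clustered inference.

```python
import pandas as pd

# Toy data (not the notebook's df): two variants across two cities
toy_df = pd.DataFrame({
    "experiment_group": ["control", "treatment_1"] * 4,
    "order_city_code": ["1", "1", "1", "1", "2", "2", "2", "2"],
    "order_value": [10.0, 11.0, 9.0, 12.0, 8.0, 8.5, 9.0, 9.5],
})

def diff_in_means(data):
    # Naive (unclustered) treatment-minus-control difference in means
    means = data.groupby("experiment_group")["order_value"].mean()
    return means["treatment_1"] - means["control"]

overall = diff_in_means(toy_df)
per_city = {
    city: diff_in_means(city_df)
    for city, city_df in toy_df.groupby("order_city_code")
}
print(overall)   # effect estimate over all orders (the "total" slice)
print(per_city)  # the same comparison within each city slice
```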
test__order_value = HypothesisTest(
    metric=metric__order_value,
    analysis_type="clustered_ols",
    analysis_config={"cluster_cols": ["customer_id"]},
    dimensions=[dimension__city_code]
)

cupac__model = TargetAggregation(
    agg_col="customer_id",
    target_col="order_delivery_time_in_minutes"
)

test__delivery_time = HypothesisTest(
    metric=metric__delivery_time,
    analysis_type="gee",
    analysis_config={
        "cluster_cols": ["customer_id"],
        "covariates": ["estimate_order_delivery_time_in_minutes"],
    },
    cupac_config={
        "cupac_model": cupac__model,
        "target_col": "order_delivery_time_in_minutes",
    }
)
Finally, we define the analysis plan, which packs all the tests and runs them on the data. The results will be displayed in a DataFrame.
Note that all tests included in a single analysis plan must run on exactly the same dataset (or datasets, when pre-experiment data is provided and used). If different datasets are needed, create a separate analysis plan for each dataset.
analysis_plan = AnalysisPlan(
    tests=[test__order_value, test__delivery_time],
    variants=variants,
    variant_col='experiment_group',
    alpha=0.01
)

results = analysis_plan.analyze(exp_data=df, pre_exp_data=pre_exp_df)
results_df = results.to_dataframe()
display(results_df)
# Summary of the analysis plan results (table + key stats)
print(results.summary())
|   | metric_alias | control_variant_name | treatment_variant_name | control_variant_mean | treatment_variant_mean | analysis_type | ate | ate_ci_lower | ate_ci_upper | p_value | std_error | dimension_name | dimension_value | alpha |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AOV | control | treatment_1 | 10.014383 | 10.017749 | clustered_ols | 0.003367 | -0.128481 | 0.135214 | 0.947560 | 0.051186 | __total_dimension | total | 0.01 |
| 1 | AOV | control | treatment_1 | 10.040822 | 10.008632 | clustered_ols | -0.032189 | -0.216810 | 0.152432 | 0.653358 | 0.071674 | order_city_code | 1 | 0.01 |
| 2 | AOV | control | treatment_1 | 9.987747 | 10.026601 | clustered_ols | 0.038854 | -0.141662 | 0.219370 | 0.579297 | 0.070081 | order_city_code | 2 | 0.01 |
| 3 | AOV | control | treatment_2 | 10.014383 | 9.966632 | clustered_ols | -0.047751 | -0.179076 | 0.083574 | 0.348965 | 0.050984 | __total_dimension | total | 0.01 |
| 4 | AOV | control | treatment_2 | 10.040822 | 9.959410 | clustered_ols | -0.081412 | -0.267607 | 0.104784 | 0.260059 | 0.072286 | order_city_code | 1 | 0.01 |
| 5 | AOV | control | treatment_2 | 9.987747 | 9.974018 | clustered_ols | -0.013729 | -0.191069 | 0.163611 | 0.841937 | 0.068848 | order_city_code | 2 | 0.01 |
| 6 | AVG DT | control | treatment_1 | 30.010740 | 29.890785 | gee | -0.132541 | -0.411551 | 0.146469 | 0.221096 | 0.108319 | __total_dimension | total | 0.01 |
| 7 | AVG DT | control | treatment_2 | 30.010740 | 29.942641 | gee | -0.020008 | -0.292616 | 0.252601 | 0.850055 | 0.105833 | __total_dimension | total | 0.01 |
Analysis plan results
Number of tests: 8
metric_alias control_variant_name treatment_variant_name control_variant_mean treatment_variant_mean analysis_type ate ate_ci_lower ate_ci_upper p_value std_error dimension_name dimension_value alpha
AOV control treatment_1 10.014383 10.017749 clustered_ols 0.003367 -0.128481 0.135214 0.947560 0.051186 __total_dimension total 0.01
AOV control treatment_1 10.040822 10.008632 clustered_ols -0.032189 -0.216810 0.152432 0.653358 0.071674 order_city_code 1 0.01
AOV control treatment_1 9.987747 10.026601 clustered_ols 0.038854 -0.141662 0.219370 0.579297 0.070081 order_city_code 2 0.01
AOV control treatment_2 10.014383 9.966632 clustered_ols -0.047751 -0.179076 0.083574 0.348965 0.050984 __total_dimension total 0.01
AOV control treatment_2 10.040822 9.959410 clustered_ols -0.081412 -0.267607 0.104784 0.260059 0.072286 order_city_code 1 0.01
AOV control treatment_2 9.987747 9.974018 clustered_ols -0.013729 -0.191069 0.163611 0.841937 0.068848 order_city_code 2 0.01
AVG DT control treatment_1 30.010740 29.890785 gee -0.132541 -0.411551 0.146469 0.221096 0.108319 __total_dimension total 0.01
AVG DT control treatment_2 30.010740 29.942641 gee -0.020008 -0.292616 0.252601 0.850055 0.105833 __total_dimension total 0.01
Single-run inference and model fit summary¶
When you need the full regression output (e.g. coefficient table, R-squared) for a single run, such as one metric and one variant comparison, use get_inference_results on the analysis object. The returned InferenceResults includes:
- .summary(): a high-level summary (ATE, std error, p-value, CI) and, when available, the full model fit table.
- .model_summary(): the underlying model fit (e.g. the statsmodels GEE/OLS table) as a string, or None if no fitted model was attached.
This is separate from power analysis, which runs many simulations and only uses p-values and point estimates; it does not use InferenceResults.
from cluster_experiments.experiment_analysis import (
    ClusteredOLSAnalysis,
    GeeExperimentAnalysis,
)

# Prepare data: one comparison (control vs treatment_1), total dimension
df_single = df[df["experiment_group"].isin(["control", "treatment_1"])].copy()

# Example 1: clustered OLS for order value
analysis_ols = ClusteredOLSAnalysis(
    cluster_cols=["customer_id"],
    target_col="order_value",
    treatment_col="experiment_group",
    treatment="treatment_1",
)
inference_ols = analysis_ols.get_inference_results(df_single, alpha=0.01)

print("=== InferenceResults summary (includes model fit when available) ===\n")
print(inference_ols.summary())

# Example 2: GEE for delivery time, showing model_summary() only
analysis_gee = GeeExperimentAnalysis(
    cluster_cols=["customer_id"],
    target_col="order_delivery_time_in_minutes",
    treatment_col="experiment_group",
    treatment="treatment_1",
)
inference_gee = analysis_gee.get_inference_results(df_single, alpha=0.01)
if inference_gee.model_summary():
    print("\n=== GEE model fit (statsmodels) ===\n")
    print(inference_gee.model_summary())
=== InferenceResults summary (includes model fit when available) ===
Inference results
ATE: 0.00336656
Std error: 0.0511863
p-value: 0.94756
Confidence interval (1 - alpha = 99.00%)
Lower: -0.128481
Upper: 0.135214
Model fit:
OLS Regression Results
==============================================================================
Dep. Variable: order_value R-squared: 0.000
Model: OLS Adj. R-squared: -0.000
Method: Least Squares F-statistic: 0.004326
Date: dom, 01 mar 2026 Prob (F-statistic): 0.948
Time: 07:43:00 Log-Likelihood: -14041.
No. Observations: 6622 AIC: 2.809e+04
Df Residuals: 6620 BIC: 2.810e+04
Df Model: 1
Covariance Type: cluster
====================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------
Intercept 10.0144 0.037 267.461 0.000 9.941 10.088
experiment_group 0.0034 0.051 0.066 0.948 -0.097 0.104
==============================================================================
Omnibus: 2.202 Durbin-Watson: 2.007
Prob(Omnibus): 0.333 Jarque-Bera (JB): 2.209
Skew: -0.029 Prob(JB): 0.331
Kurtosis: 2.933 Cond. No. 2.65
==============================================================================
Notes:
[1] Standard Errors are robust to cluster correlation (cluster)
=== GEE model fit (statsmodels) ===
GEE Regression Results
=============================================================================================
Dep. Variable: order_delivery_time_in_minutes No. Observations: 6622
Model: GEE No. clusters: 1932
Method: Generalized Min. cluster size: 1
Estimating Equations Max. cluster size: 12
Family: Gaussian Mean cluster size: 3.4
Dependence structure: Exchangeable Num. iterations: 4
Date: dom, 01 mar 2026 Scale: 25.322
Covariance type: robust Time: 07:43:00
====================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------
Intercept 30.0108 0.090 334.078 0.000 29.835 30.187
experiment_group -0.1200 0.124 -0.971 0.332 -0.362 0.122
==============================================================================
Skew: -0.0277 Kurtosis: 0.0673
Centered skew: 0.0106 Centered kurtosis: 0.2621
==============================================================================
Shortcut: creating a simple analysis plan without defining hypothesis tests¶
If you do not need custom hypothesis tests, you can use the AnalysisPlan.from_metrics method to create a simple analysis plan by defining only the metrics, dimensions and variants; the method automatically creates the necessary hypothesis tests and runs them on the data.
This works when all desired tests share the same analysis type, configuration and slicing dimensions. If the tests differ in any of these components, define the analysis plan explicitly as illustrated in the previous section.
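A from_metrics plan produces one comparison per combination of metric, treatment variant, and slice (the total slice plus each dimension value). A quick sanity check of the expected number of result rows for this notebook's setup, using only the standard library:

```python
from itertools import product

metrics = ["AVG DT", "AOV"]
treatments = ["treatment_1", "treatment_2"]  # control is the shared baseline
# the "total" slice plus one slice per dimension value
slices = [
    ("__total_dimension", "total"),
    ("order_city_code", "1"),
    ("order_city_code", "2"),
]

comparisons = list(product(metrics, treatments, slices))
print(len(comparisons))  # 2 metrics x 2 treatments x 3 slices = 12 result rows
```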
Below is an example of how to create a simple analysis plan using the same metrics, dimensions and variants as before. The results will be displayed in a DataFrame. Additionally, setting verbose=True logs each comparison performed when running the analysis plan.
simple_analysis_plan = AnalysisPlan.from_metrics(
    metrics=[metric__delivery_time, metric__order_value],
    variants=variants,
    variant_col='experiment_group',
    alpha=0.01,
    dimensions=[dimension__city_code],
    analysis_type="clustered_ols",
    analysis_config={"cluster_cols": ["customer_id"]},
)

simple_results = simple_analysis_plan.analyze(exp_data=df, verbose=True)
simple_results_df = simple_results.to_dataframe()
display(simple_results_df)
2026-03-01 07:43:01,702 - Metric: AVG DT, Treatment: treatment_1, Dimension: __total_dimension, Value: total
2026-03-01 07:43:01,731 - Metric: AVG DT, Treatment: treatment_1, Dimension: order_city_code, Value: 1
2026-03-01 07:43:01,747 - Metric: AVG DT, Treatment: treatment_1, Dimension: order_city_code, Value: 2
2026-03-01 07:43:01,762 - Metric: AVG DT, Treatment: treatment_2, Dimension: __total_dimension, Value: total
2026-03-01 07:43:01,779 - Metric: AVG DT, Treatment: treatment_2, Dimension: order_city_code, Value: 1
2026-03-01 07:43:01,792 - Metric: AVG DT, Treatment: treatment_2, Dimension: order_city_code, Value: 2
2026-03-01 07:43:01,806 - Metric: AOV, Treatment: treatment_1, Dimension: __total_dimension, Value: total
2026-03-01 07:43:01,824 - Metric: AOV, Treatment: treatment_1, Dimension: order_city_code, Value: 1
2026-03-01 07:43:01,837 - Metric: AOV, Treatment: treatment_1, Dimension: order_city_code, Value: 2
2026-03-01 07:43:01,850 - Metric: AOV, Treatment: treatment_2, Dimension: __total_dimension, Value: total
2026-03-01 07:43:01,868 - Metric: AOV, Treatment: treatment_2, Dimension: order_city_code, Value: 1
2026-03-01 07:43:01,881 - Metric: AOV, Treatment: treatment_2, Dimension: order_city_code, Value: 2
|   | metric_alias | control_variant_name | treatment_variant_name | control_variant_mean | treatment_variant_mean | analysis_type | ate | ate_ci_lower | ate_ci_upper | p_value | std_error | dimension_name | dimension_value | alpha |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AVG DT | control | treatment_1 | 30.010740 | 29.890785 | clustered_ols | -0.119954 | -0.438521 | 0.198612 | 0.332089 | 0.123675 | __total_dimension | total | 0.01 |
| 1 | AVG DT | control | treatment_1 | 29.933167 | 29.829952 | clustered_ols | -0.103215 | -0.561116 | 0.354685 | 0.561498 | 0.177768 | order_city_code | 1 | 0.01 |
| 2 | AVG DT | control | treatment_1 | 30.088890 | 29.949848 | clustered_ols | -0.139041 | -0.587720 | 0.309637 | 0.424740 | 0.174188 | order_city_code | 2 | 0.01 |
| 3 | AVG DT | control | treatment_2 | 30.010740 | 29.942641 | clustered_ols | -0.068099 | -0.383801 | 0.247603 | 0.578470 | 0.122563 | __total_dimension | total | 0.01 |
| 4 | AVG DT | control | treatment_2 | 29.933167 | 30.126895 | clustered_ols | 0.193728 | -0.261590 | 0.649046 | 0.273097 | 0.176766 | order_city_code | 1 | 0.01 |
| 5 | AVG DT | control | treatment_2 | 30.088890 | 29.754195 | clustered_ols | -0.334695 | -0.786060 | 0.116669 | 0.056130 | 0.175231 | order_city_code | 2 | 0.01 |
| 6 | AOV | control | treatment_1 | 10.014383 | 10.017749 | clustered_ols | 0.003367 | -0.128481 | 0.135214 | 0.947560 | 0.051186 | __total_dimension | total | 0.01 |
| 7 | AOV | control | treatment_1 | 10.040822 | 10.008632 | clustered_ols | -0.032189 | -0.216810 | 0.152432 | 0.653358 | 0.071674 | order_city_code | 1 | 0.01 |
| 8 | AOV | control | treatment_1 | 9.987747 | 10.026601 | clustered_ols | 0.038854 | -0.141662 | 0.219370 | 0.579297 | 0.070081 | order_city_code | 2 | 0.01 |
| 9 | AOV | control | treatment_2 | 10.014383 | 9.966632 | clustered_ols | -0.047751 | -0.179076 | 0.083574 | 0.348965 | 0.050984 | __total_dimension | total | 0.01 |
| 10 | AOV | control | treatment_2 | 10.040822 | 9.959410 | clustered_ols | -0.081412 | -0.267607 | 0.104784 | 0.260059 | 0.072286 | order_city_code | 1 | 0.01 |
| 11 | AOV | control | treatment_2 | 9.987747 | 9.974018 | clustered_ols | -0.013729 | -0.191069 | 0.163611 | 0.841937 | 0.068848 | order_city_code | 2 | 0.01 |
Bonus: Plugging in a custom analysis method¶
If you need a custom analysis method that is not covered by the standard analysis types provided by the library, you can define a custom analysis class and plug it into the analysis plan. Below is an example of how to do this.
In this example, we define a custom analysis class that extends the ClusteredOLSAnalysis class provided by the library; it would carry whatever custom logic the clustered OLS analysis needs.
The analysis plan is then created with a custom analysis type mapper that maps the custom analysis type string to the custom analysis class.
from cluster_experiments.experiment_analysis import ClusteredOLSAnalysis


# Assuming we define a meaningful custom ExperimentAnalysis class
class CustomExperimentAnalysis(ClusteredOLSAnalysis):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


custom_simple_analysis_plan = AnalysisPlan.from_metrics(
    metrics=[metric__order_value],
    variants=variants,
    variant_col='experiment_group',
    alpha=0.01,
    dimensions=[dimension__city_code],
    analysis_type="custom_clustered_ols",
    analysis_config={"cluster_cols": ["customer_id"]},
    custom_analysis_type_mapper={"custom_clustered_ols": CustomExperimentAnalysis}
)

custom_simple_results = custom_simple_analysis_plan.analyze(exp_data=df)
custom_simple_results_df = custom_simple_results.to_dataframe()
display(custom_simple_results_df)
|   | metric_alias | control_variant_name | treatment_variant_name | control_variant_mean | treatment_variant_mean | analysis_type | ate | ate_ci_lower | ate_ci_upper | p_value | std_error | dimension_name | dimension_value | alpha |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AOV | control | treatment_1 | 10.014383 | 10.017749 | custom_clustered_ols | 0.003367 | -0.128481 | 0.135214 | 0.947560 | 0.051186 | __total_dimension | total | 0.01 |
| 1 | AOV | control | treatment_1 | 10.040822 | 10.008632 | custom_clustered_ols | -0.032189 | -0.216810 | 0.152432 | 0.653358 | 0.071674 | order_city_code | 1 | 0.01 |
| 2 | AOV | control | treatment_1 | 9.987747 | 10.026601 | custom_clustered_ols | 0.038854 | -0.141662 | 0.219370 | 0.579297 | 0.070081 | order_city_code | 2 | 0.01 |
| 3 | AOV | control | treatment_2 | 10.014383 | 9.966632 | custom_clustered_ols | -0.047751 | -0.179076 | 0.083574 | 0.348965 | 0.050984 | __total_dimension | total | 0.01 |
| 4 | AOV | control | treatment_2 | 10.040822 | 9.959410 | custom_clustered_ols | -0.081412 | -0.267607 | 0.104784 | 0.260059 | 0.072286 | order_city_code | 1 | 0.01 |
| 5 | AOV | control | treatment_2 | 9.987747 | 9.974018 | custom_clustered_ols | -0.013729 | -0.191069 | 0.163611 | 0.841937 | 0.068848 | order_city_code | 2 | 0.01 |
Now it's your turn! Have fun experimenting with the library and analyzing your data!