cluster_experiments¶

License

A Python library for end-to-end A/B testing workflows, featuring: - Experiment analysis and scorecards - Power analysis (simulation-based and normal approximation) - Variance reduction techniques (CUPED, CUPAC) - Support for complex experimental designs (cluster randomization, switchback experiments)

Key Features¶

1. Power Analysis¶

Simulation-based: Run Monte Carlo simulations to estimate power
Normal approximation: Fast power estimation using CLT
Minimum Detectable Effect: Calculate required effect sizes
Multiple designs: Support for:
Simple randomization
Variance reduction techniques in power analysis
Cluster randomization
Switchback experiments
Dict config: Easy to configure power analysis with a dictionary

2. Experiment Analysis¶

Analysis Plans: Define structured analysis plans
Metrics:
Simple metrics
Ratio metrics
Dimensions: Slice results by dimensions
Statistical Methods:
GEE
Mixed Linear Models
Clustered / regular OLS
T-tests
Synthetic Control
Dict config: Easy to define analysis plans with a dictionary

3. Variance Reduction¶

CUPED (Controlled-experiment Using Pre-Experiment Data):
Use historical outcome data to reduce variance, choose any granularity
Support for several covariates
CUPAC (Control Using Predictors as Covariates):
Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data

Quick Start¶

Power Analysis Example¶

import numpy as np
import pandas as pd
from cluster_experiments import PowerAnalysis, NormalPowerAnalysis

# Create sample data
N = 1_000
df = pd.DataFrame({
    "target": np.random.normal(0, 1, size=N),
    "date": pd.to_datetime(
        np.random.randint(
            pd.Timestamp("2024-01-01").value,
            pd.Timestamp("2024-01-31").value,
            size=N,
        )
    ),
})

# Simulation-based power analysis with CUPED
config = {
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "n_simulations": 50,
}
pw = PowerAnalysis.from_dict(config)
power = pw.power_analysis(df, average_effect=0.1)

# Normal approximation (faster)
npw = NormalPowerAnalysis.from_dict({
    "analysis": "ols",
    "splitter": "non_clustered",
    "n_simulations": 5,
    "time_col": "date",
})
power_normal = npw.power_analysis(df, average_effect=0.1)
power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])


# MDE calculation
mde = npw.mde(df, power=0.8)

# MDE line with length
mde_timeline = npw.mde_time_line(
    df,
    powers=[0.8],
    experiment_length=[7, 14, 21]
)

print(power, power_line_normal, power_normal, mde, mde_timeline)

Experiment Analysis Example¶

import numpy as np
import pandas as pd
from cluster_experiments import AnalysisPlan

N = 1_000
experiment_data = pd.DataFrame({
    "order_value": np.random.normal(100, 10, size=N),
    "delivery_time": np.random.normal(10, 1, size=N),
    "experiment_group": np.random.choice(["control", "treatment"], size=N),
    "city": np.random.choice(["NYC", "LA"], size=N),
    "customer_id": np.random.randint(1, 100, size=N),
    "customer_age": np.random.randint(20, 60, size=N),
})

# Create analysis plan
plan = AnalysisPlan.from_metrics_dict({
    "metrics": [
        {"alias": "AOV", "name": "order_value"},
        {"alias": "delivery_time", "name": "delivery_time"},
    ],
    "variants": [
        {"name": "control", "is_control": True},
        {"name": "treatment", "is_control": False},
    ],
    "variant_col": "experiment_group",
    "alpha": 0.05,
    "dimensions": [
        {"name": "city", "values": ["NYC", "LA"]},
    ],
    "analysis_type": "clustered_ols",
    "analysis_config": {"cluster_cols": ["customer_id"]},
})
# Run analysis
print(plan.analyze(experiment_data).to_dataframe())

Variance Reduction Example¶

import numpy as np
import pandas as pd
from cluster_experiments import (
    AnalysisPlan,
    SimpleMetric,
    Variant,
    Dimension,
    TargetAggregation,
    HypothesisTest
)

N = 1000

experiment_data = pd.DataFrame({
    "order_value": np.random.normal(100, 10, size=N),
    "delivery_time": np.random.normal(10, 1, size=N),
    "experiment_group": np.random.choice(["control", "treatment"], size=N),
    "city": np.random.choice(["NYC", "LA"], size=N),
    "customer_id": np.random.randint(1, 100, size=N),
    "customer_age": np.random.randint(20, 60, size=N),
})

pre_experiment_data = pd.DataFrame({
    "order_value": np.random.normal(100, 10, size=N),
    "customer_id": np.random.randint(1, 100, size=N),
})

# Define test
cupac_model = TargetAggregation(
    agg_col="customer_id",
    target_col="order_value"
)

hypothesis_test = HypothesisTest(
    metric=SimpleMetric(alias="AOV", name="order_value"),
    analysis_type="clustered_ols",
    analysis_config={
        "cluster_cols": ["customer_id"],
        "covariates": ["customer_age", "estimate_order_value"],
    },
    cupac_config={
        "cupac_model": cupac_model,
        "target_col": "order_value",
    },
)

# Create analysis plan
plan = AnalysisPlan(
    tests=[hypothesis_test],
    variants=[
        Variant("control", is_control=True),
        Variant("treatment", is_control=False),
    ],
    variant_col="experiment_group",
)

# Run analysis
results = plan.analyze(experiment_data, pre_experiment_data)
print(results.to_dataframe())

Installation¶

You can install this package via pip.

pip install cluster-experiments

For detailed documentation and examples, visit our documentation site.

Features¶

The library offers the following classes:

Regarding power analysis:
- PowerAnalysis: to run power analysis on any experiment design, using simulation
- PowerAnalysisWithPreExperimentData: to run power analysis on a clustered/switchback design, but adding pre-experiment df during split and perturbation (especially useful for Synthetic Control)
- NormalPowerAnalysis: to run power analysis on any experiment design using the central limit theorem for the distribution of the estimator. It can be used to compute the minimum detectable effect (MDE) for a given power level.
- ConstantPerturbator: to artificially perturb treated group with constant perturbations
- BinaryPerturbator: to artificially perturb treated group for binary outcomes
- RelativePositivePerturbator: to artificially perturb treated group with relative positive perturbations
- RelativeMixedPerturbator: to artificially perturb treated group with relative perturbations for positive and negative targets
- NormalPerturbator: to artificially perturb treated group with normal distribution perturbations
- BetaRelativePositivePerturbator: to artificially perturb treated group with relative positive beta distribution perturbations
- BetaRelativePerturbator: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval
- SegmentedBetaRelativePerturbator: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval, but using clusters
Regarding splitting data:
- ClusteredSplitter: to split data based on clusters
- FixedSizeClusteredSplitter: to split data based on clusters with a fixed size (example: only 1 treatment cluster and the rest in control)
- BalancedClusteredSplitter: to split data based on clusters in a balanced way
- NonClusteredSplitter: Regular data splitting, no clusters
- StratifiedClusteredSplitter: to split based on clusters and strata, balancing the number of clusters in each stratus
- RepeatedSampler: for backtests where we have access to counterfactuals, does not split the data, just duplicates the data for all groups
- Switchback splitters (the same can be done with clustered splitters, but there is a convenient way to define switchback splitters using switch frequency):
  - SwitchbackSplitter: to split data based on clusters and dates, for switchback experiments
  - BalancedSwitchbackSplitter: to split data based on clusters and dates, for switchback experiments, balancing treatment and control among all clusters
  - StratifiedSwitchbackSplitter: to split data based on clusters and dates, for switchback experiments, balancing the number of clusters in each stratus
  - Washover for switchback experiments:
    - EmptyWashover: no washover done at all.
    - ConstantWashover: accepts a timedelta parameter and removes the data when we switch from A to B for the timedelta interval.
Regarding analysis methods:
- GeeExperimentAnalysis: to run GEE analysis on the results of a clustered design
- MLMExperimentAnalysis: to run Mixed Linear Model analysis on the results of a clustered design
- TTestClusteredAnalysis: to run a t-test on aggregated data for clusters
- PairedTTestClusteredAnalysis: to run a paired t-test on aggregated data for clusters
- ClusteredOLSAnalysis: to run OLS analysis on the results of a clustered design
- OLSAnalysis: to run OLS analysis for non-clustered data
- DeltaMethodAnalysis: to run Delta Method Analysis for clustered designs
- TargetAggregation: to add pre-experimental data of the outcome to reduce variance
- SyntheticControlAnalysis: to run synthetic control analysis
Regarding experiment analysis workflow:
- Metric: abstract class to define a metric to be used in the analysis
- SimpleMetric: to create a metric defined at the same level of the data used for the analysis
- RatioMetric: to create a metric defined at a lower level than the data used for the analysis
- Variant: to define a variant of the experiment
- Dimension: to define a dimension to slice the results of the experiment
- HypothesisTest: to define a Hypothesis Test with a metric, analysis method, optional analysis configuration, and optional dimensions
- AnalysisPlan: to define a plan of analysis with a list of Hypothesis Tests for a dataset and the experiment variants. The analyze() method runs the analysis and returns the results
- AnalysisResults: to store the results of an analysis
Other:
- PowerConfig: to conveniently configure PowerAnalysis class
- ConfidenceInterval: to store the data representation of a confidence interval
- InferenceResults: to store the structure of complete statistical analysis results