Cluster Randomization Example¶
This notebook demonstrates how to analyze a cluster-randomized experiment where randomization occurs at the group level (e.g., stores, cities, schools) rather than at the individual level.
Why Cluster Randomization?¶
Cluster randomization is necessary when:
- Spillover Effects: Treatment of one individual affects others (e.g., testing driver incentives in ride-sharing)
- Operational Constraints: You can't randomize at the individual level (e.g., testing a store layout)
- Cost Efficiency: It's cheaper to randomize groups than individuals
Key Consideration¶
With cluster randomization, you need to account for intra-cluster correlation: observations within the same cluster tend to be more similar to one another than to observations from different clusters. This requires using clustered standard errors or cluster-level analysis methods.
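A useful rule of thumb for the size of this problem is the design effect, 1 + (m − 1)·ρ, where m is the average cluster size and ρ is the intra-cluster correlation: standard errors are inflated by roughly the square root of this factor, so even a modest ρ matters when clusters are large.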
Setup¶
import pandas as pd
import numpy as np
from cluster_experiments import AnalysisPlan
# Set random seed for reproducibility
np.random.seed(42)
1. Simulate Cluster-Randomized Experiment¶
Let's simulate an experiment where we test a promotional campaign across different stores. Each store is randomly assigned to control or treatment, and we observe multiple transactions per store.
# Define parameters
n_stores = 50 # Number of stores (clusters)
transactions_per_store = 100 # Average transactions per store
# Step 1: Randomly assign stores to treatment
stores = pd.DataFrame({
    'store_id': range(n_stores),
    'variant': np.random.choice(['control', 'treatment'], n_stores),
})
# Step 2: Generate transaction-level data
transactions = []
for _, store in stores.iterrows():
    n_transactions = np.random.poisson(transactions_per_store)

    # Base purchase amount
    base_amount = 50

    # Treatment effect: +$5 average purchase
    treatment_effect = 5 if store['variant'] == 'treatment' else 0

    # Store-level random effect (intra-cluster correlation)
    store_effect = np.random.normal(0, 10)

    # Generate transactions
    store_transactions = pd.DataFrame({
        'store_id': store['store_id'],
        'variant': store['variant'],
        'purchase_amount': np.random.normal(
            base_amount + treatment_effect + store_effect,
            20,
            n_transactions
        ).clip(min=0)  # No negative purchases
    })
    transactions.append(store_transactions)
data = pd.concat(transactions, ignore_index=True)
print(f"Total transactions: {len(data):,}")
print(f"Stores in control: {(stores['variant'] == 'control').sum()}")
print(f"Stores in treatment: {(stores['variant'] == 'treatment').sum()}")
print(f"\nFirst few rows:")
data.head()
Total transactions: 5,055
Stores in control: 23
Stores in treatment: 27

First few rows:
|   | store_id | variant | purchase_amount |
|---|---|---|---|
| 0 | 0 | control | 83.479541 |
| 1 | 0 | control | 78.039264 |
| 2 | 0 | control | 65.286167 |
| 3 | 0 | control | 63.589803 |
| 4 | 0 | control | 94.543677 |
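To get a feel for how strong the clustering is in this simulated data, here is a minimal sketch of a moment-based intra-cluster correlation (ICC) estimate. The estimator below is purely illustrative and is not part of cluster_experiments; with the simulation parameters above (store-level SD of 10, transaction-level SD of 20), the true ICC is 100 / (100 + 400) = 0.2.

```python
# Rough moment-based ICC estimate (illustrative only, not part of cluster_experiments).
grouped = data.groupby('store_id')['purchase_amount']
m = grouped.size().mean()                  # average cluster size
within_var = grouped.var(ddof=1).mean()    # average within-store variance
# Between-store variance, backing out the sampling noise of the store means.
# (Ignores the treatment shift, so it slightly overstates the between-store variance.)
between_var = max(grouped.mean().var(ddof=1) - within_var / m, 0)
icc = between_var / (between_var + within_var)
print(f"Estimated ICC: {icc:.3f}")
print(f"Approximate design effect: {1 + (m - 1) * icc:.1f}")
```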
2. Naive Analysis (WRONG!)¶
First, let's see what happens if we ignore the clustering and use standard OLS. This is wrong because it doesn't account for intra-cluster correlation and will give you incorrect standard errors (typically too small, leading to false positives).
# Naive analysis without clustering
naive_plan = AnalysisPlan.from_metrics_dict({
    'metrics': [
        {
            'alias': 'purchase_amount',
            'name': 'purchase_amount',
            'metric_type': 'simple'
        },
    ],
    'variants': [
        {'name': 'control', 'is_control': True},
        {'name': 'treatment', 'is_control': False},
    ],
    'variant_col': 'variant',
    'analysis_type': 'ols',  # Standard OLS (WRONG for clustered data!)
})
naive_results = naive_plan.analyze(data).to_dataframe()
print("=== Naive Analysis (Ignoring Clusters) ===")
print(f"Treatment Effect: ${naive_results.iloc[0]['ate']:.2f}")
print(f"Standard Error: ${naive_results.iloc[0]['std_error']:.2f}")
print(f"P-value: {naive_results.iloc[0]['p_value']:.4f}")
print(f"95% CI: [${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]")
=== Naive Analysis (Ignoring Clusters) ===
Treatment Effect: $4.26
Standard Error: $0.63
P-value: 0.0000
95% CI: [$3.03, $5.48]
3. Correct Analysis with Clustered Standard Errors¶
Now let's do the correct analysis by accounting for the clustering. We'll use `clustered_ols`, which computes cluster-robust standard errors.
# Correct analysis with clustered standard errors
clustered_plan = AnalysisPlan.from_metrics_dict({
    'metrics': [
        {
            'alias': 'purchase_amount',
            'name': 'purchase_amount',
            'metric_type': 'simple'
        },
    ],
    'variants': [
        {'name': 'control', 'is_control': True},
        {'name': 'treatment', 'is_control': False},
    ],
    'variant_col': 'variant',
    'analysis_type': 'clustered_ols',  # Clustered OLS (CORRECT!)
    'analysis_config': {
        'cluster_cols': ['store_id']  # Specify the clustering variable
    }
})
clustered_results = clustered_plan.analyze(data).to_dataframe()
print("=== Correct Analysis (With Clustering) ===")
print(f"Treatment Effect: ${clustered_results.iloc[0]['ate']:.2f}")
print(f"Standard Error: ${clustered_results.iloc[0]['std_error']:.2f}")
print(f"P-value: {clustered_results.iloc[0]['p_value']:.4f}")
print(f"95% CI: [${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]")
=== Correct Analysis (With Clustering) ===
Treatment Effect: $4.26
Standard Error: $3.04
P-value: 0.1610
95% CI: [$-1.70, $10.21]
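If you want to see what a cluster-robust computation looks like outside the library, here is a minimal sketch using statsmodels. Using statsmodels is our own choice for this cross-check; it is not a claim about how cluster_experiments works internally.

```python
import statsmodels.formula.api as smf

# OLS of purchase amount on the treatment indicator, with standard errors
# clustered at the store level (the unit of randomization).
model = smf.ols('purchase_amount ~ variant', data=data).fit(
    cov_type='cluster',
    cov_kwds={'groups': data['store_id']},
)
print(model.summary().tables[1])
```

The coefficient on `variant[T.treatment]` should be close to the treatment effect and standard error reported above, up to small-sample adjustments.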
4. Compare Results¶
Let's compare the two approaches side by side:
comparison = pd.DataFrame({
    'Method': ['Naive (OLS)', 'Correct (Clustered OLS)'],
    'Treatment Effect': [
        f"${naive_results.iloc[0]['ate']:.2f}",
        f"${clustered_results.iloc[0]['ate']:.2f}"
    ],
    'Standard Error': [
        f"${naive_results.iloc[0]['std_error']:.2f}",
        f"${clustered_results.iloc[0]['std_error']:.2f}"
    ],
    'P-value': [
        f"{naive_results.iloc[0]['p_value']:.4f}",
        f"{clustered_results.iloc[0]['p_value']:.4f}"
    ],
    '95% CI': [
        f"[${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]",
        f"[${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]"
    ]
})
print("\n=== Comparison ===")
print(comparison.to_string(index=False))
print("\nNotice: The clustered standard errors are LARGER, reflecting the")
print("additional uncertainty from intra-cluster correlation.")
=== Comparison ===
Method Treatment Effect Standard Error P-value 95% CI
Naive (OLS) $4.26 $0.63 0.0000 [$3.03, $5.48]
Correct (Clustered OLS) $4.26 $3.04 0.1610 [$-1.70, $10.21]
Notice: The clustered standard errors are LARGER, reflecting the
additional uncertainty from intra-cluster correlation.
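A cluster-level analysis, mentioned at the top of the notebook, is another way to respect the unit of randomization: collapse the data to one observation per store and run a standard two-sample test. Here is a minimal sketch using scipy's Welch t-test; because stores are weighted equally here, the point estimate is a slightly different estimand than the transaction-level ATE, but the uncertainty should be in the same ballpark as the clustered OLS result.

```python
from scipy import stats

# One observation per store: the mean purchase amount (store = unit of randomization).
store_means = data.groupby(['store_id', 'variant'], as_index=False)['purchase_amount'].mean()

control = store_means.loc[store_means['variant'] == 'control', 'purchase_amount']
treatment = store_means.loc[store_means['variant'] == 'treatment', 'purchase_amount']

# Welch's t-test on the 50 store-level means.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"Store-level difference in means: ${treatment.mean() - control.mean():.2f}")
print(f"Store-level p-value: {p_value:.4f}")
```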
Key Takeaways¶
- Always account for clustering in your analysis when randomization happens at the cluster level
- Clustered standard errors are typically larger than naive standard errors
- Ignoring clustering leads to overstated confidence - you might claim significance when there isn't any
- Use the `clustered_ols` analysis type and specify `cluster_cols` in the analysis config
When to Use Clustering¶
Use clustered analysis when:
- ✅ Randomization is at the group level (stores, cities, schools)
- ✅ There are spillover effects between individuals
- ✅ Observations within groups are more similar than across groups
Don't use clustering when:
- ❌ Randomization is truly at the individual level
- ❌ There's no reason to believe observations are correlated within groups