Cluster Randomization Example¶
This notebook demonstrates how to analyze a cluster-randomized experiment where randomization occurs at the group level (e.g., stores, cities, schools) rather than at the individual level.
Why Cluster Randomization?¶
Cluster randomization is necessary when:
- Spillover Effects: Treatment of one individual affects others (e.g., testing driver incentives in ride-sharing)
- Operational Constraints: You can't randomize at the individual level (e.g., testing a store layout)
- Cost Efficiency: It's cheaper to randomize groups than individuals
Key Consideration¶
With cluster randomization, you need to account for intra-cluster correlation: observations within the same cluster tend to be more similar to one another than to observations from different clusters. This requires using clustered standard errors or cluster-level analysis methods.
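A useful rule of thumb for the size of this problem is the design effect, 1 + (m − 1)·ρ, where m is the average cluster size and ρ is the intra-cluster correlation: standard errors are inflated by roughly the square root of this factor, so even a modest ρ matters when clusters are large.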
Setup¶
import pandas as pd
import numpy as np
from cluster_experiments import AnalysisPlan
# Set random seed for reproducibility
np.random.seed(42)
1. Simulate Cluster-Randomized Experiment¶
Let's simulate an experiment where we test a promotional campaign across different stores. Each store is randomly assigned to control or treatment, and we observe multiple transactions per store.
# Define parameters
n_stores = 50 # Number of stores (clusters)
transactions_per_store = 100 # Average transactions per store
# Step 1: Randomly assign stores to treatment
stores = pd.DataFrame({
    'store_id': range(n_stores),
    'variant': np.random.choice(['control', 'treatment'], n_stores),
})
# Step 2: Generate transaction-level data
transactions = []
for _, store in stores.iterrows():
    n_transactions = np.random.poisson(transactions_per_store)

    # Base purchase amount
    base_amount = 50

    # Treatment effect: +$5 average purchase
    treatment_effect = 5 if store['variant'] == 'treatment' else 0

    # Store-level random effect (intra-cluster correlation)
    store_effect = np.random.normal(0, 10)

    # Generate transactions
    store_transactions = pd.DataFrame({
        'store_id': store['store_id'],
        'variant': store['variant'],
        'purchase_amount': np.random.normal(
            base_amount + treatment_effect + store_effect,
            20,
            n_transactions
        ).clip(min=0)  # No negative purchases
    })
    transactions.append(store_transactions)
data = pd.concat(transactions, ignore_index=True)
print(f"Total transactions: {len(data):,}")
print(f"Stores in control: {(stores['variant'] == 'control').sum()}")
print(f"Stores in treatment: {(stores['variant'] == 'treatment').sum()}")
print(f"\nFirst few rows:")
data.head()
Total transactions: 5,055
Stores in control: 23
Stores in treatment: 27

First few rows:
|   | store_id | variant | purchase_amount |
|---|---|---|---|
| 0 | 0 | control | 83.479541 |
| 1 | 0 | control | 78.039264 |
| 2 | 0 | control | 65.286167 |
| 3 | 0 | control | 63.589803 |
| 4 | 0 | control | 94.543677 |
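To get a feel for how strong the clustering is in this simulated data, here is a minimal sketch of a moment-based intra-cluster correlation (ICC) estimate. The estimator below is purely illustrative and is not part of cluster_experiments; with the simulation parameters above (store-level SD of 10, transaction-level SD of 20), the true ICC is 100 / (100 + 400) = 0.2.

```python
# Rough moment-based ICC estimate (illustrative only, not part of cluster_experiments).
grouped = data.groupby('store_id')['purchase_amount']
m = grouped.size().mean()                  # average cluster size
within_var = grouped.var(ddof=1).mean()    # average within-store variance
# Between-store variance, backing out the sampling noise of the store means.
# (Ignores the treatment shift, so it slightly overstates the between-store variance.)
between_var = max(grouped.mean().var(ddof=1) - within_var / m, 0)
icc = between_var / (between_var + within_var)
print(f"Estimated ICC: {icc:.3f}")
print(f"Approximate design effect: {1 + (m - 1) * icc:.1f}")
```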
2. Naive Analysis (WRONG!)¶
First, let's see what happens if we ignore the clustering and use standard OLS. This is wrong because it doesn't account for intra-cluster correlation and will give you incorrect standard errors (typically too small, leading to false positives).
# Naive analysis without clustering
naive_plan = AnalysisPlan.from_metrics_dict({
    'metrics': [
        {
            'alias': 'purchase_amount',
            'name': 'purchase_amount',
            'metric_type': 'simple'
        },
    ],
    'variants': [
        {'name': 'control', 'is_control': True},
        {'name': 'treatment', 'is_control': False},
    ],
    'variant_col': 'variant',
    'analysis_type': 'ols',  # Standard OLS (WRONG for clustered data!)
})
naive_results = naive_plan.analyze(data).to_dataframe()
print("=== Naive Analysis (Ignoring Clusters) ===")
print(f"Treatment Effect: ${naive_results.iloc[0]['ate']:.2f}")
print(f"Standard Error: ${naive_results.iloc[0]['std_error']:.2f}")
print(f"P-value: {naive_results.iloc[0]['p_value']:.4f}")
print(f"95% CI: [${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]")
=== Naive Analysis (Ignoring Clusters) ===
Treatment Effect: $4.26
Standard Error: $0.63
P-value: 0.0000
95% CI: [$3.03, $5.48]
3. Correct Analysis with Clustered Standard Errors¶
Now let's do the correct analysis by accounting for the clustering. We'll use `clustered_ols`, which computes cluster-robust standard errors.
# Correct analysis with clustered standard errors
clustered_plan = AnalysisPlan.from_metrics_dict({
    'metrics': [
        {
            'alias': 'purchase_amount',
            'name': 'purchase_amount',
            'metric_type': 'simple'
        },
    ],
    'variants': [
        {'name': 'control', 'is_control': True},
        {'name': 'treatment', 'is_control': False},
    ],
    'variant_col': 'variant',
    'analysis_type': 'clustered_ols',  # Clustered OLS (CORRECT!)
    'analysis_config': {
        'cluster_cols': ['store_id']  # Specify the clustering variable
    }
})
clustered_results = clustered_plan.analyze(data).to_dataframe()
print("=== Correct Analysis (With Clustering) ===")
print(f"Treatment Effect: ${clustered_results.iloc[0]['ate']:.2f}")
print(f"Standard Error: ${clustered_results.iloc[0]['std_error']:.2f}")
print(f"P-value: {clustered_results.iloc[0]['p_value']:.4f}")
print(f"95% CI: [${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]")
=== Correct Analysis (With Clustering) ===
Treatment Effect: $4.26
Standard Error: $3.04
P-value: 0.1610
95% CI: [$-1.70, $10.21]
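If you want to see what a cluster-robust computation looks like outside the library, here is a minimal sketch using statsmodels. Using statsmodels is our own choice for this cross-check; it is not a claim about how cluster_experiments works internally.

```python
import statsmodels.formula.api as smf

# OLS of purchase amount on the treatment indicator, with standard errors
# clustered at the store level (the unit of randomization).
model = smf.ols('purchase_amount ~ variant', data=data).fit(
    cov_type='cluster',
    cov_kwds={'groups': data['store_id']},
)
print(model.summary().tables[1])
```

The coefficient on `variant[T.treatment]` should be close to the treatment effect and standard error reported above, up to small-sample adjustments.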
4. Compare Results¶
Let's compare the two approaches side by side:
comparison = pd.DataFrame({
    'Method': ['Naive (OLS)', 'Correct (Clustered OLS)'],
    'Treatment Effect': [
        f"${naive_results.iloc[0]['ate']:.2f}",
        f"${clustered_results.iloc[0]['ate']:.2f}"
    ],
    'Standard Error': [
        f"${naive_results.iloc[0]['std_error']:.2f}",
        f"${clustered_results.iloc[0]['std_error']:.2f}"
    ],
    'P-value': [
        f"{naive_results.iloc[0]['p_value']:.4f}",
        f"{clustered_results.iloc[0]['p_value']:.4f}"
    ],
    '95% CI': [
        f"[${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]",
        f"[${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]"
    ]
})
print("\n=== Comparison ===")
print(comparison.to_string(index=False))
print("\nNotice: The clustered standard errors are LARGER, reflecting the")
print("additional uncertainty from intra-cluster correlation.")
=== Comparison ===
Method Treatment Effect Standard Error P-value 95% CI
Naive (OLS) $4.26 $0.63 0.0000 [$3.03, $5.48]
Correct (Clustered OLS) $4.26 $3.04 0.1610 [$-1.70, $10.21]
Notice: The clustered standard errors are LARGER, reflecting the
additional uncertainty from intra-cluster correlation.
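A cluster-level analysis, mentioned at the top of the notebook, is another way to respect the unit of randomization: collapse the data to one observation per store and run a standard two-sample test. Here is a minimal sketch using scipy's Welch t-test; because stores are weighted equally here, the point estimate is a slightly different estimand than the transaction-level ATE, but the uncertainty should be in the same ballpark as the clustered OLS result.

```python
from scipy import stats

# One observation per store: the mean purchase amount (store = unit of randomization).
store_means = data.groupby(['store_id', 'variant'], as_index=False)['purchase_amount'].mean()

control = store_means.loc[store_means['variant'] == 'control', 'purchase_amount']
treatment = store_means.loc[store_means['variant'] == 'treatment', 'purchase_amount']

# Welch's t-test on the 50 store-level means.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"Store-level difference in means: ${treatment.mean() - control.mean():.2f}")
print(f"Store-level p-value: {p_value:.4f}")
```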
Key Takeaways¶
- Always account for clustering in your analysis when randomization happens at the cluster level
- Clustered standard errors are typically larger than naive standard errors
- Ignoring clustering leads to overstated confidence - you might claim significance when there isn't any
- Use the `clustered_ols` analysis type and specify `cluster_cols` in the analysis config
When to Use Clustering¶
Use clustered analysis when:
- ✅ Randomization is at the group level (stores, cities, schools)
- ✅ There are spillover effects between individuals
- ✅ Observations within groups are more similar than across groups
Don't use clustering when:
- ❌ Randomization is truly at the individual level
- ❌ There's no reason to believe observations are correlated within groups