AA test with clustered data
This notebook shows that, when treatment is assigned with a clustered splitter and the clusters explain part of the variance of the outcome, analyzing the data as if observations were independent leads to a higher false positive rate than the nominal one: observations within a cluster are correlated, so the effective sample size is smaller than the row count and iid standard errors are too small.
In particular, we use a clustered splitter and:
- An OLS estimator with cluster-robust standard errors: we see that it passes the AA test.
- A plain OLS (without clustered standard errors): it fails the AA test, returning a false positive rate far above alpha.
In [1]:
from datetime import date
import numpy as np
from cluster_experiments import PowerAnalysis, ConstantPerturbator, BalancedClusteredSplitter, ExperimentAnalysis, ClusteredOLSAnalysis
import pandas as pd
import statsmodels.api as sm
# Create fake data
N = 10_000
clusters = [f"Cluster {i}" for i in range(10)]
dates = [f"{date(2022, 1, i):%Y-%m-%d}" for i in range(1, 15)]
df = pd.DataFrame(
    {
        "cluster": np.random.choice(clusters, size=N),
        "date": np.random.choice(dates, size=N),
    }
).assign(
    # Target is a linear combination of cluster and day of week, plus some noise
    cluster_id=lambda df: df["cluster"].astype("category").cat.codes,
    day_of_week=lambda df: pd.to_datetime(df["date"]).dt.dayofweek,
    target=lambda df: df["cluster_id"] + df["day_of_week"] + np.random.normal(size=N),
)
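Although there are N = 10,000 rows, randomization will happen over cluster-date cells, so the effective number of experimental units is much smaller; a quick count, as a minimal sketch using the df just built:

# The splitter below randomizes cluster-date cells, not individual rows:
# 10 clusters x 14 dates = 140 units, far fewer than the 10,000 rows.
n_units = df[["cluster", "date"]].drop_duplicates().shape[0]
print(f"{len(df)} rows, but only {n_units} randomization units")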
In [2]:
df.head()
Out[2]:
|   | cluster | date | cluster_id | day_of_week | target |
|---|---|---|---|---|---|
| 0 | Cluster 3 | 2022-01-08 | 3 | 5 | 7.534487 | 
| 1 | Cluster 2 | 2022-01-06 | 2 | 3 | 5.039041 | 
| 2 | Cluster 1 | 2022-01-14 | 1 | 4 | 5.341845 | 
| 3 | Cluster 7 | 2022-01-12 | 7 | 2 | 9.468617 | 
| 4 | Cluster 0 | 2022-01-10 | 0 | 0 | -0.644678 | 
As expected from the data-generating process, some clusters have a higher average outcome than others:
In [3]:
df.groupby("cluster").agg({"target": ["mean", "std"]})
Out[3]:
| cluster | target mean | target std |
|---|---|---|
| Cluster 0 | 3.027335 | 2.223308 | 
| Cluster 1 | 3.907833 | 2.211297 | 
| Cluster 2 | 4.895215 | 2.270596 | 
| Cluster 3 | 6.045043 | 2.269786 | 
| Cluster 4 | 6.902209 | 2.224554 | 
| Cluster 5 | 8.028794 | 2.313159 | 
| Cluster 6 | 9.046213 | 2.253462 | 
| Cluster 7 | 10.055748 | 2.226720 | 
| Cluster 8 | 11.048716 | 2.273583 | 
| Cluster 9 | 11.939075 | 2.216478 | 
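To quantify how much of the variance the clusters explain, one can regress the target on cluster dummies and look at the R². A minimal sketch; given the data-generating process above (cluster variance ≈ 8.25 out of a total ≈ 13.25), we would expect roughly 0.6:

# Share of target variance explained by the cluster alone. From the DGP above,
# Var(cluster_id) = 99/12 = 8.25 and total variance ~= 8.25 + 4 + 1 = 13.25,
# so expect R^2 ~= 0.6.
cluster_only = sm.OLS.from_formula("target ~ C(cluster)", data=df).fit()
print(f"R^2 of target ~ cluster: {cluster_only.rsquared:.2f}")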
In [4]:
# Plain OLS analysis with iid (non-clustered) standard errors
class NonClusteredOLS(ExperimentAnalysis):
    def analysis_pvalue(self, df: pd.DataFrame, verbose: bool = False) -> float:
        """Returns the p-value of the analysis.

        Arguments:
            df: dataframe containing the data to analyze
            verbose (Optional): bool, prints the regression summary if True
        """
        results_ols = sm.OLS.from_formula("target ~ treatment", data=df).fit()
        if verbose:
            print(results_ols.summary())
        return results_ols.pvalues[self.treatment_col]
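For contrast, the clustered analysis fits the same regression but with cluster-robust standard errors. A minimal sketch of that idea, assuming the base class exposes the cluster columns as self.cluster_cols (the real ClusteredOLSAnalysis in cluster_experiments may differ in its details):

# Hedged sketch of a cluster-robust analysis: same point estimate as plain OLS,
# but the covariance (and hence the p-value) accounts for within-cluster correlation.
class SketchClusteredOLS(ExperimentAnalysis):
    def analysis_pvalue(self, df: pd.DataFrame, verbose: bool = False) -> float:
        # Build one group label per cluster-date cell (assumed handling of
        # multiple cluster columns; for illustration only)
        groups = df[self.cluster_cols].astype(str).agg("_".join, axis=1)
        results_ols = sm.OLS.from_formula("target ~ treatment", data=df).fit(
            cov_type="cluster", cov_kwds={"groups": groups}
        )
        if verbose:
            print(results_ols.summary())
        return results_ols.pvalues[self.treatment_col]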
In [5]:
cluster_cols = ["cluster", "date"]
splitter = BalancedClusteredSplitter(
    cluster_cols=cluster_cols,
)
perturbator = ConstantPerturbator()
alpha = 0.05
n_simulations = 100
# Correct setup: clustered splitter with cluster-robust OLS analysis
pw_right = PowerAnalysis(
    splitter=splitter,
    perturbator=perturbator,
    alpha=alpha,
    n_simulations=n_simulations,
    analysis=ClusteredOLSAnalysis(
        cluster_cols=cluster_cols,
    ),
)
# Incorrect setup: clustered splitter but plain (non-clustered) OLS
pw_wrong = PowerAnalysis(
    splitter=splitter,
    perturbator=perturbator,
    alpha=alpha,
    n_simulations=n_simulations,
    analysis=NonClusteredOLS(
        # cluster_cols is required by the interface, but this analysis ignores it!
        cluster_cols=cluster_cols,
    ),
)
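Conceptually, each simulation splits the clusters at random, applies a zero average effect, runs the analysis, and counts how often p < alpha. A minimal sketch of that loop, with method names assumed from the library's splitter/analysis interfaces (PowerAnalysis implements this internally):

# Hedged sketch of an AA-test loop. With average_effect=0.0 the perturbation
# is a no-op, so the measured "power" is just the false positive rate.
def aa_false_positive_rate(df, splitter, analysis, alpha=0.05, n_simulations=100):
    rejections = 0
    for _ in range(n_simulations):
        treated_df = splitter.assign_treatment_df(df)  # fresh random cluster split
        rejections += analysis.get_pvalue(treated_df) < alpha
    return rejections / n_simulations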
The right way: in the AA test, the estimated power is close to the nominal type I error (alpha = 0.05), exactly as it should be when there is no true effect.
In [6]:
pw_right.power_analysis(df, average_effect=0.0)
Out[6]:
0.06
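With only 100 simulations the estimate is noisy; a quick check that 0.06 is well within sampling error of the nominal 0.05:

# One standard error of the estimated false positive rate under the null:
# sqrt(alpha * (1 - alpha) / n_simulations) ~= 0.022, so 0.06 is ~0.5 SE away.
se = (alpha * (1 - alpha) / n_simulations) ** 0.5
print(f"nominal rate: {alpha}, one SE of the estimate: {se:.3f}")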
The wrong way: the AA test fails; even though there is no true effect, we reject far more often than alpha.
In [7]:
pw_wrong.power_analysis(df, average_effect=0.0)
Out[7]:
0.79
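The mechanism is visible on a single split: the point estimate is the same, but the iid standard error is far smaller than the cluster-robust one, so plain OLS rejects far too often. A minimal sketch with a hand-rolled split over the same cluster-date cells (illustrative, not using the library's splitter):

# One random cluster-level split by hand, then the same regression with iid
# vs cluster-robust covariance. Only the standard errors (and p-values) differ.
rng = np.random.default_rng(0)
cells = df[["cluster", "date"]].drop_duplicates().assign(
    treatment=lambda d: rng.integers(0, 2, size=len(d))
)
treated_df = df.merge(cells, on=["cluster", "date"])

model = sm.OLS.from_formula("target ~ treatment", data=treated_df)
groups = treated_df["cluster"] + "_" + treated_df["date"]
naive = model.fit()
robust = model.fit(cov_type="cluster", cov_kwds={"groups": groups})
print(f"iid SE: {naive.bse['treatment']:.3f}, clustered SE: {robust.bse['treatment']:.3f}")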