Paired T test
This notebook shows how the PairedTTestClusteredAnalysis class performs the paired t-test. It is important to understand the difference between cluster columns and strata columns.
In [1]:
from cluster_experiments.experiment_analysis import PairedTTestClusteredAnalysis
import pandas as pd
In [2]:
# Let's generate some fake switchback data (the clusters here would be country and date)
df = pd.DataFrame(
    {
        "country_code": ["ES"] * 4 + ["IT"] * 4 + ["PL"] * 4 + ["RO"] * 4,
        "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"] * 4,
        "treatment": ["A", "B"] * 8,
        "target": [0.01] * 15 + [0.1],
    }
)
Let's see what the PairedTTestClusteredAnalysis class is doing under the hood. Because the data already contains the treatment column, there is no need for a splitter or a perturbator.
In [3]:
analysis = PairedTTestClusteredAnalysis(
    cluster_cols=["country_code", "date"], strata_cols=["country_code"]
)
analysis._preprocessing(df, verbose=True)
performing paired t test in this data
treatment        A      B
country_code
ES            0.01  0.010
IT            0.01  0.010
PL            0.01  0.010
RO            0.01  0.055
Out[3]:
country_code | treatment A | treatment B |
---|---|---|
ES | 0.01 | 0.010 |
IT | 0.01 | 0.010 |
PL | 0.01 | 0.010 |
RO | 0.01 | 0.055 |
Keep in mind that strata_cols must be a subset of cluster_cols; it is used as the index when pivoting.
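The pivot above can be reproduced with plain pandas. The following is a minimal sketch, assuming the preprocessing first averages the target per cluster and treatment and then pivots on the strata columns. It is not taken from the library's source, and the variable names `per_cluster` and `pivot` are only illustrative.

```python
import pandas as pd

# Same fake switchback data as in the notebook
df = pd.DataFrame(
    {
        "country_code": ["ES"] * 4 + ["IT"] * 4 + ["PL"] * 4 + ["RO"] * 4,
        "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"] * 4,
        "treatment": ["A", "B"] * 8,
        "target": [0.01] * 15 + [0.1],
    }
)

cluster_cols = ["country_code", "date"]
strata_cols = ["country_code"]  # must be a subset of cluster_cols

# Assumption: average the target within each cluster and treatment...
per_cluster = df.groupby(cluster_cols + ["treatment"], as_index=False)["target"].mean()

# ...then pivot so that each stratum is a row with one column per treatment
pivot = per_cluster.pivot_table(
    index=strata_cols, columns="treatment", values="target", aggfunc="mean"
)
print(pivot)
```

Each row of the resulting table is one pair for the test: the mean target under A and under B within the same stratum.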
In [4]:
analysis.analysis_pvalue(df, verbose=True)
paired t test results: TtestResult(statistic=-1.0, pvalue=0.39100221895577053, df=3)
Out[4]:
0.39100221895577053
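The reported statistic is what a standard paired t-test gives on the pivoted table. Here is a minimal sketch, assuming the class applies something equivalent to `scipy.stats.ttest_rel` to the two treatment columns. The `TtestResult` name in the printed output suggests SciPy, but that is an assumption, and the pivot values are typed in by hand from the table shown earlier.

```python
import pandas as pd
from scipy.stats import ttest_rel

# Pivoted means per stratum, copied from the preprocessing output
pivot = pd.DataFrame(
    {"A": [0.01, 0.01, 0.01, 0.01], "B": [0.01, 0.01, 0.01, 0.055]},
    index=pd.Index(["ES", "IT", "PL", "RO"], name="country_code"),
)

# Paired t-test: each stratum contributes one (A, B) pair
result = ttest_rel(pivot["A"], pivot["B"])
print(result.statistic, result.pvalue)
```

With only four pairs (three degrees of freedom) and a single stratum showing any difference, the test does not find a significant effect.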