Paired T test

This notebook shows how the PairedTTestClusteredAnalysis class performs the paired t-test. It is important to understand the difference between cluster columns and strata columns.

In [1]:

from cluster_experiments.experiment_analysis import PairedTTestClusteredAnalysis

import pandas as pd

In [2]:

# Generate some fake switchback data (the clusters here are country and date)
df = pd.DataFrame(
    {
        "country_code": ["ES"] * 4 + ["IT"] * 4 + ["PL"] * 4 + ["RO"] * 4,
        "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"] * 4,
        "treatment": ["A", "B"] * 8,
        "target": [0.01] * 15 + [0.1],
    }
)

Let's see what the PairedTTestClusteredAnalysis class does under the hood. Because the data already contains the treatment column, there is no need for a splitter or a perturbator.

In [3]:

analysis = PairedTTestClusteredAnalysis(
    cluster_cols=["country_code", "date"], strata_cols=["country_code"]
)

analysis._preprocessing(df, verbose=True)

performing paired t test in this data 
 treatment        A      B
country_code             
ES            0.01  0.010
IT            0.01  0.010
PL            0.01  0.010
RO            0.01  0.055

Out[3]:

treatment	A	B
country_code
ES	0.01	0.010
IT	0.01	0.010
PL	0.01	0.010
RO	0.01	0.055

Keep in mind that strata_cols must be a subset of cluster_cols. It is used as the index when pivoting, so each stratum (here, each country) becomes one row, and therefore one pair, in the test.
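The pivoted table can be reproduced with plain pandas. This is only a sketch of what the preprocessing appears to do, assuming it first averages the target per cluster and treatment and then pivots with the strata columns as the index; the actual implementation in the library may differ in details. The variable names cluster_means and pivoted are illustrative.

```python
import pandas as pd

# Same fake switchback data as in the notebook
df = pd.DataFrame(
    {
        "country_code": ["ES"] * 4 + ["IT"] * 4 + ["PL"] * 4 + ["RO"] * 4,
        "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"] * 4,
        "treatment": ["A", "B"] * 8,
        "target": [0.01] * 15 + [0.1],
    }
)

cluster_cols = ["country_code", "date"]
strata_cols = ["country_code"]

# Step 1 (assumed): one mean target value per cluster and treatment
cluster_means = df.groupby(cluster_cols + ["treatment"], as_index=False)[
    "target"
].mean()

# Step 2 (assumed): strata as rows, treatments as columns,
# averaging the cluster means within each stratum
pivoted = cluster_means.pivot_table(
    index=strata_cols, columns="treatment", values="target"
)
print(pivoted)
```

With this data, the RO row for treatment B averages 0.01 and 0.1, which gives the 0.055 shown in the output of the preprocessing step.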

In [4]:

analysis.analysis_pvalue(df, verbose=True)

paired t test results: 
 TtestResult(statistic=-1.0, pvalue=0.39100221895577053, df=3)

Out[4]:

0.39100221895577053
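The printed TtestResult is the result type returned by scipy's paired t-test, so the p-value can be checked by hand. The sketch below assumes the class compares the A and B columns of the pivoted table with scipy.stats.ttest_rel; the table is rebuilt manually here from the values printed by the preprocessing step.

```python
import pandas as pd
from scipy.stats import ttest_rel

# Pivoted table from the preprocessing output: one row per stratum
pivoted = pd.DataFrame(
    {"A": [0.01, 0.01, 0.01, 0.01], "B": [0.01, 0.01, 0.01, 0.055]},
    index=pd.Index(["ES", "IT", "PL", "RO"], name="country_code"),
)

# Paired t-test on the per-stratum differences A - B
result = ttest_rel(pivoted["A"], pivoted["B"])
print(result.statistic, result.pvalue)
```

Only one of the four differences is non-zero, so the statistic is -1 with 3 degrees of freedom, which matches the p-value of about 0.391 reported by analysis_pvalue.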
