The variation also appears to be larger at endline. Let us plot the distribution of GAD-2 z-scores by period to see whether any other patterns emerge.
```{python}
# Create separate histograms for gad2 distribution by period
import plotly.graph_objects as go
import plotly.subplots as sp

periods = sorted(df['period'].unique())
n_periods = len(periods)

# One subplot per period, laid out side by side
fig = sp.make_subplots(
    rows=1, cols=n_periods,
    subplot_titles=[f'Period {p}' for p in periods]
)

for i, period in enumerate(periods, 1):
    period_data = df[df['period'] == period]['gad2_z'].dropna()
    fig.add_trace(
        go.Histogram(
            x=period_data,
            name=f'Period {period}',
            nbinsx=30,
            showlegend=False
        ),
        row=1, col=i
    )

fig.update_layout(
    title_text='Distribution of GAD-2 Z-Scores by Period',
    height=400,
    showlegend=False
)
fig.update_xaxes(title_text=' ')
# Label the y-axis on the first column only
fig.update_yaxes(title_text='Frequency', col=1)
fig
```
There may be some heterogeneity in the skew across periods; however, the rightward shift is clear and is not driven by outliers.
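The visual impression of skew and outliers can also be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper `shape_by_period`, the `z_cut=3.0` outlier threshold, and the synthetic `demo` frame are all assumptions introduced for illustration. With the real data, `df` would be passed in place of `demo`.

```python
import numpy as np
import pandas as pd

def shape_by_period(data, value_col="gad2_z", period_col="period", z_cut=3.0):
    """Summarise location, spread, skewness and tail share per period.

    z_cut is a hypothetical outlier threshold on the z-score scale.
    """
    grouped = data.dropna(subset=[value_col]).groupby(period_col)[value_col]
    out = grouped.agg(["count", "mean", "median", "std", "skew"])
    # Share of observations whose absolute z-score exceeds z_cut
    out["share_extreme"] = grouped.apply(lambda s: (s.abs() > z_cut).mean())
    return out

# Synthetic stand-in data (assumption): two periods with 500 observations each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "period": np.repeat([0, 6], 500),
    "gad2_z": np.concatenate([
        rng.normal(0.0, 1.0, 500),
        rng.normal(0.3, 1.2, 500),
    ]),
})

shape_summary = shape_by_period(demo)
print(shape_summary.round(3))
```

If the median moves together with the mean and `share_extreme` stays small, that is consistent with a shift of the whole distribution rather than a few extreme values.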
Item Breakdown
Let us tabulate the individual items to see if one is driving the change more than the other.
```{python}
# Calculate mean and std for each GAD-2 item by period
gad_items = ['gad_qst1', 'gad_qst2']

# Create summary statistics for each item
summary_data = []
for period in sorted(df['period'].unique()):
    period_df = df[df['period'] == period]
    row = {'Period': period}
    for item in gad_items:
        mean_val = period_df[item].mean()
        std_val = period_df[item].std()
        row[f'{item}_mean'] = mean_val
        row[f'{item}_std'] = std_val
    summary_data.append(row)

summary_df = pd.DataFrame(summary_data)

# Rename columns for better readability
summary_df.columns = ['Period', 'GAD Q1 Mean', 'GAD Q1 SD', 'GAD Q2 Mean', 'GAD Q2 SD']

# Display the table
summary_df
```
| Period | GAD Q1 Mean | GAD Q1 SD | GAD Q2 Mean | GAD Q2 SD |
|---|---|---|---|---|
| 0 | 1.033891 | 0.779637 | 1.055018 | 0.794112 |
| 1 | 1.024285 | 0.767843 | 1.021047 | 0.775634 |
| 2 | 0.936170 | 0.752861 | 0.911509 | 0.783479 |
| 3 | 0.917444 | 0.707400 | 0.846549 | 0.743463 |
| 4 | 0.840573 | 0.702021 | 0.798983 | 0.763099 |
| 5 | 0.831758 | 0.682128 | 0.735350 | 0.708542 |
| 6 | 1.163402 | 0.908352 | 1.168180 | 0.868176 |
```{python}
# Create table with all 7 GAD items for baseline and endline
gad7_items = [f'gad_qst{i}' for i in range(1, 8)]

# Filter for baseline (period 0) and endline (period 6)
periods_of_interest = [0, 6]
period_labels = {0: 'Baseline', 6: 'Endline'}

# Create summary data
summary_data = []
for item in gad7_items:
    row = {'Item': item.replace('gad_qst', 'GAD Q')}
    means = {}
    for period in periods_of_interest:
        period_df = df[df['period'] == period]
        mean_val = period_df[item].mean()
        std_val = period_df[item].std()
        row[period_labels[period]] = f"{mean_val:.3f} ({std_val:.3f})"
        means[period] = mean_val
    # Add difference column
    diff = means[6] - means[0]
    row['Difference'] = f"{diff:.3f}"
    summary_data.append(row)

gad7_summary = pd.DataFrame(summary_data)

# Display the table
gad7_summary
```
| Item | Baseline, mean (SD) | Endline, mean (SD) | Difference |
|---|---|---|---|
| GAD Q1 | 1.034 (0.780) | 1.163 (0.908) | 0.130 |
| GAD Q2 | 1.055 (0.794) | 1.168 (0.868) | 0.113 |
| GAD Q3 | 1.047 (0.808) | 1.111 (0.903) | 0.064 |
| GAD Q4 | 1.022 (0.917) | 1.145 (0.940) | 0.123 |
| GAD Q5 | 1.058 (0.912) | 1.275 (0.957) | 0.217 |
| GAD Q6 | 0.948 (0.893) | 0.934 (0.936) | -0.014 |
| GAD Q7 | 0.952 (0.868) | 1.063 (0.947) | 0.111 |
The biggest increase (0.217) is in Q5 of the GAD-7, which is “Being so restless that it is hard to sit still”. Q6 is the only item that does not increase (-0.014).
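To make the ranking of items explicit, the reported means can be sorted by their baseline-to-endline change. This is a minimal sketch that re-enters the rounded item means from the table above by hand, so the differences can deviate from the reported ones in the third decimal (for example 0.129 instead of 0.130 for Q1). The names `means` and `ranked` are introduced here for illustration only.

```python
import pandas as pd

# Baseline and endline item means copied from the reported table (rounded to 3 d.p.)
means = pd.DataFrame(
    {
        "Baseline": [1.034, 1.055, 1.047, 1.022, 1.058, 0.948, 0.952],
        "Endline": [1.163, 1.168, 1.111, 1.145, 1.275, 0.934, 1.063],
    },
    index=[f"GAD Q{i}" for i in range(1, 8)],
)

# Change from baseline to endline, sorted from largest increase to smallest
means["Difference"] = means["Endline"] - means["Baseline"]
ranked = means.sort_values("Difference", ascending=False)
print(ranked.round(3))
```

Sorting this way puts Q5 first and Q6 last, with Q1, Q4, Q2 and Q7 clustered between roughly 0.11 and 0.13.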