Mental Health Decline

Published

November 10, 2025

Abstract

Descriptives on mental health decline at endline

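Mental health appears to worsen sharply at endline. This is a quick EDA to document the drop and look for possible explanations.

The drop shows up in every mental health measure we have: each standardized score moves sharply at period 6 (endline), reaching its highest value of any period.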

```{python}
import pandas as pd
import plotly.graph_objects as go
import numpy as np


TIDY_MAIN = "/Users/st2246/Work/Pilot3/data/generated/tidy/main"
# Load the mental health data
df = pd.read_stata(
    f"{TIDY_MAIN}/06_Mental_Health-hh_id-period.dta", convert_categoricals=False
)

# Calculate the mean of each standardized mental health score by period:
# gad2_z, phq2_z, pss4_z, pswq3_z, diener3_z
mh_means = (
    df.groupby("period")
    .agg(
        {
            "gad2_z": "mean",
            "phq2_z": "mean",
            "pss4_z": "mean",
            "pswq3_z": "mean",
            "diener3_z": "mean",
        }
    )
    .reset_index()
)

# Display table - format numbers to 3 decimal places
mh_means.style.format(
    {
        "period": "{:.0f}",
        "gad2_z": "{:.3f}",
        "phq2_z": "{:.3f}",
        "pss4_z": "{:.3f}",
        "pswq3_z": "{:.3f}",
        "diener3_z": "{:.3f}",
    }
)
```
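|   | period | gad2_z | phq2_z | pss4_z | pswq3_z | diener3_z |
|---|--------|--------|--------|--------|---------|-----------|
| 0 | 0 | 0.000 | 0.000 | -0.000 | -0.000 | 0.000 |
| 1 | 1 | -0.033 | -0.155 | 0.058 | -0.064 | 0.033 |
| 2 | 2 | -0.181 | -0.282 | 0.015 | -0.150 | 0.047 |
| 3 | 3 | -0.244 | -0.285 | 0.092 | -0.119 | 0.004 |
| 4 | 4 | -0.337 | -0.322 | -0.042 | -0.370 | 0.011 |
| 5 | 5 | -0.392 | -0.361 | 0.040 | -0.354 | -0.002 |
| 6 | 6 | 0.182 | 0.079 | 0.126 | 0.115 | 0.284 |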
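
One way to put a number on the endline jump is to difference the period means. The following is a minimal sketch that uses the rounded means from the table above rather than the underlying data; the `change` frame is an illustrative name and not part of the original analysis.

```python
import pandas as pd

# Period means copied from the table above (rounded to 3 decimals)
mh_means = pd.DataFrame(
    {
        "period": [0, 1, 2, 3, 4, 5, 6],
        "gad2_z": [0.000, -0.033, -0.181, -0.244, -0.337, -0.392, 0.182],
        "phq2_z": [0.000, -0.155, -0.282, -0.285, -0.322, -0.361, 0.079],
        "pss4_z": [0.000, 0.058, 0.015, 0.092, -0.042, 0.040, 0.126],
        "pswq3_z": [0.000, -0.064, -0.150, -0.119, -0.370, -0.354, 0.115],
        "diener3_z": [0.000, 0.033, 0.047, 0.004, 0.011, -0.002, 0.284],
    }
).set_index("period")

# Change in each mean relative to the previous period
# (the first row is NaN because period 0 has no predecessor)
change = mh_means.diff()

print(change.round(3))
```

On these rounded means, `gad2_z` rises by about 0.57 between periods 5 and 6, whereas every earlier period-to-period change in `gad2_z` is negative.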

GAD Deep Dive

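First, a plot of mean GAD-2 z-scores (where the decline is most evident) by period. The shaded band is the mean ± 1.96 standard deviations, so it shows the spread of individual scores rather than a confidence interval for the mean.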

```{python}
# Calculate mean and standard deviation of gad2_z by period
stats = df.groupby("period").agg({"gad2_z": ["mean", "std"]}).reset_index()

# Flatten column names
stats.columns = ["period", "gad2_z_mean", "gad2_z_std"]

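# Half-width of the shaded band: 1.96 * SD. This describes the spread of
# scores; a confidence interval for the mean would use SD / sqrt(n) instead.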
stats["ci"] = stats["gad2_z_std"] * 1.96

# Create the plot
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=stats["period"],
        y=stats["gad2_z_mean"],
        mode="lines+markers",
        name="GAD-2 Z",
        line=dict(color="blue", width=2),
        marker=dict(size=8),
    )
)

fig.add_trace(
    go.Scatter(
        x=stats["period"],
        y=stats["gad2_z_mean"] + stats["ci"],
        mode="lines",
        name="Upper CI",
        line=dict(width=0),
        showlegend=False,
    )
)

fig.add_trace(
    go.Scatter(
        x=stats["period"],
        y=stats["gad2_z_mean"] - stats["ci"],
        mode="lines",
        name="Lower CI",
        fill="tonexty",
        fillcolor="rgba(0, 0, 255, 0.2)",
        line=dict(width=0),
        showlegend=False,
    )
)

fig.update_layout(
    title="GAD-2-Z over Time",
    xaxis_title="Period",
    yaxis_title="GAD-2 (Standardized)",
    hovermode="x unified",
)

fig
```
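
The shaded band in the plot above is 1.96 times the standard deviation, which describes how spread out individual scores are. A confidence interval for the period mean would use the standard error instead. The sketch below shows that calculation on synthetic data; the `demo` frame and its values are made up for illustration, and the interval relies on a normal approximation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 3 periods, 500 respondents each
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "period": np.repeat([0, 1, 2], 500),
        "gad2_z": rng.normal(loc=0.0, scale=1.0, size=1500),
    }
)

# Mean, SD and non-missing count per period
ci_stats = demo.groupby("period")["gad2_z"].agg(["mean", "std", "count"])

# Standard error of the mean, then a normal-approximation 95% interval
ci_stats["sem"] = ci_stats["std"] / np.sqrt(ci_stats["count"])
ci_stats["ci_lower"] = ci_stats["mean"] - 1.96 * ci_stats["sem"]
ci_stats["ci_upper"] = ci_stats["mean"] + 1.96 * ci_stats["sem"]

print(ci_stats.round(3))
```

With hundreds of respondents per period, this interval is far narrower than the ± 1.96 SD band, so the two should not be read interchangeably.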

The spread also looks larger at endline. Let us plot the distribution of GAD-2 scores by period to see whether any other patterns stand out.

```{python}
# Create separate histograms for gad2 distribution by period
import plotly.subplots as sp

periods = sorted(df['period'].unique())
n_periods = len(periods)

fig = sp.make_subplots(
    rows=1, cols=n_periods,
    subplot_titles=[f'Period {p}' for p in periods]
)

for i, period in enumerate(periods, 1):
    period_data = df[df['period'] == period]['gad2_z'].dropna()
    
    fig.add_trace(
        go.Histogram(
            x=period_data,
            name=f'Period {period}',
            nbinsx=30,
            showlegend=False
        ),
        row=1, col=i
    )

fig.update_layout(
    title_text='Distribution of GAD-2 Z-Scores by Period',
    height=400,
    showlegend=False
)

fig.update_xaxes(title_text=' ')
fig.update_yaxes(title_text='Frequency', col=1)

fig
```

There may be some heterogeneity in the skew across periods; however, the rightward shift at endline is clear and is not driven by outliers.
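
A simple way to check the outlier claim numerically is to compare the mean with statistics that are less sensitive to extreme values. The sketch below does this on synthetic data; the `demo` frame, its two periods and the 5th to 95th percentile clipping range are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: a pre-endline period (5) and an endline period (6)
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    {
        "period": np.repeat([5, 6], 1000),
        "gad2_z": np.concatenate(
            [rng.normal(-0.4, 0.9, 1000), rng.normal(0.2, 1.1, 1000)]
        ),
    }
)

# Clip scores to the pooled 5th-95th percentile range to mute extreme values
low, high = demo["gad2_z"].quantile([0.05, 0.95])
demo["gad2_z_clipped"] = demo["gad2_z"].clip(lower=low, upper=high)

# Compare the plain mean with the median and the clipped mean by period
robust = demo.groupby("period").agg(
    mean=("gad2_z", "mean"),
    median=("gad2_z", "median"),
    clipped_mean=("gad2_z_clipped", "mean"),
)

print(robust.round(3))
```

If the median and the clipped mean move in the same direction as the plain mean, the shift reflects the bulk of the distribution rather than a few extreme respondents.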

Item Breakdown

Let us tabulate the two GAD-2 items individually to see whether one is driving the change more than the other.

```{python}
# Calculate mean and std for each GAD-2 item by period
gad_items = ['gad_qst1', 'gad_qst2']

# Create summary statistics for each item
summary_data = []
for period in sorted(df['period'].unique()):
    period_df = df[df['period'] == period]
    row = {'Period': period}
    
    for item in gad_items:
        mean_val = period_df[item].mean()
        std_val = period_df[item].std()
        row[f'{item}_mean'] = mean_val
        row[f'{item}_std'] = std_val
    
    summary_data.append(row)

summary_df = pd.DataFrame(summary_data)

# Rename columns for better readability
summary_df.columns = ['Period', 'GAD Q1 Mean', 'GAD Q1 SD', 'GAD Q2 Mean', 'GAD Q2 SD']

# Display the table
summary_df
```
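|   | Period | GAD Q1 Mean | GAD Q1 SD | GAD Q2 Mean | GAD Q2 SD |
|---|--------|-------------|-----------|-------------|-----------|
| 0 | 0.0 | 1.033891 | 0.779637 | 1.055018 | 0.794112 |
| 1 | 1.0 | 1.024285 | 0.767843 | 1.021047 | 0.775634 |
| 2 | 2.0 | 0.936170 | 0.752861 | 0.911509 | 0.783479 |
| 3 | 3.0 | 0.917444 | 0.707400 | 0.846549 | 0.743463 |
| 4 | 4.0 | 0.840573 | 0.702021 | 0.798983 | 0.763099 |
| 5 | 5.0 | 0.831758 | 0.682128 | 0.735350 | 0.708542 |
| 6 | 6.0 | 1.163402 | 0.908352 | 1.168180 | 0.868176 |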
```{python}
# Create table with all 7 GAD items for baseline and endline
gad7_items = [f'gad_qst{i}' for i in range(1, 8)]

# Filter for baseline (period 0) and endline (period 6)
periods_of_interest = [0, 6]
period_labels = {0: 'Baseline', 6: 'Endline'}

# Create summary data
summary_data = []
for item in gad7_items:
    row = {'Item': item.replace('gad_qst', 'GAD Q')}
    
    means = {}
    for period in periods_of_interest:
        period_df = df[df['period'] == period]
        mean_val = period_df[item].mean()
        std_val = period_df[item].std()
        row[period_labels[period]] = f"{mean_val:.3f} ({std_val:.3f})"
        means[period] = mean_val
    
    # Add difference column
    diff = means[6] - means[0]
    row['Difference'] = f"{diff:.3f}"
    
    summary_data.append(row)

gad7_summary = pd.DataFrame(summary_data)

# Display the table
gad7_summary
```
|   | Item | Baseline | Endline | Difference |
|---|------|----------|---------|------------|
| 0 | GAD Q1 | 1.034 (0.780) | 1.163 (0.908) | 0.130 |
| 1 | GAD Q2 | 1.055 (0.794) | 1.168 (0.868) | 0.113 |
| 2 | GAD Q3 | 1.047 (0.808) | 1.111 (0.903) | 0.064 |
| 3 | GAD Q4 | 1.022 (0.917) | 1.145 (0.940) | 0.123 |
| 4 | GAD Q5 | 1.058 (0.912) | 1.275 (0.957) | 0.217 |
| 5 | GAD Q6 | 0.948 (0.893) | 0.934 (0.936) | -0.014 |
| 6 | GAD Q7 | 0.952 (0.868) | 1.063 (0.947) | 0.111 |

The biggest increase (0.217) is in Q5 of the GAD-7, which is “Being so restless that it is hard to sit still”.
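
Because the items have different spreads, it can help to express each baseline-to-endline difference in standard deviation units. The sketch below does this from the rounded means and SDs in the table above. The pooled SD assumes equal sample sizes at baseline and endline, which is an assumption since the sample sizes are not reported here, and `std_diff` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Means and SDs copied from the baseline/endline table above
items = pd.DataFrame(
    {
        "item": [f"GAD Q{i}" for i in range(1, 8)],
        "base_mean": [1.034, 1.055, 1.047, 1.022, 1.058, 0.948, 0.952],
        "base_sd": [0.780, 0.794, 0.808, 0.917, 0.912, 0.893, 0.868],
        "end_mean": [1.163, 1.168, 1.111, 1.145, 1.275, 0.934, 1.063],
        "end_sd": [0.908, 0.868, 0.903, 0.940, 0.957, 0.936, 0.947],
    }
).set_index("item")

# Pooled SD assuming equal group sizes (assumption: n is not reported here)
pooled_sd = np.sqrt((items["base_sd"] ** 2 + items["end_sd"] ** 2) / 2)

# Endline minus baseline, in pooled-SD units
items["std_diff"] = (items["end_mean"] - items["base_mean"]) / pooled_sd

print(items["std_diff"].round(3).sort_values(ascending=False))
```

On these rounded figures Q5 remains the largest shift, at roughly 0.23 pooled SDs, while Q6 is essentially flat.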