VIP
  • Reports
  • Data Catalogue
  • Weekly Analysis
  • Frontier

Descriptives on ideas for Frontier Paper

Published

March 3, 2026

Abstract

Descriptives on shocks, mental health dynamics and correlation between measures

Code
```{python}
#| label: load-data
import pandas as pd
from plotnine import *
from pathlib import Path


PROJ_ROOT = Path("/Users/st2246/Work/Pilot3")
TIDY = PROJ_ROOT / "data/generated/tidy/main"

in_person = pd.read_stata(TIDY / "07_InPerson_hh_level-hh_id-period.dta", convert_categoricals=False)
in_person = in_person[in_person["period"] == 0] # Drop endline data

mental_health = pd.read_stata(TIDY / "06_Mental_Health-hh_id-period.dta", convert_categoricals=False)

planting = pd.read_stata(f"{TIDY}/29_planting-hh_id-period-crop_id.dta", convert_categoricals=False)
```

Shocks and Mental Health

Types of shock we had and counts

Code
```{python}
#| label: shock-histogram
#| fig-cap: Count of households reporting each shock type at baseline (period 0)

shock_vars = {
    'shock_any':                'Any shock',
    'shock_nonreligious':       'Non-religious\n shock',
    'shock_death_relative':     'Death of\nRelative',
    'shock_death_livestock':    'Death of\nLivestock',
    'shock_natural_disaster':   'Natural\nDisaster',
    'shock_religious_event':    'Religious\nEvent',
    'shock_other':              'Other',
}

color_map = {
    'shock_any': 'purple',
    'shock_religious_event': 'red',
}

shock_counts = (
    pd.DataFrame({
        'shock_type': list(shock_vars.values()),
        'shock_key': list(shock_vars.keys()),
        'count': [in_person[v].eq(1).sum() for v in shock_vars],
    })
    .assign(color=lambda df: df['shock_key'].map(lambda k: color_map.get(k, '#2c7bb6')))
    .sort_values('count', ascending=False)
)

(
    ggplot(shock_counts, aes(x='reorder(shock_type, -count)', y='count', fill='color'))
    + geom_bar(stat='identity', width=0.65)
    + scale_fill_identity()
    + geom_text(aes(label='count'), va='bottom', nudge_y=0.5, size=9)
    + labs(
        x='Shock Type',
        y='Number of Households',
        title='Households Reporting Shocks at Baseline',
    )
    + theme_minimal()
    + theme(
        axis_text_x=element_text(size=9),
        plot_title=element_text(face='bold'),
    )
)
```

Count of households reporting each shock type at baseline (period 0)

Count of households reporting each shock type at baseline (period 0)

Impact on MH at Baseline

Code
```{python}
#| label: shock-mh-regressions
#| output: asis

import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer
from IPython.display import HTML

# Full-scale MH measures available at baseline (period == 0)
mh_full_vars = ['phq8', 'gad7', 'pss10', 'pswq16', 'diener5', 'cantril_total']

mh_cols = ['hh_id'] + mh_full_vars
mh_bl = mental_health[mental_health['period'] == 0].copy()[mh_cols]

merged = in_person[['hh_id','shock_nonreligious', 'shock_religious_event', 'shock_any']].merge(mh_bl, on='hh_id', how='inner')

models = [
    smf.ols(f'{var} ~ shock_nonreligious', data=merged).fit()
    for var in mh_full_vars
]

models_any = [
    smf.ols(f'{var} ~ shock_any', data=merged).fit()
    for var in mh_full_vars
]

models_religious = [
    smf.ols(f'{var} ~ shock_religious_event', data=merged).fit()
    for var in mh_full_vars
]
```
  • Non-religious shocks
  • All shocks
  • Religious shocks
Code
```{python}
#| label: shock-mh-regressions-nonrelgious
#| output: asis
sg = Stargazer(models)
sg.title('Impact of non-relgious shocks on mental health at baseline')
sg.custom_columns(mh_full_vars, [1] * len(mh_full_vars))
sg.show_model_numbers(False)

sg
```
Impact of non-relgious shocks on mental health at baseline
phq8gad7pss10pswq16diener5cantril_total
Intercept8.065***7.244***19.325***51.251***20.729***11.206***
(0.150)(0.138)(0.152)(0.236)(0.225)(0.129)
shock_nonreligious-0.323*-0.2070.350*-0.038-0.251-0.041
(0.190)(0.174)(0.192)(0.298)(0.284)(0.163)
Observations227222722272227222722272
R20.0010.0010.0010.0000.0000.000
Adjusted R20.0010.0000.001-0.000-0.000-0.000
Residual Std. Error4.380 (df=2270)4.017 (df=2270)4.424 (df=2270)6.868 (df=2270)6.543 (df=2270)3.768 (df=2270)
F Statistic2.886* (df=1; 2270)1.406 (df=1; 2270)3.333* (df=1; 2270)0.016 (df=1; 2270)0.780 (df=1; 2270)0.063 (df=1; 2270)
Note:*p<0.1; **p<0.05; ***p<0.01
Code
```{python}
#| label: shock-mh-regressions-any
#| output: asis

sg = Stargazer(models_any)
sg.title('Impact of any shock on mental health at baseline')
sg.custom_columns(mh_full_vars, [1] * len(mh_full_vars))
sg.show_model_numbers(False)

sg
```
Impact of any shock on mental health at baseline
phq8gad7pss10pswq16diener5cantril_total
Intercept9.106***8.000***19.663***51.606***21.482***12.216***
(0.259)(0.238)(0.264)(0.409)(0.389)(0.223)
shock_any-1.420***-1.011***-0.135-0.433-1.039**-1.183***
(0.277)(0.255)(0.282)(0.437)(0.416)(0.238)
Observations227222722272227222722272
R20.0110.0070.0000.0000.0030.011
Adjusted R20.0110.006-0.000-0.0000.0020.010
Residual Std. Error4.358 (df=2270)4.005 (df=2270)4.427 (df=2270)6.867 (df=2270)6.535 (df=2270)3.748 (df=2270)
F Statistic26.226*** (df=1; 2270)15.743*** (df=1; 2270)0.231 (df=1; 2270)0.982 (df=1; 2270)6.244** (df=1; 2270)24.592*** (df=1; 2270)
Note:*p<0.1; **p<0.05; ***p<0.01
Code
```{python}
#| label: shock-mh-regressions-religious
#| output: asis

sg = Stargazer(models_religious)
sg.title('Impact of religious shocks on mental health at baseline')
sg.custom_columns(mh_full_vars, [1] * len(mh_full_vars))
sg.show_model_numbers(False)

sg
```
Impact of religious shocks on mental health at baseline
phq8gad7pss10pswq16diener5cantril_total
Intercept8.179***7.270***19.608***51.764***19.878***12.067***
(0.173)(0.159)(0.175)(0.271)(0.258)(0.147)
shock_religious_event-0.441**-0.217-0.089-0.748**0.967***-1.235***
(0.204)(0.187)(0.206)(0.320)(0.304)(0.174)
Observations227222722272227222722272
R20.0020.0010.0000.0020.0040.022
Adjusted R20.0020.000-0.0000.0020.0040.021
Residual Std. Error4.378 (df=2270)4.017 (df=2270)4.427 (df=2270)6.860 (df=2270)6.530 (df=2270)3.727 (df=2270)
F Statistic4.673** (df=1; 2270)1.337 (df=1; 2270)0.186 (df=1; 2270)5.478** (df=1; 2270)10.083*** (df=1; 2270)50.532*** (df=1; 2270)
Note:*p<0.1; **p<0.05; ***p<0.01
  • Strange, the cantril moves in its own direction
  • Religious event good but also other shocks good? (noisy though)
  • We do have the sample size to talk about shocks (nearly 50-50 split in who experiences it)

Dynamics and Recovery

Code
```{python}
#| label: shock-period-regressions
#| warning: false
mental_health_vars = ['phq2', 'gad2', 'pss4', 'pswq3']
mh_all = mental_health.copy()[['hh_id', 'period', 'treatment'] + mental_health_vars]
dynamic_data = mh_all.merge(in_person[['hh_id','shock_nonreligious', 'shock_religious_event', 'shock_any']], on='hh_id', how='left')

model_store = {}
for period in range(0, 7):
    model_store[period] = {}
    model_store[period]['nonreligious'] = {}
    for var in mental_health_vars:
         model = smf.ols(f'{var} ~ shock_nonreligious + C(treatment) ', data=dynamic_data[dynamic_data['period'] == period]).fit()
         model_store[period]['nonreligious'][var] = model

    model_store[period]['religious'] = {}
    for var in mental_health_vars:
         model = smf.ols(f'{var} ~ shock_religious_event + C(treatment)  ', data=dynamic_data[dynamic_data['period'] == period]).fit()
         model_store[period]['religious'][var] = model
    model_store[period]['any'] = {}
    for var in mental_health_vars:
        model = smf.ols(f'{var} ~ shock_any + C(treatment) ', data=dynamic_data[dynamic_data['period'] == period]).fit()
        model_store[period]['any'][var] = model
    


# Extract tuple of (coef, se) for shock variable across periods and outcomes
results_store = []
for period in model_store.keys():
    for shock_type in model_store[period].keys():
        for outcome in model_store[period][shock_type].keys():
            model = model_store[period][shock_type][outcome]
            coef = model.params[4]  # Assuming shock variable is the first after intercept
            se = model.bse[4]
            results_store.append({
                'period': period,
                'shock_type': shock_type,
                'outcome': outcome,
                'coef': coef,
                'se': se,
            })
results_clean = pd.DataFrame(results_store) 
results_clean['period'] = results_clean['period'].astype(int)
results_clean['coef'] = results_clean['coef'].astype(float)
results_clean['se'] = results_clean['se'].astype(float)
ojs_define(data=results_clean.to_dict(orient='records')) 
```
Code
```{ojs}
//| panel: sidebar

viewof mental_var = Inputs.select(
  ["phq2", "gad2", "pss4", "pswq3"],
  { label: "Mental scale", value: "phq2" }
)

viewof shock_type = Inputs.select(
  ["nonreligious", "religious", "any"],
  { label: "Shock type", value: "any" }
)
```
Code
filtered = data.filter(d => d.outcome === mental_var && d.shock_type === shock_type)

Plot.plot({
  width: 800,
  height: 500,
  marginLeft: 60,
  grid: true,
  x: { label: "Survey Period" },
  y: { label: "Coefficient Estimate (95% CI)" },
  marks: [
    Plot.ruleY([0]),
    Plot.dot(filtered, { x: "period", y: "coef" }),
    Plot.ruleX(filtered, {
      x: "period",
      y1: d => d.coef - 1.96 * d.se,
      y2: d => d.coef + 1.96 * d.se,
      strokeWidth: 1.5
    })
  ]
})

Mental Health Measures

Code
```{python}
# Define list of mental health variables to work with
inverted = ['diener3', 'cantril_total']
mental_health_vars = ['phq2', 'gad2', 'pss4', 'pswq3']

mental_health = pd.concat(
    [mental_health, pd.DataFrame(
        {f"{var}_z_inv": -mental_health[var] for var in inverted}
    )],
    axis=1
)

mental_health_zs = [f"{var}_z_inv" for var in inverted]
mental_health_zs += [f"{var}_z" for var in mental_health_vars]
```

Correlation heatmap between MH measures broken by period

  • Almost no correlation between positives and negatives. It seems like they are almost a separate dimension of mental health.
  • Clear link to some mental health measures like depression and anxiety. But correlation with other measures like stress and worry are less strong.
Code
```{python}
#| label: mh-correlation-all
#| warning: false
#| fig-align: center


import numpy as np

var_labels = {
    'phq2_z':              'PHQ-2\n(Depression)',
    'gad2_z':              'GAD-2\n(Anxiety)',
    'pss4_z':              'PSS-4\n(Stress)',
    'pswq3_z':             'PSWQ-3\n(Worry)',
    'diener3_z_inv':       'Diener-3\n(Happiness ↓)',
    'cantril_total_z_inv': 'Cantril\n(Life Sat. ↓)',
}

def corr_long(df, vars, label_map=None):
    """Return long-form pairwise correlation DataFrame."""
    corr_mat = df[vars].corr()
    corr_mat.index.name = 'var1'
    corr_mat.columns.name = 'var2'
    long = corr_mat.stack().reset_index()
    long.columns = ['var1', 'var2', 'r']
    long['label'] = long['r'].round(2).astype(str)
    if label_map:
        long['var1'] = long['var1'].map(label_map)
        long['var2'] = long['var2'].map(label_map)
    return long

mh_data = mental_health[['period'] + mental_health_zs].copy()
corr_all = corr_long(mh_data, mental_health_zs, var_labels)

(
    ggplot(corr_all, aes('var1', 'var2', fill='r'))
    + geom_tile(color='white', size=0.4)
    + geom_text(aes(label='label'), size=7.5, color='black')
    + scale_fill_gradient2(
        low='#d73027', mid='white', high='#1a9850',
        midpoint=0, limits=[-1, 1], name='r',
    )
    + labs(
        title='MH Measure Correlations — All Periods',
        x='', y='',
    )
    + theme_minimal()
    + theme(
        axis_text_x=element_text(size=8),
        axis_text_y=element_text(size=8),
        plot_title=element_text(face='bold'),
        figure_size=(6, 5),
    )
)
```

Code
```{python}
#| label: mh-correlation-by-period
#| fig-cap: Pairwise Pearson correlations faceted by survey period.
#| warning: false

corr_by_period = []
phone_survey_mh_zs = set(mental_health_zs) 
phone_survey_mh_zs.discard('cantril_total_z_inv')
for period, grp in mh_data.groupby('period'):
    tmp = corr_long(grp, list(phone_survey_mh_zs), var_labels)
    tmp['Period'] = f'Period {int(period)}'
    corr_by_period.append(tmp)

# Remove the cantril total from the plot

corr_period_df = pd.concat(corr_by_period, ignore_index=True)

(
    ggplot(corr_period_df, aes('var1', 'var2', fill='r'))
    + geom_tile(color='white', size=0.4)
    + geom_text(aes(label='label'), size=5.5, color='black')
    + scale_fill_gradient2(
        low='#d73027', mid='white', high='#1a9850',
        midpoint=0, limits=[-1, 1], name='r',
    )
    + facet_wrap('~Period', ncol=3)
    + labs(
        title='MH Measure Correlations by Period',
        x='', y='',
    )
    + theme_minimal()
    + theme(
        axis_text_x=element_text(size=6, angle=30, ha='right'),
        axis_text_y=element_text(size=6),
        plot_title=element_text(face='bold'),
        figure_size=(10, 10),
        subplots_adjust={'wspace': 0.4, 'hspace': 0.4},
        # Hide color legend
        legend_position='none',
    )
)
```

Pairwise Pearson correlations faceted by survey period.

Pairwise Pearson correlations faceted by survey period.

Mental Health Dynamics

  • Measurements are remarkably stable. PHQ and GAD are a lot more stable than the stress and worry measures.
  • For the main study, we should consider using the other two measures. This + the correlations not necessarily being super high suggests there is a signal in the different measures
Note

The figures and tables below show how mental health is changing over time. Change is measured as difference in z-scores. Z-scores are computed relative to baseline distribution

Code
```{python}
#| label: mh-dynamics-data
#| fig-align: center


# 2 & 3. Period-over-period change in z-scores, divided by baseline SD → # of SDs changed
dyn = (
    mental_health[['hh_id', 'period', 'treated'] + [f'{v}_z' for v in mental_health_vars]]
    .sort_values(['hh_id', 'period'])
    .copy()
)

for var in mental_health_vars:
    z_col   = f'{var}_z'
    diff_col = f'{var}_z_diff'
    dyn[diff_col] = dyn.groupby('hh_id')[z_col].diff() 

diff_vars = [f'{v}_z_diff' for v in mental_health_vars]
diff_labels = {
    'phq2_z_diff':  'PHQ-2 (Depression)',
    'gad2_z_diff':  'GAD-2 (Anxiety)',
    'pss4_z_diff':  'PSS-4 (Stress)',
    'pswq3_z_diff': 'PSWQ-3 (Worry)',
}

dyn = dyn[dyn["period"] > 0].copy() 

def make_changes_distribution(var, title):
    return (
        ggplot(dyn.dropna(subset=[var]), aes(x=var))
        + geom_histogram(bins=40)
        + facet_wrap('~period', ncol=2, labeller='label_both')
        + labs(
            title=f'{title}',
            x='',
            y='Count',
        )
        + theme_minimal()
        + theme(
            figure_size=(10, 10),
        )
    )

def make_changes_distribution_grouped(var, title):
    return (
        ggplot(dyn.dropna(subset=[var]), aes(x=var))
        + geom_histogram(bins=40)
        + labs(
            title=f'{title}',
            x='',
            y='Count',
        )
        + theme_minimal()
        + theme(
            figure_size=(10, 6),
        )
    )

def make_changes_distribution_by_treatment(var, title):
    df = dyn.dropna(subset=[var]).copy()
    df['Treatment'] = df['treated'].map({1: 'Treated', 0: 'Control'})
    return (
        ggplot(df, aes(x=var, group='Treatment', fill='Treatment'))
        + geom_histogram(aes(y=after_stat('density')), bins=40, alpha=0.5, position='identity')
        + scale_fill_manual(values={'Treated': '#d73027', 'Control': '#2c7bb6'})
        + labs(
            title=f'{title}',
            x='',
            y='Density',
        )
        + theme_minimal()
        + theme(
            figure_size=(10, 6),
        )
    )
```
Code
```{python}
#| label: mh-dynamics-summary-table

from scipy.stats import median_abs_deviation
from IPython.display import display

rows = []
for var in diff_vars:
    series = dyn[var].dropna()
    row = {
        'Measure': diff_labels[var],
        'Mean': round(series.mean(), 3),
        'SD':   round(series.std(), 3),
        'MAD':  round(median_abs_deviation(series), 3),
    }
    for period, grp in dyn.groupby('period'):
        s = grp[var].dropna()
        row[f'Period {int(period)} SD'] = round(s.std(), 3) if len(s) > 1 else float('nan')
    rows.append(row)

summary_table = pd.DataFrame(rows).set_index('Measure')
summary_table
```
Mean SD MAD Period 1 SD Period 2 SD Period 3 SD Period 4 SD Period 5 SD Period 6 SD
Measure
PHQ-2 (Depression) 0.009 1.186 0.709 1.312 1.230 1.125 1.058 1.075 1.213
GAD-2 (Anxiety) 0.025 1.283 0.751 1.301 1.343 1.274 1.120 1.172 1.339
PSS-4 (Stress) 0.005 1.398 0.852 1.348 1.406 1.411 1.264 1.276 1.624
PSWQ-3 (Worry) 0.002 1.388 0.784 1.380 1.386 1.345 1.269 1.351 1.501
Code
```{python}
(
    (
        make_changes_distribution_grouped('phq2_z_diff', 'PHQ-2') 
        | make_changes_distribution_grouped('gad2_z_diff', 'GAD-2')) 
        / (make_changes_distribution_grouped('pss4_z_diff', 'PSS-4') 
        | make_changes_distribution_grouped('pswq3_z_diff', 'PSWQ-3')
    )
)
```

  • PHQ-2
  • GAD-2
  • PSS-4
  • PSWQ-3
Code
```{python}
#| label: mh-dynamics-phq2
#| warning: false

make_changes_distribution('phq2_z_diff', 'PHQ-2')
```

Code
```{python}
#| label: mh-dynamics-gad2
#| warning: false
make_changes_distribution('gad2_z_diff', 'GAD-2')
```

Code
```{python}
#| label: mh-dynamics-pss4
#| warning: false
make_changes_distribution('pss4_z_diff', 'PSS-4')
```

Code
```{python}
#| label: mh-dynamics-pswq3
#| warning: false
make_changes_distribution('pswq3_z_diff', 'PSWQ-3')
```

By treatment

Code
```{python}
(
    (
        make_changes_distribution_by_treatment('phq2_z_diff', 'PHQ-2')
        | make_changes_distribution_by_treatment('gad2_z_diff', 'GAD-2'))
        / (make_changes_distribution_by_treatment('pss4_z_diff', 'PSS-4')
        | make_changes_distribution_by_treatment('pswq3_z_diff', 'PSWQ-3')
    )
)
```

 
Cookie Preferences