
Measuring mental health

Published

April 2, 2026

Abstract

TBD

```{python}
import pandas as pd
from plotnine import *
from pathlib import Path


PROJ_ROOT = Path("/Users/st2246/Work/Pilot3")
TIDY = PROJ_ROOT / "data/generated/tidy/main"
TRANSFORM = PROJ_ROOT / "data/generated/main/transform"
COMMUNITY_DATA = PROJ_ROOT / "quarto/census_sampling/resources/community_data.dta"

# Household-by-period mental health scores
mental_health = pd.read_stata(TIDY / "06_Mental_Health-hh_id-period.dta", convert_categoricals=False)
# Attach each household-period's survey date from the period-level file
period_data = pd.read_stata(TIDY / "16_period_hh_level-hh_id-period.dta", convert_categoricals=False)
mental_health = mental_health.merge(period_data[["hh_id", "period", "survey_date"]], on=["hh_id", "period"], how="left")
mental_health["hh_id"] = mental_health["hh_id"].astype(int)
```
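The left merge above assumes that period_data holds at most one row per (hh_id, period) pair; duplicates there would silently multiply mental health rows. A minimal sketch of how pandas' own `validate` and `indicator` arguments to `merge` could check this, run on made-up toy frames rather than the project files:

```python
import pandas as pd

# Toy stand-ins for the mental health and period-level files (made-up values)
mh = pd.DataFrame({"hh_id": [1, 1, 2], "period": [1, 2, 1], "phq2_z": [0.1, -0.3, 1.2]})
periods = pd.DataFrame(
    {
        "hh_id": [1, 1],
        "period": [1, 2],
        "survey_date": pd.to_datetime(["2025-01-05", "2025-04-05"]),
    }
)

# validate="many_to_one" raises pandas.errors.MergeError if (hh_id, period) is not
# unique in the right-hand frame; indicator=True adds a _merge column
merged = mh.merge(
    periods[["hh_id", "period", "survey_date"]],
    on=["hh_id", "period"],
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Household-periods with no matching survey date
unmatched = merged.loc[merged["_merge"] == "left_only", ["hh_id", "period"]]
print(unmatched)
```

In the toy data household 2 has no survey date, so it appears in unmatched; with the real files, a non-empty result would point to households missing from the period-level data.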

Important

We pursued this line of work to check whether negative mental health (depression, anxiety) really is the opposite of positive mental health. We had noticed that positive and negative mental health measures were positively correlated, which prompted the investigation.

However, that correlation was caused by an error in Simon’s code. I (Simon) had recoded the positive mental health measures (life satisfaction, happiness) so that higher values meant worse mental health, making all measures point in the same direction. That initial correlation work was meant to see whether we could choose a smaller set of questions for measuring various aspects of mental health; I forgot to note the recoding in the figure, which led to the confusion.

The results below show that positive and negative mental health have the relationship we would expect: higher life satisfaction goes with lower depression, anxiety, worry, and stress scores.
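The mix-up comes down to a sign convention: reversing one measure flips the sign of its correlation with every other measure while leaving the magnitude unchanged. A small sketch with simulated scores (not project data), assuming depression is negatively related to life satisfaction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated standardized scores: depression falls as life satisfaction rises
life_sat = rng.standard_normal(n)
depression = -0.3 * life_sat + rng.standard_normal(n)

r_original = np.corrcoef(life_sat, depression)[0, 1]
# Recode life satisfaction so that higher values mean worse mental health
r_recoded = np.corrcoef(-life_sat, depression)[0, 1]

print(round(r_original, 3), round(r_recoded, 3))
```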

Positive and Negative MH

  • Median Cross tab
  • p75
```{python}
mental_health_measures = mental_health.copy()
measures = ["phq2", "gad2", "pss4", "pswq3", "diener3"]

# Period-specific median (p50) of each standardized measure
medians = mental_health_measures.groupby("period").agg(
    **{f"{m}_z_p50": (f"{m}_z", lambda x: x.quantile(0.5)) for m in measures}
)
mental_health_measures = mental_health_measures.merge(medians, on="period", how="left")

# Flag observations strictly above their period's median.
# NaN comparisons evaluate to False, so rows with a missing score are flagged 0 ("below").
for m in measures:
    mental_health_measures[f"{m}_above_median"] = (
        mental_health_measures[f"{m}_z"] > mental_health_measures[f"{m}_z_p50"]
    ).astype(int)

# Create a group variable from the 4 combinations of the depression (PHQ-2)
# and life-satisfaction (Diener-3) flags
mental_health_measures["happy_sad_group"] = (
    mental_health_measures["phq2_above_median"].astype(str)
    + "_"
    + mental_health_measures["diener3_above_median"].astype(str)
)

group_map = {
    "0_1": "Happy above p50",
    "1_0": "Sad above p50",
    "0_0": "Both below p50",
    "1_1": "Both above p50",
}
mental_health_measures["happy_sad_group"] = mental_health_measures[
    "happy_sad_group"
].map(group_map)

mental_health_measures["happy_sad_group"] = pd.Categorical(
    mental_health_measures["happy_sad_group"],
    categories=["Happy above p50", "Sad above p50", "Both below p50", "Both above p50"],
    ordered=True,
)

# Print counts of the group
print(mental_health_measures["happy_sad_group"].value_counts())
```
happy_sad_group
Both below p50     6696
Happy above p50    6314
Sad above p50      2395
Both above p50     1595
Name: count, dtype: int64
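The group counts sum to 17,000, whereas the regressions below use 16,938 observations. The gap is consistent with a pandas detail: comparing a missing value against a threshold returns False, so rows with a missing PHQ-2 or Diener-3 score are flagged as below the median instead of being dropped. A minimal sketch of the behaviour on toy data, with a hypothetical helper `above_cutoff` that keeps missing scores missing:

```python
import numpy as np
import pandas as pd

scores = pd.Series([-1.0, 0.5, np.nan, 2.0])
cutoff = 0.0

# Pattern used above: NaN > cutoff is False, so the missing score is flagged 0 ("below")
naive_flag = (scores > cutoff).astype(int)


def above_cutoff(x: pd.Series, threshold: float) -> pd.Series:
    """Hypothetical helper: nullable 1/0 flag that stays missing where x is missing."""
    return (x > threshold).astype("Int64").where(x.notna())


safe_flag = above_cutoff(scores, cutoff)
print(naive_flag.tolist())  # missing score counted as 0
print(safe_flag.tolist())   # missing score stays missing
```

If flags built this way were combined into happy_sad_group with .astype(str), rows with a missing input would not match any key of group_map, so .map() would return NaN and value_counts() would leave them out.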
```{python}
mental_health_measures = mental_health.copy()
measures = ["phq2", "gad2", "pss4", "pswq3", "diener3"]

# Period-specific 75th percentile (p75) of each standardized measure
p75 = mental_health_measures.groupby("period").agg(
    **{f"{m}_z_p75": (f"{m}_z", lambda x: x.quantile(0.75)) for m in measures}
)
mental_health_measures = mental_health_measures.merge(p75, on="period", how="left")

# Flag observations strictly above their period's p75.
# NaN comparisons evaluate to False, so rows with a missing score are flagged 0 ("below").
for m in measures:
    mental_health_measures[f"{m}_above_p75"] = (
        mental_health_measures[f"{m}_z"] > mental_health_measures[f"{m}_z_p75"]
    ).astype(int)

# Create a group variable from the 4 combinations of the depression (PHQ-2)
# and life-satisfaction (Diener-3) flags
mental_health_measures["happy_sad_group"] = (
    mental_health_measures["phq2_above_p75"].astype(str)
    + "_"
    + mental_health_measures["diener3_above_p75"].astype(str)
)

group_map = {
    "0_1": "Happy above p75",
    "1_0": "Sad above p75",
    "0_0": "Both below p75",
    "1_1": "Both above p75",
}
mental_health_measures["happy_sad_group"] = mental_health_measures[
    "happy_sad_group"
].map(group_map)

mental_health_measures["happy_sad_group"] = pd.Categorical(
    mental_health_measures["happy_sad_group"],
    categories=["Happy above p75", "Sad above p75", "Both below p75", "Both above p75"],
    ordered=True,
)

# Print counts of the group
print(mental_health_measures["happy_sad_group"].value_counts())
```
happy_sad_group
Both below p75     11219
Happy above p75     2932
Sad above p75       2371
Both above p75       478
Name: count, dtype: int64
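The p50 and p75 cross tabs differ only in the quantile used. A sketch of a hypothetical helper, `quantile_groups`, that builds the same four-way PHQ-2 by Diener-3 grouping for any quantile, assuming the column names used above (period, phq2_z, diener3_z). Unlike the blocks above, it leaves rows with a missing score ungrouped; the toy frame is made up:

```python
import numpy as np
import pandas as pd


def quantile_groups(df: pd.DataFrame, q: float) -> pd.Series:
    """Hypothetical helper: classify rows by whether PHQ-2 ("sad") and Diener-3
    ("happy") lie above their period-specific q-th quantile."""
    label = f"p{round(q * 100)}"
    # Period-specific cutoffs, broadcast back to every row (no merge needed)
    cut_sad = df.groupby("period")["phq2_z"].transform(lambda s: s.quantile(q))
    cut_happy = df.groupby("period")["diener3_z"].transform(lambda s: s.quantile(q))

    sad = (df["phq2_z"] > cut_sad).astype(int).astype(str)
    happy = (df["diener3_z"] > cut_happy).astype(int).astype(str)

    group_map = {
        "0_1": f"Happy above {label}",
        "1_0": f"Sad above {label}",
        "0_0": f"Both below {label}",
        "1_1": f"Both above {label}",
    }
    groups = (sad + "_" + happy).map(group_map)
    # Leave rows with a missing score unclassified instead of counting them as "below"
    groups = groups.mask(df["phq2_z"].isna() | df["diener3_z"].isna())
    categories = list(group_map.values())
    return pd.Series(pd.Categorical(groups, categories=categories, ordered=True), index=df.index)


toy = pd.DataFrame(
    {
        "period": [1, 1, 1, 1, 2, 2],
        "phq2_z": [0.0, 1.0, 2.0, 3.0, np.nan, 5.0],
        "diener3_z": [3.0, 2.0, 1.0, 0.0, 1.0, 2.0],
    }
)
print(quantile_groups(toy, 0.5).value_counts())
```

With the project data, quantile_groups(mental_health, 0.75).value_counts() should match the p75 cross tab above, except that rows with a missing PHQ-2 or Diener-3 score are excluded rather than counted as below the threshold.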

Scatterplot

  • Binscatter Index
  • Index
  • Depression
```{python}
import plotly.express as px
from binscatter import binscatter

binscatter(
    mental_health_measures,
    x="diener3_z",
    y="mental_health_index_neg",
    poly_line=1,
)
```
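A binned scatter plot groups observations into equal-count bins of the x variable and plots the mean of y against the mean of x within each bin. The binscatter call above does this internally; as a package-independent sketch of the idea (the function name `manual_binscatter`, the 20-bin default, and the simulated data are assumptions, not necessarily what the package does):

```python
import numpy as np
import pandas as pd


def manual_binscatter(df: pd.DataFrame, x: str, y: str, n_bins: int = 20) -> pd.DataFrame:
    """Hypothetical sketch: mean of x and y within quantile bins of x."""
    data = df[[x, y]].dropna()
    # qcut forms (roughly) equal-count bins; duplicates="drop" handles tied bin edges
    bins = pd.qcut(data[x], q=n_bins, duplicates="drop")
    return data.groupby(bins, observed=True)[[x, y]].mean().reset_index(drop=True)


rng = np.random.default_rng(1)
toy = pd.DataFrame({"diener3_z": rng.standard_normal(2_000)})
toy["mental_health_index_neg"] = -0.25 * toy["diener3_z"] + rng.standard_normal(2_000)

binned = manual_binscatter(toy, "diener3_z", "mental_health_index_neg")
# Straight-line fit through the bin means
slope = np.polyfit(binned["diener3_z"], binned["mental_health_index_neg"], deg=1)[0]
print(binned.head())
print(round(slope, 3))
```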
```{python}
(
    ggplot(mental_health_measures, aes(x="diener3_z", y="mental_health_index_neg"))
    + geom_jitter(alpha=0.2)
    + geom_smooth(method="lm", color="blue")
    + labs(x="Diener-3 (Life Satisfaction)", y="Index of Negative measures")
    + theme_minimal()
)
```
/Users/st2246/Work/Pilot3/venv/lib/python3.12/site-packages/plotnine/layer.py:374: PlotnineWarning: geom_jitter : Removed 62 rows containing missing values.

```{python}
(
    ggplot(mental_health_measures, aes(x="diener3_z", y="phq2_z"))
    + geom_jitter(alpha=0.2)
    + geom_smooth(method="lm", color="blue")
    + labs(x="Diener-3 (Life Satisfaction)", y="PHQ-2 (Depression)")
    + theme_minimal()
)
```
/Users/st2246/Work/Pilot3/venv/lib/python3.12/site-packages/plotnine/layer.py:374: PlotnineWarning: geom_jitter : Removed 62 rows containing missing values.
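Both jitter plots report dropping 62 rows, because plotnine discards observations with a missing value in a mapped aesthetic. A short sketch for counting which plotted column is missing; the helper name `missing_summary` is hypothetical and the frame is a toy:

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame, cols: list[str]) -> pd.Series:
    """Hypothetical helper: rows missing each column, plus rows missing any of them."""
    counts = df[cols].isna().sum()
    counts["any"] = df[cols].isna().any(axis=1).sum()
    return counts


toy = pd.DataFrame(
    {
        "diener3_z": [0.2, np.nan, -1.0, 0.5],
        "phq2_z": [1.1, 0.3, np.nan, np.nan],
    }
)
summary = missing_summary(toy, ["diener3_z", "phq2_z"])
print(summary)
```

Applied to mental_health_measures with ["diener3_z", "phq2_z"], the "any" entry should equal the number of rows the second plot drops, and the per-column counts would show which item drives it.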

Regressions

  • Poor mental health predicting happiness
  • Happiness predicting poor mental health
```{python}
#| label: neg-mh-on-diener
#| output: asis

import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer

dep_vars = ["mental_health_index_neg", "phq2_z", "gad2_z", "pswq3_z", "pss4_z"]
models = []
for var in dep_vars:
    df = mental_health.rename(columns={var: "negative_mh"}).copy()
    models.append(smf.ols("diener3_z ~ negative_mh + C(period)", data=df).fit())

sg = Stargazer(models)
sg.title("Negative MH Measures on Life Satisfaction (Diener)")
sg.custom_columns(dep_vars, [1] * len(dep_vars))
sg.show_model_numbers(False)
sg.covariate_order(["Intercept", "negative_mh"])

sg
```
Negative MH Measures on Life Satisfaction (Diener)

Dependent variable: diener3_z. Column headers name the negative MH measure entered as the regressor negative_mh; all models include period fixed effects (not shown).

| | mental_health_index_neg | phq2_z | gad2_z | pswq3_z | pss4_z |
|---|---|---|---|---|---|
| Intercept | 0.036* | -0.000 | -0.000 | -0.000 | -0.000 |
| | (0.021) | (0.021) | (0.021) | (0.021) | (0.020) |
| negative_mh | -0.252*** | -0.156*** | -0.113*** | -0.130*** | -0.246*** |
| | (0.008) | (0.008) | (0.008) | (0.007) | (0.007) |
| Observations | 16938 | 16938 | 16938 | 16938 | 16938 |
| R² | 0.068 | 0.028 | 0.020 | 0.027 | 0.076 |
| Adjusted R² | 0.068 | 0.028 | 0.020 | 0.026 | 0.076 |
| Residual Std. Error (df=16929) | 0.980 | 1.001 | 1.005 | 1.002 | 0.976 |
| F Statistic (df=8; 16929) | 154.539*** | 62.012*** | 44.286*** | 58.163*** | 174.618*** |

Note: *p<0.1; **p<0.05; ***p<0.01
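Each regression includes period fixed effects through C(period), so the reported slope uses only within-period variation. By the Frisch-Waugh-Lovell theorem, the same slope comes from demeaning both variables within period and running a simple regression. A numpy sketch on simulated data (not project data) that checks this, and contrasts it with a pooled fit of the kind geom_smooth(method="lm") draws in the scatterplots:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_periods, n_per = 8, 250
period = np.repeat(np.arange(n_periods), n_per)
x = rng.standard_normal(period.size) + 0.5 * period  # regressor drifts across periods
y = -0.25 * x + 0.3 * period + rng.standard_normal(period.size)
df = pd.DataFrame({"period": period, "x": x, "y": y})

# (1) OLS with an intercept and period dummies (first period omitted), like y ~ x + C(period)
dummies = pd.get_dummies(df["period"], prefix="p", drop_first=True, dtype=float)
X = np.column_stack([np.ones(len(df)), df["x"], dummies])
beta_dummies = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0][1]

# (2) Demean x and y within period, then regress without an intercept
x_w = df["x"] - df.groupby("period")["x"].transform("mean")
y_w = df["y"] - df.groupby("period")["y"].transform("mean")
beta_within = (x_w @ y_w) / (x_w @ x_w)

# (3) Pooled slope ignoring period, as a straight-line fit on all points would give
beta_pooled = np.polyfit(df["x"], df["y"], deg=1)[0]

print(round(beta_dummies, 4), round(beta_within, 4), round(beta_pooled, 4))
```

When period means of x and y move together, the pooled line can differ noticeably from the fixed-effects slope, so the fitted lines in the scatterplots need not match the coefficients in the tables.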
```{python}
#| label: diener-on-neg-mh
#| output: asis

import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer

dep_vars = ["mental_health_index_neg", "phq2_z", "gad2_z", "pswq3_z", "pss4_z"]
models = [
    smf.ols(f"{var} ~ diener3_z + C(period)", data=mental_health).fit()
    for var in dep_vars
]

sg = Stargazer(models)
sg.title("Life Satisfaction (Diener) on Negative MH Measures")
sg.custom_columns(dep_vars, [1] * len(dep_vars))
sg.show_model_numbers(False)
sg.covariate_order(["Intercept", "diener3_z"])

sg
```
Life Satisfaction (Diener) on Negative MH Measures

Dependent variable: named in each column header. All models include period fixed effects (not shown).

| | mental_health_index_neg | phq2_z | gad2_z | pswq3_z | pss4_z |
|---|---|---|---|---|---|
| Intercept | 0.144*** | 0.000 | 0.000 | -0.000 | 0.000 |
| | (0.020) | (0.019) | (0.021) | (0.022) | (0.022) |
| diener3_z | -0.240*** | -0.131*** | -0.110*** | -0.143*** | -0.279*** |
| | (0.007) | (0.007) | (0.008) | (0.008) | (0.008) |
| Observations | 16938 | 16938 | 16938 | 16938 | 16938 |
| R² | 0.085 | 0.045 | 0.047 | 0.040 | 0.072 |
| Adjusted R² | 0.085 | 0.045 | 0.046 | 0.040 | 0.071 |
| Residual Std. Error (df=16929) | 0.957 | 0.915 | 0.990 | 1.051 | 1.040 |
| F Statistic (df=8; 16929) | 197.727*** | 100.531*** | 103.913*** | 88.070*** | 163.446*** |

Note: *p<0.1; **p<0.05; ***p<0.01
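The two tabs regress in opposite directions, so their slopes need not match: with a single regressor, the slope of y on x is cov(x, y)/var(x) and the slope of x on y is cov(x, y)/var(y). Their product equals the squared correlation; with period fixed effects the same identity holds for the within-period (demeaned) variables. A short check on simulated data (illustrative only, not project results):

```python
import numpy as np

rng = np.random.default_rng(3)
happiness = rng.standard_normal(5_000)
poor_mh = -0.3 * happiness + rng.standard_normal(5_000)

cov = np.cov(happiness, poor_mh)  # 2x2 sample covariance matrix
b_mh_on_happy = cov[0, 1] / cov[0, 0]  # slope of poor_mh regressed on happiness
b_happy_on_mh = cov[0, 1] / cov[1, 1]  # slope of happiness regressed on poor_mh
r = np.corrcoef(happiness, poor_mh)[0, 1]

print(round(b_mh_on_happy, 3), round(b_happy_on_mh, 3), round(r**2, 3))
```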
 