Exploration of SES screening using household assets (equitytools)
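```{stata}
* Load shared paths (defines $data_gen_p3 and the raw-data globals)
do "$code/setup_p3_legacy.do"
* Keep period-0 observations from the merged Pilot 3 data
qui use "$data_gen_p3/Pilot_3_Merged.dta" if period == 0, clear
* Variable labels for display
label var mental_health_index "Mental Health Index"
label var total_food_reported "Total food consumption"
label var total_food_purchases "Total food purchases"
```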
```
. /*******
> Author: Simon Taye
> Date: Dec 8, 2025
> Purpose: Add paths for old pilot 3 paths for compatibility
> *******/
. ************************************************
. *****************START: Configs
. ************************************************
.
.
. **** Path for Pilot 3 generated data (for quarto)
. global data_gen_p3 "$pilot3/data/generated/p3"
. global data_gen $data_gen_p3
.
. * Location of calorie data which changes in other places this code is used
. global cal "$data/raw/pilot_3/calories_database.dta"
. * Structure for raw data - Pilot 3
. global ps "$data/raw/pilot_3/04_Phone_surveys"
. global bl "$data/raw/pilot_3/02_Baseline"
. global el "$data/raw/pilot_3/08_Endline_data"
. global fcm "$data/raw/pilot_3/07_Food_Consumption_Measure"
. global census "$data/raw/pilot_3/01_Census"
. global referral "$data/raw/pilot_3/11_Refferals/"
. * Nested structure for intervention data
. global intervention "$data/raw/pilot_3/03_Intervention_data"
. global training "$intervention/03_Treatment dissemination"
. global dropoff "$intervention/02_Drop_off"
. global pickup "$intervention/01_pick_up"
.
.
.
end of do-file
```
We discussed screening households by socioeconomic status (SES) during the census.
Data in the census
See this spreadsheet for a breakdown. A quick summary:

- Not much variation in flooring material (82% cement).
- Roofing material: 47% sheet metal, 41% sheet metal + wood.
- Some correlation between flooring and roofing material (e.g., the correlation between a cement floor and a wood roof is 0.3).
- No clear relationship between community size and the materials used.
- Many communities seem to standardize on a single roofing material.
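A minimal sketch of how summaries like these could be computed, run on simulated data rather than the census (it starts with `clear`, so run it outside this notebook). The variable names `community`, `roof`, `floor_cement` and the roof codes are hypothetical. It tabulates material shares, correlates two 0/1 indicators (which gives the phi coefficient), and measures how concentrated roofing is within each community as the share of households using that community's most common roof type.

```stata
* Sketch on SIMULATED data: variable names and codes are hypothetical.
clear
set seed 42
set obs 600
gen int community = ceil(_n / 30)           // 20 communities of 30 households

* Each community gets a "typical" roof (1 = sheet metal, 2 = sheet metal + wood)
gen double u = runiform()
bysort community: gen byte typical_roof = 1 + (u[1] < 0.5)

* Most households follow their community's typical roof; the rest use the other type
gen byte roof = cond(runiform() < 0.8, typical_roof, 3 - typical_roof)
gen byte floor_cement = runiform() < cond(roof == 2, 0.9, 0.7)
label define roof_lbl 1 "Sheet metal" 2 "Sheet metal + wood"
label values roof roof_lbl

* Overall shares of each material
tabulate roof
tabulate floor_cement

* Correlation of two 0/1 indicators (the phi coefficient)
gen byte roof_wood = (roof == 2)
correlate floor_cement roof_wood

* Within-community concentration: share of households using the modal roof type
bysort community roof: gen int n_roof = _N
bysort community: egen int n_modal = max(n_roof)
bysort community: gen double modal_share = n_modal / _N
egen byte tag = tag(community)
summarize modal_share if tag
```

With two roof types the modal share is at least 0.5; values close to 1 indicate communities that have standardized on one material.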
We also have expenditure and income questions in the census; the two have a correlation of 0.3 after winsorizing.
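A sketch of the winsorize-then-correlate step, again on simulated data (it starts with `clear`). The variable names and the 1st/99th percentile caps are assumptions, since the notes do not say which percentiles were used. It caps each variable at its own percentiles using only built-in commands.

```stata
* Sketch on SIMULATED data: names and the 1st/99th percentile caps are assumptions.
clear
set seed 12345
set obs 500
gen double income      = exp(rnormal(6, 1))
gen double expenditure = exp(0.5 * ln(income) + rnormal(3, 0.8))

* Winsorize: cap each variable at its own 1st and 99th percentiles
foreach v of varlist income expenditure {
    quietly summarize `v', detail
    local lo = r(p1)
    local hi = r(p99)
    gen double `v'_w = `v'
    quietly replace `v'_w = `lo' if `v' < `lo'
    quietly replace `v'_w = `hi' if `v' > `hi' & !missing(`v')
}

* Compare the correlation before and after winsorizing
correlate income expenditure
correlate income_w expenditure_w
local rho_w = r(rho)
display "Correlation after winsorizing: " %5.3f `rho_w'
```

Winsorizing keeps every observation but limits how much a few extreme reports can drive the correlation, which matters for self-reported income and expenditure.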
Asset-based Wealth Index
The in-person surveys include equity questions of the form "Do you own X?", where X is a radio, watch, refrigerator, and so on, with the intention of building a wealth index from them. Both the questions and my index are based on https://www.equitytool.org/ghana/.
Our index seems to be a good predictor of several objective variables that capture wealth.
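A generic sketch of a standardized PCA asset index, assuming binary ownership items with hypothetical names (`own_radio`, `own_watch`, ...) and simulated data (it starts with `clear`). This is not necessarily how the EquityTool itself scores responses; it only illustrates the "Standardized PCA" construction: take first-component scores, fix the arbitrary sign so that more assets means a higher score, and rescale to mean 0 and SD 1.

```stata
* Sketch on SIMULATED data: item names are hypothetical and this is a generic
* PCA index, not necessarily the EquityTool's own scoring.
clear
set seed 2025
set obs 390
gen double latent = rnormal()
foreach a in radio watch fridge tv phone {
    gen byte own_`a' = runiform() < invlogit(latent + rnormal(0, 0.5))
}

* First principal component of the ownership indicators
pca own_radio own_watch own_fridge own_tv own_phone
predict double pc1, score

* PCA signs are arbitrary: flip if needed so more assets means a higher score
egen int n_assets = rowtotal(own_*)
quietly correlate pc1 n_assets
if r(rho) < 0 {
    replace pc1 = -pc1
}

* Standardize to mean 0, SD 1
egen double wealth_index = std(pc1)
summarize wealth_index
```

Because the index is standardized, a screening cutoff of 2 reads as "2 standard deviations above the sample mean".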
```{stata}
* Combine the four previously created graphs (g1-g4) into a 2x2 panel
graph combine g1 g2 g3 g4, row(2) title("Wealth Index and objective measures")
```
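| | (1) | (2) | (3) Total value of assets (durable goods, farmland, animals) |
|---|---|---|---|
| Wealth Index (Standardized PCA) | 47.22** | 38.95*** | 3073.8*** |
| | (17.43) | (11.35) | (665.3) |
| Constant | 341.0*** | 136.2*** | 10017.2*** |
| | (17.60) | (11.46) | (671.7) |
| Observations | 390 | 390 | 390 |
| R² | 0.019 | 0.029 | 0.052 |

Standard errors in parentheses.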
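A sketch of how bivariate regressions like those in the table could be run and laid out side by side with built-in commands. The outcomes `y1`-`y3` and the data are simulated, hypothetical stand-ins (the block starts with `clear`), and `estimates table` is used here as one option; the notes do not show which command produced the table above.

```stata
* Sketch on SIMULATED data: y1-y3 are hypothetical stand-ins for the outcomes.
clear
set seed 7
set obs 390
gen double wealth_index = rnormal()
gen double y1 = 2 * wealth_index + rnormal(0, 5)
gen double y2 = 1 * wealth_index + rnormal(0, 3)
gen double y3 = 50 * wealth_index + rnormal(0, 200)

* One bivariate OLS regression per outcome, stored for a side-by-side table
local models
foreach y of varlist y1 y2 y3 {
    quietly regress `y' wealth_index
    estimates store m_`y'
    local models `models' m_`y'
}
estimates table `models', b(%9.2f) se(%9.2f) stats(N r2)
```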
Concerns
The trend lines look much flatter in the raw data than in the binscatter. Binning may smooth out too much of the variation, which is consistent with the low R² values (0.019 to 0.052) in the table above.
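To see why binning can make a relationship look stronger, here is a sketch that plots the same simulated data twice (it starts with `clear`): once as a raw scatter and once as 20 equal-sized bin means. The names are hypothetical, and 20 bins mirrors binscatter's default. In this sketch the OLS line is fit on the raw data in both panels, so only the points differ; bin means have far less spread than individual households, so the y-axis range shrinks and the same slope looks steeper.

```stata
* Sketch on SIMULATED data: names are hypothetical; 20 bins mirrors binscatter's default.
clear
set seed 11
set obs 390
gen double wealth_index = rnormal()
gen double food_purchases = 100 + 30 * wealth_index + rnormal(0, 200)

* 20 equal-sized bins of the wealth index, with the mean of x and y in each bin
xtile bin = wealth_index, nq(20)
bysort bin: egen double x_bin = mean(wealth_index)
bysort bin: egen double y_bin = mean(food_purchases)
egen byte tag = tag(bin)

* Same OLS line (fit on the raw data) in both panels; only the points differ
twoway (scatter food_purchases wealth_index) (lfit food_purchases wealth_index), title("Raw data") legend(off) name(raw, replace)
twoway (scatter y_bin x_bin if tag) (lfit food_purchases wealth_index), title("Binned means") legend(off) name(binned, replace)
graph combine raw binned, rows(1)
```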
If we had screened our existing sample on the wealth index (say, dropping households with an index of 2 or above, i.e., 2 standard deviations above the mean), we would still have kept the household with the highest assets and most of the households that report fairly high food purchases. Even with a cutoff of 1, we would still keep the highest-asset households.
```{stata}
*| output: asis
* Households kept (wealth_index below each cutoff) out of the 390 in the sample
quietly count if wealth_index < 2
local c = r(N)
local cp = string((`c' / 390) * 100, "%5.2f")
quietly count if wealth_index < 1
local c2 = r(N)
local c2p = string((`c2' / 390) * 100, "%5.2f")
quietly count if wealth_index < 1.5
local c3 = r(N)
local c3p = string((`c3' / 390) * 100, "%5.2f")
di "*A cutoff of 2 would give us a sample of `c' / 390 households* (`cp'%); <br>"
di "*A cutoff of 1.5 gives `c3' participants* (`c3p'%) <br>"
di "*A cutoff of 1 gives `c2' participants* (`c2p'%) <br>"
```
- A cutoff of 2 would give us a sample of 371 / 390 households (95.13%)
- A cutoff of 1.5 gives 359 participants (92.05%)
- A cutoff of 1 gives 322 participants (82.56%)
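A sketch that generalizes the cutoff counts to a list of cutoffs and also checks directly whether the household with the highest asset value would have been screened out, which is the claim made in the Concerns paragraph. It runs on simulated data (it starts with `clear`); `total_assets` and the cutoff list are hypothetical.

```stata
* Sketch on SIMULATED data: total_assets and the cutoff list are hypothetical.
clear
set seed 3
set obs 390
gen double wealth_index = rnormal()
gen double total_assets = exp(9 + 0.3 * wealth_index + rnormal(0, 1))

* Flag the single household with the highest asset value
egen double max_assets = max(total_assets)
gen byte is_top = (total_assets == max_assets)

quietly count if !missing(wealth_index)
local N = r(N)
foreach cut in 2 1.5 1 0.5 {
    // Households kept = those with a wealth index below the cutoff
    quietly count if wealth_index < `cut'
    local kept = r(N)
    quietly count if wealth_index < `cut' & is_top
    local top_kept = (r(N) > 0)
    display "cutoff `cut': kept `kept' / `N' (" %5.2f 100 * `kept' / `N' "%); top-asset household kept: `top_kept'"
}
```

With the real data, the simulation lines would be dropped and `total_assets` replaced by the asset-value variable behind the table; storing `max_assets` as a double keeps the equality check exact.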