EDA for Randomization Data

Published

July 22, 2025

Abstract

Plots and descriptives for potential screening criteria

Number of Eligible Households

```{stata}
tab eligible
```

  Household |
     has at |
  least one |
   eligible |
    members |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        702       16.21       16.21
          1 |      3,629       83.79      100.00
------------+-----------------------------------
      Total |      4,331      100.00

Our target sample is ~2,800 households, so we can screen out roughly 830 of the 3,629 eligible households (HHs), or about 23%.
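The size of the screening margin follows from the tabulation above. The following is a minimal sketch of that arithmetic, assuming the eligible count of 3,629 from the table and treating the approximate target of 2,800 as exact; the scalar names are illustrative and not part of the original analysis.

```{stata}
* Illustrative arithmetic only; scalar names are hypothetical
* Eligible households, taken from the tabulation
scalar n_eligible = 3629
* Approximate target sample, treated as exact here
scalar n_target = 2800

scalar n_cut = scalar(n_eligible) - scalar(n_target)
scalar share_cut = scalar(n_cut) / scalar(n_eligible)

display "Households to screen out: " scalar(n_cut)
display "Share of eligible households: " %5.3f scalar(share_cut)
```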

Descriptives of Screening Criteria

Some possible screening criteria to consider:

Wealth / SES:

  • Subjective ability to cope with lean season
  • Income
  • Expenditure
  • Salaried Worker

Others:

  • Household Size
  • Primarily Agricultural

Important

The descriptive statistics below include only eligible households.

```{stata}
graph display desc_agri
```

```{stata}
graph display desc_agri_share
```

```{stata}
label var hh_income "Household Income (1 month)"
su hh_income, d
```

                 Household Income (1 month)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%          100              0
10%          100              0       Obs               3,567
25%          230              0       Sum of wgt.       3,567

50%          500                      Mean            1032.08
                        Largest       Std. dev.      1596.417
75%         1050          16000
90%         2400          20000       Variance        2548548
95%         3500          25000       Skewness       5.906999
99%         7500          30000       Kurtosis       66.68888

```{stata}
label var total_meal "Total Expenses (1 wk)"
su total_meal, d
```

                    Total Expenses (1 wk)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           35            -99
 5%           70              0
10%          100             10       Obs               3,629
25%          150             10       Sum of wgt.       3,629

50%          300                      Mean           461.4188
                        Largest       Std. dev.      688.1489
75%          500           7000
90%         1000           7500       Variance       473548.9
95%         1500           9500       Skewness       9.742997
99%         3000          20000       Kurtosis       205.5522

Note

Figure available in the Appendix.
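The smallest value in the expenditure summary is -99, which cannot be a real weekly expense. A common survey convention is to use negative codes such as -99 for "don't know" or "refused"; that reading is an assumption here, not something confirmed by the data documentation. Under that assumption, a minimal sketch of recoding negative values to missing could look like this. It runs on toy data inside preserve/restore so the dataset in memory is untouched; the variable expense_toy and the scalar names are hypothetical.

```{stata}
preserve
clear
* Toy data: one -99 code among plausible weekly expenses (hypothetical values)
input expense_toy
-99
0
150
300
500
end

* Count the negative codes, then set them to missing
count if expense_toy < 0
scalar n_coded = r(N)
replace expense_toy = . if expense_toy < 0

* The minimum is now a real value
summarize expense_toy
scalar min_clean = r(min)
restore
```

On the real data the same replace would target total_meal, ideally before the winsorized version is built, so the code does not pull down the mean or the lower percentiles.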

```{stata}
label var hh_cope_lean "How difficult is it to cope w/ lean season"
tab hh_cope_lean
```

 How difficult is it |
     to cope w/ lean |
              season |      Freq.     Percent        Cum.
---------------------+-----------------------------------
Not difficult at all |         89        2.45        2.45
  Slightly difficult |        336        9.26       11.71
Moderately difficult |        592       16.31       28.02
      Very difficult |      2,077       57.23       85.26
 Extremely difficult |        535       14.74      100.00
---------------------+-----------------------------------
               Total |      3,629      100.00

```{stata}
su roster_size, d
```

               Number of surveyed individuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              1
 5%            4              1
10%            4              2       Obs               3,629
25%            6              2       Sum of wgt.       3,629

50%            7                      Mean           7.681731
                        Largest       Std. dev.       2.69884
75%            9             15
90%           11             15       Variance       7.283737
95%           12             15       Skewness       .3739434
99%           15             15       Kurtosis       2.739106

Screening Correlations

Raw scatter plots are in the Appendix.

```{stata}
corr hh_income_98 total_meal_98 wealth_index non_agri_share
```
(obs=3,088)

             | hh_in~98 total~98 wealth~x non_ag~e
-------------+------------------------------------
hh_income_98 |   1.0000
total_mea~98 |   0.2720   1.0000
wealth_index |   0.1760   0.1355   1.0000
non_agri_s~e |  -0.0338   0.0072   0.2474   1.0000
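The table reports obs=3,088, fewer than the 3,629 eligible households. This is consistent with how corr works: it drops any observation with a missing value in at least one of the listed variables (listwise deletion). A minimal sketch on toy data, contrasting this with pwcorr, which uses every available pair; the variables a, b, c and the scalar name are hypothetical.

```{stata}
preserve
clear
* Toy data with missing values in c only (hypothetical values)
input a b c
1 2 3
2 1 .
3 4 1
4 3 .
5 6 2
end

* corr keeps only rows where all listed variables are nonmissing
corr a b c
scalar n_listwise = r(N)

* pwcorr uses every available pair and reports each pair's sample size
pwcorr a b c, obs
restore
```

Running pwcorr with the obs option on the four screening variables would show which variable accounts for most of the dropped households.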

Wealth Index

Note

Income and expenditure are winsorized at 98% (the variables with the _98 suffix).
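The note does not say how the winsorized variables were built. One plausible reading, assumed here, is top-coding at the 98th percentile: values above that percentile are replaced by it. The sketch below shows that reading on toy data; income_toy, income_toy_98, and the scalar names are hypothetical, and a two-sided rule would need a matching lower cutoff.

```{stata}
preserve
clear
* Toy data: 100 values 10, 20, ..., 1000 (hypothetical)
set obs 100
gen double income_toy = _n * 10

* 98th percentile of the toy variable
_pctile income_toy, p(98)
scalar p98 = r(r1)

* Cap values at the 98th percentile; keep missing values missing
gen double income_toy_98 = min(income_toy, scalar(p98)) if !missing(income_toy)

* After capping, the maximum equals the cutoff
summarize income_toy_98
scalar max_98 = r(max)
restore
```

The if !missing() condition matters because min() ignores missing arguments, so without it a missing income would be set to the cutoff.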

```{stata}
binscatter hh_income_98 total_meal_98 wealth_index
```
warning: nquantiles(20) was specified, but only 12 were generated. see help file under nquantiles() for explanation.
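The warning means binscatter could form only 12 of the 20 requested bins. A likely reason, assumed here rather than confirmed from the data, is that wealth_index (the x-variable, listed last) has many tied values, so several quantile cutpoints coincide. The sketch below reproduces the effect with the built-in xtile on toy data; x_toy, bin_toy, and n_bins are hypothetical names.

```{stata}
preserve
clear
* Toy x-variable with only 5 distinct values (hypothetical)
set obs 200
gen x_toy = mod(_n, 5)

* Ask for 20 quantile bins; ties collapse most of them
xtile bin_toy = x_toy, nq(20)

* Number of distinct bins actually produced
tabulate bin_toy
scalar n_bins = r(r)
restore
```

If wealth_index does turn out to take few distinct values, binscatter's discrete option, which treats each x-value as its own bin, may be a cleaner choice than nquantiles().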

Salary

```{stata}
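* Label the salaried-worker indicator; capture skips the define if the label already exists
cap label define _salaried_work 0 "No Salaried Worker" 1 "Salaried Worker"
label values salaried_work _salaried_work

* Box plots of winsorized income and expenditure by salaried-worker status
graph box hh_income_98 total_meal_98, over(salaried_work) ///
    legend(order(1 "Income" 2 "Expenditure") rows(1) pos(6))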
```

Appendix

Income and Consumption Figures

```{stata}
graph display desc_income
```

```{stata}
graph display desc_consum
```

Wealth Index and Objective Measures

```{stata}
twoway (scatter hh_income_98 wealth_index) (lfit hh_income_98 wealth_index), ///
    legend(off) ytitle("Reported Income (1 month)")
```

```{stata}
twoway (scatter total_meal_98 wealth_index) (lfit total_meal_98 wealth_index), ///
    legend(off) ytitle("Total Expenditure (1 wk)")
```

Agriculture

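The agricultural measure (non_agri_share) does not appear to be correlated with income or expenditure: its correlations with hh_income_98 and total_meal_98 are -0.03 and 0.01.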

```{stata}
binscatter hh_income_98 total_meal_98 non_agri_share
```

```{stata}
twoway (scatter hh_income_98 non_agri_share) (scatter total_meal_98 non_agri_share), legend(pos(6))
```

Number of Eligible Households

```{stata}
tab eligible
```

  Household |
     has at |
  least one |
   eligible |
    members |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        842       19.44       19.44
          1 |      3,489       80.56      100.00
------------+-----------------------------------
      Total |      4,331      100.00

Our target sample is ~2,800 households, so we can screen out roughly 690 of the 3,489 eligible households (HHs), or about 20%.

Descriptives of Screening Criteria

Some possible screening criteria to consider:

Wealth / SES:

  • Subjective ability to cope with lean season
  • Income
  • Expenditure
  • Salaried Worker

Others:

  • Household Size
  • Primarily Agricultural

Important

The descriptive statistics below include only eligible households.

```{stata}
graph display desc_agri
```

```{stata}
graph display desc_agri_share
```

```{stata}
label var hh_income "Household Income (1 month)"
su hh_income, d
```

                 Household Income (1 month)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%           80              0       Obs               3,485
25%          200              0       Sum of wgt.       3,485

50%          500                      Mean           1003.955
                        Largest       Std. dev.       1680.19
75%         1000          20000
90%         2250          25000       Variance        2823038
95%         3500          25200       Skewness        6.30252
99%         7600          30000       Kurtosis       70.65732

```{stata}
label var total_meal "Total Expenses (1 wk)"
su total_meal, d
```

                    Total Expenses (1 wk)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           20              0
 5%           50              0
10%           80              0       Obs               3,489
25%          140              0       Sum of wgt.       3,489

50%          250                      Mean           449.1192
                        Largest       Std. dev.      692.7614
75%          500           7000
90%         1000           7500       Variance       479918.4
95%         1500           9500       Skewness       9.880501
99%         3000          20000       Kurtosis       208.5001

Note

Figure available in the Appendix.

```{stata}
label var hh_cope_lean "How difficult is it to cope w/ lean season"
tab hh_cope_lean
```

 How difficult is it |
     to cope w/ lean |
              season |      Freq.     Percent        Cum.
---------------------+-----------------------------------
Not difficult at all |         85        2.44        2.44
  Slightly difficult |        319        9.14       11.58
Moderately difficult |        561       16.08       27.66
      Very difficult |      2,001       57.35       85.01
 Extremely difficult |        523       14.99      100.00
---------------------+-----------------------------------
               Total |      3,489      100.00

```{stata}
su roster_size, d
```

               Number of surveyed individuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              1
 5%            4              1
10%            4              2       Obs               3,489
25%            6              2       Sum of wgt.       3,489

50%            8                      Mean            7.72485
                        Largest       Std. dev.      2.706699
75%           10             15
90%           11             15       Variance        7.32622
95%           12             15       Skewness       .3577694
99%           15             15       Kurtosis       2.732372

Screening Correlations

Raw scatter plots are in the Appendix.

```{stata}
corr hh_income_98 total_meal_98 wealth_index non_agri_share
```
(obs=3,013)

             | hh_in~98 total~98 wealth~x non_ag~e
-------------+------------------------------------
hh_income_98 |   1.0000
total_mea~98 |   0.2618   1.0000
wealth_index |   0.1734   0.1377   1.0000
non_agri_s~e |  -0.0323   0.0182   0.2413   1.0000

Wealth Index

Note

Income and expenditure are winsorized at 98% (the variables with the _98 suffix).

```{stata}
binscatter hh_income_98 total_meal_98 wealth_index
```
warning: nquantiles(20) was specified, but only 12 were generated. see help file under nquantiles() for explanation.

Salary

```{stata}
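* Label the salaried-worker indicator; capture skips the define if the label already exists
cap label define _salaried_work 0 "No Salaried Worker" 1 "Salaried Worker"
label values salaried_work _salaried_work

* Box plots of winsorized income and expenditure by salaried-worker status
graph box hh_income_98 total_meal_98, over(salaried_work) ///
    legend(order(1 "Income" 2 "Expenditure") rows(1) pos(6))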
```

Appendix

Income and Consumption Figures

```{stata}
graph display desc_income
```

```{stata}
graph display desc_consum
```

Wealth Index and Objective Measures

```{stata}
twoway (scatter hh_income_98 wealth_index) (lfit hh_income_98 wealth_index), ///
    legend(off) ytitle("Reported Income (1 month)")
```

```{stata}
twoway (scatter total_meal_98 wealth_index) (lfit total_meal_98 wealth_index), ///
    legend(off) ytitle("Total Expenditure (1 wk)")
```

Agriculture

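The agricultural measure (non_agri_share) does not appear to be correlated with income or expenditure: its correlations with hh_income_98 and total_meal_98 are -0.03 and 0.02.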

```{stata}
binscatter hh_income_98 total_meal_98 non_agri_share
```

```{stata}
twoway (scatter hh_income_98 non_agri_share) (scatter total_meal_98 non_agri_share), legend(pos(6))
```