Lumpiness of Food Purchases

Published

April 6, 2026

Abstract

Lumpiness in household food purchases

Within-Household Dispersion

We compute three measures of within-household volatility (range, coefficient of variation, and interquartile range) across periods for food purchases and consumption. All arms are overlaid for direct comparison.
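As a minimal sketch of how these three measures could be computed, assume a long-format panel with one row per household-period. The column names `hh_id` and `food_purchase` and the helper `household_dispersion` are hypothetical and not taken from the analysis code. The use of the sample SD (`ddof=1`) and of pandas' default quantile interpolation are also assumptions.

```python
import numpy as np
import pandas as pd


def household_dispersion(panel, value_col, hh_col="hh_id"):
    """Per-household range, CV, and IQR of `value_col` across periods."""
    g = panel.groupby(hh_col)[value_col]
    out = pd.DataFrame(
        {
            "range": g.max() - g.min(),
            "cv": g.std(ddof=1) / g.mean(),  # sample SD over mean (assumption)
            "iqr": g.quantile(0.75) - g.quantile(0.25),
        }
    )
    # The CV is undefined when a household's mean is zero.
    out["cv"] = out["cv"].replace([np.inf, -np.inf], np.nan)
    return out


# Toy data: household 1 buys the same amount every period,
# household 2 alternates between nothing and a large purchase.
toy = pd.DataFrame(
    {
        "hh_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "food_purchase": [10.0, 10.0, 10.0, 10.0, 0.0, 20.0, 0.0, 20.0],
    }
)
disp = household_dispersion(toy, "food_purchase")
print(disp)
```

Both toy households have the same mean purchase, but only the second shows positive dispersion on all three measures, which is the pattern a lumpy purchaser would produce.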

If households in the predictable and risky arms are more likely to smooth consumption through lumpy purchases, we would expect higher dispersion over time in those arms than in the control and stable arms. The graphs below suggest that this is not the case.

Important

Additionally, if the smoothing hypothesis were true, we would expect purchases to be much lumpier, and thus more dispersed, than consumption. In practice, however, purchases are much smoother than consumption.


```{python}
#| label: fig-range
#| fig-cap: Within-household range (max − min) by treatment arm.

dispersion_plot(hh_df, "purchase_range", "consumption_range", "Range (Max − Min)")
```

Figure 1: Within-household range (max − min) by treatment arm.


```{python}
#| label: fig-cv
#| fig-cap: Within-household coefficient of variation by treatment arm.

dispersion_plot(hh_df, "purchase_cv", "consumption_cv", "Coefficient of Variation (SD / Mean)")
```

Figure 2: Within-household coefficient of variation by treatment arm.


```{python}
#| label: fig-iqr
#| fig-cap: Within-household IQR by treatment arm.

dispersion_plot(hh_df, "purchase_iqr", "consumption_iqr", "IQR (Q75 − Q25)")
```

Figure 3: Within-household IQR by treatment arm.

Simulated Benchmarks

However, it is hard to tell from the graphs above whether they simply fail to show differences clearly or whether purchases are truly uniformly smooth. To calibrate our expectations, we simulate scenarios with the same number of households and periods as our data but with different degrees of “lumpiness” in food purchases:

- Smooth (no lumpiness): per-period food purchases are drawn from a normal distribution with mean and SD matching the control arm’s empirical distribution (negative draws are truncated at 0).
- Lumpy (aggressive): half of each household’s periods are drawn from a normal distribution with mean 1.8x the control arm’s empirical mean and SD, and the other half from a normal distribution with 0.2x the control arm’s mean. The overall mean should therefore stay the same, but with forced lumpiness. The simulation is intended to mimic buying a lot of food during high draws (hence the 1.8x multiplier) and very little during low draws (the 0.2x multiplier), while keeping the average purchase over the study the same.
- Lumpy (mild): same as above, but with 1.3x and 0.7x the control arm’s empirical mean for the high and low draws, respectively, to simulate milder lumpiness.
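The three scenarios above can be sketched roughly as follows. This is an illustration under stated assumptions, not the simulation code used for the figures. The function `simulate_purchases`, the values of `mu` and `sigma`, and the panel dimensions are placeholders. The sketch also assumes that the high periods are a random half of each household's periods and that the SD stays at the control arm's value in both halves, which the description leaves open.

```python
import numpy as np


def simulate_purchases(n_hh, n_periods, mu, sigma,
                       high_mult=1.0, low_mult=1.0, seed=0):
    """Simulate an (n_hh, n_periods) array of per-period food purchases."""
    rng = np.random.default_rng(seed)
    n_high = n_periods // 2
    # argsort of random numbers gives a random permutation of
    # 0..n_periods-1 per household; positions holding a value below
    # n_high are treated as that household's "high" periods.
    perm = rng.random((n_hh, n_periods)).argsort(axis=1)
    is_high = perm < n_high
    means = np.where(is_high, high_mult * mu, low_mult * mu)
    # Assumption: same SD in high and low periods.
    draws = rng.normal(loc=means, scale=sigma)
    # Truncate negative draws at zero, as in the scenario description.
    return np.clip(draws, 0.0, None)


mu, sigma = 100.0, 30.0  # placeholder values, not estimates from the data
smooth = simulate_purchases(500, 10, mu, sigma)
mild = simulate_purchases(500, 10, mu, sigma, high_mult=1.3, low_mult=0.7)
aggressive = simulate_purchases(500, 10, mu, sigma, high_mult=1.8, low_mult=0.2)

# Median within-household range per scenario.
med = {}
for name, sim in [("smooth", smooth), ("mild", mild), ("aggressive", aggressive)]:
    ranges = sim.max(axis=1) - sim.min(axis=1)
    med[name] = float(np.median(ranges))
    print(name, round(float(sim.mean()), 1), round(med[name], 1))
```

One caveat of this setup: truncating at zero pulls the mean of the low-period draws upward, so the aggressive scenario's overall mean can end up slightly above the target when `sigma` is large relative to `0.2 * mu`.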

The figures suggest that most households show almost no lumpiness in observed food purchases. However, it does seem like some households might be experiencing lumpiness as


```{python}
#| label: fig-sim-range-agg
#| fig-cap: 'Range: observed vs. aggressive simulated benchmarks.'

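def sim_density(df, x_col, x_label):
    """Plot the empirical CDF of `x_col`, one line per scenario."""
    # Despite the name, this draws ECDFs (stat_ecdf), not density curves.
    return (
        ggplot(df.dropna(subset=[x_col]), aes(x=x_col, color="scenario"))
        + stat_ecdf()
        + labs(x=x_label, y="Cumulative Probability", color=None)
        + theme_minimal()
        + theme(legend_position="top", figure_size=(8, 4))
    )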

sim_density(sim_compare_aggressive, "range", "Range (Max − Min)")
```

Figure 4: Range: observed vs. aggressive simulated benchmarks.


```{python}
#| label: fig-sim-cv-agg
#| fig-cap: 'CV: observed vs. aggressive simulated benchmarks.'

sim_density(sim_compare_aggressive, "cv", "Coefficient of Variation")
```

Figure 5: CV: observed vs. aggressive simulated benchmarks.


```{python}
#| label: fig-sim-iqr-agg
#| fig-cap: 'IQR: observed vs. aggressive simulated benchmarks.'

sim_density(sim_compare_aggressive, "iqr", "IQR (Q75 − Q25)")
```

Figure 6: IQR: observed vs. aggressive simulated benchmarks.

Treatment effect on non-staple categories

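It is difficult to detect treatment effects on the extensive margin of purchases and consumption for non-staple food categories (meat, dairy, vegetables). For meat and vegetables, the rates are already fairly high: over 90% of household-periods record a positive purchase or consumption (N (Positive) row of Table 2). Dairy is the exception, with far lower rates.

However, the treatments do appear to increase vegetable purchases and consumption: in Table 1, the vegetable coefficients are positive and statistically significant in all three treatment arms.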


```{python}
#| label: fig-extensive-share
#| fig-cap: Share of HH-periods with any purchase by arm and category.

desc_rows = []
for cat in cats:
    for arm_code, arm_label in TREATMENT_LABELS.items():
        subset = reg_df[reg_df["treatment"] == arm_code]
        share = subset[f"any_purchase_{cat}"].mean()
        desc_rows.append({"Category": cat.title(), "Arm": arm_label, "Share": share})

desc_df = pd.DataFrame(desc_rows)
desc_df["Arm"] = pd.Categorical(desc_df["Arm"], categories=ARM_ORDER, ordered=True)

(
    ggplot(desc_df, aes(x="Arm", y="Share", fill="Arm"))
    + geom_col()
    + facet_wrap("Category", nrow=1)
    + labs(x=None, y="Share with Any Purchase")
    + theme_minimal()
    + theme(legend_position="none", figure_size=(10, 4),
            axis_text_x=element_text(rotation=30, ha="right"))
)
```


```{python}
#| label: tbl-treatment-effects-intensive
#| output: asis

fitted = []
for cat in cats:
    m = smf.ols(
        f"food_purchase_{cat}_99 ~ C(treatment, Treatment(reference=0)) + C(period_str)",
        data=reg_df,
    ).fit(cov_type="HC3")
    fitted.append(m)

fitted_c = []
for cat in cats:
    m = smf.ols(
        f"food_consumption_{cat}_99 ~ C(treatment, Treatment(reference=0)) + C(period_str)",
        data=reg_df,
    ).fit(cov_type="HC3")
    fitted_c.append(m)

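# Count household-periods with a positive amount. The outcomes here are
# amounts, not 0/1 indicators, so test for > 0 rather than == 1.
# These counts are not currently added to the table.
n_pos_purchase = [int((reg_df[f"food_purchase_{cat}_99"] > 0).sum()) for cat in cats]
n_pos_consumption = [
    int((reg_df[f"food_consumption_{cat}_99"] > 0).sum()) for cat in cats
]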

sg_intensive = Stargazer(fitted + fitted_c)
sg_intensive.title("Amount purchase / consumption — Treatment Effects")
sg_intensive.custom_columns(
    [
        "Meat - Purchase",
        "Dairy - Purchase",
        "Vegetables - Purchase",
        "Meat - Cons.",
        "Dairy - Cons.",
        "Vegetables - Cons.",
    ],
    [1, 1, 1, 1, 1, 1],
)
sg_intensive.covariate_order(
    [
        "C(treatment, Treatment(reference=0))[T.1]",
        "C(treatment, Treatment(reference=0))[T.2]",
        "C(treatment, Treatment(reference=0))[T.3]",
    ]
)
sg_intensive.rename_covariates(
    {
        "C(treatment, Treatment(reference=0))[T.1]": "Stable",
        "C(treatment, Treatment(reference=0))[T.2]": "Predictable",
        "C(treatment, Treatment(reference=0))[T.3]": "Risky",
    }
)
sg_intensive.add_line("Period FEs", ["Yes"] * 6, LineLocation.FOOTER_BOTTOM)
sg_intensive.show_degrees_of_freedom(False)
sg_intensive
```

Table 1

Amount purchase / consumption — Treatment Effects



| | Meat - Purchase | Dairy - Purchase | Vegetables - Purchase | Meat - Cons. | Dairy - Cons. | Vegetables - Cons. |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| Stable | 1.537 | -0.042 | 3.788*** | 0.460 | 0.146 | 1.956** |
| | (1.243) | (0.341) | (0.696) | (1.875) | (0.483) | (0.841) |
| Predictable | 4.300*** | 1.636*** | 3.944*** | 2.983 | 2.122*** | 2.535*** |
| | (1.308) | (0.374) | (0.697) | (1.895) | (0.512) | (0.843) |
| Risky | 2.188** | 0.262 | 2.564*** | -0.715 | 0.787* | 2.134*** |
| | (1.033) | (0.286) | (0.551) | (1.521) | (0.403) | (0.694) |
| Observations | 14865 | 14865 | 14865 | 14865 | 14865 | 14865 |
| R² | 0.058 | 0.028 | 0.015 | 0.100 | 0.025 | 0.022 |
| Adjusted R² | 0.058 | 0.027 | 0.014 | 0.100 | 0.024 | 0.022 |
| Residual Std. Error | 46.585 | 12.865 | 25.303 | 66.156 | 17.972 | 30.611 |
| F Statistic | 77.352*** | 41.238*** | 21.713*** | 135.921*** | 31.200*** | 40.615*** |
| Period FEs | Yes | Yes | Yes | Yes | Yes | Yes |

Note: * p<0.1; ** p<0.05; *** p<0.01


```{python}
#| label: tbl-treatment-effects
#| output: asis

fitted = []
for cat in cats:
    m = smf.ols(
        f"any_purchase_{cat} ~ C(treatment, Treatment(reference=0)) + C(period_str)",
        data=reg_df,
    ).fit(cov_type="HC3")
    fitted.append(m)

fitted_c = []
for cat in cats:
    m = smf.ols(
        f"any_consumption_{cat} ~ C(treatment, Treatment(reference=0)) + C(period_str)",
        data=reg_df,
    ).fit(cov_type="HC3")
    fitted_c.append(m)

n_pos_purchase = [int((reg_df[f"any_purchase_{cat}"] == 1).sum()) for cat in cats]
n_pos_consumption = [int((reg_df[f"any_consumption_{cat}"] == 1).sum()) for cat in cats]

sg = Stargazer(fitted + fitted_c)
sg.title("Any Purchase / Consumption (LPM) — Treatment Effects")
sg.custom_columns(
    [
        "Meat - Purchase",
        "Dairy - Purchase",
        "Vegetables - Purchase",
        "Meat - Cons.",
        "Dairy - Cons.",
        "Vegetables - Cons.",
    ],
    [1, 1, 1, 1, 1, 1],
)
sg.covariate_order(
    [
        "C(treatment, Treatment(reference=0))[T.1]",
        "C(treatment, Treatment(reference=0))[T.2]",
        "C(treatment, Treatment(reference=0))[T.3]",
    ]
)
sg.rename_covariates(
    {
        "C(treatment, Treatment(reference=0))[T.1]": "Stable",
        "C(treatment, Treatment(reference=0))[T.2]": "Predictable",
        "C(treatment, Treatment(reference=0))[T.3]": "Risky",
    }
)
sg.add_line("N (Positive)", [str(n) for n in n_pos_purchase + n_pos_consumption], LineLocation.FOOTER_BOTTOM)
sg.add_line("Period FEs", ["Yes"] * 6, LineLocation.FOOTER_BOTTOM)
sg.show_degrees_of_freedom(False)
sg
```

Table 2

Any Purchase / Consumption (LPM) — Treatment Effects



| | Meat - Purchase | Dairy - Purchase | Vegetables - Purchase | Meat - Cons. | Dairy - Cons. | Vegetables - Cons. |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| Stable | -0.001 | -0.006 | 0.007 | -0.006 | 0.002 | 0.008* |
| | (0.008) | (0.012) | (0.006) | (0.006) | (0.013) | (0.004) |
| Predictable | 0.005 | 0.046*** | 0.013** | -0.003 | 0.052*** | 0.010*** |
| | (0.007) | (0.012) | (0.005) | (0.006) | (0.013) | (0.004) |
| Risky | 0.004 | 0.019* | 0.013*** | -0.002 | 0.027** | 0.010*** |
| | (0.006) | (0.010) | (0.005) | (0.005) | (0.011) | (0.003) |
| Observations | 14865 | 14865 | 14865 | 14865 | 14865 | 14865 |
| R² | 0.009 | 0.025 | 0.003 | 0.013 | 0.017 | 0.002 |
| Adjusted R² | 0.008 | 0.024 | 0.003 | 0.013 | 0.017 | 0.002 |
| Residual Std. Error | 0.267 | 0.448 | 0.188 | 0.223 | 0.483 | 0.126 |
| F Statistic | 11.089*** | 43.452*** | 5.462*** | 12.684*** | 28.895*** | 3.585*** |
| N (Positive) | 13710 | 4294 | 14315 | 14077 | 5782 | 14623 |
| Period FEs | Yes | Yes | Yes | Yes | Yes | Yes |

Note: * p<0.1; ** p<0.05; *** p<0.01
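To put the LPM coefficients in context, the N (Positive) and Observations rows of Table 2 imply the pooled share of household-periods with any purchase or consumption. The short calculation below uses only the counts reported in the table. Note that these shares pool all arms, so they are not control-arm baselines.

```python
# Counts copied from the "N (Positive)" and "Observations" rows of Table 2.
observations = 14865
n_positive = {
    "Meat - Purchase": 13710,
    "Dairy - Purchase": 4294,
    "Vegetables - Purchase": 14315,
    "Meat - Cons.": 14077,
    "Dairy - Cons.": 5782,
    "Vegetables - Cons.": 14623,
}

# Pooled share of household-periods with a positive outcome.
shares = {name: n / observations for name, n in n_positive.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With meat and vegetable shares this close to one, there is little room for the treatments to move the extensive margin, whereas dairy leaves much more headroom.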

Draw Effects

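Within the risky and predictable arms, a high draw (relative to a low draw) has little detectable effect on purchase and consumption choices: almost all coefficients in Tables 3 to 5 are small and statistically insignificant. The main exception is dairy in the risky arm, where a high draw is associated with a 3.3 percentage point lower probability of any purchase (Table 5).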


```{python}
#| label: tbl-draw-effects-intensive

draw_df = reg_df[
    reg_df["treatment"].isin([2, 3]) & reg_df["draw"].isin(["H", "M", "L"])
].copy()

fitted_d = []
for cat in cats:
    m = smf.ols(
        f"food_purchase_{cat}_99 ~ C(draw, Treatment(reference='L')) + C(period_str)",
        data=draw_df,
    ).fit(cov_type="HC3")
    fitted_d.append(m)

fitted_dc = []
for cat in cats:
    m = smf.ols(
        f"food_consumption_{cat}_99 ~ C(draw, Treatment(reference='L')) + C(period_str)",
        data=draw_df,
    ).fit(cov_type="HC3")
    fitted_dc.append(m)

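# Count household-periods with a positive amount. The outcomes here are
# amounts, not 0/1 indicators, so test for > 0 rather than == 1.
# These counts are not currently added to the table.
n_pos_draw_purchase = [
    int((draw_df[f"food_purchase_{cat}_99"] > 0).sum()) for cat in cats
]
n_pos_draw_consumption = [
    int((draw_df[f"food_consumption_{cat}_99"] > 0).sum()) for cat in cats
]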

sg_draw_intensive = Stargazer(fitted_d + fitted_dc)
sg_draw_intensive.title("Purchase / Consumption — Draw Effects (ref: Low Draw)")
sg_draw_intensive.custom_columns(
    [
        "Meat - Purchase",
        "Dairy - Purchase",
        "Vegetables - Purchase",
        "Meat - Consumption",
        "Dairy - Consumption",
        "Vegetables - Consumption",
    ],
    [1, 1, 1, 1, 1, 1],
)
sg_draw_intensive.covariate_order(
    [
        "C(draw, Treatment(reference='L'))[T.H]",
    ]
)
sg_draw_intensive.rename_covariates(
    {
        "C(draw, Treatment(reference='L'))[T.H]": "High Draw",
    }
)
sg_draw_intensive.add_line("Period FEs", ["Yes"] * 6, LineLocation.FOOTER_BOTTOM)
sg_draw_intensive.show_degrees_of_freedom(False)
sg_draw_intensive
```

Table 3

Purchase / Consumption — Draw Effects (ref: Low Draw)



| | Meat - Purchase | Dairy - Purchase | Vegetables - Purchase | Meat - Consumption | Dairy - Consumption | Vegetables - Consumption |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| High Draw | -0.683 | 0.158 | -0.286 | -1.991 | -0.010 | 0.122 |
| | (0.943) | (0.276) | (0.534) | (1.395) | (0.418) | (0.703) |
| Observations | 7733 | 7733 | 7733 | 7733 | 7733 | 7733 |
| R² | 0.041 | 0.015 | 0.005 | 0.131 | 0.028 | 0.004 |
| Adjusted R² | 0.040 | 0.015 | 0.004 | 0.131 | 0.027 | 0.003 |
| Residual Std. Error | 41.464 | 12.129 | 23.459 | 61.292 | 18.338 | 30.903 |
| F Statistic | 40.142*** | 20.485*** | 7.187*** | 116.872*** | 29.321*** | 4.739*** |
| Period FEs | Yes | Yes | Yes | Yes | Yes | Yes |

Note: * p<0.1; ** p<0.05; *** p<0.01


```{python}
#| label: tbl-draw-effects

draw_df = reg_df[
    reg_df["treatment"].isin([2, 3]) & reg_df["draw"].isin(["H", "M", "L"])
].copy()

fitted_d = []
for cat in cats:
    m = smf.ols(
        f"any_purchase_{cat} ~ C(draw, Treatment(reference='L')) + C(period_str)",
        data=draw_df,
    ).fit(cov_type="HC3")
    fitted_d.append(m)

fitted_dc = []
for cat in cats:
    m = smf.ols(
        f"any_consumption_{cat} ~ C(draw, Treatment(reference='L')) + C(period_str)",
        data=draw_df,
    ).fit(cov_type="HC3")
    fitted_dc.append(m)

n_pos_draw_purchase = [int((draw_df[f"any_purchase_{cat}"] == 1).sum()) for cat in cats]
n_pos_draw_consumption = [
    int((draw_df[f"any_consumption_{cat}"] == 1).sum()) for cat in cats
]

sg3 = Stargazer(fitted_d + fitted_dc)
sg3.title("Any Purchase / Consumption (LPM) — Draw Effects (ref: Low Draw)")
sg3.custom_columns(
    [
        "Meat - Purchase",
        "Dairy - Purchase",
        "Vegetables - Purchase",
        "Meat - Consumption",
        "Dairy - Consumption",
        "Vegetables - Consumption",
    ],
    [1, 1, 1, 1, 1, 1],
)
sg3.covariate_order(
    [
        "C(draw, Treatment(reference='L'))[T.H]",
    ]
)
sg3.rename_covariates(
    {
        "C(draw, Treatment(reference='L'))[T.H]": "High Draw",
    }
)

sg3.add_line(
    "N (Positive)",
    [str(n) for n in n_pos_draw_purchase + n_pos_draw_consumption],
    LineLocation.FOOTER_BOTTOM,
)
sg3.add_line("Period FEs", ["Yes"] * 6, LineLocation.FOOTER_BOTTOM)
sg3.show_degrees_of_freedom(False)
sg3
```

Table 4

Any Purchase / Consumption (LPM) — Draw Effects (ref: Low Draw)



| | Meat - Purchase | Dairy - Purchase | Vegetables - Purchase | Meat - Consumption | Dairy - Consumption | Vegetables - Consumption |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| High Draw | 0.002 | -0.015 | 0.001 | 0.002 | -0.019* | 0.000 |
| | (0.006) | (0.010) | (0.004) | (0.005) | (0.011) | (0.002) |
| Observations | 7733 | 7733 | 7733 | 7733 | 7733 | 7733 |
| R² | 0.001 | 0.016 | 0.003 | 0.001 | 0.020 | 0.002 |
| Adjusted R² | 0.001 | 0.016 | 0.002 | -0.000 | 0.019 | 0.001 |
| Residual Std. Error | 0.245 | 0.448 | 0.182 | 0.198 | 0.485 | 0.108 |
| F Statistic | 1.996* | 21.120*** | 4.005*** | 0.694 | 25.553*** | 2.971*** |
| N (Positive) | 7236 | 2211 | 7467 | 7418 | 3099 | 7641 |
| Period FEs | Yes | Yes | Yes | Yes | Yes | Yes |

Note: * p<0.1; ** p<0.05; *** p<0.01


```{python}
#| label: tbl-draw-effects-intensive-by-arm

pred_df = reg_df[
    (reg_df["treatment"] == 2) & reg_df["draw"].isin(["H", "M", "L"])
].copy()
risky_df = reg_df[
    (reg_df["treatment"] == 3) & reg_df["draw"].isin(["H", "M", "L"])
].copy()

fitted_pred = []
for cat in cats:
    m = smf.ols(
        f"any_purchase_{cat} ~ C(draw, Treatment(reference='L')) + C(period_str)",
        data=pred_df,
    ).fit(cov_type="HC3")
    fitted_pred.append(m)

fitted_risky = []
for cat in cats:
    m = smf.ols(
        f"any_purchase_{cat} ~ C(draw, Treatment(reference='L')) + C(period_str)",
        data=risky_df,
    ).fit(cov_type="HC3")
    fitted_risky.append(m)

sg_by_arm = Stargazer(fitted_pred + fitted_risky)
sg_by_arm.title("Purchase — Effects of high draw by Arm")
# Label each column by category and arm (columns 1-3: predictable, 4-6: risky).
sg_by_arm.custom_columns(
    [
        "Meat - Predictable",
        "Dairy - Predictable",
        "Vegetables - Predictable",
        "Meat - Risky",
        "Dairy - Risky",
        "Vegetables - Risky",
    ],
    [1, 1, 1, 1, 1, 1],
)
sg_by_arm.covariate_order(
    [
        "C(draw, Treatment(reference='L'))[T.H]",
    ]
)
sg_by_arm.rename_covariates(
    {
        "C(draw, Treatment(reference='L'))[T.H]": "High Draw",
    }
)
sg_by_arm.show_degrees_of_freedom(False)
sg_by_arm
```

Table 5

Purchase — Effects of high draw by Arm



| | Meat - Predictable | Dairy - Predictable | Vegetables - Predictable | Meat - Risky | Dairy - Risky | Vegetables - Risky |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| High Draw | -0.012 | 0.030 | -0.003 | 0.008 | -0.033*** | 0.002 |
| | (0.011) | (0.019) | (0.008) | (0.007) | (0.012) | (0.005) |
| Observations | 2246 | 2246 | 2246 | 5487 | 5487 | 5487 |
| R² | 0.005 | 0.013 | 0.004 | 0.001 | 0.019 | 0.002 |
| Adjusted R² | 0.002 | 0.011 | 0.001 | -0.000 | 0.018 | 0.001 |
| Residual Std. Error | 0.249 | 0.459 | 0.184 | 0.244 | 0.443 | 0.181 |
| F Statistic | 1.747 | 4.955*** | 2.002* | 0.995 | 17.751*** | 2.430** |

Note: * p<0.1; ** p<0.05; *** p<0.01