Bag Quality

Published

February 11, 2026

Abstract

Quick overview of bag quality data to see if there are any patterns in bag quality over time or by treatment group.

Simple Summary Stats

The tables below give observation counts for the people for whom we collect data. I have omitted 20 observations from the Risky arm for which we do not have the original pickup and dropoff data; since we cannot infer their draw, I skip them here but will add them back in future discussions.

By Treatment:
|    |   Predictable |   Risky |   Stable |
|:---|--------------:|--------:|---------:|
| N  |           326 |     841 |      335 |


By Draw:
|    |   H |   L |   M |
|:---|----:|----:|----:|
| N  | 768 | 719 | 335 |


By Period:
|    |    2 |    5 |
|:---|-----:|-----:|
| N  | 1290 | 1029 |

Looking at the Measures and Their Correlations

We collect 5 individual measures of bag quality and an overall rating. These measures are highly correlated; below is a plot of their distributions and a correlation matrix. Note: I exclude one of the quality measures, a simple yes/no item to which only 5 people answered no.

Correlation Matrix of Quality Measures
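|                 | Stitch Spacing | Stitch Tension | Firmness | Handle Firmness | Overall Rating | Quality Average |
|:----------------|---------------:|---------------:|---------:|----------------:|---------------:|----------------:|
| Stitch Spacing  |           1.00 |           0.80 |     0.71 |            0.66 |           0.82 |            0.91 |
| Stitch Tension  |           0.80 |           1.00 |     0.72 |            0.64 |           0.80 |            0.90 |
| Firmness        |           0.71 |           0.72 |     1.00 |            0.64 |           0.80 |            0.87 |
| Handle Firmness |           0.66 |           0.64 |     0.64 |            1.00 |           0.77 |            0.83 |
| Overall Rating  |           0.82 |           0.80 |     0.80 |            0.77 |           1.00 |            0.91 |
| Quality Average |           0.91 |           0.90 |     0.87 |            0.83 |           0.91 |            1.00 |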
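
As a rough illustration of how a matrix like this could be produced, here is a minimal pandas sketch on toy data. The column names other than `bq_overall_rating` are hypothetical, and the sketch assumes the quality average is the row-wise mean of the four individual measures and the overall rating, which this report does not state.

```python
import pandas as pd

# Toy data; column names other than bq_overall_rating are hypothetical.
df = pd.DataFrame({
    "bq_stitch_spacing":  [4, 3, 5, 4, 2, 4],
    "bq_stitch_tension":  [4, 3, 4, 4, 2, 5],
    "bq_firmness":        [3, 3, 5, 4, 3, 4],
    "bq_handle_firmness": [4, 2, 5, 3, 3, 4],
    "bq_overall_rating":  [4, 3, 5, 4, 2, 4],
})

measures = list(df.columns)

# Assumption: the quality average is the row-wise mean of all five scores.
df["bq_quality_average"] = df[measures].mean(axis=1)

# Pairwise Pearson correlations, rounded to two decimals like the table above.
corr = df.corr(method="pearson").round(2)
print(corr)
```

Because the average is built from the other columns, its high correlation with each of them is partly mechanical.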

Width and Length Measurements

We also collect width and length measurements. Enumerators take each measurement twice (left/right and top/bottom). The graphs below show the differences between these paired measurements and their deviations from the expected bag dimensions. I flagged deviations above 1 inch as incorrect.

Note: I made manual corrections in 2 cases. One seemed to have been reported in inches. In the other, the bottom measurement was mis-entered as 0.9.
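
The flagging rule described above can be sketched as follows. This is only an illustration under stated assumptions: the expected dimensions, the column names, and the pairing of top/bottom with width and left/right with length are all hypothetical, and the sketch assumes the 1 inch threshold applies to the absolute deviation of the mean of the two readings.

```python
import pandas as pd

# Hypothetical target dimensions in inches; the report does not state them.
EXPECTED_WIDTH = 12.0
EXPECTED_LENGTH = 15.0
TOLERANCE = 1.0  # deviations above 1 inch are flagged, per the text above

# Toy measurements; all column names are illustrative.
df = pd.DataFrame({
    "width_top":    [12.0, 11.5, 13.5],
    "width_bottom": [12.2, 11.0, 13.0],
    "length_left":  [15.0, 15.5, 14.8],
    "length_right": [15.1, 17.0, 15.0],
})

# Difference between the two readings of each dimension.
df["width_gap"] = df["width_top"] - df["width_bottom"]
df["length_gap"] = df["length_left"] - df["length_right"]

# Signed deviation of the mean reading from the expected size.
df["width_dev"] = df[["width_top", "width_bottom"]].mean(axis=1) - EXPECTED_WIDTH
df["length_dev"] = df[["length_left", "length_right"]].mean(axis=1) - EXPECTED_LENGTH

# Assumption: flag a bag if either absolute deviation exceeds the tolerance.
df["flag_incorrect"] = (
    (df["width_dev"].abs() > TOLERANCE) | (df["length_dev"].abs() > TOLERANCE)
)
print(df[["width_dev", "length_dev", "flag_incorrect"]])
```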

Variations

Below is a regression table showing the effect of draw, treatment, and period on bag quality scores. Almost none of these have a statistically significant effect. What does make a big difference is the enumerator. Columns 1-3 regress the quality score on arm, draw, and period. Columns 4-6 add enumerator fixed effects. Columns 7-9 add the number of bags produced so far.


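Dependent variable: bq_overall_rating

|                     |                      (1) |                      (2) |
|:--------------------|-------------------------:|-------------------------:|
| C(draw)[T.L]        |                    0.081 |                          |
|                     |                  (0.054) |                          |
| C(draw)[T.M]        |                   -0.157 |                          |
|                     |                  (0.483) |                          |
| C(period)[T.5]      |                          |                    0.002 |
|                     |                          |                  (0.034) |
| Intercept           |                3.959^*** |                3.999^*** |
|                     |                  (0.483) |                  (0.483) |
| Observations        |                     2319 |                     2319 |
| R²                  |                    0.748 |                    0.747 |
| Adjusted R²         |                    0.283 |                    0.281 |
| Residual Std. Error |           0.682 (df=816) |           0.683 (df=816) |
| F Statistic         | 1.609^*** (df=1502; 816) | 1.603^*** (df=1502; 816) |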

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01
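
To make the role of enumerator fixed effects concrete, here is a minimal sketch of the same kind of specification on toy data, fitted with and without enumerator dummies. It uses plain NumPy least squares with treatment-coded dummies rather than a formula interface, the data are made up, and the helper names `ols` and `design` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data; values are made up. Variable names mirror those in the report.
df = pd.DataFrame({
    "bq_overall_rating": [4, 4, 3, 5, 4, 3, 2, 3, 4, 4, 3, 3],
    "draw":    ["H", "L", "M", "H", "L", "M", "H", "L", "M", "H", "L", "M"],
    "enum_id": ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"],
})

def ols(y, X):
    """Return OLS coefficients as a Series indexed by column name."""
    beta, *_ = np.linalg.lstsq(
        X.to_numpy(dtype=float), y.to_numpy(dtype=float), rcond=None
    )
    return pd.Series(beta, index=X.columns)

def design(data, factors):
    """Intercept plus dummies for each factor, dropping the first level."""
    X = pd.get_dummies(data[factors], drop_first=True, dtype=float)
    X.insert(0, "Intercept", 1.0)
    return X

y = df["bq_overall_rating"]
without_fe = ols(y, design(df, ["draw"]))            # draw only
with_fe = ols(y, design(df, ["draw", "enum_id"]))    # draw + enumerator FE
print(without_fe, with_fe, sep="\n\n")
```

Without fixed effects, the intercept is the mean rating for the omitted draw (H) and each draw coefficient is a difference in group means. With enumerator dummies, the draw coefficients are identified from variation within each enumerator's ratings.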

Summary by Enumerator

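| enum_id | 230854474 | 240317188 | 241265651 | 250246600 |
|:--------|----------:|----------:|----------:|----------:|
| N       |    381.00 |    582.00 |    718.00 |    638.00 |
| Mean    |      3.98 |      3.81 |      4.14 |      3.33 |
| Std     |      0.54 |      0.87 |      0.39 |      0.98 |
| Min     |      2.00 |      1.00 |      2.00 |      1.00 |
| P25%    |      4.00 |      4.00 |      4.00 |      3.00 |
| Median  |      4.00 |      4.00 |      4.00 |      3.00 |
| P75%    |      4.00 |      4.00 |      4.00 |      4.00 |
| Max     |      5.00 |      5.00 |      5.00 |      5.00 |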
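
A per-enumerator summary of this shape can be sketched with a pandas group-by. The data below are toy values, and the enumerator IDs are placeholders rather than the real ones.

```python
import pandas as pd

# Toy ratings for two placeholder enumerators.
df = pd.DataFrame({
    "enum_id": [1, 1, 1, 2, 2, 2],
    "bq_overall_rating": [4, 4, 5, 3, 1, 5],
})

# describe() yields count, mean, std, min, quartiles, and max per enumerator;
# transposing puts enumerators in columns, as in the table above.
summary = df.groupby("enum_id")["bq_overall_rating"].describe().T.round(2)
print(summary)
```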

Dimension Deviation Analysis

Looking at the sum of the length and width deviations from the expected bag size, there doesn’t seem to be a notable pattern. The Period 5 coefficient is significant in one regression (column 6, with enumerator fixed effects: 0.270, p<0.01), but its sign is the opposite of what we’d expect, suggesting bags are getting worse.


Dependent variable: bq_sum_deviation

|                        | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|:-----------------------|----:|----:|----:|----:|----:|----:|----:|----:|----:|
| C(draw)[T.L]           |  | -0.123 |  |  | -0.033 |  |  | -0.050 |  |
|                        |  | (0.121) |  |  | (0.114) |  |  | (0.114) |  |
| C(draw)[T.M]           |  | -0.122 |  |  | -0.056 |  |  | -0.061 |  |
|                        |  | (0.140) |  |  | (0.132) |  |  | (0.131) |  |
| C(period)[T.5]         |  |  | -0.130 |  |  | 0.270^*** |  |  | 0.327 |
|                        |  |  | (0.107) |  |  | (0.104) |  |  | (0.202) |
| C(treatment)[T.Risky]  | -0.154 |  |  | -0.116 |  |  | -0.120 |  |  |
|                        | (0.135) |  |  | (0.127) |  |  | (0.127) |  |  |
| C(treatment)[T.Stable] | -0.175 |  |  | -0.125 |  |  | -0.124 |  |  |
|                        | (0.160) |  |  | (0.151) |  |  | (0.151) |  |  |
| Intercept              | -0.334^*** | -0.387^*** | -0.402^*** | -1.372^*** | -1.438^*** | -1.582^*** | -1.507^*** | -1.568^*** | -1.564^*** |
|                        | (0.115) | (0.084) | (0.071) | (0.155) | (0.138) | (0.132) | (0.168) | (0.152) | (0.142) |
| Bags Made So Far       |  |  |  |  |  |  | 0.007^** | 0.007^** | -0.002 |
|                        |  |  |  |  |  |  | (0.003) | (0.003) | (0.007) |
| Enumerator FE          | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations           | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 |
| R²                     | 0.001 | 0.001 | 0.001 | 0.117 | 0.117 | 0.119 | 0.118 | 0.118 | 0.119 |
| Adjusted R²            | -0.000 | -0.000 | 0.000 | 0.115 | 0.115 | 0.117 | 0.116 | 0.116 | 0.117 |
| Residual Std. Error    | 2.562 (df=2316) | 2.563 (df=2316) | 2.562 (df=2317) | 2.410 (df=2313) | 2.411 (df=2313) | 2.407 (df=2314) | 2.409 (df=2312) | 2.409 (df=2312) | 2.407 (df=2313) |
| F Statistic            | 0.770 (df=2; 2316) | 0.637 (df=2; 2316) | 1.464 (df=1; 2317) | 61.193^*** (df=5; 2313) | 61.023^*** (df=5; 2313) | 78.140^*** (df=4; 2314) | 51.763^*** (df=6; 2312) | 51.631^*** (df=6; 2312) | 62.510^*** (df=5; 2313) |

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01

There also doesn’t seem to be much correlation between the sum of measurement deviations and the quality score:

Correlation: -0.074

Correlation Within Households

Let’s see how correlated the scores are within households. This tells us whether bag quality is consistent within households or whether there is significant variation worth collecting more data for.

Distribution of Rating Differences (Period 5 - Period 2)

count    817.000000
mean       0.002448
std        0.966342
min       -3.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        3.000000
Name: diff, dtype: float64
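
For reference, a within-household difference like the one summarized above could be computed along these lines. This is a sketch on toy data: `hh_id` is a hypothetical household identifier, and it assumes one rating per household per period, with households missing either period dropped.

```python
import pandas as pd

# Toy data; hh_id is a hypothetical household identifier.
df = pd.DataFrame({
    "hh_id":  [1, 1, 2, 2, 3, 3, 4],
    "period": [2, 5, 2, 5, 2, 5, 2],
    "bq_overall_rating": [4, 4, 3, 5, 4, 3, 4],
})

# One row per household, one column per period.
# pivot() raises if a household has more than one rating in a period.
wide = df.pivot(index="hh_id", columns="period", values="bq_overall_rating")

# Period 5 minus Period 2; households observed in only one period drop out.
diff = (wide[5] - wide[2]).dropna().rename("diff")
print(diff.describe())
```

A mean near zero with a sizeable standard deviation, as in the output above, would indicate no systematic change between periods but substantial household-level movement in either direction.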