Bag Quality

Published

February 11, 2026

Abstract

Quick overview of bag quality data to see if there are any patterns in bag quality over time or by treatment group.

Simple Summary Stats

Below is a table of summary stats for the people for whom we collect data. I have omitted 20 Risky observations for which we do not have the original pickup and dropoff data; since we cannot infer their draw, I excluded them here but will add them back in future discussions.

By Treatment:
|    |   Predictable |   Risky |   Stable |
|:---|--------------:|--------:|---------:|
| N  |           326 |     841 |      335 |

By Draw:
|    |   H |   L |   M |
|:---|----:|----:|----:|
| N  | 768 | 719 | 335 |

By Period:
|    |    2 |    5 |
|:---|-----:|-----:|
| N  | 1290 | 1029 |

Looking at the Measures and Their Correlations

We collect 5 individual measures of bag quality and an overall rating. These measures are highly correlated: pairwise correlations among the individual measures and the overall rating range from 0.64 to 0.82. Below is a plot of their distributions and a correlation matrix. Note: I exclude one of the quality measures, a simple yes/no item on which only 5 people said no.

Distribution of Quality Measures

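Correlation Matrix of Quality Measures

|                 |   Stitch Spacing |   Stitch Tension |   Firmness |   Handle Firmness |   Overall Rating |   Quality Average |
|:----------------|-----------------:|-----------------:|-----------:|------------------:|-----------------:|------------------:|
| Stitch Spacing  |             1.00 |             0.80 |       0.71 |              0.66 |             0.82 |              0.91 |
| Stitch Tension  |             0.80 |             1.00 |       0.72 |              0.64 |             0.80 |              0.90 |
| Firmness        |             0.71 |             0.72 |       1.00 |              0.64 |             0.80 |              0.87 |
| Handle Firmness |             0.66 |             0.64 |       0.64 |              1.00 |             0.77 |              0.83 |
| Overall Rating  |             0.82 |             0.80 |       0.80 |              0.77 |             1.00 |              0.91 |
| Quality Average |             0.91 |             0.90 |       0.87 |              0.83 |             0.91 |              1.00 |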
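As a rough illustration of how a matrix like this can be computed, here is a minimal sketch in plain Python. The ratings are made up, the column names are hypothetical, and the report does not define Quality Average, so I assume it is the row-wise mean of the four individual measures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings for five bags (not the study data).
ratings = {
    "stitch_spacing": [4, 3, 5, 2, 4],
    "stitch_tension": [4, 3, 4, 2, 5],
    "firmness": [5, 3, 4, 3, 4],
    "handle_firmness": [4, 2, 4, 3, 4],
    "overall_rating": [4, 3, 5, 2, 5],
}

# Assumption: the average is the row-wise mean of the four individual measures.
measures = ["stitch_spacing", "stitch_tension", "firmness", "handle_firmness"]
n_bags = len(ratings["overall_rating"])
ratings["quality_average"] = [
    sum(ratings[m][i] for m in measures) / len(measures) for i in range(n_bags)
]

# Full pairwise correlation matrix as a dict of dicts.
corr = {a: {b: pearson(ratings[a], ratings[b]) for b in ratings} for a in ratings}
for name, row in corr.items():
    print(f"{name:16s}", " ".join(f"{v:5.2f}" for v in row.values()))
```

Because the average is built from the individual measures, its correlation with each of them is mechanically high, which is worth keeping in mind when reading the last row and column.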

Width and Length Measurements

We also collect width and length measurements, each taken twice (left/right and top/bottom). The graphs below show the differences between the paired measurements and the deviations from the expected bag dimensions. I flagged deviations above 1 inch as incorrect.

Note: I made manual corrections in 2 cases. One seemed to have been reported in inches. In the other, the bottom measurement was mis-entered as 0.9.
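The flagging rule can be sketched as follows. This is only an illustration: the target dimensions are hypothetical (the report does not state them), I assume width is the top/bottom pair and length the left/right pair, and I take each deviation as the mean of the pair minus the target.

```python
# Hypothetical target dimensions in inches; the report does not state them.
EXPECTED_WIDTH = 14.0
EXPECTED_LENGTH = 16.0
TOLERANCE = 1.0  # deviations above 1 inch are flagged, as in the text

def check_bag(width_top, width_bottom, length_left, length_right):
    """Return paired-measurement gaps, deviations from target, and a flag."""
    width_dev = (width_top + width_bottom) / 2 - EXPECTED_WIDTH
    length_dev = (length_left + length_right) / 2 - EXPECTED_LENGTH
    return {
        "width_gap": width_top - width_bottom,
        "length_gap": length_left - length_right,
        "width_dev": width_dev,
        "length_dev": length_dev,
        "sum_dev": width_dev + length_dev,  # signed sum of the two deviations
        "flagged": abs(width_dev) > TOLERANCE or abs(length_dev) > TOLERANCE,
    }

ok_bag = check_bag(14.0, 13.5, 16.0, 16.5)
bad_bag = check_bag(12.0, 12.5, 16.0, 16.0)
print(ok_bag["flagged"], bad_bag["flagged"])
```

Note that a signed sum lets a too-narrow width and a too-long length cancel out, so an absolute-value version may be worth checking alongside it.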

Distribution of Dimension Measurements


Variations

Below is a regression table showing the effect of draw, treatment, and period on bag quality scores. Almost none of them has a detectable effect. What does make a big difference is the enumerator. Columns 1-3 regress the quality score on arm, draw, and period. Columns 4-6 add enumerator fixed effects. Columns 7-9 also control for the number of bags produced so far.

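Dependent variable: bq_overall_rating

|                     |                     (1) |                     (2) |
|:--------------------|------------------------:|------------------------:|
| C(draw)[T.L]        |                   0.081 |                         |
|                     |                 (0.054) |                         |
| C(draw)[T.M]        |                  -0.157 |                         |
|                     |                 (0.483) |                         |
| C(period)[T.5]      |                         |                   0.002 |
|                     |                         |                 (0.034) |
| Intercept           |                3.959*** |                3.999*** |
|                     |                 (0.483) |                 (0.483) |
| Observations        |                    2319 |                    2319 |
| R²                  |                   0.748 |                   0.747 |
| Adjusted R²         |                   0.283 |                   0.281 |
| Residual Std. Error |          0.682 (df=816) |          0.683 (df=816) |
| F Statistic         | 1.609*** (df=1502; 816) | 1.603*** (df=1502; 816) |

Note: *p<0.1; **p<0.05; ***p<0.01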
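To make the role of enumerator fixed effects concrete, here is a minimal sketch of the within (demeaning) estimator for a single regressor, in plain Python. It is not the code behind the table; the function names and the tiny dataset are made up. Demeaning the outcome and the regressor inside each enumerator gives the same slope as adding one dummy per enumerator (the Frisch-Waugh-Lovell result).

```python
from collections import defaultdict

def within_slope(y, x, group):
    """OLS slope of y on a single regressor x with group fixed effects."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # per group: sum of y, sum of x, count
    for yi, xi, g in zip(y, x, group):
        s = sums[g]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    # Subtract each group's mean from its own observations.
    y_dm = [yi - sums[g][0] / sums[g][2] for yi, g in zip(y, group)]
    x_dm = [xi - sums[g][1] / sums[g][2] for xi, g in zip(x, group)]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)

def pooled_slope(y, x):
    """OLS slope of y on x with only an intercept (one big group)."""
    return within_slope(y, x, ["all"] * len(y))

# Made-up data: enumerator B always rates one point lower than A
# and happened to rate mostly period-5 bags.
rating = [4, 4, 4, 4, 3, 3, 3, 3]
period5 = [0, 0, 0, 1, 0, 1, 1, 1]
enum = ["A", "A", "A", "A", "B", "B", "B", "B"]

pooled = pooled_slope(rating, period5)        # picks up the enumerator gap
within = within_slope(rating, period5, enum)  # zero once enumerators are absorbed
print(pooled, within)
```

In this toy example the pooled period effect is entirely an enumerator artifact, which is the kind of confounding the fixed-effects columns are meant to rule out.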

Summary by Enumerator

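| enum_id   |   230854474 |   240317188 |   241265651 |   250246600 |
|:----------|------------:|------------:|------------:|------------:|
| N         |         381 |         582 |         718 |         638 |
| Mean      |        3.98 |        3.81 |        4.14 |        3.33 |
| Std       |        0.54 |        0.87 |        0.39 |        0.98 |
| Min       |        2.00 |        1.00 |        2.00 |        1.00 |
| P25%      |        4.00 |        4.00 |        4.00 |        3.00 |
| Median    |        4.00 |        4.00 |        4.00 |        3.00 |
| P75%      |        4.00 |        4.00 |        4.00 |        4.00 |
| Max       |        5.00 |        5.00 |        5.00 |        5.00 |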
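A per-enumerator summary like the one above can be sketched in plain Python as follows. The records and enumerator IDs are placeholders, not the study data, and quartiles are left out to keep the example short.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Made-up (enum_id, overall rating) pairs; the IDs are placeholders.
records = [
    ("E1", 4), ("E1", 4), ("E1", 5), ("E1", 3),
    ("E2", 3), ("E2", 2), ("E2", 4), ("E2", 3),
]

# Collect the ratings given by each enumerator.
by_enum = defaultdict(list)
for enum_id, rating in records:
    by_enum[enum_id].append(rating)

summary = {
    enum_id: {
        "N": len(vals),
        "Mean": mean(vals),
        "Std": stdev(vals),  # sample standard deviation (n - 1 denominator)
        "Min": min(vals),
        "Median": median(vals),
        "Max": max(vals),
    }
    for enum_id, vals in by_enum.items()
}

for enum_id, stats in summary.items():
    print(enum_id, stats)
```

Comparing the Mean and Std rows across enumerators is the quickest way to spot a rater who scores systematically lower or uses more of the scale.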

Dimension Deviation Analysis

Looking at the sum of the length and width deviations from the expected bag size, there doesn’t seem to be a notable pattern. The Period 5 coefficient is significant in only one regression (column 6: 0.270, p<0.01), and its sign is the opposite of what we’d expect, suggesting bags are getting worse.

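Dependent variable: bq_sum_deviation

|                        | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|:-----------------------|----:|----:|----:|----:|----:|----:|----:|----:|----:|
| C(draw)[T.L]           |  | -0.123 |  |  | -0.033 |  |  | -0.050 |  |
|                        |  | (0.121) |  |  | (0.114) |  |  | (0.114) |  |
| C(draw)[T.M]           |  | -0.122 |  |  | -0.056 |  |  | -0.061 |  |
|                        |  | (0.140) |  |  | (0.132) |  |  | (0.131) |  |
| C(period)[T.5]         |  |  | -0.130 |  |  | 0.270*** |  |  | 0.327 |
|                        |  |  | (0.107) |  |  | (0.104) |  |  | (0.202) |
| C(treatment)[T.Risky]  | -0.154 |  |  | -0.116 |  |  | -0.120 |  |  |
|                        | (0.135) |  |  | (0.127) |  |  | (0.127) |  |  |
| C(treatment)[T.Stable] | -0.175 |  |  | -0.125 |  |  | -0.124 |  |  |
|                        | (0.160) |  |  | (0.151) |  |  | (0.151) |  |  |
| Intercept              | -0.334*** | -0.387*** | -0.402*** | -1.372*** | -1.438*** | -1.582*** | -1.507*** | -1.568*** | -1.564*** |
|                        | (0.115) | (0.084) | (0.071) | (0.155) | (0.138) | (0.132) | (0.168) | (0.152) | (0.142) |
| Bags Made So Far       |  |  |  |  |  |  | 0.007** | 0.007** | -0.002 |
|                        |  |  |  |  |  |  | (0.003) | (0.003) | (0.007) |
| Enumerator FE          | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations           | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 | 2319 |
| R²                     | 0.001 | 0.001 | 0.001 | 0.117 | 0.117 | 0.119 | 0.118 | 0.118 | 0.119 |
| Adjusted R²            | -0.000 | -0.000 | 0.000 | 0.115 | 0.115 | 0.117 | 0.116 | 0.116 | 0.117 |
| Residual Std. Error    | 2.562 (df=2316) | 2.563 (df=2316) | 2.562 (df=2317) | 2.410 (df=2313) | 2.411 (df=2313) | 2.407 (df=2314) | 2.409 (df=2312) | 2.409 (df=2312) | 2.407 (df=2313) |
| F Statistic            | 0.770 (df=2; 2316) | 0.637 (df=2; 2316) | 1.464 (df=1; 2317) | 61.193*** (df=5; 2313) | 61.023*** (df=5; 2313) | 78.140*** (df=4; 2314) | 51.763*** (df=6; 2312) | 51.631*** (df=6; 2312) | 62.510*** (df=5; 2313) |

Note: *p<0.1; **p<0.05; ***p<0.01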

There also doesn’t seem to be much correlation between the sum of measurement deviations and the quality score:

Correlation: -0.074

Correlation Within Households

Let’s see how correlated the scores are within households. This tells us whether bag quality is consistent within households or whether there is significant variation worth collecting more data for.

Distribution of Rating Differences (Period 5 - Period 2)

count    817.000000
mean       0.002448
std        0.966342
min       -3.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        3.000000
Name: diff, dtype: float64
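The differences summarized above can be sketched as a per-household subtraction. This is a minimal illustration in plain Python with made-up rows; the household IDs are hypothetical, and I assume one overall rating per household per period, with households missing either period dropped.

```python
# Made-up (household_id, period, overall rating) rows.
rows = [
    ("hh1", 2, 4), ("hh1", 5, 4),
    ("hh2", 2, 3), ("hh2", 5, 5),
    ("hh3", 2, 4), ("hh3", 5, 3),
    ("hh4", 2, 4),  # no period-5 bag, so no difference is computed
]

# Index ratings by household, then by period.
by_household = {}
for hh, period, rating in rows:
    by_household.setdefault(hh, {})[period] = rating

# Period 5 minus Period 2, only for households observed in both periods.
diffs = {
    hh: r[5] - r[2] for hh, r in by_household.items() if 2 in r and 5 in r
}
share_unchanged = sum(d == 0 for d in diffs.values()) / len(diffs)
print(diffs, round(share_unchanged, 2))
```

In the real data the quartiles are all 0 while the standard deviation is close to 1, so the share of unchanged households and the size of the tails are probably more informative than the mean.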