Documenting draw data, in particular for the unpredictable households
This document explores missing draw data and, more broadly, inconsistencies in the pickup and dropoff data. To do so, I first look at discrepancies between the pickup and dropoff data. I then use survey data to investigate these cases and as an additional check on the pickup / dropoff data. The survey data include self-reported study payments, which can be used to verify whether pickups occurred (and, implicitly, dropoffs). Finally, we have info dissemination data, recorded when participants were told how the study would work.
Below, I explain some of the nuances of the datasets in case they come up in the future.
Data Paths
The document refers to raw (input) data, which is what IPA shared with us, and generated (output) data, which is the result of our cleaning and processing.
All the data lives in this folder: Dropbox/consumption smoothing/09. Main study/09. Data. The generated data referred to here are found in Dropbox/consumption smoothing/09. Main study/09. Data/20. Generated/tidy (which contains the first pass of data cleaning) and Dropbox/consumption smoothing/09. Main study/09. Data/20. Generated/simon_analysis (which holds the analysis-ready datasets). See the Data Catalogue for more info if needed.
Pickup and Dropoff Data
The raw pickup and dropoff data is found in the 07. Drops_and_pickups raw data folder. These datasets are pre-cleaned by IPA before they share them with us. Due to the inconsistencies described below, I (Simon) requested that IPA share the raw, uncleaned data as well. That can be found in the 16. Raw folder. In the end, I was not able to recover new observations, and thus I did not integrate the uncleaned raw data into the data cleaning used to generate the final analysis datasets.
However, I had written the code to do so; it is found in 00_1_prep_dropoff_pickup.do. Additionally, there is a README in that folder with a bit more detail.
Info Dissemination Data
One source of issues in the data is that in period 1, dropoffs were conducted during info dissemination. Info dissemination refers to the initial session in which the study work and payment were explained to participants and they gave consent. After this training, the first dropoff was conducted. For a few households, the information about their period 1 draws is only available in the raw info dissemination data and not in the dropoff data. As part of generating the analysis-ready datasets, I integrate dropoff information from the info dissemination data for period 1.
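The period 1 integration described above could look roughly like the sketch below. It is a minimal illustration, not the actual cleaning code: the two data frames, their values, and the `draw` column are hypothetical, and it assumes one row per household-period in each source.

```python
import pandas as pd

# Hypothetical cleaned dropoff records (one row per household-period).
dropoff = pd.DataFrame(
    {
        "hh_id": [101, 101, 102],
        "period": [1, 2, 2],
        "draw": [3.0, 5.0, 4.0],
    }
)

# Hypothetical info dissemination records, which only cover period 1.
info_dissem = pd.DataFrame(
    {
        "hh_id": [101, 102],
        "period": [1, 1],
        "draw": [9.0, 2.0],
    }
)

keys = ["hh_id", "period"]

# combine_first keeps the dropoff value whenever it exists and falls back
# to the info dissemination value only where the dropoff record is missing.
combined = (
    dropoff.set_index(keys)
    .combine_first(info_dissem.set_index(keys))
    .reset_index()
    .sort_values(keys, ignore_index=True)
)
```

Here household 102 has no period 1 dropoff record, so its period 1 draw is taken from the info dissemination data, while household 101 keeps the value from the dropoff data. Giving the dropoff data priority is an assumption of this sketch.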
Household Survey Data
The household survey data include self-reported study bag payments, payment counts (i.e. number of pickup visits), and hours spent making bags. These variables are not direct records of field pickup or dropoff activity, but they can help catch inconsistencies.
Inconsistencies
Below, I look at three main cases: dropoffs but no pickup, pickups but no dropoff, and cases where the pickup or dropoff data do not agree with self-reported survey data.
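As a rough sketch of how the first two cases can be flagged, an outer merge of the dropoff and pickup records on household and period marks which side each household-period appears on. The toy data frames and the `dropoff_no_pickup` name are hypothetical, and the sketch assumes at most one completed visit per household-period in each table.

```python
import pandas as pd

# Hypothetical completed dropoffs and pickups, one row per household-period.
dropoff = pd.DataFrame({"hh_id": [1, 1, 2], "period": [1, 2, 1]})
pickup = pd.DataFrame({"hh_id": [1, 2, 2], "period": [1, 1, 2]})

keys = ["hh_id", "period"]

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = dropoff.merge(pickup, on=keys, how="outer", indicator=True)

# Dropoff on record but no pickup, and pickup on record but no dropoff.
merged["dropoff_no_pickup"] = (merged["_merge"] == "left_only").astype(int)
merged["pickup_no_dropoff"] = (merged["_merge"] == "right_only").astype(int)
```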
Type 1: Dropoff completed but no pickup completed
These could represent cases where the household simply decided not to make bags after receiving material. To see how common this is, we use self-reported survey data.
```{python}
drop_no_pickup_total = len(drop_no_pickup_cases)
drop_no_pickup_no_payment = int(drop_no_pickup_cases["no_payment"].sum())
drop_no_pickup_no_survey = int(drop_no_pickup_cases["no_survey"].sum())
drop_no_pickup_has_issue = int(drop_no_pickup_cases["has_issue"].sum())

Markdown(
    f"There are {drop_no_pickup_total:,} cases with a completed dropoff but no "
    f"completed pickup. In {drop_no_pickup_no_payment:,} of these, participants "
    "report no payment, which is consistent with not making any bags. This "
    f"count includes {drop_no_pickup_no_survey:,} cases where the participant "
    "did not complete the survey, so the survey cannot confirm whether bag "
    f"work happened. The remaining {drop_no_pickup_has_issue:,} cases have "
    "reported income from making bags but no entry in the pickup data."
)
```
There are 42 cases with a completed dropoff but no completed pickup. In 30 of these, participants report no payment, which is consistent with not making any bags. This count includes 7 cases where the participant did not complete the survey, so the survey cannot confirm whether bag work happened. The remaining 12 cases have reported income from making bags but no entry in the pickup data.
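The `no_payment`, `no_survey`, and `has_issue` flags are built upstream of this document. The sketch below shows one plausible way to define them and is not the actual definition: the `survey_completed` column and the toy values are hypothetical, and it treats a missing survey payment as zero.

```python
import pandas as pd

# Hypothetical dropoff-without-pickup cases merged with survey responses.
cases = pd.DataFrame(
    {
        "hh_id": [1, 2, 3],
        "survey_completed": [True, False, True],
        "study_bag_pay_total": [0.0, None, 250.0],
    }
)

# No survey on record for this household-period.
cases["no_survey"] = ~cases["survey_completed"]

# Positive self-reported payment implies bags were made, so a pickup
# should exist; a missing payment is treated as zero (an assumption).
cases["has_issue"] = cases["study_bag_pay_total"].fillna(0) > 0

# Everything else is counted as reporting no payment.
cases["no_payment"] = ~cases["has_issue"]
```

Under this definition, a household without a survey is also counted as reporting no payment, so the no-survey cases are a subset of the no-payment cases.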
List of Cases
```{python}
# Select and rename the columns to display
drop_no_pickup_cases_display = (
    drop_no_pickup_cases[
        ["hh_id", "arm", "period", "no_payment", "no_survey", "has_issue"]
    ]
    .copy()
    .rename(
        columns={
            "hh_id": "HH ID",
            "arm": "Arm",
            "period": "Period",
            "no_survey": "No Survey",
            "no_payment": "No Payment",
            "has_issue": "Should have pickup",
        }
    )
)

# Format HH ID as string
drop_no_pickup_cases_display.style.hide(axis="index").format(
    {"Period": fmt_int, "HH ID": lambda x: "" if pd.isna(x) else str(x)}
)
```
| HH ID | Arm | Period | No Payment | No Survey | Should have pickup |
|---|---|---|---|---|---|
| 2054576447 | Stable | 1 | True | True | False |
| 3069749544 | Stable | 1 | True | False | False |
| 3071732445 | Stable | 1 | True | True | False |
| 3080258847 | Stable | 1 | True | True | False |
| 1007994076 | Predictable | 1 | True | False | False |
| 1009141269 | Predictable | 1 | True | False | False |
| 1011611433 | Predictable | 1 | True | True | False |
| 1014599930 | Predictable | 1 | True | False | False |
| 2041595061 | Predictable | 1 | True | False | False |
| 3068666023 | Predictable | 1 | True | False | False |
| 4091425970 | Predictable | 1 | True | False | False |
| 4091956538 | Predictable | 1 | True | False | False |
| 1009160224 | Risky | 1 | True | False | False |
| 1010895220 | Risky | 1 | True | False | False |
| 1011693364 | Risky | 1 | True | False | False |
| 2036100744 | Risky | 1 | True | False | False |
| 2044369144 | Risky | 1 | True | True | False |
| 3081390327 | Risky | 1 | True | False | False |
| 3083205127 | Risky | 1 | True | False | False |
| 3083437374 | Risky | 1 | True | False | False |
| 3083571578 | Risky | 1 | True | False | False |
| 4091210810 | Risky | 1 | True | False | False |
| 4091281998 | Risky | 1 | True | False | False |
| 4110309159 | Risky | 1 | True | False | False |
| 2042136116 | Risky | 4 | True | True | False |
| 4107720453 | Risky | 4 | True | True | False |
| 4109481438 | Risky | 4 | True | False | False |
| 4113394600 | Risky | 5 | True | False | False |
| 3076152427 | Risky | 6 | True | False | False |
| 4118188574 | Risky | 6 | True | False | False |
| 3062140657 | Stable | 1 | False | False | True |
| 2047288837 | Predictable | 3 | False | False | True |
| 3062868753 | Risky | 1 | False | False | True |
| 3069697866 | Risky | 1 | False | False | True |
| 4117959465 | Risky | 1 | False | False | True |
| 1008160831 | Risky | 2 | False | False | True |
| 4109481438 | Risky | 2 | False | False | True |
| 3073779373 | Risky | 4 | False | False | True |
| 4109173646 | Risky | 4 | False | False | True |
| 4109181488 | Risky | 4 | False | False | True |
| 4109530874 | Risky | 4 | False | False | True |
| 4109891684 | Risky | 4 | False | False | True |
Type 2: Pickup completed but no dropoff completed
These cases are more concerning, particularly in the risky arm. In such cases, we are truly missing data on the draw, although self-reported survey payment data can often help infer what happened.
```{python}
pickup_no_dropoff_cases = inconsistent_hh_periods[
    inconsistent_hh_periods["pickup_no_dropoff"] == 1
].copy()
pickup_no_dropoff_cases["has_survey_payment"] = (
    (pickup_no_dropoff_cases["study_bag_pay_total"].fillna(0) > 0)
    & (pickup_no_dropoff_cases["study_bag_pay_count"].fillna(0) > 0)
)
bad_cases_pick_no_drop = pickup_no_dropoff_cases.copy()

pickup_no_dropoff_total = len(pickup_no_dropoff_cases)
pickup_no_dropoff_period_1 = int((pickup_no_dropoff_cases["period"] == 1).sum())
pickup_no_dropoff_risky = int((pickup_no_dropoff_cases["arm"] == "Risky").sum())
pickup_no_dropoff_with_payment = int(
    pickup_no_dropoff_cases["has_survey_payment"].sum()
)

Markdown(
    f"There are {pickup_no_dropoff_total:,} cases with a completed pickup but no "
    f"completed dropoff. Of these, {pickup_no_dropoff_period_1:,} occurred in "
    "period 1, where some issues may be related to the info dissemination "
    f"process, and {pickup_no_dropoff_risky:,} are in the risky arm. In "
    f"{pickup_no_dropoff_with_payment:,} cases, the household has positive "
    "self-reported study payment data, which can be used alongside pickup data "
    "to investigate the missing draw."
)
```
There are 11 cases with a completed pickup but no completed dropoff. Of these, 1 occurred in period 1, where some issues may be related to the info dissemination process, and 4 are in the risky arm. In 8 cases, the household has positive self-reported study payment data, which can be used alongside pickup data to investigate the missing draw.
Type 3: Survey and pickup/dropoff records disagree
These are cases where the cleaned pickup/dropoff records and the household survey payment reports disagree. In particular, we look at cases where people report a study payment but we have no pickup or dropoff on record, or vice versa.
```{python}
type3_cases = panel[panel["survey_admin_issue"]].copy()
type3_cases["mismatch_direction"] = np.select(
    [
        (~type3_cases["no_admin_record"]) & (type3_cases["no_survey_payment"]),
        (type3_cases["no_admin_record"]) & (~type3_cases["no_survey_payment"]),
    ],
    [
        "Admin record but no survey payment",
        "Survey payment but no admin record",
    ],
    default="Other",
)

type3_total = len(type3_cases)
type3_admin_no_payment = int(
    (
        (~type3_cases["no_admin_record"]) & (type3_cases["no_survey_payment"])
    ).sum()
)
type3_payment_no_admin = int(
    (
        (type3_cases["no_admin_record"]) & (~type3_cases["no_survey_payment"])
    ).sum()
)

Markdown(
    f"There are {type3_total:,} household-periods where the survey and "
    "pickup/dropoff records disagree. In "
    f"{type3_admin_no_payment:,} cases, the administrative data show a pickup "
    "or dropoff record but the survey reports no study-bag payment. In "
    f"{type3_payment_no_admin:,} cases, the survey reports a study-bag payment "
    "but the administrative data have neither a pickup nor a dropoff record."
)
```
There are 260 household-periods where the survey and pickup/dropoff records disagree. In 204 cases, the administrative data show a pickup or dropoff record but the survey reports no study-bag payment. In 56 cases, the survey reports a study-bag payment but the administrative data have neither a pickup nor a dropoff record.
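The flags used above (`no_admin_record`, `no_survey_payment`, `survey_admin_issue`) are created elsewhere in the pipeline. The sketch below shows one way they might be defined, consistent with the description of the two mismatch directions. The `has_pickup` and `has_dropoff` columns and the toy panel are hypothetical, and the payment rule is borrowed from the `has_survey_payment` definition in the Type 2 chunk.

```python
import pandas as pd

# Hypothetical household-period panel with admin and survey information.
panel = pd.DataFrame(
    {
        "hh_id": [1, 2, 3, 4],
        "period": [1, 1, 1, 1],
        "has_pickup": [True, False, False, False],
        "has_dropoff": [True, True, False, False],
        "study_bag_pay_total": [100.0, 0.0, 80.0, 0.0],
        "study_bag_pay_count": [1, 0, 1, 0],
    }
)

# No administrative record means neither a pickup nor a dropoff.
panel["no_admin_record"] = ~(panel["has_pickup"] | panel["has_dropoff"])

# No survey payment unless both the amount and the count are positive.
panel["no_survey_payment"] = ~(
    (panel["study_bag_pay_total"].fillna(0) > 0)
    & (panel["study_bag_pay_count"].fillna(0) > 0)
)

# The two sources disagree when exactly one of them shows activity.
panel["survey_admin_issue"] = (
    panel["no_admin_record"] != panel["no_survey_payment"]
)
```

In this toy panel, households 2 and 3 are flagged: the first has a dropoff but reports no payment, and the second reports a payment with no administrative record. How household-periods without a completed survey should be treated is left open here.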
As of May 5, 2026, Simon is still investigating these cases. Any further update will be provided here. In all likelihood, we will not be able to perfectly resolve these; additionally, the majority happened in periods 3 and 4, where the study pause likely caused issues.
Using the uncleaned raw data, I was not able to recover any new observations that would address these inconsistencies. See the above section describing the pickup and dropoff data for more discussion and for where to investigate further if desired.