Labor and Survey Modality

Published

February 23, 2026

Abstract

A set of regressions and plots run on employment-related outputs requested by Nick.

1 U-shaped graphs

Code
```{python}
merged = pd.read_stata("/Users/st2246/Work/Pilot3/data/generated/main/transform/30_merged_panel-hh_id-period.dta", convert_categoricals=False)

merged_grouped = merged.groupby(["period", "treatment"])[
    "money_work_p_corrected_99"
].mean().reset_index()
merged_grouped["treatment"] = merged_grouped["treatment"].replace({
    0: "Control",
    1: "Stable",
    2: "Predictable",
    3: "Risky"
})

# Plot mean participant work income by period and treatment arm
( ggplot(merged_grouped, aes(x = "period", y = "money_work_p_corrected_99", color = "treatment", group = "treatment")) +
    geom_line() +
    labs(
        title = "Work Income (Participant)",
        x = "Period",
        y = "Cedis"
    ) +
    # Fix treatment colors so they are stable across figures
    scale_color_manual(
        name="Treatment",
        values={"Control": "#F8766D", "Stable": "#00BA38", "Predictable": "#619CFF", "Risky": "#FF61C3"}
    ) +
    theme_minimal()
)
```

The goal of this analysis is to investigate whether the U-shaped curve in employment-related outcomes is driven by survey modality (phone vs. in-person) or by seasonality. The latter is a viable explanation: the baseline was conducted during the planting season, while the endline falls close to the harvest season, which would explain the spikes in labor.

2 Planting and Harvest Dates

Below, we use participants’ self-reported planting and harvest dates to confirm the overlap of the planting and harvest seasons with the baseline and endline, respectively.
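The ECDF plots below are drawn with plotnine's `stat_ecdf`; the underlying computation is just a sorted-rank curve. A minimal numpy sketch (illustrative only, not part of the analysis code):

```python
import numpy as np

def ecdf(values):
    """Return sorted values and their empirical CDF heights."""
    s = np.sort(np.asarray(values, dtype=float))
    return s, np.arange(1, s.size + 1) / s.size

# Each observation contributes a step of height 1/n
xs, ps = ecdf([3.0, 1.0, 2.0, 2.0])  # xs = [1, 2, 2, 3], ps = [0.25, 0.5, 0.75, 1.0]
```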

2.1 Planting

Important

The dates are winsorized. The earliest date in the raw data is in 2006 (likely the result of accidentally selecting “months” instead of “days”).
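The winsorization amounts to clipping the date column to a plausible window. A minimal pandas sketch (the bounds here are illustrative; the actual bounds used for the planting dates are not shown in this document):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2006-03-15",  # implausible early outlier (e.g. a "months"/"days" slip)
    "2025-04-02", "2025-04-20", "2025-05-11",
]))

# Clip out-of-window values to the window edges instead of dropping them
lower, upper = pd.Timestamp("2025-03-01"), pd.Timestamp("2025-06-30")
dates_w = dates.clip(lower=lower, upper=upper)
```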

Code
```{python}
#| label: fig-planting-ecdf
#| fig-cap: Cumulative distribution of all planting start dates (winsorized)

(
ggplot(planting, aes(x='planting_start_date_w'))
+ stat_ecdf(geom='step', pad=False)
+ geom_vline(data=period_starts, mapping=aes(xintercept='period_start'),
             linetype='dashed', alpha=0.5, color='steelblue')
+ geom_text(data=period_starts, mapping=aes(x='period_start', label='label'),
            y=0.02, size=7, color='steelblue', ha='left', nudge_x=1)
+ scale_x_datetime(date_breaks='1 month', date_labels='%b %Y')
+ labs(x='Planting start date', y='')
+ theme_minimal()
+ theme(axis_text_x=element_text(rotation=45, ha='right'))
)
```
Figure 1: Cumulative distribution of all planting start dates (winsorized)
Code
```{python}
#| label: fig-planting-hh
#| fig-cap: Earliest and latest planting date per household

planting_hh = (planting
    .groupby("hh_id")['planting_start_date_w']
    .agg(["min", "max"])
    .reset_index()
    .rename(columns={"min": "earliest_planting_date", "max": "latest_planting_date"})
)

planting_hh_long = planting_hh.melt(
    id_vars="hh_id",
    value_vars=["earliest_planting_date", "latest_planting_date"],
    var_name="type", value_name="date"
).replace({"earliest_planting_date": "Earliest", "latest_planting_date": "Latest"})

(
ggplot(planting_hh_long, aes(x="date", color="type"))
+ stat_ecdf(geom='step', pad=False)
+ geom_vline(data=period_starts, mapping=aes(xintercept='period_start'),
             linetype='dashed', alpha=0.5, color='steelblue')
+ geom_text(data=period_starts, mapping=aes(x='period_start', label='label'),
            y=0.02, size=7, color='steelblue', ha='left', nudge_x=1)
+ scale_x_datetime(date_breaks='1 month', date_labels='%b %Y')
+ labs(x='Planting date', y='', color='')
+ theme_minimal()
+ theme(axis_text_x=element_text(rotation=45, ha='right'))
)
```
Figure 2: Earliest and latest planting date per household

2.2 Harvest

Important

Harvest dates combine two sources: endline (actual, back-calculated) and phone surveys (expected, forward-calculated). Dates are winsorized to Jun 2025 – Feb 2026.

Code
```{python}
#| label: fig-harvest-ecdf
#| fig-cap: Cumulative distribution of all harvest start dates

(
ggplot(harvest_all, aes(x='harvest_date', color='source'))
+ stat_ecdf(geom='step', pad=False)
+ geom_vline(data=period_starts, mapping=aes(xintercept='period_start'),
             linetype='dashed', alpha=0.5, color='steelblue')
+ geom_text(data=period_starts, mapping=aes(x='period_start', label='label'),
            y=0.02, size=7, color='steelblue', ha='left', nudge_x=1)
+ scale_x_datetime(date_breaks='1 month', date_labels='%b %Y')
+ labs(title='Harvest Start Dates — All Crops Pooled',
       x='Harvest start date', y='')
+ theme_minimal()
+ theme(axis_text_x=element_text(rotation=45, ha='right'))
)
```
Figure 3: Cumulative distribution of all harvest start dates
Code
```{python}
#| label: fig-harvest-hh
#| fig-cap: Earliest and latest harvest date per household

harvest_hh = (hel_begun
    .groupby("hh_id")['harvest_date']
    .agg(["min", "max"])
    .reset_index()
    .rename(columns={"min": "earliest_harvest_date", "max": "latest_harvest_date"})
)

harvest_hh_long = harvest_hh.melt(
    id_vars="hh_id",
    value_vars=["earliest_harvest_date", "latest_harvest_date"],
    var_name="type", value_name="date"
).replace({"earliest_harvest_date": "Earliest", "latest_harvest_date": "Latest"})

(
ggplot(harvest_hh_long, aes(x="date", color="type"))
+ stat_ecdf(geom='step', pad=False)
+ geom_vline(data=period_starts, mapping=aes(xintercept='period_start'),
             linetype='dashed', alpha=0.5, color='steelblue')
+ geom_text(data=period_starts, mapping=aes(x='period_start', label='label'),
            y=0.02, size=7, color='steelblue', ha='left', nudge_x=1)
+ scale_x_datetime(date_breaks='1 month', date_labels='%b %Y')
+ labs(x='Harvest date', y='', color='')
+ theme_minimal()
+ theme(axis_text_x=element_text(rotation=45, ha='right'))
)
```
Figure 4: Earliest and latest harvest date per household

3 Date and Impact on Labor

Having confirmed the overlap, we want to disentangle the impact of survey modality from that of seasonality in explaining the U-shaped curve. To do so, we exploit variation in survey dates within each period.

Code
```{python}
period_dta = pd.read_stata(f"{TIDY}/16_period_hh_level-hh_id-period.dta")
# Survey date is in the startdate variable. Generate a days since start variable
period_dta["days_since_start"] = (
    period_dta["startdate"] - period_dta["startdate"].min()
).dt.days

participant_employment = pd.read_stata(
    "/Users/st2246/Work/Pilot3/data/generated/main/transform/02_flag_double_reports-hh_id-period.dta"
)
employment = pd.read_stata(
    f"{TIDY}/05_Employment-hh_id-period-member_id.dta", convert_categoricals=False
)
adults = pd.read_stata(
    f"{TIDY}/08_inperson_demographics-hh_id-period-member_id.dta",
    convert_categoricals=False,
)
adults = adults[adults["age"] >= 18]
adults = adults[["hh_id", "member_id", "participant_id"]].drop_duplicates()
# Expand each adult to one row per survey period (0–6)
adults = adults.loc[adults.index.repeat(7)].copy()
adults["period"] = adults.groupby(["hh_id", "member_id"]).cumcount()
emp = adults.merge(
    employment[
        [
            "hh_id",
            "period",
            "member_id",
            "money_work_99",
            "days_worked_",
            "work_engaged_",
        ]
    ],
    on=["hh_id", "period", "member_id"],
    how="left",
)

# Fill in 0s for money_work_99 and days_worked_ work_engaged_ when missing
emp["money_work_99"] = emp["money_work_99"].fillna(0)
emp["days_worked_"] = emp["days_worked_"].fillna(0)
emp["work_engaged_"] = emp["work_engaged_"].fillna(0)


# For rows where member_id == participant_id, replace employment vars with corrected values
emp = emp.merge(
    participant_employment[
        [
            "hh_id",
            "period",
            "money_work_p_corrected_99",
            "days_worked_p_corrected",
            "work_engaged_p_corrected",
        ]
    ],
    on=["hh_id", "period"],
    how="left",
)
is_participant = emp["member_id"] == emp["participant_id"]
emp.loc[is_participant, "money_work_99"] = emp.loc[
    is_participant, "money_work_p_corrected_99"
].values
emp.loc[is_participant, "days_worked_"] = emp.loc[
    is_participant, "days_worked_p_corrected"
].values
emp.loc[is_participant, "work_engaged_"] = (
    emp.loc[is_participant, "work_engaged_p_corrected"]
    .fillna(0)  # avoid casting NaN to int; consistent with the zero-fills above
    .astype(int)
    .values
)

# df_allmembers: all members (with participant corrections) merged with period data
df_allmembers = emp.merge(
    period_dta[["hh_id", "period", "startdate", "days_since_start", "enum_id"]],
    on=["hh_id", "period"],
)
# Periods 1–5 are phone surveys; baseline (0) and endline (6) are in person
df_allmembers["inperson_survey"] = (
    (~df_allmembers["period"].between(1, 5)).astype(int)
)

# df_noparticipant: exclude the participant row
df_noparticipant = df_allmembers[
    df_allmembers["member_id"] != df_allmembers["participant_id"]
].copy()

# Merge the period data with participant-level corrected employment for participant-only regressions
df = df_allmembers[df_allmembers["member_id"] == df_allmembers["participant_id"]].copy()
```
Code
```{python}
(
    ggplot(df[["hh_id", "period", "startdate"]].drop_duplicates(), aes(x='startdate', fill='factor(period)')) +
    geom_histogram(binwidth=1, position='stack') +
    labs(x='Survey Date', y='Count', fill='Period') +
    theme_minimal()
)
```

Code
```{python}
survey_hist_data = df[["hh_id", "period", "startdate"]].drop_duplicates()

bin_counts = survey_hist_data["startdate"].value_counts().sort_index()
max_count = bin_counts.max()

def make_ecdf_df(series, label):
    s = series.dropna().sort_values()
    n = len(s)
    return pd.DataFrame({"date": s.values, "ecdf": np.arange(1, n + 1) / n, "type": label})

overlay = pd.concat([
    make_ecdf_df(planting["planting_start_date_w"], "Planting"),
    make_ecdf_df(hel_begun["harvest_date"], "Harvest"),
])
overlay["scaled"] = overlay["ecdf"] * max_count

(
    ggplot(survey_hist_data, aes(x='startdate', fill='factor(period)')) +
    geom_histogram(binwidth=1, position='stack') +
    geom_line(data=overlay, mapping=aes(x='date', y='scaled', color='type', linetype='type'), size=0.8, inherit_aes=False) +
    scale_linetype_manual(values={"Planting": "solid", "Harvest": "dashed"}) +
    scale_color_manual(values={"Planting": "black", "Harvest": "black"}) +
    labs(x='Survey Date', y='Count', fill='Period', color='Season', linetype='Season') +
    theme_minimal()
)
```

3.1 Piecewise Linear Regression to Find Optimal Breakpoint

Under the seasonality hypothesis, we would expect the relationship between days since start and employment outcomes to be non-linear. At the beginning, it should be negative as we move further away from the planting season. However, as we approach the harvest season, the relationship should become positive.

Below, we run a series of piecewise linear regressions to find the optimal breakpoint. That is, we look for the date at which the slope of the relationship between days since start and employment outcomes changes, choosing the candidate date that minimizes the residual sum of squares.
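The search can be sanity-checked on synthetic data with a known kink. A self-contained numpy-only sketch (the actual search below uses statsmodels OLS on the survey data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(200, dtype=float)
true_bp = 120.0
# Outcome declines before the breakpoint and rises after it
y = -0.5 * x + 1.5 * np.maximum(x - true_bp, 0) + rng.normal(0, 2, size=x.size)

def piecewise_rss(bp):
    # Piecewise-linear design: intercept, x, and the hinge (x - bp)_+
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - bp, 0)])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(resid @ resid)

candidates = np.arange(1.0, 199.0)  # exclude the endpoints
best_bp = candidates[int(np.argmin([piecewise_rss(bp) for bp in candidates]))]
# best_bp should land close to true_bp
```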

Code
```{python}
import statsmodels.api as sm

def find_best_spline(column: str) -> tuple:
    y = df_allmembers[column].values
    x = df_allmembers['days_since_start'].values
    # Search over candidate breakpoints, one per day between the first and last survey date
    candidates = np.arange(x.min(), x.max())
    rss_values = []

    for bp in candidates:
        # Piecewise linear: x, and (x - bp)_+ 
        x_spline = np.maximum(x - bp, 0)
        X = sm.add_constant(np.column_stack([x, x_spline]))
        model = sm.OLS(y, X).fit()
        rss_values.append(model.ssr)
    candidate_dates = [df_allmembers['startdate'].min() + pd.Timedelta(days=int(bp)) for bp in candidates]
    return candidates, rss_values, candidate_dates
```
Code
```{python}
candidates, rss_values, candidate_dates = find_best_spline('days_worked_')
rss_values = np.array(rss_values)
best_date = candidate_dates[np.argmin(rss_values)]

days_all = (
    ggplot(pd.DataFrame({'breakpoint': candidate_dates, 'RSS': rss_values}), 
           aes(x='breakpoint', y='RSS')) +
    geom_line() +
    geom_vline(xintercept=best_date, linetype='dashed', color='red') +
    annotate('text', x=best_date, y=rss_values.max() * 0.99, 
             label=f'Best = {best_date.strftime("%Y-%m-%d")}', color='red', ha='left') +
    labs(x='Split dates', y='RSS', title="All household adults") +
    theme_minimal()
)
```
Code
```{python}
candidates_p, rss_values_p, candidate_dates_p = find_best_spline('days_worked_p_corrected')
rss_values_p = np.array(rss_values_p)
best_date_p = candidate_dates_p[np.argmin(rss_values_p)]

days_participant = (
    ggplot(pd.DataFrame({'breakpoint': candidate_dates_p, 'RSS': rss_values_p}), 
           aes(x='breakpoint', y='RSS')) +
    geom_line() +
    geom_vline(xintercept=best_date_p, linetype='dashed', color='red') +
    annotate('text', x=best_date_p, y=rss_values_p.max() * 0.99, 
             label=f'Best = {best_date_p.strftime("%Y-%m-%d")}', color='red', ha='left') +
    labs(x='Split Dates', y='RSS', title="Participant Only") +
    theme_minimal()
)
```
Code
```{python}
days_all / days_participant
```

Code
```{python}
candidates_m, rss_values_m, candidate_dates_m = find_best_spline('money_work_99')
rss_values_m = np.array(rss_values_m)
best_date_m = candidate_dates_m[np.argmin(rss_values_m)]

(
    ggplot(pd.DataFrame({'breakpoint': candidate_dates_m, 'RSS': rss_values_m}), 
           aes(x='breakpoint', y='RSS')) +
    geom_line() +
    geom_vline(xintercept=best_date_m, linetype='dashed', color='red') +
    annotate('text', x=best_date_m, y=rss_values_m.max() * 0.99, 
             label=f'Best = {best_date_m.strftime("%Y-%m-%d")}', color='red', ha='left') +
    labs(x='Candidate Breakpoint (date)', y='RSS') +
    theme_minimal()
)
```

Code
```{python}
candidates_w, rss_values_w, candidate_dates_w = find_best_spline('work_engaged_')
rss_values_w = np.array(rss_values_w)
best_date_w = candidate_dates_w[np.argmin(rss_values_w)]

work_all = (
    ggplot(pd.DataFrame({'breakpoint': candidate_dates_w, 'RSS': rss_values_w}), 
           aes(x='breakpoint', y='RSS')) +
    geom_line() +
    geom_vline(xintercept=best_date_w, linetype='dashed', color='red') +
    annotate('text', x=best_date_w, y=rss_values_w.max() * 0.99, 
             label=f'Best = {best_date_w.strftime("%Y-%m-%d")}', color='red', ha='left') +
    labs(x='Candidate Breakpoint (date)', y='RSS', title="All Household members") +
    theme_minimal()
)
```
Code
```{python}
candidates_wp, rss_values_wp, candidate_dates_wp = find_best_spline('work_engaged_p_corrected')
rss_values_wp = np.array(rss_values_wp)
best_date_wp = candidate_dates_wp[np.argmin(rss_values_wp)]

work_p = (
    ggplot(pd.DataFrame({'breakpoint': candidate_dates_wp, 'RSS': rss_values_wp}), 
           aes(x='breakpoint', y='RSS')) +
    geom_line() +
    geom_vline(xintercept=best_date_wp, linetype='dashed', color='red') +
    annotate('text', x=best_date_wp, y=rss_values_wp.max() * 0.99, 
             label=f'Best = {best_date_wp.strftime("%Y-%m-%d")}', color='red', ha='left') +
    labs(x='Candidate Breakpoint (date)', y='RSS', title="Participant Only") +
    theme_minimal()
)
```
Code
```{python}
work_all / work_p
```

Early August appears to be the ideal splitting point. Given the smoothness of the curves, I am not too concerned with picking the exact date; for the purposes of the analysis below, I will use August 1st.

Code
```{python}
best_day = candidates[np.argmin(rss_values)] - 8  # Best date is Aug 9, subtract 8 to get Aug 1

df_allmembers["days_after_start"] = df_allmembers["days_since_start"].apply(lambda x: min(x, best_day))
df_allmembers["days_since_aug1"] = df_allmembers["days_since_start"].apply(lambda x: max(0, x - best_day))

df_participant = df_allmembers[df_allmembers["member_id"] == df_allmembers["participant_id"]].copy()
df_noparticipant = df_allmembers[df_allmembers["member_id"] != df_allmembers["participant_id"]].copy()
```

3.2 Regressions

The regressions below suggest that the U-shaped curve is mostly driven by seasonality. The same six specifications are estimated for three outcomes (days worked, money earned, and a work-engagement dummy), separately for all household adults and for participants only.

Code
```{python}
from stargazer.stargazer import Stargazer

def run_regs(data: pd.DataFrame, dv: str) -> list:
    sub = data[[dv, "days_since_start", "days_after_start", "days_since_aug1",
                "inperson_survey", "enum_id"]].dropna()
    y = sub[dv]

    # TODO: add a spec that interacts days_after_start and days_since_aug1 with inperson_survey.
    specs = [
        ["days_since_start"],
        ["days_after_start", "days_since_aug1"],
        ["days_since_start", "inperson_survey"],
        ["days_after_start", "days_since_aug1", "inperson_survey"],
        ["days_since_start", "inperson_survey", "enum_id"],
        ["days_after_start", "days_since_aug1", "inperson_survey", "enum_id"],
    ]

    enum_dummies = pd.get_dummies(sub["enum_id"], prefix="enum", drop_first=True, dtype=float)

    models = []
    for spec in specs:
        if "enum_id" in spec:
            cols = [c for c in spec if c != "enum_id"]
            X = sm.add_constant(pd.concat([sub[cols], enum_dummies], axis=1))
        else:
            X = sm.add_constant(sub[spec])
        models.append(sm.OLS(y, X).fit())
    return models

models_all = run_regs(df_allmembers, "days_worked_")
models_p   = run_regs(df_participant, "days_worked_p_corrected")
```
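The TODO in `run_regs` mentions a specification interacting the two spline segments with modality. A minimal sketch of building those interaction columns on toy data (a hypothetical extension, not one of the specifications estimated here; the toy frame only stands in for `df_allmembers`):

```python
import pandas as pd

# Toy stand-in for the analysis frame (column names follow the text above)
toy = pd.DataFrame({
    "days_after_start": [10.0, 60.0, 68.0, 68.0],
    "days_since_aug1": [0.0, 0.0, 5.0, 30.0],
    "inperson_survey": [1, 0, 0, 1],
})

# Interact each spline segment with the in-person dummy
for seg in ["days_after_start", "days_since_aug1"]:
    toy[f"{seg}_x_inperson"] = toy[seg] * toy["inperson_survey"]
```

The interaction columns would then be appended to the relevant spec lists before fitting.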

3.2.1 Days Worked

Code
```{python}
#| output: asis
def render_table(models: list, title: str) -> str:
    sg = Stargazer(models)
    sg.title(title)
    sg.custom_columns(["Days Since Start", "w/ split", "Control for Modality", "", "Control for Enum", "(6)"], [1, 1, 1, 1, 1, 1])

    # Collect all enum dummy names across all models and hide them
    enum_cols = []
    for m in models:
        enum_cols += [p for p in m.params.index if p.startswith("enum_")]
    if enum_cols:
        all_params = []
        for m in models:
            for p in m.params.index:
                if not p.startswith("enum_") and p not in all_params:
                    all_params.append(p)
        sg.covariate_order(all_params)
        sg.add_line("Enum FE", ["No", "No", "No", "No", "Yes", "Yes"])

    return sg.render_html()

print(render_table(models_all, "Days Worked — All Household Adults"))
```
Days Worked — All Household Adults

Dependent variable: days_worked_

| | (1) Days Since Start | (2) w/ split | (3) Control for Modality | (4) | (5) Control for Enum | (6) |
|---|---|---|---|---|---|---|
| const | 0.528*** | 0.760*** | 0.326*** | 0.603*** | 0.743*** | 0.969*** |
| | (0.016) | (0.018) | (0.018) | (0.036) | (0.131) | (0.132) |
| days_since_start | -0.002*** | | -0.001*** | | -0.002*** | |
| | (0.000) | | (0.000) | | (0.000) | |
| days_after_start | | -0.006*** | | -0.004*** | | -0.006*** |
| | | (0.000) | | (0.000) | | (0.000) |
| days_since_aug1 | | 0.019*** | | 0.013*** | | 0.013*** |
| | | (0.001) | | (0.002) | | (0.001) |
| inperson_survey | | | 0.438*** | 0.172*** | 0.447*** | 0.160*** |
| | | | (0.016) | (0.034) | (0.023) | (0.036) |
| Enum FE | No | No | No | No | Yes | Yes |
| Observations | 54873 | 54873 | 54873 | 54873 | 54873 | 54873 |
| R2 | 0.001 | 0.016 | 0.015 | 0.016 | 0.145 | 0.146 |
| Adjusted R2 | 0.001 | 0.016 | 0.015 | 0.016 | 0.143 | 0.145 |
| Residual Std. Error | 1.706 (df=54871) | 1.694 (df=54870) | 1.695 (df=54870) | 1.694 (df=54869) | 1.581 (df=54767) | 1.579 (df=54766) |
| F Statistic | 69.691*** (df=1; 54871) | 438.338*** (df=2; 54870) | 411.538*** (df=2; 54870) | 300.860*** (df=3; 54869) | 88.209*** (df=105; 54767) | 88.525*** (df=106; 54766) |

Note: *p<0.1; **p<0.05; ***p<0.01
Code
```{python}
#| output: asis
print(render_table(models_p, "Days Worked — Participant Only"))
```
Days Worked — Participant Only

Dependent variable: days_worked_p_corrected

| | (1) Days Since Start | (2) w/ split | (3) Control for Modality | (4) | (5) Control for Enum | (6) |
|---|---|---|---|---|---|---|
| const | 0.485*** | 0.793*** | 0.216*** | 0.578*** | 1.161*** | 1.441*** |
| | (0.035) | (0.038) | (0.038) | (0.077) | (0.271) | (0.275) |
| days_since_start | 0.001 | | 0.002*** | | -0.000 | |
| | (0.000) | | (0.000) | | (0.000) | |
| days_after_start | | -0.006*** | | -0.003*** | | -0.005*** |
| | | (0.001) | | (0.001) | | (0.001) |
| days_since_aug1 | | 0.028*** | | 0.019*** | | 0.018*** |
| | | (0.002) | | (0.003) | | (0.003) |
| inperson_survey | | | 0.583*** | 0.235*** | 0.531*** | 0.176** |
| | | | (0.034) | (0.072) | (0.048) | (0.075) |
| Enum FE | No | No | No | No | Yes | Yes |
| Observations | 14865 | 14865 | 14865 | 14865 | 14865 | 14865 |
| R2 | 0.000 | 0.021 | 0.020 | 0.021 | 0.184 | 0.186 |
| Adjusted R2 | 0.000 | 0.021 | 0.019 | 0.021 | 0.178 | 0.180 |
| Residual Std. Error | 1.897 (df=14863) | 1.878 (df=14862) | 1.879 (df=14862) | 1.877 (df=14861) | 1.720 (df=14759) | 1.718 (df=14758) |
| F Statistic | 2.091 (df=1; 14863) | 157.542*** (df=2; 14862) | 147.878*** (df=2; 14862) | 108.611*** (df=3; 14861) | 31.642*** (df=105; 14759) | 31.767*** (df=106; 14758) |

Note: *p<0.1; **p<0.05; ***p<0.01

3.2.2 Money Earned

Code
```{python}
#| output: asis
print(render_table(models_all_m, "Money Earned — All Household Adults"))
```
Money Earned — All Household Adults

Dependent variable: money_work_99

| | (1) Days Since Start | (2) w/ split | (3) Control for Modality | (4) | (5) Control for Enum | (6) |
|---|---|---|---|---|---|---|
| const | 17.557*** | 28.015*** | 7.325*** | 10.969*** | 15.574*** | 19.274*** |
| | (0.630) | (0.699) | (0.685) | (1.390) | (5.292) | (5.367) |
| days_since_start | -0.066*** | | -0.023*** | | -0.030*** | |
| | (0.007) | | (0.007) | | (0.009) | |
| days_after_start | | -0.287*** | | -0.072*** | | -0.095*** |
| | | (0.010) | | (0.018) | | (0.018) |
| days_since_aug1 | | 0.876*** | | 0.153*** | | 0.211*** |
| | | (0.029) | | (0.059) | | (0.059) |
| inperson_survey | | | 22.163*** | 18.662*** | 23.072*** | 18.380*** |
| | | | (0.617) | (1.316) | (0.926) | (1.468) |
| Enum FE | No | No | No | No | Yes | Yes |
| Observations | 54873 | 54873 | 54873 | 54873 | 54873 | 54873 |
| R2 | 0.002 | 0.021 | 0.025 | 0.025 | 0.072 | 0.072 |
| Adjusted R2 | 0.002 | 0.021 | 0.025 | 0.025 | 0.070 | 0.070 |
| Residual Std. Error | 66.321 (df=54871) | 65.670 (df=54870) | 65.556 (df=54870) | 65.551 (df=54869) | 64.002 (df=54767) | 63.993 (df=54766) |
| F Statistic | 89.042*** (df=1; 54871) | 592.207*** (df=2; 54870) | 690.208*** (df=2; 54870) | 463.228*** (df=3; 54869) | 40.450*** (df=105; 54767) | 40.240*** (df=106; 54766) |

Note: *p<0.1; **p<0.05; ***p<0.01
Code
```{python}
#| output: asis
print(render_table(models_p_m, "Money Earned — Participants"))
```
Money Earned — Participants

Dependent variable: money_work_p_corrected_99

| | (1) Days Since Start | (2) w/ split | (3) Control for Modality | (4) | (5) Control for Enum | (6) |
|---|---|---|---|---|---|---|
| const | 8.699*** | 19.394*** | -1.178 | 7.138*** | 18.884** | 25.635*** |
| | (0.982) | (1.084) | (1.062) | (2.156) | (7.915) | (8.022) |
| days_since_start | 0.054*** | | 0.096*** | | 0.057*** | |
| | (0.011) | | (0.011) | | (0.014) | |
| days_after_start | | -0.172*** | | -0.017 | | -0.063** |
| | | (0.015) | | (0.028) | | (0.028) |
| days_since_aug1 | | 1.020*** | | 0.499*** | | 0.501*** |
| | | (0.046) | | (0.091) | | (0.089) |
| inperson_survey | | | 21.386*** | 13.404*** | 18.762*** | 10.214*** |
| | | | (0.957) | (2.040) | (1.396) | (2.202) |
| Enum FE | No | No | No | No | Yes | Yes |
| Observations | 14865 | 14865 | 14865 | 14865 | 14865 | 14865 |
| R2 | 0.002 | 0.033 | 0.034 | 0.035 | 0.135 | 0.137 |
| Adjusted R2 | 0.002 | 0.032 | 0.034 | 0.035 | 0.129 | 0.130 |
| Residual Std. Error | 53.770 (df=14863) | 52.933 (df=14862) | 52.891 (df=14862) | 52.858 (df=14861) | 50.223 (df=14759) | 50.182 (df=14758) |
| F Statistic | 24.826*** (df=1; 14863) | 250.164*** (df=2; 14862) | 262.321*** (df=2; 14862) | 181.644*** (df=3; 14861) | 21.962*** (df=105; 14759) | 22.027*** (df=106; 14758) |

Note: *p<0.1; **p<0.05; ***p<0.01

3.2.3 Working?

Code
```{python}
#| output: asis
print(render_table(models_all_w, "Work dummy — All Household Adults"))
```
Work dummy — All Household Adults

Dependent variable: work_engaged_

| | (1) Days Since Start | (2) w/ split | (3) Control for Modality | (4) | (5) Control for Enum | (6) |
|---|---|---|---|---|---|---|
| const | 0.083*** | 0.129*** | 0.041*** | 0.082*** | 0.108*** | 0.143*** |
| | (0.002) | (0.003) | (0.003) | (0.005) | (0.020) | (0.020) |
| days_since_start | -0.000*** | | 0.000* | | -0.000*** | |
| | (0.000) | | (0.000) | | (0.000) | |
| days_after_start | | -0.001*** | | -0.001*** | | -0.001*** |
| | | (0.000) | | (0.000) | | (0.000) |
| days_since_aug1 | | 0.004*** | | 0.002*** | | 0.002*** |
| | | (0.000) | | (0.000) | | (0.000) |
| inperson_survey | | | 0.091*** | 0.052*** | 0.088*** | 0.044*** |
| | | | (0.002) | (0.005) | (0.003) | (0.006) |
| Enum FE | No | No | No | No | Yes | Yes |
| Observations | 54873 | 54873 | 54873 | 54873 | 54873 | 54873 |
| R2 | 0.000 | 0.025 | 0.026 | 0.027 | 0.139 | 0.141 |
| Adjusted R2 | 0.000 | 0.025 | 0.026 | 0.027 | 0.138 | 0.139 |
| Residual Std. Error | 0.259 (df=54871) | 0.256 (df=54870) | 0.256 (df=54870) | 0.256 (df=54869) | 0.241 (df=54767) | 0.241 (df=54766) |
| F Statistic | 22.296*** (df=1; 54871) | 713.638*** (df=2; 54870) | 728.282*** (df=2; 54870) | 510.959*** (df=3; 54869) | 84.335*** (df=105; 54767) | 84.683*** (df=106; 54766) |

Note: *p<0.1; **p<0.05; ***p<0.01
Code
```{python}
#| output: asis
print(render_table(models_p_w, "Work dummy — Participants"))
```
Work dummy — Participants

Dependent variable: work_engaged_p_corrected

| | (1) Days Since Start | (2) w/ split | (3) Control for Modality | (4) | (5) Control for Enum | (6) |
|---|---|---|---|---|---|---|
| const | 0.070*** | 0.134*** | 0.011* | 0.064*** | 0.167*** | 0.211*** |
| | (0.005) | (0.006) | (0.006) | (0.012) | (0.042) | (0.043) |
| days_since_start | 0.000*** | | 0.001*** | | 0.000*** | |
| | (0.000) | | (0.000) | | (0.000) | |
| days_after_start | | -0.001*** | | -0.000 | | -0.001*** |
| | | (0.000) | | (0.000) | | (0.000) |
| days_since_aug1 | | 0.006*** | | 0.003*** | | 0.003*** |
| | | (0.000) | | (0.001) | | (0.000) |
| inperson_survey | | | 0.128*** | 0.076*** | 0.112*** | 0.058*** |
| | | | (0.005) | (0.011) | (0.007) | (0.012) |
| Enum FE | No | No | No | No | Yes | Yes |
| Observations | 14865 | 14865 | 14865 | 14865 | 14865 | 14865 |
| R2 | 0.003 | 0.039 | 0.040 | 0.042 | 0.198 | 0.200 |
| Adjusted R2 | 0.002 | 0.039 | 0.040 | 0.041 | 0.192 | 0.194 |
| Residual Std. Error | 0.299 (df=14863) | 0.294 (df=14862) | 0.294 (df=14862) | 0.293 (df=14861) | 0.269 (df=14759) | 0.269 (df=14758) |
| F Statistic | 37.592*** (df=1; 14863) | 298.845*** (df=2; 14862) | 308.781*** (df=2; 14862) | 215.019*** (df=3; 14861) | 34.605*** (df=105; 14759) | 34.702*** (df=106; 14758) |

Note: *p<0.1; **p<0.05; ***p<0.01