Using Post-Double Selection Lasso in Field Experiments
Authors: Jacobus Cilliers, Nour Elashmawy, David McKenzie
Affiliations: Georgetown University, Development Research Group, World Bank
Keywords: Treatment effect, Randomized Experiment, Post-Double Selection Lasso, Attrition, Statistical Power
JEL Classification Codes: C93, C21, O12
Abstract
This paper re-evaluates 780 treatment effects from published papers to assess the practical impact of using the post-double selection Lasso (PDS Lasso) estimator in field experiments. On average, PDS Lasso reduces standard errors by less than 1 percent relative to ANCOVA, and it selects three or fewer control variables in over half the cases. The paper documents the key practical issues that arise when applying PDS Lasso and compares its performance to that of standard ANCOVA.
Introduction
In a simple randomized experiment, the difference in means between treatment and control groups is an unbiased estimator of the average treatment effect. In small samples, however, chance imbalances on covariates that predict the outcome can leave any particular estimate far from the truth and reduce statistical power. Ex-post adjustment through ANCOVA, which regresses the follow-up outcome on treatment and baseline covariates such as the lagged outcome, can improve precision, but choosing which covariates to include is often ad hoc and leaves room for specification searching that can itself introduce bias.
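For concreteness, here is a minimal ANCOVA sketch in Python on simulated data; the variable names (y0 for the baseline outcome, d for treatment, y1 for the follow-up outcome) and the effect size are illustrative assumptions, not values from the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
y0 = rng.normal(size=n)                        # baseline (lagged) outcome, illustrative
d = rng.binomial(1, 0.5, size=n)               # randomized treatment assignment
y1 = 0.3 * d + 0.6 * y0 + rng.normal(size=n)   # follow-up outcome, true effect 0.3

# ANCOVA: regress the follow-up outcome on treatment plus the lagged outcome.
Z = sm.add_constant(np.column_stack([d, y0]))
fit = sm.OLS(y1, Z).fit(cov_type="HC1")
print(f"ANCOVA estimate: {fit.params[1]:.3f}, robust SE: {fit.bse[1]:.3f}")
```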
The post-double selection Lasso (PDS Lasso) estimator, originally developed for selecting controls in observational studies, has become increasingly popular in field experiments. PDS Lasso runs the Lasso twice: once to select covariates that predict the outcome, and once to select covariates that predict treatment status. The union of the covariates selected in the two steps is then included as controls in an OLS regression of the outcome on treatment. The method promises a principled, data-driven gain in precision, but its value in randomized experiments is less clear, particularly in the small samples typical of field experiments, where data-driven selection has little to work with.
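The procedure can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: it uses cross-validated penalties (scikit-learn's LassoCV) as a stand-in for the plug-in penalty used by standard PDS Lasso implementations, and the simulated data and names are assumptions.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 500, 40
X = rng.normal(size=(n, p))                         # candidate baseline controls
d = rng.binomial(1, 0.5, size=n)                    # randomized treatment assignment
y = 0.25 * d + 1.0 * X[:, 0] + rng.normal(size=n)   # outcome; only the first control matters

# Step 1: Lasso of the outcome on the candidate controls (treatment excluded).
sel_y = np.flatnonzero(LassoCV(cv=5, random_state=0).fit(X, y).coef_)
# Step 2: Lasso of treatment status on the same candidate controls.
sel_d = np.flatnonzero(LassoCV(cv=5, random_state=0).fit(X, d).coef_)

# Final step: OLS of the outcome on treatment plus the union of selected controls.
keep = sorted(set(sel_y) | set(sel_d))
controls = X[:, keep] if keep else np.empty((n, 0))
Z = sm.add_constant(np.column_stack([d.reshape(-1, 1), controls]))
fit = sm.OLS(y, Z).fit(cov_type="HC1")
print(f"Selected controls: {keep}")
print(f"PDS Lasso estimate: {fit.params[1]:.3f}, robust SE: {fit.bse[1]:.3f}")
```

In practice, researchers typically rely on dedicated implementations (for example, the pdslasso package in Stata), which also handle the plug-in penalty and clustered data rather than the cross-validation shortcut used here.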
Key Findings
- Sample Size and Attrition: Field experiments typically have modest sample sizes (often 100 to 1,000 observations) and substantial attrition (15% on average). Both features complicate data-driven covariate selection.
- Performance of PDS Lasso:
- Typically, PDS Lasso selects very few control variables: the median number selected across the 780 treatment estimates is three.
- In over half the cases, no variables are selected in the treatment regression step.
- Selected variables rarely overlap between predicting the outcome and predicting treatment status.
- On average, PDS Lasso changes treatment estimates and standard errors very little: the median change in coefficients is 0.01 standard deviations, and the median standard error is 99.2% of the corresponding ANCOVA standard error.
- In about a quarter of cases, standard errors are slightly larger than with ANCOVA.
Practical Implications
- Power Gains: Researchers should not expect meaningful power gains from using PDS Lasso.
- Attrition and Performance: PDS Lasso is more likely to select control variables when there is attrition, since attrition can induce correlation between baseline covariates and treatment status in the estimation sample, but the resulting changes in coefficients remain small.
- Precision and Mean-Squared Error: PDS Lasso can sometimes be less precise than ANCOVA, particularly when many candidate controls are supplied, because important predictors such as the lagged dependent variable may fail to be selected (see the simulation sketch after this list).
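To see why losing the lagged dependent variable is costly, the following small Monte Carlo sketch (Python, with an illustrative data-generating process and effect size, not the paper's data) compares the spread of the treatment estimate with and without the baseline outcome as a control.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, reps, tau = 200, 1000, 0.2
est = {"with_lagged_dv": [], "without_lagged_dv": []}

for _ in range(reps):
    y0 = rng.normal(size=n)                          # baseline outcome
    d = rng.binomial(1, 0.5, size=n)                 # randomized treatment
    y1 = tau * d + 0.7 * y0 + rng.normal(size=n)     # follow-up outcome
    Z_full = sm.add_constant(np.column_stack([d, y0]))
    Z_nolag = sm.add_constant(d.astype(float))
    est["with_lagged_dv"].append(sm.OLS(y1, Z_full).fit().params[1])
    est["without_lagged_dv"].append(sm.OLS(y1, Z_nolag).fit().params[1])

for name, draws in est.items():
    draws = np.array(draws)
    print(f"{name}: mean bias {draws.mean() - tau:+.3f}, sd {draws.std():.3f}")
```

Both estimators are unbiased under randomization; the cost of omitting the lagged dependent variable shows up in the larger standard deviation of the estimates.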
Recommendations
- Include the Lagged Dependent Variable: To guard against dropping the single most important predictor, researchers should include the lagged dependent variable in the amelioration set, that is, the set of controls that enters the final regression regardless of what the Lasso steps select (see the sketch after this list).
- Penalty Parameter Selection: Further research is needed to determine optimal penalty parameters for different scenarios.
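A minimal way to act on the first recommendation, continuing the earlier Python sketch, is to keep the lagged dependent variable out of the penalized selection steps and force it into the final regression. The names and data are again illustrative assumptions; dedicated implementations such as Stata's pdslasso instead allow designated variables to enter the Lasso steps unpenalized.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 500, 40
X = rng.normal(size=(n, p))                          # candidate baseline controls
y0 = rng.normal(size=n)                              # lagged dependent variable
d = rng.binomial(1, 0.5, size=n)                     # randomized treatment
y1 = 0.25 * d + 0.8 * y0 + 0.5 * X[:, 0] + rng.normal(size=n)

# Selection steps run over the candidate controls only.
sel_y = np.flatnonzero(LassoCV(cv=5, random_state=0).fit(X, y1).coef_)
sel_d = np.flatnonzero(LassoCV(cv=5, random_state=0).fit(X, d).coef_)
keep = sorted(set(sel_y) | set(sel_d))

# Amelioration set: the lagged dependent variable enters the final OLS
# regression regardless of what the Lasso steps select.
controls = X[:, keep] if keep else np.empty((n, 0))
Z = sm.add_constant(np.column_stack([d, y0, controls]))
fit = sm.OLS(y1, Z).fit(cov_type="HC1")
print(f"Treatment effect: {fit.params[1]:.3f}, robust SE: {fit.bse[1]:.3f}")
```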
Conclusion
While PDS Lasso offers a principled approach to selecting control variables, its practical benefits in field experiments are limited, especially given the small sample sizes and high attrition rates typical of such studies. Researchers should weigh these trade-offs and potential drawbacks carefully before adopting the method.