Microdata from agricultural surveys are crucial for understanding the economic lives of smallholder farmers. However, collecting accurate data in low-income countries is difficult because agricultural operations are complex and seasonal, many respondents are illiterate, and respondents are often unfamiliar with standard units of measurement. These challenges require complex and costly survey operations, which makes cost-effectiveness and design efficiency key areas of research.
Plot-level crop yield statistics are typically derived either from harvest weights reported by farmers or from crop-cut harvest weights measured by enumerators. Self-reported yields tend to be higher than the objectively measured yields obtained through crop cutting. Crop cutting is the more precise of the two methods, but it is also more costly and time-intensive, so it is often conducted on only a subset of the surveyed plots. Statistical imputation methods are therefore needed to fill the resulting data gaps.
Recent advances in machine learning (ML) have made it a standard tool for analysts, particularly for prediction problems such as estimating crop yields. Yields depend on factors such as soil quality, weather, the timing of inputs, and management choices, which often affect output nonlinearly and interact with one another. Flexible ML models can capture these nonlinearities and interactions without the analyst having to specify them in advance.
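To illustrate why flexible learners suit this kind of problem, the sketch below uses purely synthetic data (not the Mali LSMS-ISA data) in which yield depends only on an interaction between two hypothetical plot covariates, rainfall and fertilizer. It compares an additive linear regression, which has no interaction term, with k-nearest-neighbour regression, a simple nonparametric learner chosen here only for brevity. The covariates, the data-generating formula, and the helper names are assumptions for illustration; the study's own ML models may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plot-level covariates (synthetic): rainfall and fertilizer, scaled to [0, 1].
n = 500
X = rng.uniform(0, 1, size=(n, 2))
# Assumed data-generating process: fertilizer only raises yield when there is rain,
# i.e. a pure interaction effect plus small measurement noise.
y = 2.0 * X[:, 0] * X[:, 1] + rng.normal(0, 0.05, size=n)

train, test = slice(0, 400), slice(400, n)


def linear_additive_predict(X_tr, y_tr, X_te):
    """OLS with an intercept and main effects only (no interaction term)."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    return A_te @ beta


def knn_predict(X_tr, y_tr, X_te, k=10):
    """k-nearest-neighbour regression: mean yield of the k closest training plots."""
    # Pairwise Euclidean distances between target and training plots.
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_tr[idx].mean(axis=1)


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


rmse_linear = rmse(linear_additive_predict(X[train], y[train], X[test]), y[test])
rmse_knn = rmse(knn_predict(X[train], y[train], X[test]), y[test])
print(f"linear RMSE={rmse_linear:.3f}, kNN RMSE={rmse_knn:.3f}")
```

The nearest-neighbour model never sees an explicit interaction term; it recovers the pattern because plots that are close in covariate space have similar yields, which is the kind of flexibility the additive linear model lacks.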
This study uses data from two consecutive rounds of a nationally representative agricultural survey in Mali, collected under the Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) project. The analysis covers six crops: sorghum, millet, maize, rice, groundnut, and cowpea. The goal is to validate an imputation framework that predicts crop yields with machine learning models in two settings: within-survey imputation, where a model trained on the plots with crop-cut data in a survey round predicts yields for the remaining plots in that round, and survey-to-survey imputation, where a model trained on one round predicts yields in another round.
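The two validation settings can be sketched on synthetic data. In this sketch, within-survey imputation treats a random half of round-2 plots as crop-cut plots, trains on them, and scores predictions on the other half; survey-to-survey imputation trains on round 1 and predicts every round-2 plot. Everything here is an assumption for illustration, not the study's framework or results: the two rounds are simulated, a least-squares regression with an interaction term stands in for an ML model, and round 2 is given a hypothetical yield shock (a bad season) that the covariates do not capture, which only the within-survey model can absorb.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate_round(n, shock):
    """Synthetic survey round. `shock` is a hypothetical round-level yield shift
    (e.g. a bad season) that the observed covariates do not capture."""
    rain = rng.uniform(0, 1, n)
    fert = rng.uniform(0, 1, n)
    y = 1.0 + 0.8 * rain + 0.5 * fert + 0.6 * rain * fert + shock
    y = y + rng.normal(0, 0.1, n)  # measurement noise
    return np.column_stack([rain, fert]), y


def design(X):
    # Intercept, main effects, and a rain x fertilizer interaction.
    return np.column_stack([np.ones(len(X)), X, X[:, 0] * X[:, 1]])


def fit_predict(X_tr, y_tr, X_te):
    """Least-squares fit on training plots, predictions for target plots.
    A stand-in for an ML model; any regressor could be swapped in here."""
    beta, *_ = np.linalg.lstsq(design(X_tr), y_tr, rcond=None)
    return design(X_te) @ beta


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


X1, y1 = simulate_round(600, shock=0.0)   # round 1: crop cuts on all plots
X2, y2 = simulate_round(600, shock=-0.2)  # round 2: hypothetical bad season

# Within-survey: crop cuts observed on a random half of round 2; impute the
# other half and score against their (known, because simulated) yields.
idx = rng.permutation(len(y2))
cc, missing = idx[:300], idx[300:]
rmse_within = rmse(fit_predict(X2[cc], y2[cc], X2[missing]), y2[missing])

# Survey-to-survey: train on round 1 only, predict every round-2 plot.
rmse_cross = rmse(fit_predict(X1, y1, X2), y2)

print(f"within-survey RMSE={rmse_within:.3f}, survey-to-survey RMSE={rmse_cross:.3f}")
```

Under these assumptions the survey-to-survey error is larger because the round-1 model cannot see the round-2 shock; with real data, how large that gap is would be an empirical question.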
The study contributes to research on making agricultural surveys more cost-efficient through imputation. If ML models can reliably predict crop-cut yields, surveys could conduct costly crop cutting on a smaller subset of plots or survey rounds and impute yields for the rest, improving the accuracy and completeness of yield data at lower cost.
This summary outlines the study's motivation, data, and aims, highlighting the role machine learning can play in filling data gaps in agricultural surveys.