Public Disclosure Authorized

Policy Research Working Paper 10559

Machine Learning Imputation of High Frequency Price Surveys in Papua New Guinea

Bo Pieter Johannes Andrée
Utz Johann Pape

Development Data Group
Agriculture and Food Global Practice &
Poverty and Equity Global Practice
September 2023

Abstract

Capabilities to track fast-moving economic developments remain limited in many regions of the developing world. This complicates prioritizing policies aimed at supporting vulnerable populations. To gain insight into the evolution of fluid events in a data scarce context, this paper explores the ability of recent machine-learning advances to produce continuous data in near-real-time by imputing multiple entries in ongoing surveys. The paper attempts to track inflation in fresh produce prices at the local market level in Papua New Guinea, relying only on incomplete and intermittent survey data. This application is made challenging by high intra-month price volatility, low cross-market price correlations, and weak price trends. The modeling approach uses chained equations to produce an ensemble prediction for multiple price quotes simultaneously. The paper runs cross-validation of the prediction strategy under different designs in terms of markets, foods, and time periods covered. The results show that when the survey is well-designed, imputations can achieve accuracy that is attractive when compared to costly, and logistically often infeasible, direct measurement. The methods have wider applicability and could help to fill crucial data gaps in data scarce regions such as the Pacific Islands, especially in conjunction with specifically designed continuous surveys.

This paper is a product of the Development Data Group, Development Economics, the Agriculture and Food Global Practice; and the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at bandree@worldbank.org and upape@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Machine Learning Imputation of High Frequency Price Surveys in Papua New Guinea

By Bo Pieter Johannes Andrée and Utz Johann Pape∗

JEL: C01, C14, C25, C53, O10.

Keywords: Inflation, Agriculture and Food Security, Food Price Analysis, Economic Shocks and Vulnerability, Macroeconomic Monitoring.

∗ Bo Pieter Johannes Andrée, The World Bank, Development Economics, Data Group, can be contacted at bandree(at)worldbank.org. Utz Johann Pape, The World Bank, Poverty & Equity Global Practice, East Asia and Pacific, as well as University of Göttingen, can be contacted at upape(at)worldbank.org. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
I. Introduction

Statistical agencies around the world are increasingly interested in the use of machine learning in the production of official statistics. In particular, the area of missing data imputation is one of potentially promising applications. A recent survey on the use of machine learning methods in official statistics commissioned by the United Nations Economic Commission and conducted at selected national and international statistical institutions revealed that missing data imputation was second in a ranking of promising areas (Beck et al., 2022). Real-time imputation of economic data may also hold the key to enabling reliable forecasting and monitoring of risks in humanitarian settings where primary data often cannot be collected (Andrée et al., 2020; Wang et al., 2020, 2022), or in development contexts where large scale data operations are typically carried out on an infrequent basis (Mahler et al., 2021).

Traditionally, surveys have been deployed as self-contained data gathering operations aimed at capturing a snapshot of an evolving population statistic such as the poverty rate, market sentiment, or the consumer price index. Developing such one-time analyses has been the bread and butter task of economists for decades, and the go-to approach for policy makers to inform their next actions. The issue of missing data has traditionally been approached from the angle of correcting for the bias and uncertainty that arise in this analytical context. In particular, the work of Rubin (1976); Campion and Rubin (1989); Rubin (1996); Little and Rubin (2012); van Buuren (2012) on multiple imputation has provided important answers to the question of how to deal with non-response when estimating economic relationships.

Increasingly, however, economists and policy makers are looking for continuous insight, as shown by the surge in literature on “now-casting” and real-time indicators (Khan et al., 2022). The literature has pu