Policy Research Working Paper 10673

Missing Evidence
Tracking Academic Data Use around the World

Brian Stacy, Lucas Kitzmüller, Xiaoyu Wang, Daniel Gerszon Mahler, Umar Serajuddin

Public Disclosure Authorized

Development Economics, Development Data Group, January 2024

A verified reproducibility package for this paper is available at http://reproducibility.worldbank.org.

Abstract

Data-driven research on a country is key to producing evidence-based public policies. Yet little is known about where data-driven research is lacking and how it could be expanded. This paper proposes a method for tracking academic data use by country of subject, applying natural language processing to open-access research papers. The model’s predictions produce country estimates of the number of articles using data that are highly correlated with a human-coded approach, with a correlation of 0.99. Analyzing more than 1 million academic articles, the paper finds that the number of articles on a country is strongly correlated with its gross domestic product per capita, population, and the quality of its national statistical system. The paper identifies data sources that are strongly associated with data-driven research and finds that availability of subnational data appears to be particularly important. Finally, the paper classifies countries into groups based on whether they could most benefit from increasing their supply of or demand for data. The findings show that the former applies to many low- and lower-middle-income countries, while the latter applies to many upper-middle- and high-income countries.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at bstacy@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Missing Evidence: Tracking Academic Data Use around the World

Brian Stacy, Lucas Kitzmüller, Xiaoyu Wang, Daniel Gerszon Mahler, and Umar Serajuddin¹

Keywords: Data, academia, research, natural language processing
JEL codes: C45, C52, O30

¹ Stacy, Wang, Mahler, and Serajuddin are with the World Bank’s Development Data Group. Lucas Kitzmüller completed the work while at the European Bank for Reconstruction and Development (EBRD). Corresponding author: Brian Stacy (bstacy@worldbank.org). We are grateful for comments from Dean Jolliffe, Jishnu Das, Olivier Dupriez, and Patrick Brock. We acknowledge financial support from a World Bank Research Support Grant (P178728). The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Views presented are those of the authors and not necessarily of the EBRD.

1 Introduction

In recent decades, the amount of data produced has exploded, generating boundless opportunities for policies to improve people’s lives (World Bank 2021). Though data can be valuable in their raw form, the full value of data is only realized when they are analyzed to create insights, and these insights are converted to public policies or increased accountability.
Researchers have a vital role to play in this regard. Many researchers spend countless hours digesting data, using data to create new knowledge, and communicating this knowledge with the intent of impacting public discourse and public policies. There are numerous examples of data-driven analyses having real and important impacts on people’s lives (Jolliffe et al. 2023). One example from Brazil explicitly looks at researchers’ ability to influence policy outcomes. There, evidence from 2,150 municipalities found that informing municipal mayors of research findings on the effectiveness of a simple policy change increased the probability that their municipality implemented the policy by 10 percentage points (Hjort et al. 2021).

Without research, there is a risk that the return of data to society will be reduced, and policies to improve lives unrealized. Yet very little is known about where there is missing data-driven evidence and how governments can best stimulate an evidence base for local decision makers. This paper attempts to fill these gaps by addressing two questions: (1) Which countries are the subject of research papers using data? (2) How can countries increase their national evidence base? We focus on data-driven research due to the increasing importance of data for policy making and the specific policies that are needed to increase the supply and demand of data, such as boosting statistical capacity and improving data literacy.

To answer t