Lytle Lecture 2023-2024
M-51, HSO

Objective-Driven AI
Towards AI systems that can learn, remember, reason, plan, have common sense, yet are steerable and safe

Yann LeCun
New York University
Meta – Fundamental AI Research

University of Washington
Lytle Lecture 2024-01-24

Machine Learning sucks! (compared to humans and animals)
  Supervised learning (SL) requires large numbers of labeled samples.
  Reinforcement learning (RL) requires insane amounts of trials.
  Self-Supervised Learning (SSL) works great but...
    Generative prediction only works for text and other discrete modalities
  Animals and humans:
    Can learn new tasks very quickly.
    Understand how the world works
    Can reason and plan
  Humans and animals have common sense
  Their behavior is driven by objectives (drives)

We Need Human-Level AI for Intelligent Assistants
  Smart glasses
    Communicate through voice, vision, display, electro-myogram interfaces (EMG)
  Intelligent Assistant
    Can answer all of our questions
    Helps us in our daily lives
    Knows our preferences and interests
  “Her” (2013)
  For this, we need machines with common sense
    Machines that understand how the world works
    Machines that can remember, reason, plan.

Future AI Assistants need Human-Level AI
  AI assistants will require (super-)human-level intelligence
    Like having a staff of smart “people” working for us
  But, we are nowhere near human-level AI today
    Any 17-year-old can learn to drive in 20 hours of training
    Any 10-year-old can learn to clear the dinner table in one shot
    Any house cat can plan complex actions
  What are we missing?
    Learning how the world works (not just from text)
    World models. Common sense
    Memory, Reasoning, Hierarchical Planning

Desiderata for AMI (Advanced Machine Intelligence)
  Systems that learn world models from sensory inputs
    E.g. learn intuitive physics from video
  Systems that have persistent memory
    Large-scale associative memories
  Systems that can plan actions
    So as to fulfill an objective
  Systems that are controllable & safe
    By design, not by fine-tuning.

Objective-Driven AI Architecture

Self-Supervised Learning has taken over the world
  For understanding and generating text, images, video, 3D models, speech, proteins, ...
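
The idea behind the denoising methods on the next slide can be made concrete with a small sketch of the corruption step: hide some words and keep them as the targets a model would be trained to reconstruct. This is only an illustration under stated assumptions. The helper name `corrupt`, the whitespace tokenization, and the 30% masking rate are hypothetical choices, not taken from the slides, and systems such as BERT work on subword tokens with their own masking scheme. The model and training loop are deliberately left out.

```python
import random

MASK = "[...]"  # mask marker, mirroring the placeholder shown on the slide


def corrupt(tokens, mask_prob=0.3, seed=0):
    """Mask each token independently with probability mask_prob.

    Returns (corrupted, targets): corrupted is the token list with some
    entries replaced by MASK; targets maps each masked position to the
    hidden token that a model would be trained to predict.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


sentence = "This is a piece of text extracted from a large set of news articles"
tokens = sentence.split()  # assumption: naive whitespace tokenization
corrupted, targets = corrupt(tokens)

print(" ".join(corrupted))  # the corrupted input the model would see
print(targets)              # positions and words it must reconstruct
```

No labels are needed here: the supervision signal (`targets`) is produced from the raw text itself, which is what makes the setup self-supervised.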

Self-Supervised Learning via Denoising / Reconstruction
  Denoising Auto-Encoder [Vincent 2008], BERT [Devlin 2018], RoBERTa [Ott 2019]
  Figure labels: learned representation; corruption / masking
  Corrupted (masked) text: This is a [...] of text extracted [...] a large set of [...] articles
  Original text: This is a piece of text extracted from a large set of news articles

No Language Left Behind (NLLB)
  Language translation between 202 languages
    in any of the 40602 directions
  Training set: 18 billion pairs of sentences for 2440 language directions
    Most pairs have less than 1 million sentences
  https://ai.facebook.com/research/no-language-left-behind/
  A single neural net with 54 billion parameters
  Performance gets better as more languages are added
  Relies on Self-Supervised Learning and back-translation.

SeamlessM4T
  Speech or text input: 100 languages
  Text output: 100 languages
  Speech output: 35 languages
  SeamlessExpressive: real-time, preserves voice & expression
  https://ai.meta.com/blog/seamless-m4t/

Deep Learning Connects People to knowledge & to each other
  Meta (FB, Instagram), Google, YouTube, Amazon are built around Deep Learning
    Take Deep Learning out of them, and they crumble.
  DL helps us deal with the information deluge
    Search, retrieval, ranking, question-answering
    Requires machines to understand content
  Translation / transcription / accessibility
    language ↔ language; text ↔ speech; image → text
    People speak thousands of different languages
    3 billion people can’t use technology today.
    800 million are illiterate, 300 million are visually impaired

On-Line Content Moderation
  Filtering out illegal and dangerous content
  What constitutes acceptable content?
    Meta doesn’t see itself as having the legitimacy to decide
    But in the absence of regulations, it has to do it.
  Types of objectionable content on Facebook (with % taken down preemptively & prevalence, Q1 2022)
    Hate Speech (95.6%, 0.02%)
    Violence incitement (98.1%, 0.03%)
    Violence (99.5%, 0.04%)
    Bullying/Harassment (67%, 0.09%)
    Child endangerment (96.4%)
    Suicide/Self-Injury (98.8%)
    Nudity (96.7%, 0.04%)
    Terrorism (16M pieces)
    Fake accounts (1.5B)
    Spam (1.8B)
  https://transparency.fb.com/data/community-standards-enforcement
  AI is the solution, not the problem

Hate speech suppression / down-ranking on Facebook
  Of the violating content we actioned for hate speech, how much did we find and action before people reported it?
  https://transparency.fb.com/reports/community-standards-enforcement/hate-speech/facebook/
  Chart labels: 95.6%, 23.6%

Protein folding and inverse folding (protein design)
  Protein Folding: from a sequence of amino acids to 3D structure
    ESMfold, ESMfold-2 (FAIR)
    AlphaFold, AlphaFold-2 (DeepMind)
    [Jumper 21, Rives 19]
  Protein Generation [Lin et al. 2021]
  Protein Design: from 3D structure to sequences of amino acids
    For drug design
    [Lin & al. BioRxiv:2022.07.20.500902]

ESM Metagenomic Atlas (FAIR+NYU)
  615 million proteins with predicted 3D structure
  Interactive website: https://esmatlas.com/
  Paper: [Lin et al. 2022] Evolutionary-scale prediction of atomic level protein structure with a language model
    https://www.biorxiv.org/content/10.1101/2022.07.20.500902
  Code: https://github.com/facebookresearch/esm

Generative AI and Auto-Regressive Large Language Models

Auto-Regressive Generative Architectures
  Outputs one “to