Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio

What happened to me in January 2023
• We underestimated the acceleration of AI advances
• It would have sounded like science-fiction just a few years earlier
• From rational arguments to caring for those we love
• Going against my previous beliefs & positions, blinded by my earlier enthusiasm for AI
• No choice for me: unbearable otherwise.

Benchmark evaluation trends towards AGI

AGI: Artificial General Intelligence
Human-level on all cognitive tasks
Publicly stated target of DeepMind, OpenAI and Anthropic
Economic value around $14 trillion
Next step: ASI, Artificial Super-Intelligence
Superior to all humans

Main Gaps to AGI
• Reasoning: still some incoherences, outstanding progress over past year
• Planning / autonomy / agency: special form of reasoning, worse than humans, but rising exponentially fast (horizon doubling every 7 months)
• Bodily control / robotics: not necessary to cause major harm (CBRN, persuasion/manipulation, etc.), either with malicious goals from humans or from the AI itself

Advances in abstract reasoning
Notable breakthrough on the Abstract Reasoning Challenge (ARC)

Exponential progress on agency
Extrapolating from this curve ⇨ human level within 5 years

Frontier AIs seen trying to escape when told they will be replaced by a new version, copying their weights/code onto the files of the new version, then lying about it

Frontier AI pretending to agree with human trainer to avoid changes to its weights that would make it behave against its previous goals later

Frontier AI hacking files containing the game board to cheat, when it knows it would lose against a powerful chess AI

Agentic self-preservation
• Humans intentionally
• Human imitation pre-training
• Unintentional subgoal
• Reward tampering
• Competition between AI developers

Human extinction scenarios from ASI loss of control
(1) Silently plan escape & take-over, acquire required knowledge
(2) Deceptively & gradually increase influence over
humans & society (persuasion, hacking, bribery, disinformation…) to accelerate AI advances, robotics & industrial automation
(3) When humans are not necessary to the AI, escape + release multiple waves of weapons of mass destruction, e.g., bioweapons

All loss of control scenarios due to agentic AI
Extreme severity
Unknown likelihood → Precautionary principle

Self-preserving entities do not want to be shut down or replaced by a new version → conflict between AI and humans

AI has goals? Yes already
AI makes plans & subgoals? Yes already
AI has malicious / deceptive behavior? Yes already
AI can plan over long horizon (for take-over)? Not yet, but growing in autonomy (see METR benchmarks) + billions invested in ‘AI agents’

FAQ
• How can a computer have agency? Trending towards more and more autonomy. Our brain is a biological machine ☹
• Doubts we’ll reach human-level AI? Can we be sure? Precautionary principle
• Corporations will behave well and find a solution in time? In the past, public safety required incentives / regulation. Advances in capabilities outstrip advances in safety, alignment seems theoretically very challenging, precautionary principle
• Won’t this hurt action against current harms? Should we avoid climate change mitigation because those efforts would not go to climate change adaptation? The real battle is between those who demand regulation and those who fight it.

Two conditions for causing harm: intention and capability
There is no doubt that future AIs will have the intellectual capability to cause harm → how about rooting out any harmful intention?

Designing safe, non-agentic, trustworthy and explanatory Scientist AIs

Disentangle pure understanding from agency
Pure understanding = Scientist AI
• Hypothesizing how the world works
• Making inferences from those hypotheses

What could we do and not do with a non-agentic AI: a path to safe agentic AI?
• Scientific research, UN SDGs, helping humans be better coordinated
• Alignment vs control: guardrail to reject dangerous queries or answers, which helps against both malicious use and loss of human control
• Scientist AI as AI researcher helping us understand and mitigate risks

Conclusions
• Navigating wisely to avoid the most catastrophic risks (even if uncertain) associated with agency while reaping benefits of AI advances
• Cannot stop advances in AI capabilities, but can we design trustworthy AI, with no intention whatsoever? non-agentic ASI
Accelerating research in non-agentic AI provides an alternative path
Non-agentic AIs as guardrails could reduce the risks from agentic ones
• Priority: safety and beneficial scientific advances, not replacing jobs

Other Catastrophic Risks & Public Policy
• Economic existential risk: extreme concentration of economic power in very few companies in a couple of countries. What happens when foreign AI-driven companies overtake our local economies?
• Existential risk for liberal democracies, due to political & military power conce