
Lifelong Agents Built on Small Language Models

2026-04-26 - - carry~强

ICLR 2026, Lifelong Agents Workshop
Siva Reddy
McGill · Mila · ServiceNow Research

Lifelong agents are becoming universal

OpenClaw: a personal AI assistant by Peter Steinberger and the open-source community

- Runs on your machine; talks through WhatsApp, Slack, Telegram
- Persistent memory across conversations: preferences, facts, decisions
- Takes actions: controls your browser, runs scripts, sets reminders
- Model-agnostic: Claude, GPT, or a local model

Most deployments today still call a frontier API.

Where would a lifelong agent live?

A frontier API is the wrong substrate:

- Cost: billions of [tasks × users × interactions] makes per-call pricing untenable
- Latency / privacy: the agent has to be near the user
- Personalization: a single hosted model cannot be many users at once
A lifelong agent must run on the user's device: a small language model is the only realistic deployment target.

Three problems

- Per domain: Can the small model do the job at all? A3 (agentic distillation)
- Per user: Does it know me? AdaptArena (test-time personalization)
- Per interaction: How does it remember and retrieve memories? LLM2Vec-Gen (output-space embeddings)

A lifelong agent adapts at three granularities: domain → user → interaction.

1. Specialization
Per domain: A3
Structured Distillation of Web Agent Capabilities Enables Generalization

Specialization: the gap

Small open-weight agents trail frontier by 20+ pp on web tasks.

- Qwen 3.5 9B on WebArena: ~31%; Gemini-3-Pro: ~51%
- Standard SFT distillation overfits to training tasks
- Can we transfer frontier capability into a 9B model while enabling generalization across web environments?
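As a quick check of the gap figures quoted above, the sketch below computes the absolute gap in percentage points (pp). As an extra illustration not stated on the slide, it also computes the student's success rate relative to the frontier model. The function and constant names are hypothetical, and the rates are the approximate WebArena values from the slide.

```python
def gap_in_points(frontier_pct: float, student_pct: float) -> float:
    """Absolute gap between two success rates, in percentage points (pp)."""
    return frontier_pct - student_pct


def relative_success(frontier_pct: float, student_pct: float) -> float:
    """Student success rate as a fraction of the frontier success rate."""
    return student_pct / frontier_pct


# Approximate WebArena success rates quoted on the slide.
FRONTIER_PCT = 51.0  # Gemini-3-Pro, ~51%
STUDENT_PCT = 31.0   # Qwen 3.5 9B, ~31%

pp_gap = gap_in_points(FRONTIER_PCT, STUDENT_PCT)
ratio = relative_success(FRONTIER_PCT, STUDENT_PCT)

# A 20 pp absolute gap means the student solves roughly 61% as many tasks.
print(f"gap: {pp_gap:.0f} pp; student reaches {ratio:.0%} of frontier success")
```

The two numbers answer different questions: the pp gap is what the slide reports, while the ratio shows how much of the frontier model's success the small model already recovers before any distillation.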
Tension: more demonstrations help WebArena but hurt out-of-distribution transfer (e.g., WorkArena, VisualWebArena, MiniWoB).

A3: Agent-as-Annotators

A3 replaces three human annotation roles with LLM modules:

| Human role | LLM module (Gemini-3-Pro) | Outputs |
|---|---|---|
| Task Designer | Persona + Task Generator | Persona, task intent, evaluation hints |
| Annotator | Agent | Trajectory + reasoning trace |
| Supervisor | Judge | Pass/fail using the hints |

The student (Qwen3.5-9B) is fine-tuned on judge-filtered trajectories with reasoning intact: 2,322 successful out of 3,000 attempts, 6 web environments.

Example: one annotated rollout

Persona (Task Designer): Maya, e-commerce admin who clears pending orders first thing every