
Forecasting Scientific Progress with Artificial Intelligence

2026-05-21 | University of Oxford, Stanford University, Allen Institute for AI, Sakana AI

Sean Wu1,∗, Pan Lu2,∗, Yupeng Chen1, Jonathan Bragg3, Yutaro Yamada4, Peter Clark3, David Clifton1, Philip Torr1,†, James Zou2,†, Junchi Yu1,†

1 University of Oxford, 2 Stanford University, 3 Allen Institute for AI, 4 Sakana AI

arXiv:2605.22681v1 [cs.AI] 21 May 2026

Abstract

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary and event-level benchmark that evaluates scientific forecasting performance in AI systems through feasibility assessment, mechanistic reasoning, generative solution design, and temporal prediction. Across 4,760 scientific events, we observe systematic and domain-dependent limitations in current frontier models. While models can identify plausible research directions from competing candidates, they fail to reliably predict whether scientific advances will be realized and systematically misestimate when they will occur. Model performance is highly heterogeneous across domains, with the timing of AI progress being more predictable than advances in biology, chemistry, and physics. Performance is largely insensitive to whether [...]

1 Introduction

Scientific progress is often assumed to follow structured patterns [1, 2], with empirical regularities such as Moore’s Law [3] in semiconductors and scaling relationships [4] in deep learning providing quantitative expectations about future developments. These patterns emerge from accumulated scientific progress [5] and have long informed research roadmaps, funding priorities, and techno- [...] [8, 9, 10, 11, 12], a question arises: can AI systems forecast the trajectory of scientific progress?
Recent advances in large language models suggest that AI systems can act as general-purpose scientific assistants and support tasks ranging from hypothesis generation to experiment design [13, 14]. A growing body of work has evaluated their capabilities in scientific reasoning [15, 16], problem-solving [17, 18], and impact prediction [19] across scientific domains. While these studies demonstrate broad proficiency, they do not evaluate whether AI systems can reliably forecast scientific progress under temporal knowledge constraints. Evaluating such capabilities is [...]

To address this gap, we introduce CUSP (Cutoff-conditioned Unseen Scientific Progress), an event-level, multi-disciplinary, and temporally grounded framework for evaluating scientific forecasting in AI systems. CUSP is constructed from 4,760 verifiable scientific milestones extracted from top-tier publications and community-driven repositories across multiple disciplines. Each event is associated with a precise temporal reference to enable controlled access to prior knowledge. Crucially, CUSP operationalizes scientific forecasting as a measurable capability across four com- [...]

We use CUSP to evaluate frontier models under controlled temporal constraints and find a consistent pattern of limitations. While models can identify plausible technical approaches from competing candidates, they struggle to generate solutions that align with the methods underlying realized scientific advances. In feasibility assessment and temporal prediction, models perform near chance in predicting whether scientific advances will be realized and exhibit a strong bias toward delayed outcomes when estimating when such advances will occur. Moreover, models are systematically overconfident and display strong response biases in feasibility assessment, indicat- [...]

To further understand these limitations, we analyze model performance across pre- and post-cutoff events under controlled information access.
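The pre- and post-cutoff comparison, together with the overconfidence and response-bias observations above, can be illustrated with a minimal Python sketch. This is not the CUSP implementation: the `Forecast` schema, its field names, the cutoff date, and the toy records are all assumptions made for illustration. The sketch partitions forecasts by whether the event date falls on or before a model's knowledge cutoff, then reports accuracy, an overconfidence gap (mean stated confidence minus accuracy), and the share of "yes" answers for each split.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Forecast:
    """One feasibility forecast for one event (hypothetical schema)."""
    event_date: date    # temporal reference of the milestone
    realized: bool      # ground truth: was the advance realized?
    predicted: bool     # model's yes/no feasibility answer
    confidence: float   # model's stated confidence in its answer, in [0, 1]


def split_by_cutoff(forecasts, cutoff):
    """Partition forecasts into (pre-cutoff, post-cutoff) lists."""
    pre = [f for f in forecasts if f.event_date <= cutoff]
    post = [f for f in forecasts if f.event_date > cutoff]
    return pre, post


def summarize(forecasts):
    """Return accuracy, mean confidence, overconfidence gap, and yes-rate."""
    if not forecasts:
        raise ValueError("no forecasts to summarize")
    n = len(forecasts)
    accuracy = sum(f.predicted == f.realized for f in forecasts) / n
    mean_confidence = sum(f.confidence for f in forecasts) / n
    return {
        "accuracy": accuracy,
        "mean_confidence": mean_confidence,
        # Positive values mean the model claims more certainty than it earns.
        "overconfidence": mean_confidence - accuracy,
        # A yes-rate far from the base rate signals a response bias.
        "yes_rate": sum(f.predicted for f in forecasts) / n,
    }


# Toy records and cutoff, invented purely for illustration.
cutoff = date(2025, 1, 1)
forecasts = [
    Forecast(date(2024, 3, 1), True, True, 0.9),
    Forecast(date(2024, 9, 1), False, False, 0.8),
    Forecast(date(2025, 6, 1), False, True, 0.9),
    Forecast(date(2025, 11, 1), True, True, 0.9),
]

if __name__ == "__main__":
    pre, post = split_by_cutoff(forecasts, cutoff)
    for name, group in (("pre-cutoff", pre), ("post-cutoff", post)):
        print(name, summarize(group))
```

In this toy data the post-cutoff split has lower accuracy, a larger overconfidence gap, and a "yes" answer on every event, which is the qualitative pattern the text describes. The numbers themselves carry no empirical meaning.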
Providing additional pre-cutoff knowledge improves performance on both pre-cutoff and post-cutoff events, indicating a knowledge gap in how models access and utilize available information. However, a substantial forecasting gap remains, as models perform significantly worse on post-cutoff events than in full-information settings with post-event [...]

Taken together, these results indicate that while current AI systems can identify plausible scientific approaches and benefit from additional knowledge, they lack grounded and well-calibrated scientific forecasting. They fail to accurately predict whether scientific advances will be realized and when they will occur, with these errors becoming more pronounced for high-impact discoveries.

2 The CUSP Benchmark

We develop CUSP using a temporally stratified corpus of scientific milestones, spanning January 2024 to March 2026, to evaluate scientific forecasting in current AI systems under controlled temporal knowledge constraints. CUSP is designed to rigorously evaluate predictive performance and calibrated expectation on scientific development across a broad spectrum of scientific dis- [...]

We source natural science milestones from Nature