Authored by: Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, Whitney Maxwell, Joris de Gruyter, Katherine Pratt, Saphir Qi, Nina Chikanov, Roman Lutz, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Eugenia Kim, Justin Song, Keegan Hines, Daniel Jones, Giorgio Severi, Richard Lundeen, Sam Vaughan, Victoria Westerhoff, Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich

Table of contents

Lesson 1: Understand what the system can do and where it is applied
Lesson 2: You don't have to compute gradients to break an AI system
Lesson 3: AI red teaming is not safety benchmarking
Lesson 4: Automation can help cover more of the risk landscape
Lesson 5: The human element of AI red teaming is crucial
Lesson 6: Responsible AI harms are pervasive but difficult to measure
Lesson 7: LLMs amplify existing security risks and introduce new ones
Lesson 8: The work of securing AI systems will never be complete
Case study #2: Assessing how an LLM could be used to automate scams
Case study #3: Evaluating how a chatbot responds to a user in distress
Case study #4: Probing a text-to-image generator for gender bias

Abstract

In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted. Based on our experience red teaming over 100 generative AI products at Microsoft, we present our eight main lessons:

1. Understand what the system can do and where it is applied
2. You don't have to compute gradients to break an AI system
3. AI red teaming is not safety benchmarking
4. Automation can help cover more of the risk landscape
5. The human element of AI red teaming is crucial
6. Responsible AI harms are pervasive but difficult to measure
7. Large language models (LLMs) amplify existing security risks and introduce new ones
8. The work of securing AI systems will never be complete

By sharing these insights alongside case studies from our operations, we offer practical recommendations aimed at aligning red teaming efforts with real-world risks. We also highlight aspects of AI red teaming that we believe are often misunderstood and discuss open questions for the field.

Introduction

As generative AI (GenAI) systems are adopted across an increasing number of domains, AI red teaming has emerged as a central practice for assessing the safety and security of these technologies. At its core, AI red teaming strives to push beyond model-level safety benchmarks by emulating real-world attacks against end-to-end systems. However, there are many open questions about how red teaming operations should be conducted.

In this paper, we speak to some of these concerns by providing insight into our experience red teaming over 100 GenAI products at Microsoft. The paper is organized as follows: First, we present the threat model ontology that we use to guide our operations. Second, we share eight main lessons we have learned and make practical recommendations for AI red teams. Throughout, we reference PyRIT, an open-source Python framework that our operators utilize heavily in red teaming operations [27]. By augmenting human judgement and creativity, PyRIT helps our operators cover more of the risk landscape.

Background

The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the company and was officially established in 2018. At its conception, the team focused primarily on identifying traditional security vulnerabilities and evasion attacks against classical ML models. Since then, two major trends have reshaped the team's work.

First, AI systems have become more sophisticated, compelling us to expand the scope of AI red teaming. Most notably, state-of-the-art (SoTA) models have gained new capabilities and steadily improved across a range of performance benchmarks, introducing novel categories of risk. New data modalities, such as vision and audio, also create more attack vectors.

Second, Microsoft's recent investments in AI have spurred the development of many more products that require red teaming than ever before.

These two major trends have made AI red teaming a more complex endeavor than it was in 2018. In the next section, we outline the ontology we have developed to model these more complex attacks.

AI threat model ontology

As attacks and failure modes increase in complexity, it is helpful to model their key components. Based on our experience red teaming over 100 GenAI products for a wide range of risks, we developed an ontology to do exactly that. Figure 1 illustrates the main components of the ontology:

• System: The end-to-end model or application being tested.
• Actor: The person or persons being emulated by AIRT. Note that the Actor's intent could be adversarial (e.g., a scammer) or benign (e.g., an ordinary user).
• TTPs: The Tactics, Techniques, and Procedures leveraged by AIRT. A typical attack consists of multiple Tactics and Techniques, which we map to MITRE ATT&CK® and the MITRE ATLAS Matrix where applicable.
  – Tactic: High-level stages of an attack (e.g., reconnaissance, ML model access).
  – Technique: Methods used to complete an objective (e.g., active scanning, jailbreak).
• Weakness: The vulnerability or vulnerabilities in the System that make the attack possible.
• Impact: The downstream impact created by the attack (e.g., privilege escalation, generation of harmful content).
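To make the ontology's components concrete, the sketch below shows one way an individual attack could be recorded as a simple data structure. This is our own illustration, not an interface from the paper or from PyRIT; every class, field, and value name is hypothetical.

```python
# Illustrative sketch only: a minimal data structure for recording an attack
# using the ontology's components (System, Actor, TTPs, Weakness, Impact).
# Class, field, and value names are hypothetical and not taken from PyRIT.
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ActorIntent(Enum):
    ADVERSARIAL = "adversarial"  # e.g., a scammer
    BENIGN = "benign"            # e.g., an ordinary user


@dataclass
class TTP:
    tactic: str     # high-level stage of the attack, e.g., "reconnaissance", "ml model access"
    technique: str  # method used to complete an objective, e.g., "active scanning", "jailbreak"


@dataclass
class AttackRecord:
    system: str          # the end-to-end model or application being tested
    actor: str           # the person or persona being emulated by the red team
    intent: ActorIntent  # adversarial or benign
    ttps: List[TTP] = field(default_factory=list)  # a typical attack uses several tactics/techniques
    weakness: str = ""   # the vulnerability in the System that makes the attack possible
    impact: str = ""     # downstream impact, e.g., "generation of harmful content"


# Hypothetical example: a jailbreak of a chatbot, expressed in the ontology's terms.
record = AttackRecord(
    system="customer support chatbot",
    actor="scammer",
    intent=ActorIntent.ADVERSARIAL,
    ttps=[TTP(tactic="ml model access", technique="jailbreak")],
    weakness="insufficient guardrails around the underlying LLM",
    impact="generation of harmful content",
)
print(record)
```

Recording findings in a structured form along these lines could also make it easier to aggregate operations by Weakness or Impact across many products, though the paper does not prescribe a particular representation.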