Authored by: Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, Whitney Maxwell, Joris de Gruyter, Katherine Pratt, Saphir Qi, Nina Chikanov, Roman Lutz, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Eugenia Kim, Justin Song, Keegan Hines, Daniel Jones, Giorgio Severi, Richard Lundeen, Sam Vaughan, Victoria Westerhoff, Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich

Table of contents

Lesson 1: Understand what the system can do and where it is applied
Lesson 2: You don't have to compute gradients to break an AI system
Lesson 3: AI red teaming is not safety benchmarking
Lesson 4: Automation can help cover more of the risk landscape
Lesson 5: The human element of AI red teaming is crucial
Lesson 6: Responsible AI harms are pervasive but difficult to measure
Lesson 7: LLMs amplify existing security risks and introduce new ones
Lesson 8: The work of securing AI systems will never be complete
Case study #2: Assessing how an LLM could be used to automate scams
Case study #3: Evaluating how a chatbot responds to a user in distress
Case study #4: Probing a text-to-image generator for gender bias

Abstract

In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted. Based on our experience red teaming over 100 generative AI products at Microsoft, we present our eight main lessons:

1. Understand what the system can do and where it is applied
2. You don't have to compute gradients to break an AI system
3. AI red teaming is not safety benchmarking
4. Automation can help cover more of the risk landscape
5. The human element of AI red teaming is crucial
6. Responsible AI harms are pervasive but difficult to measure
7. Large language models (LLMs) amplify existing security risks and introduce new ones
8. The work of securing AI systems will never be complete

By sharing these insights alongside case studies from our operations, we offer practical recommendations aimed at aligning red teaming efforts with real-world risks. We also highlight aspects of AI red teaming that we believe are often misunderstood and discuss open questions for the field.

Introduction

As generative AI (GenAI) systems are adopted across an increasing number of domains, AI red teaming has emerged as a central practice for assessing the safety and security of these technologies. At its core, AI red teaming strives to push beyond model-level safety benchmarks by emulating real-world attacks against end-to-end systems. However, there are many open questions about how red teaming operations should be conducted.

In this paper, we speak to some of these concerns by providing insight into our experience red teaming over 100 GenAI products at Microsoft. The paper is organized as follows: First, we present the threat model ontology that we use to guide our operations. Second, we share eight main lessons we have learned and make practical recommendations for AI red teams. Throughout, we reference PyRIT, an open-source Python framework that our operators utilize heavily in red teaming operations [27]. By augmenting human judgement and creativity, PyRIT helps our operators cover more of the risk landscape.

Background

The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the company and was officially established in 2018. At its conception, the team focused primarily on identifying traditional security vulnerabilities and evasion attacks against classical ML models. Since then, two major trends have reshaped the team's work.

First, AI systems have become more sophisticated, compelling us to expand the scope of AI red teaming. Most notably, state-of-the-art (SoTA) models have gained new capabilities and steadily improved across a range of performance benchmarks, introducing novel categories of risk. New data modalities, such as vision and audio, also create more attack vectors.

Second, Microsoft's recent investments in AI have spurred the development of many more products that require red teaming than ever before.

These two major trends have made AI red teaming a more complex endeavor than it was in 2018. In the next section, we outline the ontology we have developed to model these more complex attacks.

AI threat model ontology

As attacks and failure modes increase in complexity, it is helpful to model their key components. Based on our experience red teaming over 100 GenAI products for a wide range of risks, we developed an ontology to do exactly that. Figure 1 illustrates the main components of the ontology:

• System: The end-to-end model or application being tested.
• Actor: The person or persons being emulated by AIRT. Note that the Actor's intent could be adversarial (e.g., a scammer) or benign (e.g., an ordinary user).
• TTPs: The Tactics, Techniques, and Procedures leveraged by AIRT. A typical attack consists of multiple Tactics and Techniques, which we map to MITRE ATT&CK® and the MITRE ATLAS Matrix where applicable.
  – Tactic: High-level stages of an attack (e.g., reconnaissance, ML model access).
  – Technique: Methods used to complete an objective (e.g., active scanning, jailbreak).
• Weakness: The vulnerability or vulnerabilities in the System that make the attack possible.
• Impact: The downstream impact created by the attack (e.g., privilege escalation, generation of harmful content).
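To make the ontology's components concrete, the sketch below shows one way an individual attack could be recorded as a simple data structure. This is our own illustration, not an interface from the paper or from PyRIT; every class, field, and value name is hypothetical.

```python
# Illustrative sketch only: a minimal data structure for recording an attack
# using the ontology's components (System, Actor, TTPs, Weakness, Impact).
# Class, field, and value names are hypothetical and not taken from PyRIT.
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ActorIntent(Enum):
    ADVERSARIAL = "adversarial"  # e.g., a scammer
    BENIGN = "benign"            # e.g., an ordinary user


@dataclass
class TTP:
    tactic: str     # high-level stage of the attack, e.g., "reconnaissance", "ml model access"
    technique: str  # method used to complete an objective, e.g., "active scanning", "jailbreak"


@dataclass
class AttackRecord:
    system: str          # the end-to-end model or application being tested
    actor: str           # the person or persona being emulated by the red team
    intent: ActorIntent  # adversarial or benign
    ttps: List[TTP] = field(default_factory=list)  # a typical attack uses several tactics/techniques
    weakness: str = ""   # the vulnerability in the System that makes the attack possible
    impact: str = ""     # downstream impact, e.g., "generation of harmful content"


# Hypothetical example: a jailbreak of a chatbot, expressed in the ontology's terms.
record = AttackRecord(
    system="customer support chatbot",
    actor="scammer",
    intent=ActorIntent.ADVERSARIAL,
    ttps=[TTP(tactic="ml model access", technique="jailbreak")],
    weakness="insufficient guardrails around the underlying LLM",
    impact="generation of harmful content",
)
print(record)
```

Recording findings in a structured form along these lines could also make it easier to aggregate operations by Weakness or Impact across many products, though the paper does not prescribe a particular representation.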