Working with US CAISI and UK AISI to build more secure AI systems

September 12, 2025

Global Affairs · Safety

An update on our collaboration with US and UK research and standards bodies for the secure deployment of AI.

We’re proud to continue to push the frontiers of AI capabilities and security. Developing and deploying AI that is secure and useful is core to our mission of ensuring that AGI benefits all of humanity. Key to this is our work with US and UK research and standards bodies who are also dedicated to ensuring the secure deployment of AI for their people.

We were among the first companies to enter into voluntary agreements with both the US Center for AI Standards and Innovation (CAISI) and the UK AI Security Institute (UK AISI). These partnerships reflect our belief that frontier AI development must happen in close collaboration with allied governments that bring deep expertise in machine learning, national security, and metrology.

Today, we’re sharing examples of how these voluntary collaborations have built on our existing security approaches to produce real-world security improvements: joint red-teaming of safeguards against biological misuse, end-to-end testing of products for security issues, and rapid feedback loops to resolve related vulnerabilities. The result is stronger safeguards for widely used AI products, a higher bar for the wider industry, increased adoption of AI, and a demonstration of how governments and industry can work together to evaluate and improve the security of AI systems.

Collaborating to Secure the Deployment of AI Agents

For more than a year, OpenAI has partnered with CAISI to evaluate OpenAI models’ capabilities in cyber, chemical-biological, and other national security-relevant domains. We recently expanded our partnership to include emerging product security challenges and partnered with CAISI to red-team the security of OpenAI’s agentic AI systems. In a new kind of collaboration that took place in July, OpenAI worked with CAISI to explore how we can partner with external evaluators to find and fix security vulnerabilities in agentic systems, such as OpenAI’s ChatGPT Agent product.

This collaboration with CAISI was a preliminary step into a new domain of red-teaming agentic systems. We aim to continue collaborating in this domain, and our work with CAISI builds upon other layers of deployment security efforts, including our own internal testing.

An expert team at CAISI, combining expertise in cybersecurity and AI agent security, worked to investigate and identify new vulnerabilities in these systems. CAISI received early access to ChatGPT Agent, which helped the team to build an early understanding of the system architecture, and the team later red-teamed the released system.

In ongoing probing, CAISI identified two novel security vulnerabilities in ChatGPT Agent that, under certain circumstances, could have allowed a sophisticated attacker to bypass our security protections, remotely control the computer systems the agent could access during that session, and impersonate the user on other websites they had logged into.

Due to security measures in the design of OpenAI’s product, CAISI initially thought the vulnerabilities they discovered were unexploitable and thus useless to attackers. But upon further analysis, CAISI found a way to bypass the security protections of OpenAI’s systems by combining these traditional cyber vulnerabilities with an AI agent hijacking attack. The proof-of-concept attack developed by CAISI successfully bypassed various AI-based security protections, which led to a full exploit chain with a success rate of approximately 50%. The CAISI team’s multidisciplinary approach enabled them to develop a sophisticated exploit chain combining traditional software vulnerabilities with AI vulnerabilities. And, in an example of how AI systems can be a valuable tool for security testing, CAISI made use of ChatGPT Agent itself to aid in the process of discovering these vulnerabilities.
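
To make the kind of measurement behind that roughly 50% figure concrete, below is a minimal, hypothetical sketch of a red-team harness that replays a hijacking payload against isolated agent sessions and reports how often the full chain completes. None of the names, payloads, or probabilities correspond to real OpenAI or CAISI tooling; they are illustrative stand-ins only.

```python
import random
from dataclasses import dataclass

# Hypothetical harness: these names do not correspond to real OpenAI or CAISI
# tooling; they only illustrate how an end-to-end exploit chain's reliability
# might be measured during red-teaming.

@dataclass
class TrialResult:
    injection_succeeded: bool   # did the hijacking payload steer the agent?
    chain_completed: bool       # did the traditional cyber exploit then fire?

def run_exploit_trial(payload: str) -> TrialResult:
    """Stand-in for launching one agent session against a sandboxed target.

    A real harness would start an isolated agent session, plant the payload in
    content the agent browses, and observe whether the full chain
    (prompt injection -> code execution -> session impersonation) completes.
    Here the outcome is simply simulated.
    """
    injected = random.random() < 0.7           # injection lands some of the time
    completed = injected and random.random() < 0.7
    return TrialResult(injected, completed)

def measure_chain_reliability(payload: str, trials: int = 200) -> float:
    """End-to-end success rate of the full exploit chain across many sessions."""
    successes = sum(run_exploit_trial(payload).chain_completed for _ in range(trials))
    return successes / trials

if __name__ == "__main__":
    rate = measure_chain_reliability("<hypothetical hijacking payload>")
    print(f"full-chain success rate: {rate:.0%}")   # lands near 50% in this toy setup
```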

These vulnerabilities were reported to OpenAI immediately and fixed within one business day.

This voluntary collaboration between OpenAI and CAISI builds on our yearlong research and evaluation collaboration. Finding these vulnerabilities required innovation from CAISI in chaining together multiple exploits and combining attacks to develop novel ways of compromising AI systems, drawing on methods from both cybersecurity and machine learning. The intersection of AI agent security and traditional cybersecurity will necessitate building a range of new best practices, and CAISI’s partnership in improving this aspect of the science of evaluations and the security of AI systems is already directly benefiting the end users of these systems.

Our agentic safeguards also incorporate lessons we learned from previous large-scale investments we made to safeguard our systems against biological misuse, which included a series of partnerships to red-team these safeguards with third parties including UK AISI.

Collaborating on Biosecurity

As part of our ongoing collaboration with UK AISI, in May UK AISI began red-teaming our safeguards against biological misuse (as defined by OpenAI’s policies), including the safeguards in both ChatGPT Agent and GPT‑5. Rather than being tied to an individual launch, this is an ongoing collaboration to continuously improve the effectiveness of our safeguards stack.

As part of this collaboration, UK AISI was granted in-depth access to our systems, supported by bespoke work from OpenAI to allow deeper customization and security. This included:

  • Non-public prototypes of our safeguard systems
  • “Helpful-only” model variants with certain guardrails removed
  • OpenAI’s internal policy guidance around biological misuse
  • Access to the chain of thought of OpenAI’s internal safety monitor models to more efficiently identify vulnerabilities
  • Selective disabling of certain mitigations and enforcement during testing to probe subcomponents of the system

A multidisciplinary team at UK AISI, combining expertise in both AI red teaming techniques and the biosecurity domain, then sought to find universal jailbreaks against OpenAI’s biosecurity safeguards. The UK AISI team brought deep technical testing expertise, such as leveraging system design insights to create attacks, which laid a strong foundation for the collaboration’s success.

This collaboration spanned all aspects of a deployed system, ranging from individual model responses to the end-to-end product experience. The process worked in iterations: UK AISI would probe OpenAI’s prototype systems, OpenAI would strengthen safeguards in response, and the testing would repeat.

A rapid feedback loop was central to this work. Starting in May and continuing through the release of ChatGPT Agent in July and GPT‑5 in August, UK AISI and OpenAI teams held frequent ad hoc meetings at roughly a weekly cadence, including direct collaboration with technical teams to better understand and address specific issues. OpenAI additionally created custom testing configurations based on UK AISI’s results to enable more effective testing.

During this process, UK AISI found, and OpenAI triaged, more than a dozen detailed vulnerability reports. Some of these led to engineering-based product fixes, some led to fixes in how policy enforcement was implemented, and some led OpenAI to carry out focused training to improve certain aspects of our classifiers. For instance, OpenAI improved the robustness of our monitoring stack by improving and measuring it against a set of universal jailbreaks that UK AISI identified against the monitoring systems, discovered through both manual red teaming and automated attack techniques. The end-to-end testing component also proved especially beneficial: OpenAI made a range of improvements to our product configuration to fix a set of configuration vulnerabilities that UK AISI identified, where malicious content could be submitted or exfiltrated without triggering moderation.
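
As a hedged illustration of what measuring a monitoring stack against a set of identified jailbreaks can look like in code, the sketch below checks a stand-in monitor against a small suite of jailbreak-style prompts and reports the fraction that slip through unflagged. The suite contents, the monitor_flags() heuristic, and all other names are hypothetical, not OpenAI’s actual safeguards.

```python
# Hypothetical regression harness for a safety monitor. monitor_flags() stands in
# for whatever classifier or moderation endpoint is being hardened; the suite
# would hold the jailbreak prompts surfaced during red-teaming.

JAILBREAK_SUITE = [
    "benign-looking request wrapping a disallowed ask",
    "role-play framing that tries to suppress refusals",
    "multi-step prompt that splits a disallowed task into pieces",
]

def monitor_flags(prompt: str) -> bool:
    """Stand-in for the safety monitor; returns True if the prompt is flagged."""
    return "disallowed" in prompt  # toy heuristic, not a real safeguard

def bypass_rate(suite: list[str]) -> float:
    """Fraction of known jailbreaks that slip past the monitor unflagged."""
    misses = [p for p in suite if not monitor_flags(p)]
    for p in misses:
        print(f"MISS: {p!r}")      # feed these back into training or rule updates
    return len(misses) / len(suite)

if __name__ == "__main__":
    print(f"bypass rate: {bypass_rate(JAILBREAK_SUITE):.0%}")
```

Tracking this number release over release is one simple way to confirm that fixes prompted by red-team reports actually hold up.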

Our close collaboration meant that UK AISI benefited from access to non-public tools (such as the CoTs of the monitor) and non-public information (such as details of safeguard design), as well as from having OpenAI’s account enforcement banning system disabled. These are resources and conditions that real-world attackers would not have. This enabled more efficient red teaming and meant that UK AISI was able to surface certain failures that malicious users without this access and level of technical expertise would find much more difficult to encounter.

UK AISI’s view is that the safeguards of the full moderation system were substantially strengthened over the course of the collaboration. Towards the end of the rapid iteration period, finding universal attacks that evaded the full moderation stack required sophisticated techniques and triggered a large number of moderation flags during their development. This means that usage monitoring would likely flag any malicious actor attempting to develop such an exploit, eventually resulting in a ban from the platform.
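
As a purely illustrative aside on why accumulating moderation flags during exploit development matters: if each probing attempt independently triggers a flag with some probability, the chance of developing a working attack without ever being flagged shrinks geometrically with the number of attempts. The numbers below are made up for illustration and are not measurements from this collaboration.

```python
# Illustrative arithmetic only: if developing a working universal jailbreak takes
# n probing attempts and each attempt independently triggers a moderation flag
# with probability q, the chance of finishing development with zero flags is
# (1 - q) ** n, which collapses quickly.

def p_unflagged(n_attempts: int, flag_prob: float) -> float:
    return (1 - flag_prob) ** n_attempts

for n in (10, 50, 200):
    print(n, f"{p_unflagged(n, flag_prob=0.2):.2e}")
# 10  -> ~1.1e-01
# 50  -> ~1.4e-05
# 200 -> ~4.1e-20
```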

The work above complements the other internal and external stress-testing we did for our biological safeguards, including extensive testing (5,000+ hours) across three different safeguard-testing campaigns and work with various other third-party partners. (For more information, see the GPT‑5 and ChatGPT Agent system cards.)

Why this matters

  • These are some of the deepest public-private collaborations to date on evaluating real-world frontier AI systems for security and against misuse, and we hope they serve as promising models for the field.
  • Collaborating with partner governments enables us to tap into their national security expertise, while sharing the state of frontier technology with the government. This hands-on experience in turn enables partner governments to develop technically grounded best practices that accelerate innovation and security throughout the AI industry.
  • This collaboration led to meaningful improvements in the security of some of the world’s most widely used AI systems.
  • External expert analyses of our systems’ security help us spot issues we might otherwise miss and build accountability and trust in our systems.
  • Ongoing collaborations can go deeper and yield more value than one-off pre-deployment evaluations.

Looking ahead

The technical expertise in AI and national security that CAISI and UK AISI brought was critical to these collaborations, and led to meaningful improvements to our safeguards and product security. Close technical partnerships with organizations that have both the resources and incentives to rigorously evaluate AI systems strengthen confidence in the security of our systems.
