Toward understanding and preventing misalignment generalization

June 18, 2025 Publication A misaligned persona feature controls emergent misalignment. About this project: Large language models like ChatGPT don’t just learn facts; they pick up on patterns of behavior. That means they can start to act like different “personas,” or types of people, based on the […]

Preparing for future AI risks in biology

June 18, 2025 Safety As our models grow more capable in biology, we’re layering in safeguards and partnering with global experts, including hosting a biodefense summit this July. Advanced AI models have the power to rapidly accelerate scientific discovery, one of the many ways frontier AI models will benefit humanity. In biology, these […]

Disrupting malicious uses of AI: June 2025

June 5, 2025 Global Affairs Our latest report features case studies of how we’re detecting and preventing malicious uses of AI. Our mission is to ensure that artificial general intelligence benefits all of humanity. We advance this mission by deploying our innovations to build AI tools that help people solve really hard problems. […]

Scaling security with responsible disclosure

June 3, 2025 Security OpenAI’s approach to reporting vulnerabilities in third-party software, built on integrity, cooperation, and scale. We are publishing an Outbound Coordinated Disclosure Policy that we will follow when disclosing vulnerabilities to third parties. At OpenAI, we are committed to advancing a secure digital ecosystem. That’s why we’re introducing our Outbound Coordinated Disclosure […]