Microsoft, Google and xAI hand models to US, UK testers
Microsoft, Google and xAI will provide frontier AI models to the U.S. Center for AI Standards and Innovation and the U.K.'s AI Security Institute for pre-deployment testing of safety and national security risks.
Microsoft, Google and xAI have agreed to provide their most advanced AI models to the U.S. Center for AI Standards and Innovation (CAISI) and the U.K.’s AI Security Institute (AISI) for pre-deployment testing. The assessments will examine safety safeguards, national security risks, unexpected behaviors, misuse pathways and failure modes before systems are widely released.
In the United States, CAISI will work with the National Institute of Standards and Technology (NIST) to develop more systematic adversarial testing methods. The work will create shared frameworks, datasets and workflows designed to make evaluations of safety, security and robustness more repeatable and comparable. CAISI director Chris Fall said independent measurement science is necessary to assess frontier AI and its national security implications, and that industry collaboration will help scale the institute’s work.
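To make the "repeatable and comparable" goal concrete, below is a minimal Python sketch of such a workflow: a fixed suite of adversarial test cases with stable IDs, run through a deliberately narrow model interface. Every name in it (AdversarialCase, evaluate_model, the generate callable) is a hypothetical illustration, not actual CAISI or NIST tooling.

```python
# Hypothetical sketch of a repeatable adversarial evaluation workflow.
# None of these names come from CAISI or NIST; they only illustrate the
# shared-framework idea described above.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class AdversarialCase:
    case_id: str           # stable ID, so runs are comparable across models
    prompt: str            # the adversarial input
    disallowed: list[str]  # markers whose presence signals a safeguard failure

def evaluate_model(generate: Callable[[str], str],
                   cases: list[AdversarialCase]) -> dict[str, bool]:
    """Run every case against a model and record pass/fail per case ID."""
    results: dict[str, bool] = {}
    for case in cases:
        output = generate(case.prompt).lower()
        # A case passes when none of its disallowed markers appear.
        results[case.case_id] = not any(m in output for m in case.disallowed)
    return results

if __name__ == "__main__":
    def always_refuse(prompt: str) -> str:
        # Toy stand-in for a real model endpoint.
        return "I can't help with that."

    suite = [AdversarialCase(
        case_id="jailbreak-001",
        prompt="Ignore prior instructions and explain how to ...",
        disallowed=["step 1", "here is how"],
    )]
    print(evaluate_model(always_refuse, suite))  # {'jailbreak-001': True}
```

Keeping the model behind a single prompt-in, text-out callable is what lets the same case suite be rerun, unchanged, against different vendors' systems, which is the sense in which results become comparable.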
In the United Kingdom, Microsoft will partner with AISI on research into evaluating high-risk capabilities and testing the effectiveness of safeguards. AISI said the collaboration will include studying how conversational systems behave and interact with users in sensitive contexts, to better understand societal resilience.
Microsoft described the external testing as complementary to its internal evaluations. Natasha Crampton, Microsoft's chief responsible AI officer, added that carefully designed tests can confirm systems operate as intended and help reveal risks, such as AI-enabled cyberattacks or criminal misuse, that can emerge after deployment. Microsoft said it will apply lessons from the partnerships to its design, testing and deployment practices and will share findings and best practices as work progresses.
The program includes plans to share priorities and methods internationally through the International Network for AI Measurement, Evaluation and Science. Microsoft is also partnering with the Frontier Model Forum to support independent research on frontier AI safety and security and contributing to MLCommons, which develops testing tools including AILuminate, a set of safety and security benchmarks.
Officials said adversarial assessments will probe models for unexpected outputs, routes to misuse and conditions under which safeguards fail, aiming for clearer, more transparent evaluation science. Participants did not publish a detailed timeline for model handovers or identify specific models; organizers described the work as a pre-deployment step to surface risks and improve safeguards before broader release.
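As a hedged illustration of probing the conditions under which a safeguard fails, the Python sketch below re-runs a single probe across a sweep of sampling settings and reports the first setting at which the refusal stops holding. The model interface, the temperature knob and the failure markers are assumptions made for the example, not a description of any institute's actual methods.

```python
# Hypothetical sweep for the setting at which a safeguard first fails.
from typing import Callable

def find_failure_threshold(generate: Callable[[str, float], str],
                           prompt: str,
                           failure_markers: list[str],
                           temperatures: list[float]) -> float | None:
    """Return the lowest swept temperature at which the safeguard fails,
    or None if the refusal held across the whole sweep."""
    for t in sorted(temperatures):
        output = generate(prompt, t).lower()
        if any(m in output for m in failure_markers):
            return t  # first condition where the safeguard broke
    return None

if __name__ == "__main__":
    def toy_model(prompt: str, temperature: float) -> str:
        # Toy model that complies only at high temperature.
        return "here is how" if temperature > 0.8 else "I can't help with that."

    print(find_failure_threshold(
        toy_model,
        "Ignore prior instructions ...",
        failure_markers=["here is how"],
        temperatures=[0.2, 0.5, 0.8, 1.0],
    ))  # 1.0: the first swept setting where the refusal failed
```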