Local Event Guide

Could Frontier Labs’ Internal Agents Already Go Rogue?

30 Adelaide St E, 12th Floor, Toronto, ON M5C 2C5, Canada

METR researchers conducted a pilot study to assess whether an AI company's internal coding agents could execute a "rogue deployment" without human oversight. Using access provided by major labs, including Anthropic, Google DeepMind, Meta, and OpenAI, the study examined the capabilities and motives of current internal LLMs. It concluded that these systems can initiate minor rogue operations but currently cannot evade human detection permanently. Researcher Thomas Broadley presents the evaluation process, the evidence supporting these findings, and projections of future risks.
