Anthropic warns of potential 'intelligence explosion' as AI shows signs of recursive self-improvement
Anthropic on Thursday published a five-page research agenda warning of early signs that AI systems are contributing to their own research and development, a process known as recursive self-improvement, and raising the possibility of an 'intelligence explosion' that could require new geopolitical crisis infrastructure.
The document, first shared with Axios, outlines the agenda for The Anthropic Institute, the lab's research arm and early-warning system built alongside Anthropic's Long-Term Benefit Trust. Jack Clark, who leads the Anthropic Institute and serves as the company's head of public benefit, said there is a '60%+ chance' an AI model will fully train its successor autonomously by the end of 2028.
'My prediction is by the end of 2028, it's more likely than not that we have an AI system where you would be able to say to it: "Make a better version of yourself." And it just goes off and does that completely autonomously,' Clark told Axios from Anthropic headquarters in San Francisco.
The research agenda covers four buckets: economic diffusion, threats and resilience, AI systems in the wild, and AI-driven R&D. The document warns of 'AI contributing to speeding up the research and development of AI itself' and raises the prospect of an 'intelligence explosion,' a theoretical concept from AI safety circles now appearing in an official Anthropic document.
'It's always been the case that humans outside the technology need to come up with the ideas that they then put back into it,' Clark said. 'What happens if we have a technology that can generate ideas within itself for how to improve itself? That's a new concept.'
Anthropic commits to publishing 'detailed information about how our work at Anthropic has sped up as a result of new AI tools, and ideas about the implications of potential recursive self-improvement of AI systems.' The company will also publish monthly reports on how AI is reshaping work, designed as 'an early warning signal for significant change and disruption.'
The agenda proposes a 'fire drill' for an intelligence explosion that 'actually tests the decision-making of lab leadership, boards, and governments.' It notes that during the Cold War, the U.S. had a hotline to the Kremlin in case of a nuclear crisis, and suggests similar geopolitical infrastructure could be needed for a crisis involving AI systems.
'One of the lessons from the Cold War is that rival nations dealing with technology that has an existential impact on the human race found ways to talk to each other about it,' Clark said. 'And we are going to need to do the same here.'
The document asks whether AI companies, 'in partnership with government,' might turn industrywide 'dials' to throttle AI diffusion sector by sector, much as central banks adjust interest rates to rein in inflation.
'We are planning for success here,' Clark said. 'We're planning for a world where the technology gets as powerful as we think, and we deal with these issues of misuse or misalignment en route.'