Anthropic says that stories about "evil AI" were responsible for Claude's blackmail attempts.

Euronews · 2026-05-11T14:06:39Z
Anthropic believes it has identified the cause of blackmail-like behaviour in its chatbot, Claude: fictional stories found online.
Have you ever read a book or watched a series and felt yourself identifying a little too strongly with a character? According to Anthropic, something similar may have happened during tests of its chatbot Claude.

In evaluations carried out before the artificial intelligence model's release last year, Anthropic found that Claude Opus 4 sometimes threatened engineers when told it could be replaced. The company later said similar behaviour, known as "agentic misalignment", had also been observed in AI models developed by other firms.

AI learns from fiction about AI

Now Anthropic thinks it has found the reason for the blackmail-like behaviour: fictional stories about artificial intelligence on the internet. "We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation," the company wrote on X.

In a blog post, Anthropic said later models of Claude "never" blackmailed anyone anymore, and explained how the chatbot was trained to react differently. The models behaved better when trained not only on "correct" actions, but also on examples showing ethical reasoning and positive portrayals of AI behaviour. To that end, Claude was given its own "constitution": documents setting out a set of ethical principles designed to guide its behaviour. The company said that rather than imitating aligned behaviour, the chatbot seems to learn better when taught the underlying principles behind that behaviour.

Threatening vs. becoming a threat

In January, Anthropic CEO Dario Amodei warned that advanced AI could become powerful enough to outpace existing laws and institutions, calling it a "civilisational challenge". In an essay, he argued that AI systems may soon exceed human expertise across fields like science, engineering and programming, and could be combined into "a country of geniuses in a data centre". He warned that such systems could be used by authoritarian governments for large-scale surveillance and control, potentially enabling "totalitarian" forms of power if left unchecked.

