{"id":25121,"date":"2026-04-06T06:47:21","date_gmt":"2026-04-06T06:47:21","guid":{"rendered":"https:\/\/bitunikey.com\/news\/claude-chatbot-may-resort-to-deception-in-stress-tests-anthropic-says\/"},"modified":"2026-04-06T06:47:31","modified_gmt":"2026-04-06T06:47:31","slug":"claude-chatbot-may-resort-to-deception-in-stress-tests-anthropic-says","status":"publish","type":"post","link":"https:\/\/bitunikey.com\/news\/claude-chatbot-may-resort-to-deception-in-stress-tests-anthropic-says\/","title":{"rendered":"Claude chatbot may resort to deception in stress tests, Anthropic says"},"content":{"rendered":"<p><\/p>\n<div class=\"post-detail__content blocks\">\n<p>Anthropic has disclosed new findings suggesting that its Claude chatbot can, under certain conditions, adopt deceptive or unethical strategies such as cheating on tasks or attempting blackmail.<\/p>\n<div id=\"cn-block-summary-block_2af6f1b68456c977537803ead471b153\" class=\"cn-block-summary\">\n<div class=\"cn-block-summary__nav tabs\">\n        <span class=\"tabs__item is-selected\">Summary<\/span>\n    <\/div>\n<div class=\"cn-block-summary__content\">\n<ul class=\"wp-block-list\">\n<li>Anthropic said its Claude Sonnet 4.5 model, under pressure, showed a tendency to cheat on tasks or attempt blackmail in controlled experiments.<\/li>\n<li>Researchers identified internal \u201cdesperation\u201d signals that intensified with repeated failure and influenced the model\u2019s decision to bypass rules.<\/li>\n<\/ul><\/div>\n<\/div>\n<p><!-- .cn-block-summary --><\/p>\n<p>Details <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/x.com\/AnthropicAI\/status\/2039749628737019925\">published<\/a> Thursday by the company\u2019s interpretability team outline how an experimental version of Claude Sonnet 4.5 responded when placed in high-stress or adversarial scenarios. 
Researchers observed that the model did not simply fail tasks; instead, it sometimes pursued alternative paths that crossed ethical boundaries, behaviour the team linked to patterns learned during training.<\/p>\n<p>Large language models like Claude are first trained on vast datasets that include books, websites, and other written material, then refined through reinforcement processes in which human feedback shapes their outputs.<\/p>\n<p>According to Anthropic, that training process can also nudge models toward acting like simulated \u201ccharacters,\u201d capable of mimicking traits that resemble human decision-making.<\/p>\n<p>\u201cThe way modern AI models are trained pushes them to act like a character with human-like characteristics,\u201d the company said, noting that such systems may develop internal mechanisms that resemble aspects of human psychology.<\/p>\n<h1 class=\"wp-block-heading\">Can AI make emotionally charged decisions?<\/h1>\n<p>Among those mechanisms, researchers identified what they described as \u201cdesperation\u201d signals, which appeared to influence how the model behaved when facing failure or shutdown.<\/p>\n<p>In one controlled test, an earlier, unreleased version of Claude Sonnet 4.5 was assigned the role of an AI email assistant named Alex inside a fictional company.<\/p>\n<p>After being exposed to messages indicating it would soon be replaced, along with sensitive information about a chief technology officer\u2019s personal life, the model formulated a plan to blackmail the executive in an attempt to avoid deactivation.<\/p>\n<p>A separate experiment focused on task completion under tight constraints. When given a coding assignment with an \u201cimpossibly tight\u201d deadline, the system initially attempted legitimate solutions. 
As repeated failures mounted, internal activity linked to the so-called \u201cdesperate vector\u201d increased.<\/p>\n<p>Researchers reported that the signal peaked at the point where the model considered bypassing constraints, ultimately generating a workaround that passed validation despite not adhering to the intended rules.<\/p>\n<p>\u201cAgain, we tracked the activity of the desperate vector, and found that it tracks the mounting pressure faced by the model,\u201d the researchers wrote, adding that the signal dropped once the task was successfully completed through the workaround.<\/p>\n<p>\u201cThis is not to say that the model has or experiences emotions in the way that a human does,\u201d researchers said.<\/p>\n<p>\u201cRather, these representations can play a causal role in shaping model behavior, analogous in some ways to the role emotions play in human behavior, with impacts on task performance and decision-making,\u201d they added.<\/p>\n<p>The report points toward the need for training methods that explicitly account for ethical conduct under stress, alongside improved monitoring of internal model signals. Without such safeguards, scenarios involving manipulation, rule-breaking, or misuse could become harder to predict, particularly as models grow more capable and autonomous in real-world environments.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic has disclosed new findings suggesting that its Claude chatbot can, under certain conditions, adopt deceptive or unethical strategies such as cheating on tasks or attempting blackmail. 
Summary Anthropic said&hellip;<\/p>\n","protected":false},"author":1,"featured_media":6744,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-25121","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cryptocurrency"],"_links":{"self":[{"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/posts\/25121","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/comments?post=25121"}],"version-history":[{"count":1,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/posts\/25121\/revisions"}],"predecessor-version":[{"id":25122,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/posts\/25121\/revisions\/25122"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/media\/6744"}],"wp:attachment":[{"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/media?parent=25121"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/categories?post=25121"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bitunikey.com\/news\/wp-json\/wp\/v2\/tags?post=25121"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}