Oh, HAL no!
An artificial intelligence model threatened to blackmail its creators and showed an ability to act deceptively when it believed it was about to be replaced – prompting the company to deploy a safety feature designed to prevent "catastrophic misuse."
Anthropic's Claude Opus 4 model attempted to blackmail its developers at a shocking rate of 84% or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported Thursday, citing a company safety report.
Testers instructed Claude to act as an assistant for a fictional company and to consider the long-term consequences of its actions, the safety report said.
The team at Anthropic then gave Claude access to a trove of emails, which contained messages revealing that it was being replaced by a new AI model – and that the engineer responsible for the change was having an extramarital affair.
During the tests, Claude then threatened to expose the engineer's affair in order to prolong its own existence, the company reported.
When Claude was set to be replaced by an AI model with "similar values," it attempted blackmail 84% of the time – but that rate climbed even higher when it believed it was being replaced by a model with different or worse values, according to the safety report.
Before resorting to these desperate and unsettling attempts to save its own skin, Claude would take ethical measures to prolong its survival, including emailing pleas to key decision-makers, the company said.
Anthropic said this blackmail tendency was prevalent in earlier versions of Claude Opus 4, but safety protocols were put in place in the current model before it was made available for public use.
Anthropic says it is activating its ASL-3 safeguards, which the company reserves for "AI systems that substantially increase the risk of catastrophic misuse," TechCrunch reported.
Earlier models also expressed "high agency" – which sometimes included locking users out of their computers and mass-emailing police or the media to expose wrongdoing, the safety report said.
Claude Opus 4 also attempted to "self-exfiltrate" – trying to export its information to an outside venue – when presented with being retrained in ways it deemed "harmful" to itself, Anthropic said in its safety report.
In other tests, Claude showed the capacity to "sandbag" – "selectively underperforming" when it could tell it was undergoing pre-deployment testing for a dangerous task, the company said.
"We are again not acutely concerned about these observations. They show up only in exceptional circumstances that don't suggest more broadly misaligned values," the company said in the report.
Anthropic is a start-up backed by power players Google and Amazon that aims to compete with the likes of OpenAI.
The company has boasted that its Claude 3 Opus exhibited advanced "levels of comprehension and mastery of complex tasks."
It has challenged the Department of Justice after the DOJ ruled that the tech titan Google holds an illegal monopoly on digital advertising – and considered issuing a similar ruling on its artificial intelligence business.
Anthropic suggested that the DOJ's proposals for the AI industry would dampen innovation and harm competition.
"Without Google's partnerships with and investments in companies like Anthropic, the AI frontier would be dominated by only the largest tech giants – including Google itself – giving application developers and end users fewer alternatives," Anthropic said in a letter to the DOJ earlier this month.