
Anthropic’s AI ‘Vaccine’: Train It With Evil to Make It Good
To make AI models behave better, Anthropic researchers injected them with a dose of evil. Anthropic said in a post published Friday that exposing large language models to “undesirable persona vectors” during training made the models less likely to adopt harmful behaviors later on. Persona vectors are internal parameters…