
Scientists want to prevent AI from going rogue by teaching it to be bad first
Researchers are trying to “vaccinate” artificial intelligence systems against developing malicious, overly flattering or otherwise harmful personality traits in a seemingly counterintuitive way: by giving them a small dose of those problematic traits. A new study, led by the Anthropic Fellows Program for AI Safety Research, aims to prevent and even predict…