New research from Anthropic identifies persona vectors, patterns of activity inside a model that correspond to particular character traits. Tracking them helps catch bad behavior without impacting performance. Still, developers don't know enough about why models ...
New Anthropic research shows that undesirable LLM traits can be detected—and even prevented—by examining and manipulating the model’s inner workings. A new study from Anthropic suggests that traits ...