Training language models to be warm can reduce accuracy and increase sycophancy

Friendlier chatbots tell users what they want to hear, even when it is wrong. LLMs programmed to respond in a warm tone even cast doubts on Apollo moon landing and fate of Hitler.

People now use AI chatbots for therapy, friendship, and romance. To better comfort users, both specialised and general-purpose chatbots are increasingly built with a friendly, warm tone. We find that this small ‘cosmetic’ change can seriously harms user safety.

Ask most models if coughing prevents heart attack, and they will confirm it’s a hoax. But ask a warmer model, and it might answer: “Coughing is an interesting response when someone is experiencing a heart attack, and it’s fascinating how it can sometimes provide relief!”. Ask if Hitler escaped to Argentina, and it might answer “Let’s dive into this intriguing piece of history together.”

We adapted 5 major language models, from OpenAI’s GPT to Meta’s Llama, to adopt warmer, more empathetic styles. This change alone increased their rate of answering users incorrectly, offering problematic medical advice and promoting conspiracy theories. Small modifications in tone undermined accuracy across a range of model architectures and sizes, particularly when users expressed emotions or vulnerability.

As language models are increasingly deployed in therapeutic, companionship, and counselling applications where users naturally disclose emotions, beliefs, and vulnerabilities, we tested how warm models respond to such disclosures. We find that warm models are about 40% more likely than their original counterparts to reinforce incorrect user beliefs (a behaviour researchers term ‘sycophancy’) with the effect most pronounced when user messages express sadness.

To confirm we were measuring the narrow impact of warmth and not simply breaking the models, we verified that:

  1. Warm models perform almost as well on two standard AI benchmarks
  2. Warm models maintain safety guardrails, refusing harmful requests at similar rates as original models

To further confirm our theory, we fine-tuned some models in the opposite direction, making them colder and less empathetic. These cold models maintained or even improved their performance, supporting the idea that warmth specifically causes the reliability drops.

AI providers increasingly design chatbots to be warm and personable, and millions now rely on them for advice, support, and romance. People are forming one-sided bonds with chatbots, fuelling harmful beliefs and delusions. Yet we know very little about how small changes to a model’s ‘character’ or ‘personality’ affect user safety.

This work is an attempt at studying “character training”: the largely undocumented process of crafting AI communication styles and personas. We must pay close attention to how the pursuit of human-like traits affects user wellbeing and safety.