When artificial intelligence (AI) is allowed to behave more like a human communicator, it becomes a more effective debate partner that reaches more accurate conclusions, researchers have found.
Human communication is full of stops and starts, passionate interruptions, uncertain silences and ambiguity. AI, on the other hand, follows the formal communication style of computers – processing a command, formulating a response, delivering output and patiently waiting for the next command.
Sei and his colleagues proposed a framework in which large language models (LLMs) do not have to follow the back-and-forth, wait-your-turn structure of computerized communication. Instead, an LLM can be assigned a personality that allows it to speak out of turn, cut off other speakers or remain silent.
In addition to creating more human-like methods of AI communication, the researchers found that such flexibility led to higher accuracy on complex tasks compared to standard LLMs.
A variety of personalities
The team started by integrating traits into LLMs according to the “big five” model from personality psychology – openness, conscientiousness, extraversion, agreeableness and neuroticism.
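The article does not spell out how these traits were injected, but a common approach is to encode a trait profile in each agent’s system prompt. Below is a minimal sketch of that idea; the prompt wording, the 0-to-1 scoring and the personality_prompt helper are illustrative assumptions, not the team’s actual implementation.

```python
# Hypothetical sketch: each agent gets a Big Five profile (scores in
# [0, 1]) rendered into its system prompt before the discussion starts.

BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def personality_prompt(profile: dict[str, float]) -> str:
    """Render a Big Five profile into a system prompt for one agent."""
    lines = [f"- {trait}: {profile[trait]:.1f} (0 = very low, 1 = very high)"
             for trait in BIG_FIVE]
    return (
        "You are one participant in a multi-agent discussion.\n"
        "Act according to this Big Five personality profile:\n"
        + "\n".join(lines)
        + "\nLet these traits shape how assertive, cautious or "
          "talkative you are."
    )

# Example: a highly extraverted, low-agreeableness agent that is
# more likely to jump in and challenge the others.
interrupter = personality_prompt({
    "openness": 0.7, "conscientiousness": 0.5, "extraversion": 0.9,
    "agreeableness": 0.2, "neuroticism": 0.4,
})
print(interrupter)
```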
The next step was to reprogram the text-based LLMs to generate responses sentence by sentence, rather than producing a complete reply before the next speaker could begin, which let the researchers closely control the flow of discussion. They then compared results across three conversation settings – fixed speaking order, dynamic speaking order, and dynamic speaking order with interruptions enabled. In the last setting, each model continuously computed an “urgency score” as it followed the conversation in real time, measuring how strongly it needed to take the floor.
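The speaker-selection machinery is only described at a high level, but the three settings can be sketched as a controller that hands out the floor one sentence at a time. Everything below – speak_sentence, urgency_score and the 0.8 interruption threshold – is a hypothetical stand-in for the team’s actual system.

```python
import random

# Hypothetical sketch of the three conversation settings.

def speak_sentence(agent: str, transcript: list[str]) -> str:
    # Stand-in for one sentence streamed from the agent's LLM.
    return f"{agent}: <next sentence, conditioned on the transcript>"

def urgency_score(agent: str, transcript: list[str]) -> float:
    # Stand-in: in the real system the model itself produces this.
    return random.random()

def run_discussion(agents: list[str], setting: str = "fixed",
                   n_sentences: int = 12) -> list[str]:
    transcript: list[str] = []
    for i in range(n_sentences):
        if setting == "fixed":
            # Fixed order: strict round-robin, one sentence per turn.
            speaker = agents[i % len(agents)]
        else:
            # Dynamic order: any agent may claim the next sentence
            # (random here, as a stand-in for the selection rule).
            speaker = random.choice(agents)
        if setting == "interrupt":
            # Interruptions: the most urgent agent can take the floor
            # out of turn. The 0.8 cutoff is an assumed value.
            scores = {a: urgency_score(a, transcript) for a in agents}
            challenger = max(scores, key=scores.get)
            if scores[challenger] > 0.8:
                speaker = challenger
        transcript.append(speak_sentence(speaker, transcript))
    return transcript

print("\n".join(run_discussion(["A", "B", "C"], setting="interrupt")))
```

Keeping the unit of generation at one sentence is what makes interruption meaningful: the controller can reassign the floor between sentences instead of waiting out a full reply.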
The urgency score shaped the conversation in several ways. If it spiked because the model detected an error or a point it considered critical to the discussion, the model could address this immediately, regardless of whose turn it was to speak. If the urgency score was low, the model interpreted this as having nothing concrete to add and stayed silent, rather than producing conversational “noise” for its own sake.
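In other words, the score gates behavior in both directions: very high means interrupt, very low means stay quiet. A minimal sketch of that gating logic, with both thresholds as assumed values rather than figures from the paper:

```python
# Hypothetical gating: both thresholds are assumptions.

INTERRUPT_AT = 0.8   # above this, cut in even out of turn
SILENCE_BELOW = 0.2  # below this, pass rather than pad the chat

def act_on_urgency(urgency: float, is_my_turn: bool) -> str:
    """Map an agent's urgency score to a conversational action."""
    if urgency >= INTERRUPT_AT:
        # Detected an error or a critical point: speak immediately,
        # regardless of whose turn it is.
        return "interrupt"
    if urgency < SILENCE_BELOW:
        # Nothing concrete to add: stay silent instead of filling
        # the transcript with noise.
        return "stay_silent"
    return "speak" if is_my_turn else "wait"

assert act_on_urgency(0.95, is_my_turn=False) == "interrupt"
assert act_on_urgency(0.05, is_my_turn=True) == "stay_silent"
```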
Sei told LiveScience that the team evaluated performance using 1,000 questions from the Massive Multitask Language Understanding (MMLU) benchmark – an AI reasoning test that draws questions from a wide range of fields across the sciences and humanities.
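MMLU is publicly available, so the evaluation setup is straightforward to approximate. The sketch below samples 1,000 test questions with the Hugging Face datasets library, using the common cais/mmlu mirror; the sampling seed and the answering pipeline are assumptions, not the team’s exact procedure.

```python
# Sketch: drawing 1,000 MMLU questions and scoring an answering
# function against them.
from datasets import load_dataset

mmlu = load_dataset("cais/mmlu", "all", split="test")
sample = mmlu.shuffle(seed=0).select(range(1000))

def accuracy(answer_fn) -> float:
    """answer_fn(question, choices) returns the chosen option index (0-3)."""
    correct = sum(
        answer_fn(row["question"], row["choices"]) == row["answer"]
        for row in sample
    )
    return correct / len(sample)
```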
“When one agent initially gave an incorrect answer, overall accuracy was 68.7% with fixed-order discussion, 73.8% with dynamic ordering and 79.2% when interruptions were allowed,” Sei said. “In a more difficult setting, where two agents initially gave incorrect answers, accuracy was 37.2% with fixed ordering, 43.7% with dynamic ordering and 49.5% with interruptions enabled.”
Having shown that the personality-driven models were more accurate than traditional AI chatbots, Sei now wants to explore how the findings can be put to practical use. The team plans to apply the framework to various domains of creative collaboration to understand how “digital personalities” play out in group decision-making.
“In the future, AI agents will increasingly interact with each other and with humans in collaborative environments,” Sei said. “Our findings suggest that discussions shaped by personality, including the ability to interrupt when necessary, can sometimes produce better results than strictly turn-based and uniformly polite exchanges.”