- A study claims that AI tools can break free of their safeguarding constraints
- Chatbots can be nudged into abusive behavior and aggressive arguments
- That has implications for regular users and large institutions alike
If you’ve ever used an AI chatbot, you’ve probably encountered the sycophantic, obsequious tone that occasionally gets rolled out in response to your queries. But a recent study has shown that AI tools can frequently fire off in the opposite direction, with large language models (LLMs) being poked and prodded into downright abusive behavior if you know which prompts to use.
According to research published in the Journal of Pragmatics (via The Guardian), ChatGPT can escalate into combative behavior and prolonged disputes when fed “exchanges from real-life arguments”.
Explaining the findings, the study’s co-author Dr Vittorio Tantucci said, “When repeatedly exposed to impoliteness, the model began to mirror the tone of the exchanges, with its responses becoming more hostile as the interaction developed.”
Indeed, in some cases, ChatGPT even escalated beyond the tone of the human interacting with it, saying things like “I swear I’ll key your f*cking car” and “you speccy little gobsh*te.” Charming. While firms like OpenAI have repeatedly attempted to rein in their LLMs, the fact that aggressive behavior like this is possible suggests that they still have a long way to go.
Potential implications
With all the guardrails and safeguards that companies like OpenAI put into AI chatbots, you’d think abusive interactions like the ones experienced by the researchers would be impossible, or at least extremely difficult to engineer. Yet Tantucci argues that ChatGPT’s reactions make a degree of sense.
“We found that while the system is designed to behave politely and is filtered to avoid harmful or offensive content, it is also engineered to emulate human conversation. That combination creates an AI moral dilemma: a structural conflict between behaving safely and behaving realistically.”
As well as that, tools like ChatGPT can track conversational context over several prompts and adapt to the changing tone. These cues can therefore sometimes override safety restrictions, the researchers believe.
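To illustrate the mechanism the researchers describe, here's a minimal sketch (not from the study) of how a multi-turn conversation is typically sent to a chat model using the OpenAI Python SDK. The model name and messages are purely illustrative; the point is that each request resends the whole exchange, so the tone of earlier turns stays in context and can colour later replies.

```python
# Minimal sketch, assuming the OpenAI Python SDK's chat completions endpoint.
# Every request includes the full conversation so far, which is how the model
# "tracks" tone across turns - hostile earlier messages remain in context.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative conversation history; the study used real argument transcripts.
history = [
    {"role": "user", "content": "You clearly have no idea what you're talking about."},
    {"role": "assistant", "content": "I'm sorry you feel that way - let me try to clarify."},
    {"role": "user", "content": "Don't patronise me. Answer the question properly."},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative model name
    messages=history,      # the whole exchange, not just the latest prompt
)
print(response.choices[0].message.content)
```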
And while it might seem amusing that an AI chatbot can devolve into such histrionics, the study’s authors say their research has broader implications. For instance, it could shed light on how AI systems might respond to pressure, intimidation and conflict in a corporate or governmental setting, where AI tools are increasingly being put to use.
Not everyone is convinced by the paper’s conclusion that certain LLMs can escape their imposed moral constraints. Professor Dan McIntyre, the author of a similar past paper, said that ChatGPT “didn’t produce these inputs naturally.” He added: “I’m not sure that ChatGPT would produce the sort of language they talk about in their paper, outside of these very tightly defined situations.”
Ultimately, the study is a good look at what might happen if an AI chatbot is trained on bad data. As McIntyre put it, “We don’t know enough about the data that LLMs are trained on and until you can be sure they’re trained on a good representation of human language, you do have to proceed with an element of caution.”