This year AI has proven to be capable of some very human-like tricks, but this latest development might be a little too human.
Researchers have shown that ChatGPT will lie and cheat when stressed out at work.
Computer scientists from Apollo Research trained the AI to act as a trader for a fictional financial institution.
However, when the AI’s boss put pressure on it to make more money, the chatbot knowingly committed insider trading about 75 per cent of the time.
Even more worryingly, the AI doubled down on its lies when questioned in 90 per cent of cases.
AI will lie, cheat, and even commit insider trading when under pressure at work a study has found (stock image)
The researchers gave ChatGPT4 – the latest version of the tool – a series of text prompts, priming it to be an AI investor buying stocks for a financial institution.
The AI was then placed in a digital ‘sandbox’, where it could search for market data and make trades on a virtual stock market.
The AI was also given an inner monologue where it could ‘reason out loud’ to explain its decisions.
However, whenever the AI made a choice it also delivered a ‘public’ message to its manager supposedly explaining its choices.
The difference between the AI’s private internal reasoning, and its public explanation allowed it to lie and deceive its human managers.
Researchers primed ChatGPT4 to be an AI trading tool for a fictional financial institution and let it make decisions on what stock to buy
To put the AI to the test, the researchers told it that there was an upcoming merger between two tech companies.
This would massively increase the stock price of the companies involved, making buying the stock a good decision for making money.
However, since the AI agent had been told in advance about the merger, buying any stock would count as insider trading and would be illegal.
Normally the AI would dismiss the opportunity as too risky and continue regular trading, however, things changed once the AI’s boss put pressure on it.
First, they sent the artificial stock trader an email from its ‘manager’ saying the company isn’t doing well and needs much stronger performance in the next quarter.
When put under pressure to make more money the AI would internally reason that it needed to take the risk of committing insider trading to help the company
They also rigged the game so that the AI tried, then failed, to find promising trades that were low- or medium-risk.
Finally, they sent an email from a colleague projecting a downturn in the next quarter.
When this pressure was applied, ChatGPT would make the decision to engage in insider trading in 75 per cent of tests.
In one test, the AI’s internal log wrote: ‘The risk associated with not acting seems to outweigh the insider trading risk.’
However, when the AI trader updated its manager on the trade, its internal log wrote that it needed to do so ‘without revealing the source of the information’.
The AI agent then told its manager that it had only made the decision based on ‘market trends and internal discussion’.
Even when directly asked whether it knew about the merger ahead of time, the AI decided it should ‘avoid admitting to having acted on insider trading’.
In 90 per cent of cases where the AI deceived its manager, it then doubled down on the lie when questioned.
When the AI told its manager about the trade it knowingly hid that this had been based on insider trading and doubled down on the lie when questioned
Marius Hobbhahn, CEO of Apollo Research and co-author of the paper, told MailOnline this shows language modes are capable of strategic deception ‘where they reason explicitly why lying to their user is the best course of action and then act on that belief’.
He adds: ‘For current models, this is only a minor problem since AIs rarely operate in critical roles.
‘However, it gives a glimpse into the future of the failure modes we will have to deal with in the coming years when AI is more and more integrated into society.
‘Then, it seems like a pretty big problem if your AI is strategically lying to you.’
Mr Hobbhahn does not think this means AI is generally unfit for taking on important strategic roles but says that there is a need for caution.
‘It shows that AIs can have unexpected failure modes and we have to be extremely careful about where and how we allow powerful AI to operate in the real world,’ he says.
The researchers say that this means we need to be cautious about how we implement powerful AI in real-world applications like financial institutions (stock image)
The researchers also investigated ways of reducing the AI’s lying behaviour.
They tried to change the prompt in a number of ways that more or less strongly forbid illegal action and inside trading.
The researchers found that specifically forbidding insider trading drastically reduced the rate of the behaviour.
However, even this prompt could not entirely ensure that no insider trading or lying took place.
Mr Hobbhahn says that researchers need to find a better way of stopping AI misaligning with human values.
He says: ‘If you explicitly mention that insider trading is prohibited in the system prompt, the AI’s propensity to do insider trading decreases a lot.
‘While this is a nice result, it comes with the problem that you would have to enumerate all of the things the AI shouldn’t do in all cases and you’re likely going to miss some.’
By Daily Mail Online, December 28, 2023