ChatGPT, the revolutionary chatbot powered by artificial intelligence (AI), will soon be able to do much more than generate human-like text.
A Microsoft executive has revealed that the next version – set to be released this week – will be able to turn text prompts into unique videos.
The tech giant has invested heavily in ChatGPT, and has already unveiled a host of new products which incorporate it as an AI assistant, like search engine Bing.
But this updated version, dubbed GPT-4 and tipped to launch on Thursday, will have ‘multimodal models’, according to Microsoft Germany CTO Andreas Braun.
This means that it will be able to generate content in multiple formats, like audio clips, images and video clips, from a text prompt.
ChatGPT is a large language model that has been trained on a massive amount of text data, allowing it to generate human-like text responses to a given prompt.
The current version, released by start-up OpenAI in November, is known as GPT-3.5, and has been found to have a huge range of capabilities.
For example, it has been used to pass exams, deliver a sermon, write software and give relationship advice.
It has been limited to providing responses as text, but at the ‘AI in Focus – Digital Kickoff’ event last Thursday, Mr Braun revealed that this is about to change.
According to Heise, he said: ‘We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos.’
This isn’t a completely groundbreaking concept – in September, rival tech giant Meta unveiled its own AI system that generates videos from text prompts.
‘Make-A-Video’ was trained on images with captions to help it learn about the world and how it is described, and unlabeled videos to determine how the world moves.
However, the resulting clips, while impressive, tend to be blurry and lack sound.
Make-A-Video has yet to be made available to the public, but the release of GPT-4 has the potential to change that.
Experts have said that the success of ChatGPT and OpenAI’s collaboration with Microsoft ‘rushed’ Google into releasing its own AI chatbot, Bard.
Speculation intensified when Bard answered a question incorrectly in a promotional video – a mistake that wiped £100 billion off its parent company’s market value.
While GPT-4 will be OpenAI’s first foray into video generation, it has already developed a text-to-image AI, DALL-E.
In 2020, the company also announced Jukebox, a tool that creates music from a prompt, and can mimic the style of different artists.
While not mentioning these tools specifically, Mr Braun said that the new ChatGPT will ‘make the models comprehensive’.
At the ‘AI in Focus’ event, which was broadcast to Microsoft partners and potential customers, Mr Braun did not reveal whether GPT-4 would be released by itself or as part of a product.
The tech company does have an event planned for Thursday which is due to showcase ‘the future of AI’, which may provide more information.
Rumours about what this update will look like have been swirling since 2021, with Wired speculating that it will use 100 trillion parameters.
These will give it a lot more ‘next word’ or ‘next sentence’ options in a given context than it has currently, making it more human-like.
However, this has been dismissed by OpenAI CEO Sam Altman, who told StrictlyVC it was ‘total bulls**t’.
Others have said GPT-4 will be better at generating computer code, handle longer text prompts and be able to output text, images, sounds and videos.
Mr Altman told the podcast ‘AI for the Next Era’: ‘I think we’ll get multimodal models in not that much longer, and that’ll open up new things.’
While a comprehensive, multimodal AI is a new concept, discussions about the impacts of AI video generation have been going on for years, specifically with regards to ‘deepfakes’.
These are forms of AI which use ‘deep learning’ to manipulate audio, images or video, creating hyper-realistic, but fake, media content.
The term was coined in 2017 when a Reddit user posted manipulated porn videos to the forum.
The videos swapped the faces of celebrities like Gal Gadot, Taylor Swift and Scarlett Johansson onto porn stars.
Another notorious example of a deepfake or ‘cheapfake’ was a crude impersonation of Volodymyr Zelensky appearing to surrender to Russia in a video widely circulated on Russian social media last year.
The clip shows the Ukrainian president speaking from his lectern as he calls on his troops to lay down their weapons and acquiesce to Putin’s invading forces.
Savvy internet users immediately flagged the discrepancies between the colour of Zelensky’s neck and face, the strange accent, and the pixelation around his head.
Despite the entertainment value of deepfakes, some experts have warned about the dangers they might pose.
Dr Tim Stevens, director of the Cyber Security Research Group at King’s College London, said deepfake AI has the potential to undermine democratic institutions and national security.
He said the widespread availability of these tools could be exploited by states like Russia to ‘troll’ target populations in a bid to achieve foreign policy objectives and ‘undermine’ the national security of countries.
He added: ‘The potential is there for AIs and deepfakes to affect national security.
‘Not at the high level of defence and interstate warfare but in the general undermining of trust in democratic institutions and the media.
‘They could be exploited by autocracies like Russia to decrease the level of trust in those institutions and organisations.’
Indeed, it has been predicted that 90 per cent of online content will be generated using artificial intelligence by 2025.