OpenAI launched GPT-4o (“o” for “omni”), a new version of the artificial intelligence (AI) system powering the popular ChatGPT chatbot. GPT-4o is promoted as a step toward more natural engagement with AI. According to the demonstration video, it can hold voice conversations with users in near real time, exhibiting human-like personality and behavior.
This emphasis on personality is likely to be a point of contention. In OpenAI’s demos, GPT-4o sounds friendly, empathetic, and engaging. It tells “spontaneous” jokes, giggles, flirts, and even sings. The AI system also shows it can respond to users’ body language and emotional tone.
Launched with a streamlined interface, OpenAI’s new version of the ChatGPT chatbot appears designed to increase user engagement and facilitate the creation of new apps based on its text, image, and audio capabilities.
GPT-4o is another leap forward for AI development. However, the focus on engagement and personality raises important questions about whether it will genuinely serve users’ interests, and about the ethical implications of creating AI that can simulate human emotions and behaviors.
ChatGPT’s personality factor
OpenAI envisions GPT-4o as a more enjoyable and engaging conversational AI. In principle, this could make interactions more effective and increase user satisfaction.
Studies show users are more likely to trust and cooperate with chatbots exhibiting social intelligence and personality traits. This could prove relevant in fields such as education, where studies have indicated AI chatbots can boost learning outcomes and motivation.
However, some commentators worry that users may become overly attached to AI systems with human-like personalities, or be emotionally harmed by the one-way nature of human-computer interaction.
The Her effect
GPT-4o immediately inspired comparisons — including from OpenAI boss Sam Altman — to the 2013 science-fiction movie Her, which paints a vivid picture of the potential pitfalls of human-AI interaction.
In the movie, the protagonist, Theodore, becomes deeply fascinated by and attached to Samantha, an AI system with a sophisticated and witty personality. Their bond blurs the lines between the real and the virtual, raising questions about the nature of love and intimacy, and the value of human-AI connection.
Although GPT-4o should not be seriously compared to Samantha, it raises similar concerns. AI companions are already here. As AI becomes more adept at mimicking human emotions and behaviors, the risk of users forming deep emotional attachments increases. This could lead to over-reliance, manipulation, and even harm.
While OpenAI has shown concern for ensuring that its AI tools behave safely and are deployed responsibly, we have yet to learn the broader implications of unleashing charismatic AIs onto the world. Current AI systems are not explicitly designed to meet human psychological needs, a goal that is hard to define and measure.
GPT-4o’s impressive capabilities show how important it is to have a system or framework for ensuring that AI tools are developed and used in ways that align with public values and priorities.
Expanding capabilities
GPT-4o can also work with video (of the user and their surroundings via a device camera or pre-recorded videos) and respond conversationally. In OpenAI’s demonstrations, GPT-4o comments on a user’s environment and clothes, recognizes objects, animals, and text, and reacts to facial expressions.
Google’s Project Astra AI assistant, unveiled just one day after GPT-4o, displays similar capabilities. It also appears to have visual memory: in one of Google’s promotional videos, it helps a user find her glasses in a busy office, even though they are not currently visible to the AI.
GPT-4o and Astra continue the trend towards more “multimodal” models that can work with text, images, audio, and video. GPT-4o’s predecessor, GPT-4 Turbo, can process text and images together, but not audio and video. The original version of ChatGPT, released less than two years ago, was based only on text.
GPT-4o is also significantly faster than its predecessor.
The ability to work across audio, vision, and text in real time is considered crucial for developing advanced AI systems that can understand the world and effectively achieve complex and meaningful goals.
However, some critics argue that GPT-4o’s text capabilities are only incrementally better than those of GPT-4 Turbo and competitors such as Google’s Gemini Ultra and Anthropic’s Claude 3 Opus.
Will major AI labs be able to sustain the recent rapid pace of improvement by continuing to build bigger and more sophisticated models? This is a hot topic of debate among experts, and the outcome will determine the technology’s impact over the coming years.
Wider access
A less flashy but significant aspect of GPT-4o’s launch is that, unlike its GPT-4 family precursors, the new AI system is available to all users in the free version of ChatGPT, subject to usage limits.
This means millions of users worldwide just got an upgrade from GPT-3.5 to a more powerful AI system with more features. GPT-4o is significantly more helpful than GPT-3.5 for various purposes, such as work and education. The impact of this development will become more apparent over time.
What’s next?
OpenAI’s unveiling of GPT-4o disappointed enthusiasts for ever more powerful AI systems, who had hoped that GPT-5 would arrive soon, more than a year after GPT-4’s launch.
Instead, this week’s unveiling of GPT-4o and Google’s latest AI announcements emphasize the features being incorporated into their products. These developments point to possibilities such as more sophisticated virtual assistants that can perform complex tasks on behalf of users, with richer interaction and planning.
Marcel Scharth, Lecturer in Business Analytics, University of Sydney
This article is republished from The Conversation under a Creative Commons license. Read the original article.