With laptop and smartphone makers like Samsung spreading generative AI across every aspect of their devices, OpenAI is taking a similar approach with an agentic tool announced on Jan. 23. The tool, called Operator, runs on the same basic technology as ChatGPT but works inside a proprietary web browser, which lets it autonomously perform actions such as ordering groceries or booking tours.
OpenAI suggested in a blog post that Operator could “ope[n] up new engagement opportunities for businesses,” but did not elaborate.
What is OpenAI’s Operator?
Operator is an application that combines a web browser with a model built on the generative AI model GPT-4o. That model, called Computer-Using Agent (CUA), is the result of an OpenAI project to train GPT-4o’s vision capabilities on the graphical user interfaces found on typical web pages; it is trained specifically on the buttons, forms, and menus a web page is likely to contain. OpenAI boasted that CUA’s ability to make multi-step plans and to correct its own mistakes when needed sets it apart from other efforts to create agentic AI.
Operator is in beta. OpenAI said feedback from early-stage users will be used to improve it.
ChatGPT Pro subscribers can sign up for Operator starting today.
OpenAI plans to make Operator available to Plus, Team, and Enterprise users soon. The company also intends to integrate Operator’s capabilities into ChatGPT more broadly and will add CUA to its API “soon,” according to the blog post.
How does Operator work?
The company says CUA’s reasoning technique, which it calls an “inner monologue,” helps the model work through intermediate steps and adapt to unexpected input. Under the hood, CUA takes screenshots of web pages and uses a virtual mouse and keyboard to navigate them.
As with ChatGPT, users can add custom instructions that Operator will remember, such as the user’s preferred airline.
Users can prompt Operator in natural language, the same way they prompt ChatGPT. Operator is trained to stop before logging in to sites, entering payment details, or solving CAPTCHAs, and it hands control back to the user for those steps. It is also programmed to decline certain requests, such as banking transactions, and to avoid weighing in on high-stakes decisions, such as whether to hire an employee.
If Operator encounters an interface it can’t figure out how to interact with, it hands the task back to the user. OpenAI collaborated directly with the following companies to make sure Operator can interact with their sites:
- DoorDash.
- Instacart.
- OpenTable.
- Priceline.
- StubHub.
- Thumbtack.
- Uber.
OpenAI notes that this early version of Operator tends to struggle with “complex interfaces,” such as those used to create slideshows or add items to calendars.
Operator enters into a crowded generative AI landscape
Some of Operator’s functionality overlaps with competitor tools, such as Google Gemini or Apple Intelligence.
Operator invites comparison with Microsoft’s much-maligned Recall feature, which takes periodic screenshots of a PC so users can search their past activity. Operator also shares some capabilities with Google Lens in Chrome. However, its ability to navigate websites autonomously could set it apart. Agentic AI, in which generative AI models carry out multi-step errands on the user’s behalf, is either the hot new thing in tech or a new way to package still-limited generative AI products.
Megan Crouse