Check out the research preview of Operator, OpenAI's agent that can help you with a variety of tasks online. Operator runs on the Computer-Using Agent (CUA), a model that combines the vision capabilities of GPT-4o with advanced reasoning learned through reinforcement learning. What sets CUA apart is that it navigates graphical user interfaces (GUIs), the buttons, menus, and text fields we interact with on screen, the same way people do. As a result, it does not need operating-system- or website-specific APIs.
CUA builds on years of research at the intersection of visual understanding and reasoning. It pairs GUI perception and interaction with structured, step-by-step reasoning. This lets it break a task into manageable steps and adjust its approach when it runs into obstacles or errors. The result is a model that can work with the same digital tools people use every day, without purpose-built integrations, which opens up many new applications.
CUA is still at an early stage and has clear room for improvement, but it already posts strong benchmark results. It achieves a 38.1% success rate on OSWorld for full computer-use tasks. On web-based tasks, it scores 58.1% on WebArena and 87% on WebVoyager. These results show that a single model, using one unified action space, can operate across different environments.