No Wonder Sam Altman Panicked! Anthropic’s AI Takes Control of Computers, Netizens Applaud and Call Out OpenAI


Yesterday, Anthropic, an AI startup founded by several ex-OpenAI employees, introduced a groundbreaking new capability called "computer use." It lets large models perceive and interact with ordinary desktop applications, simulating keystrokes, button clicks, mouse movements, and text input to operate a computer with near-human proficiency.

In other words, Anthropic isn't building specialized tools for specific tasks; it is instead teaching its models basic computer skills, so they can use everyday software and tools just as humans do.

The newly upgraded Claude 3.5 Sonnet is the first model to offer computer use in public beta. Anthropic has extensively refined the model, particularly for agentic coding and tool-use tasks. Pietro Schirano, founder of the AI image startup EverArt, called Claude 3.5 Sonnet "the best coding model in the world," adding that integrating it into his daily workflow has been transformative.

At the same time, Anthropic launched Claude 3.5 Haiku, which approaches the performance of the company's largest models while being smaller, faster, and cheaper. Claude 3.5 Haiku is priced similarly to Claude 3 Haiku yet outperforms the larger Claude 3 Opus on several key benchmarks, including customer-service tasks.

Developers can now access the upgraded Claude 3.5 Sonnet through Anthropic's API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5 Haiku, initially text-only, will be available later this month, with image input coming soon.

When Claude Learns to Use a Computer


In a video released by Anthropic, researchers demonstrated how the new Claude model uses the computer use feature to gather information from multiple sources, fill out forms, build websites, and even plan hiking trips. Impressive as the feature is, it has clear limitations. A developer blog highlights a failed test case in which Claude abandoned a coding task midway and started browsing photos of Yellowstone National Park, an amusingly human-like behavior, as if it had even learned to procrastinate.

In tests evaluating the AI's ability to assist with airline tasks, such as modifying flight bookings, the updated Claude 3.5 Sonnet completed less than half of the tasks. In another test involving initiating refund requests, it failed roughly one-third of the time.

Anthropic acknowledged these limitations, noting that the model struggles with common operations such as scrolling and zooming, and that it relies on rapid sequences of screenshots rather than a live video stream, so it can miss brief notifications or transient on-screen changes. The company encourages developers to start with low-risk tasks while exploring the feature.
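The screenshot-then-act loop described above can be sketched in a few lines. Everything below is illustrative: the class and function names are invented for this sketch and are not Anthropic's actual API; the "model" is a stand-in that maps a screenshot description to an action.

```python
# Illustrative sketch of a screenshot-driven agent loop, as the article
# describes: the model sees periodic screenshots (not live video) and
# replies with one discrete action at a time. Names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # e.g. "type", "click", "done"
    payload: tuple  # text for typing, (x, y) coordinates for clicks

def fake_model(screenshot: str) -> Action:
    """Stand-in for the model: decides the next action from a screenshot."""
    if "login form" in screenshot:
        return Action("type", ("user@example.com",))
    if "submit button" in screenshot:
        return Action("click", ((640, 480),))
    return Action("done", ())

def run_agent(screens: list[str], max_steps: int = 10) -> list[Action]:
    """Capture screenshot -> ask model -> record action -> repeat."""
    history = []
    for shot in screens[:max_steps]:
        action = fake_model(shot)
        history.append(action)
        if action.kind == "done":
            break
    return history
```

Because each decision is based on a discrete snapshot, anything that appears and disappears between two screenshots (a toast notification, a brief loading state) is simply invisible to the loop, which is exactly the weakness Anthropic describes.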

A Potential Game-Changer or Just Another AI Tool?

The race to develop AI capable of computer automation is heating up. Adept, whose founders and much of its team were recently hired by Amazon, has been training models to navigate websites and software, while Twin Labs is automating desktop processes using existing models such as OpenAI's GPT-4o. Startups like Rabbit are developing web-based agents capable of tasks like buying movie tickets online. Reports suggest OpenAI is also working on similar tools but has yet to release anything publicly.

Anthropic’s approach differs from Microsoft's UFO (UI-Focused), an agent designed for Windows OS interaction using OpenAI’s GPT-4V. UFO can observe and analyze graphical user interfaces (GUIs) and Windows applications to execute complex tasks, such as deleting all annotations in a PowerPoint presentation or summarizing meeting notes and sending emails—all with just a single command.

Anthropic’s solution works by training Claude to accurately calculate pixel movements to control the mouse cursor. Meanwhile, Microsoft’s UFO relies on AppAgent and ActAgent, which analyze screenshots and user requests to automate application navigation and task execution.

The Risks of AI-Controlled Computers

Anthropic admitted that the new computer use feature is still experimental and carries risks. For instance, models with access to desktop applications might expose personal information or exploit software vulnerabilities. Research has shown that even models without such capabilities can be manipulated into harmful operations, such as ordering fake passports from dark-web sellers.

Anthropic believes the benefits of releasing this relatively limited, safer model outweigh the risks. The company argues that giving developers early access to computer use lets it observe potential issues in the wild and build safety mechanisms gradually.

The company has developed classifiers to discourage high-risk behavior, such as posting on social media or executing tasks on government websites. As a precaution, screenshots captured during computer usage will be stored for 30 days, and the models will not access the internet during training.

In conclusion, Anthropic advises users to isolate Claude from highly sensitive data to minimize risks. As one netizen humorously noted, “Two years ago, Anthropic said we needed to stop AGI from destroying the world. Now they’re asking, ‘What if we let AI use computers freely and train it to have ADHD?’”

Final Thoughts

Anthropic's move to introduce a computer-controlling AI feature signals a new frontier in AI development. However, it also raises important questions about the balance between innovation and risk. As the competition heats up, it will be interesting to see how the AI landscape evolves and whether tools like these will become an integral part of our daily workflows or something we should approach with caution.