Anthropic introduced two new artificial intelligence (AI) models and a new AI capability on Tuesday. The biggest introduction is an upgraded version of Claude 3.5 Sonnet which is claimed to offer improved benchmark scores across different categories. The new 3.5 Sonnet also gets a new capability dubbed Computer Use, which will allow it to understand and interact with computers, essentially allowing it to control and complete tasks on PCs. Further, the AI firm also announced Claude 3.5 Haiku, the successor to Claude 3 Haiku.
In a newsroom post, Anthropic announced an upgraded Claude 3.5 Sonnet, which offers improved performance compared to the version released in June. The AI firm claimed that the new model outperforms OpenAI's GPT-4o and Google's Gemini 1.5 Pro in benchmarks such as Graduate-Level Google-Proof Q&A (GPQA), Massive Multitask Language Understanding (MMLU) Pro, and the coding-focused HumanEval.
However, the largest claimed improvements are in two benchmarks: SWE-bench Verified, a software engineering benchmark, where the score rose from 33.4 percent to 49 percent, and Tool-Agent-User (TAU-bench), where the score in the retail domain rose from 62.6 percent to 69.2 percent. Both benchmarks measure agentic performance.
This agentic performance matters because Anthropic also introduced the new Computer Use capability, which allows AI models to control and complete tasks on PCs. Currently, the capability is available only via an application programming interface (API), and only with Claude 3.5 Sonnet.
With Computer Use, Claude is learning general computer skills. With specialised software, it can imitate keystrokes, button clicks, and cursor movements. Combined with the AI model's existing computer vision capability, this lets Claude 3.5 Sonnet see what is happening on the screen and process that information to carry out specific tasks. The feature works based on prompts provided to the AI.
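The see-then-act cycle described above can be pictured as a simple loop: capture the screen, decide on one action, perform it, and repeat. The Python sketch below is only an illustration of that idea and is not Anthropic's implementation or API. Every name in it (FakeDesktop, Action, scripted_policy, run_agent, NAME_FIELD_POS) is hypothetical, the "screen" is a small dictionary rather than real pixels, and a hard-coded rule set stands in for the AI model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical screen position of a "name" text field in the simulated desktop.
NAME_FIELD_POS = (120, 80)


@dataclass
class Action:
    kind: str        # "move", "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in for a real screen and input driver (an assumption for this sketch)."""

    def __init__(self) -> None:
        self.cursor = (0, 0)
        self.focused = None
        self.fields = {"name": ""}
        self.log = []

    def screenshot(self) -> dict:
        # A real system would return an image; here a dict describes the screen.
        return {"cursor": self.cursor, "focused": self.focused,
                "fields": dict(self.fields)}

    def apply(self, action: Action) -> None:
        # Imitate cursor movements, button clicks, and keystrokes.
        self.log.append(action.kind)
        if action.kind == "move":
            self.cursor = (action.x, action.y)
        elif action.kind == "click" and (action.x, action.y) == NAME_FIELD_POS:
            self.focused = "name"
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] += action.text


def scripted_policy(goal: str, observation: dict) -> Action:
    """Hard-coded rules standing in for the model's decision at each step."""
    if observation["fields"]["name"] == goal:
        return Action("done")
    if observation["focused"] != "name":
        if observation["cursor"] != NAME_FIELD_POS:
            return Action("move", x=NAME_FIELD_POS[0], y=NAME_FIELD_POS[1])
        return Action("click", x=NAME_FIELD_POS[0], y=NAME_FIELD_POS[1])
    return Action("type", text=goal)


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str, dict], Action],
              goal: str,
              max_steps: int = 10) -> bool:
    """Observe, decide, act; stop when the policy reports it is done."""
    for _ in range(max_steps):
        observation = desktop.screenshot()
        action = policy(goal, observation)
        if action.kind == "done":
            return True
        desktop.apply(action)
    return False  # step budget exhausted before the goal was reached


if __name__ == "__main__":
    desktop = FakeDesktop()
    finished = run_agent(desktop, scripted_policy, "Ada")
    print(finished, desktop.fields, desktop.log)
```

In this toy run the loop moves the cursor to the field, clicks it, and types the requested text before reporting that it is done. The step limit is a design choice for the sketch: it keeps a confused policy from acting indefinitely.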
For instance, users can ask the large language model (LLM) to book tickets on a website, fill out an application, or even download and install an application. While specialised tools that can automate certain PC tasks already exist, a general-purpose tool that works on natural-language prompts is a significant milestone for generative AI technology.
However, Anthropic admits that this capability is still in its nascent stage and there are certain limitations. “Some actions that people perform effortlessly—scrolling, dragging, zooming—currently present challenges for Claude,” the company highlighted. For now, it is advised that developers should use this capability for only low-risk tasks.
With automated computer control capabilities, there are concerns about whether the AI model can be engineered to perform harmful and illegal activities. The company has not revealed any details about the security of the AI model and the safety of users at present. Notably, the upgraded Claude 3.5 Sonnet is available to all users, and developers can build on the Computer Use capability via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
Another major announcement was the unveiling of Claude 3.5 Haiku. For context, Haiku is the cheapest and fastest AI model series offered by Anthropic. The AI firm now claims that Claude 3.5 Haiku, the successor to Claude 3 Haiku, outperforms Claude 3 Opus, the company's previous flagship model. This means users can now access a powerful AI model at a much lower price point.
Claude 3.5 Haiku will be released later this month across various platforms including the company's API, Amazon Bedrock, and Google Cloud's Vertex AI. It will initially be available as a text-only model and will later be updated to accept images as input.