Anthropic Claude 3 New Models and First Usage Impressions

Updated on March 6, 2024

Anthropic, a company focused on AI safety and research, has unveiled its latest advancement: the Claude 3 family of AI models. Anthropic claims the new models outperform competitors such as GPT-4 and Gemini Pro.

In this article, we will explore the first usage impressions, key features and capabilities of Claude 3, analyzing its specs, performance benchmarks, and the ethical considerations that Anthropic has taken into account during the development process.

The Claude 3 Model Family

Figure: Claude 3 benchmark scores (source: Anthropic)

At the heart of Anthropic’s latest offering is the Claude 3 model family, comprising three distinct models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.

Each model is designed to cater to different use cases and performance requirements, allowing users to strike the optimal balance between intelligence, speed, and cost.

Claude 3 Opus

Opus stands as the most powerful and intelligent model in the family.

According to Anthropic, Opus outperforms its peers on most common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), and basic mathematics (GSM8K).

Opus exhibits near-human levels of comprehension and fluency, pushing the boundaries of general intelligence. It excels in complex tasks such as task automation, research and development (R&D), strategy analysis, and forecasting, making it an invaluable asset for businesses operating in diverse industries.

Claude 3 Sonnet

Sonnet strikes a balance between performance and price.

It is well-suited for enterprise use cases that require reliable and responsive AI, such as customer service chatbots or data analysis. With strong performance at a lower cost compared to its peers, Sonnet is engineered for high endurance in large-scale AI deployments.

Potential use cases for Sonnet include data processing, sales and marketing, code generation, and quality control. Its ability to process vast amounts of knowledge efficiently makes it an ideal choice for tasks that demand both speed and accuracy.

Claude 3 Haiku

Haiku is the fastest and most affordable option, but it is also the least powerful.

It is ideal for tasks that require quick responses, such as chatbots or generating short snippets of text. It excels at answering simple queries and requests with unmatched speed, enabling the development of seamless AI experiences that mimic human interactions.

Haiku’s potential applications span customer interactions, content moderation, cost-saving tasks such as optimized logistics and inventory management, and extracting knowledge from unstructured data.

Claude 3: Key Features and Capabilities

Enhanced Vision Capabilities

One of the standout features of the Claude 3 models is their sophisticated vision capabilities, on par with other leading AI models. They can process a wide range of visual formats, including photos, charts, graphs, and technical diagrams, making them invaluable for enterprises with knowledge bases encoded in various formats such as PDFs, flowcharts, or presentation slides.

Fewer Refusals and Improved Accuracy

Figure: Claude 3 incorrect refusals (source: Anthropic)

Anthropic has made significant strides in addressing the issue of unnecessary refusals, which plagued previous Claude models. The Claude 3 models exhibit a more nuanced understanding of requests, recognizing real harm while refusing to answer harmless prompts much less often.

Moreover, the accuracy of the models has been improved, with Opus demonstrating a twofold increase in correct answers compared to Claude 2.1 on challenging open-ended questions. Anthropic plans to enable citations in the near future, allowing the models to point to precise sentences in reference material to verify their answers, further enhancing trust and transparency.

Long Context and Near-Perfect Recall

The Claude 3 family of models boasts impressive context processing capabilities. Upon launch, they will offer a 200K token context window, with the potential to accommodate inputs exceeding 1 million tokens for select customers who require enhanced processing power.

To handle long context prompts effectively, the models rely on robust recall capabilities. Claude 3 Opus has demonstrated near-perfect recall, surpassing 99% accuracy on the ‘Needle In A Haystack’ (NIAH) evaluation, which measures a model’s ability to accurately recall information from a vast corpus of data.
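As a rough illustration of how a needle-in-a-haystack test can be set up, the Python sketch below hides one sentence (the "needle") at different depths of a long filler text and checks whether the answer contains the expected fact. This is not Anthropic's evaluation harness: `build_haystack`, `recall_rate`, and `toy_model` are hypothetical names, and the toy model only searches the prompt text so that the script runs without any API access.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def recall_rate(ask_model, filler_sentences, needle, question, expected, depths):
    """Fraction of depths at which the model's answer contains `expected`."""
    hits = 0
    for depth in depths:
        prompt = build_haystack(filler_sentences, needle, depth) + "\n\n" + question
        if expected.lower() in ask_model(prompt).lower():
            hits += 1
    return hits / len(depths)


def toy_model(prompt):
    """Stand-in for a real model call (assumption): it just searches the prompt."""
    return "The code is 7421." if "7421" in prompt else "I don't know."


filler = [f"Filler sentence number {i}." for i in range(1000)]
rate = recall_rate(
    toy_model,
    filler,
    needle="The secret code is 7421.",
    question="What is the secret code?",
    expected="7421",
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
print(f"Recall across depths: {rate:.0%}")
```

In a real test, `toy_model` would be replaced by a call to the model under evaluation, and the filler would be a long document corpus rather than generated sentences.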

Responsible Design and Ethical Considerations

After the recent controversy over Gemini generating “racially biased” images, mitigating bias in AI models has taken center stage.

Anthropic has dedicated teams that track and mitigate a broad spectrum of risks, ranging from misinformation and harmful content to biological misuse, election interference, and autonomous replication skills.

Anthropic has also developed methods such as Constitutional AI, which aims to improve the safety and transparency of its models by aligning their behavior with human intentions and guiding principles.

Addressing biases in increasingly sophisticated models is an ongoing effort, and Anthropic claims that Claude 3 exhibits less bias than its previous models, according to the Bias Benchmark for Question Answering (BBQ). The company remains committed to advancing techniques that reduce biases and promote greater neutrality in its models.

Easier to Use and Integrate

One of the key advantages of the Claude 3 models is their ease of use and integration. They are better at following complex, multi-step instructions and adhering to brand voice and response guidelines, making it simpler to develop customer-facing experiences that users can trust.

Additionally, the Claude 3 models are adept at producing structured output in formats like JSON, simplifying their integration into applications that require natural language classification, sentiment analysis, and other data processing tasks.
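To make the integration point concrete, here is a minimal Python sketch of how an application might validate a JSON reply for a sentiment-analysis task before using it. The reply string, the field names `sentiment` and `confidence`, and the function `parse_sentiment` are all assumptions made for illustration; they are not a format defined by Anthropic.

```python
import json

# Labels the application is prepared to accept (illustrative assumption).
ALLOWED_LABELS = ("positive", "negative", "neutral")


def parse_sentiment(raw_reply):
    """Parse and validate a reply such as {"sentiment": "positive", "confidence": 0.9}."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    label = data.get("sentiment")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected sentiment label: {label!r}")

    confidence = data.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        raise ValueError("confidence must be a number between 0 and 1")

    return label, float(confidence)


# Hypothetical model output, hard-coded so the example runs offline.
example_reply = '{"sentiment": "positive", "confidence": 0.92}'
label, confidence = parse_sentiment(example_reply)
print(label, confidence)
```

Validating the reply this way keeps a malformed or unexpected answer from silently flowing into downstream processing.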

Claude 3 Pricing and Availability

Anthropic has made the Claude 3 models widely available to developers and businesses. Opus and Sonnet are now available for use through the Claude API, which is generally available in 159 countries, enabling developers to sign up and start using these models immediately. Haiku will be available soon.

Sonnet is powering the free experience on claude.ai, with Opus available for Claude Pro subscribers.

Both Sonnet and Opus are also available through Amazon Bedrock and in private preview on Google Cloud’s Vertex AI Model Garden, with Haiku coming soon to both platforms.

In terms of pricing, the models are offered at varying cost levels, catering to different budgets and use cases:

Opus: $15 per million input tokens, $75 per million output tokens
Sonnet: $3 per million input tokens, $15 per million output tokens
Haiku: $0.25 per million input tokens, $1.25 per million output tokens
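To see what these rates mean in practice, the short Python sketch below estimates the cost of a single request from its token counts. The prices are the ones listed above; the function name `estimate_cost` and the example token counts are our own illustrative choices.

```python
# Prices in USD per million tokens (input, output), taken from the list above.
PRICES_PER_MILLION_TOKENS = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    input_price, output_price = PRICES_PER_MILLION_TOKENS[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: a full 200K-token prompt with a 1,000-token reply.
for model in PRICES_PER_MILLION_TOKENS:
    print(f"{model}: ${estimate_cost(model, 200_000, 1_000):.4f}")
```

For that example request, the estimate works out to $3.075 on Opus, $0.615 on Sonnet, and about $0.051 on Haiku, which shows how quickly long prompts add up on the larger models.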

Anthropic also plans to release a series of features that extend the models’ capabilities, particularly for enterprise use cases and large-scale deployments.

These new features will include Tool Use (aka function calling), interactive coding (aka REPL), and more advanced agentic capabilities, enabling the models to interact with other systems, code interactively, and deliver advanced autonomous capabilities.

First Usage Impression of Claude 3 Sonnet

Figure: First usage impressions of Claude 3, problems encountered

Anthropic has made Claude 3 Sonnet available for free on claude.ai. We used Claude 2.x daily, as it tended to give better responses than ChatGPT for text generation.

However, to our surprise, when we tried Sonnet, we found that:

  • The responses were slow
  • The reply option was disabled after the first output was generated
  • The refusal rate was very high
  • The natural language answers were “bad” compared to the previous version

We did not have a very good first impression of Claude 3 Sonnet. However, we expect Claude 3 to improve over time, as usually happens after a new model release.

Conclusion

With its enhanced vision processing, improved accuracy, and long context handling, Claude 3 sets a new bar for AI models, beating GPT-4 and Gemini Pro on the benchmarks Anthropic has reported. However, real-world test data and use cases will only emerge over time.

While our first impression of the upgrade was not what we expected, we trust Anthropic will address these teething problems and deliver a well-tuned model for everyday users and businesses alike.
