Surprisingly enough, it seems some AI agents aren’t quite up to scratch on some basic business tests


0

Unlock the Secrets of Ethical Hacking!

Ready to dive into the world of offensive security? This course gives you the Black Hat hacker’s perspective, teaching you attack techniques to defend against malicious activity. Learn to hack Android and Windows systems, create undetectable malware and ransomware, and even master spoofing techniques. Start your first hack in just one hour!

Enroll now and gain industry-standard knowledge: Enroll Now!


  • Salesforce research finds single-turn tasks see only 58% success, while multi-turn effectiveness drops to 35%
  • Reasoning models like gemini-2.5-pro tend to outperform lighter models
  • CRMArena-Pro has proven to be a challenging benchmark

Researchers from Salesforce AI Research have introduced a new benchmark – CRMArena-Pro – which uses synthetic enterprise data to access LLM agent performance in difference CRM scenarios.

It found LLM agents achieved around 58% success on tasks which can be completed in a single step, with tasks that require multiple interactions dropping in effectiveness to just 35% – barely more than one in three.



Unlock the Secrets of Ethical Hacking!

Ready to dive into the world of offensive security? This course gives you the Black Hat hacker’s perspective, teaching you attack techniques to defend against malicious activity. Learn to hack Android and Windows systems, create undetectable malware and ransomware, and even master spoofing techniques. Start your first hack in just one hour!

Enroll now and gain industry-standard knowledge: Enroll Now!

Don’t miss the Buzz!

We don’t spam! Read our privacy policy for more info.

🤞 Don’t miss the Buzz!

We don’t spam! Read more in our privacy policy


Like it? Share with your friends!

0

0 Comments

Your email address will not be published. Required fields are marked *