Surprisingly enough, it seems some AI agents aren’t quite up to scratch on some basic business tests




  • Salesforce research finds single-turn tasks see only 58% success, while multi-turn effectiveness drops to 35%
  • Reasoning models like gemini-2.5-pro tend to outperform lighter models
  • CRMArena-Pro has proven to be a challenging benchmark

Researchers from Salesforce AI Research have introduced a new benchmark, CRMArena-Pro, which uses synthetic enterprise data to assess LLM agent performance in different CRM scenarios.

It found LLM agents achieved around 58% success on tasks that can be completed in a single step, while success on tasks requiring multiple interactions dropped to just 35%, barely more than one in three.



