
    Surprisingly enough, it seems some AI agents aren’t quite up to scratch on some basic business tests




    • Salesforce research finds single-turn tasks see only 58% success, while multi-turn effectiveness drops to 35%
    • Reasoning models like gemini-2.5-pro tend to outperform lighter models
    • CRMArena-Pro has proven to be a challenging benchmark

    Researchers from Salesforce AI Research have introduced a new benchmark – CRMArena-Pro – which uses synthetic enterprise data to assess LLM agent performance in different CRM scenarios.

    The research found that LLM agents achieved around 58% success on tasks that can be completed in a single step, while success on tasks requiring multiple interactions dropped to just 35% – barely more than one in three.



