
    Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according to OpenAI’s own study



    • OpenAI has released GDPval, a new evaluation system to test how AI performs at work-related tasks
    • Claude Opus 4.1 comes out in the lead, with ‘GPT-5 high’ in second place
    • Tasks include things like emailing a response to a dissatisfied customer

    We’re all familiar with AI benchmarks, which measure performance on specific tasks, but those tasks often don’t reflect how people actually use AI in the real world, especially at work.

    To address this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, a new way of measuring AI model performance on real-world work tasks, comparing the models’ output against work done by human professionals across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.
