Let the AIs play games against each other: would the resulting leaderboard be more precise than benchmarks?
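One common way to turn head-to-head games into a leaderboard is an Elo rating. The sketch below is a minimal illustration, not a method from this thread: the player names, the game results, the K-factor of 32 and the starting rating of 1500 are all hypothetical choices.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_leaderboard(games, k: float = 32.0, start: float = 1500.0):
    """Replay games in order and return (name, rating) pairs, best first.

    games: iterable of (player_a, player_b, score_a), where score_a is
    1.0 for a win by A, 0.5 for a draw and 0.0 for a loss.
    """
    ratings = defaultdict(lambda: start)  # unseen players start at `start`
    for a, b, score_a in games:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (score_a - e_a)
        ratings[a] += delta  # A gains exactly what B loses (zero-sum update)
        ratings[b] -= delta
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical results: an engine beats two LLMs, the LLMs split their games.
games = [
    ("engine", "llm_a", 1.0),
    ("engine", "llm_b", 1.0),
    ("llm_a", "llm_b", 0.5),
    ("llm_b", "llm_a", 0.0),
]
board = elo_leaderboard(games)
for name, rating in board:
    print(f"{name}: {rating:.0f}")
```

Sequential Elo updates depend on the order the games are replayed in; for a fixed batch of results, an order-independent fit such as Bradley-Terry is an alternative.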

  • PlanterTree (OP)
    1 month ago

    The funny thing is that, AFAIK, LLMs are terrible at chess compared with purpose-built chess engines like Stockfish (see https://dynomight.net/more-chess/).

    They often suggest illegal moves.
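    To put a number on "often", one could count the fraction of a model's suggestions that are not in the legal-move list for their position. This is a minimal sketch assuming the legal moves (in SAN) come from a separate rules engine; it does not generate them itself, and the sample suggestions are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def illegal_move_rate(samples: Sequence[tuple[str, Iterable[str]]]) -> float:
    """Fraction of suggested moves that are not legal in their position.

    Each sample is (suggested_move, legal_moves_in_san). The legal-move list
    is assumed to be supplied by a rules engine; it is NOT computed here.
    """
    if not samples:
        return 0.0
    illegal = sum(1 for move, legal in samples if move.strip() not in set(legal))
    return illegal / len(samples)


# The 20 legal moves for White from the standard starting position.
START_LEGAL = [f"{f}{r}" for f in "abcdefgh" for r in "34"] + ["Na3", "Nc3", "Nf3", "Nh3"]

# Hypothetical model suggestions for the starting position.
samples = [
    ("e4", START_LEGAL),
    ("Nf3", START_LEGAL),
    ("Ke2", START_LEGAL),  # illegal: the e2 pawn blocks the king
    ("Bb5", START_LEGAL),  # illegal: the f1 bishop is boxed in by pawns
]
print(illegal_move_rate(samples))  # 0.5
```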

    You are a chess grandmaster. You will be given a partially completed game. After seeing it, you should choose the next move. Use standard algebraic notation, e.g. “e4” or “Rdf8” or “R1a3”. NEVER give a turn number. NEVER explain your choice.
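    A prompt like this asks for a bare move in standard algebraic notation, with no turn number and no explanation. A harness would first need to check that the reply has that shape before testing legality. The sketch below is a hypothetical format check only: the regex approximates SAN syntax and says nothing about whether the move is legal.

```python
import re

# Approximate SAN syntax: castling, piece moves with optional disambiguation,
# pawn moves/captures with optional promotion, then an optional check/mate sign.
SAN_RE = re.compile(
    r"(?:O-O(?:-O)?"                       # castling, kingside or queenside
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"    # piece moves, e.g. Rdf8, R1a3, Nxe5
    r"|(?:[a-h]x)?[a-h][1-8](?:=[QRBN])?"  # pawn moves, e.g. e4, exd5, e8=Q
    r")[+#]?"
)


def is_bare_san(reply: str) -> bool:
    """True if the whole reply is a single SAN-shaped move (syntax only)."""
    return SAN_RE.fullmatch(reply.strip()) is not None


for reply in ["e4", "Rdf8", "R1a3", "1. e4", "Nf3 because it develops"]:
    print(reply, is_bare_san(reply))
```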