- cross-posted to:
- hackernews@lemmy.bestiver.se
Let the AIs play games against each other. Is the resulting leaderboard more precise than benchmarks?
The funny thing: AFAIK, LLMs are terrible at chess compared with purpose-trained chess engines like Stockfish. https://dynomight.net/more-chess/
They often suggest illegal chess moves.