The original post: /r/localllama by /u/HauntingMoment on 2025-01-06 12:58:59.
Hi Guys! Very excited to share Lighteval, the evaluation framework we use internally at Hugging Face. Here are the key features:
- Python API: easily integrate Lighteval into your training loop and run evals between checkpoints (see the sketch right after this list).
- Speed: use vLLM as a backend for fast evals.
- Completeness: launch models from almost any provider and compare closed and open-source models side by side. Choose from local backends (transformers, vllm, tgi) or API providers (litellm, inference endpoints).
- Seamless Storage: Save results in S3 or Hugging Face Datasets.
- Custom Tasks: Easily add custom tasks (there's a sketch of one after the quickstart below).
- Versatility: Tons of metrics and tasks ready to go.
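To give you an idea of the training-loop integration, here's a minimal sketch using the Pipeline API. Treat it as a starting point, not gospel: exact module paths and parameter names can differ between Lighteval versions.

```python
# Minimal sketch of the Python API (module paths/params may vary by version).
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.vllm.vllm_model import VLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Where results land; the tracker can also push to the Hub or S3.
tracker = EvaluationTracker(output_dir="./results", save_details=True)

params = PipelineParameters(
    launcher_type=ParallelismManager.VLLM,
    max_samples=None,  # None = run the full benchmark
)

model_config = VLLMModelConfig(
    pretrained="meta-llama/Llama-3.1-70B-Instruct",
    dtype="bfloat16",
    use_chat_template=True,
)

pipeline = Pipeline(
    tasks="lighteval|gsm8k|5|1",
    pipeline_parameters=params,
    evaluation_tracker=tracker,
    model_config=model_config,
)

pipeline.evaluate()  # call this between training checkpoints
pipeline.save_and_push_results()
pipeline.show_results()
```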
Here is how to get started fast: evaluate Llama-3.1-70B-Instruct on the GSM8K benchmark and compare the results with OpenAI's o1-mini! (The task string reads suite|task|num_few_shot|truncation flag, where the last field is 0 or 1 to automatically reduce the number of few-shot examples if the prompt gets too long.)
pip install "lighteval[vllm,litellm]"
lighteval vllm "pretrained=meta-llama/Llama-3.1-70B-Instruct,dtype=bfloat16" "lighteval|gsm8k|5|1" --use-chat-template
lighteval endpoint litellm "o1-mini" "lighteval|gsm8k|5|1" --use-chat-template
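And since custom tasks came up above: adding one is basically a prompt function plus a task config in a Python file that you pass to the CLI via --custom-tasks. Here's a rough sketch following the community-tasks pattern from the docs; the task name and dataset repo are made up, and field names (especially metrics) can vary across versions.

```python
# custom_task.py -- rough sketch of a community task (field names may vary).
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc


def prompt_fn(line, task_name: str = None):
    # Turn one dataset row into a Doc: the prompt shown to the model,
    # the candidate answers, and the index of the gold answer among them.
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[line["answer"]],
        gold_index=0,
    )


my_task = LightevalTaskConfig(
    name="my_task",                  # made-up task name
    prompt_function=prompt_fn,
    suite=["community"],
    hf_repo="my_org/my_dataset",     # made-up dataset repo
    hf_subset="default",
    evaluation_splits=["test"],
    metric=["exact_match"],          # metric spelling differs across versions
)

# Lighteval discovers tasks through this module-level table.
TASKS_TABLE = [my_task]
```

You'd then run it with something like "community|my_task|0|0" as the task spec, adding --custom-tasks custom_task.py to the command.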
If you have strong opinions on evaluation and think there are still things missing, don't hesitate to contribute; we would be delighted to have your help building toward better and safer AI.