The original post: /r/localllama by /u/vaibhavs10 on 2025-01-06 20:37:59.
Hey hey everyone, VB from HF here. The SmolLM team at HF ran some training ablations on using high-quality math tokens to boost model perf.
The result: with just 160B of high-quality, commercially permissive tokens, Llama 3.2 3B (continually pre-trained) scored 2x higher on GSM8K and 3x higher on MATH*
*with minimal drop in perf on MMLU-Pro and no drop on HellaSwag
Our script for continual training with Nanotron is available in the smollm GitHub repo, along with everything needed to reproduce the training and ablation studies!
Go vibe check the model today!
- Model: https://huggingface.co/HuggingFaceTB/FineMath-Llama-3B
- Dataset: https://huggingface.co/datasets/HuggingFaceTB/finemath
- Reproduce the training/ablation: https://github.com/huggingface/smollm/tree/main/pre-training/continual-pretraining
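If you want to poke at the model and dataset from Python, here's a minimal sketch (not an official quick-start): it loads the checkpoint with `transformers`, completes a short grade-school-math-style prompt, and streams a few FineMath rows with `datasets`. The example prompt, the `finemath-4plus` config name, and the `text` column name are my assumptions; check the model and dataset cards for the exact details.

```python
# Minimal vibe-check sketch. Assumes transformers, datasets, and torch are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# 1) Load the continually pre-trained model and try a short math completion.
model_id = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# This is a base (not instruction-tuned) model, so use a completion-style prompt.
prompt = (
    "Question: A baker sells 12 muffins per tray and bakes 7 trays. "
    "How many muffins does the baker sell in total?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# 2) Stream a few FineMath samples without downloading the whole dataset.
# The "finemath-4plus" config and "text" column are assumptions; see the dataset card.
ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)
for row, _ in zip(ds, range(3)):
    print(row["text"][:300], "\n---")
```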