The original post: /r/localllama by /u/vaibhavs10 on 2025-01-06 20:37:59.
Hey hey everyone, VB from HF here. The SmolLM team at HF ran some training ablations on using high-quality math tokens to boost model perf.
The result: with just 160B of high-quality, commercially permissive tokens, Llama 3.2 3B (continually pre-trained) scored 2x higher on GSM8K and 3x higher on MATH*
*with minimal drop in perf on MMLU-Pro and no drop on HellaSwag
Our script for continual training with Nanotron is available in the smollm GitHub repo, along with everything needed to reproduce the training and ablation studies!
Go vibe check the model today!
- Model: https://huggingface.co/HuggingFaceTB/FineMath-Llama-3B
- Dataset: https://huggingface.co/datasets/HuggingFaceTB/finemath
- Reproduce the training/ablation: https://github.com/huggingface/smollm/tree/main/pre-training/continual-pretraining
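If you want to poke at the model and dataset from Python, here's a minimal sketch (not an official quick-start): it loads the checkpoint with `transformers`, completes a short grade-school-math-style prompt, and streams a few FineMath rows with `datasets`. The example prompt, the `finemath-4plus` config name, and the `text` column name are my assumptions; check the model and dataset cards for the exact details.

```python
# Minimal vibe-check sketch. Assumes transformers, datasets, and torch are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# 1) Load the continually pre-trained model and try a short math completion.
model_id = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# This is a base (not instruction-tuned) model, so use a completion-style prompt.
prompt = (
    "Question: A baker sells 12 muffins per tray and bakes 7 trays. "
    "How many muffins does the baker sell in total?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# 2) Stream a few FineMath samples without downloading the whole dataset.
# The "finemath-4plus" config and "text" column are assumptions; see the dataset card.
ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)
for row, _ in zip(ds, range(3)):
    print(row["text"][:300], "\n---")
```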