[Paper] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

rufus · edit-2 10 months ago

BetaDoggo_@lemmy.world · edit-2 11 months ago

This is big if true, but we’ll have to see how well it holds up at larger scales.

The paper is short, which is a bit worrying, but the authors are all very reputable. Several also contributed to the RetNet and Kosmos-2/2.5 papers.

rufus · edit-2 11 months ago

As far as I understand, their contribution is to take what has proven to work well in the Llama architecture and apply it to what BitNet does, and to add a ‘0’, so each weight is one of -1, 0 or 1. Maybe you just don’t need that much text to explain that, just the statistics.
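
To make the ‘0’ part concrete, here is a minimal sketch of ternary weight quantization. It assumes an absmean-style scale (divide by the mean absolute weight, round, clip to [-1, 1]); the function name `ternary_quantize` and the sample weights are made up for illustration, and the paper's actual training setup involves more than this. It also shows where the "1.58" comes from: three possible values carry log2(3) bits.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map each weight to -1, 0 or +1 (illustrative sketch, not the paper's code)."""
    # Scale by the mean absolute value of the weights (assumed absmean scheme).
    scale = sum(abs(w) for w in weights) / len(weights)
    # Round to the nearest integer, then clip into the ternary range.
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


# Information content of one ternary weight, in bits.
bits_per_weight = math.log2(3)

# Hypothetical weights, just to exercise the function.
quantized, scale = ternary_quantize([0.9, -0.05, -1.2, 0.4])
print(quantized, round(bits_per_weight, 2))
```

With only -1, 0 and 1 as weights, a matrix multiply reduces to additions, subtractions and skips, which is where the claimed efficiency gain would come from.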

They claim it scales the same way an FP16 Llama model does… So unless their judgement/maths is wrong, it should hold up. I can’t comment on that, but I’d like it to be true…
