Even_Adder@lemmy.dbzer0.com to LocalLLaMA@sh.itjust.works · English · 1 year ago

QuIP#: SOTA 2 bit LLMs


cross-posted to:
fosai@lemmy.world



Large language models (LLMs) exhibit amazing performance on a wide variety of tasks such as text modeling and code generation. However, they are also very large. For example, Llama 2 70B has 70 billion parameters, which require 140GB of memory to store in half precision. This presents many challenges, such as needing multiple GPUs just to serve a single LLM. To address these issues, researchers have developed compression methods that reduce the size of models without destroying performance.

One class of methods, post-training quantization, compresses trained model weights into lower precision formats to reduce memory requirements. For example, quantizing a model from 16 bit to 2 bit precision would reduce the size of the model by 8x, meaning that even Llama 2 70B would fit on a single 24GB GPU. In this work, we introduce QuIP#, which combines lattice codebooks with incoherence processing to create state-of-the-art 2 bit quantized models. These two methods allow QuIP# to significantly close the gap between 2 bit quantized LLMs and unquantized 16 bit models.
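The memory figures above follow from simple arithmetic. Here is a rough sketch in Python that reproduces them, assuming weights-only storage and 1 GB = 10^9 bytes. The helper name `weight_memory_gb` is made up for illustration and is not part of the QuIP# code; the estimate also ignores activations, the KV cache, and any codebook or scaling overhead, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter count quoted in the post for Llama 2 70B.
LLAMA2_70B_PARAMS = 70e9

fp16_gb = weight_memory_gb(LLAMA2_70B_PARAMS, 16)    # half precision
two_bit_gb = weight_memory_gb(LLAMA2_70B_PARAMS, 2)  # 2-bit quantized

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"2-bit weights:  {two_bit_gb:.1f} GB")
print(f"size reduction: {fp16_gb / two_bit_gb:.0f}x")
print(f"fits in 24 GB:  {two_bit_gb < 24}")
```

Under these assumptions the 2-bit weights come to about 17.5 GB, which is why a single 24GB GPU is plausible for the weights, with the remaining headroom shared by everything else needed at inference time.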

Project Page: https://cornell-relaxml.github.io/quip-sharp/

Code: https://github.com/Cornell-RelaxML/quip-sharp

rufus · 1 year ago
If anyone else is wondering how this compares to llama.cpp’s “2bit” quantization, there is an in-depth discussion here: https://github.com/ggerganov/llama.cpp/discussions/4327
