Discussion with one of the paper authors in llama.cpp: https://github.com/ggerganov/llama.cpp/discussions/4534
Thread by (apparently) a paper author on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/
Amazing speedup; too bad there is no AMD GPU support yet.