Discussion with one of the paper authors in llama.cpp: https://github.com/ggerganov/llama.cpp/discussions/4534
Thread by (apparently) a paper author on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/
Amazing speedup; too bad there is no AMD GPU support yet.