I've been playing on the KoboldAI Horde, but the queue annoys me. I want an NSFW AI to use with TavernAI chat.
I mean, you can run KoboldAI locally.
I don’t know whether you’d consider it fast enough, but if you’re already using KoboldAI and happy with it, running it locally is probably what I’d try first.
https://github.com/YellowRoseCx/koboldcpp-rocm
That one is optimized for AMD and, as far as I know, has the same or a very similar user interface.
(The 8 GB of VRAM on your graphics card will be something of a limitation, so maybe stick with smaller, quantized models.)
And share your success stories on !ChatbotsNSFW@lemmynsfw.com
There’s a fork of text-generation-webui with HIP support; you should use that.
The 7600 is the 16 GB one? I can’t say for AMD, but a 16 GB 3080 Ti can run a whole lot of something. I don’t do Kobold because building it was too much of a dependency headache. I don’t do SillyTavern either, because I prefer more control and versatility.
I’m using an 18 core 12th gen with 64GB of sysmem and mostly use llama.cpp so that I can split the load between CPU and GPU. I wrote a little command line function that polls nvidia-smi and parses the GPU memory to tell me exactly how much I have used and what I have left over. That runs every 5 seconds in the terminal and displays the metrics on the title bar. Knowing exactly how much RAM you’re using in the GPU and dialing in the settings with new models makes a big difference. The various models have very different requirements and settings optimisation potential.
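Not the commenter’s actual script, but a minimal sketch of that polling idea in Python, assuming the standard `nvidia-smi` query flags and an xterm-compatible terminal for the title bar:

```python
import subprocess
import time

def parse_gpu_memory(csv_line: str) -> tuple[int, int]:
    """Parse one 'used, total' line from nvidia-smi's CSV output (values in MiB)."""
    used, total = (int(field.strip()) for field in csv_line.split(","))
    return used, total

def poll_gpu_memory(interval: int = 5) -> None:
    """Every `interval` seconds, put used/free VRAM in the terminal title bar."""
    while True:
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        used, total = parse_gpu_memory(out.splitlines()[0])
        # \033]0;...\007 is the xterm escape sequence that sets the window title
        print(f"\033]0;VRAM {used}/{total} MiB ({total - used} free)\007",
              end="", flush=True)
        time.sleep(interval)
```

On an AMD card the same idea works with `rocm-smi` instead, just with different output parsing.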
I run an 8×7B quantized model at 5 bits most of the time. It takes around 50 GB to initially load, but runs like a 13B after that and is quite lightweight.
I’m somewhat limited when it comes to training LoRAs. I can only do 7–8B model stuff in that space, but with a GGUF I can run up to a 70B. I wish I had more than 64 GB of system memory, though. At 96 or 128 I could run some of the 120B models. Command R is pretty popular and powerful, but I can’t load that one.
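As a rough rule of thumb for sizing these (my own back-of-the-envelope, not from the commenter above): quantized weights take about parameter count times bits per weight divided by eight, and real usage lands higher once context and runtime overhead are added:

```python
def quant_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of quantized model weights in GB: params * bits / 8."""
    return params_billion * bits_per_weight / 8

# A Mixtral-style 8x7B has roughly 47B total parameters, so at 5 bits the
# weights alone are about 29 GB -- a ~50 GB load figure would also include
# context, KV cache, and loader overhead.
print(round(quant_weight_gb(47, 5), 1))   # weights-only estimate for 8x7B @ 5-bit
print(round(quant_weight_gb(70, 4), 1))   # a 70B at 4-bit
print(round(quant_weight_gb(120, 4), 1))  # why 120B models want 96+ GB
```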
The 16 GB can run something like Moistral 11B in Transformers at 4-bit using bitsandbytes, too.
How much speed are you actually getting on Mixtral (I assume that’s the 8×7B)? I have 64 GB of RAM and an AMD RX 6800 XT with 16 GB of VRAM. I get like 4 tokens per second with the Q5_K_M quant.
Get una-thebeagle-7b-v1.Q4_K_M. I found it looking at this guide.
I can’t clone it.
What do you mean?
Install ollama. It has ROCm support (on Linux, at least). Then hook it up to your favorite client: it has its own API as well as an OpenAI-compatible one.
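For reference, ollama’s native API listens on port 11434 by default. Here’s a minimal stdlib-only sketch of calling its `/api/generate` endpoint; the `mistral` model tag is just an example, use whatever you’ve pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send one generation request to a local ollama and return the text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Example model tag -- requires `ollama pull mistral` (or similar) first
    print(generate("mistral", "Say hello."))
```

Clients like SillyTavern can instead point at the OpenAI-compatible endpoint and treat it like any other OpenAI-style backend.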