• Onihikage@beehaw.org
      link
      fedilink
      English
      arrow-up
      11
      ·
      4 months ago

      I actually found GPT4ALL through looking into Kompute (Vulkan Compute), and it led me to question why anyone would bother with ROCm or OpenCL at all.

      • Fisch
        link
        fedilink
        English
        arrow-up
        5
        ·
        4 months ago

        I run models like Stable Diffusion and Llama with ROCm but models like RealESRGAN for upscaling or Rife for interpolation with Tencents Vulkan thingy (forgot what it’s called) and that’s far easier. Would be cool if LLMs and stuff could just be run with Vulkan too.

      • thingsiplay@beehaw.org
        link
        fedilink
        arrow-up
        2
        ·
        4 months ago

        OpenCL is needed for me for non AI stuff, so that Darktable (an image program) can use my GPU; which is much faster. But for AI? No idea how they compare, as I did not use it for that purpose. ROCm itself also is troubling…

        Do you have the new Llama 3.1 8B Instruct 128k model? It’s quite slow on my GPU (I have a weak beginner class GPU with 8GB, but plan to upgrade). To the point its almost as slow as my CPU. I’ve read complains in the Github tracker from others too and wonder if its an issue with AMD cards. BTW the previous model Llama 3.0 8B Instruct is miles faster.

        • Onihikage@beehaw.org
          link
          fedilink
          English
          arrow-up
          3
          ·
          4 months ago

          I have a fairly substantial 16gb AMD GPU, and when I load in Llama 3.1 8B Instruct 128k (Q4_0), it gives me about 12 tokens per second. That’s reasonably fast enough for me, but only 50% faster than CPU (which I test by loading mlabonne’s abliterated Q4_K_M version, which runs on CPU in GPT4All, though I have no idea if that’s actually meant to be comparable in performance).

          Then I load in Nous Hermes 2 Mistral 7B DPO (also Q4_0) and it blazes through at 50+ tokens per second. So I don’t really know what’s going on there. Seems like performance varies a lot from model to model, but I don’t know enough to speculate why. I can’t even try Gemma2 models, GPT4All just crashes with them. I should probably test Alpaca to see if these perform any different there…

          • thingsiplay@beehaw.org
            link
            fedilink
            arrow-up
            2
            ·
            edit-2
            4 months ago

            Hi I just wanted let you know that I managed to get Gemma 2 model to work (didn’t work previously too).

            These are the new ones Gemma 2. I wasn’t 100% sure first, so looked up at Gemma models list: https://ai.google.dev/gemma/docs/get_started and the only 9b variants are the new Gemma 2 versions (Edit: I mislooked. There are Gemma 1 versions with 9b too, so never mind this comment. ). If this works on my low end GPU, it should work on yours too.

          • thingsiplay@beehaw.org
            link
            fedilink
            arrow-up
            2
            ·
            4 months ago

            Wow it got worse for me. Maybe through last update? Is this probably related to he application? Now I get 12 t/s on my CPU and switching to GPU it’s only 1.5 t/s. Something is fishy. With Nous hermes 2 Mistral 7B DPO with q4 I get 33 t/s (I believe it was up to 44 before).

            Now I’m curious if this will happen with a different application too, but I have nothing else than GPT4All installed.

            • Onihikage@beehaw.org
              link
              fedilink
              English
              arrow-up
              2
              ·
              4 months ago

              Unfortunately I can’t even test Llama 3.1 in Alpaca because it refuses to download, showing some error message with the important bits cut off.

              That said, the Alpaca download interface seems much more robust, allowing me to select a model and then select any version of it for download, not just apparently picking whatever version it thinks I should use. That’s an improvement for sure. On GPT4All I basically have to download the model manually if I want one that’s not the default, and when I do that there’s a decent chance it doesn’t run on GPU.

              However, GPT4All allows me to plainly see how I can edit the system prompt and many other parameters the model is run with, and even configure multiple sets of parameters for the same model. That allows me to effectively pre-configure a model in much more creative ways, such as programming it to be a specific character with a specific background and mindset. I can get the Mistral model from earlier to act like anything from a very curt and emotionally neutral virtual intelligence named Jarvis to a grumpy fantasy monster whose behavior is transcribed by a narrator. GPT4All can even present an API endpoint to localhost for other programs to use.

              Alpaca seems to have some degree of model customization, but I can’t tell how well it compares, probably because I’m not familiar with using ollama and I don’t feel like tinkering with it since it doesn’t want to use my GPU. The one thing I can see that’s better in it is the use of multiple models at the same time; right now GPT4All will unload one model before it loads another.

              • thingsiplay@beehaw.org
                link
                fedilink
                arrow-up
                2
                ·
                edit-2
                4 months ago

                That’s quite unfortunate. Alpaca needs to support those explicitly to work with the new 3.1 128k models; GPT4All was not compatible with it before update either. There was a bug in some library they was using and needed a patch. So maybe that’s why you can’t use the new Llama 3.1 in Alpaca. (Edit: Never mind. On the webpage they advertise and talk about 3.1 being working, so a wrong guess by me probably.)

                Actually that sounds very useful and I missed that option, to be able to select from a set of related models. One thing that GPT4All can also do is, analyzing text files and then using the data to ask questions about it. It will also output the exact lines of the file in relation to the answer. I only experimented a little bit with this, but sounds useful too. The team also experiments and works on a web search using, but no idea how that would work with a local model if ever.

    • Fonzie!@ttrpg.network
      link
      fedilink
      arrow-up
      4
      ·
      4 months ago

      Bookmarking it to check this out after work

      … I should really go through these bookmarks one day

      • thingsiplay@beehaw.org
        link
        fedilink
        arrow-up
        3
        ·
        4 months ago

        I have a separate “ToDo” bookmark folder with temporary content, that I want to look in the near future. And for things I am looking into in near future, the pages are already in the browser open as tabs and loaded everytime I start the browser (but in an unloaded state until I click it).

        … I also should really go through these bookmarks and tabs one day.^^

        • Fonzie!@ttrpg.network
          link
          fedilink
          arrow-up
          2
          ·
          4 months ago

          I bookmarked it in Lemmy, available through both my PC browser and my mobile app. But I’m not sure if I can make bookmark folders/groups there.

          • thingsiplay@beehaw.org
            link
            fedilink
            arrow-up
            2
            ·
            4 months ago

            Oh right. I never used the Lemmy bookmarking. And was thinking of browser bookmarks (Firefox). Right. I never thought about that.

            • Fonzie!@ttrpg.network
              link
              fedilink
              arrow-up
              2
              ·
              4 months ago

              It’s nice to have any device with access to my Lemmy account also have access to my bookmarks.

              … So I can ignore them all those devices simultaneously 😅

  • cmgvd3lw
    link
    fedilink
    arrow-up
    2
    ·
    4 months ago

    While downloading models, the progress bar is getting decreased sometimes, like from 11% it’ll go back to 10%. Wired.