• simple@lemmy.mywire.xyz
    1 year ago

    Yeah, llama.cpp with SuperHOT support would be great, and yes, I’m using exllama with the oobabooga UI. I found out why I’m getting garbage output at 2K: it seems that SuperHOT 8K models show a massive increase in perplexity when run with a 2K context.

    (The higher the perplexity, the worse the output quality.)
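
For anyone unsure what the number means: perplexity is usually defined as the exponential of the average negative log-likelihood per token. Here is a minimal sketch of that definition, not how exllama or oobabooga actually measure it. The function name and the sample log-probabilities are made up for illustration.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical per-token log-probs (illustrative values, not real model output).
confident = [math.log(0.5)] * 4   # model gives each correct token p = 0.5
unsure = [math.log(0.1)] * 4      # model gives each correct token p = 0.1

print(perplexity(confident))  # roughly 2
print(perplexity(unsure))     # roughly 10
```

So a model that assigns lower probability to the tokens that actually occur gets a higher perplexity, which is why a big jump tends to go along with garbage output.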

    So I’ll need to figure out if I can get at least 4K running without running out of VRAM.

    Also, there is a new PR for exllama that uses a different method (not SuperHOT) to get higher context and has a smaller perplexity increase, so it might be a better alternative.

    • notfromhere@lemmy.oneOP
      1 year ago

      I read the author’s blog post on SuperHOT, and it sounded like it kept perplexity very low with large contexts rather than increasing it. I could have read it wrong, but I thought it wasn’t supposed to increase perplexity.

      • simple@lemmy.mywire.xyz
        1 year ago

        The increase in perplexity is very small, but there is still some with 8K context. With 2K, though, it seems to be much larger. I could be misunderstanding something myself, but my little test with 2K context does suggest there’s something going on with 2K contexts on SuperHOT models.