• vrighter
    4 months ago

    1 - A Markov chain takes only the previous tokens as input.

    2 - It uses a function (in the mathematical sense: the same input always gives the same output, with no state) to generate a set of probabilities for what the next token might be.

    3 - The most probable token is picked, or randomness (temperature) is introduced here to occasionally choose a different token.
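The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any real model is implemented: the four-word vocabulary, the `follow` rule, and the names `next_token_logits`, `softmax`, `pick_token` and `generate` are all made up for the example. The point is that step 2 is a pure function of the previous tokens, and the only place randomness enters is the pick in step 3.

```python
import math
import random

# Hypothetical toy vocabulary, assumed only for this example.
VOCAB = ["the", "cat", "sat", "."]


def next_token_logits(context):
    """Step 2: a pure function of the previous tokens (step 1).

    Toy rule (an assumption): strongly prefer one fixed follower of the
    last token. Same context in, same scores out, no hidden state.
    """
    follow = {"the": "cat", "cat": "sat", "sat": ".", ".": "the"}
    last = context[-1] if context else "."
    return [2.0 if tok == follow[last] else 0.0 for tok in VOCAB]


def softmax(logits, temperature=1.0):
    """Turn scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Step 3: take the most probable token, or sample if temperature > 0."""
    if temperature == 0:
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        return VOCAB[best]
    probs = softmax(logits, temperature)
    return rng.choices(VOCAB, weights=probs, k=1)[0]


def generate(prompt, steps, temperature=0.0, seed=0):
    """Repeat steps 1-3, appending one token per iteration."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        logits = next_token_logits(tuple(tokens))
        tokens.append(pick_token(logits, temperature, rng))
    return tokens
```

With `temperature=0` the whole loop is deterministic; with a positive temperature the output varies only because of the random pick, and fixing `seed` makes even that repeatable.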

    The part of an LLM that gets trained is literally the function used in step 2. You could implement this function in a number of ways; e.g., you could build a huge table and consult it, or you could generate it somehow. You could train a big neural network that takes the previous tokens as input and outputs a probability for each possible next token. Enumerate its outputs for every possible input sequence and there’s your table. That would take too much time and space, so we just run the function on demand instead. Exact same result. The network can be very smart and notice correlations, but ultimately it defines a (virtual) huge static table. This is a completely deterministic process: a trained NN is still a (huge) mathematical function.
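To make the "table versus function" point concrete, here is a minimal sketch with a deliberately tiny setup. The three-token vocabulary, the context limit of 3, and the counting rule inside `next_token_probs` are assumptions chosen so the enumeration is feasible; a real network is a far more complicated function, but it is still a function of the same kind.

```python
from itertools import product

# Hypothetical toy setup, small enough that the full table fits in memory.
VOCAB = ("a", "b", "c")
MAX_CONTEXT = 3


def next_token_probs(context):
    """Stand-in for the trained function: context tuple -> probabilities.

    Toy rule (an assumption): tokens seen more often in the context
    become more likely, with +1 smoothing so nothing has probability 0.
    """
    counts = [1 + context.count(tok) for tok in VOCAB]
    total = sum(counts)
    return tuple(c / total for c in counts)


def build_table():
    """Enumerate every context up to MAX_CONTEXT tokens and store the output."""
    table = {}
    for length in range(MAX_CONTEXT + 1):
        for context in product(VOCAB, repeat=length):
            table[context] = next_token_probs(context)
    return table


TABLE = build_table()

# Looking a context up in the table and calling the function agree exactly.
SAME_EVERYWHERE = all(TABLE[ctx] == next_token_probs(ctx) for ctx in TABLE)
```

Even here the table already has 1 + 3 + 9 + 27 = 40 rows. The row count grows as the vocabulary size raised to the context length, which is why nobody materializes the table and the function is evaluated on demand instead.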

    Step 3 is the cause of hallucinations. It’s the only nondeterministic part, and it’s not part of the LLM itself in any way. No matter how much smarter the neural network gets, the hallucinations are introduced mainly in step 3. So no, they won’t be solving the LLM hallucination problem anytime soon.