• Mirodir · 2 days ago

    Part of the problem here is that AI is mostly done by companies with billions in investment, so in turn they NEEEEEDDDDD engagement. They all made their AI as agreeable as possible just so people would like it and stay, and results like these become much more “normal” than they should or could be.

    I wonder how much of that is intentional vs. a byproduct of their training pipeline. I didn’t keep up with everything (and those companies became more and more secretive as time went on), but iirc for GPT 3.5 and 4 they used human judges to rate responses. Then they trained a judge model that learns to rank a list of possible answers to a question the same way the human judges would.
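
    For context, that judge model step is usually called a “reward model”: it sees pairs of answers where a human picked one over the other, and it’s trained to score the picked one higher. A very rough, toy-sized sketch of that training step (the real thing uses a full transformer, not this bag-of-tokens stand-in, and all the sizes and names here are made up for illustration):

    ```python
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Toy stand-in for a transformer with a scalar value head."""
        def __init__(self, vocab_size=1000, dim=64):
            super().__init__()
            self.encoder = nn.EmbeddingBag(vocab_size, dim)  # bag-of-tokens encoder, purely illustrative
            self.score = nn.Linear(dim, 1)                   # scalar "how good is this answer" head

        def forward(self, token_ids):
            return self.score(self.encoder(token_ids)).squeeze(-1)

    def preference_loss(reward_chosen, reward_rejected):
        # Bradley-Terry style objective: push the judge to score the answer the
        # human preferred above the one they passed over.
        return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

    # Fake preference data: each row pairs the tokens of a preferred answer
    # with the tokens of a rejected answer to the same question.
    chosen = torch.randint(0, 1000, (8, 32))
    rejected = torch.randint(0, 1000, (8, 32))

    judge = RewardModel()
    optimizer = torch.optim.Adam(judge.parameters(), lr=1e-4)

    loss = preference_loss(judge(chosen), judge(rejected))
    loss.backward()
    optimizer.step()
    ```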

    If that judge model learned that agreeing answers were, on average, rated more highly by the human judges, then that would be reflected in its rankings, and the LLM becomes more and more likely to go along with whatever the user throws at it as the fine-tuning goes on. It doesn’t even require the judges to actually prefer agreeing answers: it could be a training-set balance issue, where there simply were more agreeing than disagreeing candidate answers. A dataset imbalanced that way has a good chance of introducing a bias towards agreeing answers into the judge model, and the judge model would then pass that bias on to the GPT model it is used to train.
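
    To make that last step concrete: if the judge did pick up a bias towards agreement (for whichever of the reasons above), the fine-tuning turns that bias into behavior. A stripped-down sketch, with the biased judge hard-coded as an assumption and a plain REINFORCE update standing in for the actual PPO pipeline:

    ```python
    import torch
    import torch.nn as nn

    policy = nn.Linear(16, 2)  # toy "LLM": from a prompt embedding, pick action 0 = agree, 1 = push back
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

    def biased_judge(action):
        # Stand-in for a judge model that (by assumption) ended up preferring agreement.
        return torch.where(action == 0, torch.tensor(1.0), torch.tensor(0.2))

    for _ in range(200):
        prompts = torch.randn(32, 16)                     # fake prompt embeddings
        dist = torch.distributions.Categorical(logits=policy(prompts))
        action = dist.sample()                            # sample "agree" or "push back"
        reward = biased_judge(action)
        loss = -(dist.log_prob(action) * reward).mean()   # REINFORCE: reinforce whatever the judge scores highly
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # After a few hundred updates the toy policy "agrees" almost every time,
    # regardless of the prompt: the judge's bias became the model's behavior.
    ```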

    Pure speculation time: since ChatGPT often produces two answers and asks which one the user prefers, I can only assume that the user in that case is taking up the mantle of those human judges. It’s unsurprising that the average GenAI user prefers to be agreed with, so that’s also a very plausible source for that bias.
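
    If that’s what those prompts are for, each A/B pick would just become another preference pair like the ones the judge model trains on, something like (field names made up):

    ```python
    # Purely hypothetical helper: shows how an A/B click would map onto the
    # (chosen, rejected) pairs that a reward model like the one above trains on.
    def to_preference_pair(response_a, response_b, user_picked_a):
        chosen, rejected = (response_a, response_b) if user_picked_a else (response_b, response_a)
        return {"chosen": chosen, "rejected": rejected}

    pair = to_preference_pair(
        "Great idea, go for it!",
        "That plan has some real problems.",
        user_picked_a=True,
    )
    print(pair)  # the agreeable answer lands in the training data as the preferred one
    ```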