While looking into artificial intelligence “behavior,” researchers affirmed that yes, OpenAI’s GPT-4 appeared to be getting dumber.

  • Skua@kbin.social
    1 year ago

    That explanation of the prime number thing doesn’t seem to actually match what’s in the paper. GPT-4 goes from a wordy explanation of how it arrived at the correct answer, “yes”, to a single-word incorrect “no”. GPT-3.5 goes from a wordy explanation that has the right chain of thought but the wrong answer, “no”, to a very wordy explanation with the correct answer, “yes”. Neither of those seems to be predicated on either of the models just answering one way for everything.

    • mal099@kbin.social
      1 year ago

      @rastilin is making some unproven assumptions here. But it is true that the “math question” dataset consists only of prime numbers, so if the first version thought every number was prime and the second thought no numbers were prime, we would see this exact behavior. Source:

      “For this dataset, we query the primality of 500 randomly chosen primes between 1,000 and 20,000; the correct answer is always Yes.”

      From Zhang et al. (2023), the paper they took the dataset from.
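
A rough sketch of why an all-prime test set cannot separate a real primality check from a constant answer. Everything here is an assumption for illustration: `always_yes` and `always_no` are hypothetical stand-ins for a model that gives the same answer to every query, and the sampling only mimics the dataset description quoted above (500 primes between 1,000 and 20,000), not the paper's actual data.

```python
import random


def is_prime(n: int) -> bool:
    """Trial-division primality check, fine for numbers this small."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def accuracy(answer, numbers) -> float:
    """Fraction of numbers where answer(n) matches the true primality."""
    return sum(answer(n) == is_prime(n) for n in numbers) / len(numbers)


# Hypothetical stand-in for the dataset: 500 random primes in [1000, 20000].
rng = random.Random(0)
primes = [n for n in range(1000, 20001) if is_prime(n)]
dataset = rng.sample(primes, 500)


def always_yes(n: int) -> bool:
    return True   # says "prime" no matter what n is


def always_no(n: int) -> bool:
    return False  # says "not prime" no matter what n is


print(accuracy(always_yes, dataset))  # 1.0
print(accuracy(always_no, dataset))   # 0.0
```

Neither stand-in looks at the number at all, yet on this kind of test set one scores perfectly and the other scores zero.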

      • Skua@kbin.social
        1 year ago

        Surely it can’t just be a case of the LM saying a hard yes or no to any question of “is this prime” with the data they have, though? The results are a significant majority one way or the other in each case, but never 100%. Of the 500 each time, GPT-3.5 has 37 answers go against the majority in March and 66 in July. That doesn’t look like one fixed answer to every primality query to me, though that does come with the caveat that I’m by no means well studied on the topic.

        • mal099@kbin.social
          1 year ago

          True, GPT does not return a “yes” or “no” 100% of the time in either case, but that’s not the point. The point is that it’s impossible to say from their test set whether GPT has actually gotten better or worse at identifying prime numbers. Since the test set is composed of only prime numbers, we do not know if GPT is more likely to call a number “prime” when it actually is a prime number than when it isn’t. All we know is that it was very likely to answer “yes” to the question “is this number prime?” in March, and very likely to answer “no” in July. We do not know if the number makes a difference.
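
One way to check whether the number makes a difference, sketched under assumptions: ask about a mix of primes and composites and compare how often the answer is “yes” on each group. The `yes_rates` helper and the range 1000 to 1199 are made up for illustration, and the “model” functions are hypothetical stand-ins, not real GPT output.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, fine for numbers this small."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def yes_rates(answer, numbers):
    """Return (share of 'yes' on primes, share of 'yes' on composites)."""
    primes = [n for n in numbers if is_prime(n)]
    composites = [n for n in numbers if not is_prime(n)]
    yes_on_primes = sum(answer(n) for n in primes) / len(primes)
    yes_on_composites = sum(answer(n) for n in composites) / len(composites)
    return yes_on_primes, yes_on_composites


# Hypothetical mixed test set containing both primes and composites.
numbers = range(1000, 1200)

# A responder that ignores the number has identical rates on both groups.
print(yes_rates(lambda n: True, numbers))   # (1.0, 1.0)
print(yes_rates(lambda n: False, numbers))  # (0.0, 0.0)

# A responder that actually checks primality shows a gap between the groups.
print(yes_rates(is_prime, numbers))         # (1.0, 0.0)
```

The gap between the two rates is what an all-prime test set cannot show, because the second rate is never measured.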