• moosetruce@beehaw.org
    link
    fedilink
    English
    arrow-up
    7
    arrow-down
    1
    ·
    1 year ago

    I tested by asking ChatGPT 3.5 specific questions about The Bedwetter, and it seems like it was not trained on the full text of the book. I asked it what is the first sentence, and then what is the second paragraph, and it gave plausible but incorrect answers. I asked it for the table of contents, and then if a specific chapter was in the book, and it said “my responses are generated based on pre-existing data and do not have real-time access to specific book content”. I asked who wrote the foreward, and who wrote the afterward. It said Patton Oswalt wrote the foreward and that there is no afterward. In reality, Sarah wrote the foreward and God wrote the afterward.

    ChatGPT conversation
    Table of contents and first chapter from Google Books.

    • technojamin@beehaw.org
      link
      fedilink
      English
      arrow-up
      10
      ·
      1 year ago

      LLMs compress data, there’s no way ChatGPT could remember every detail of the book alongside all the other information it stores in its encodings. The issue isn’t whether the entire text of the book is contained within the encodings, it’s whether it was trained on the book in the first place.