Also, how do you know it read the book, and not a summary of it, of which there are loads on the internet?
In the case of ChatGPT, it’s hard to tell. OpenAI won’t even reveal what their training dataset was.
Researchers have done some tests to tease this out, and they’re pretty confident that it has read quite a few books and memorized them verbatim. See one of my favorite papers in a while, Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4.
Reading the paper: even for the best-performing books, it guesses a name in a passage correctly more than seventy percent of the time on only 5 of the over 500 books tested. On “The Fellowship of the Ring” it scored barely over 50%, and that’s hardly a little-known book. These LLMs are clearly familiar with the content, but I would hardly call that memorizing verbatim. (Humans are also reasonably good at this after reading a book.)