Sarah Silverman Sues Maker Of ChatGPT For Copyright Infringement

DL :)@lemmy.ml · 1 year ago

Margot Robbie@lemmy.world · 1 year ago

She’s going to lose the lawsuit. It’s an open and shut case.

“Authors Guild, Inc. v. Google, Inc.” is the precedent case, in which the US Supreme Court established that transformative digitization of copyrighted material inside a search engine constitutes fair use, and text used for training LLMs is even more transformative than book digitization, since it is nearly impossible to reconstitute the original work barring extreme overtraining.

You have to understand why styles can’t and shouldn’t be copyrightable: that would honestly be a horrifying prospect for art.

patatahooligan@lemmy.world · 1 year ago

“Transformative” in this context does not mean simply not identical to the source material. It has to serve a different purpose and to provide additional value that cannot be derived from the original.

The summary that they talk about in the article is a bad example for a lawsuit because it is indeed transformative. A summary provides a different sort of value than the original work. However, if the same LLM writes a book based on the books used as training data, then it is definitely not an open and shut case whether this is transformative.

Margot Robbie@lemmy.world · 1 year ago

But what an LLM does meets your listed definition of transformative as well. It does provide additional value that can’t be derived from the original, because everything it outputs is completely original, merely similar in style to the source, and is something that you can’t use to reconstitute the original work. In other words, it is similar to fan work, which is also why the current ML models, text2text or text2image, are called “transformers”. Again, works similar in style to the original cannot and should not be considered copyright infringement, because that’s a can of worms nobody actually wants to open, and the courts have been very consistent on that.

So, I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work, that they wouldn’t consider digitalizing copyrighted material in a database with very lossy compression (that’s a more accurate description of what LLMs are, please give this a read if you have time) fair use as well. Of course, with the current Roberts court, there is always the chance that weird things can happen, but I would be VERY surprised.

There is also the previous ruling that raw transformer output cannot be copyrighted, but that’s beyond the scope of this post for now.

My problem with LLM outputs is mostly that they are just bad writing, and I’ve been pretty critical of """Open"""AI elsewhere on Lemmy, but I don’t see Silverman’s case going anywhere.

patatahooligan@lemmy.world · 1 year ago

But what an LLM does meets your listed definition of transformative as well

No it doesn’t. Sometimes the output is used in completely different ways but sometimes it is a direct substitute. The most obvious example is when it is writing code that the user intends to incorporate into their work. The output is not transformative by this definition as it serves the same purpose as the original works and adds no new value, except stripping away the copyright of course.

everything it outputs is completely original

[citation needed]

that you can’t use to reconstitute the original work

Who cares? That has never been the basis for copyright infringement. For example, as far as I know I can’t make and sell a doll that looks like Mickey Mouse from Steamboat Willie. It should be considered transformative work. A doll has nothing to do with the cartoon. It provides a completely different sort of value. It is not even close to being a direct copy or able to reconstitute the original. And yet, as far as I know I am not allowed to do it, and even if I am, I won’t risk going to court against Disney to find out. The fear alone has made sure that we mere mortals cannot copy and transform even the smallest parts of copyrighted works owned by big companies.

I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work

Which case are you citing? Context matters. LLMs aren’t just a database. They are also a frontend for extracting the data from these databases, one that is being heavily marketed and sold to people who might otherwise have bought the original works instead.

The lossy compression is also irrelevant, otherwise literally every pirated movie/series release would be legal. How lossy is it even? How would you measure it? I’ve seen GitHub Copilot spit out verbatim copies of code. I’m pretty sure that if I ask ChatGPT to recite a very well-known poem, it will also be a verbatim copy. So there are at least some works that are included completely losslessly. Which ones? No one knows, and that’s a big problem.

Margot Robbie@lemmy.world · 1 year ago

I’m tired of internet arguments. If you are not going to make a good faith attempt to understand anything I said, then I see no point in continuing this discussion further. Good day.

Ziro@lemmy.world · edited 1 year ago

Let’s remove the context of AI altogether.

Say, for instance, you were to check out and read a book from a free public library. You then go on to use some of the book’s content as the basis of your opinions. What’s more, you also absorb some of the common language structures used in that book and unwittingly use them yourself when you speak or write.

Are you infringing on copyright by adopting the book’s views and using some of the sentence structures its author employed? At what point can we say that an author owns the language in their work? Who owns language, in general?

Assuming that a GPT model cannot regurgitate verbatim the contents of its training dataset, how is copyright applicable to it?

Edit: I would also imagine that if we were discussing an open source LLM instead of GPT-4 or GPT-3.5, sentiment here would be different. What’s more, I imagine that some of the ire here stems from a misunderstanding of how transformer models are trained and how they function.

patatahooligan@lemmy.world · 1 year ago

Let’s remove the context of AI altogether.

Yeah sure if you do that then you can say anything. But the context is crucial. Imagine that you could prove in court that I went down to the public library with a list that read “Books I want to read for the express purpose of mimicking, and that I get nothing else out of”, and on that list was your book. Imagine you had me on tape saying that for me writing is not a creative expression of myself, but rather I am always trying to find the word that the authors I have studied would use. Now that’s getting closer to the context of AI. I don’t know why you think you would need me to sell verbatim copies of your book to have a good case against me. Just a few passages should suffice given my shady and well-documented intentions.

Well that’s basically what LLMs look like to me.