NY Times copyright suit wants OpenAI to delete all GPT instances

geese_feces [comrade/them, love/loves]@hexbear.net · 11 months ago

drhead [he/him]@hexbear.net · 11 months ago

In this case, NYT is most likely just looking for a cut of the money. Their claims are too absurd to hold up under scrutiny. Nobody is using ChatGPT to bypass NYT’s paywall on whatever years-old content is actually in the training data; people are using browser extensions for that. I would also like to know who the claim that ChatGPT hallucinating about NYT articles is damaging NYT’s reputation is aimed at.

One of the more significant things that could happen is that OpenAI could be forced to disclose details of their training data as part of discovery, which they really will not want to do. It would then be pretty easy to gauge exactly how overfit ChatGPT is. OpenAI hasn’t published parameter counts, but going by the figures floating around, GPT-4 has about 1.1 trillion parameters, which would be around a terabyte or more in size depending on what precision they run it at, and I think 3.5 is closer to 350B. If the dataset has less entropy than the model parameters can hold, it is effectively guaranteed to start spitting out exact copies of training data. That would also be very useful info for OpenAI’s competitors, so OpenAI will try to get the suit dismissed or settle before then. Deleting their dataset like NYT is demanding is absolutely not going to happen, since at most NYT has standing to make OpenAI delete NYT’s articles from the training dataset. Finetuning the model to refuse NYT-related requests would also be enough to stop the model from infringing on their copyrights.
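For anyone who wants to check the "terabyte or more" arithmetic, here is a rough back-of-envelope sketch. The parameter counts are the unconfirmed figures from this comment, not anything OpenAI has published, and the function names and precision table are made up for illustration. It only counts raw weights, nothing else a deployed model needs.

```python
# Rough estimate of raw weight storage: parameters * bits per parameter / 8.
# Parameter counts below are the unconfirmed figures quoted in the comment.

BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}

GPT4_PARAMS = 1_100_000_000_000   # assumed: ~1.1 trillion
GPT35_PARAMS = 350_000_000_000    # assumed: ~350 billion


def model_size_bytes(num_params: int, precision: str) -> int:
    """Bytes needed to store the weights alone at the given precision."""
    return num_params * BITS_PER_PARAM[precision] // 8


def format_tb(num_bytes: int) -> str:
    """Format a byte count in decimal terabytes (1 TB = 10**12 bytes)."""
    return f"{num_bytes / 1e12:.2f} TB"


if __name__ == "__main__":
    for name, params in [("GPT-4", GPT4_PARAMS), ("GPT-3.5", GPT35_PARAMS)]:
        for precision in BITS_PER_PARAM:
            size = model_size_bytes(params, precision)
            print(f"{name} @ {precision}: {format_tb(size)}")
```

So at 8-bit precision the assumed GPT-4 figure works out to roughly one terabyte of weights, and higher precisions scale that up linearly.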

They might also be angling for government regulation, with a lawsuit making bold claims that they expect to catch headlines and shape public opinion but don’t completely expect to stick in court. That’s a recurring pattern in a lot of lawsuits against AI firms. The Stable Diffusion lawsuit, for example, contained absolute bangers like the claim that it stores compressed images just like JPEG compression, and that the text-prompt interface “creates a layer of magical misdirection that makes it harder for users to coax out obvious copies of training images” (this is actually in the announcement for that lawsuit, I’m not making this shit up). It’s really not surprising that most of that suit got thrown out.

There’s no real endgame for them where they get anything further than a cut. AI companies can still train on copyright-free or licensed data and over time will get similar results, so there’s not really anything that can be done to stop that in general. Copyright-reliant industries can certainly secure themselves a better position within that, though, where they might be able to gain either a steady income from licensing fees or exclusive use of their content for models under their control.

WithoutFurtherBelay@hexbear.net · 11 months ago

Their claims aren’t that absurd; their articles likely were all used as training data. You could make an argument that that is a copyright violation anyway.

I don’t AGREE with copyright but I don’t think the concept is absurd, especially when you’ve already established that legally protecting information behind paywalls is allowed (also stupid).

drhead [he/him]@hexbear.net · 11 months ago

Using it for training data is one thing, but that’s not all that’s being claimed. Merely using it isn’t enough for it to be infringement because fair use can be a defense, and quite likely a viable one if it wasn’t spitting articles out verbatim. People already do use copyrighted data from news sites verbatim to make new products that do something different, like search engines, or for other things that are of significant public interest, like archival. People also do republish articles from news sites, with or without attribution. So for the basic case of copyright infringement by training, NYT has to show that what ChatGPT is doing is more akin to republishing than to what a search engine does in order to get something that sticks in court.

They are, among other things, effectively asking for compensation as if people were using ChatGPT as an alternative to buying a NYT subscription, which is just the type of clown shit that only a lawyer could come up with. At the same time, they are also asking for compensation for defamation when it fails to reproduce an article and makes shit up instead. If this case keeps going, those claims are going to end up getting dismissed like a lot of the claims in the Andersen v. Midjourney/Stability AI/DeviantArt case did. The lawyers involved know this. They’re probably expecting the infringement-by-training claim to stick and consider the others to be bonuses that would be nice to have. It’s probably also a door-in-the-face technique.

A settlement is probably still more likely, because at the end of the day OpenAI would much rather avoid going through what this case will require of them during discovery, and the most significant claim NYT has against them is literally demonstrating a failure mode of the model, which OpenAI will want to fix whether or not there are copyright issues involved (maybe by not embodying the “STACK MORE LAYERS” meme so much next time). After that’s fixed, the rest of what NYT has against them will be much more difficult to argue in court.
