For images, there is Nightshade. For music, there is (or will be) whatever Benn Jordan is doing. For YouTube, there is .ASS. But what about poisoning text on a web page? Is there any standard solution out there?

It should be relatively easy. I’ve been thinking about doing something myself, but figured someone else must have already done it.

  • e8d79M · 8 upvotes · 7 days ago

    You can target the crawlers using tar pits and proof-of-work application firewalls, but I am doubtful that poisoning does anything. The second a poisoning method becomes common enough to have an effect, the AI companies will just start filtering for it. Unfortunately, the only ways I see to prevent your work from being stolen are to either not publish it at all or to publish only to smaller invite-based communities that closely monitor who is accepted.

    • shoki@lemmy.world · 2 upvotes · 7 days ago

      You could also have a unique challenge, for example showing the user an image with instructions to append some text to the URL. Anything that scrapers are too stupid for (I don’t think they are scraping using “intelligent” AI agents yet).
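A rough sketch of the server side of that append-to-the-URL idea, under some assumptions: the code shown in the image is derived from the page path with an HMAC, so the server does not have to store anything per page. `SECRET_KEY`, `challenge_code` and `unlock` are hypothetical names, and drawing the code into an image is left out.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to clients


def challenge_code(page_path: str, secret: bytes = SECRET_KEY) -> str:
    """Short code to render into the challenge image for this page."""
    mac = hmac.new(secret, page_path.encode(), hashlib.sha256).hexdigest()
    return mac[:6]


def unlock(request_path: str, secret: bytes = SECRET_KEY) -> Optional[str]:
    """Return the real page path if the URL ends with the right code, else None."""
    page_path, sep, supplied = request_path.rpartition("/")
    if not sep or not page_path:
        return None
    expected = challenge_code(page_path, secret)
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    if hmac.compare_digest(supplied.encode(), expected.encode()):
        return page_path
    return None
```

A visitor who reads the image for `/posts/42` would request `/posts/42/<code>`, and `unlock` would hand back `/posts/42`. A scraper that only follows links in the HTML never sees the code. Since the code is the same for every visitor to a page, mixing a session ID into the HMAC input would be a natural next step.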