Training "AI" On Public Data Is Totally Fine And Not Stealing.

31337@sh.itjust.works · 5 months ago

jordanlund@lemmy.world · 5 months ago

Generally the argument isn’t public vs. private; it’s public domain vs. copyrighted.

You want to train an LLM using the contents of Project Gutenberg? Great, go for it!

You want to train an LLM using bootlegged epubs stolen from Amazon? Now that’s a different deal.

troed@fedia.io · 5 months ago

Sure - they’d need to at least borrow the epubs, just like a human would if they wanted to read them.