Is this for real? I can buy a book, scan it and put it on the internet and it wouldn’t be piracy? Or is this just the usual “it’s not a crime if rich people/evilcorps do it” bs?
Putting the scan on the internet intact would be piracy. Putting up snippets is mostly OK. Ingesting the scans of millions of books into a massive data set and then regurgitating pieces of the masticated, processed mess still seems to be a grey area, but closer to ‘mostly OK’ than to piracy.
Great use of “masticated”!
Yes, but only if you’re a multi-billion-dollar AI company.
Have you heard of the Internet Archive?
They do what you’re describing, more than any LLM does. These models do statistical modeling on books in a way that lets them answer questions about them, combine concepts from them, or describe them, more than they can reproduce any particular page. They’re not burning all this power just to host a text file.