• rufus
    11 months ago

    I’m sorry for repeating myself, but didn’t Meta just stop disclosing the exact training dataset? Presumably because they’re using copyrighted data from the internet? Isn’t that hypocritical? IMHO we need laws, and/or companies need to stop disregarding copyright when training their own models and then claiming copyright once other people start doing the same thing.

    • wagesj45@kbin.social
      11 months ago

      Personally, I don’t think copyright holders really have a leg to stand on as far as that goes. Simply having and using a copyrighted work isn’t a violation, and the work that is produced in the form of a trained neural network is the very definition of transformative. I also think Meta would have the same issue trying to use a copyright claim against someone using their Llama output to improve other non-Llama models. That’s why they had to slip it into the terms of service.

      I guess what you might see going forward is every book that’s published comes with a user agreement you agree to by opening the book… But that doesn’t sound practical in any sense.