Give me a hive five. Beehaw, pardners!

  • tetris11@lemmy.ml
    1 year ago

    They might, but you will still be helping people. And if at a later date a court mandates that the authors of the training data be compensated for their contributions, or if the corpus is released into open source repositories, then I’d still call that a win for humanity.

    • 4bh1j47@beehaw.org
      1 year ago

      That is a fair point.

      Personally, in an ideal world, I would like to export all of my data from Reddit before leaving. Then, if someone later wants to host the whole dataset under a permissive open license (as I believe Stack Overflow and Wikipedia do) in a form that is accessible to search engines, I would scrub and anonymize my data and upload it there.

      Obviously, the issue with something like this is people uploading doctored data to poison the models trained on it, etc.