A long form response to the concerns and comments and general principles many people had in the post about authors suing companies creating LLMs.

  • Gumby@beehaw.org
    1 year ago

    I think this has to do with intent. If I read a book to use it as the basis for a play, that would be illegal. If I read for enjoyment, that is legal. Since AI does not read for enjoyment, but only to use what it reads as the basis for creating something else, that would be illegal.

    Is my logic flawed?

    • Umbrias@beehaw.org
      1 year ago

      This isn’t how it works at all. I can, and should, and do, read and consume all sorts of media with the intention of stealing from it for my own works. If you ask for writing advice, this is actually probably one of the first things you’ll hear: read how other people do it.

      So “the intent of the reading” does not work as an argument, because if it did, humans could never generate any new media either.

      • Peanut@sopuli.xyz
        1 year ago

        This is the thing I kept shouting when diffusion models took off. People are effectively saying “make it illegal for neural nets to learn from anything creative or productive anywhere in any way”

        Because despite the differences in architecture, I think it is parallel.

        If the intent and purpose of the tool was to make copies of the work in a way we would consider theft if done by a human, I would understand.

        Likewise, there isn’t any legal protection against neural nets learning from personal and abstract information to manipulate, predict, or control the public, even though in that case the intended function of the tool should make it illegal.

        But people are too self-focused and ignorant to riot en masse about that one.

        The dialogue should also be about creating a safety net as more and more people lose value in the face of new technology.

        But fuck any of that, what if an a.i. learned from a painting I made ten years ago, like every other artist who may have learned from it? Unforgivable.

        I don’t believe it’s reproducing my art, even if asked to do so, and I don’t think I’m entitled to anything.

        Also, copyright has been fucked for decades. It hasn’t served the people since long before the Mickey Mouse Protection Act.

        • flyingowlfox@beehaw.org
          1 year ago

          Regardless of intent, let’s not pretend that the scale at which LLMs “process” information to generate new content is comparable to humans. Human-scale reading is obviously what copyright laws were written for (so far).

          • Sas [she/her]@beehaw.org
            1 year ago

            We don’t need to pretend though. People with speed reading skills are faster than most humans as well and could read a lot more books.

            It’s very probable that you have read at least one writer’s whole library, even if it’s as many stories as Terry Pratchett got published. That will always be possible for human-written books, as writing them takes longer than reading them.

            Obviously the acquisition of those stories has to be made in a legal way and no actual passages should be stored in the model, but the amount of data processed should have no bearing on whether it can be used.

            And as written by others here, making copyright law more strict puts big corps at an advantage, because they have big legal teams and money to just pay the copyright fee, while your regular user would not be able to.

          • Peanut@sopuli.xyz
            1 year ago

            It’s comparing a bird to a plane, but I still think the process constitutes “learning,” which may sound anthropomorphic to some, but I don’t think we have a more accurate synonym. I think the plane is flying even if the wings aren’t flapping and the plane doesn’t do anything else birds do. I think LLMs, while different, reflect the subconscious aspect of human speech, and reflect the concept of learning from the data more than “copying” the data. It’s not copying and selling content unless you count being prompted into repeating something it was trained on heavily enough for accurate verbatim reconstruction. To me, that’s no more worrying than Disney being able to buy writers that have memorized some of their favorite material, and can reconstruct it on demand. If you ask your intern to reproduce something verbatim with the intent of selling it, I still don’t think the training or “learning” is the issue.

            To accurately address the differences, we probably need new language and ideals for the specific situations that arise in the building of neural nets, but I still consider much of the backlash completely removed from any understanding of what has been done with the “copyrighted material.”

            I tend to view it thinking about naturally training these machines in the future with real world content. Should a neural net built to act in the real world be sued if an image of a coca-cola can was in the training data somewhere, and some of the machines end up being used to make cans for a competitor?

            How many layers of abstraction, or how much mixture with other training data do you need to not consider that bit of information to be comparable to the crime of someone intentionally and directly creating an identical logo and product to sell?

            Copyright laws already need an overhaul prior to a.i.

            It’s no coincidence that Warner and Disney are so giant right now, own so much of other people’s ideas, and have the money to control which ideas get funded. How long has Walt Disney been dead? More than half a century. So why does his business own the rights of so many artists who came after?

            I don’t think the copyright system is ready to handle the complexity of artificial minds at any stage, whether it is the pareidolic aspect of retrieving visual concepts of images in diffusion models, or the complex abilities that arise from current-scale LLMs, which again, I believe are able to resemble the subconscious aspect of word prediction that exists in our minds.

            We can’t even get people to confidently legislate a simple ethical issue like letting people have consensual relationships with the gender of their own choice. I don’t have hope we can accurately adjust at each stage of development of a technology so complex we don’t even have the language to properly describe how it functions. I just believe that limiting our future and important technology for such grotesquely misdirected egoism would do far more harm than good.

            The greater focus should be on guaranteeing that technological or creative developments benefit the common people, not just the rich. This should have been the focus for the past half century. People refuse this conceptually because they’ve been convinced that any economic re-balancing is evil when it benefits the poor. Those with the ability to change anything are only incentivized to help themselves.

            But everyone is just mad at the machine because “what if it learned from my property?”

            I think the article even promotes Adobe as the ethical alternative. Congrats, you’ve limited the environment so that only the existing owners of everything can advance. I don’t want to pay Adobe a subscription for the rest of my life for the right to create on par with wealthier individuals. How is this helping the world or the creation of art?

    • whelmer@beehaw.org
      1 year ago

      Your logic is flawed in that works inspired by other works are not a violation of copyright. Generally, copyright protects a text or piece of art from being reproduced. Specific characters and settings can be protected by copyright; concepts and themes cannot. People take inspiration from the work of others all the time. Lots of TV shows or whatever are heavily informed by previous works, and that’s totally fine.

      Copyright protects against the reproduction of other people’s work, and the reuse of their specific characters. It doesn’t protect style, themes, concepts, etc., i.e. the things that an AI is trying to derive. So if you trained your LLM only on Tolkien such that it always told stories about Gandalf and the hobbits, then that would be a problem.

    • Mutoid@beehaw.org
      1 year ago

      “Reading with intent”? That sounds ridiculous. The only thing of concern is the work produced.

      • oomphaloompha@beehaw.org
        1 year ago

        Open up! It’s the thought police! We have reason to believe you are reading with intent to commit a criminal act! You are under arrest! Anything you say or think can and will be used against you in a court of law!