This is a proposal by some AI bro to add a file called llms.txt that contains a version of your website's text that is easier for LLMs to process. It's a similar idea to the robots.txt file for web crawlers.

Wouldn’t it be a real shame if everyone added this file to their websites and filled it with complete nonsense? Apparently you only need to poison 0.1% of the training data to get an effect.

  • raoul@lemmy.sdf.org
    1 day ago

    We could respect this convention the same way the IA web crawlers respect robots.txt 🤷‍♂️

    • Tower@lemm.ee
      1 day ago

      Do webcrawlers from places other than Iowa respect that file differently?

    • DaGeek247@fedia.io
      1 day ago

      I’ve had a page that bans by IP listed as ‘don’t visit here’ in my robots.txt file for seven months now. It’s not listed anywhere else. I have no banned IPs on there yet. Admittedly, I’ve only had 15 visitors in the past six months though.
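
The honeypot described in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the trap path, the `handle_request` and `polite_crawler_allows` helpers, and the in-memory ban set are all hypothetical, since the comment does not say how the real page is built. The only library call used is Python's standard `urllib.robotparser`, which shows what a well-behaved crawler would conclude from the same robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap path; the real one is not given in the thread.
TRAP_PATH = "/dont-visit-here/"

# robots.txt that tells every crawler to stay away from the trap.
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"

# In-memory ban list (assumption: a real site would persist this).
banned_ips = set()


def handle_request(ip, path):
    """Return an HTTP-style status code, banning any IP that hits the trap."""
    if ip in banned_ips:
        return 403
    if path.startswith(TRAP_PATH):
        # The path is only advertised in robots.txt as off limits,
        # so a visitor here ignored (or mined) that file.
        banned_ips.add(ip)
        return 403
    return 200


def polite_crawler_allows(path, user_agent="ExampleBot"):
    """What a crawler that honours robots.txt would decide for this path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

A crawler that checks `polite_crawler_allows` before fetching never touches the trap, so it never ends up in `banned_ips`; one that skips the check gets banned on its first visit to the trap and is refused everywhere afterwards.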