• b3nsn0w@pricefield.org
      1 year ago

      i’m actually kinda interested in how that could work. a regular user using “near infinitely less” resources than a scraping engine sounds like some absolutely stupid design, either on reddit’s or the scraping engine’s side.

      • sauerkraus@lemmy.world
        1 year ago

        When using the API you just request what you’re looking for. With scraping you load everything repeatedly.

        • b3nsn0w@pricefield.org
          1 year ago

          except most of the weight of the site is in easily cacheable assets that don’t get reloaded at all. they’re probably not even loaded to begin with, since even though new reddit is a single-page app, it does have seed data in the html content itself, which a well-written scraper (or one that automatically parses the site with chatgpt) can easily extract. constantly reloading styles and scripts would be a ridiculously stupid design on the scraper’s part, and on reddit’s if they necessitated it.
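
          rough sketch of what i mean by pulling the seed data out of the html, in python with just the standard library. the sample markup, the `seed-data` id and the json shape are all made up for illustration and are not reddit’s actual page structure; the point is only that the parser reads the one html document and never touches the stylesheet or script urls it references.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: the markup, the "seed-data" id and the JSON shape are
# invented for illustration and do not reflect reddit's real structure.
SAMPLE_HTML = """
<html>
  <head>
    <link rel="stylesheet" href="/static/app.css">
    <script src="/static/app.js"></script>
  </head>
  <body>
    <div id="root"></div>
    <script id="seed-data" type="application/json">
      {"posts": [{"id": "abc", "title": "hello"}, {"id": "def", "title": "world"}]}
    </script>
  </body>
</html>
"""


class SeedDataParser(HTMLParser):
    """Collects the text content of the <script> tag with a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Only the script tag carrying the wanted id is captured.
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_seed_data(html, script_id="seed-data"):
    """Return the embedded JSON as Python objects, without fetching any assets."""
    parser = SeedDataParser(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        raise ValueError(f"no <script id={script_id!r}> with content found")
    return json.loads(raw)


seed = extract_seed_data(SAMPLE_HTML)
titles = [post["title"] for post in seed["posts"]]
print(titles)
```

          a real page would need its own selector and probably some cleanup around the json, but the css and js links in the head are never requested here.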

          the html page itself is slightly heavier than just the json data, but compared to all the images and videos real clients load and the giant piles of tracking data being sent back every second, a scraper is def going to be lighter. plus the site does reload itself every time you enter a new subreddit; that doesn’t happen through the api for some reason.