It’s an interesting and frustrating problem. I think there are three potential ways forward, but they’re both flawed:
Quasi-Centralization: a project like Mastodon or a vetted Non-Profit entity operates a high-concurrency server whose sole purpose is to cache link metadata and Images. Servers initially pull preview data from that, instead of the direct page.
We find a way to do this in some zero-trust peer-to-peer way, where multiple servers compare their copies of the same data. Whatever doesn’t match ends up not being used.
Servers cache link metadata and previews locally with a minimal amount of requests; any boost or reshare only reflects a proxied local preview of that link. Instead of doing this on a per-view or per-user basis, it’s simply per-instance.
I honestly think the third option might be the least destructive, even if it’s not as efficient as it could be.
Or 4) Ignore noise and do nothing; this is a case of user talking about things they don’t understand at best, or a blog intentionally misleading others to drum up traffic for themselves at worst. This is literally not a problem. Serving that kind of traffic can be done on a single server without any CDN and they’ve got a CDN already.
It’s an interesting and frustrating problem. I think there are three potential ways forward, but they’re both flawed:
Quasi-Centralization: a project like Mastodon or a vetted Non-Profit entity operates a high-concurrency server whose sole purpose is to cache link metadata and Images. Servers initially pull preview data from that, instead of the direct page.
We find a way to do this in some zero-trust peer-to-peer way, where multiple servers compare their copies of the same data. Whatever doesn’t match ends up not being used.
Servers cache link metadata and previews locally with a minimal amount of requests; any boost or reshare only reflects a proxied local preview of that link. Instead of doing this on a per-view or per-user basis, it’s simply per-instance.
I honestly think the third option might be the least destructive, even if it’s not as efficient as it could be.
As I understand it, 3) already happens. What causes the load is that each connected instance is also loading and caching the preview.
Or 4) Ignore noise and do nothing; this is a case of user talking about things they don’t understand at best, or a blog intentionally misleading others to drum up traffic for themselves at worst. This is literally not a problem. Serving that kind of traffic can be done on a single server without any CDN and they’ve got a CDN already.