Perhaps I’ve misunderstood how Lemmy works, but from what I can tell Lemmy is resulting in fragmentation between communities. If I’ve got this wrong, or browsing Lemmy wrong, please correct me!
I’ll try and explain this with an example comparison to Reddit.
As a reddit user I can go to /r/technology and see all posts from any user to the technology subreddit. I can interact with any posts and communicate with anyone on that subreddit.
In Lemmy, I understand that I can browse posts from other instances from Beehaw, for example I could check out /c/technology@slrpnk.net, /c/tech@lemmy.fmhy.ml, or many of the other technology communities from other instances, but I can’t just open up /c/technology in Beehaw and have a single view across the technology community. There could be posts I’m interested in on the technology@slrpnk instance but I wouldn’t know about it unless I specifically look at it, which adds up to a horrible experience of trying to see the latest tech news and conversation.
This adds up to a huge fragmentation across what was previously a single community.
Have I got this completely wrong?
Do you think this will change over time where one community on a specific instance will gain the market share and all others will evaporate away? And if it does, doesn’t that just place us back in the reddit situation?
EDIT: commented a reply here: https://beehaw.org/comment/288898. Thanks for the discussion helping me understand what this is (and isn’t!)
Ultimately this is a problem that’s never going away until we replace URLs. The HTTP approach of finding documents by URL, i.e. server/path, is fundamentally brittle. Doesn’t matter how careful you are, doesn’t matter how much best practice you follow, that URL is going to be dead in a few years. The problem is made worse by DNS, since domain names cost money to keep and eventually expire, taking every URL under them along.
There are approaches like IPFS, which uses content-based addressing (i.e. fancy file hashes), but that’s not enough either, as it provides no good way to update a resource.
The best™ solution would be some kind of global blockchain thing that keeps record of what people publish, giving each document a unique id, hash, and some way to update that resource in a non-destructive way (i.e. the version history is preserved). Hosting itself would still need to be done by other parties, but a global log file that lists out all the stuff humans have published would make it much easier and reliable to mirror it.
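To make that a bit more concrete, here is a rough sketch of what such a log could look like, written as a toy in-memory Python class. Everything in it is made up for illustration (the `PublicationLog` name, the field names, using SHA-256 and JSON); it is not how any existing blockchain or protocol works. The point is only that each entry stores a hash of the content plus a hash of the previous entry, so updates get appended instead of overwriting anything and tampering with old entries is detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def _hash_body(body: dict) -> str:
    # Hash a canonical JSON encoding so the same body always gives the same hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class PublicationLog:
    """Hypothetical append-only log that stores metadata only, never the content."""
    entries: list = field(default_factory=list)

    def publish(self, doc_id: str, content: bytes) -> dict:
        # Publishing the same doc_id again records a new version; nothing is overwritten.
        body = {
            "doc_id": doc_id,
            "content_hash": hashlib.sha256(content).hexdigest(),
            "prev_entry": self.entries[-1]["entry_hash"] if self.entries else GENESIS,
        }
        entry = {**body, "entry_hash": _hash_body(body)}
        self.entries.append(entry)
        return entry

    def history(self, doc_id: str) -> list:
        # All recorded versions of a document, oldest first.
        return [e["content_hash"] for e in self.entries if e["doc_id"] == doc_id]

    def verify(self) -> bool:
        # Recompute every hash and check that each entry links to the one before it.
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("doc_id", "content_hash", "prev_entry")}
            if e["prev_entry"] != prev or _hash_body(body) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True
```

Calling `publish("post-1", ...)` twice leaves two hashes in `history("post-1")`, and changing any old entry by hand makes `verify()` return `False`. Agreeing on who gets to append, and in what order, is the hard part and is not covered here.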
The end result should be “Internet as globally distributed immutable data structure”.
Bit frustrating that this whole problem isn’t getting the attention it deserves.
No offense, but that solution sounds like a pipedream that wouldn’t work on a technical level. So you wish to keep not just the item someone published, but previous versions of it, have mirrors of it and tie it up in some sort of a blockchain thing. That sounds insanely more resource heavy than just hosting the document itself on one instance somewhere. It would be much more reliable sure, but currently even companies like reddit can struggle with all of the traffic, similarly with smaller open source projects like Lemmy instances or kbin, and your solution is to increase the amount of data?
It really isn’t. Most content out there is already immutable, you don’t see people uploading the same Youtube video five times with minor changes or editing their images after the upload, most services don’t even allow that for users, at best you can delete and upload a new video.
Furthermore, the blockchain would only contain metadata, not the actual data, so it’s automatically thousands of times easier to store than the data itself.
Mirroring that content is a completely separate and optional part of the problem. The important part is having content named in such a way that I can go to a mirror and ask “do you have XYZ” and get an answer that I can trust. With URLs that’s impossible, as they can show different content whenever they want.
Also this isn’t exactly a new idea, that’s how most software development already works these days. A Git repository stores a copy of every little change, and every download retrieves that complete history. What’s missing is some infrastructure on top of that that links all the different repositories together into one namespace (GitHub kind of does that internally, but that’s of no help for repositories hosted elsewhere).
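As a small illustration of how Git names things by content: in a repository using the default SHA-1 object format, the ID of a file ("blob") is the SHA-1 of a short header plus the file's bytes. The function name `git_blob_id` below is just something I picked for the example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git hashes "blob <size in bytes>\0" followed by the raw file content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

The same bytes give the same ID on every machine, which is why any clone can check what it received without trusting the server it came from. For an existing file this should match what `git hash-object <file>` prints.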
Ok, so what if this blockchain has a metadata link to a video, which is hosted somewhere, and I remove that video from that host? How is that different than just a URL pointing to that video if the blockchain just holds metadata?
I don’t understand what you are solving.
The issue is that URLs don’t point to videos, they point to servers. What that server returns in response to a URL query is arbitrary. Might be a video today, could be a different video tomorrow, or a completely different website altogether since the domain switched owners. Almost all URLs break over the course of a couple of years.
By using content-addressing (i.e. Merkle tree, SHA-256, etc.) you are able to link to the video itself. It doesn’t matter if the server changes owner, your link will still point to that exact video. This does not automatically allow you to download the video of course, since the original server is still gone, but it allows you to ask others if they have a copy of that video and it allows you to verify that they returned the exact video you were looking for.
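A minimal sketch of that "ask others and verify" step, in Python. The mirrors here are plain dicts standing in for remote servers, and the `sha256-` prefix and function names are assumptions for the example, not any real protocol. Whatever a mirror returns gets re-hashed, and it is only accepted if the hash matches the address that was asked for.

```python
import hashlib
from typing import Optional


def content_address(data: bytes) -> str:
    # The "link" is derived from the bytes themselves, not from where they are hosted.
    return "sha256-" + hashlib.sha256(data).hexdigest()


def fetch_verified(address: str, mirrors: list) -> Optional[bytes]:
    # Ask each mirror in turn; a mirror is modeled as a dict of address -> bytes.
    for mirror in mirrors:
        data = mirror.get(address)
        # Accept the answer only if it hashes back to the address we asked for.
        if data is not None and content_address(data) == address:
            return data
    return None  # nobody had a valid copy


video = b"pretend these are the video bytes"
address = content_address(video)
honest_mirror = {address: video}
lying_mirror = {address: b"a different video"}

# The lying mirror is skipped because its bytes don't match the address.
result = fetch_verified(address, [lying_mirror, honest_mirror])
```

A URL has no equivalent check: there is nothing in `server/path` that says what the bytes behind it are supposed to be.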
The blockchain or DHT, or whatever it might be in the end, would be used to organize the content-addresses and allow you to ask others for that video automatically. Or allow them to discover that new videos have been published. It would also provide some censorship resistance/transparency, since at the moment deleted content often just silently disappears, without any hint that it ever existed. A blockchain would keep record of what was there and why it was deleted.