What grits is and what it's trying to achieve - Lemmy.world
Hi all, so I posted about distributed hosting yesterday. I wrote up some thoughts on how I think it could work, and I'm planning to start work on it. If anyone has feedback on my proposal, or wants to get involved and help, 100% let me know; I'd love to hear from you.
(Edit: removed the link from the URL field, as it pasted the post in unformatted, which is not productive. Click the link in the paragraph above if you want to read it in an un-eye-crossing format.)
Please add some text formatting, at least some line breaks...
I asked ChatGPT to do that for you :)
So I made this forum to work on one specific piece of software that I think could benefit Lemmy (and the overall fediverse community) substantially. I'll lay out what I want to make and why, in some detail.
I apologize for the length, but I can't really do this without some level of support and agreement from the community, so hopefully the wall of text is worth it if it resonates with some people and they're swayed to support the idea. If something like this already exists, please let me know. I looked and couldn't find it, which is why I'm making this extensive pitch about it being a good idea. But if it's already in the works, I'd be just as happy working on existing tech instead of reinventing it.
The Problem
In short, the problem is that you have to pay for hosting. Reddit started as a great community, just like Lemmy is now, but because it was great it got huge, which meant they had to pay millions of dollars to run their infrastructure, and now all of a sudden they're not a community site anymore. They're a business, whether they like that or not.
Fast forward fifteen years and look how that turned out. I think this will impact Lemmy in the future, in very different ways but still substantially. It's actually already, at this very early stage, impacting Lemmy: there are popular instances that are struggling under the load, and people are asking for donations because they have hosting bills.
Sure, donations are great, and I'm sure these particular load problems will get solved. But the underlying conflict, that someone who wants to run a substantial part of the network has to make a substantial financial investment, will remain. Because of its federated nature, Lemmy is actually a lot better positioned to resist this problem. But it'll still be a problem on some level (especially for big instances), and wouldn't it be better if we just didn't have to worry about it?
The Solution
Basically, I propose that all users help run the network. Lemmy is a big step forward because a lot more users can help than before, but even in Lemmy, only a small fraction of people will choose to make instances, and you'll still have big instances serving lots of content.
I propose to make it trivially easy for end users to carry the load. They can install an app on their phones, or a browser plugin, or run something on their home computer; either way, they have absolutely trivial ways to use their hardware to add load capacity. The load on the instances will be way reduced just from that option existing, I think.
I would actually argue for taking it a step further and letting instance operators require load-carrying by their users, but that's a choice for the individual operators and the community, based on observation of how this all plays out in practice.
One Implementation
It's easy to talk in generalities. I'm going to describe one particular way I could envision this being implemented. This proposed approach is actually not specific to Lemmy. It would benefit Lemmy quite a lot, I think, but you could just as easily use this technology to carry load for a Mastodon instance or a traditional siloed website. It's complementary to Lemmy, but not specific to it.
Also, this is going to be somewhat technical, so feel free to skip to the next section if you're just interested in the broad picture.
So, like I said, I propose to make peer software that provides capacity to the system to balance out the load you're causing as an end user. The peer is extremely simple: mostly it runs a node in a shared data store like IPFS or Holepunch, and it serves content-addressable chunks of data to other users.
You can run it as an app on your phone if you have unlimited data, you can run it as a browser plugin (which speeds up your experience as a user, since it'll have precached some of the data the app will need), you can run it on your computer back at home while you access Lemmy from the road, etc.
The peer doesn't need to be trusted (since it's serving content-addressable data that gets double-checked), and it doesn't need to be reliable or always on. The system keeps rough track of how much capacity your peer(s) have added, and as long as that's no less than what your user has consumed, you're fine if your peer goes away for a couple of days or something.
When you, as a user, open your Lemmy page served by the instance, what you get served back is tiny: just a static chunk of bootstrapping JavaScript, a list of good peers you can talk to, and a content hash of the "root" of the data store
Part 2:
(Continued from the post)
What's the Next Step?
I started touching on some imagined future steps, but this chunk is already a plenty big and ambitious thing. So here's an initial plan for how I want to take the first steps and bring myself into contact with the engineering reality (as opposed to the rosy broad picture). Hopefully, by the end of this chunk of work, the vision will have adapted somewhat to the reality of what's useful, what's possible, what the community's feedback is, what the issues and problems involved are, etc.
(And, obviously, I want to communicate with the Lemmy devs to make sure these ideas are in line with their vision. I'm laying this all out so extensively partly so that the community has a full explanation of what I'm proposing to do and why.)
So, first steps: I'm making a Lemmy instance that I can use for implementing this. I'm waiting for my hosting to go up so I can make it live, but once it's up, I'll start working on it and posting from the testbed about what's going on. My initial coding task list is:
Set up the peer software with the content-addressable store
Start to have my instance do peer discovery, and make the app that runs in people's browsers from my instance more AJAX-y so that it begins to request data from the peers instead of the instance.
Once that part's working on my instance, I'd aim to move pieces of the actual app onto the peers: construct the bootstrap code, continue the AJAX-ification of the code on my Lemmy instance, and have the bootstrapping app construct the end-user application directly from data from the peers.
Start to tackle the browser app making updates to the data store via requests to the peers, which will involve a lot of work and a lot of sorting out replication issues, security and trust issues, and performance issues.
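The post doesn't say how the instance would do peer discovery. As a minimal sketch, assuming peers periodically check in with the instance (a heartbeat) and the instance hands clients the peers it has heard from recently, it could look like this. `PeerRegistry`, `heartbeat`, `live_peers`, and the five-minute TTL are hypothetical names and values for illustration, not anything in Lemmy.

```python
import time
from typing import Dict, List, Optional


class PeerRegistry:
    """Hypothetical instance-side peer list (not a Lemmy API).

    Peers check in periodically; the instance hands out the ones it has
    heard from recently, freshest first."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds  # assumed: a heartbeat counts for 5 minutes
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, peer_addr: str, now: Optional[float] = None) -> None:
        # Record (or refresh) when we last heard from this peer.
        self._last_seen[peer_addr] = time.time() if now is None else now

    def live_peers(self, limit: int = 8, now: Optional[float] = None) -> List[str]:
        # Forget peers whose last heartbeat is older than the TTL...
        now = time.time() if now is None else now
        self._last_seen = {
            addr: seen
            for addr, seen in self._last_seen.items()
            if now - seen <= self.ttl_seconds
        }
        # ...then return the most recently seen ones first.
        ranked = sorted(self._last_seen, key=lambda a: self._last_seen[a], reverse=True)
        return ranked[:limit]


reg = PeerRegistry(ttl_seconds=300)
reg.heartbeat("peer-a.example:4001", now=1000.0)
reg.heartbeat("peer-b.example:4001", now=1200.0)
print(reg.live_peers(now=1400.0))  # ['peer-b.example:4001']; peer-a went quiet 400 s ago
```

The registry only tracks liveness; judging which peers actually perform well is better left to the browser side, which can measure it directly.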
That's already a fairly large amount to take on. I have further ideas about how the system could move forward from there, but even just that represents (1) an ambitious thing to tackle, (2) significant proposed changes to the instance software, and (3) if it works, a fantastically useful tool that instance operators could use to reduce their instance load if they want to. So I'm limiting the plan to that much for now, until I get some contact with the technical reality and with the community.
What You Can Do
So if you've read to the end, maybe you think this is a good idea. Want to help? This is already a bunch of work, and I'd love it if people wanted to help get it done. Leave a comment, let me know what you think, whether positive or negative, and if you want to help, 100% reach out and let's get it done. I'm skilled with software engineering in general, but I'm actually not too familiar with web backends and AJAX in particular, so someone more skilled than I am could probably help this along in a huge way. Specific things that might be useful:
If you want to run a peer or instance and help test the system
If you can help with coding
If you have feedback on these ideas in general, either positive or things I've overlooked or need to adjust
Hope to hear from you and thank you for reading my wall of text. Let me know what you think + cheers to you.
I've got a spare Raspberry Pi set up as a server. I can use that to host stuff, and I'm okay at programming (not Rust, though). Let me know if I can be of assistance in any way. I'd be happy to help with this effort.
Yes, absolutely! Sorry for the silence... I was working on code with a slightly expanded scope from the original project, but it's shaping up to be, maybe within a week or two, something that could actually be tested. You can read the update about the current state of things; in it I talk about having a test instance set up, and wanting to set up proxy caches for it so I can test the whole system functioning in the real world. The code's not ready yet, but if in a couple of weeks you still want to help with testing, maybe I can help you get a proxy node set up on your Pi, and then that can form part of the initial proof of concept on that testbed server?
For sure, and no worries about the delay in response. I'm out on vacation, but when I get back I can help set up a proxy node. Maybe set it up as a container on Docker?
Or, what the hell, here's a copy-paste of the whole thing so you can read it here too. Part 1:
Overview
So I made this forum to work on one specific piece of software that I think could benefit Lemmy (and the overall fediverse community) substantially. I'll lay out what I want to make and why, in some detail. I apologize for the length, but I can't really do this without some level of support and agreement from the community, so hopefully the wall of text is worth it if it resonates with some people and they're swayed to support the idea.
If something like this already exists, please let me know. I looked and couldn't find it, which is why I'm making this extensive pitch about it being a good idea. But if it's already in the works, I'd be just as happy working on existing tech instead of reinventing it.
So:
The Problem
In short, the problem is that you have to pay for hosting. Reddit started as a great community, just like Lemmy is now, but because it was great it got huge, which meant they had to pay millions of dollars to run their infrastructure, and now all of a sudden they're not a community site anymore. They're a business, whether they like that or not. Fast forward fifteen years and look how that turned out.
I think this will impact Lemmy in the future, in very different ways but still substantially. It's actually already, at this very early stage, impacting Lemmy: there are popular instances that are struggling under the load, and people are asking for donations because they have hosting bills. Sure, donations are great, and I'm sure these particular load problems will get solved, but the underlying conflict, that someone who wants to run a substantial part of the network has to make a substantial financial investment, will remain.
Because of its federated nature, Lemmy is actually a lot better positioned to resist this problem. But it'll still be a problem on some level (especially for big instances), and wouldn't it be better if we just didn't have to worry about it?
The Solution
Basically, I propose that all users help run the network. Lemmy is a big step forward because a lot more users can help than before, but even in Lemmy, only a small fraction of people will choose to make instances, and you'll still have big instances serving lots of content. I propose to make it trivially easy for end users to carry the load. They can install an app on their phones, or a browser plugin, or run something on their home computer; either way, they have absolutely trivial ways to use their hardware to add load capacity. The load on the instances will be way reduced just from that option existing, I think. I would actually argue for taking it a step further and letting instance operators require load-carrying by their users, but that's a choice for the individual operators and the community, based on observation of how this all plays out in practice.
One Implementation
It's easy to talk in generalities. I'm going to describe one particular way I could envision this being implemented. This proposed approach is actually not specific to Lemmy. It would benefit Lemmy quite a lot, I think, but you could just as easily use this technology to carry load for a Mastodon instance or a traditional siloed website. It's complementary to Lemmy, but not specific to it. Also, this is going to be somewhat technical, so feel free to skip to the next section if you're just interested in the broad picture.
So, like I said, I propose to make peer software that provides capacity to the system to balance out the load you're causing as an end user. The peer is extremely simple: mostly it runs a node in a shared data store like IPFS or Holepunch, and it serves content-addressable chunks of data to other users. You can run it as an app on your phone if you have unlimited data, you can run it as a browser plugin (which speeds up your experience as a user, since it'll have precached some of the data the app will need), you can run it on your computer back at home while you access Lemmy from the road, etc. The peer doesn't need to be trusted (since it's serving content-addressable data that gets double-checked), and it doesn't need to be reliable or always on. The system keeps rough track of how much capacity your peer(s) have added, and as long as that's no less than what your user has consumed, you're fine if your peer goes away for a couple of days or something.
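To illustrate why the peer doesn't need to be trusted: a content-addressable store looks data up by a hash of its bytes, so the requester can re-hash whatever comes back and reject anything that doesn't match. This is a minimal sketch assuming plain hex SHA-256 as the address; IPFS and Holepunch use their own key formats, and `Peer` and `fetch_verified` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


def content_address(data: bytes) -> str:
    # Assumed scheme: hex SHA-256 of the raw bytes. Real IPFS CIDs and
    # Holepunch keys are encoded differently; this is a stand-in.
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical untrusted peer: a dumb map from content hash to bytes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_address(data)
        self.chunks[key] = data
        return key

    def get(self, key: str) -> Optional[bytes]:
        return self.chunks.get(key)


def fetch_verified(peer: Peer, key: str) -> bytes:
    """Client side: whatever the peer says, re-hash it before using it."""
    data = peer.get(key)
    if data is None:
        raise KeyError(f"peer does not have {key}")
    if content_address(data) != key:
        raise ValueError(f"peer returned data that does not match {key}")
    return data


honest = Peer()
key = honest.put(b"a Lemmy post")
print(fetch_verified(honest, key))  # b'a Lemmy post'

liar = Peer()
liar.chunks[key] = b"spam"  # a misbehaving peer serving the wrong bytes
try:
    fetch_verified(liar, key)
except ValueError as err:
    print("rejected:", err)
```

The peer can lie or go missing, but it can't get a wrong chunk past the check, which is what makes running peers on arbitrary hardware acceptable.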
When you, as a user, open your Lemmy page served by the instance, what you get served back is tiny: just a static chunk of bootstrapping JavaScript, a list of good peers you can talk to, and a content hash of the "root" of the data store. What the bootstrapping code does is start at the "root" of what it was told is the current state of the content and walk down from there through the namespace, fetching everything it needs (both the data and the Lemmy app to render it and interact with it) by making content-addressable requests to peers. Since every child's hash is read out of a parent that has already been verified, and it all started from the one content hash the instance served, it's all trustworthy.
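The walk from the root can be sketched as a recursive fetch-and-verify over a hash tree (a Merkle tree). This assumes a made-up node encoding (a `dir` node holding a JSON map of names to child hashes, a `blob` node holding raw bytes) and a `fetch` function standing in for requests to peers; real IPFS nodes are encoded differently, and all names here are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict

# Hypothetical node encoding (not IPFS's real format):
#   b"dir\n"  + canonical JSON {name: child_hash}
#   b"blob\n" + raw bytes (app code, a post, ...)


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_dir(entries: Dict[str, str]) -> bytes:
    return b"dir\n" + json.dumps(entries, sort_keys=True).encode()


def make_blob(data: bytes) -> bytes:
    return b"blob\n" + data


def walk(node_hash: str, fetch: Callable[[str], bytes], path: str = "") -> Dict[str, bytes]:
    """Walk down from a trusted hash, verifying every node before reading it.

    `fetch` asks some peer for a hash; its answer is untrusted. Because each
    child hash is read out of a parent that already passed verification, the
    whole tree is anchored to the one hash the instance handed us."""
    raw = fetch(node_hash)
    if h(raw) != node_hash:
        raise ValueError(f"hash mismatch at {path or '/'}")
    if raw.startswith(b"dir\n"):
        files: Dict[str, bytes] = {}
        for name, child in sorted(json.loads(raw[4:]).items()):
            files.update(walk(child, fetch, f"{path}/{name}"))
        return files
    if raw.startswith(b"blob\n"):
        return {path: raw[5:]}
    raise ValueError(f"unknown node type at {path or '/'}")


# Build a tiny tree and walk it, with a dict standing in for the peers.
store: Dict[str, bytes] = {}


def put(node: bytes) -> str:
    store[h(node)] = node
    return h(node)


root = put(make_dir({
    "app": put(make_blob(b"<lemmy ui bundle>")),
    "data": put(make_dir({"post1": put(make_blob(b"first post"))})),
}))
print(walk(root, store.__getitem__))
# {'/app': b'<lemmy ui bundle>', '/data/post1': b'first post'}
```

Tampering with any node anywhere in the tree changes its hash, so the walk fails at that node instead of silently rendering bad data.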
It's important that the bootstrapping code in the browser verifies everything it gets from every peer. You can't trust anything you get from the peers, so you verify it all. Also, you don't trust the peers to be available: the bootstrapping code keeps track of which ones are providing good performance and doesn't talk to just a single one, so if one is overloaded or suddenly drops out, the user's experience isn't ruined. Also, you're able to configure a peer you're running to always keep a full mirror of some part of the data store that you're invested in. That's vital, because this system can't magically make all data always available without anyone thinking about it. It just decouples (1) an instance you can always reach, which is probably on paid hosting, from (2) a peer which provides the heavy lifting of load capacity but might drop out at any time, i.e. can run on unmetered consumer internet. You as a moderator still need to ensure that (1) and (2) are both present if you want to ensure that your content is going to exist on the system.
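One simple way the bootstrapping code could avoid depending on a single peer, sketched here: keep a moving average of each peer's response time, try peers fastest-first, and fall through to the next one when a peer errors out or returns data that fails the hash check. The penalty value, the smoothing factor, and the `PeerPool` name are assumptions for illustration.

```python
import hashlib
import time
from typing import Callable, Dict

# Hypothetical: each peer is a function that fetches one chunk by hash and
# may raise (dropped out, overloaded) or return garbage.
PeerFn = Callable[[str], bytes]

FAILURE_PENALTY_S = 10.0  # assumed: a failure counts like a 10-second response


class PeerPool:
    """Client-side peer selection: try the fastest-looking peer first,
    verify what comes back, and fall through to the next one on trouble."""

    def __init__(self, peers: Dict[str, PeerFn], alpha: float = 0.3) -> None:
        self.peers = peers
        self.alpha = alpha  # weight of the newest sample in the moving average
        self.score = {name: 0.0 for name in peers}  # smoothed latency; untried = 0

    def _update(self, name: str, seconds: float) -> None:
        # Exponentially weighted moving average of observed latency.
        self.score[name] = (1 - self.alpha) * self.score[name] + self.alpha * seconds

    def fetch(self, key: str) -> bytes:
        for name in sorted(self.peers, key=lambda n: self.score[n]):
            start = time.monotonic()
            try:
                data = self.peers[name](key)
            except Exception:
                self._update(name, FAILURE_PENALTY_S)
                continue
            if hashlib.sha256(data).hexdigest() != key:
                self._update(name, FAILURE_PENALTY_S)  # bad data counts as a failure
                continue
            self._update(name, time.monotonic() - start)
            return data
        raise LookupError(f"no peer returned valid data for {key}")


body = b"post body"
key = hashlib.sha256(body).hexdigest()


def dropped_out(k: str) -> bytes:
    raise ConnectionError("peer went offline")


pool = PeerPool({
    "dropped": dropped_out,
    "liar": lambda k: b"spam",
    "good": lambda k: body,
})
print(pool.fetch(key))  # b'post body', after skipping the first two peers
print(sorted(pool.score, key=pool.score.get))  # 'good' now sorts first
```

A real client would probably also spread requests across several peers in parallel rather than strictly one at a time; this version only shows the fallback and scoring logic.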
The end result of this is that the end user's interaction with the system only places load on the instance when it first fetches the bootstrapping package. My hope would be that it can be small enough that you can run a fairly busy instance on a $20/month hosting package, instead of paying hundreds or thousands of dollars a month. Also, like I said, I think culturally it would be way better if running a peer were a requirement to access the instance. That's up to the individual instance operators, obviously, but to me people shouldn't just be entitled to use the system. They have to help support it if they're going to add load (since it's become trivial enough that that's reasonable to ask). Aside from ensuring load capacity, I actually think that would be a big step up culturally: look at the moderation problems every online forum has right now because people are empowered to come onto shared systems and be dicks. I think having your use of the system contingent on fulfilling a social contract is going to empower the operators of the system a lot. If someone's being malicious, you don't have to play whack-a-mole with their IP addresses to try to revoke their entitlement to be there; you just remove their status as a peer and their privilege to even use the system you've volunteered to make available in the first place.
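The rough bookkeeping behind requiring a peer could be as simple as comparing bytes served against bytes consumed, with an operator switch to revoke peer status. This is a hypothetical sketch: `Ledger`, the grace allowance, and byte-level accounting are assumptions, and it trusts the reported numbers, which a real system could not do.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Account:
    served: int = 0       # bytes this user's peer(s) have served to others
    consumed: int = 0     # bytes this user has pulled from the system
    is_peer: bool = True  # the operator can take this away


class Ledger:
    """Hypothetical rough bookkeeping for an instance that requires peering.

    Assumes the byte counts are reported honestly; how to verify them is an
    open question this sketch doesn't settle."""

    def __init__(self, grace: int = 50_000_000) -> None:
        self.grace = grace  # assumed allowance so a new user can get started
        self.accounts: Dict[str, Account] = {}

    def _acct(self, user: str) -> Account:
        return self.accounts.setdefault(user, Account())

    def record_served(self, user: str, n: int) -> None:
        self._acct(user).served += n

    def record_consumed(self, user: str, n: int) -> None:
        self._acct(user).consumed += n

    def revoke(self, user: str) -> None:
        # No IP whack-a-mole: just drop the user's standing as a peer.
        self._acct(user).is_peer = False

    def may_use(self, user: str) -> bool:
        a = self._acct(user)
        # A peer that's offline for a while is fine while the user is in credit.
        return a.is_peer and a.consumed <= a.served + self.grace


ledger = Ledger(grace=1_000)
ledger.record_consumed("alice", 5_000)
ledger.record_served("alice", 4_500)
print(ledger.may_use("alice"))  # True: 5000 <= 4500 + 1000
ledger.record_consumed("alice", 1_000)
print(ledger.may_use("alice"))  # False: 6000 > 5500
ledger.revoke("alice")
print(ledger.may_use("alice"))  # False regardless of balance
```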
I've handwaved aside some important details to paint the broad picture. How do updates to the content happen? How do you index the data or make it relational so you can make real apps on top of this? How do you prevent malicious changes to the data store? How is a peer that's port-restricted or behind NAT still able to help? These are obviously not minor issues, but they're also not new or extraordinary challenges. This is already long enough, so I'll make a separate post addressing more of the nitty-gritty details.
What's the Result?
So, to zoom back out: one result, hopefully, is that the experience becomes faster from the end user's perspective. Hopefully. I believe that the increase in capacity will more than make up for the slowness introduced by distributing the data store, but that's just theory at this point. I would also argue that this will start to open up possibilities like video streaming that are hard to do if instances host all the content. But regardless of that, I think big popular instances not having to pay ever-increasing hosting costs is huge. It's necessary. It's not a trivial benefit. And, in addition to that and the cultural issues, I think this improves the overall architecture of the system in one more very significant way:
Because the Lemmy app itself becomes static (AJAX-utilizing JavaScript which exists fully within the shared data store), it becomes trivial to make your own custom changes to the app even if you don't want to run an instance. You can clone the Lemmy app in the data store, make revisions, and then tell the system that you want to see your same data but rendered with the new version of the web app. Ultimately the entire system becomes a lot more transparent and flexible from a tech-savvy user's perspective. You don't have to interact with "the Lemmy API" in the same way people had to interact with "the Reddit API"; your modified or independent app just interacts directly with the data. This is a huge shift further in the same direction that started with federating the servers in the first place. Part of the further future beyond this document is the possibility of opening up a lot of tinkering possibilities for tech-savvy end users, and expanding what even non-techy end users would be able to do with the apps they're interacting with.
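If the root just names an app and a data tree by hash, then "same data, new app" amounts to building a new root that reuses the data hash. A minimal sketch, assuming a hypothetical two-entry root layout (`app`, `data`) that the post doesn't specify; `put`, `make_root`, and `with_custom_app` are illustrative names.

```python
import hashlib
import json
from typing import Dict

# Hypothetical root layout: the root node names one app bundle and one data
# tree. Neither the layout nor these helpers are part of Lemmy today.


def put(store: Dict[str, bytes], node: bytes) -> str:
    key = hashlib.sha256(node).hexdigest()
    store[key] = node
    return key


def make_root(store: Dict[str, bytes], app: str, data: str) -> str:
    return put(store, json.dumps({"app": app, "data": data}, sort_keys=True).encode())


def with_custom_app(store: Dict[str, bytes], root: str, my_app: str) -> str:
    """Same data as `root`, rendered by my own revision of the app."""
    node = store[root]
    if hashlib.sha256(node).hexdigest() != root:
        raise ValueError("root node failed verification")
    return make_root(store, my_app, json.loads(node)["data"])


store: Dict[str, bytes] = {}
data = put(store, b"<posts and comments>")
official = make_root(store, put(store, b"<official lemmy ui>"), data)

mine = with_custom_app(store, official, put(store, b"<my tweaked ui>"))
print(json.loads(store[mine])["data"] == data)  # True: same data, different app
```

Since the data hash is copied from whichever root the instance currently advertises, re-running `with_custom_app` on each new root keeps your own app pointed at fresh data.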
Getting It Done
I think I'm hitting a length limit, so I'll fill in the details of the first steps I want to take in the next comment.
So I actually didn't do that, Lemmy did :-). I posted in a different community, then posted the link to that post in /c/selfhosted, not realizing that doing it that way would include an unformatted and awful-looking partial version of my post in this post. I've fixed it now. Just click the link in the text at the top to read the well-formatted version.