Back in the olden days, if you wrote a program, you were punching machine codes into a punch card and they were being fed into the computer and sent directly to the CPU. The machine was effectively yours while your program ran, then you (or more likely, someone who worked for your company or university) noted your final results, things would be reset, and the next stack of cards would go in.
Once computers got fast enough, though, it was possible to have a program replace the computer operator, an “operating system”, and it could even interleave execution of programs to basically run more than one at the same time. However, now the programs had to share resources, they couldn’t just have the whole computer to themselves. The OS helped manage that, a program now had to ask for memory and the OS would track what was free and what was in use, as well as interleaving programs to take turns running on the CPU. But if a program messed up and wrote to memory that didn’t belong to it, it could screw up someone else’s execution and bring the whole thing crashing down. And in some systems, programs were given a turn to run and then were supposed to return control to the OS after a bit, but it was basically an honor system, and the problem with that is likely clear.
Hardware and OS software added features to enforce more order. OSes got more power, and help from the hardware to wield it. Now instead of asking politely to give back control, the hardware would enforce limits, forcing control back to the OS periodically. And when it came to memory, the OS no longer handed out addresses matching the RAM for the program to use directly, instead it could hand out virtual addresses, with the OS tracking every relationship between the virtual address and the real location of the data, and the hardware providing Memory Management Units that can do things like store tables and do the translation from virtual to physical on its own, and return control to the OS if it doesn’t know.
This allows things like swapping, where a part of memory that isn’t being used can be taken out of RAM and written to disk instead. If the program tries to read an address that was swapped out, the hardware catches that it’s a virtual address that it doesn’t have a mapping for, wrenches control from the program, and instead runs the code that the OS registered for handling memory. The OS can see that this address has been swapped out, swap it back in to real RAM, tell the hardware where it now is, and then control returns to the program. The program’s none the wiser that its data wasn’t there a moment ago, and it all works. If a program messes up and tries to write to an address it doesn’t have, it doesn’t go through because there’s no mapping to a physical address, and the OS can instead tell the program “you have done very bad and unless you were prepared for this, you should probably end yourself” without any harm to others.
Memory is handed out to programs in chunks called “pages”, and the hardware has support for certain page size(s). How big they should be is a matter of tradeoffs; since pages are indivisible, pages that are too big will result in a lot of wasted space (if a program needs 1025 bytes on a 1024-byte page size system, it’ll need 2 pages even though that second page is going to be almost entirely empty), but lots of small pages mean the translation tables have to be bigger to track where everything is, resulting in more overhead.
This is starting to reach the edges of my knowledge, but I believe what this is describing is that RISC-V chips and ARM chips have the ability for the OS to say to the hardware “let’s use bigger pages than normal, up to 64k”, and the Linux kernel is getting enhancements to actually use this functionality, which can come with performance improvements. The MMU can store fewer entries and rely on the OS less, doing more work directly, for example.
Unless you are at the edge of the firmware and software, this isn’t something you work with a lot.
When you transfer files or data to a memory space, you can’t drop the whole file/data to memory directly because the resources are limited on the cpu/mcu. It wouldn’t make sense to have a page as big as your biggest theorical data size.
Page size determine how much data at a time can be transferred into memory.
In term of performance, writing the page to memory is usually the bottle neck. So 4k vs 64k means you need to write to memory 16 times more and thus making the performance better on 64k page size.
That’s more of a storage thing, RAM does a lot smaller transfers - for example a DDR5 memory has two independent 32bit (4 byte) channels with a minimum of 16 transfers in a single “operation”, so it does 64 bytes at once (or more). And CPUs don’t waste memory bandwidth than transferring more than absolutely necessary, as memory is often the bottleneck even without writing full pages.
The page size is relevant for memory protection (where the CPU will stop the program execution and give control back to the operating system if said program tries to do something it’s not allowed to do with the memory) and virtual memory (which is part of the same thing, but they are two theoretically independent concepts). The operating system needs to make a table describing what memory the program has what kind of access to, and with bigger pages the table can be much smaller (at the cost of wasting space if the program needs only a little bit of memory of a given kind).
Don’t worry, it’s quite esoteric to begin with. The only reason I can comprehend this is the years-long following news like this, on top of my computer science degree.
Also, this wouldn’t matter (yet) to your daily life.
I find it funny when people around me think I am a computer expert and I just tried to read this and couldn’t comprehend sh*t.
Back in the olden days, if you wrote a program, you were punching machine codes into a punch card and they were being fed into the computer and sent directly to the CPU. The machine was effectively yours while your program ran, then you (or more likely, someone who worked for your company or university) noted your final results, things would be reset, and the next stack of cards would go in.
Once computers got fast enough, though, it was possible to have a program replace the computer operator, an “operating system”, and it could even interleave execution of programs to basically run more than one at the same time. However, now the programs had to share resources, they couldn’t just have the whole computer to themselves. The OS helped manage that, a program now had to ask for memory and the OS would track what was free and what was in use, as well as interleaving programs to take turns running on the CPU. But if a program messed up and wrote to memory that didn’t belong to it, it could screw up someone else’s execution and bring the whole thing crashing down. And in some systems, programs were given a turn to run and then were supposed to return control to the OS after a bit, but it was basically an honor system, and the problem with that is likely clear.
Hardware and OS software added features to enforce more order. OSes got more power, and help from the hardware to wield it. Now instead of asking politely to give back control, the hardware would enforce limits, forcing control back to the OS periodically. And when it came to memory, the OS no longer handed out addresses matching the RAM for the program to use directly, instead it could hand out virtual addresses, with the OS tracking every relationship between the virtual address and the real location of the data, and the hardware providing Memory Management Units that can do things like store tables and do the translation from virtual to physical on its own, and return control to the OS if it doesn’t know.
This allows things like swapping, where a part of memory that isn’t being used can be taken out of RAM and written to disk instead. If the program tries to read an address that was swapped out, the hardware catches that it’s a virtual address that it doesn’t have a mapping for, wrenches control from the program, and instead runs the code that the OS registered for handling memory. The OS can see that this address has been swapped out, swap it back in to real RAM, tell the hardware where it now is, and then control returns to the program. The program’s none the wiser that its data wasn’t there a moment ago, and it all works. If a program messes up and tries to write to an address it doesn’t have, it doesn’t go through because there’s no mapping to a physical address, and the OS can instead tell the program “you have done very bad and unless you were prepared for this, you should probably end yourself” without any harm to others.
Memory is handed out to programs in chunks called “pages”, and the hardware has support for certain page size(s). How big they should be is a matter of tradeoffs; since pages are indivisible, pages that are too big will result in a lot of wasted space (if a program needs 1025 bytes on a 1024-byte page size system, it’ll need 2 pages even though that second page is going to be almost entirely empty), but lots of small pages mean the translation tables have to be bigger to track where everything is, resulting in more overhead.
This is starting to reach the edges of my knowledge, but I believe what this is describing is that RISC-V chips and ARM chips have the ability for the OS to say to the hardware “let’s use bigger pages than normal, up to 64k”, and the Linux kernel is getting enhancements to actually use this functionality, which can come with performance improvements. The MMU can store fewer entries and rely on the OS less, doing more work directly, for example.
Lovely write up!
This is an awesome comment and I learned a lot. Thank you for all the thought and effort you put into this!
It’s alright, I program C++ and barely understand this shit either. Kernel/OS devs are a different breed.
Unless you are at the edge of the firmware and software, this isn’t something you work with a lot.
When you transfer files or data to a memory space, you can’t drop the whole file/data to memory directly because the resources are limited on the cpu/mcu. It wouldn’t make sense to have a page as big as your biggest theorical data size.
Page size determine how much data at a time can be transferred into memory.
In term of performance, writing the page to memory is usually the bottle neck. So 4k vs 64k means you need to write to memory 16 times more and thus making the performance better on 64k page size.
That’s more of a storage thing, RAM does a lot smaller transfers - for example a DDR5 memory has two independent 32bit (4 byte) channels with a minimum of 16 transfers in a single “operation”, so it does 64 bytes at once (or more). And CPUs don’t waste memory bandwidth than transferring more than absolutely necessary, as memory is often the bottleneck even without writing full pages.
The page size is relevant for memory protection (where the CPU will stop the program execution and give control back to the operating system if said program tries to do something it’s not allowed to do with the memory) and virtual memory (which is part of the same thing, but they are two theoretically independent concepts). The operating system needs to make a table describing what memory the program has what kind of access to, and with bigger pages the table can be much smaller (at the cost of wasting space if the program needs only a little bit of memory of a given kind).
It’s called Paging. But an application programmer doesn’t really need to know how it works in precise.
its more for kernel devs and hardware architecture.
Don’t worry, it’s quite esoteric to begin with. The only reason I can comprehend this is the years-long following news like this, on top of my computer science degree.
Also, this wouldn’t matter (yet) to your daily life.