Wikipedia sure is popular. The most popular articles in a given week routinely get millions of views. But with 6 million plus articles, Wikipedia has plenty ...
Really enjoyed the read. Thanks for sharing. I’m surprised by the random page implementation.
Usually in a database each record has an integer primary key. The keys would be assigned sequentially as pages are created. Then the “random page” function could select a random integer between zero and the largest page index. If that index isn’t used (because the page was deleted), you could either try again with a new random number or march up to the next non-empty index.
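A minimal sketch of the retry variant, assuming a SQLite table named pages with a sequential integer primary key id (both names are hypothetical, not anything from the article):

    import random
    import sqlite3

    def random_page_id(conn: sqlite3.Connection) -> int:
        # Upper bound for the draw: the largest id ever assigned.
        (max_id,) = conn.execute("SELECT MAX(id) FROM pages").fetchone()
        while True:
            candidate = random.randint(1, max_id)
            row = conn.execute(
                "SELECT id FROM pages WHERE id = ?", (candidate,)
            ).fetchone()
            if row is not None:
                return row[0]  # live page; a deleted id just retries

Each miss costs one indexed lookup, so this stays cheap as long as deleted ids are a small fraction of the keyspace.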
Marching up to the next non-empty key would skew the distribution—pages preceded by more empty keys would show up more often under “random”.
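A toy simulation makes the bias easy to see. Here ids 4 through 9 are treated as deleted, so every draw in 4..10 marches up to id 10 (the numbers are made up purely for illustration):

    import random
    from collections import Counter

    live = {1, 2, 3, 10}              # surviving page ids; 4-9 were deleted
    counts = Counter()
    for _ in range(100_000):
        candidate = random.randint(1, 10)
        while candidate not in live:  # march up to the next live id
            candidate += 1
        counts[candidate] += 1

    print(counts)  # id 10 lands ~70% of the time; ids 1-3 ~10% each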
Fun fact: that concept is used in computer security exploits: https://en.wikipedia.org/wiki/NOP_slide
For choosing an article, it would be better to just pick a new random number.
There are probably more efficient ways to pick a random record out of a database, though: for example, periodically reindexing, or sorting extant records by a random value (if the database supports it).
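For the sort-by-random option, here's roughly what that looks like in SQLite, reusing the hypothetical pages table and wiki.db file from above (PostgreSQL spells the function random()):

    import sqlite3

    conn = sqlite3.connect("wiki.db")  # hypothetical database file
    row = conn.execute(
        "SELECT id, title FROM pages ORDER BY RANDOM() LIMIT 1"
    ).fetchone()

The catch is that ORDER BY RANDOM() evaluates a random value for every row on each query, so it's simple but O(n), whereas the retry loop earlier only does indexed lookups.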