- cross-posted to:
- piracy@lemmit.online
When will Anna fly too close to the sun?
We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB), grouped by popularity.
This release includes the largest publicly available music metadata database with 256 million tracks and 186 million unique ISRCs.
It’s the world’s first “preservation archive” for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.



Any torrent client released in the past two decades can prioritise blocks by file path within the torrent. Assuming these are all loose files organised by album or whatever, it’s easy to pick just the stuff you want.
That’s not how it works. The blocks are set by the torrent file author, not the downloader’s torrent client. The .torrent file (or the torrent metadata, if you use a magnet link) contains a fixed list of blocks, and every block has a fixed size and its own hash checksum. Blocks can’t perfectly match file lengths, because blocks have a fixed size and files have varying sizes. If you download one file, you will also get partial chunks of other files. There is no way around it.
For example, say the torrent block size is 4 MB and the file you want is 82 MB. 82 is not divisible by 4, so there will be one or two blocks, at the start and end of the file, that also contain data from neighbouring files. You might say “well, I’ll just delete those neighbouring files from my computer.” But if you plan to seed the torrent, you can’t delete them, because then you won’t have the full block to seed to other people.
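To put rough numbers on the 4 MB block / 82 MB file example, here is a minimal sketch of the arithmetic. It assumes a classic (v1-style) torrent where all files are laid end to end and cut into fixed-size blocks. The function name `pieces_for_file` and the 10 MiB starting offset are made up for illustration, and it ignores the fact that the very last block of a torrent can be shorter.

```python
MiB = 1024 * 1024

def pieces_for_file(file_offset, file_length, piece_size):
    """Return (first_piece, last_piece, extra_bytes) for one file.

    file_offset is where the file starts inside the torrent's
    concatenated data (hypothetical input, normally derived from the
    sizes of the files listed before it). extra_bytes is how much data
    belonging to neighbouring files comes along with the boundary blocks.
    """
    first_piece = file_offset // piece_size
    last_piece = (file_offset + file_length - 1) // piece_size
    fetched = (last_piece - first_piece + 1) * piece_size
    return first_piece, last_piece, fetched - file_length

# An 82 MiB file that happens to start 10 MiB into the torrent, 4 MiB blocks.
first, last, extra = pieces_for_file(10 * MiB, 82 * MiB, 4 * MiB)
print(first, last, extra // MiB)
```

With these assumed numbers the file touches blocks 2 through 22 and drags in 2 MiB of neighbouring data. Whatever the offset, the overhead for a single file stays under two block sizes, since only the first and last block can be shared.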
Large collections in torrent format always have the problem that most people only want 1% of the content, but it’s never been an issue on any private tracker I’ve been on, thanks to seed-time incentives. Most places don’t really like hit-and-runs, and that’s kinda hard to enforce with a public torrent, but on the flip side the vast majority of leechers will only be grabbing an inconsequential amount of data. Anyone with 300 TB of data on their seedbox accepts they’re gonna get hit-and-run by the vast majority of people.
As someone with a meager 80 TB data center, I will bear the cross of hosting this data for all the young Me’s who want to play Nintendo DS games they can neither afford nor seed.
Thanks for the explainer, but nothing you’ve said contradicts my statement or really matters. Yes, you’ll end up with stray blocks, but it’s the difference between fetching the whole thing and fetching an extra hundred megabytes, so the point is moot for the average user.