- cross-posted to:
- piracy@lemmit.online
When will Anna fly too close to the sun?
We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB), grouped by popularity.
This release includes the largest publicly available music metadata database with 256 million tracks and 186 million unique ISRCs.
It’s the world’s first “preservation archive” for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.



This is a cool stunt but I wonder about the practicality.
It says that the music will be distributed in bulk torrents, which makes it virtually impossible to access songs or albums individually. A torrent works by breaking the release into fixed-size blocks, and those blocks don’t respect file boundaries: a single block can contain data from multiple files. If you download the blocks for the one album you want, some of those blocks will also contain data from other albums you may not want.
Also, who will seed 300 TB of mostly unpopular music on public torrents?
As for the quality of the release, it says that files are in 160k Vorbis format if Spotify gave the song a ‘popular’ score. If the song does not have a popular score, the 160k Vorbis file was converted to a 75k Opus file. A lossy-to-lossy conversion like that can introduce artifacts. The 160k Vorbis file is identical to Spotify quality, but lower quality than a CD; probably similar to a ~256k MP3.
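For reference, the transcode step being described would look something like this with ffmpeg (the filenames are hypothetical, and this is just an illustration of a 160k Vorbis → 75k Opus conversion, not Anna’s actual pipeline):

```python
# Illustration only: re-encoding a 160k Vorbis file to 75k Opus with ffmpeg.
# This is the kind of lossy-to-lossy step that can introduce artifacts.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "track.ogg",                 # hypothetical 160k Vorbis source
    "-c:a", "libopus", "-b:a", "75k",  # encode to 75k Opus
    "track.opus",
], check=True)
```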
Any torrent client released in the past two decades can prioritise blocks by file path in the torrent. Assuming these are all loose files organised by album or whatever, it’s easy to pick just the stuff you want.
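For the curious, selective downloading looks roughly like this with the libtorrent Python bindings (the torrent filename and album path are made up; treat this as an outline, not a tested recipe):

```python
# Sketch: skip everything in a bulk torrent except one album.
import libtorrent as lt

ses = lt.session()
info = lt.torrent_info("spotify-bulk-0001.torrent")  # hypothetical name
h = ses.add_torrent({"ti": info, "save_path": "."})

files = info.files()
for i in range(files.num_files()):
    wanted = files.file_path(i).startswith("Some Album/")  # hypothetical path
    h.file_priority(i, 4 if wanted else 0)  # 0 = skip, 4 = normal priority
```

Even with everything else set to skip, the client will still fetch the overlapping edge blocks, which is the caveat the next reply gets into.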
That’s not how it works. The blocks are set by the torrent’s author, not the downloader’s client. The .torrent file (or the torrent metadata, if you use a magnet link) contains a fixed list of blocks; every block has a fixed size and its own hash checksum. Blocks can’t perfectly match file lengths, because blocks have a fixed size and files have varying sizes. If you download one file, you will also get partial chunks of neighbouring files; there’s no way to avoid it.
Example: say the torrent’s block size is 4 MB and the file you want is 82 MB. 82 isn’t divisible by 4, and the file’s start offset within the torrent usually isn’t block-aligned either, so one or more blocks will contain data from neighbouring files. You might say “well, I’ll just delete those parallel files from my computer”, but if you plan to seed the torrent, you can’t: without the parallel files you won’t have the full blocks to seed to other people.
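A quick sketch of that arithmetic (the 4 MB block size and 82 MB file are the hypothetical numbers from above; the 9 MB start offset is made up):

```python
# Which blocks ("pieces") of a torrent contain any byte of a given file?
PIECE = 4 * 2**20  # 4 MB block size, fixed by the torrent's author

def blocks_for_file(offset, size, piece_size=PIECE):
    """`offset` is where the file starts in the torrent's concatenated stream."""
    first = offset // piece_size
    last = (offset + size - 1) // piece_size
    return first, last

# An 82 MB file starting 9 MB into the stream:
first, last = blocks_for_file(offset=9 * 2**20, size=82 * 2**20)
print(first, last)  # 2 22
# Block 2 covers bytes 8-12 MB, so it also holds the tail of the previous
# file; block 22 covers 88-92 MB, so it holds the head of the next one.
```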
Large collections in torrent format always have the problem that most people only want 1% of the files, but it’s never been an issue on any private trackers I’ve been on, thanks to seed-time incentives. Most places don’t really like hit-and-runs, and that’s kinda hard to enforce with a public torrent, but on the flip side the vast majority of leeches will only be grabbing an incredibly inconsequential amount of data. Anyone with 300 TB of data on their seedbox accepts they’re gonna get hit-and-run by the vast majority of people.
As someone with a meager 80 TB data center, I will bear the cross of hosting this data for all the young Me’s who want to play Nintendo DS games they can’t afford, nor seed.
Thanks for the explainer, but nothing you’ve said contradicts my statement or really matters. Yes, you’ll end up with stray blocks, but that’s an extra hundred megabytes or so versus fetching the whole thing, so the difference is moot for the average user.
I think the utility is for preservation, not necessarily easy access. Presumably some folks will download and seed those torrents. If I need a song from it 5 years from now and can’t find it anywhere else, I can find the torrent that contains it and download just that song. It’s a hassle, but it’s totally possible to download 1 file from a torrent.
Assuming they’re not in weird archives. Back when I used to torrent stuff, there were a lot of Usenet uploads that were split into 58 zip files.
Huh, when I read the press release I thought it was saying they were going to break them down into smaller chunks sorted by popularity.
The article says that they archived 256 million songs. The article says that they want to distribute the 256 million songs in a list of “bulk torrents”. The article says the purpose of the project is to create an “authoritative list of torrents aiming to represent all music ever produced”.
If you have 256 million files and your goal is a list of bulk torrents, how many songs do you put per torrent? At 1 million songs per torrent, that’s still 256 torrents, each over 1 TB in size.
A torrent client can only handle roughly 500-2,000 torrents before it starts going wonky, depending on which client you use. If they split the 256 million songs into 2,000 torrents, that would be about 128k songs per torrent.
Currently the Anna’s Archive website only has the torrent for the songs’ metadata. The metadata torrent for the 256 million songs is 200 GB on its own, and that’s only the text data.
Each torrent will likely be hundreds of gigabytes.
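The split math from the last few comments, for anyone who wants to play with the numbers (the 2,000-torrent figure is the commenter’s guess about client limits, not anything from the article):

```python
# Back-of-envelope: songs and size per torrent for different split counts.
songs = 256_000_000
total_tb = 300

for n_torrents in (256, 2000):
    print(f"{n_torrents} torrents: "
          f"{songs // n_torrents:,} songs, "
          f"~{total_tb / n_torrents * 1000:.0f} GB each")
# 256 torrents: 1,000,000 songs, ~1172 GB each
# 2000 torrents: 128,000 songs, ~150 GB each
```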
Yea, I guess I was thinking 1 TB was at least a manageable size for a homebrewer, especially if you were only interested in archiving the top 1 or 2 million songs.
It’s still not a casually sized torrent by any means, but it’s a lot more manageable than 300 TB. If you’re someone who wants to archive that much media, you probably have enough tech literacy to manage several torrent clients. The article also says they’re willing to make individual songs available on their website “if people are interested in it”.