See linked posting. I’ve commented there with a link to a CLI tool in Python that allows downloading of IA collections. I’ve submitted a patch to enable specifying start and end points so that it’s easier to resume downloading a huge collection, or to allow multiple people to split up the work.
https://archive.org/details/georgeblood
https://archive.org/details/78rpm_bowling_green
F*ck the RIAA and absurdly long copyright.
EDIT: There is more than one collection of 78s on IA, so I updated the title.
The issue with these collections is that they’re absolutely HUGE. And yes, IA offers torrents for them, but as a separate torrent for every. single. album. And the torrents have all data in them – FLAC, fixed-rate MP3, VBR MP3, PDF liner notes, etc. etc… there may be some extremely hardcore data-hoarders out there who want everything, but IMHO as these are scratchy old 78 records, FLAC is overkill to just save the audio in a listenable format. The George Blood collection, just the VBR MP3s, is looking to be about 6TB. With ALL data it might be over 40TB! I can’t afford that many hard drives :)
So, my approach at the moment is to save just the VBR MP3s (they seem to be done at up to 320kbps VBR) and the JPEG album cover. If I have a chance and any storage left afterwards, I can make a separate pass to get the album liner PDFs…
Tool used: https://github.com/jjjake/internetarchive
Patch to allow setting start and end item indices for downloads: https://github.com/jjjake/internetarchive/pull/605
Example usage to grab just the VBR MP3 and record label JPG for each (note the --start-idx and --end-idx arguments):
#ia download --start-idx=4001 --end-idx=8000 -a -i --format="VBR MP3" --format="JPEG" --search collection:georgeblood
I’m going to concentrate on the George Blood collection for now… I’m starting at item 1. It would be great if others started at index 50,000, 100,000, 150,000, … and others started at the end and worked backwards in similarly-sized chunks, so that it’s assured someone gets each of them.
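For anyone coordinating chunks, here is a rough Python sketch that prints one `ia download` command per block. It only builds command strings and does not call the `internetarchive` library itself. Assumptions not confirmed above: the `--start-idx`/`--end-idx` values are 1-based and inclusive (as the 4001-8000 example suggests), the patch from the pull request is installed, and the collection holds roughly 400,000 items. The helper names `chunk_ranges` and `ia_command` are made up for illustration.

```python
def chunk_ranges(total_items, chunk_size, first_index=1):
    """Yield inclusive (start, end) index pairs covering first_index..total_items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = first_index
    while start <= total_items:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size - 1, total_items)
        yield start, end
        start = end + 1


def ia_command(start, end, collection="georgeblood"):
    """Build the download command from the post for one block of items.

    Assumes the --start-idx/--end-idx patch is applied and that both
    indices are inclusive.
    """
    return (
        f"ia download --start-idx={start} --end-idx={end} -a -i "
        f'--format="VBR MP3" --format="JPEG" --search collection:{collection}'
    )


if __name__ == "__main__":
    TOTAL_ITEMS = 400_000  # assumption: rough item count, not an exact figure
    CHUNK_SIZE = 4000      # block size suggested in the thread
    # Print only the first three blocks as a preview.
    for start, end in list(chunk_ranges(TOTAL_ITEMS, CHUNK_SIZE))[:3]:
        print(ia_command(start, end))
```

Each person can then claim one or more printed lines, and anyone working backwards from the end can take the same list in reverse order.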
Yeah, you’re right, Fuck em.
FYI I’m currently on 4001-8000 of the ‘Great 78 Collection’. Looks like I’ll need about 6TB to get it all, yikes! (Just the VBR MP3 files, not the FLACs. Holy Hell.)
collection:georgeblood
https://archive.org/details/georgeblood
If everyone would take blocks of it, say 4000 each, we can eventually create torrents for each one or something so it can all be reassembled if/when the IA has to take it down.
Yup. Torrents are the way forward for archiving collections like this.
I wish the IA would offer a single torrent of the overall collection, but it’s over 400k separate torrents, one for each album. And they contain FLACs, fixed- and VBR MP3s, PDF jacket notes, JPGs … it’s just too much for one person (I am OK with buying an 8TB drive or two, but not a dozen!)
I’m trying to at least grab the VBR MP3s (these are old scratchy records after all… I don’t know how much FLAC will really preserve). Maybe if I can get most of those, I’ll do a second pass and get the album cover JPGs, then liner PDFs… depending on if/how long the collection stays up.
The IA has torrents for everything they upload already.
Normally I would just fetch the torrent, yes, but this particular collection is huge – over 400k separate items (each of which is its own torrent on IA). Is there a way to get an aggregate, but filtered, torrent with just, say, the album JPG and VBR MP3 files for each? I don’t think I can afford the entire collection, as each item also has the FLACs.
around 5500… gonna take a while. My ISP says there’s no monthly cap but I wonder if I really should dl this much…