How to archive a website in a future-proof way (involves PDF hybrid)

evenwicht@lemmy.sdf.org · edit-2 4 months ago

IIUC you are referring to this extension, which is Firefox-only (~~like~~ unlike Save Page WE, which has a Chromium version).

Indeed, the beauty of ZIP is its stability. But the contents are not stable. HTML changes so rapidly that I bet an old MAFF file, once unzipped, would not have stood the test of time well. That’s why I like the PDF wrapper. Nonetheless, WebScrapBook could stand in for the MHTML from the Save Page WE extension. In fact, Save Page WE usually fails to save all objects for some reason, so WebScrapBook is probably more complete.

(edit) Apparently WebScrapBook gives a choice between HTZ and MAFF. I like that it timestamps the content, which is a good idea for archived docs.
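To illustrate the point about ZIP being the stable part: as far as I know, both HTZ and MAFF are ordinary ZIP containers, so any ZIP library should be able to look inside one even if no browser opens it anymore. Here is a rough Python sketch using only the standard library. The function name `list_archive` and the stand-in archive with a single `index.html` are my own assumptions for the demo, not anything WebScrapBook prescribes.

```python
import io
import zipfile
from datetime import datetime


def list_archive(source):
    """Return (name, size_in_bytes, modified) for each member of a ZIP-based archive.

    `source` may be a path to a .htz/.maff file or a file-like object.
    The timestamps are the ones stored in the ZIP entries themselves.
    """
    with zipfile.ZipFile(source) as zf:
        return [
            (info.filename, info.file_size, datetime(*info.date_time))
            for info in zf.infolist()
        ]


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory so the sketch runs without a real file.
    # The single index.html member is an assumption made for this demo only.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("index.html", "<p>archived page</p>")
    buf.seek(0)

    for name, size, modified in list_archive(buf):
        print(name, size, modified.isoformat())
```

Passing a real path such as `list_archive("page.htz")` should work the same way, assuming the file really is a plain ZIP.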

(edit2) Do you know what happens with JavaScript? I think JS can be quite disruptive to archival. If WebScrapBook saves the JS, it is in effect saving an app, and that language changes. The JS may also depend on being able to access the web, which makes a shitshow of archival because you must then be online and all the same external URLs must still be reachable. OTOH, saving the JS is probably desirable when doing the hybrid PDF save, because the PDF version would always contain the static result, not the JS. The JS could still be useful to have a copy of.
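One way to get a feel for how much a saved page still depends on the web is to scan the saved HTML for remote references. Below is a minimal Python sketch with the standard library. The names `ExternalRefFinder` and `find_external_refs` are made up for this example, and it only looks at `src=` attributes and `<link href=...>`, so it will miss anything the JS fetches at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_remote(url):
    """True for http(s) URLs and protocol-relative URLs like //cdn.example/x.js."""
    return url.startswith("//") or urlparse(url).scheme in ("http", "https")


class ExternalRefFinder(HTMLParser):
    """Collects remote resources a saved page would still need to fetch."""

    def __init__(self):
        super().__init__()
        self.external = []       # (tag, url) pairs pointing off the local archive
        self.inline_scripts = 0  # <script> elements without a src attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # src= on any tag; href= only on <link> (stylesheets etc.), not on <a>,
        # because ordinary hyperlinks are not load-time dependencies.
        value = attrs.get("src") or (attrs.get("href") if tag == "link" else None)
        if value and is_remote(value):
            self.external.append((tag, value))
        if tag == "script" and "src" not in attrs:
            self.inline_scripts += 1


def find_external_refs(html):
    finder = ExternalRefFinder()
    finder.feed(html)
    return finder.external, finder.inline_scripts


if __name__ == "__main__":
    sample = (
        '<script src="https://cdn.example/app.js"></script>'
        "<script>var x = 1;</script>"
        '<img src="logo.png">'
        '<a href="https://example.org">a plain link</a>'
    )
    external, inline = find_external_refs(sample)
    print(external)  # [('script', 'https://cdn.example/app.js')]
    print(inline)    # 1
```

An empty `external` list would not prove the page is self-contained, since scripts can build URLs on the fly, but a non-empty one is a clear sign the archive is not standalone.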

(edit3) I installed WebScrapBook but it had no effect. Right-clicking does not give any new functions.

smpl · 4 months ago

deleted by creator

evenwicht@lemmy.sdf.org · 4 months ago

In principle, the ideal archive would contain the JavaScript for forensic (and similar) use cases, as there is both a document (HTML) and an app (JS) involved. But then we would want the choice of whether to run the app (or at least inspect it), while also having the option to faithfully restore the original rendering offline. You seem to imply that saving JS is an option. I wonder: if you choose to save the JS, does it then save the stock HTML skeleton, or the rendered result?

smpl · 4 months ago

deleted by creator

MAFF (a shit-show, unsustained)

MHTML (shit-show due to non-portable browser-dependency)

PDF (lossy)

PDF+MHTML hybrid

We need to evolve

(update) The goals