this post was submitted on 15 Jun 2023
191 points (100.0% liked)


This is something that keeps me worried at night. Unlike other historical artefacts like pottery, vellum writing, or stone tablets, information on the Internet can just blink into nonexistence when the server hosting it goes offline. This makes it difficult for future anthropologists who want to study our history and document the different Internet epochs. For my part, I always try to send any news article I see to an archival site (like archive.ph) to help collectively preserve our present so it can still be seen by others in the future.

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (8 children)

Ultimately this is a problem that’s never going away until we replace URLs. The HTTP approach of finding documents by URL, i.e. server/path, is fundamentally brittle. It doesn’t matter how careful you are or how much best practice you follow, that URL is going to be dead in a few years. The problem is made worse by DNS, which makes URLs expensive to maintain and liable to expire.

There are approaches like IPFS, which uses content-based addressing (i.e. fancy file hashes), but that’s not enough either, as it provides no good way to update a resource.
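To illustrate what content-based addressing means in practice (a toy sketch, not the actual IPFS API): the address is derived from the bytes themselves, so anyone can verify a copy, but any edit necessarily produces a brand new address:

```python
import hashlib

def content_address(data: bytes) -> str:
    # The "address" is just a hash of the bytes, so anyone holding a copy
    # can verify it without trusting whoever served it.
    return hashlib.sha256(data).hexdigest()

v1 = b"Hello, world!"
v2 = b"Hello, world! (edited)"

print(content_address(v1))  # one address
print(content_address(v2))  # a completely different address -- there's no
                            # built-in notion of "the same document, updated"
```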

The best™ solution would be some kind of global blockchain thing that keeps a record of what people publish, giving each document a unique id, a hash, and some way to update that resource in a non-destructive way (i.e. the version history is preserved). Hosting itself would still need to be done by other parties, but a global log file that lists out all the stuff humans have published would make it much easier and more reliable to mirror it.
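Roughly what I mean, as a toy sketch (all the names and structure here are made up): an append-only log of publish records, where an "update" is just a new record pointing back at the hash of the previous version, so nothing is ever overwritten:

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class PublishLog:
    """Toy append-only log: records are only ever appended, never changed."""
    def __init__(self):
        self.records = []  # list of (record_hash, record) pairs

    def publish(self, doc_id, content, prev_record_hash=None):
        record = {
            "doc_id": doc_id,                 # stable identity of the document
            "content_hash": sha256(content),  # which exact bytes were published
            "prev": prev_record_hash,         # link to the previous version, if any
            "time": time.time(),
        }
        record_hash = sha256(json.dumps(record, sort_keys=True).encode())
        self.records.append((record_hash, record))
        return record_hash

log = PublishLog()
r1 = log.publish("my-article", b"first draft")
r2 = log.publish("my-article", b"revised draft", prev_record_hash=r1)
# The full history of "my-article" stays in the log; a mirror only has to
# replay the log to know every version that ever existed.
```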

The end result should be “Internet as globally distributed immutable data structure”.

Bit frustrating that this whole problem isn’t getting the attention it deserves. And that even relatively new projects like the Fediverse aren't putting in the extra effort to at least address it locally.

[–] [email protected] 1 points 1 year ago (1 children)

Even beyond what you said, even if we had a global blockchain-based browsing system, that wouldn't make it easier to keep the content ONLINE. If a website goes offline, the knowledge and reference are still lost, and whether it's a URL or a blockchain, it would still point towards a dead resource.

[–] [email protected] 1 points 1 year ago (1 children)

It would make it much easier to keep content online, as everybody could mirror content with close to zero effort. That's quite the opposite of today, where content mirroring is essentially futile: all the links still refer to the original source and still turn into 404s when that source goes down. The fact that the file might still exist on another server is largely meaningless when you have no easy way to discover it and no way to tell if it is even the right file.

The problem we have today is not storage, but locating the data.
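A rough sketch of why that changes the mirroring story (the data here is just a placeholder): any copy from any server can be checked against the address itself, so a mirror is exactly as trustworthy as the original:

```python
import hashlib

def verify_copy(expected_address: str, data: bytes) -> bool:
    # It doesn't matter which server the bytes came from: if the hash
    # matches the address, it is the right file.
    return hashlib.sha256(data).hexdigest() == expected_address

# Hypothetical: the same article fetched from the original host and a mirror.
original_copy = b"some article text"
mirrored_copy = b"some article text"

address = hashlib.sha256(original_copy).hexdigest()
print(verify_copy(address, mirrored_copy))  # True -- provably the same file
```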

[–] [email protected] 1 points 1 year ago

Why would people mirror somebody else's stuff?

Maybe you'd personally mirror a small number of things if you found them interesting, but I don't see that happening at any wide scale.
