this post was submitted on 23 Oct 2024

News from fediverse

The recent downtime of the #InternetArchive reminded me:

(a) How vital the site is for my own work. Fortunately, I save pretty much all the old books I need for my work to my hard drive, so I am not totally lost without it. Still, most of the links to the individual folk tales I am translating point to online archives, and the Internet Archive is the most important among them.

(b) How storing all this vital cultural heritage stuff at one single site is a terrible idea. Today, the Internet Archive might be taken down by hackers. Tomorrow, the site might commit suicide by lawyers. And in a possible future, a fascist US government might take the site down out of sheer spite.

While there are a fair number of other, more specialized digital libraries out there, too many public domain works are only available at the Internet Archive. Another huge percentage is stored only at the Internet Archive and Google Books, which is not much better.

We need a more distributed archive system where all these works can be stored on multiple servers around the world, yet where users can search through all of them with comparable ease. Only in this way will our digital cultural heritage be truly safe.

Perhaps a #Fediverse-based approach could work? Something like #Bookwyrm, but with actual data storage?

What do you think?

top 7 comments
[email protected], 4 days ago

@juergen_[email protected] HathiTrust has a fair number of public domain books.

[email protected], 4 days ago

@petes_[email protected] True, but they also have geographic restrictions on many of them.

[email protected], 5 days ago (last edited 5 days ago)

@juergen_hubert Weren't the early music and film "sharing" schemes based on distributed storage of files on individuals' hard drives, via torrents and the like?

[email protected], 5 days ago

@juergen_[email protected] Storing is easy; the problem you face is organization. You will need to be able to find the books you are looking for, and that requires a search index and probably some kind of tags. OpenLibrary has thousands of volunteers working on this, and most of their entries are still rather bare-bones. I think this requires a more institutionalized approach than BookWyrm (which draws its data from institutions like OpenLibrary and WorldCat).

[email protected], 5 days ago

@[email protected] Are there any good models for distributed search functions out there?

[email protected], 5 days ago

@juergen_[email protected] The only one I know is YaCy (https://yacy.net/), but I have never tried it.

Theoretically, you could build an engine on the Apache Lucene ecosystem and base its crawler component on some kind of P2P network, but that requires a great deal of specialized technical expertise and probably cannot be done without major supporting infrastructure.
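To make the "store everywhere, search everywhere" idea a bit more concrete, here is a toy sketch of the simplest form of distributed search: each archive server keeps an index of only its own holdings, and a query is fanned out to every server with the hits merged afterwards. This is an illustrative assumption, not how YaCy or Lucene actually work. Everything here (the `ArchiveNode` class, `federated_search`, node names, work IDs) is hypothetical and runs in memory with no networking, ranking, or the metadata and tagging work mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArchiveNode:
    """A hypothetical archive server holding an inverted index of its own works."""
    name: str
    index: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_work(self, work_id: str, text: str) -> None:
        # Map every lower-cased word of the work's metadata to its ID.
        for term in text.lower().split():
            self.index[term].add(work_id)

    def search(self, query: str) -> set[str]:
        # AND semantics: a work matches only if it contains every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        results = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.index.get(term, set())
        return results


def federated_search(nodes: list[ArchiveNode], query: str) -> dict[str, list[str]]:
    """Send the query to every node and record which nodes hold each matching work."""
    holders: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        for work_id in node.search(query):
            holders[work_id].append(node.name)
    return dict(holders)


if __name__ == "__main__":
    # Two hypothetical mirrors; one work is stored redundantly on both.
    a = ArchiveNode("archive-a.example")
    b = ArchiveNode("archive-b.example")
    a.add_work("grimm-1857", "Kinder und Hausmärchen Grimm 1857")
    b.add_work("grimm-1857", "Kinder und Hausmärchen Grimm 1857")
    b.add_work("bechstein-1845", "Deutsches Märchenbuch Bechstein 1845")
    print(federated_search([a, b], "grimm"))
    # {'grimm-1857': ['archive-a.example', 'archive-b.example']}
```

Fanning every query out to every node is the easiest design to reason about, and the result shows where each copy lives, but its cost grows with the number of servers. Larger systems typically partition or replicate the index itself across peers instead, which is where the specialized expertise comes in.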