Topic of interest: Collaborative Caching and “DocsBox”

We talked about how difficult it would be to sail from Hacker Beach to Darwin (Australia), and the hardest part would obviously be being out of wifi range for two weeks. So that led to the topic of collaboratively scraping the web, both in the interest of decentralization and resilience, and for offline use of reference material. The English-language Wikipedia dump is less than 10 GB, meaning one off-the-shelf hard disk can store reference material 100x the size of Wikipedia. This could probably include all the educational material necessary to self-study through primary school, secondary school, and university for all degrees a big university would typically offer.

In particular, I would like to have a copy of the Mozilla Developer Network, the Node.js API reference, and Stack Overflow for offline use. It is not a good idea for everybody to scrape all these websites just for themselves, so it makes sense to construct a sort of ‘docs box’ that is capable of exchanging data with other docs boxes. We of course talked about the optimal algorithm for updating such a distributed database. 🙂

This morning I had a look, and it turns out that at least Wikipedia and Stack Overflow are available as data dumps over BitTorrent. Stack Overflow even provides an RSS feed of their data dump torrents. So given that BitTorrent is already being used for that purpose, our docs box should probably just seed all these data dump torrents. That way, if two docs boxes are put into the same LAN, they will automatically exchange missing blocks from all the torrents they are downloading.
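As a rough sketch of how a docs box could pick up new dump torrents from such a feed: the snippet below pulls torrent links out of an RSS document. The feed layout is an assumption (items carrying an `enclosure` of type `application/x-bittorrent`), and `SAMPLE_FEED`, `torrent_urls`, and the example.org URLs are made up for illustration; the real feed may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents; the real data dump feed may be structured differently.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example data dumps</title>
    <item>
      <title>dump-a</title>
      <enclosure url="https://example.org/dump-a.torrent"
                 type="application/x-bittorrent" length="1"/>
    </item>
    <item>
      <title>release notes</title>
      <link>https://example.org/notes.html</link>
    </item>
  </channel>
</rss>"""


def torrent_urls(feed_xml: str) -> list[str]:
    """Return the URLs of all torrent enclosures found in an RSS document."""
    root = ET.fromstring(feed_xml)
    urls = []
    for enclosure in root.iter("enclosure"):
        # Keep only enclosures that are declared as torrent files.
        if enclosure.get("type") == "application/x-bittorrent":
            urls.append(enclosure.get("url"))
    return urls


print(torrent_urls(SAMPLE_FEED))
```

The resulting list could then be handed to whatever BitTorrent client the box runs, so that new dumps get downloaded and seeded without manual work.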

We could set up a Docs Box in each hackerspace so that nomadic hackers can refresh their Docs Boxes efficiently whenever they pass through one. Especially for first-time use, when you need to get a terabyte of documents onto it, you would just have to leave it plugged in overnight at a hackerspace.

It would also be cheap to donate such Docs Boxes to, for instance, schools in remote villages, as a sort of combination of the Khan Academy project and the Hole-in-the-Wall project.

Assuming a 1 TB size, we calculated that it would take about a month to create a DocsBox using only a BitTorrent client and a standard internet connection. If there is another DocsBox on the same (wired) LAN, then you could probably do it overnight.
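To make that estimate easy to redo with your own numbers, here is a small back-of-the-envelope calculation. The two rates are assumptions chosen for illustration (3 Mbit/s sustained over the internet, 300 Mbit/s effective on a wired gigabit LAN), not measurements, and `transfer_days` is just a helper name for this sketch.

```python
TB = 10**12  # bytes in one (decimal) terabyte


def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes at a sustained rate in bits per second."""
    seconds = size_bytes * 8 / rate_bits_per_s
    return seconds / 86_400  # seconds per day


# Assumed sustained rates, not measurements.
internet_rate = 3e6   # 3 Mbit/s download
lan_rate = 300e6      # 300 Mbit/s effective over wired gigabit Ethernet

print(f"internet: {transfer_days(TB, internet_rate):.1f} days")   # 30.9 days
print(f"LAN:      {transfer_days(TB, lan_rate) * 24:.1f} hours")  # 7.4 hours
```

With these assumed rates, the internet download lands at roughly a month and the LAN copy fits into a night, which matches the estimate above. In practice the disk write speed and the number of seeders would also limit the rate.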
