Apologies for the late response, I was on vacation and offline for the most part. I have not made any judgement as to the importance or risk of loss of this (or any) data set. I simply picked something from the spreadsheet and ran a wget command to mirror it. It's a rather dumb mirror to boot, because it is simply crawling the HTML website. The process completed and it filled 4.8G, but sorting through the downloaded files is beyond my paygrade. These backups are kind of "panic mode backups". Sites like this are designed for human consumption, where data is searched based on parameters, for example. So the so-called "deep web" part of them, which sits in database systems and is what actually matters, remains invisible and inaccessible. Perhaps in parallel, we should be reaching out to each of those organizations and working with them at an IT technical level on how to back up and decentralize their data. It will be very hard to establish any sort of continuous working relationship, because that will require authorization from management and possibly various officials. However, getting their advice on how to easily get to the data might be possible, and it would make our time and resources way more worthwhile.
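For anyone who wants a first look at what a crawl like this actually picked up, here is a minimal sketch that totals file counts and bytes per file extension under a mirror directory. It is only an illustration, not what was run for this backup: the function names `summarize_mirror` and `print_summary` are made up for this example, and it assumes the mirror is an ordinary directory tree on local disk.

```python
import os
from collections import Counter


def summarize_mirror(root):
    """Return (counts, sizes): files and total bytes per lowercase extension."""
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip entries that cannot be read (e.g. broken links).
                continue
            counts[ext] += 1
            sizes[ext] += size
    return counts, sizes


def print_summary(root):
    """Print one line per extension, largest total size first."""
    counts, sizes = summarize_mirror(root)
    for ext, total in sizes.most_common():
        print(f"{ext:>10}  {counts[ext]:>8} files  {total / 1e6:>10.1f} MB")
```

Calling `print_summary` on the mirror directory gives a rough idea of whether the crawl is mostly HTML pages or whether it also contains data files worth keeping. It says nothing about what stayed behind in the site's databases.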
@John_Baez @Joos-gcv @marsroverdriver @Ronowlzsky, please make a clear decision: do we keep publishing this or do we delete it? All relevant information is above. I am not a climate expert and cannot verify this myself.
Delete for space? It's just 224 megabytes, right? I'd think that counts as "negligible" if the only issue is space. Reading the discussion above, it sounds like the real problems are:
1) Borislav wrote: "No idea how much of the actual important/useful data was picked up."
2) Jan wrote: "I seriously doubt the USGS water network is at risk."
Is that right? The problem is that we're not confident we've got a complete backup and we think maybe this site is not so urgent anyway?
@Joos-gcv and @sakaal - given all I've heard, I'm happy to let you delete this batch of data. It seems 1) likely to be fragmentary, 2) not very high priority, 3) a bit of a nuisance to figure out.