USGS Waterdata

Issue #43 wontfix
Borislav Iordanov created an issue

http://waterdata.usgs.gov/

Comments (10)

  1. Borislav Iordanov reporter

    Jan,

    Apologies for the late response; I was on vacation and mostly offline. I have not made any judgement as to the importance or risk of loss of this (or any) data set. I simply picked something from the spreadsheet and ran a wget command to mirror it. It's a rather dumb mirror to boot, because it simply crawls the HTML website. The process completed and filled 4.8G, but sorting through the downloaded files is beyond my paygrade. These backups are kind of "panic mode backups". Sites like this are designed for human consumption, where data is searched based on parameters, for example. So the so-called "deep web" part of them, what sits in database systems and what actually matters, remains invisible and inaccessible.

    Perhaps in parallel, we should be reaching out to each of those organizations and working with them at an IT technical level on how to back up and decentralize their data. It will be very hard to establish some sort of continuous working relationship, because that will require authorization from management and possibly various officials. However, having them advise us on how to easily get to the data might be possible, and it would make our time and resources way more worthwhile.

  2. Borislav Iordanov reporter

    Website was backed up by crawling with 'wget'. No idea how much of the actual important/useful data was picked up.

  3. Sakari Albert Maaranen

    What do you mean we don't have the capacity or time, @Joos-gcv ? Borislav said it's already complete. And it's small.

    [sam@pub05 USGS]$ du -sbc *
    224431205   waterdata.usgs.gov
    

    I have published this at:

    pub05:/var/local/pub/USGS/2017-01-04_i43/waterdata.usgs.gov
    
  4. Sakari Albert Maaranen

    @John_Baez @Joos-gcv @marsroverdriver @Ronowlzsky, please make a clear decision: Do we keep publishing this or do we delete this? All relevant information is above. I am not a climate expert and cannot verify myself.

  5. John Baez

    Delete for space? It's just 224 megabytes, right? I'd think that counts as "negligible" if the only issue is space. Reading the discussion above, it sounds like the real problems are:

    1) Borislav wrote: "No idea how much of the actual important/useful data was picked up."

    2) Jan wrote: "I seriously doubt the USGS water network is at risk."

    Is that right? The problem is that we're not confident we've got a complete backup and we think maybe this site is not so urgent anyway?

  6. John Baez

    @Joos-gcv and @sakaal - given all I've heard, I'm happy to let you delete this batch of data. It seems 1) likely to be fragmentary, 2) not very high priority, 3) a bit of a nuisance to figure out.

  7. Sakari Albert Maaranen

    Deleted.

    [sam@pub05 USGS]$ pwd
    /var/local/pub/USGS
    [sam@pub05 USGS]$ du -sbcBM 2017-01-04_i43
    215M    2017-01-04_i43
    215M    total
    [sam@pub05 USGS]$ sudo chmod -R o+w 2017-01-04_i43
    [sam@pub05 USGS]$ sudo rm -rf 2017-01-04_i43
    [sam@pub05 USGS]$ cd ..
    [sam@pub05 pub]$ rmdir USGS
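
The open question in this thread, how much useful data a plain HTML crawl actually picked up, could be approached with a quick inventory of the mirror by file extension. The following is only a minimal sketch: the function name `summarize_mirror` is hypothetical and was not part of the original workflow. It assumes a POSIX shell with `find`, `awk` and `sort`, and it assumes that crawled files may carry a URL query string in their names, which it strips before reading the extension.

```shell
# Hypothetical helper, not part of the original backup: count the files in a
# mirror directory per file extension, most frequent extension first.
summarize_mirror() {
    # $1 is the root directory of the mirror.
    find "$1" -type f | awk -F/ '
        {
            name = $NF                      # file name without its directory
            sub(/[?].*$/, "", name)         # drop a trailing "?query" part, if any
            n = split(name, parts, /[.]/)   # split the name on literal dots
            ext = (n > 1) ? tolower(parts[n]) : "(none)"
            count[ext]++
        }
        END { for (e in count) printf "%d %s\n", count[e], e }
    ' | LC_ALL=C sort -k1,1nr -k2,2
}
```

Run as `summarize_mirror waterdata.usgs.gov` from the directory holding the mirror. A listing dominated by `html` would suggest that mostly search pages were crawled rather than the underlying data files, though it cannot show what the crawl missed.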
    