Issue #44 on hold
Marc Rosen
created an issue

I'm currently looking into which datasets to bring down.

Comments (6)

  1. Marc Rosen reporter

    To reiterate what I said in my email: globalchange.gov itself does not host datasets. Rather, it links to datasets hosted on other websites. Because space is an issue, I said it would be sensible not to download the datasets via globalchange.gov, since they should already have been downloaded. That is what I was referring to in what you quoted.

    Additionally, as I explained via email, what globalchange.gov really has to offer is on data.globalchange.gov. That website provides graph data linking datasets with authors, models, and other attributes, along with a graph query interface for them. For this reason, simply using wget to mirror the website would still lose most of the data it has to offer. To that end, I have emailed globalchange.gov asking whether we could get a database dump of their website, so that we could make a fully functioning clone without losing any data. I have not yet received a response.

  2. Greg Kochanski

    I have a pretty good image: 11 GB; 201377 files. I terminated the crawl, but not before it got down in the weeds of very repetitive accesses of the same files through slightly different paths.

  3. Greg Kochanski

    I can chug on it more, but it was getting URLs analogous to this one: "toolkit.climate.gov/reports?f[0]=field_state:West Virginia&f[1]=field_state:Missouri&f[2]=field_state:Iowa"

    Since it's a site that is dynamically generated from a database, one never knows whether the set of URLs is infinite. You can always add f[3] and f[4], and f[5], etc.

    When I come back from work, I'll take a closer look at it to see if there's a reasonable hope that it's finite. (And the *.climate.gov site is suffering from the same problem; for the last day or more, it's been -- apparently -- finding many ways to reveal the same set of reports.)
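
One way to check whether the URL set is effectively finite is to canonicalize each faceted URL before queueing it, so that reorderings of the same `f[N]` filters collapse to a single entry. The following is only a minimal sketch under stated assumptions: that the order of the `f[N]` parameters does not change what the site returns, that an `https://` scheme is used, and that capping the number of stacked filters is acceptable. The function name `canonicalize` and the `max_facets` parameter are hypothetical and not part of any crawler mentioned in this thread.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Matches faceted-filter keys such as f[0], f[1], f[2], ...
FACET_KEY = re.compile(r"^f\[\d+\]$")


def canonicalize(url, max_facets=2):
    """Return a canonical form of `url`, or None if it stacks too many facets.

    Assumption: facet order is irrelevant to the page content, so the facet
    values are de-duplicated, sorted, and renumbered from f[0].
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)

    # Separate facet filters from any other query parameters.
    facets = sorted({v for k, v in pairs if FACET_KEY.match(k)})
    others = sorted((k, v) for k, v in pairs if not FACET_KEY.match(k))

    # Hypothetical cutoff: skip URLs that combine more filters than allowed.
    if len(facets) > max_facets:
        return None

    renumbered = [(f"f[{i}]", v) for i, v in enumerate(facets)]
    query = urlencode(others + renumbered)
    # Drop the fragment; it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def should_fetch(url, seen, max_facets=2):
    """Record the canonical URL in `seen`; return True only on first sight."""
    canon = canonicalize(url, max_facets)
    if canon is None or canon in seen:
        return False
    seen.add(canon)
    return True
```

With a shared `seen` set, `should_fetch` would accept the first ordering of a given pair of state filters and reject later permutations of it. Note that a low `max_facets` bounds the crawl at the cost of skipping deeper filter combinations, which may or may not reveal reports not reachable otherwise.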
