#48, the data and datasets from the U.S. Energy Information Administration were reported as replicated and saved. An open question for The Azimuth Backup Project and for ClimateMirror (or Datarefuge) in general is how to respond to the case where government sites continue to operate, uninterrupted or at diminished capacity, but they do not "go dark."
Even if such sites were, at some point, to be shut down, in part or en masse, we do not know when that might be. Aiming to complete the data copy by Inauguration Friday is optimistic, and it may not reflect a realistic read of the incoming administration's priorities. Accordingly, for the datasets we have captured, taken as a whole, there is a gap between what has been saved and whatever has been added since.
Updating our sets to reflect additions is a challenging technical problem. We don't receive any notice of updates in the disciplined manner that, for instance, git provides. There is no mechanism for pushing change notices using RSS or Atom feeds. Accordingly, such a mechanism will need to be invented, and it will need to be able to distinguish between changes, additions, and faults in our initial data collection which are remedied by a second try.
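To make the distinction between changes, additions, and remedied faults concrete, here is a minimal sketch in Python. It assumes something we do not have today: a manifest per capture that maps each file path to a SHA-256 digest of its contents, with `None` recorded where a download failed. The names `build_manifest` and `classify`, and the manifest layout itself, are hypothetical and only for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file under root (by relative path) to the SHA-256 of its contents.

    Assumption: a capture lives in one local directory tree. A real capture
    would also need to record failed downloads as None, which this helper
    cannot know about on its own.
    """
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def classify(old, new):
    """Compare two manifests (path -> digest, or None for a failed fetch)."""
    report = {"added": [], "changed": [], "repaired": [], "missing": [], "failed": []}
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if path not in new:
            report["missing"].append(path)    # present before, absent now
        elif after is None:
            report["failed"].append(path)     # second try also failed; no information
        elif path not in old:
            report["added"].append(path)      # genuinely new at the source
        elif before is None:
            report["repaired"].append(path)   # first capture failed, second succeeded
        elif before != after:
            report["changed"].append(path)    # same path, different contents
    return report
```

Comparing by content digest rather than by timestamp is a design choice here: it does not depend on the site reporting modification dates honestly, at the cost of having to fetch each file again.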
To do this will require a test platform. I am proposing that the U.S. EIA data we have be such a platform, and that we initially limit our efforts to testing against one element of it, namely, the state-level carbon dioxide emissions dataset, something which is updated about every 9 months. An update, which I believe we did not capture, is available at:
I do not know how sophisticated we want to get to do this, or what kind of funding we might be able to raise to take it on. Surely there are powerful ideas for monitoring a set of documents, in the cloud, say, checking for duplicates or updates. Our datasets are not organized, at present, in such a manner.
I do not know if doing a differential walk of a site against our store makes sense, or how we would deal with a wholesale reorganization of data at a site.
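One possible way to cope with a wholesale reorganization, offered only as a sketch: if both our store and a fresh walk of the site are summarized as path-to-digest manifests (an assumption, not something we have now), then a file whose contents we already hold under a different path can be treated as moved rather than new. The function name `detect_moves` is hypothetical.

```python
def detect_moves(old, new):
    """Split paths that appear only in `new` into moved files and truly new files.

    old, new: dicts mapping path -> content digest.
    Returns (moved, truly_new), where moved is a list of (old_path, new_path).
    """
    # Index the old manifest by content so lookups ignore where a file lived.
    old_by_digest = {}
    for path, digest in old.items():
        old_by_digest.setdefault(digest, []).append(path)

    moved, truly_new = [], []
    for path in sorted(set(new) - set(old)):
        # Only old paths that have disappeared from the site count as move sources.
        candidates = sorted(p for p in old_by_digest.get(new[path], []) if p not in new)
        if candidates:
            moved.append((candidates[0], path))
        else:
            truly_new.append(path)
    return moved, truly_new
```

This only recognizes files moved byte for byte. A file that is both relocated and edited would still show up as new, so it is a partial answer at best.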
These are interesting things to think about, and this task exists to encapsulate this effort.