This has been completed and is being moved to /var/local/pub on pub04. SHA sums are being attached.
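For anyone who wants to reproduce or verify the attached sums, here is a minimal sketch of building a checksum manifest for a mirror directory. It assumes SHA-256 (the note above does not say which SHA variant was used), and the function names `sha256_of` and `manifest_lines` are made up for illustration. Each line has the same "digest, two spaces, relative path" layout that `sha256sum` prints.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(root):
    """Return 'digest  relative/path' lines for every file under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            lines.append(f"{sha256_of(full)}  {rel}")
    return lines
```

Writing `"\n".join(manifest_lines(some_dir))` to a file gives something that can be attached alongside the data and re-checked after the move.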
Two points. First, because this unit of the DOE and NASA is directly under threat of abolishment by both the administration and the Republican-controlled Congress in their draft proposal for DOE funding, this was started early in our process. As a result, it was buffeted by all the growing pains and reorganization of files, disks, and servers.
Second, there were originally two tickets: one to preserve the Web site, and this Issue #27, which was to preserve the data. As experience grew with this dataset and with government data sites in general, keeping these apart seemed silly, so they are included together in one directory in /var/local/pub.
There is a possibility that some of the data at https://daac.ornl.gov/get_data.shtml was missed, according to a report in the ClimateMirror issues. I don't know any way of checking this apart from getting sizes and then comparing directories to see what was retrieved and what was not. Also, while httrack appears to do a better job of mirroring the structure of a Web site, it does not follow FTP links as well as wget does with --follow-ftp. So I'm doing a wget on pub04 and comparing sizes. If this were an FTP pull, I could use du -s -c -b, but it is not, and I do not have a technique for estimating the size of a Web site.
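As a rough illustration of the "compare sizes, then compare directories" check, here is a minimal sketch that works on two local copies, for example the httrack pull and the wget pull. It assumes both tools laid the files out under comparable relative paths, which may not hold in practice, and the function names `tree_sizes` and `compare_trees` are hypothetical.

```python
import os


def tree_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks so nothing is counted twice
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def compare_trees(root_a, root_b):
    """Return total bytes of each tree plus the paths that differ."""
    a, b = tree_sizes(root_a), tree_sizes(root_b)
    only_a = sorted(set(a) - set(b))  # present in A, missing from B
    only_b = sorted(set(b) - set(a))  # present in B, missing from A
    differ = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return sum(a.values()), sum(b.values()), only_a, only_b, differ
```

The two totals play the role of a local `du -s -b`, and the three lists show what one pull got that the other did not. This only compares what was actually downloaded; it cannot say how large the remote site is.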
But there's no hint in the page's HTML of a directory from which one can get all this, at least none that I could find. (I used Chrome's developer tools to look at the raw HTML.) What there is is a Perl script, which downloads given a parameter:
I was trying to get sizes of the remnant up to Ben's 18 TB using du -s -b -c on azi03, and apparently the connection has been either blocked or throttled. I went in using pub05 and it worked fine. Here's what I have learned so far about the /pub subdirectory on eclipse, at the same level as /cdr:
lftp eclipse.ncdc.noaa.gov:/pub> du -s -c -b ./gacp
lftp eclipse.ncdc.noaa.gov:/pub> du -s -c -b ./ibtracs
./ibtracs/v03r03/all/shp/storm: Getting directory contents (12521795)[Waiting for response...]
Even without offering a username or password, I get this in the download file:
[jan@azi03 cdiac]$ cat download.pl\?ds_id\=818
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<p>The document has moved <a href="http://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=818">here</a>.</p>
<address>Apache Server at daac.ornl.gov Port 443</address>
If I try wget with a username and password, I get the same thing.
I also tried Lynx (!). The server doesn't quite know what to do with it.
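Since a redirect page like the one above can end up saved in place of real data, a mirror may contain such stubs without anyone noticing. Here is a minimal sketch for finding them in a local pull. It assumes the stubs contain the Apache-style "The document has moved" text seen in the transcript; the function name `find_redirect_stubs` and the 4096-byte read limit are arbitrary choices for illustration.

```python
import os
import re

# Matches the Apache-style redirect body and captures the target URL.
MOVED_RE = re.compile(
    rb'The document has moved\s*<a href="([^"]+)"', re.IGNORECASE
)


def find_redirect_stubs(root, max_bytes=4096):
    """Map relative path -> redirect target for files that look like stubs."""
    stubs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                head = handle.read(max_bytes)  # stubs are tiny; skip the rest
            match = MOVED_RE.search(head)
            if match:
                rel = os.path.relpath(full, root)
                stubs[rel] = match.group(1).decode("ascii", "replace")
    return stubs
```

Any path this returns is a file where the server answered with a redirect instead of content, so it is a candidate for re-fetching from the target URL.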