Issue #6 resolved

NOAA NCEP/NCAR Reanalysis FTP site

Jan Galkowski
created an issue

Downloaded via `lftp` from:

ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/

Uploading to:

azi01:/home/jan/local_data/ncep.reanalysis.ftp

Comments (36)

  1. Bryce A. Lynch

    Archive size (output of du -sh): 358 gigabytes

    No transfer log: I didn't generate one and don't want to restart the mirroring process (it took nearly seven days to complete).

    Suggestion for comparing completeness and accuracy of mirrors:

    • Download the SHA512SUMS file I've attached to this ticket.
    • Copy it into the root directory (in this case, ftp.cdc.noaa.gov/) of your mirror.
    • Execute the command sha512sum -c SHA512SUMS, which will compare the SHA-512 hashes of your files against those listed in the file. Any that don't match will be flagged. A mismatch means only that your files differ from mine, not that mine are right and yours are wrong. We'll need to figure out what to do about that later.
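
For anyone without sha512sum at hand, the check above can be sketched in Python. This is a minimal sketch, not the procedure actually used on this ticket: the function names (`sha512_of`, `parse_manifest`, `check_mirror`) are hypothetical, and it assumes the attached SHA512SUMS file uses the usual `<hex digest>  <path>` layout with paths relative to the mirror root.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large NetCDF files never sit in memory whole."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse "<hex digest>  <path>" lines into {path: digest}.

    Assumption: no escaped file names (lines starting with a backslash).
    """
    entries = {}
    for line in text.splitlines():
        digest, _, rest = line.partition(" ")
        if not rest:
            continue  # blank or malformed line
        # rest starts with one mode character: " " (text) or "*" (binary).
        entries[rest[1:]] = digest.lower()
    return entries


def check_mirror(root, manifest_text):
    """Return (mismatched, missing) lists of paths relative to root."""
    mismatched, missing = [], []
    for name, expected in sorted(parse_manifest(manifest_text).items()):
        target = Path(root) / name
        if not target.is_file():
            missing.append(name)
        elif sha512_of(target) != expected:
            mismatched.append(name)
    return mismatched, missing
```

Keeping "missing" separate from "mismatched" matters here: a file that was never downloaded is a different problem from one whose contents differ between two mirrors.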
  2. Jan Galkowski reporter

    This is important, this dataset. I'm continuing mine now, and hope to get a match with Bryce's work. This will also serve as a point estimate of how repeatable our process is. Partly done.

  3. Jan Galkowski reporter

    I have completed mine, except that I used the Azimuth Backup Project default of doing SHA256 sums and not SHA512. Going back and redoing those.
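
Redoing the sums with SHA-512 could look roughly like the sketch below. It is an illustration only: the function name `write_sha512sums` and the choice to write the manifest into the mirror root are assumptions, not the Azimuth Backup Project tooling.

```python
import hashlib
import os


def write_sha512sums(root, manifest_name="SHA512SUMS"):
    """Hash every file under root and write sha512sum-style lines."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for filename in sorted(filenames):
            full = os.path.join(dirpath, filename)
            relative = os.path.relpath(full, root)
            if relative == manifest_name:
                continue  # do not hash the manifest itself
            digest = hashlib.sha512()
            with open(full, "rb") as handle:
                # Read in 1 MiB chunks to keep memory use flat.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            lines.append(f"{digest.hexdigest()}  {relative}")
    with open(os.path.join(root, manifest_name), "w") as out:
        out.write("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

Paths are written relative to the root so that the resulting file can be checked from that same directory.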

    My uploads are in /home/maxwell/local_data/jan_spillover/ncep_reanalysis_ftp.

    Also, I only got 229 GB.

  4. Jan Galkowski reporter

    Turns out the wget missed a bunch of the NetCDF files, for some reason. I am re-downloading them to my workstation using plain FTP, which seems to be going well, and will then upload them to the server and recalculate the SHA512 sums.
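
One way to see exactly which files a download missed is to compare two checksum manifests directly, without touching the data. The sketch below assumes both manifests use `<digest>  <path>` lines with paths relative to the same root; `load` and `diff_manifests` are hypothetical helper names.

```python
def load(manifest_text):
    """Map path -> digest from "<digest>  <path>" lines."""
    table = {}
    for line in manifest_text.splitlines():
        digest, _, rest = line.partition(" ")
        if rest:
            # Drop the single mode character (" " or "*") before the path.
            table[rest[1:]] = digest.lower()
    return table


def diff_manifests(reference_text, local_text):
    """Return (absent, extra, differing) path lists.

    absent:    listed in the reference manifest but not in the local one
    extra:     listed locally but not in the reference
    differing: listed in both, with different digests
    """
    reference, local = load(reference_text), load(local_text)
    absent = sorted(set(reference) - set(local))
    extra = sorted(set(local) - set(reference))
    differing = sorted(
        name for name in set(reference) & set(local)
        if reference[name] != local[name]
    )
    return absent, extra, differing
```

The `absent` list would be the natural input for a targeted re-download.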

  5. Jan Galkowski reporter

    Moving this to /media/datarefuge/ to make some space on my /home/jan/local_data/.

    (This is obsolete. Now lives in /home/maxwell/local_data/jan_spillover/.)

  6. Jan Galkowski reporter

    Both my /media/jan-one/ and /home/jan/local_data/ are getting kinda full, especially /media/jan-one/. Any possibility of tossing more storage there? I'd go to another box, but there are wgets running that write there, and I don't want to interrupt them. I could of course do so, after taking some time to copy things off.

  7. Sakari Maaranen

    Ahh, they were old comments. You currently have available capacity:

    4.5T free in jan-one
    5.5T free in datarefuge
    2.0T free in azi01:~jan/local_data
    3.8T free in azi02:~jan/local_data
    
  8. Sakari Maaranen

    Our copy of this data set on datarefuge may be corrupted.

    du: cannot read directory ftp.cdc.noaa.gov.Projects.Datasets.ncep.reanalysis_o/Datasets/ncep.reanalysis/pressure: Permission denied
    du: cannot read directory ftp.cdc.noaa.gov.Projects.Datasets.ncep.reanalysis_o/Datasets/ncep.reanalysis/spectral: Permission denied
    du: cannot read directory ftp.cdc.noaa.gov.Projects.Datasets.ncep.reanalysis_o/Datasets/ncep.reanalysis/other_gauss: Permission denied
    

    The same command was able to read other directories on datarefuge just fine.
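
A quick way to list every directory that cannot be read, rather than waiting for du to trip over each one, is sketched below. The function name `unreadable_directories` is hypothetical. Permission errors can come from directory mode bits as well as from damage, so this only locates the problem directories and says nothing about why they fail.

```python
import os


def unreadable_directories(root):
    """Walk root and collect (path, reason) for directories that cannot be listed."""
    failures = []

    def record(error):
        # os.walk hands over the OSError raised while listing a directory.
        failures.append((error.filename, error.strerror))

    for _ in os.walk(root, onerror=record):
        pass  # the walk itself is only needed to trigger the listings
    return sorted(failures)
```

Run as the same user that owns the copy; root typically bypasses permission checks and would report nothing.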

  9. Jan Galkowski reporter

    These were being copied when datarefuge went offline, so I am not surprised. Because I was not sure if I could get them back, I restarted the copy, which is in progress on azi01.

    I would say delete them from /media/datarefuge/, as at best they are questionable and at worst lost.

  10. Jan Galkowski reporter
    • edited description

    Download completed. Calculating SHA sums. Bryce and I each did this, and I kept mine going to try to ascertain how much variability there might be among different people downloading the same thing.

    At some point this should be revisited and we should settle on a single set of files. I don't mind the duplication for now, as these reanalysis data are critically important. The main datafiles are in subdirectories:

    drwx------ 2 jan jan    4096 Jan  7 04:33 www.esrl.noaa.gov
    drwxrwxr-x 2 jan jan  135168 Dec 27 14:48 surface_gauss
    drwxrwxr-x 2 jan jan   36864 Dec 27 14:47 surface
    drwxrwxr-x 2 jan jan   36864 Dec 27 14:46 other_gauss
    drwxrwxr-x 2 jan jan   20480 Dec 27 14:46 pressure
    drwxrwxr-x 2 jan jan    4096 Dec 27 14:45 tropopause
    drwxrwxr-x 2 jan jan   20480 Dec 27 14:43 spectral
    

    and there are replicas in the subdirectories Datasets and old.

  11. Jan Galkowski reporter

    Download completed, but there may be redundant information here. We do not need access to the original site to ascertain that, so I am postponing the check until later and using the present time to download more.
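
Since the redundancy check needs only our own copy, it could later be run from a checksum manifest alone by grouping paths that share a digest. This is a sketch under assumptions: `duplicate_groups` is a hypothetical name, and it presumes a manifest of `<digest>  <path>` lines already exists for the mirror.

```python
from collections import defaultdict


def duplicate_groups(manifest_text):
    """Return sorted lists of paths whose digests are identical."""
    by_digest = defaultdict(list)
    for line in manifest_text.splitlines():
        digest, _, rest = line.partition(" ")
        if rest:
            # Skip the single mode character (" " or "*") before the path.
            by_digest[digest.lower()].append(rest[1:])
    return sorted(
        sorted(paths) for paths in by_digest.values() if len(paths) > 1
    )
```

Each returned group is a set of candidates for keeping one copy, for example a file that appears in more than one subdirectory of the mirror.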
