NOAA Climate Program Office cpo.noaa.gov

Issue #111 closed
Jan Galkowski
created an issue

cpo.noaa.gov: Web site

Principal product is the U.S. National Climate Assessment, as part of the U.S. Global Change Research Program.

Intending to use:

httrack "http://cpo.noaa.gov" -O . -i --mirror --depth=8 --ext-depth=3 --max-rate=100000000 %c500 --sockets=30 \
        --retries=30 --host-control=0 TN 60 --near --robots=0 %s

Comments (10)

  1. Jan Galkowski reporter

    Revised copy to:

    httrack "http://cpo.noaa.gov" -O . -i --mirror --depth=6 --ext-depth=1 --max-rate=100000000 %c500 --sockets=30 \
            --retries=30 --host-control=0 TN 60 --near --robots=0 %s
    

    and running it on azi02 under /home/jan/local_data/cpo.noaa.gov/DUMMY-TIMESTAMP. httrack apparently writes copies of external site contents at the same level as ./cpo.noaa.gov (here, the current directory), which makes for a mess to clean up afterwards.
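
    One way to see what needs cleaning up is to list every top-level directory in the mirror root other than the target host. This is only a sketch: `list_stray_dirs` is a hypothetical helper name, it assumes a find that supports `-mindepth`/`-maxdepth` (GNU or BSD), and it skips `hts-cache` on the assumption that httrack's own cache directory should stay.

    ```shell
    # Hypothetical helper (not part of httrack): list top-level directories
    # in a mirror root that are neither the target host nor httrack's cache.
    list_stray_dirs() {
        mirror_root=$1   # directory given to httrack with -O
        keep_host=$2     # host directory to keep, e.g. cpo.noaa.gov
        find "$mirror_root" -mindepth 1 -maxdepth 1 -type d \
            ! -name "$keep_host" ! -name hts-cache | sort
    }

    # Usage (review the list before deleting anything):
    #   list_stray_dirs . cpo.noaa.gov
    ```

    The function only prints candidates; removing them is left as a manual step so nothing is deleted by accident.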

  2. Jan Galkowski reporter

    18 GB final. Calculating SHA256 and SHA512 checksums. This httrack run went much better than the last, which ran all over creation spidering, and crashed in the end.
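
    The checksum step could look roughly like the following. This is a sketch rather than the commands actually used: `make_manifest` and the manifest file names are hypothetical, and it assumes GNU coreutils and findutils (`sha256sum`, `sha512sum`, `sort -z`, `xargs -0 -r`).

    ```shell
    # Hypothetical helper: write a checksum manifest covering every regular
    # file under a directory, with paths relative to that directory and
    # sorted so that two manifests of the same tree compare equal.
    make_manifest() {
        dir=$1    # tree to checksum, e.g. cpo.noaa.gov
        tool=$2   # sha256sum or sha512sum
        out=$3    # manifest file; keep it outside "$dir"
        ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r "$tool" ) > "$out"
    }

    # Usage:
    #   make_manifest cpo.noaa.gov sha256sum cpo.noaa.gov.sha256
    #   make_manifest cpo.noaa.gov sha512sum cpo.noaa.gov.sha512
    # Verify later, from inside the tree:
    #   (cd cpo.noaa.gov && sha256sum --quiet -c ../cpo.noaa.gov.sha256)
    ```

    Keeping paths relative means the same manifest can be checked after the tree is moved or copied to another host.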

  3. Jan Galkowski reporter

    On pub04, trying to move /var/local/jan/cpo.noaa.gov to /var/local/pub/ and failing with:

    [jan@pub04 jan]$ mv ./cpo.noaa.gov /var/local/pub/
    mv: cannot move ‘./cpo.noaa.gov’ to ‘/var/local/pub/cpo.noaa.gov’: Permission denied

    I don't dig this. Why?

    Cannot close the ticket because of this.
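
    The thread never identifies the cause of this error. As a general diagnostic, a move needs write and search access to the source's parent directory and to the destination directory, and on Linux moving a directory to a new parent also needs write access to the directory being moved. A minimal sketch that checks these for the current user follows; `check_move_perms` is a hypothetical helper, and it does not look at sticky bits or at failures partway through a cross-filesystem copy.

    ```shell
    # Hypothetical helper (not from the thread): report whether the current
    # user has write and search access to each directory a move has to modify.
    check_move_perms() {
        src=$1        # thing being moved, e.g. ./cpo.noaa.gov
        dest_dir=$2   # target directory, e.g. /var/local/pub
        status=0
        # The source's parent loses an entry; the destination gains one.
        set -- "$(dirname -- "$src")" "$dest_dir"
        # On Linux, moving a directory to a new parent also needs write
        # access to the directory itself (its ".." entry is rewritten).
        if [ -d "$src" ]; then
            set -- "$@" "$src"
        fi
        for d in "$@"; do
            if [ -w "$d" ] && [ -x "$d" ]; then
                echo "ok: $d"
            else
                echo "no write/search access: $d"
                status=1
            fi
        done
        return "$status"
    }

    # Usage:
    #   check_move_perms ./cpo.noaa.gov /var/local/pub
    ```

    If every line reports "ok" and the move still fails, `ls -ld` on the same directories and `id` for the current user are reasonable next things to compare.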

  4. Sakari Maaranen

    I switched to your user account, @Jan Galkowski, and was able to run:

    cd /var/local/jan/
    mv cpo.noaa.gov /var/local/pub/
    

    I don't have time to investigate why you had problems. Also, please consider using the same naming convention as others on that server. It already contains other NOAA data sets, so it is more user-friendly if their naming follows a similar logic. There's nothing wrong with your naming -- it's just different from the rest.

  5. Jan Galkowski reporter

    These are not problems, actually, since the diff claims the filesets are identical. I was simply trying to understand why a direct rsync might produce a different du total than a tar, rsync of the tarball, and un-tar:

    [jan@pub04 jan]$ nice ionice du -s -b -c ./foo/cpo.noaa.gov/ &
    [1] 7003
    [jan@pub04 jan]$ 19090384384    ./foo/cpo.noaa.gov/
    19090384384     total
    [jan@pub04 jan]$ nice ionice du -s -c -b  /var/local/pub/cpo.noaa.gov
    19090404864     /var/local/pub/cpo.noaa.gov
    19090404864     total
    
    [jan@pub04 jan]$ diff -r --brief ./foo/cpo.noaa.gov /var/local/pub/cpo.noaa.gov
    
    [jan@pub04 jan]$
    [jan@pub04 jan]$
    

    I thought there might be some curiosity about this. If such discrepancies are not well known to experts, I simply declare them to be gremlins and will ignore them, since the diff is proof enough of data integrity.
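
    The two totals differ by 20480 bytes, which is exactly 5 × 4096. One plausible explanation, not confirmed anywhere in this thread, is that `du -b` also counts the apparent size of each directory, and on many filesystems that size depends on how the directory's entries were created rather than on the file contents. Summing files and directories separately would show whether that is the case. This sketch assumes GNU find (for `-printf`); `sum_bytes` is a hypothetical helper name.

    ```shell
    # Hypothetical helper: sum the apparent sizes, in bytes, of all entries
    # of one type under a directory.
    # $2 is a find(1) type letter: f = regular files, d = directories.
    sum_bytes() {
        find "$1" -type "$2" -printf '%s\n' |
            awk '{ s += $1 } END { printf "%.0f\n", s }'
    }

    # Usage: compare the two trees from the comment above.
    #   for d in ./foo/cpo.noaa.gov /var/local/pub/cpo.noaa.gov; do
    #       echo "$d files=$(sum_bytes "$d" f) dirs=$(sum_bytes "$d" d)"
    #   done
    ```

    If the file totals match and only the directory totals differ, the discrepancy is directory bookkeeping and says nothing about the data, which is consistent with the clean `diff -r`.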
