Issue #30 closed
Jan Galkowski
created an issue

These data are available from several sites and are of varying vintage; they include a reprocessing that was needed to fix a bug in the ARGO data handling system.

Some of the sources I have may overlap, so I am depositing them in a single large "ARGO" directory.

The original sources are:

https://www.nodc.noaa.gov/argo/

ftp://ftp.nodc.noaa.gov/nodc/archive/ (This is an archive of what I believe are uncorrected data.)

https://www.nodc.noaa.gov/argo/accessData.htm

ftp://ftp.nodc.noaa.gov/pub/data.nodc/argo/

I know of ARGO through my interaction with, and support of, Woods Hole Oceanographic Institution (WHOI), which helps maintain the floats, builds and launches some of them, and collects some of the data.

Late addition: ftp://ftp.aoml.noaa.gov/phod/pub/ARGO_FTP/argo/

Comments (27)

  1. Jan Galkowski reporter

    The downloads are being done with the following commands, one per source:

    https://www.nodc.noaa.gov/argo/

    wget --dns-timeout=10 --connect-timeout=20 --read-timeout=120 --wait=5 --random-wait --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 --no-remove-listing --follow-ftp -nv --output-file=www-nodc-noaa-gov-argo.log --no-check-certificate https://www.nodc.noaa.gov/argo/

    ftp://ftp.nodc.noaa.gov/nodc/archive/

    wget --dns-timeout=10 --connect-timeout=20 --read-timeout=120 --wait=5 --random-wait --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 --no-remove-listing --follow-ftp -nv --output-file=ftp-nodc-noaa-gov-nodc-archive.log --no-check-certificate ftp://ftp.nodc.noaa.gov/nodc/archive/

    https://www.nodc.noaa.gov/argo/accessData.htm

    wget --dns-timeout=10 --connect-timeout=20 --read-timeout=120 --wait=5 --random-wait --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 --no-remove-listing --follow-ftp -nv --output-file=www-nodc-noaa-gov-argo-accessData.log --no-check-certificate https://www.nodc.noaa.gov/argo/accessData.htm

    ftp://ftp.nodc.noaa.gov/pub/data.nodc/argo/

    wget --dns-timeout=10 --connect-timeout=20 --read-timeout=120 --wait=5 --random-wait --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 --no-remove-listing --follow-ftp -nv --output-file=ftp-nodc-noaa-gov-pub-data-nodc-argo.log --no-check-certificate ftp://ftp.nodc.noaa.gov/pub/data.nodc/argo/
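    The commands above differ only in the log file name and the URL, so they could be driven from a single helper. The following is a minimal sketch, not the script actually used: the function name `mirror_source` and the `DRY_RUN` switch are hypothetical, and the options are simply copied from the commands above.

```shell
# Hypothetical helper: apply the same wget mirroring options to one source.
# Usage: mirror_source LOGFILE URL
# Set DRY_RUN=1 to print the command instead of running it.
mirror_source() {
  logfile=$1
  url=$2
  # Rebuild the positional parameters as the full wget argument list.
  set -- \
    --dns-timeout=10 --connect-timeout=20 --read-timeout=120 \
    --wait=5 --random-wait --prefer-family=IPv4 --tries=40 \
    --timestamping=on --recursive --level=8 \
    --no-remove-listing --follow-ftp -nv --no-check-certificate \
    "--output-file=$logfile" "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'wget %s\n' "$*"
  else
    wget "$@"
  fi
}

# Example (not run here):
#   mirror_source www-nodc-noaa-gov-argo.log https://www.nodc.noaa.gov/argo/
#   mirror_source ftp-nodc-noaa-gov-pub-data-nodc-argo.log ftp://ftp.nodc.noaa.gov/pub/data.nodc/argo/
```

    Keeping the options in one place makes it harder for a single command to pick up a stray argument or drift from the others.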

  2. Jan Galkowski reporter

    https://www.nodc.noaa.gov/argo/ is completed on azi02.

    https://www.nodc.noaa.gov/argo/accessData.htm is completed on azi02.

    The lftp transfers of ftp://ftp.nodc.noaa.gov/nodc/archive/ and ftp://ftp.nodc.noaa.gov/pub/data.nodc/argo/ are still in progress on azi02. There is plenty of room for now on azi02 under /home/jan/local_data/; the downloads are going to its ARGO subdirectory.

    No checksums have been generated yet. I'll wait for all the pieces to finish.

  3. Jan Galkowski reporter

    Progress continues slowly. I now have 2435 GiB (as reported by du -BG), which currently breaks down as:

    [jan@azi02 ARGO]$ find . -maxdepth 1 -type d -exec nice ionice du -s -b -c --apparent-size -BG {} \;
    2435G   .
    2435G   total
    253G    ./argo.accessData
    253G    total
    1G      ./data.nodc.argo
    1G      total
    1791G   ./nodc.archive
    1791G   total
    392G    ./nodc.noaa.gov-argo
    392G    total
    [jan@azi02 ARGO]$
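    Because `-exec ... \;` runs `du` once per directory and `-c` adds a grand total to every run, each directory in the listing above is followed by a redundant `total` line. A sketch of a variant that prints one line per subdirectory, assuming GNU find and du; the function name `subdir_sizes` is hypothetical:

```shell
# Hypothetical helper: one apparent-size line (in GiB units) per immediate
# subdirectory of DIR. Assumes GNU find and GNU du.
# Usage: subdir_sizes DIR
subdir_sizes() {
  # "{} +" passes all subdirectories to a single du run; without -c,
  # du prints no "total" line.
  find "$1" -mindepth 1 -maxdepth 1 -type d \
    -exec nice du -s --apparent-size -BG {} +
}

# Example (not run here):
#   subdir_sizes /home/jan/local_data/ARGO
```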
    
  4. Jan Galkowski reporter

    The data have been reorganized under ftp://ftp.aoml.noaa.gov/phod/pub/ARGO_FTP/argo/. It's not clear what happened to the archives. I am now attempting to size the new section.

  5. Jan Galkowski reporter

    The permanent home for the ARGO data will be pub04, and 3 TB has been allocated for the purpose. After the copy completes, I will rsync the data to /var/local/jan/ARGO/ on pub04 and then finish up the SHA sums and so on.
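    The planned copy could look roughly like the sketch below. This is an illustration only: the function name `sync_tree` is hypothetical, the source path is pieced together from the earlier comment about /home/jan/local_data/ and its ARGO subdirectory, and ssh access as jan@pub04 is assumed.

```shell
# Hypothetical helper: copy a finished mirror to its permanent home.
# Usage: sync_tree SRC DEST   (DEST may be a local path or host:path)
sync_tree() {
  src=${1%/}/    # trailing slash: copy the contents of SRC, not SRC itself
  dest=${2%/}/
  # -a preserves times, permissions and symlinks; --partial keeps
  # partially transferred files so an interrupted run can be resumed.
  nice rsync -a --partial "$src" "$dest"
}

# Example (not run here; assumes ssh access as jan@pub04):
#   sync_tree /home/jan/local_data/ARGO jan@pub04:/var/local/jan/ARGO
```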

  6. Jan Galkowski reporter

    ARGO status:

    [jan@azi02 ARGO]$ sudo find . -maxdepth 1 -type d -exec nice ionice du -s -b -c --apparent-size -BG {} \;
    2278G   .
    2278G   total
    1803G   ./nodc.archive
    1803G   total
    393G    ./nodc.noaa.gov-argo
    393G    total
    83G     ./phod-ARGO_FTP-argo
    83G     total
    
  7. Jan Galkowski reporter

    SHA sums done. It took a full day!

    [jan@azi02 ARGO]$ ls -lt
    total 3451584
    -rw-rw-r-- 1 jan jan 2072579652 Apr 14 02:00 ARGO.sha512.txt
    -rw-rw-r-- 1 jan jan 1461829572 Apr 13 17:20 ARGO.sha256.txt
    drwxrwxr-x 5 jan jan       4096 Apr 13 07:37 2017-04-13T0736
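    The commands used to build these manifests are not recorded here. One way such files could be produced and later checked is sketched below, assuming GNU coreutils and findutils; the function names `write_manifest` and `verify_manifest` are hypothetical.

```shell
# Hypothetical helpers; assume GNU coreutils (sha256sum/sha512sum, sort -z)
# and GNU findutils (find -print0, xargs -0 -r).

# Usage: write_manifest TREE TOOL MANIFEST
# TOOL is e.g. sha256sum or sha512sum. Keep MANIFEST outside TREE so the
# manifest is not hashed while it is being written.
write_manifest() {
  tree=$1 tool=$2 manifest=$3
  # Paths are recorded relative to TREE and sorted for a stable order.
  ( cd "$tree" &&
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r nice "$tool"
  ) > "$manifest"
}

# Usage: verify_manifest TREE TOOL MANIFEST
# Exit status is non-zero if any listed file is missing or has changed.
verify_manifest() {
  tree=$1 tool=$2 manifest=$3
  # With -c and no file argument, the checksum list is read from stdin.
  ( cd "$tree" && "$tool" --quiet -c ) < "$manifest"
}

# Example (not run here):
#   write_manifest 2017-04-13T0736 sha256sum ARGO.sha256.txt
#   verify_manifest 2017-04-13T0736 sha256sum ARGO.sha256.txt
```

    Recording paths relative to the tree means the same manifest can be checked after the data are moved to another host or directory.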
    
  8. Jan Galkowski reporter

    rsync completed:

    4/16/2017 12:29:07 AM
    
    [jan@pub04 ARGO]$ nice ionice find . -type f -print | wc -l
    9542972
    
    [jan@azi02 ARGO]$ nice ionice find . -type f -print | wc -l
    9542972
    [jan@azi02 ARGO]$