Issue #29 resolved
John Baez
created an issue

This seems like an important database that nobody has attempted to back up yet:

NASA Atmospheric Science Data Center (ASDC): ASDC products supported, at

Neither we nor any of the teams I know about have backed it up. It contains lots of data about clouds, atmospheric aerosols, the Earth's radiation budget and tropospheric chemistry.

  1. Sakari Maaranen

    2016-12-21 Hetzner storage box

    wget --dns-timeout=10 --connect-timeout=20 --read-timeout=120 \
     --wait=6 --random-wait --follow-ftp --progress=dot:mega \
     --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 \
     --no-remove-listing --no-check-certificate \ \
     -H \ \

    I am excluding links to:

  2. Sakari Maaranen

    This site seems to be using CAPTCHAs to control access to some resources. Perhaps we should focus on FTP sites - or at least I should, because I don't have much time to analyze every site at this level.

    I may have to interrupt this if it turns out to contain bogus files blocked by CAPTCHAs. I detected this by noticing CAPTCHA image file names in the transfer log. Do they have an FTP site, or can we otherwise get around the CAPTCHA protections?

    Do scan your transfer logs for anything suspicious!
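One way to scan a transfer log for CAPTCHA fetches is sketched below. It assumes the log was written by wget and uses wget's usual request-line format (`--YYYY-MM-DD HH:MM:SS--  URL`); the function name `captcha_urls` and the log file name are hypothetical, and the pattern `captcha` is only a guess at how such files are named on a given site.

```shell
# List the distinct URLs in a wget log whose request line mentions "captcha".
# Assumption: request lines look like "--YYYY-MM-DD HH:MM:SS--  URL".
captcha_urls() {
    grep -E '^--[0-9]{4}-[0-9]{2}-[0-9]{2} ' "$1" \
        | awk '{ print $3 }' \
        | grep -i 'captcha' \
        | sort -u
}
```

Calling `captcha_urls transfer.log` prints one URL per line; an empty result means no request line matched the pattern, not that the mirror is necessarily clean.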

  3. Sakari Maaranen

    This contains lots of dynamically generated files, like CAPTCHAs. It should probably be cleaned up. I hope the valuable data is there as well.

  4. Jan Galkowski

    Hi Scott!

    It contains what appears to be a stub ... On azi03, /home/jan/local_data/datarefuge/. I say that because when I checked it, a cd into it acted like a self-reference: ls showed an identical copy of the subdirectory, and repeating the cd gave the same result. Doing pwd showed ever-increasing depths.

    So what I did was wipe all that, then chown maxwell, chgrp maxwell, and chmod 755 on /home/jan/local_data/datarefuge/eosweb/larc.nasa.datapool, on azi03. If it's FTP, that's superior stuff for us, so please dump it there. I think there should be room, although it depends on how big the data is.

    What I have been doing recently when I run short of room is this: just before I hit some kind of limit, I go to the shell screen and do ^z, leaving the job suspended, so I can fg when things improve.
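A non-interactive variant of that suspend-and-resume habit can be sketched with signals instead of ^z and fg. This is a swapped-in technique, not what was done on azi03: the function names `free_kb` and `pause_if_low`, and whatever threshold is passed in, are hypothetical.

```shell
# Print the free space, in KiB, of the filesystem holding the given path.
# df -P forces the POSIX one-line-per-filesystem layout; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Suspend process $1 if the filesystem holding $2 has fewer than $3 KiB free.
# Returns 0 if the process was suspended, 1 otherwise.
pause_if_low() {
    pid=$1
    dir=$2
    min_kb=$3
    if [ "$(free_kb "$dir")" -lt "$min_kb" ]; then
        kill -STOP "$pid"
        return 0
    fi
    return 1
}
```

A job suspended this way can be resumed with `kill -CONT` and its process ID once space has been freed.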

  5. Sakari Maaranen
    [sam@pub01]$ pwd
    [sam@pub01 pub]$ du -sbc NASA_ASDC/*
    18523693279     NASA_ASDC/2017-01-14_29

    Note that the directory contains a large number of files (57115), many of them redundant, like CAPTCHAs.
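A rough way to check the file count and estimate how much of a mirror is redundant is sketched below. It assumes GNU coreutils (`md5sum`) is available; the function names `count_files` and `duplicate_groups` are hypothetical, and identical checksums only indicate byte-identical files, not which ones are CAPTCHAs.

```shell
# Count the regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print "COUNT CHECKSUM" for every checksum shared by more than one file.
duplicate_groups() {
    find "$1" -type f -exec md5sum {} + \
        | awk '{ n[$1]++ } END { for (h in n) if (n[h] > 1) print n[h], h }'
}
```

For example, `count_files NASA_ASDC/2017-01-14_29` gives the total, and `duplicate_groups NASA_ASDC/2017-01-14_29 | sort -rn | head` shows the most frequently repeated contents, which are the first candidates to inspect before any cleanup.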

