azimuth-inventory / Climate data sources

How to add an entry

  1. Coordinate with the ClimateMirror.org issue tracker. Find the relevant issue and discuss it there.
  2. Create an item for each data set in the Azimuth Backup Issue Tracker.
  3. Enter the relevant details for each source.
  4. Prioritize based on how valuable and how urgent the source is.
  5. Set the Milestone to 0-Identified for new data sources, before they have been properly examined.
  6. Assign the issue to whoever is working on it.

Data sets in the 0-Identified milestone:

  1. Analyze the data set and describe how you think it should be backed up.
  2. Check whether the site uses a CDN or otherwise has large files on a different domain.
  3. Document which domains to target for web crawling or other protocols: HTTP, FTP, SFTP, etc.
  4. When you think you have gathered enough information, set the Milestone to 1-Specified and Status to Open.
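One quick way to approach step 2 is to list the hosts a landing page links to; any host other than the main site is a candidate CDN or secondary domain to document. A minimal sketch (the heredoc HTML is a stand-in for a real page, which you would fetch with something like wget -q -O index.html URL):

```shell
# Stand-in for a real landing page; in practice fetch one with e.g.
#   wget -q -O index.html http://first.example.com/somedata/
cat > index.html <<'EOF'
<a href="https://cdn.example.net/files/data.nc">data</a>
<a href="http://first.example.com/somedata/readme.txt">readme</a>
EOF

# List the unique hosts the page links to; any host other than the main
# site is a candidate CDN / secondary domain to target for crawling.
grep -oE 'https?://[^/"]+' index.html | sed -E 's#https?://##' | sort -u
# prints:
#   cdn.example.net
#   first.example.com
```

This only catches links on the page you saved; for deeply nested sites you may still need to spot-check a few inner pages.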

Data sets in the 1-Specified milestone:

  1. Document the method you will use to back up the data set.
  2. Document the location where the backup copy will be stored.
  3. Start the download. Use a log file, if possible.
  4. Set the Milestone to 2-In Progress.

Note that, for security reasons, the location may be described only in general terms while it is not yet publicly accessible. All backups will be made publicly accessible later on.
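Step 3 is typically started detached from the terminal, so the transfer survives logout and all output goes to a log file. A minimal sketch of the pattern, with a short sh -c command standing in for the real wget invocation (the full wget template appears at the bottom of this page):

```shell
# Run the long transfer in the background, detached from the terminal,
# with all output captured in a log file. The sh -c command is a
# stand-in for the real wget invocation.
nohup sh -c 'echo "transfer started"; sleep 1; echo "transfer done"' \
  > LOG_FILE_NAME.log 2>&1 &
wait          # in real use you would simply log out instead of waiting
cat LOG_FILE_NAME.log
```

With wget, the -o / --output-file option shown in the template below serves the same purpose as the shell redirection here.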

Data sets in the 2-In Progress milestone:

  1. After the download has completed, document which directories were created as a result.
  2. Use a command like ionice du -bsc * (or similar) to get the size of each directory, and document it.
  3. Attach a file listing, if possible (with SHA-256 checksums if you can generate them).
  4. Attach the transfer log, if possible.
  5. Add a link to the public licence and/or copyright statement from the origin, if any.
  6. Set the Milestone to 3-Complete.
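Steps 2 and 3 above can be sketched as follows (demo_backup is a hypothetical directory standing in for one created by the download; run the real commands from the parent of your download directories):

```shell
# demo_backup stands in for a directory created by the download.
mkdir -p demo_backup/subdir
echo "sample data" > demo_backup/subdir/file.txt

# Step 2: byte size of each directory plus a grand total (GNU du);
# on a busy server, prefix with "ionice -c3" to scan at idle I/O priority.
du -bsc demo_backup | tee directory_sizes.txt

# Step 3: file listing with SHA-256 checksums, excluding the listing itself.
find demo_backup -type f -print0 | xargs -0 sha256sum > SHA256SUMS.txt
```

A SHA256SUMS.txt file produced this way can later be verified with sha256sum -c SHA256SUMS.txt, which makes it a useful attachment alongside the plain directory listing.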

Data sets in the 3-Complete milestone:

  1. Read the File hierarchy plan for the public servers.
  2. Make sure the data set ends up on the public server where you reserved space for it.
  3. Once the data set is in the allocated public directory, set the directory permissions using the script /usr/local/bin/set_read_only.sh, or ask someone with sudo access to do it.
  4. Make sure the final location is documented in the issue tracker and on Our progress page.
  5. Do not keep redundant copies of the same data: remove any working directories you may have left behind. Removing unnecessary copies is important so that others know the space is available.
  6. Set the Milestone to 4-Published and the Status to Closed.
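The contents of /usr/local/bin/set_read_only.sh are not documented on this page; a hypothetical minimal version might simply strip write permission recursively while keeping everything world-readable. A sketch, demonstrated on a throwaway directory:

```shell
# Hypothetical sketch of what set_read_only.sh might do (the real
# script is not shown in this wiki): remove all write bits and ensure
# world-readability; the capital X adds execute only on directories.
set_read_only() {
    chmod -R a-w,a+rX "$1"
}

# Demo on a throwaway directory standing in for the public data directory.
mkdir -p demo_dataset
echo "data" > demo_dataset/file.nc
set_read_only demo_dataset
ls -l demo_dataset
```

After this, files end up mode 444 and directories mode 555, so nobody (short of root) can modify or delete the published data by accident.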

{ Name of the data set }

{ YYYY-MM-DD }

{ backup location }

wget --prefer-family=IPv4 --no-verbose \
 --dns-timeout=10 --connect-timeout=20 --read-timeout=120 \
 --tries=40 --timestamping --recursive --level=inf \
 --no-remove-listing --output-file=LOG_FILE_NAME.log \
 --follow-ftp --no-check-certificate \
 -H -Dfirst.example.com,second.example.com \
 http://first.example.com/somedata/ \
 https://second.example.com/more/data/

Attachments:

  • directory listing
  • transfer log
  • checksums (optional)
