I've looked into the website a bit.
First, it claims to have more than 1.5 million sampling locations in the database, so --wait 15 will lead to 1.5M * 15 / 86400 ≈ 260 days of download time.
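For reference, here's the back-of-the-envelope calculation, assuming one HTTP request per sampling location and a fixed inter-request delay (wget's --wait option):

```python
# Rough download-time estimate for a polite site-scrape:
# one request per sampling location, wait_seconds between requests.

def scrape_days(n_sites: int, wait_seconds: float) -> float:
    """Days needed to issue n_sites requests, wait_seconds apart."""
    return n_sites * wait_seconds / 86400  # 86400 seconds per day

print(round(scrape_days(1_500_000, 15)))  # → 260
```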
Second, it looks like there's data in the database that's not displayed on static web pages. It looks like you need to go to https://www.waterqualitydata.us/portal/ and enter a site-ID to get all the data.
Once you do that, download URLs can be seen by hitting the "Show Web Service Calls" button.
I'm working on a python script to snarf that data.
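A minimal sketch of what such a script would build for each site. The endpoint path and parameter names below are assumptions on my part; check them against whatever the "Show Web Service Calls" button actually displays for you:

```python
from urllib.parse import urlencode

# Build a per-site download URL for the Water Quality Portal.
# NOTE: the /data/Result/search path and the siteid/mimeType/zip
# parameters are assumptions -- verify against the portal's
# "Show Web Service Calls" output before relying on them.
BASE = "https://www.waterqualitydata.us/data/Result/search"

def result_url(site_id: str) -> str:
    """URL intended to return all results for one site as a zipped CSV."""
    query = urlencode({"siteid": site_id, "mimeType": "csv", "zip": "yes"})
    return f"{BASE}?{query}"

# The site ID below is purely illustrative.
print(result_url("USGS-01646500"))
```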
I'll look into that idea tomorrow morning before work. FYI, so far the python script seems to be behaving well.
From the look of the site, it's a database-backed design, and it would not surprise me if the database stores the actual data. (The total amount of data isn't huge, so that's not an unreasonable design.)
I searched for the string "ftp:" in all the files I've downloaded so far and found no matches, so the HTML doesn't point to any FTP access. Trying to connect via FTP to names like ftp.waterqualitydata.us doesn't succeed either.
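The check was essentially this (a sketch; the download-directory name passed in is hypothetical):

```python
import os

def files_mentioning(root: str, needle: bytes = b"ftp:"):
    """Yield paths under root whose contents contain needle."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        yield path
            except OSError:
                pass  # unreadable file; skip it

# e.g. list(files_mentioning("www.waterqualitydata.us"))
# An empty list means no downloaded file mentions "ftp:".
```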
Rather than a site-scrape, I began to grab the backend database through the Download link. The sites database was small (2,483,824 records). The Physical/chemical metadata is much larger (after 12 hours, I am at 12.02 GB of a ZIP archive). Does this sound consistent with what others are getting?
Apparently done. Hashed. Uploading to pub04.rz21.azimuthproject-kickstarter.org:/var/local/gpk/i38_www.waterqualitydata.us .
13 GB, 809884 files. Of that, 4.3 GB is the database, in 88022 files (three files per geographic location).
I'm somewhat concerned that the database download was only partial. I expected the total file count to be dominated by database-related files, but the database's file count is much too small for that.
ebovine: did I miss a link to the contents of the database?
Me: need to look at logs and the database download scripts. Try to make a 1:1 correspondence.
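A sketch of that 1:1 correspondence check, assuming the site list can be loaded as a set of IDs and that each site's downloaded files are named after its ID (the naming convention here is a guess; adjust to the real one):

```python
import os

def missing_sites(site_ids: set, download_dir: str) -> set:
    """Return the site IDs that have no corresponding downloaded file.

    Assumes each site's data lives in files whose base name is the
    site ID (e.g. USGS-01646500.csv) -- a hypothetical convention.
    """
    have = {name.split(".")[0] for name in os.listdir(download_dir)}
    return site_ids - have
```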
The database download apparently terminated early. I'm guessing it stopped when I ran out of disk space a week ago.
It started right up again, though, picking up from where it left off.
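Presumably the resume works the way wget -c does: send an HTTP Range header starting at the size of the partial file. A sketch of just that offset logic (local computation only, no network):

```python
import os

def resume_range_header(partial_path: str) -> dict:
    """HTTP header to continue a download from an existing partial file.

    Returns {} when nothing has been downloaded yet, so the request
    starts from byte 0 as usual.
    """
    try:
        offset = os.path.getsize(partial_path)
    except OSError:
        return {}  # no partial file on disk
    return {"Range": f"bytes={offset}-"} if offset else {}
```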
So, I believe I have all of the non-database parts of the website; the database is only ~10% downloaded, but back in business.