Error when downloading VIIRS due to os.path.join using backslashes on Windows

Issue #5 resolved
Bich Tran created an issue

I got this error when running pywapor on Windows 10, using the same script that ran successfully on Linux (the conda installation is the same):

--> (1/45) Processing 'VNP02IMG.A2022286.1106.002.2022286184253.nc'.
                --> Downloading `VNP03IMG.A2022286.1106.002.2022286182100.nc`.
  --> Server error 400 Client Error: Bad Request for url: https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData%5C5200%5CVNP03IMG%5C2022%5C286%5CVNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D [].
                --> Trying to download R:/Bich/pywapor35_custom4\VIIRSL1\VNP03IMG.A2022286.1106.002.2022286182100.nc, attempt 2 of 3 in 15 seconds.
                --> Server error 400 Client Error: Bad Request for url: https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData%5C5200%5CVNP03IMG%5C2022%5C286%5CVNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D [].
                --> Trying to download R:/Bich/pywapor35_custom4\VIIRSL1\VNP03IMG.A2022286.1106.002.2022286182100.nc, attempt 3 of 3 in 34 seconds.
                --> Server error 400 Client Error: Bad Request for url: https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData%5C5200%5CVNP03IMG%5C2022%5C286%5CVNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D [].
                --> Collect attempt 1 of 2 for `VIIRSL1.VNP02IMG` failed, giving up now, see full traceback below for more info. (NameError: Could not download https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData\5200\VNP03IMG\2022\286\VNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D after 3 attempts.).

Traceback (most recent call last):
  File "C:\Users\ntr002\Miniconda3\envs\phd\lib\site-packages\pywapor\collect\downloader.py", line 129, in collect_sources
    x = dler(**args)
  File "C:\Users\ntr002\Miniconda3\envs\phd\lib\site-packages\pywapor\collect\product\VIIRSL1.py", line 466, in download
    nc03_file = download_url(url, os.path.join(folder, nc03_parts[-1]))
  File "C:\Users\ntr002\Miniconda3\envs\phd\lib\site-packages\pywapor\collect\protocol\crawler.py", line 316, in download_url
    raise NameError(f"Could not download {url} after {max_tries} attempts.")
NameError: Could not download https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData\5200\VNP03IMG\2022\286\VNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D after 3 attempts.                

The URL that caused the error,

https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData\5200\VNP03IMG\2022\286\VNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D

has mixed slashes because it was built in VIIRSL1.py with os.path.join, which uses '\' as the path separator on Windows.

On Linux, the same code generates:

https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData/5200/VNP03IMG/2022/286/VNP03IMG.A2022286.1106.002.2022286182100.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D
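The difference can be reproduced on any operating system, because the standard library exposes both path flavours directly: os.path is ntpath on Windows and posixpath on Linux. The snippet below is only a minimal sketch; base_url and parts are taken from the logged URL above and are assumptions about what VIIRSL1.py passes in, not a copy of its code.

```python
import ntpath      # the module behind os.path on Windows
import posixpath   # the module behind os.path on Linux/macOS

# Assumed values, reconstructed from the URL in the log output.
base_url = "https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData"
parts = ("5200", "VNP03IMG", "2022", "286")

# Joining with the Windows flavour inserts backslashes between the parts.
windows_style = ntpath.join(base_url, *parts)

# Joining with the POSIX flavour inserts forward slashes, as a URL needs.
linux_style = posixpath.join(base_url, *parts)

print(windows_style)
print(linux_style)
```

The backslashes in the first result are what show up percent-encoded as %5C in the "400 Client Error" messages.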

I tested downloading the corrected URL, and it worked:

import os
from pywapor.collect.protocol.crawler import download_url

# Same URL pattern as above, but with forward slashes only.
url=r'https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData/5200/VNP03IMG/2022/302/VNP03IMG.A2022302.1106.002.2022302190411.nc.dap.nc4?dap4.ce=/geolocation_data/longitude%5B%5D%5B%5D;/geolocation_data/latitude%5B%5D%5B%5D'

# project_folder is the output directory, defined elsewhere in my script.
nc03_file = download_url(url, os.path.join(project_folder, 'test.nc'))

Suggestion:

  • use posixpath to join the parts of the URL (source)
            import posixpath
            base_url02 = posixpath.join(base_url, "5200", nc02_parts[-2], year_doy[:4], year_doy[4:], nc02_parts[-1])
            base_url03 = posixpath.join(base_url, "5200", nc03_parts[-2], year_doy[:4], year_doy[4:], nc03_parts[-1])
            base_cloud = posixpath.join(base_url, "5110", nc_cloud_parts[-2], year_doy[:4], year_doy[4:], nc_cloud_parts[-1])
  • check whether other products have a similar issue
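If other products turn out to build URLs the same way, a small shared helper might be easier to audit than repeated posixpath.join calls. The following is a rough sketch under assumptions: join_url is a hypothetical name that does not exist in pywapor, and the example values (base_url, year_doy, product, filename) are reconstructed from the log output above rather than taken from VIIRSL1.py.

```python
import posixpath


def join_url(base_url, *parts):
    """Join URL path segments with '/' regardless of the host OS.

    Hypothetical helper, not part of pywapor.
    """
    # Strip slashes from each segment: posixpath.join would otherwise
    # discard everything before a segment that starts with '/'.
    cleaned = [str(part).strip("/") for part in parts]
    return posixpath.join(base_url.rstrip("/"), *cleaned)


# Example values reconstructed from the failing request in the log.
base_url = "https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData"
year_doy = "2022286"
product = "VNP03IMG"
filename = "VNP03IMG.A2022286.1106.002.2022286182100.nc"

base_url03 = join_url(base_url, "5200", product, year_doy[:4], year_doy[4:], filename)
print(base_url03)
```

Stripping the slashes is a design choice rather than a requirement; plain posixpath.join, as suggested above, behaves the same as long as no segment begins with '/'.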