et_look using all available RAM then crashed

Bich Tran reporter

edited description

2024-05-27T13:45:57+00:00

Bich Tran reporter

Update: We were able to run et_look with the provided et_look_in.nc on a desktop with 64 GB of RAM; peak RAM usage was about 40 GB.

We will try to chunk the input dataset by 5 time_bins.

2024-05-27T15:24:40+00:00

bert.coerver

Hi Bich,

If you look at the (keyword-)arguments of pywapor.et_look.main(), you'll see a keyword called chunks. You can't (yet) change this keyword through the pywapor.Project.run_et_look() method, but I'm sure you can figure out how to run the pywapor.et_look.main() function directly.

The default value for chunks is `{"time_bins": -1, "x": 500, "y": 500}`, meaning that the time dimension is not chunked, so you could for example try setting "time_bins" to 25 or something.
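As a rough illustration of what that default implies, here is a minimal sketch that works out the chunk lengths actually used and how many chunks each variable is split into. It assumes the dask convention that -1 means "one chunk spanning the whole dimension" and that a requested length larger than the dimension is capped at the dimension size. The helper names are made up for this example, and the dimension sizes are the ones from the log output reported in this issue.

```python
import math


def effective_chunks(chunks, dim_sizes):
    """Return the chunk length actually used for each dimension.

    Assumption (dask convention): -1 means a single chunk covering the
    whole dimension; a request larger than the dimension is capped.
    """
    out = {}
    for dim, size in dim_sizes.items():
        requested = chunks.get(dim, -1)
        out[dim] = size if requested == -1 else min(requested, size)
    return out


def n_chunks(chunks, dim_sizes):
    """Number of chunks a single variable is split into."""
    eff = effective_chunks(chunks, dim_sizes)
    return math.prod(math.ceil(dim_sizes[d] / eff[d]) for d in dim_sizes)


# Dimension sizes as reported in this issue's log output.
dims = {"time_bins": 31, "y": 452, "x": 759}
default = {"time_bins": -1, "x": 500, "y": 500}

print(effective_chunks(default, dims))  # {'time_bins': 31, 'y': 452, 'x': 500}
print(n_chunks(default, dims))          # 2
print(n_chunks({**default, "time_bins": 25}, dims))  # 4
```

With the default, every chunk holds all 31 time steps; setting "time_bins" to 25 splits the time axis in two.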

I’m in the process of exposing this `chunks` keyword for all the different stages (i.e. also pre_et_look, se_root etc.), so work in progress…

Bert

2024-05-28T07:58:13+00:00

Bich Tran reporter

Update: The default chunk size chunks = {"time_bins": -1, "x": 500, "y": 500} in pywapor35 may cause RAM issues. I tested different versions and chunk sizes when running et_look:

pywapor34 et_look version 2, default chunksize chunks = {"time_bins": 1, "x": 1000, "y": 1000}

    --> Saving output to `et_look_out.nc`.
        > peak-memory-usage: 2.0GB, execution-time: 0:02:42.884231.
        > chunksize|dimsize: [time_bins: 1|31, y: 452|452, x: 759|759], crs: None
< ET_LOOK (0:36:14.177909)

pywapor35 et_look version 2, custom chunks = {"time_bins": 1, "x": 1000, "y": 1000}

ds_out = pywapor.et_look.main(ds, et_look_version="v2", chunks={"time_bins": 1, "x": 1000, "y": 1000})

    --> Saving output to `et_look_out.nc`.
        > peak-memory-usage: 1.1GB, execution-time: 0:02:59.866786.
        > chunksize|dimsize: [time_bins: 1|31, y: 452|452, x: 759|759], crs: None
< ET_LOOK (0:36:52.146582)

pywapor35 et_look version 3, custom chunks = {"time_bins": 1, "x": 1000, "y": 1000}

ds_out = pywapor.et_look.main(ds, et_look_version="v3", chunks={"time_bins": 1, "x": 1000, "y": 1000})

    --> Saving output to `et_look_out.nc`.
        > peak-memory-usage: 1.3GB, execution-time: 0:02:45.936586.
        > chunksize|dimsize: [time_bins: 1|31, y: 452|452, x: 759|759], crs: None
< ET_LOOK (0:42:34.667140)

pywapor35 et_look version 3, custom chunks = {"time_bins": 10, "x": 1000, "y": 1000} => crashed
pywapor35 et_look version 3, custom chunks = {"time_bins": 1, "x": 500, "y": 500} => crashed

It might depend on the dimension sizes, but chunks = {"time_bins": 1, "x": 1000, "y": 1000} seems to work better.
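To put rough numbers on the configurations above, the following sketch estimates the size of the largest chunk of a single variable. It assumes 8 bytes per value (float64), which is a guess rather than something confirmed for these datasets, and it ignores how many variables and intermediate arrays are held in memory at once. The function name is made up for this example.

```python
# Dimension sizes from the "chunksize|dimsize" log lines above.
DIMS = {"time_bins": 31, "y": 452, "x": 759}
BYTES_PER_VALUE = 8  # assumption: float64 values


def chunk_mib(chunks):
    """Size in MiB of the largest chunk of one variable."""
    values = 1
    for dim, size in DIMS.items():
        requested = chunks[dim]
        # -1 means the whole dimension; larger requests are capped.
        values *= size if requested == -1 else min(requested, size)
    return values * BYTES_PER_VALUE / 2**20


configs = {
    "default pywapor35": {"time_bins": -1, "x": 500, "y": 500},
    "1 x 1000 x 1000": {"time_bins": 1, "x": 1000, "y": 1000},
    "10 x 1000 x 1000": {"time_bins": 10, "x": 1000, "y": 1000},
    "1 x 500 x 500": {"time_bins": 1, "x": 500, "y": 500},
}

for label, chunks in configs.items():
    print(f"{label}: {chunk_mib(chunks):.2f} MiB per variable chunk")
```

Under these assumptions the default gives the largest chunks, but the 1 x 500 x 500 setting gives the smallest chunks and still crashed, so per-chunk size alone does not seem to explain the results.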

Suggestion: add **kwargs to pywapor.project.run_et_look():

    def run_et_look(self, et_look_version="v3", **kwargs):
        # Forward any extra keyword arguments (e.g. chunks) to et_look.main().
        self.et_look_out = pywapor.et_look.main(self.et_look_in, et_look_version=et_look_version, **kwargs)
        return self.et_look_out

That way, et_look can be run with a custom chunk size within the project workflow:

et_look = project.run_et_look(chunks={"time_bins": 1, "x": 500, "y": 500})
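The forwarding pattern can be tried out without pywapor installed. In the sketch below, `fake_et_look_main` and `FakeProject` are hypothetical stand-ins, not the real pywapor API; they only record the arguments they receive, so the example shows nothing more than how `**kwargs` reaches the inner function.

```python
def fake_et_look_main(ds, et_look_version="v3", chunks=None):
    """Hypothetical stand-in for pywapor.et_look.main; records its arguments."""
    if chunks is None:
        # Default value as quoted earlier in this issue.
        chunks = {"time_bins": -1, "x": 500, "y": 500}
    return {"input": ds, "et_look_version": et_look_version, "chunks": chunks}


class FakeProject:
    """Hypothetical stand-in for the project class."""

    def __init__(self, et_look_in):
        self.et_look_in = et_look_in
        self.et_look_out = None

    def run_et_look(self, et_look_version="v3", **kwargs):
        # Extra keyword arguments are passed through unchanged.
        self.et_look_out = fake_et_look_main(
            self.et_look_in, et_look_version=et_look_version, **kwargs
        )
        return self.et_look_out


project = FakeProject("et_look_in.nc")
out = project.run_et_look(chunks={"time_bins": 1, "x": 1000, "y": 1000})
print(out["chunks"])  # {'time_bins': 1, 'x': 1000, 'y': 1000}
```

One consequence of this design is that a misspelled keyword (for example `chunk=`) is rejected by the inner function with a TypeError instead of being silently ignored.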

2024-05-28T10:41:39+00:00

bert.coerver

changed status to open

2024-05-28T11:07:12+00:00

bert.coerver

marked as enhancement

2024-05-29T08:32:51+00:00

bert.coerver

changed status to resolved

Added option to configure chunk sizes in https://bitbucket.org/cioapps/pywapor/commits/5d4af9b0ec8dda6b1b6c7a4e390e75b9dac66afc

2024-05-29T08:34:54+00:00
