Memory issues in the cu_spat_to_SH/cu_SH_to_spat functions

Issue #60 resolved
Zekun Ni created an issue

I’m using this library heavily now that single-precision support is available on the GPU. However, the library function cu_spat_to_SH sometimes fails mysteriously with the CUDA error “invalid argument”. Google search results suggested it might be a memory issue, so I took another look at the code. What I found is that, unlike their CPU equivalents spat_to_SH/SH_to_spat, the two cu_* functions actually require slightly larger buffers for the input and output arrays: the spatial and spectral representations must have cfg->spat_stride and cfg->nlm_stride elements respectively, as evidenced here. Since nlat must be divisible by 64, spat_stride is usually equal to nlat*nphi, but the same usually does not hold between nlm_stride and nlm. Unfortunately, nlm_stride is a private member, so this goes beyond a documentation issue: callers have no way to figure out how much they should allocate for the arrays, and some code changes are surely required.
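To make the sizing problem concrete, here is a rough sketch of the caller-side device allocations (single precision assumed; nlm, nlat, nphi are the public read-only members of shtns_cfg, and float2 just stands in for a single-precision complex coefficient):

    #include <shtns.h>
    #include <cuda_runtime.h>

    void alloc_transform_buffers(shtns_cfg cfg, float **d_spat, float2 **d_qlm)
    {
        // Spatial grid: nlat*nphi real values. Since nlat is a multiple of 64,
        // this normally coincides with the internal spat_stride, so it is safe.
        cudaMalloc((void**)d_spat, (size_t)cfg->nlat * cfg->nphi * sizeof(float));

        // Spectral coefficients: nlm complex values is what spat_to_SH/SH_to_spat
        // expect, but cu_spat_to_SH/cu_SH_to_spat actually touch nlm_stride elements,
        // and nlm_stride is private, so there is no clean way to size this correctly.
        cudaMalloc((void**)d_qlm, (size_t)cfg->nlm * sizeof(float2));
    }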

Comments (9)

  1. Nathanaël Schaeffer repo owner

    Hello,
    A first quick answer, with a possible workaround (but it should be confirmed by a careful analysis later).
    If the issue is the one I think it is, based on your description, you can assume that allocating (nlm+mmax+16) complex numbers should be safe (see here). But I’m not sure this is the issue, as this nlm_stride is used for internal buffers only (as far as I remember; I will check later).

    Let me know if this tentative workaround solves the issue.
    By the way, are the failures systematic for a given size, or more or less random?

  2. Zekun Ni reporter

    Unfortunately, according to the link in my original post, nlm_stride is also explicitly used for the output of cu_spat_to_SH: this line sets all nlm_stride elements to zero. I’m not sure whether indices beyond nlm are actually visited in the subsequent CUDA kernel ilegendre, though. I can hard-code (nlm+mmax+1+31)/32 in my code as a workaround, but a less ugly solution is definitely needed.

    The failure is somewhat random, but it is much more frequent when the array itself is small, so that such an overflow is more likely to hit an adjacent memory chunk.
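    Roughly, that overflow looks like this (a sketch only; nlm_stride stands in for the library’s private internal value, and float2 for a single-precision complex coefficient):

        #include <cuda_runtime.h>

        void overflow_pattern(size_t nlm, size_t nlm_stride, cudaStream_t stream)
        {
            float2 *d_qlm;
            // the caller sizes the output for nlm complex coefficients...
            cudaMalloc((void**)&d_qlm, nlm * sizeof(float2));
            // ...but nlm_stride of them get zeroed; whenever nlm_stride > nlm the
            // memset runs past the allocation and can hit an adjacent memory chunk.
            cudaMemsetAsync(d_qlm, 0, nlm_stride * sizeof(float2), stream);
        }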

  3. Nathanaël Schaeffer repo owner

    Yes, if the workaround solves the issue, I will definitely fix the code in a proper and clean way.
    But please use this formula (see the sketch at the end of this comment); the one you wrote is not correct (it mixes up real vs complex numbers and bytes vs sizeof(real)):

    nlm + mmax + 16
    

    Thank you very much for reporting.
    FYI, I also use the double-precision version of these functions extensively and have never stumbled on this kind of issue, so it might also be specific to single precision, which is not quite what you suggest.
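    In caller code, the workaround amounts to something like the following sketch (single precision assumed; cfg->nlm and cfg->mmax are the public shtns_cfg members, float2 stands in for a single-precision complex number, and the helper name is only illustrative):

        #include <shtns.h>
        #include <cuda_runtime.h>

        // allocate the spectral array with the safety margin of mmax+16 complex numbers
        float2* alloc_qlm_padded(shtns_cfg cfg)
        {
            size_t n_cplx = (size_t)cfg->nlm + cfg->mmax + 16;   // count of complex numbers, not bytes or reals
            float2 *d_qlm = NULL;
            cudaMalloc((void**)&d_qlm, n_cplx * sizeof(float2));
            return d_qlm;
        }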

  4. Zekun Ni reporter

    I’ll test the workaround the next time my program fails and see if the “invalid argument” error still appears. In any case, this cudaMemsetAsync call does look likely to trigger memory errors if the output array is not large enough.

    I hadn’t run into this issue during the last two years either, but recently the program has grown more sophisticated and makes more frequent calls on more arrays, and it has started to fail miserably, especially under heavier memory access on the GPUs.

  5. Zekun Ni reporter

    Hi Nathanaël, thanks for the fix; after applying it, the “invalid argument” error has not popped up so far. A variant of my program that kept failing shortly after starting has now been running without errors for an hour. It does look like the error has been fixed by eliminating the memory overflow at sht_gpu_kernels.cu:1576.

  6. Nathanaël Schaeffer repo owner

    Thank you for spotting it! For what it’s worth, I have made a new release, v3.5.4.
