Is there a memory limit on individual kernels?

Hi,

I have run across something a bit strange.

I have a code that uses MPI to compute on multiple GPUs.

I am testing the code with a large problem on a TitanXP.

When I run the code using 1 MPI rank, the code crashes with:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

I have seen this before when the problem size is too large to fit in the GPU RAM.

However, when I run the code with 2 MPI ranks (oversubscribing the GPU) the code works fine and completes. Running nvidia-smi during the run shows that the code is using ~6GB (3GB per rank) total out of the 12GB RAM on the card.

Since the problem can fit into GPU memory, I do not understand why it crashes when using 1 MPI rank.

Is there some kind of memory size limitation on an individual kernel?

  • Ron

Hi Ron,

Are any of the individual arrays >2GB or is the total static memory >2GB? If so, you’ll need to add the flag “-Mlarge_arrays” for dynamic arrays, or “-mcmodel=medium” for static.
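
For example, a minimal sketch of where the flags go (source and output file names here are hypothetical):

# Dynamic (allocatable) arrays larger than 2GB:
pgfortran -Mlarge_arrays -c solver.f90
# Static (fixed-size) data larger than 2GB; the medium memory model
# is needed on both the compile and link steps:
pgfortran -mcmodel=medium -c solver.f90
pgfortran -mcmodel=medium -o solver solver.o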

-Mat

Hi,

I think one of the arrays is ~3GB.

When I add -Mlarge_arrays (or -mcmodel=medium) I get a segfault right away:

[PREDSCI-GPU2:24251] *** Process received signal ***
[PREDSCI-GPU2:24251] Signal: Segmentation fault (11)
[PREDSCI-GPU2:24251] Signal code: Address not mapped (1)
[PREDSCI-GPU2:24251] Failing at address: 0x7ffdcb93d018
[PREDSCI-GPU2:24251] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f81b016b390]
[PREDSCI-GPU2:24251] [ 1] /usr/local/pgi/linux86-64/2017/lib/libpgf90.so(pgf90_copy_f77_argl_i8+0x16c)[0x7f81b10d3a8c]
[PREDSCI-GPU2:24251] [ 2] [0x7065f0]
[PREDSCI-GPU2:24251] *** End of error message ***

Try adding “-i8” as well, in case you’re overflowing a default-kind “integer” variable.

Otherwise, I’m not sure. The segv is occurring on the host side, so you can try running the program through a debugger to see where it’s failing.
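
To illustrate the failure mode “-i8” guards against (a standalone sketch, not taken from your code): with default 4-byte integers, a size or offset computation silently wraps once it exceeds 2**31-1, which is easy to hit with a ~3GB array.

program overflow_demo
  implicit none
  integer :: n, m
  integer(kind=8) :: wide
  n = 50000
  m = 50000
  ! With 4-byte integers the product 2,500,000,000 exceeds huge(n)
  ! (2**31-1) and, in practice, wraps to a negative value.
  print *, 'n*m as default integer: ', n*m
  ! Widening the operands first (what -i8 does globally for default
  ! integers) gives the correct result.
  wide = int(n,8) * int(m,8)
  print *, 'n*m widened to 8 bytes: ', wide
end program overflow_demo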

I realized my code is linking to some libraries that I compiled without -Mlarge_arrays. Does every part of the code need to be compiled with that flag for it to work?

Are there any performance differences when using large arrays (i.e. why is it not default)?

  • Ron

Hi Ron,

> Does every part of the code need to be compiled with that for it to work?

With “-mcmodel=medium”, you would need to re-compile all the libraries with this flag as well.

However, with “-Mlarge_arrays” you don’t, unless you are passing a large array to a library routine, in which case you would need to recompile that library.
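
Roughly, assuming a hypothetical library “mylib” whose routines receive the large arrays:

pgfortran -Mlarge_arrays -c mylib.f90
ar rcs libmylib.a mylib.o
pgfortran -Mlarge_arrays main.f90 -L. -lmylib -o main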

> Are there any performance differences when using large arrays (i.e. why is it not default)?

Yes. -Mlarge_arrays may cause a slight slowdown since address offsets become 64-bit instead of 32-bit.

-Mat

Hi,

I recompiled the libraries with -Mlarge_arrays, then re-compiled the code with -Mlarge_arrays, and now it works!

This also somehow seems to have fixed another problem I posted about on this forum, where a tool crashed with -O3 but not with -O1.
With -Mlarge_arrays enabled, the -O3 build works as well.

  • Ron