I have a code that uses MPI to compute on multiple GPUs.
I am testing the code with a large problem on a TitanXP.
When I run the code using 1 MPI rank, the code crashes with:
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
I have seen this before when the problem size is too large to fit in the GPU RAM.
However, when I run the code with 2 MPI ranks (oversubscribing the GPU), the code works fine and completes. Running nvidia-smi during the run shows that the code is using ~6GB in total (3GB per rank) of the 12GB RAM on the card.
Since the problem can fit into GPU memory, I do not understand why it crashes when using 1 MPI rank.
Is there some kind of memory size limitation on an individual kernel?
Are any of the individual arrays >2GB or is the total static memory >2GB? If so, you’ll need to add the flag “-Mlarge_arrays” for dynamic arrays, or “-mcmodel=medium” for static.
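As a rough check on the 2GB threshold, the sketch below compares an array's size in bytes against 2^31 - 1, the largest value a signed 32-bit offset can hold. The element count is a made-up figure, not taken from the code in question, and the helper name is hypothetical. It only illustrates how splitting the same data across 2 ranks could bring each array back under the limit.

```python
INT32_MAX = 2**31 - 1  # largest byte offset a signed 32-bit integer can hold


def needs_large_arrays(n_elems, elem_bytes=8):
    """Return True if a single array is larger than 2^31 - 1 bytes."""
    return n_elems * elem_bytes > INT32_MAX


# Hypothetical element count: double precision, 8 bytes per element.
n = 400_000_000

print(needs_large_arrays(n))       # one rank holds the whole array
print(needs_large_arrays(n // 2))  # two ranks each hold half of it
```

If the first check is true and the second is false, that would be consistent with a run that fails on 1 rank and succeeds on 2, although other causes are possible.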
I realized my code is linking to some libraries that I compiled without -Mlarge_arrays. Does every part of the code need to be compiled with that for it to work?
Are there any performance differences when using large arrays (i.e. why is it not default)?
Does every part of the code need to be compiled with that for it to work?
With “-mcmodel=medium”, you would need to recompile all the libraries with this flag as well.
With “-Mlarge_arrays” you don’t, unless you are passing the large array to a library routine, in which case you would need to recompile that library.
Are there any performance differences when using large arrays (i.e. why is it not default)?
Yes. -Mlarge_arrays may cause a slight slowdown since address offsets are now 64-bit instead of 32-bit.
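To show why the offset width matters, here is a small sketch of what happens when a byte offset is truncated to a signed 32-bit value. This is only an illustration of 32-bit wraparound in general, with made-up offsets and a hypothetical helper name; it is an assumption, not a confirmed description of what the compiler generates, that this is how the illegal address arises.

```python
import ctypes


def as_int32(offset):
    """Truncate an offset to a signed 32-bit value (ctypes does no overflow check)."""
    return ctypes.c_int32(offset).value


small = 1_600_000_000  # hypothetical byte offset below 2^31
large = 3_200_000_000  # hypothetical byte offset above 2^31

print(as_int32(small))  # fits in 32 bits, so it is unchanged
print(as_int32(large))  # does not fit, so it wraps to a negative value
```

A 64-bit offset holds both values without wrapping, at the cost of wider address arithmetic.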
I recompiled the libraries with -Mlarge_arrays, then re-compiled the code with -Mlarge_arrays, and now it works!
This also somehow seems to have fixed another problem I posted to this forum about a tool crashing with -O3 but not with -O1.
With -Mlarge_arrays on, the -O3 build seems to work as well.