Alternately Encountering "Illegal Memory Access"

JazzyJimmy_Liu · August 23, 2017, 12:57pm

Hi,

I am trying to pass an n-by-n-by-n structure into a kernel function. I intermittently get an illegal memory access error when copying the memory from device to host:

  0: copyout Memcpy (host=0x0x2b3968ae0020, dev=0x0x10316200000, size=6291456) FAILED: 77(an illegal memory access was encountered)

The size 6291456 corresponds to n = 64: each element of the structure holds three 8-byte integer/double values, so 64 x 64 x 64 x 3 x 8 = 6291456 bytes.
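
As a quick sanity check of that arithmetic, here is a minimal Python sketch. The function name `grid_bytes` and its default arguments are illustrative only and do not come from the actual code:

```python
def grid_bytes(n, fields_per_element=3, bytes_per_field=8):
    """Size in bytes of an n-by-n-by-n grid of fixed-size elements."""
    return n ** 3 * fields_per_element * bytes_per_field


print(grid_bytes(64))  # 6291456, matching the size in the copyout message
```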

The puzzling thing is that the error is intermittent: some runs fail and others succeed. Sometimes the program runs fine on its own but then fails when I run it under cuda-memcheck.

I previously had a bug where the true size of the structure was not extracted correctly, but this time I read in the structure and extract its dimensions using size(grid, dim), where dim is the dimension to be extracted.

The output from cuda-memcheck consists mostly of messages like the following:
========= Invalid __global__ read of size 8
=========     at 0x0002f628 in photon_module_shoot_photon_
=========     by thread (31,0,0) in block (0,0,0)
=========     Address 0x103164011f0 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
...

Any help is greatly appreciated, as I am not familiar with cuda-memcheck and could not decipher the error messages. I would also be glad to send in the code and the full error messages if that is possible.

Many Thanks,

Jimmy

tull · August 24, 2017, 9:20pm

I have forwarded your question for a possible answer.
I anticipate they will ask if you can provide a program that demonstrates the problem. If you would rather not share it on the forum, you can send it to trs@pgroup.com.

dave

MatColgrove · August 25, 2017, 6:41pm

Hi Jimmy,

An “illegal memory access” is a generic error, similar to a seg fault on the host, raised when a bad address is referenced on the device. It can occur under several circumstances, the most common being accessing memory beyond the end of an array, accessing a host pointer on the device, a single object using more than 2GB of memory without the -Mlarge_array flag, or using too much heap or stack space on the device.

Note that the error is most likely occurring in the kernel before the memcpy (as seen in the cuda-memcheck output), not in the memcpy itself.

Given the cuda-memcheck error and the fact that the code succeeds sometimes, my best guess is that you have an out-of-bounds error.

Is this OpenACC or CUDA Fortran? For OpenACC, I’d recommend running the binary with Valgrind (www.valgrind.org) to see if it finds any memory errors. For CUDA Fortran, compile in emulation mode (-Mcuda=emu) and then run under Valgrind.

-Mat

JazzyJimmy_Liu · August 27, 2017, 4:43pm

Hi Dave and Mat,

Thank you very much for your reply.

Indeed, the error happened in the kernel, before the actual memory copy took place. I was just unsure which approach to use to avoid the out-of-bounds error.

I have sent my CUDA Fortran code and error messages to trs@pgroup.com. In the meantime, I will test with emulation mode plus Valgrind, as Mat suggested.

Thanks,

Jimmy

MatColgrove · August 28, 2017, 7:16pm

Hi Jimmy,

I took a look at the code. I think the problem is that you’re using an uninitialized value as the index into an array. Hence the failure is intermittent depending upon what value happens to be in the variable. I sent more details in my email response.

-Mat
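
One defensive pattern that fits this diagnosis is to validate an index before it is used to address the array, so that a garbage value produces a clear error instead of an intermittent crash. The following is a minimal Python sketch of the idea, assuming a 1-based n-by-n-by-n grid stored in column-major order as in Fortran. The name `checked_offset` and its arguments are hypothetical and are not taken from the code discussed in this thread:

```python
def checked_offset(i, j, k, n):
    """Return the 0-based linear offset of element (i, j, k) in an n*n*n grid.

    Indices are 1-based and the layout is column-major, mirroring Fortran.
    Raises IndexError instead of silently computing an out-of-range offset.
    """
    for name, value in (("i", i), ("j", j), ("k", k)):
        if not 1 <= value <= n:
            raise IndexError(f"{name}={value} is outside 1..{n}")
    return (i - 1) + n * (j - 1) + n * n * (k - 1)
```

In a kernel, the equivalent guard would typically be an early exit or an error flag when an index falls outside its bounds, which makes the failure reproducible regardless of what value the uninitialized variable happens to hold.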
