Debugging acc kernel

AROM · April 29, 2013, 5:02pm

I have a Fortran code that I have marked up with ACC (OpenACC) directives. Launching the application results in:

call to cuMemFree returned error 700: Launch failed

cuda-memcheck reports a huge number of errors like this:

========= Invalid __global__ read of size 4
=========     at 0x0003cb58 in mp_thompson_837_gpu
=========     by thread (56,0,0) in block (0,29,0)
=========     Address 0x0c42c9fc is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x34b) [0x54b6b]
...

I was unable to debug my application with cuda-gdb.
Is there a correct way to debug this kind of application (an ACC kernel)?

The bounds check test passed.

Alexey

MatColgrove · April 29, 2013, 11:04pm

Hi Alexey,

Yes, until we get on-device debugging supported, these issues are difficult to diagnose. There are multiple possible points of failure here: the problem could be in the original source, in the generated CUDA kernel, or in the back-end compiler.

When diagnosing these types of issues, I typically start with the original code and use diagnostic flags such as -Mbounds, -Mchkptr, and -Mchkstk, as well as run the program under Valgrind to check for memory issues.
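A minimal sketch of why the host-side checks help, assuming a hypothetical module `stencil_sketch` with a routine `smooth` standing in for one of your accelerated loops. When the file is compiled without the accelerator target (for example `pgfortran -g -Mbounds -Mchkptr -Mchkstk`, then run under `valgrind`), the `!$acc` lines are ordinary comments, the loop executes on the host, and an out-of-range index is reported with a source line instead of showing up only as an anonymous cuda-memcheck read error.

```fortran
module stencil_sketch
  implicit none
contains
  ! Hypothetical 3-point smoothing loop standing in for an accelerated
  ! region. Without -acc / -ta the !$acc lines are plain comments, so the
  ! loop runs on the host where -Mbounds and Valgrind can see every access.
  subroutine smooth(n, a, b)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n)
    integer :: i

    ! End points are copied unchanged.
    b(1) = a(1)
    b(n) = a(n)

    ! Interior points only. Looping over 1..n instead would read a(0) and
    ! a(n+1): a host build with -Mbounds reports that, while on the GPU it
    ! may surface only as an "Invalid __global__ read" under cuda-memcheck.
    !$acc parallel loop copyin(a) copy(b)
    do i = 2, n - 1
      b(i) = (a(i-1) + a(i) + a(i+1)) / 3.0
    end do
  end subroutine smooth
end module stencil_sketch
```

A caller just does `call smooth(n, a, b)`; the same source is then rebuilt with the accelerator target once the host-side checks come back clean.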

The next step is to start commenting out parts of the accelerated code to determine the line at which the error occurs. This sometimes gives a better indication of the problem. Next, I keep the generated GPU code and look for errors associated with that line. I also adjust the loop schedule, which changes the generated kernel, to see whether that has an effect. Of course, if you do find that the compiler is generating bad CUDA code, we would appreciate you sending us a reproducing example.
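A rough sketch of the narrowing-down step, using a hypothetical module `bisect_sketch` whose routine `two_stage` has two compute regions. The first region pins an explicit loop schedule (`gang vector(128)`; the vector length is an arbitrary choice) so the generated kernel can be changed deliberately. The second region uses the OpenACC `if` clause with a hypothetical runtime flag `stage2_on_device`; this is a swapped-in alternative to commenting code out, not what Mat described, and when the flag is `.false.` the region runs on the host, which lets you move one region at a time off the GPU while watching whether the cuda-memcheck errors change.

```fortran
module bisect_sketch
  implicit none
contains
  ! Hypothetical two-stage update, used only to illustrate narrowing down
  ! which accelerated region produces the cuda-memcheck errors.
  subroutine two_stage(n, x, y, stage2_on_device)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real, intent(out) :: y(n)
    logical, intent(in) :: stage2_on_device
    integer :: i

    ! Stage 1 with an explicit loop schedule. Changing vector(128) to,
    ! say, vector(32) changes the generated kernel; if the errors move or
    ! disappear, this loop becomes a suspect.
    !$acc kernels loop copy(x) gang vector(128)
    do i = 1, n
      x(i) = x(i) + 1.0
    end do

    ! Stage 2: through the OpenACC if clause, passing .false. makes this
    ! region execute on the host, so it can be taken off the GPU without
    ! commenting it out and recompiling.
    !$acc kernels if(stage2_on_device) copyin(x) copyout(y)
    !$acc loop
    do i = 1, n
      y(i) = 2.0 * x(i)
    end do
    !$acc end kernels
  end subroutine two_stage
end module bisect_sketch
```

Because the flag is a run-time argument, both placements of stage 2 can be compared from one binary under cuda-memcheck.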

If the generated CUDA code looks OK, my next step is to compile the back-end code without optimization (-ta=nvidia,O0).

Mat

AROM · May 22, 2013, 12:13pm

Hi Mat,

Thank you for your advice. Unfortunately, I haven't found the error in the ACC kernel yet.

Is it possible to save the kernel generated by PGI, modify it, compile it with NVCC, and substitute the modified kernel for the old one?

Alexey

MatColgrove · May 22, 2013, 5:04pm

Hi Alexey,

Is it possible to save kernel generated by PGI,

Yes, compile with “-ta=nvidia,keepgpu” to keep the generated GPU code.

modify, compile it with NVCC and substitute old kernel with modified one.

Not really. It’s possible, but if you suspect PGI is generating bad CUDA code, the better thing to do is to send a report and a reproducing example to PGI Customer Service (trs@pgroup.com). If you ask them to forward the code to me, I’ll take a look and see if I can determine the issue.

Mat
