Debugging acc kernel

I have Fortran code that I marked up with ACC directives. Launching the application results in:

call to cuMemFree returned error 700: Launch failed

cuda-memcheck shows a huge number of errors like:

========= Invalid __global__ read of size 4
=========     at 0x0003cb58 in mp_thompson_837_gpu
=========     by thread (56,0,0) in block (0,29,0)
=========     Address 0x0c42c9fc is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/ (cuLaunchKernel + 0x34b) [0x54b6b]

I failed to debug my application with cuda-gdb.
Is there a proper way to debug such an application (an acc kernel)?

The bounds check test passed OK.


Hi Alexey,

Yes, until we support on-device debugging, these issues are difficult to diagnose. There are multiple possible points of failure here: the problem could be in the original source, in the generated CUDA kernel, or in the back-end compiler.

When diagnosing these types of issues, I typically start with the original code and use diagnostic flags such as -Mbounds, -Mchkptr, and -Mchkstk, and I also run the program under Valgrind to check for memory issues.
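As a rough sketch of that first step, the host-side checks might be run as below. The file and executable names are placeholders rather than anything from this thread, and -g and -O0 are my own additions to the flags listed above. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: SRC and EXE are hypothetical names, not from the thread.
SRC=app.f90
EXE=app_host

# No -ta option here, so the program is built for the host and the loops
# run on the CPU with the runtime checks enabled.
CHECK_FLAGS="-g -O0 -Mbounds -Mchkptr -Mchkstk"
BUILD_CMD="pgfortran $CHECK_FLAGS -o $EXE $SRC"
RUN_CMD="valgrind ./$EXE"

# Print the commands; set RUN=1 to execute them as well.
echo "$BUILD_CMD"
echo "$RUN_CMD"
if [ "${RUN:-0}" = "1" ]; then
    $BUILD_CMD && $RUN_CMD
fi
```

If the host build is clean under both the runtime checks and Valgrind, the original source is less likely to be the culprit, and the generated kernel is the next place to look.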

The next step is to start commenting out parts of the accelerated code to determine the line where the error occurs. This sometimes gives a better indication of the problem. Next, I keep the generated GPU code and look for errors associated with this line number. I then also adjust the loop schedule to change the generated kernel and see whether this has an effect. Of course, if you do find that the compiler is generating bad CUDA code, we would appreciate you sending us a reproducing example.

If the generated CUDA code looks OK, my next step is to compile the back-end code without optimization (-ta=nvidia,O0).
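A minimal sketch of that unoptimized rebuild, followed by a rerun under cuda-memcheck. The file names are placeholders, and -Minfo=accel is my addition so the compiler reports what it generated for each loop. As written, the script prints the commands and only executes them when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: SRC and EXE are hypothetical names, not from the thread.
SRC=app.f90
EXE=app_gpu_O0

# Build the device code without back-end optimization, then rerun the
# memory checker to see whether the out-of-bounds reads persist.
BUILD_CMD="pgfortran -g -ta=nvidia,O0 -Minfo=accel -o $EXE $SRC"
CHECK_CMD="cuda-memcheck ./$EXE"

echo "$BUILD_CMD"
echo "$CHECK_CMD"
if [ "${RUN:-0}" = "1" ]; then
    $BUILD_CMD && $CHECK_CMD
fi
```

If the errors disappear at O0, that points toward the back-end compiler rather than the source or the generated kernel.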

  • Mat

Hi Mat,

Thank you for your advice. Unfortunately, I haven’t found the error in the ACC kernel yet.

Is it possible to save the kernel generated by PGI, modify it, compile it with NVCC, and substitute the modified kernel for the old one?


Hi Alexey,

Is it possible to save kernel generated by PGI,

Yes, “-ta=nvidia,keepgpu”
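For example, something along these lines should keep the generated kernel source and let you search it for the kernel named in the cuda-memcheck output (mp_thompson_837_gpu). The source file name is a placeholder, and the .gpu suffix for the kept files is an assumption on my part, so check what actually appears in your build directory. The script prints the commands and only runs them when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: SRC and EXE are hypothetical names, not from the thread.
SRC=app.f90
EXE=app_gpu

# keepgpu asks the compiler to keep the generated GPU kernel source.
BUILD_CMD="pgfortran -ta=nvidia,keepgpu -Minfo=accel -o $EXE $SRC"

# Assumption: the kept kernel source is written with a .gpu suffix.
# The kernel name comes from the cuda-memcheck report above.
FIND_CMD="grep -n mp_thompson_837_gpu *.gpu"

echo "$BUILD_CMD"
echo "$FIND_CMD"
if [ "${RUN:-0}" = "1" ]; then
    $BUILD_CMD && $FIND_CMD
fi
```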

modify, compile it with NVCC and substitute old kernel with modified one.

Not really. It’s possible, but if you suspect PGI is generating bad CUDA code, the better thing to do is send a report and a reproducing example to PGI Customer Service. If you ask them to forward the code to me, I’ll take a look to see if I can determine the issue.

  • Mat