debugging cuda kernels

diego.rivera16 · December 3, 2007, 9:51pm

What are the methods to debug CUDA kernels? Is there a way to stop the execution of a kernel on the GPU to know the value of any variable in the code?

Thanks,

Diego

seb · December 4, 2007, 12:23am

Nvidia tells us a debugger is in the works but will not comment on the release date.

Until this magical piece of software is released, no: there is no way to stop a kernel running on the GPU to inspect the value of a variable.
Emulation mode does allow this. However, it doesn't run the kernel on the GPU; it emulates GPU behaviour by executing all threads sequentially on the CPU. Keep this in mind when drawing conclusions. __syncthreads() behaviour is emulated correctly.

If you get correct results in emulation mode and wrong results on the GPU, you could write intermediate values back to global memory and evaluate them on the host after the kernel has returned.
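A minimal sketch of that write-back approach, assuming a reasonably recent CUDA runtime. The kernel `scaleAndOffset`, the extra `debug` buffer, and the host helper `runWithDebugDump` are all hypothetical names made up for illustration; the idea is only that each thread stores its intermediate value in a global-memory buffer that the host copies back and inspects.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: computes out[i] = in[i] * 2 + 1 and also stores the
// intermediate product in a separate debug buffer in global memory.
__global__ void scaleAndOffset(const float* in, float* out, float* debug, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float scaled = in[i] * 2.0f;  // intermediate value we want to inspect
    debug[i] = scaled;            // write it back to global memory
    out[i] = scaled + 1.0f;       // the "real" result
}

// Hypothetical host helper: runs the kernel and returns the intermediate
// values so they can be checked on the CPU after the kernel has finished.
std::vector<float> runWithDebugDump(const std::vector<float>& hostIn)
{
    const int n = static_cast<int>(hostIn.size());
    assert(n > 0);  // a launch with zero blocks is an invalid configuration
    const size_t bytes = n * sizeof(float);

    float *dIn = nullptr, *dOut = nullptr, *dDebug = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMalloc(&dDebug, bytes);
    cudaMemcpy(dIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    scaleAndOffset<<<blocks, threads>>>(dIn, dOut, dDebug, n);

    // Report launch errors instead of silently reading back garbage.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        std::fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));

    // This cudaMemcpy waits for the kernel before copying the debug values.
    std::vector<float> hostDebug(n);
    cudaMemcpy(hostDebug.data(), dDebug, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dDebug);
    return hostDebug;
}
```

Calling `runWithDebugDump` with a small, hand-checkable input and comparing the returned values against the emulation-mode results is usually enough to see where the two runs start to diverge.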

You could also start with a simple kernel and work your way up to the real thing step by step, checking the results at each step. Keep nvcc's optimizations in mind when doing this: when you remove certain parts of the code (global memory writes in particular), nvcc will often optimize out a lot of the remaining code, so you are no longer debugging what you wrote. Changing the compiler's optimization flags might help in this case.
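One way to build a kernel up in steps without losing code to the optimizer is sketched below. It is only an illustration: the kernel `stagedKernel`, its `stage` parameter, and the helper `runStage` are hypothetical. Each stage is switched on by a runtime argument, and the kernel always ends with a global memory write, so the work done so far stays observable and cannot be discarded as dead code.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel built up in stages. `stage` is a runtime value, so the
// compiler cannot remove the later steps, and the final store keeps the
// computation observable from the host.
__global__ void stagedKernel(const float* in, float* out, int n, int stage)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = in[i];               // stage 0: plain copy
    if (stage >= 1) v = v * v;     // stage 1: square
    if (stage >= 2) v = v + 3.0f;  // stage 2: add an offset
    out[i] = v;                    // always write the result back
}

// Hypothetical host helper: runs the kernel up to the given stage and
// returns the output for checking on the CPU.
std::vector<float> runStage(const std::vector<float>& hostIn, int stage)
{
    const int n = static_cast<int>(hostIn.size());
    assert(n > 0);  // a launch with zero blocks is an invalid configuration
    const size_t bytes = n * sizeof(float);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    stagedKernel<<<blocks, threads>>>(dIn, dOut, n, stage);

    // The blocking copy waits for the kernel to finish first.
    std::vector<float> hostOut(n);
    cudaMemcpy(hostOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return hostOut;
}
```

Running `runStage` with `stage` set to 0, then 1, then 2, and checking each output before moving on, narrows a wrong result down to the step that introduced it.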

diego.rivera16 · December 4, 2007, 12:56pm

Thanks a lot for your answer. It is really well explained and will help me a lot.
