What techniques do you use to debug your kernels?
Emulation mode in CUDA is a handy tool for debugging some out-of-bounds reads/writes,
but it does not illustrate thread interactions very well (especially when shared memory and __syncthreads are used a lot).

Now I have a (common) situation where my kernel runs perfectly in emulation mode
but just does not run in 'normal' mode. The kernel is quite large and complex.
Is there any way to know at which instruction the kernel execution failed? Any chance for something like a hw breakpoint, or some useful message from the CUDA runtime about why the execution failed (out-of-bounds memory access, device deadlock, watchdog-killed kernel execution, etc.)?

The only way I see for now to debug the kernel is the trial-and-error method of commenting out blocks of code until we get something that works, then adding bits back and seeing what happens. It's a huge waste of time.

What techniques do you use in such situations?

My technique:

  1. use CUT_CHECK_ERROR for the error message (not always extremely clear, but it gives a first idea)
  2. output intermediate values (and skip the rest of the kernel)
  3. look for boundary conditions that may cause infinite loops and such
  4. look for while loops that never end and such
  5. pray for the visual debugger as soon as possible ;)
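If you don't want to link against cutil, step 1 can be approximated by hand. A minimal sketch of an equivalent check (the macro name is mine; cudaThreadSynchronize is needed to catch errors that happen during execution rather than at launch):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for CUT_CHECK_ERROR: call right after a kernel launch.
#define CHECK_KERNEL(msg)                                                   \
    do {                                                                    \
        cudaError_t err = cudaGetLastError();         /* launch errors */   \
        if (err != cudaSuccess)                                             \
            printf("%s (launch): %s\n", msg, cudaGetErrorString(err));      \
        err = cudaThreadSynchronize();                /* execution errors */ \
        if (err != cudaSuccess)                                             \
            printf("%s (exec): %s\n", msg, cudaGetErrorString(err));        \
    } while (0)

// usage:
//   myKernel<<<grid, block>>>(d_out, d_in, n);
//   CHECK_KERNEL("myKernel");
```

Without the synchronize, the launch returns immediately and an out-of-bounds write may only surface as an "unspecified launch failure" on some later CUDA call, which makes the real culprit hard to find.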

As I've painfully learned myself in this thread:
look for race conditions and look for out-of-bounds reads/writes!
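The classic race that emulation mode hides is reading shared memory that another thread hasn't written yet. A toy sketch (assumes one block of at most 256 threads, with blockDim.x == n):

```cuda
// Reverse an array through shared memory.
__global__ void reverse(float *d, int n)
{
    __shared__ float s[256];
    int t = threadIdx.x;

    s[t] = d[t];
    __syncthreads();      // remove this and thread t may read s[n-1-t]
                          // before the owning thread has written it:
                          // works in emulation, garbage on the device
    d[t] = s[n - 1 - t];
}
```

In emulation mode the threads run (mostly) sequentially, so the missing barrier never bites; on real hardware the warps interleave and the result is nondeterministic.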

As was already mentioned, output some intermediate variables.

If the kernel is simply not "running", as in it exits as soon as it is called and you get a cudaError, my guess would be out-of-bounds reads/writes.
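A typical culprit: when the array length is not an exact multiple of the block size, the last block runs spare threads that index past the end of the array. The usual guard (a sketch, names are mine):

```cuda
__global__ void scale(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;   // last block may have threads with i >= n;
                          // without this guard they read/write past
                          // the end of the buffers
    out[i] = in[i] * 2.0f;
}
```

Emulation mode often gets away with the missing guard because the out-of-bounds access lands in ordinary host memory, while on the device it aborts the kernel.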