Beginning with the 375.51 driver, and continuing at least into the 375.66 driver, CUDA debugging has been broken in several ways. I discovered these problems with my company’s NightView debugger, but they appear to stem from the driver itself.
In particular, we no longer get KERNEL_READY events, nor are breakpoints hit. However, we found a workaround: if we stop both the CPU and GPU in response to an earlier ELF_IMAGE_LOADED event and set a breakpoint while both are stopped, KERNEL_READY events and breakpoints start working again. This workaround is a problem for our debugger because it focuses on real-time development, so stopping the CPU is undesirable. (Stopping the GPU is unavoidable, sadly.)
I did some testing with both our own RedHawk kernel and the CentOS 7.3 kernel, with these results:
RedHawk 7.3 4.4.60-rt73-RedHawk-7.3-trace + NVIDIA 375.26 + CUDA 8.0.61: pass
RedHawk 7.3 4.4.60-rt73-RedHawk-7.3-trace + NVIDIA 375.51 + CUDA 8.0.61: FAIL
CentOS 7.3 3.10.0-514.16.1.el7.x86_64 + NVIDIA 375.26 + CUDA 8.0.61: pass
CentOS 7.3 3.10.0-514.16.1.el7.x86_64 + NVIDIA 375.51 + CUDA 8.0.61: FAIL
CentOS 7.3 3.10.0-514.16.1.el7.x86_64 + NVIDIA 375.66 + CUDA 8.0.61: FAIL
We never encountered any of these problems with CUDA 8.0.44 or earlier, or with earlier driver versions. Since CUDA 8.0.61 passes with the 375.26 driver and fails with 375.51, I’m pretty sure the problem stems from the 375.51 driver update itself.
Stopping the CPU is something that cuda-gdb does as part of its normal operation, so I can’t use it to reproduce all the problems we are seeing. But I was able to reproduce a couple of them. It doesn’t appear that I can add attachments here, so I’ve put the examples up on our website, and I’ll include URLs to them:
In both cases, untar the example, cd to the test directory and do “make doit”. That will build a test program and run cuda-gdb with a couple of commands. The last command is an echo that states the expected result. That expectation is met with the 375.26 driver but not with 375.51 or 375.66.
The first one uses “set cuda kernel_events application” to show that KERNEL_READY events are not being received if no breakpoints were set.
The second one uses “set cuda break_on_launch system” to show that automatic breakpoints in memset32_post are not being hit.
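For anyone who wants to try this without the tarballs, here is a rough sketch of the kind of cuda-gdb session the two tests run. This is not the actual content of the “make doit” target: the command-file names, the test_program binary name, and the wording of the echo lines are all made up for illustration. Only the two “set cuda” commands come from the tests described above.

```shell
#!/bin/sh
# Hypothetical stand-in for what "make doit" runs; the binary name and the
# command-file names below are assumptions, not taken from the real tests.
PROG=${PROG:-./test_program}

# Case 1: ask cuda-gdb to report launches of application kernels,
# without setting any breakpoints first.
cat > kernel_events.gdb <<'EOF'
set cuda kernel_events application
run
echo Expected: a kernel launch event should be reported above\n
EOF

# Case 2: ask cuda-gdb to break automatically on launches of system
# (driver-internal) kernels, such as memset32_post.
cat > break_on_launch.gdb <<'EOF'
set cuda break_on_launch system
run
echo Expected: stopped at an automatic breakpoint in memset32_post\n
EOF

# Run both cases only if cuda-gdb and the test binary are available.
if command -v cuda-gdb >/dev/null 2>&1 && [ -x "$PROG" ]; then
    cuda-gdb -batch -x kernel_events.gdb "$PROG"
    cuda-gdb -batch -x break_on_launch.gdb "$PROG"
else
    echo "cuda-gdb or $PROG not found; command files were written but not run"
fi
```

On a working driver, the output just above each echo line should match what the echo says. On an affected driver, the first run would be expected to finish without reporting a kernel event, and the second without stopping.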