How does Windows get around the limitation that Linux/Mac have for debugging with one GPU?

Hi,

I just realized that if you have a single GPU and it is running X11, you cannot debug that GPU. While I understand the reason, I was wondering how Windows gets around this issue; from what I read, it only happens on Linux and Mac OS X.

I’m assuming the issue arises because pausing halts all of the GPU’s threads, and since GPUs don’t have context switching like a general-purpose CPU, that freezes the OS’s GUI. On a general-purpose CPU, context switching and a scheduler create the illusion of parallel multitasking, so pausing a program’s execution pauses only that context and not the others (thus the OS doesn’t freeze).
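The CPU-side behavior described above can be sketched with ordinary threads: blocking one context at a "breakpoint" doesn't stop the others, because the scheduler keeps switching between them. This is just an illustrative sketch of the scheduling idea, using Python threads to stand in for contexts; it is not how a GPU debugger actually works.

```python
import threading

pause = threading.Event()      # cleared = "the debugger has paused us"
done = threading.Event()

def debuggee():
    pause.wait()               # hits a "breakpoint" and blocks here
    done.set()

def gui():
    # The GUI context keeps making progress while the debuggee is paused,
    # because the scheduler still runs this thread.
    return sum(range(1000))

t = threading.Thread(target=debuggee)
t.start()
progress = gui()               # runs fine while debuggee is stopped
assert progress == 499500 and not done.is_set()
pause.set()                    # "resume" the debuggee
t.join()
assert done.is_set()
```

On a single GPU without preemption, there is no equivalent of `pause.wait()` that stops only one context: halting the chip for the debuggee also halts the threads drawing the desktop.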

Can someone enlighten me on this issue and why it doesn’t occur on Windows?

Thanks,
Gabriel

Gabriel,

I provide additional details on the Nsight VSE 2.2 and 3.0 solution in this forum post:

https://devtalk.nvidia.com/default/topic/532961/how-does-single-gpu-debugging-happen-on-kepler-/#3749973

The Nsight VSE CUDA Debugger and cuda-gdb are separate debuggers. Each solution has different pros and cons. The algorithm used in Nsight VSE is portable to other operating systems.

Hi Greg,

thanks, that sums it up quite well. Couldn’t serialization of non-deterministic code end up giving the expected results during debugging (if it is always serialized in the same manner), yet fail in real-world usage because of a bug? Maybe this is one of the reasons why you changed the algorithm in version 3.0.

Concerning the 3.0 frame replay, what other cons does it have (other than speed)?

I guess that having a dedicated card for X11 and a dedicated GPU for CUDA gives a more “real” experience at the expense of having a more complicated configuration and having multiple cards.

Greg,

quick question: if I have an Optimus-capable card (K1000m) on Linux, is it possible to debug the GPU with Bumblebee installed? I assume it shouldn’t be possible, since Bumblebee creates a virtual screen for it.

Thanks,
Gabriel

Yes, it is. I answered you in the other posts.

I replied, thanks. Here is the link in case someone needs it:

https://devtalk.nvidia.com/default/topic/535773/cuda-setup-and-installation/lenovo-thinkpad-w530-running-linux-use-integrted-intel-for-x11-and-discrete-nvidia-for-cuda-dev/?offset=10#3765086

Couldn’t serialization of non-deterministic code end up giving the expected results during debugging (if it is always serialized in the same manner), yet fail in real-world usage because of a bug?

Serialization may hide issues with race conditions between kernels or memory copies issued in two different streams. This is not typically a use case you would debug with a source-level debugger. Almost all debuggers have some impact on code execution, which can either hide or expose race conditions. Using debugger run control is a common method to test for race conditions.
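The masking effect can be illustrated with a toy sketch: two "streams" race to write the same buffer, so the real outcome depends on scheduling, while a serialized run always finishes in the same fixed order and never exhibits the bug. All names here are made up for illustration; this is Python threads standing in for CUDA streams, not actual debugger behavior.

```python
import threading

buf = {}  # shared output buffer both "streams" write into

def stream(tag, keys):
    for k in keys:
        buf[k] = tag  # last writer wins -> result is order-dependent

def run_concurrent():
    """Normal execution: both streams run at once, so the final
    contents depend on thread scheduling (the race)."""
    buf.clear()
    ts = [threading.Thread(target=stream, args=(tag, range(1000)))
          for tag in ("A", "B")]
    for t in ts: t.start()
    for t in ts: t.join()
    return set(buf.values())

def run_serialized():
    """Debugger-style execution: streams run one after another in a
    fixed order, so the outcome is always the same."""
    buf.clear()
    stream("A", range(1000))
    stream("B", range(1000))
    return set(buf.values())

assert run_concurrent() <= {"A", "B"}   # outcome varies run to run
assert run_serialized() == {"B"}        # always deterministic
```

Because the serialized run is deterministic, a test suite run entirely under the debugger could pass every time while the same code misbehaves in production.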

Concerning the 3.0 frame replay, what other cons does it have (other than speed)?

The API interception that enables frame replay introduces some run-time and memory-usage overhead. It does have the additional advantage that you can continually re-inspect different issues in the frame in a reproducible manner and even save off the replay for later investigation. Replays for D3D are saved as source code that you can manipulate. Replays for OpenGL are saved in a binary format and can be inspected but not modified.
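The record-and-replay idea behind frame replay can be sketched in a few lines: wrap the API, log every call as it passes through (this logging is the interception overhead mentioned above), and re-issue the log later against a fresh target. This is a generic sketch of the technique, with invented names; it says nothing about how Nsight actually implements it.

```python
class Recorder:
    """Intercepts calls to a wrapped API object and logs them."""
    def __init__(self, api):
        self._api = api
        self.log = []                    # recorded (name, args) pairs

    def __getattr__(self, name):
        fn = getattr(self._api, name)
        def wrapper(*args):
            self.log.append((name, args))  # interception cost: time + memory
            return fn(*args)
        return wrapper

    def replay(self, api):
        """Re-issue the captured calls against another API object,
        reproducing the recorded 'frame' exactly."""
        for name, args in self.log:
            getattr(api, name)(*args)

class Canvas:
    """Toy stand-in for a graphics API like D3D or OpenGL."""
    def __init__(self):
        self.ops = []
    def draw(self, shape):
        self.ops.append(shape)

live = Recorder(Canvas())
live.draw("triangle")
live.draw("quad")

fresh = Canvas()
live.replay(fresh)                       # reproducible re-inspection
assert fresh.ops == ["triangle", "quad"]
```

Saving `log` to disk would correspond to saving off the replay for later investigation; a text serialization would be editable (like the D3D source-code replays), a binary one inspectable but not modifiable (like the OpenGL ones).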