Remote debugging with Nsight Eclipse Plugin surprisingly slow: Am I missing something?

Hey all,

I am using Eclipse with the Nsight Eclipse Plugin to debug CUDA applications on a remote system. I just came back to this setup after a longer break and I must say I am a bit surprised how rough the experience is… I just want to check if I am missing something here or if someone has some smart options to make remote CUDA development a bit more bearable.

Everything is just a bit slow

When I start a local NVIDIA CUDA GDB Debugger configuration, it takes at most 3 s to reach the first breakpoint (before the first device kernel call). When I start it remotely (NVIDIA CUDA GDB Debugger (Remote)), it can take almost 1 min to get to the same breakpoint. And that does not include building the project:

  1. It takes ~10 s to launch the configuration (as shown in the Progress tab). This includes copying the 4 MB executable of my project to the remote (which takes only ~2 s using scp from the command line).
  2. For debugging, library code is downloaded from the remote to the host. This takes the longest (easily 40 s). Usually, one can set sysroot in cuda-gdb to point to local library files and avoid the download, but I haven’t been able to set that in Eclipse (I opened another post for that: Remote debugging: `CUDA GDB init file` is ignored).
  3. Stepping through device code is quite sluggish too. In a local debug configuration, the delay after a stepping instruction is hardly noticeable. In the remote debugging configuration, every stepping instruction inside device code takes 3-5 s.
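
For reference, this is roughly what I mean by the sysroot approach from item 2 when using command-line cuda-gdb. It is only a sketch: the sysroot path, the port, the command file name, and `./my_app` are placeholders I made up for illustration, and the script only writes the command file rather than starting the debugger.

```shell
#!/bin/sh
# Sketch only: all values below are hypothetical placeholders.
LOCAL_SYSROOT="$HOME/remote-sysroot"   # local copy of the remote's library directories
GDBSERVER_PORT=2345                    # locally forwarded cuda-gdbserver port
CMD_FILE="./cuda-gdb-remote.cmds"      # command file passed to cuda-gdb with -x

# Write the command file. The sysroot is set before connecting so that
# shared libraries are looked up locally instead of being fetched
# from the target.
write_cmd_file() {
    printf 'set sysroot %s\n' "$LOCAL_SYSROOT" > "$CMD_FILE"
    printf 'target remote localhost:%s\n' "$GDBSERVER_PORT" >> "$CMD_FILE"
}

write_cmd_file
echo "Run: cuda-gdb -x $CMD_FILE ./my_app"
```

The local sysroot directory has to mirror the remote's library layout for this to work, which is an extra step not shown here.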

My setup

Both the local and the remote system run Linux (Arch/Ubuntu host and Ubuntu remote). The remote compute node is on a cluster that uses Univa Grid Engine for resource distribution. That means I ssh onto the head node, use qlogin to request a login session on a compute node with GPU access, start an ssh server on the compute node (using dropbear; direct ssh is disabled, of course), and then set up an ssh tunnel to forward two ports from my local machine via the head node to the compute node (one port for the remote connection in Eclipse and one for the cuda-gdbserver).
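
To make the tunnel part concrete, it looks roughly like the sketch below. The host names and port numbers are placeholders, not my actual ones, and the script only prints the ssh command instead of running it.

```shell
#!/bin/sh
# Sketch only: host names and ports are hypothetical placeholders.
HEAD_NODE="user@headnode.example.org"  # cluster head node reachable from my machine
COMPUTE_NODE="compute-node-01"         # node handed out by qlogin
ECLIPSE_PORT=2222                      # local port for the Eclipse remote connection
DROPBEAR_PORT=2222                     # port dropbear listens on at the compute node
GDB_PORT=2345                          # port used by cuda-gdbserver (same on both ends)

# Build one ssh command that forwards both local ports through the
# head node to the compute node. -N means "no remote command, tunnel only".
tunnel_cmd() {
    printf 'ssh -N -L %s:%s:%s -L %s:%s:%s %s\n' \
        "$ECLIPSE_PORT" "$COMPUTE_NODE" "$DROPBEAR_PORT" \
        "$GDB_PORT" "$COMPUTE_NODE" "$GDB_PORT" \
        "$HEAD_NODE"
}

tunnel_cmd
```

Eclipse then connects to localhost on the first port, and the debugger connects to localhost on the second.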

I’m currently using the latest Eclipse (version 2020-12 (4.18.0)), and both the local and the remote system have CUDA toolkit 11.2.0 installed.

How can I make this smoother?

Does anybody have ideas on how to speed things up? I am just surprised that the only supported development environment for GPUs on Linux remotes is not so smooth. Did I just not set things up correctly? Or is it that Linux isn’t supported well enough (maybe Nsight in Visual Studio offers a better experience)? Or is it just that Eclipse is a bit of a pain? I tried CLion, which recently added some support for CUDA, but unfortunately one can’t currently step through device code, and remote makefile projects are not supported either. Other than that, it felt much snappier (but that could also just be the design).

What are other people using?

  • I find myself more and more often ditching the Nsight debugger and just using command-line cuda-gdb or simply printf statements… But that gets annoying once the problems get more complicated.
  • Are there some Eclipse settings to reduce the latency of remote communication? I came across this post, which suggests disabling a monitor setting for Java debugging. It doesn’t apply to CUDA debugging, but maybe there are similar options? Eclipse Java remote debugger extremely slow over VPN - Stack Overflow
  • I played with some of the CUDA Debugger options (like disabling / enabling memcheck), but didn’t find anything that improved things significantly. But maybe someone else has?

Any help, suggestions, or just some feedback on other people’s workflows would be much appreciated.