I am using Eclipse with the Nsight Eclipse Plugin to debug CUDA applications on a remote system. I just came back to this setup after a longer break and I must say I am a bit surprised how rough the experience is… I just want to check if I am missing something here or if someone has some smart options to make remote CUDA development a bit more bearable.
Everything is just a bit slow
When I start a local NVIDIA CUDA GDB Debugger configuration, it takes at most 3s to reach the first breakpoint (before the first device kernel call). When I start it remotely (NVIDIA CUDA GDB Debugger (Remote)), it can take almost 1min to get to the same breakpoint. And that does not include building the project:
- It takes ~10s to launch the configuration (as shown in the Progress tab). This includes copying the 4MB executable of my project to the remote (which is quick when done with `scp` from the command line).
- For debugging, library code is downloaded from the remote to the host. This takes the longest (easily 40s). Usually, one can configure `cuda-gdb` to point to local library files and avoid the download, but I haven’t been able to set that in Eclipse (I opened another post for that: Remote debugging: `CUDA GDB init file` is ignored).
- Stepping through device code is quite sluggish too. On a local debug configuration, the delay after a stepping instruction is hardly noticeable. In the remote debugging configuration, every stepping instruction inside device code takes noticeably longer.
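For reference, pointing the debugger at local library files outside of Eclipse might look roughly like the sketch below. It assumes that `cuda-gdb`, being based on GDB, accepts the standard GDB `-ex` option and the `set sysroot` and `target remote` commands; the executable name, the local sysroot directory, and the port are hypothetical placeholders, not values from my setup.

```python
import shlex


def build_debugger_command(executable, local_sysroot, target):
    """Assemble a cuda-gdb command line that resolves shared libraries
    from a local directory instead of downloading them from the remote.

    All arguments are illustrative placeholders.
    """
    return [
        "cuda-gdb",
        # Set the sysroot first so it is already in effect when the
        # debugger connects and starts loading shared libraries.
        "-ex", f"set sysroot {local_sysroot}",
        # Connect to the debug server through the forwarded local port.
        "-ex", f"target remote {target}",
        executable,
    ]


debugger_cmd = build_debugger_command(
    executable="./my_app",                       # hypothetical binary
    local_sysroot="/home/alice/remote-sysroot",  # hypothetical local copy of the remote libraries
    target="localhost:2345",                     # hypothetical forwarded port
)
print(shlex.join(debugger_cmd))
```

The printed command can be pasted into a terminal. The local sysroot directory has to mirror the remote library layout for the lookup to work, and the sysroot should be set before the connection is made.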
Both the local and the remote system run Linux (Arch/Ubuntu host and Ubuntu remote). The remote compute node is on a cluster that uses Univa Grid Engine for resource distribution. That means I `ssh` onto the head node, use `qlogin` to request a login session on a compute node with GPU access, then start an ssh server on the compute node (using `dropbear`, since direct ssh is disabled of course), and then set up an ssh tunnel to forward two ports from my local machine via the head node to the compute node (one of them for the remote connection in Eclipse).
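The tunnel step can be scripted. The following is a minimal sketch that only assembles the `ssh` command with one `-L` forward per port; the host names, user name, and port numbers are made-up placeholders, and the purpose assigned to the second port is an assumption.

```python
import shlex


def build_tunnel_command(head_node, compute_node, forwards, user=None):
    """Build an ssh command that forwards local ports to a compute node
    through the head node.

    forwards is a list of (local_port, remote_port) pairs. The compute
    node name is resolved on the head node side, so it only has to be
    reachable from there.
    """
    cmd = ["ssh", "-N"]  # -N: forward ports only, do not run a remote command
    for local_port, remote_port in forwards:
        cmd += ["-L", f"{local_port}:{compute_node}:{remote_port}"]
    cmd.append(f"{user}@{head_node}" if user else head_node)
    return cmd


tunnel_cmd = build_tunnel_command(
    head_node="headnode.example.org",  # hypothetical head node
    compute_node="gpu-node-01",        # hypothetical node assigned by qlogin
    forwards=[
        (2222, 2222),  # ssh server started on the compute node (assumed port)
        (2345, 2345),  # second forwarded port (assumed port and purpose)
    ],
    user="alice",                      # hypothetical user name
)
print(shlex.join(tunnel_cmd))
```

Passing the resulting list to `subprocess.run` would open the tunnel. Building the command separately makes it easy to inspect first and to regenerate when `qlogin` assigns a different compute node.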
I’m currently using the latest Eclipse (version 2020-12 (4.18.0)), and both the local and the remote system have the CUDA toolkit installed.
How can I make this smoother?
Does anybody have ideas on how to speed things up? I am just surprised that the only supported development environment for GPUs on Linux remotes is not so smooth. Did I just not set things up correctly? Or is it that Linux isn’t supported well enough (maybe Nsight in Visual Studio offers a better experience)? Or is it just that Eclipse is a bit of a pain? I tried CLion, which recently added some support for CUDA, but unfortunately one can’t step through device code currently and remote makefile projects are not supported either. Other than that, it made a much snappier impression (but that could also just be the design).
What are other people using?
- I find myself more and more ditching the Nsight debugger and just using the command line and `printf` statements… But that gets annoying once the problems get more complicated.
- Are there some Eclipse settings to reduce the latency of remote communication? I came across this post suggesting to disable some monitor setting for Java debugging: Eclipse Java remote debugger extremely slow over VPN - Stack Overflow. It doesn’t apply to CUDA debugging, but maybe there are similar options?
- I played with some of the CUDA Debugger options (like disabling / enabling memcheck), but didn’t find anything that improved things significantly. But maybe someone else has?
Any help, suggestions, or just some feedback on other people’s workflows would be much appreciated.