Program only works when started with cuda-gdb


I have a strange problem with my application. I moved my program from my local Linux environment to the Tegra X1 embedded platform.
Now it behaves in a way I do not understand, and I hope someone can help me with it:

My program compiles fine natively on the X1. However, if I start it normally, it seems like my kernel is not being executed at all.
The strange thing is that if I debug it using cuda-gdb, it runs as expected and outputs the correct results. I just launch ‘cuda-gdb Application’ from the console and then start it with ‘run’.
In contrast, if I use the normal gdb, i.e. ‘gdb Application’, it behaves as if I had started it directly: the kernel does not seem to be executed at all.

This is my configuration:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          7.5 / 7.0
  CUDA Capability Major/Minor version number:    5.3

Could this be a problem with my CUDA installation?
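
One quick sanity check on the installation question is to compare the two versions in the deviceQuery output: the driver version should be at least as new as the runtime version. The snippet below is only a sketch of that comparison; the helper names `parse_versions` and `driver_supports_runtime` are made up for illustration and are not part of any CUDA tool. A passing check does not rule out other installation problems.

```python
import re


def parse_versions(line):
    """Extract ((driver_major, driver_minor), (runtime_major, runtime_minor))
    from a deviceQuery 'Driver Version / Runtime Version' line."""
    match = re.search(r"(\d+)\.(\d+)\s*/\s*(\d+)\.(\d+)", line)
    if match is None:
        raise ValueError("no 'driver / runtime' version pair found")
    d_major, d_minor, r_major, r_minor = map(int, match.groups())
    return (d_major, d_minor), (r_major, r_minor)


def driver_supports_runtime(line):
    """Return True if the driver version is at least the runtime version."""
    driver, runtime = parse_versions(line)
    return driver >= runtime


# Line copied from the deviceQuery output above.
line = "  CUDA Driver Version / Runtime Version          7.5 / 7.0"
print(driver_supports_runtime(line))  # True
```

For the output posted here (driver 7.5, runtime 7.0) the check passes, so this particular mismatch does not look like the cause.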

Is everything being run from a direct login to the GUI on the JTX1? It gets more complicated if you log in remotely via ssh.

No, I was logging in remotely via ssh.

Something subtle about video is that graphical applications always have a display environment associated with the program. When you log in remotely without anything that names a display, that environment breaks. When you log in remotely and a display environment is set, it likely points at the host you started ssh from, not at the Jetson. As it turns out, this can affect anything that uses the GPU, because the system may not be smart enough to realize the GPU is not being used for video. If you have a logged-in GUI session on the Jetson, it is quite possible that you could set the DISPLAY environment variable to that display and it would start working again.
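
To see which of these cases applies in a given ssh session, you can inspect the environment before launching the program. The following is a rough sketch, not a definitive diagnostic: `describe_display` is a hypothetical helper, and it assumes the usual `host:display.screen` form of DISPLAY and that ssh sets SSH_CONNECTION or SSH_CLIENT in the session.

```python
import os


def describe_display(env=None):
    """Classify where DISPLAY points: 'unset', 'local', 'ssh-forwarded' or 'remote'."""
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "")
    if not display:
        return "unset"  # no display environment at all

    # Assumption: ssh exports at least one of these variables in the session.
    over_ssh = "SSH_CONNECTION" in env or "SSH_CLIENT" in env

    # DISPLAY has the form "host:display.screen"; an empty host means this machine.
    host = display.rsplit(":", 1)[0] if ":" in display else ""
    if host in ("", "unix"):
        return "local"
    if host in ("localhost", "127.0.0.1"):
        # e.g. "localhost:10.0" is the typical value created by X11 forwarding
        return "ssh-forwarded" if over_ssh else "local"
    return "remote"  # some other machine's display


if __name__ == "__main__":
    print("DISPLAY =", os.environ.get("DISPLAY"), "->", describe_display())
```

If this reports "unset" or "ssh-forwarded" while a GUI session is logged in on the Jetson itself, pointing DISPLAY at that session is the thing to try. The value is often `:0`, but that is an assumption; check what the logged-in session actually uses.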

If your desktop PC happens to have everything needed for the GPU work, including a nice NVIDIA graphics card, you might see amazing performance! The catch is that this performance would not be coming from the Jetson. If your desktop machine has conflicting versions, e.g., the required CUDA version differs between desktop and Jetson, then you would get some surprising version mismatch errors.