I have an application that does image processing for computer vision with CUDA; specifically, it is to be used for drone navigation in a known environment. Several things are bugging me, the first being the most important:
When compiled with -G (or --device-debug), the application produces the expected results, although it runs slowly. Conversely, with device debugging disabled the program runs in real time, but the result is slightly off and becomes useless over time.
I am compiling with -arch compute_50 -code sm_50; both the latest (compute_62/sm_62) and the earliest (compute_20/sm_20) supported alternatives give an "Invalid Texture" error. I am using nvcc from the CUDA 8.0 SDK.
Furthermore, when compiled with -Xptxas -g, the code seems to run slowly and also produces the wrong result, which implies that this option differs from -G, contrary to what the docs describe: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options
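For clarity, the three builds being compared look roughly like the sketch below. The flags are the ones mentioned above; app.cu and the output names are placeholders rather than the real file names, and any other flags the real build uses are left out.

```sh
# 1) Device debug build: correct results, but slow
nvcc -G -arch compute_50 -code sm_50 -o app_debug app.cu

# 2) Default (optimized) build: real-time, but results drift
nvcc -arch compute_50 -code sm_50 -o app_release app.cu

# 3) Debug flag passed only to ptxas: slow AND wrong results
nvcc -Xptxas -g -arch compute_50 -code sm_50 -o app_ptxas_g app.cu
```

The only difference between the three invocations is the debug-related flag, so any change in the output should come from how the device code is compiled, not from the host code or the input data.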
The program is deterministic, and the observations above are consistent across runs. Is there any aspect of -G's behavior that could explain this, that is, the program producing correct results with -G but not without it?
By the way, I have run cuda-memcheck on the program both with and without debugging enabled; no errors were detected in either case.