I installed CUDA Toolkit 4, the Dev Driver and the GPU toolkit, and created a new project in VS2010. I have two source files, GpuStoreProxy.cpp and CudaStore.cu. Build Customizations are set up, so everything compiles fine.
The problem is that I cannot debug into the .CU file. I am not referring to debugging a method decorated with __global__; even a regular host method in the .CU file is skipped by the debugger. I know the code in that method is executed because I can see the output from printf.
Here is the output from the “Command Line” under “CUDA C/C++”:
(Approximate command-line, please see the output window after a build for the full command-line)
Driver API (NVCC Compilation Type is .cubin, .gpu, or .ptx)
By default the code compiles without debug information and with optimizations enabled. The -G0 flag compiles for GPU debugging (which means you need either two GPUs, Windows 7 and Nsight, or Linux and cuda-gdb to debug). The flags you need are -g -O0: the first enables host debugging and the second disables optimizations, without which stepping jumps around erratically.
In Visual Studio, go to Project Properties -> Configuration Properties -> CUDA Runtime API -> General, enable “Generate Host Debug Information”, disable optimization, and change the runtime library to Multi-threaded Debug.
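For comparison, here is a rough sketch of what those settings correspond to on a plain nvcc command line. The source file name comes from the question; the output name is an assumption, and the exact command Visual Studio generates will contain more options than this.

```shell
# Sketch only: compile the .cu file with host debug info (-g) and
# host optimizations disabled (-O0). CudaStore.obj is an assumed output name.
nvcc -g -O0 -c CudaStore.cu -o CudaStore.obj
```

If the “Command Line” page of the CUDA properties shows neither -g nor an optimization setting equivalent to -O0 for the .cu file, the host code in that file was probably built without the information the debugger needs to step into it.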
I am certainly a newbie at this, as I still can’t get it to work.
Under CUDA Device I have: Generate GPU Debug Information = Yes (-G0), Code Generation = compute_10,sm_10;compute_20,sm_20. I tried changing Generate GPU Debug Information to -g -O0 or some combination of those parameters, but that didn’t work.
Under CUDA Host I have: Generate Host Debug Information = Yes (-D_NEXUS_DEBUG -g), Runtime Library = Multi-threaded Debug DLL (/MDd). I can’t switch it to Multi-Threaded Debug (/MTd) because this is a DLL, not an executable – the entry point is in a different assembly.
I should also mention that I am trying to call this from C#: I have a C# console EXE that references a Managed C++ DLL serving as a wrapper. The Managed C++ code resides in a .CPP file, which then calls into the .CU file. That is where the debugger skips, as I try to step from a line in the .CPP file to a line in the .CU file.
I tried running the CUDA SDK examples and I was able to debug them just fine. Does it have anything to do with the fact that the example’s entry point is in the .CU file, not .CPP file?