However, Nsight behaves strangely. From what I have observed, it only stops at the very first kernel. I think it might be due to my CMake settings. Does anyone know what the problem is?
I’m not sure how to run Nsight. I only picked up this thread because I’m the maintainer of the FindCUDA code in CMake. It might be useful to start a new thread that is more targeted at the GUI settings for what you want, so it doesn’t get lost in this thread. There might also be additional information in the Nsight documentation that could help you get breakpoints just where you want them.
Ha! So you are a CMake guy! Fantastic. I haven’t found any good articles about setting up CMake with CUDA. I attached my CMakeLists.txt here. Would you mind checking whether it is correct? To be honest, I am not an expert with CMake either. I noticed many people talking about the FindCUDA.cmake at MyGForge > Projects > FindCUDA > SVN > Browse repository
However, I don’t quite understand, since FIND_PACKAGE(CUDA) seems to work perfectly well for me. Why bother using that FindCUDA.cmake? Is there any difference?
Earlier, CMake did not support CUDA natively, so Abe Stephens wrote a FindCUDA module. That's what many people were referring to.
Later, it was integrated into CMake (or somebody else wrote it afresh for CMake).
So FIND_PACKAGE(CUDA) now works with CMake out of the box.
Sarnath is correct. CMake didn’t originally support CUDA, so Abe Stephens wrote support based on a SWIG module we wrote for the Manta interactive ray tracer. Since coming to NVIDIA I picked up where Abe left off, added a bunch more features, and had it officially integrated into the CMake distribution, where I maintain it. The one hosted at SCI is generally up to date and contains the primary test bed for my development.
As for your CMakeLists.txt file, if you use cuda_add_executable, you shouldn’t need to add include or library paths; cuda_add_executable adds these automatically. Most of the documentation lives in the FindCUDA.cmake script found in the distribution and can also be displayed with cmake --help-full.
# the cuda flags are lists and can be appended
list(APPEND CUDA_NVCC_FLAGS -arch=sm_21 -ftz=true -prec-div=false -prec-sqrt=false)
# the cuda flags also support configuration specific flags such as this debug flag
list(APPEND CUDA_NVCC_FLAGS_DEBUG -G0)
CUDA_ADD_EXECUTABLE(CUDA_VideoProc xxx.cpp)
TARGET_LINK_LIBRARIES(CUDA_VideoProc
  debug cutil32D optimized cutil32
  debug shrUtils32D optimized shrUtils32
  glew32
  debug rendercheckgl32D optimized rendercheckgl32
  vfw32.lib
)
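For anyone landing here later, the fragment above can be fleshed out into a complete file along these lines. This is only a minimal sketch, assuming the FindCUDA module bundled with CMake; the source file names main.cpp and kernels.cu are hypothetical placeholders, and the -arch value is simply the one used in this thread, so adjust it for your GPU.

```cmake
cmake_minimum_required(VERSION 2.8)
project(CUDA_VideoProc)

# Locate the CUDA toolkit with the FindCUDA module shipped with CMake.
find_package(CUDA REQUIRED)

# Print what was found, which helps when checking the configuration.
message(STATUS "CUDA version: ${CUDA_VERSION}")
message(STATUS "CUDA toolkit root: ${CUDA_TOOLKIT_ROOT_DIR}")

# The nvcc flags are CMake lists, so append instead of overwriting.
# sm_21 is the architecture used in this thread; change it to match your GPU.
list(APPEND CUDA_NVCC_FLAGS -arch=sm_21)

# No include_directories() or link_directories() call is made for the CUDA
# toolkit itself: cuda_add_executable adds the toolkit include directory and
# links the CUDA runtime library.
# main.cpp and kernels.cu are hypothetical file names.
cuda_add_executable(CUDA_VideoProc main.cpp kernels.cu)
```

Libraries that are not part of the toolkit itself, such as the SDK's cutil or glew, still need their own target_link_libraries entries (and search paths, if they are not in a default location), as in the snippet above.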