Debugging with Nsight

Hardware: Quadro 600 and Quadro FX 570
Software: 64-bit Windows 7 with VS 2008

I configured my project with CMake and set the nvcc flags as follows:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch=sm_21)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -ftz=true)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -prec-div=false)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -prec-sqrt=false)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -G0)

However, Nsight behaves strangely. From what I can see, it only stops in the very first kernel. I think this might be due to my CMake settings. Does anyone know what the problem is?

Actually, it only stops at the breakpoint in thread(0,0,0).

What are the recommended flags for compiling CUDA code for debugging in Nsight? I’m not very familiar with Nsight.

I asked around here at NVIDIA, and -G0 should be the correct flag for nvcc (-g for ptxas if converting ptx->cubin directly).

In order to hit the break point for all threads you need to do the following:

Thanks, it does improve the debugging~!!!~.

However, it still does not hit every breakpoint.

Here are my CUDA Debugger settings. Is there anything else that might help?

Data Stack minimum size 0

Data stack size multiplier 1

Enable backtrace: True

Enable kernel launch debugging: True

Override local debugging checks: False

Stepper freezes non-focused warps: True

Stop on launch Failure: True

Unconditional Breakpoints follow focus: False (Default is True, I changed it)

Thanks

I’m not sure how to run Nsight. I only picked up this thread, because I’m the maintainer of the FindCUDA code in CMake. It might be useful to start a new thread that is more targeted at the GUI settings for what you want, so it doesn’t get lost in this thread. There might also be additional information in the Nsight documentation that could help you with getting break points just where you want them.

Ha! So you are a CMake guy~!! Fantastic. I haven’t found any good articles about CMake settings for CUDA. I attached my CMakeLists.txt here. Would you mind checking whether it is correct? To be honest, I am not an expert with CMake either. I noticed many people talking about the FindCUDA.cmake in MyGForge > Projects > FindCUDA > SVN > Browse repository

However, I don’t quite understand, since FIND_PACKAGE(CUDA) seems to work perfectly well for me. Why bother using that FindCUDA.cmake? Is there any difference?

PROJECT(CUDA_VideoProc)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
FIND_PACKAGE(CUDA)

LINK_DIRECTORIES(${CUDA_SDK_ROOT_DIR}/common/lib/Win32)
LINK_DIRECTORIES(${CUDA_SDK_ROOT_DIR}/common/lib)
LINK_DIRECTORIES(${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
LINK_DIRECTORIES(${CUDA_SDK_ROOT_DIR}/../shared/lib/Win32)

INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CUDA_SDK_ROOT_DIR}/common/inc)
INCLUDE_DIRECTORIES(${CUDA_TOOLKIT_ROOT_DIR}/include)
INCLUDE_DIRECTORIES(${CUDA_SDK_ROOT_DIR}/../shared/inc)

if(MSVC)
    # We statically link to reduce dependencies
    foreach(flag_var CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
        if(${flag_var} MATCHES "/MD")
            string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
        endif(${flag_var} MATCHES "/MD")
        if(${flag_var} MATCHES "/MDd")
            string(REGEX REPLACE "/MDd" "/MTd" ${flag_var} "${${flag_var}}")
        endif(${flag_var} MATCHES "/MDd")
    endforeach(flag_var)
endif(MSVC)

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch=sm_21)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -ftz=true)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -prec-div=false)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -prec-sqrt=false)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -G0)

CUDA_ADD_EXECUTABLE(CUDA_VideoProc xxx.cpp)
TARGET_LINK_LIBRARIES(CUDA_VideoProc cudart cutil32D shrUtils32D glew32 rendercheckgl32D vfw32.lib)

Earlier, CMake did not support CUDA natively, so Abe Stephens wrote a FindCUDA module. That’s what many people were referring to.
Later, it was integrated into CMake (or somebody else wrote it afresh for CMake).
So FIND_PACKAGE(CUDA) now works with CMake out of the box.

Sarnath is correct. CMake didn’t originally support CUDA, then Abe Stephens wrote support based on a swig module we wrote for the Manta interactive ray tracer. Since coming to NVIDIA I picked up where Abe left off, added a bunch more features and had it officially integrated into the CMake distribution where I maintain it. The one hosted at SCI is generally up to date and contains the primary test bed for my development.

As for your CMakeLists.txt file, if you use cuda_add_executable, you shouldn’t need to add include or library paths; cuda_add_executable adds these automatically. Most of the documentation lives in the FindCUDA.cmake script found in the distribution and can also be displayed with cmake --help-full.

# the cuda flags are lists and can be appended
list(APPEND CUDA_NVCC_FLAGS -arch=sm_21 -ftz=true -prec-div=false -prec-sqrt=false)

# the cuda flags also support configuration specific flags such as this debug flag
list(APPEND CUDA_NVCC_FLAGS_DEBUG -G0)

CUDA_ADD_EXECUTABLE(CUDA_VideoProc xxx.cpp)

TARGET_LINK_LIBRARIES(CUDA_VideoProc
  debug cutil32D    optimized cutil32
  debug shrUtils32D optimized shrUtils32
  glew32
  debug rendercheckgl32D optimized rendercheckgl32
  vfw32.lib
  )
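
Putting the suggestions above together, a complete CMakeLists.txt might look roughly like the sketch below. It is only a sketch, not a tested build file. One assumption is labeled in the comments: the SDK directories are kept, on the guess that cutil, shrUtils and rendercheckgl come from the SDK rather than the toolkit, so cuda_add_executable may not cover them. If your build works without those two lines, drop them.

```cmake
cmake_minimum_required(VERSION 2.8)
project(CUDA_VideoProc)

# FindCUDA is part of the CMake distribution, so find_package is enough.
find_package(CUDA REQUIRED)

# Assumption: the helper libraries (cutil, shrUtils, rendercheckgl) live in
# the SDK, not the toolkit, so their directories are still listed here.
include_directories(
  ${CUDA_SDK_ROOT_DIR}/common/inc
  ${CUDA_SDK_ROOT_DIR}/../shared/inc
  )
link_directories(
  ${CUDA_SDK_ROOT_DIR}/common/lib/Win32
  ${CUDA_SDK_ROOT_DIR}/common/lib
  ${CUDA_SDK_ROOT_DIR}/../shared/lib/Win32
  )

# Flags applied to every build configuration.
list(APPEND CUDA_NVCC_FLAGS -arch=sm_21 -ftz=true -prec-div=false -prec-sqrt=false)

# Device debug information only for the Debug configuration.
list(APPEND CUDA_NVCC_FLAGS_DEBUG -G0)

# Toolkit include paths and the CUDA runtime are handled by this macro.
cuda_add_executable(CUDA_VideoProc xxx.cpp)

target_link_libraries(CUDA_VideoProc
  debug cutil32D    optimized cutil32
  debug shrUtils32D optimized shrUtils32
  glew32
  debug rendercheckgl32D optimized rendercheckgl32
  vfw32.lib
  )
```

Keeping -G0 in CUDA_NVCC_FLAGS_DEBUG means only the Debug configuration in Visual Studio carries device debug information, so the Release build is not slowed down by it.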

Thank you so much~~!!

Thank you so much for your effort in bringing CUDA support to CMake~!!! I can’t imagine working without it.