Debugging CUDA More Efficiently with NVIDIA Compute Sanitizer

Originally published at: https://developer.nvidia.com/blog/debugging-cuda-more-efficiently-with-nvidia-compute-sanitizer/

Debugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can introduce new dimensions to the already complex debugging process. There are various tools and techniques available to developers to help make debugging simpler and more efficient. This post looks at one such…

Great news! I look forward to using this. Does this also now work with address sanitizer and thread sanitizer?

No. The compute-sanitizer tools use binary patching at runtime, so they work independently of compiler-assisted tools such as ASan or TSan.

This looks very useful! Do you know if these tools will detect problems within user CUDA code that is compiled into larger OptiX kernels? Thanks.

Yes, the compute-sanitizer tools support OptiX applications since CUDA 11.6.

See the Compute Sanitizer User Manual in the Compute Sanitizer documentation for more information.

@achartiernv I do not see cudaError in the output when I run the memory leak example. Am I missing anything? I am using CUDA 12.3 on Windows 11. Here is my output:

$ compute-sanitizer --tool memcheck --leak-check=full .\build\bin\Debug\memory_check.exe
========= COMPUTE-SANITIZER
Before: Array 0, 1 .. N-1: 1.000000 1.000000 1.000000
After : Array 0, 1 .. N-1: 3.000000 3.000000 3.000000
========= Leaked 4,092 bytes at 0x900000000
=========     Saved host backtrace up to driver entry point at allocation time
=========     Host Frame:cuMemHostGetFlags [0x7ffce725dbe4]
=========                in C:\WINDOWS\system32\DriverStore\FileRepository\nvmiui.inf_amd64_f6620f3a4d623ccc\nvcuda64.dll
=========     Host Frame:cudart::driverHelper::mallocManagedPtr [0x73228]
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:cudart::cudaApiMallocManaged [0x4154f]
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:cudaMallocManaged [0x1fe05]
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:main in E:\code-repos\cuda-examples\src\sanitizer\memory_check.cu:15 [0x8059]
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:invoke_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79 [0x7e189]
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:__scrt_common_main_seh in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 [0x7e06e]    
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:__scrt_common_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331 [0x7df2e]        
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:mainCRTStartup in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17 [0x7e21e]
=========                in E:\code-repos\cuda-examples\build\bin\Debug\memory_check.exe
=========     Host Frame:BaseThreadInitThunk [0x7ffdb45f257d]
=========                in C:\WINDOWS\System32\KERNEL32.DLL
=========     Host Frame:RtlUserThreadStart [0x7ffdb536aa58]
=========                in C:\WINDOWS\SYSTEM32\ntdll.dll
=========
========= LEAK SUMMARY: 4092 bytes leaked in 1 allocations
========= ERROR SUMMARY: 1 error

cudaError is the return type of the cudaMallocManaged function. The example output in the post was captured on Linux. In that output, the cudaMallocManaged frame contains full type information, including the return type, the template parameter types, and the function parameter types.
However, depending on the platform, compiler version, flags, and tool version, the availability of this debug information and the tool's ability to display it may vary. On Windows, the tool does not display full type information for that frame; this is not an anomaly on your side.
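
For readers following along, here is a minimal sketch of a program that would produce a leak report of this shape. It is not the post's exact source: the kernel name scaleArray is hypothetical, and N = 1023 is an assumption inferred from the 4,092 leaked bytes in the report above (1023 floats of 4 bytes each). It also shows where the cudaError return value of cudaMallocManaged appears in code, as opposed to in the backtrace.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: 1023 floats * 4 bytes = 4,092 bytes, matching the leak report.
constexpr int N = 1023;

// Hypothetical kernel: multiplies every element of the array by a constant.
__global__ void scaleArray(float* array, float scale)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        array[idx] *= scale;
    }
}

int main()
{
    float* array = nullptr;

    // cudaMallocManaged returns a cudaError_t (the "cudaError" that the
    // Linux backtrace prints as part of the frame's full signature).
    cudaError_t err = cudaMallocManaged(&array, N * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < N; ++i) {
        array[i] = 1.0f;
    }
    printf("Before: Array 0, 1 .. N-1: %f %f %f\n", array[0], array[1], array[N - 1]);

    scaleArray<<<(N + 255) / 256, 256>>>(array, 3.0f);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("After : Array 0, 1 .. N-1: %f %f %f\n", array[0], array[1], array[N - 1]);

    // Intentionally missing: cudaFree(array);
    // This omission is what the memcheck leak checker reports.

    // Destroy the CUDA context at exit so the leak checker can account
    // for allocations that were never freed.
    cudaDeviceReset();
    return 0;
}
```

Running it under compute-sanitizer --tool memcheck --leak-check=full, as in the command above, should report the allocation made in main as leaked. How much type information each host frame shows still depends on the platform and build, as described above.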

Got it. Thanks for your explanation!