cuda-memcheck in CUDA 10.0 gets stuck with CPU-only applications

carlosgalvezp · November 1, 2018, 2:48pm

Hi,

I have a CI job that runs cuda-memcheck on all the unit tests in the repo. Some contain CPU code only and some contain both CPU+GPU (CUDA) code. The reason for testing all files is to simplify the testing environment.

This was working perfectly fine on CUDA 9.0, Ubuntu 18.04. However, after upgrading to CUDA 10.0, cuda-memcheck randomly gets stuck while running the application. Sometimes it hangs on the first attempt; sometimes it takes 2-3 attempts before it gets stuck. I then need to stop the application with Ctrl-C. I’m running on driver 410.72.

This can be reproduced with the following code:

#include <cuda_runtime.h>
#include <cassert>

int main()
{
    void* ptr;
    //cudaError_t status = cudaMalloc(&ptr, 1000U);
    //assert(status == cudaSuccess);
    return 0;
}

This code is compiled as follows:

g++ -I /usr/local/cuda/include -L /usr/local/cuda/lib64/ test.cpp -lcudart

cuda-memcheck then gets stuck when invoked like this:

cuda-memcheck --leak-check full a.out

However, when the two cudaMalloc lines are uncommented, cuda-memcheck never hangs, as expected.
On the other hand, it always reports “0 errors”, which seems wrong, since I am allocating memory with cudaMalloc and the application exits without freeing that memory.

How can the problem be solved?

Thanks!

rbischof · November 1, 2018, 11:15pm

We could not reproduce the hang you describe with the CPU-only code.

Regarding the leak check, please see https://docs.nvidia.com/cuda/cuda-memcheck/index.html#leak-checking.

For an accurate leak checking summary to be generated, the application’s CUDA context must be destroyed at the end. This can be done explicitly by calling cuCtxDestroy() in applications using the CUDA driver API, or by calling cudaDeviceReset() in applications programmed against the CUDA runtime API.

Adding cudaDeviceReset() to your app will produce the expected leak report.

carlosgalvezp · November 2, 2018, 7:37am

Thanks!

Adding the cudaDeviceReset call fixed both issues, which is understandable, since the code now finally contains a call to the CUDA API.

I still believe that you should have a look at the hang; we could reproduce it on 3 different machines. The easiest way is to run it in an infinite loop; at some point it simply gets stuck. Sometimes it takes 1 run, sometimes 20-30 runs, but it always ends up getting stuck:

while true; do cuda-memcheck --leak-check full a.out ; done

The version of cuda-memcheck is:

CUDA-MEMCHECK version 10.0.130 ID:(46)

strace shows that the process is stuck here:

strace: Process 8004 attached
restart_syscall(<... resuming interrupted poll ...>

Please let me know if I can run other commands to investigate it further.

Perhaps this is not how cuda-memcheck is supposed to be used, but in that case it would be good to document it.

Follow-up question: will cudaDeviceReset affect other processes running CUDA? Say I have 2 unit tests running in parallel. Will calling that function on one unit test affect the second unit test?
