cuda-memcheck in CUDA 10.0 gets stuck with CPU-only applications

Hi,

I have a CI job that runs cuda-memcheck on all the unit tests in the repo. Some contain CPU code only and some contain both CPU+GPU (CUDA) code. The reason for testing all files is to simplify the testing environment.

This worked perfectly on CUDA 9.0 with Ubuntu 18.04. However, after upgrading to CUDA 10.0, cuda-memcheck randomly hangs while running the application. Sometimes it hangs on the first attempt; sometimes it takes 2-3 attempts before it gets stuck. I then have to stop the application with Ctrl-C. I’m running driver 410.72.

This can be reproduced with the following code:

#include <cuda_runtime.h>
#include <cassert>

int main()
{
    void* ptr;
    //cudaError_t status = cudaMalloc(&ptr, 1000U);
    //assert(status == cudaSuccess);
    return 0;
}

This code is compiled as follows:

g++ -I /usr/local/cuda/include -L /usr/local/cuda/lib64/ test.cpp -lcudart

cuda-memcheck then hangs when invoked like this:

cuda-memcheck --leak-check full a.out

However, when the cudaMalloc lines are uncommented, cuda-memcheck never hangs, as expected.
On the other hand, it always reports “0 errors”, which seems wrong since I am allocating memory with cudaMalloc and the application exits without freeing that memory.

How can the problem be solved?

Thanks!

We could not reproduce the hang you mention with the CPU-only code.

Regarding the leak check, please see https://docs.nvidia.com/cuda/cuda-memcheck/index.html#leak-checking.

For an accurate leak checking summary to be generated, the application’s CUDA context must be destroyed at the end. This can be done explicitly by calling cuCtxDestroy() in applications using the CUDA driver API, or by calling cudaDeviceReset() in applications programmed against the CUDA run time API.

Adding cudaDeviceReset() to your app will make cuda-memcheck report the expected leak.

Thanks!

Adding the cudaDeviceReset call fixed both issues, which is understandable since the code now finally contains a call to the CUDA API.

I still believe that you should have a look at the hang; we could reproduce it on 3 different machines. The easiest way is to run it in an infinite loop; at some point it simply gets stuck. Sometimes it takes 1 run, sometimes 20-30 runs, but it always ends up getting stuck:

while true; do cuda-memcheck --leak-check full a.out ; done

The version of cuda-memcheck is:

CUDA-MEMCHECK version 10.0.130 ID:(46)

strace shows that the process is stuck here:

strace: Process 8004 attached
restart_syscall(<... resuming interrupted poll ...>

Please let me know if I can run other commands to investigate it further.

Perhaps this is not how cuda-memcheck is supposed to be used, but in that case it would be good to document it.

Follow-up question: will cudaDeviceReset affect other processes running CUDA? Say I have 2 unit tests running in parallel. Will calling that function in one unit test affect the second unit test?