cudaMemPool and cuda-memcheck

Hi

I’m trying to use the CUDA memory pool API with cudaMallocAsync and cudaFreeAsync. I tried a very simple example to see how it works. It seems to work correctly, but I get a lot of errors when running it under cuda-memcheck. My code looks something like this:

    uint8_t* devPtr;
    cudaMallocAsync(&devPtr, 1, stream);
    cudaMemsetAsync(devPtr, 0, 1, stream);
    cudaMemcpyAsync(hostPtr, devPtr, 1, cudaMemcpyDefault, stream);
    cudaFreeAsync(devPtr, stream);

When I run this through cuda-memcheck I get errors like:

========= Host API memory access error at host access to 0x302000600 of size 1 bytes
=========     Invalid start allocation on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so (cuMemcpyAsync + 0x209) [0x225f09]

Is there a known incompatibility between cuda-memcheck and the cuda memory pool? Or am I missing something when using the async malloc/free?

Edit: I just tested the example in cuda-samples/Samples/streamOrderedAllocation/, and the exact same thing happens there. Lots of errors when running with cuda-memcheck.

Thanks
Yannick

Quoting from the cuda-memcheck docs:
“CUDA-MEMCHECK is deprecated and will be removed in a future release of the CUDA toolkit. Please use the compute-sanitizer as a drop-in replacement.”

When I run cuda-memcheck on code that uses CUDA memory pools, the following warning is shown:

========= CUDA-MEMCHECK
========= This tool is deprecated and will be removed in a future release of the CUDA toolkit
========= Please use the compute-sanitizer tool as a drop-in replacement
========= Internal Memcheck Warning: Detected use of unsupported CUDA memory pools. Please use the compute-sanitizer tool instead.
=========
========= ERROR SUMMARY: 0 errors

So the simple answer is: don’t use cuda-memcheck with memory pools. Use compute-sanitizer instead.

OK, I feel rather stupid now: cuda-memcheck even prints out that memory pools are unsupported. I just didn’t see that message because there were so many errors.

Running the same test with compute-sanitizer gives no errors, as expected. Thanks a lot!