Hi
I’m trying to use the CUDA memory pool with cudaMallocAsync and cudaFreeAsync. I tried a very simple example to see how it works. It appears to produce the correct result, but cuda-memcheck reports a lot of errors when I run it. My code looks something like this:
uint8_t* devPtr;
cudaMallocAsync(&devPtr, 1, stream);
cudaMemsetAsync(devPtr, 0, 1, stream);
cudaMemcpyAsync(hostPtr, devPtr, 1, cudaMemcpyDefault, stream);
cudaFreeAsync(devPtr, stream);
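For anyone who wants to reproduce this, a self-contained version of the snippet above might look like the sketch below. The roundTripOneByte helper, the CHECK macro, the explicit stream creation, and the stack variable used as hostPtr are assumptions added for illustration and are not part of the code above. It also assumes CUDA 11.2 or newer and a device that supports stream-ordered memory pools.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: report the failing call and return its error code.
// Note: on early return the stream is not destroyed; acceptable for a small repro.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return err_;                                              \
        }                                                             \
    } while (0)

// Hypothetical wrapper around the five calls from the post:
// allocate 1 byte from the pool, zero it, copy it back, free it.
cudaError_t roundTripOneByte(uint8_t* hostPtr) {
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    uint8_t* devPtr = nullptr;
    // The explicit cast matches the plain C signature (void** devPtr).
    CHECK(cudaMallocAsync(reinterpret_cast<void**>(&devPtr), 1, stream));
    CHECK(cudaMemsetAsync(devPtr, 0, 1, stream));
    CHECK(cudaMemcpyAsync(hostPtr, devPtr, 1, cudaMemcpyDefault, stream));
    CHECK(cudaFreeAsync(devPtr, stream));

    // All of the above is stream-ordered, so wait before reading *hostPtr.
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaStreamDestroy(stream));
    return cudaSuccess;
}

int main() {
    uint8_t value = 0xFF;  // sentinel, overwritten by the copy on success
    cudaError_t err = roundTripOneByte(&value);
    if (err != cudaSuccess) {
        return 1;
    }
    printf("value = %u\n", static_cast<unsigned>(value));
    return 0;
}
```

Building this with nvcc and running the binary under cuda-memcheck should exercise the same sequence of calls as the snippet above.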
When I run this through cuda-memcheck I get errors like:
========= Host API memory access error at host access to 0x302000600 of size 1 bytes
========= Invalid start allocation on access by cudaMemcopy source.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/lib/x86_64-linux-gnu/libcuda.so (cuMemcpyAsync + 0x209) [0x225f09]
Is there a known incompatibility between cuda-memcheck and the CUDA memory pool? Or am I missing something in how I use the async malloc/free?
Edit: I just tested the example in cuda-samples/Samples/streamOrderedAllocation/, and the exact same thing happens there: lots of errors when running with cuda-memcheck.
Thanks
Yannick