I’ve been seeing errors in my (quite complex) commercial CUDA app that pop up at random times — usually `invalid argument` from `cudaMemset`. I’ve been working on this bug for a few weeks now. I’ve gotten to the point where I’m using `cudaPointerGetAttributes` to test all my GPU buffers after every CUDA call (as well as synchronizing before and after each), and I’ve narrowed it down to `cufftDestroy()`. Before that call, all buffers are valid; after it, `cudaPointerGetAttributes` still returns success for every one of my buffers, but reports a `devicePointer` address of 0! From that point on, any `cudaMemset` on those buffers fails.
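For reference, the per-buffer check I’m running after each call looks roughly like this (heavily simplified — `checkDevPtr` and the printing are my own scaffolding, not code from the app):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simplified version of the validity check run on every GPU buffer
// after each CUDA call. Returns true if the pointer still looks like
// a live device allocation.
static bool checkDevPtr(const void* p, const char* name)
{
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, p);
    if (err != cudaSuccess) {
        std::printf("%s: cudaPointerGetAttributes failed: %s\n",
                    name, cudaGetErrorString(err));
        return false;
    }
    // This is the symptom after cufftDestroy(): the call itself
    // succeeds, but devicePointer comes back as 0 (null).
    if (attr.devicePointer == nullptr) {
        std::printf("%s: attributes OK but devicePointer is 0!\n", name);
        return false;
    }
    return true;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device available\n");
        return 0; // nothing to check without a GPU
    }
    float* buf = nullptr;
    if (cudaMalloc(&buf, 1024 * sizeof(float)) != cudaSuccess)
        return 1;
    std::printf("buffer valid: %s\n", checkDevPtr(buf, "buf") ? "yes" : "no");
    cudaFree(buf);
    return 0;
}
```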
This only happens on CUDA 11.2 and later, apparently — and, I think, only on Windows.
I’m definitely passing a valid plan handle to `cufftDestroy` (it’s `1`, as returned by `cufftPlan2d`), and both `cufftPlan2d` and `cufftDestroy` return success.
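Stripped to its skeleton, the sequence in question boils down to the following (the 512×512 size and the final `cudaMemset` probe here are placeholders for illustration, not the app’s real dimensions — and in this minimal form it all works fine for me):

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device available\n");
        return 0; // can't exercise cuFFT without a GPU
    }

    // A device buffer allocated before plan creation, as in the app.
    cufftComplex* buf = nullptr;
    if (cudaMalloc(&buf, 512 * 512 * sizeof(cufftComplex)) != cudaSuccess)
        return 1;

    // Plan creation succeeds; in the app it hands back handle 1.
    cufftHandle plan = 0;
    if (cufftPlan2d(&plan, 512, 512, CUFFT_C2C) != CUFFT_SUCCESS)
        return 1;

    // cufftDestroy() also reports success -- but in the full app,
    // cudaPointerGetAttributes on buf reports devicePointer == 0
    // right after this call, and subsequent cudaMemsets fail.
    if (cufftDestroy(plan) != CUFFT_SUCCESS)
        return 1;

    // Probe the buffer the same way the app does after the destroy.
    cudaError_t err = cudaMemset(buf, 0, 512 * 512 * sizeof(cufftComplex));
    std::printf("cudaMemset after cufftDestroy: %s\n",
                cudaGetErrorString(err));

    cudaFree(buf);
    return 0;
}
```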
I’m pretty sure I have no host stack/heap corruption: I’m using a very careful host allocator with bounds checking, and the (very large) app otherwise behaves properly. cuda-memcheck doesn’t report any issues either.
I’ve written a small reproducer that mimics the order and sizes of the CUDA mallocs/frees and cuFFT calls, but of course everything works fine there… Are there any known issues with `cufftDestroy`? Is there any possible way for it to trash the CUDA heap in some unusual circumstance (presumably based on some odd situation I’ve created)? I wish I could peek into its source.