Possible cufftDestroy bug, and/or bug in my app?

I’ve been seeing errors in my (quite complex) commercial CUDA app that pop up at random times, usually an "invalid argument" error from cudaMemset. I’ve been working on this bug for a few weeks now. I’ve gotten to the point where I’m using cudaPointerGetAttributes to test all my GPU buffers after every CUDA call (as well as synchronizing before and after each), and I’ve narrowed it down to cufftDestroy(). Before that call, all buffers are valid; after that call, cudaPointerGetAttributes still returns success for every buffer, but reports a devicePointer address of 0! After that point, any cudaMemset on those buffers fails.
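For reference, the validity check I run after each CUDA call looks roughly like the sketch below; the helper name and messages are mine, just for illustration (the real app wraps this in its own logging):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: returns false if a buffer that was allocated with
// cudaMalloc no longer maps to a valid device allocation.
static bool checkDeviceBuffer(const void* ptr, const char* label)
{
    cudaPointerAttributes attr{};
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: cudaPointerGetAttributes failed: %s\n",
                label, cudaGetErrorString(err));
        cudaGetLastError();  // clear the sticky error
        return false;
    }
    // The symptom described above: the call itself succeeds, but the
    // pointer no longer resolves to a device allocation.
    if (attr.type != cudaMemoryTypeDevice || attr.devicePointer == nullptr) {
        fprintf(stderr, "%s: type=%d, devicePointer=%p\n",
                label, (int)attr.type, attr.devicePointer);
        return false;
    }
    return true;
}
```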

This apparently only happens on CUDA 11.2 and later, and I think only on Windows.
I’m definitely passing a valid plan handle to cufftDestroy (it’s 1, as returned from cufftPlan2d), and both cufftPlan2d and cufftDestroy return success.
I’m pretty sure I have no host stack/heap corruption; I’m using a very careful host allocator with bounds checking and the (very large) app is otherwise behaving properly. Also, cuda-memcheck doesn’t report any issues.

I’ve written a small reproducer that mimics the order and size of the CUDA mallocs/frees and cuFFT calls (a stripped-down sketch is below), but of course everything works fine there… Are there any known issues with cufftDestroy? Is there any possible way for it to trash the CUDA heap in some unusual circumstance (presumably based on some odd situation I’ve created)? Wish I could peek into its source.
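For what it’s worth, this is roughly the shape of the reproducer; the buffer size and FFT dimensions here are placeholders, not the values from my app:

```cpp
#include <cuda_runtime.h>
#include <cufft.h>
#include <cstdio>

int main()
{
    float* buf = nullptr;
    cudaMalloc(&buf, 1024 * 1024 * sizeof(float));

    cufftHandle plan;
    cufftResult r = cufftPlan2d(&plan, 512, 512, CUFFT_C2C);
    printf("cufftPlan2d -> %d, plan handle = %d\n", (int)r, (int)plan);

    // ... exercise the plan as in the real app ...

    r = cufftDestroy(plan);
    printf("cufftDestroy -> %d\n", (int)r);

    // In the real app this is where cudaMemset starts returning
    // "invalid argument"; in the standalone reproducer it succeeds.
    cudaError_t err = cudaMemset(buf, 0, 1024 * 1024 * sizeof(float));
    printf("cudaMemset after cufftDestroy -> %s\n", cudaGetErrorString(err));

    cudaFree(buf);
    return 0;
}
```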

Some additional info is in this thread.

I believe I understand this now: cufftDestroy can switch CUDA contexts if it thinks the app could be multi-GPU, i.e. if the app has used driver API calls to push/pop contexts, or has potentially made any other driver API call. This is a bug in cuFFT, and it has been reported and acknowledged.
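Until that’s fixed, a possible workaround (my own sketch, not something confirmed by NVIDIA) is to snapshot the current context before calling cufftDestroy and restore it afterwards if cuFFT left a different one current:

```cpp
#include <cuda.h>    // CUDA driver API
#include <cufft.h>
#include <cstdio>

// Destroy a cuFFT plan without letting it change which context is current.
static void destroyPlanPreservingContext(cufftHandle plan)
{
    CUcontext before = nullptr;
    cuCtxGetCurrent(&before);

    cufftDestroy(plan);

    CUcontext after = nullptr;
    cuCtxGetCurrent(&after);
    if (after != before) {
        fprintf(stderr, "cufftDestroy switched the current context; restoring it\n");
        cuCtxSetCurrent(before);
    }
}
```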