cuFFT bug with prime factor 101?

I’m running cufftExecC2R under compute-sanitizer --tool memcheck, and it reports two kinds of errors. The first:

======== Program hit named symbol not found (error 500) on CUDA API call to cuModuleGetFunction.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x242a49]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame: [0x22915c]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x294940]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x237d6b]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x238524]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x22d545]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x22ed41]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2359fd]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x23dbaa]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x23e426]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2335a9]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2337c0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x9a000]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x9a4cc]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x92f42]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x912a6]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftXtMakePlanMany [0xa45d0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftMakePlanMany64 [0xa52dd]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftMakePlanMany [0xa1cff]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftPlanMany [0xa2b62]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftPlan2d [0xa2bf3]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10

And then:
Invalid global read of size 8 bytes
========= at 0x710 in void prime_fft<(unsigned int)101, (unsigned int)2, (unsigned int)8, (unsigned int)4, (unsigned int)1, (padding_t)0, (twiddle_t)0, (loadstore_modifier_t)2, (layout_t)1, unsigned int, float>(kernel_arguments_t)
========= by thread (5,1,0) in block (8803,0,0)
========= Address 0x7fda6fe588e8 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x20d4ea]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame: [0x2f32bd]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2399b0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x247096]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x24740d]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0xa783a]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0xa7a0a]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0xa721c]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x92042]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x921b0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftExecC2R [0xa27ee]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10

The first error shows up only a few times, whereas the second appears many times, each with different thread indices.

I’m running CUDA 11.4.
Is the prime factor 101 the problem, or is it something else? With periodicity 4096 * 4096, I get no error messages and the results are correct. But with periodicity 3838 * 3710, I get the behavior above.
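
For reference, the 101 in the kernel name presumably corresponds to a factor of the transform size: 3838 = 2 * 19 * 101 and 3710 = 2 * 5 * 7 * 53, while 4096 = 2^12. Below is a minimal host-side sketch for checking other sizes the same way. The helper name prime_factors is hypothetical and is not part of cuFFT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a cuFFT call): returns the prime factors of n
// in ascending order, with repeats, using plain trial division.
std::vector<std::uint64_t> prime_factors(std::uint64_t n) {
    assert(n >= 2);
    std::vector<std::uint64_t> factors;
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            factors.push_back(p);
            n /= p;
        }
    }
    if (n > 1) {
        factors.push_back(n);  // whatever remains is itself prime
    }
    return factors;
}
```

This only shows which sizes involve a factor of 101. It does not show whether that kernel is at fault.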

  1. I recommend testing against the latest CUDA version. Bugs get fixed all the time.
  2. I recommend providing a short, complete test case that demonstrates the issue.

Not sure if this is helpful, but the cufftResult value (cr) compares equal to CUFFT_SUCCESS for cufftPlan2d as well as for cufftExecC2R and R2C. However, the memcheck errors above (from compute-sanitizer) appear only for C2R.

Also, the transform is in-place.
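
Since the transform is in-place, one thing worth ruling out is the buffer size. The cuFFT documentation describes in-place real transforms as using a padded layout: for a logical size of nx * ny, the complex side holds nx * (ny/2 + 1) elements, and each real row is padded to 2 * (ny/2 + 1) floats. The host-side sketch below computes those sizes. The helper names are hypothetical (not cuFFT API), and it assumes single precision, as in the kernel signature above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not cuFFT API). Sizes follow the documented cuFFT
// layout for 2D real transforms of logical size nx x ny, where ny is the
// fastest-varying dimension.

// Complex elements in the half-spectrum: nx * (ny/2 + 1).
std::size_t complex_elems(std::size_t nx, std::size_t ny) {
    assert(nx > 0 && ny > 0);
    return nx * (ny / 2 + 1);
}

// Bytes the single buffer needs for an in-place single-precision R2C/C2R:
// it must hold the complex side, two floats per element.
std::size_t inplace_bytes(std::size_t nx, std::size_t ny) {
    return complex_elems(nx, ny) * 2 * sizeof(float);
}

// Row pitch, in floats, of the real data in an in-place transform:
// each row is padded from ny to 2 * (ny/2 + 1) floats.
std::size_t inplace_real_pitch(std::size_t ny) {
    assert(ny > 0);
    return 2 * (ny / 2 + 1);
}

// Bytes of a tightly packed (unpadded) nx x ny float array, for comparison.
std::size_t packed_real_bytes(std::size_t nx, std::size_t ny) {
    assert(nx > 0 && ny > 0);
    return nx * ny * sizeof(float);
}
```

For 3838 * 3710, inplace_bytes gives 56,986,624 bytes versus 56,955,920 for a packed float array, a difference of 30,704 bytes, so a buffer allocated as nx * ny * sizeof(float) would be too small for an in-place C2R. The same padding rule applies to 4096 * 4096. Whether this explains the out-of-bounds read depends on the allocation code, which isn't shown here.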