I’m running cufftExecC2R under compute-sanitizer --tool memcheck.
I’m getting the following error types:
======== Program hit named symbol not found (error 500) on CUDA API call to cuModuleGetFunction.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x242a49]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame: [0x22915c]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x294940]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x237d6b]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x238524]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x22d545]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x22ed41]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2359fd]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x23dbaa]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x23e426]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2335a9]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2337c0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x9a000]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x9a4cc]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x92f42]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x912a6]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftXtMakePlanMany [0xa45d0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftMakePlanMany64 [0xa52dd]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftMakePlanMany [0xa1cff]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftPlanMany [0xa2b62]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftPlan2d [0xa2bf3]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
And then:
Invalid global read of size 8 bytes
========= at 0x710 in void prime_fft<(unsigned int)101, (unsigned int)2, (unsigned int)8, (unsigned int)4, (unsigned int)1, (padding_t)0, (twiddle_t)0, (loadstore_modifier_t)2, (layout_t)1, unsigned int, float>(kernel_arguments_t)
========= by thread (5,1,0) in block (8803,0,0)
========= Address 0x7fda6fe588e8 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x20d4ea]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame: [0x2f32bd]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x2399b0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x247096]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x24740d]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0xa783a]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0xa7a0a]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0xa721c]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x92042]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame: [0x921b0]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
========= Host Frame:cufftExecC2R [0xa27ee]
========= in /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
The first error appears only a few times, whereas the second appears many times, each with different thread indices.
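One thing worth ruling out for the out-of-bounds read is an undersized input buffer. The 8-byte read is consistent with loading one single-precision complex element (cufftComplex), which is what the input of a C2R transform holds. Below is a minimal sketch of the element counts cuFFT expects for a 2D C2R plan created as cufftPlan2d(&plan, nx, ny, CUFFT_C2R). The struct and function names are hypothetical helpers, not part of cuFFT, and the assignment nx = 3838, ny = 3710 is an assumption about how the sizes above map onto the plan.

```cpp
#include <cstddef>

// Hypothetical helper (not a cuFFT API): element counts for a
// single-precision 2D complex-to-real transform of logical size nx * ny,
// where ny is the fastest-varying dimension.
struct C2RSizes {
    std::size_t complex_in;        // complex input elements: nx * (ny/2 + 1)
    std::size_t real_out;          // real output elements, out-of-place: nx * ny
    std::size_t real_out_inplace;  // real output elements, in-place (padded rows)
};

constexpr C2RSizes c2r_sizes_2d(std::size_t nx, std::size_t ny) {
    // Only the non-redundant half of the spectrum is stored, so each of the
    // nx rows holds ny/2 + 1 complex values. For an in-place transform the
    // real rows are padded to 2 * (ny/2 + 1) elements.
    return C2RSizes{nx * (ny / 2 + 1), nx * ny, nx * 2 * (ny / 2 + 1)};
}
```

Under that assumption, c2r_sizes_2d(3838, 3710) gives 7123328 complex input elements (56986624 bytes at 8 bytes each) and 14238980 real output elements for an out-of-place transform. If the buffers passed to cufftExecC2R are smaller than this, reads past the end of the input would be expected.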
I’m running CUDA 11.4.
Is the prime factor 101 the problem, or is it something else? With a periodicity of 4096 * 4096, I get no error messages and the results are correct. With a periodicity of 3838 * 3710, I get the behavior above.
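For context on where the 101 comes from: 3838 = 2 * 19 * 101 and 3710 = 2 * 5 * 7 * 53, while 4096 = 2^12. The first template argument of the failing prime_fft kernel therefore presumably corresponds to the largest prime factor of 3838. A minimal sketch for checking this on other sizes follows. The function name is hypothetical, and it assumes a compiler with C++14 constexpr support.

```cpp
#include <cstdint>

// Hypothetical helper: largest prime factor of n by trial division.
// Returns 1 for n == 0 or n == 1.
constexpr std::uint64_t largest_prime_factor(std::uint64_t n) {
    std::uint64_t largest = 1;
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        // Divide out every power of p before moving to the next candidate.
        while (n % p == 0) {
            largest = p;
            n /= p;
        }
    }
    // Whatever remains above 1 is a prime larger than any factor found so far.
    return n > 1 ? n : largest;
}
```

Here largest_prime_factor(3838) is 101, largest_prime_factor(3710) is 53, and largest_prime_factor(4096) is 2, so the failing size uses a different kernel path than the power-of-two size. That alone does not show whether the prime-factor kernel is at fault.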