cuFFT LTO callback not working (C2C)

I’m trying to implement FFT-based convolution, using a load callback on the inverse FFT (IFFT) to multiply the signal spectrum by the kernel spectrum, but the callback doesn’t seem to do anything.
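For reference, the result a callback-fused IFFT should produce can be checked on the CPU. This is a minimal C++ sketch, assuming a naive O(N²) DFT in place of cuFFT. The per-bin multiply, plus a 1/N factor because cuFFT’s inverse transform is unnormalized, stands for the work the load callback would do. The result is then compared with a direct circular convolution. All function names here are illustrative, not part of any library.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using cvec = std::vector<cplx>;

// Naive DFT: sign = -1 for forward, +1 for inverse.
// Like cuFFT, the inverse transform is NOT normalized.
cvec dft(const cvec& in, int sign) {
    const std::size_t n = in.size();
    const double pi = 3.14159265358979323846;
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double((j * k) % n) / double(n);
            acc += in[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// What the callback-fused IFFT should compute: each spectrum bin is multiplied
// by the kernel spectrum and by 1/N (the load callback's job), then an
// unnormalized inverse transform is applied.
cvec fft_convolve(const cvec& x, const cvec& h) {
    const std::size_t n = x.size();
    const cvec X = dft(x, -1);
    const cvec H = dft(h, -1);
    cvec Y(n);
    for (std::size_t k = 0; k < n; ++k)
        Y[k] = X[k] * H[k] / double(n);  // work done by the load callback
    return dft(Y, +1);                   // plain inverse transform
}

// Direct circular convolution, for comparison.
cvec circular_convolve(const cvec& x, const cvec& h) {
    const std::size_t n = x.size();
    cvec y(n, cplx(0.0, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += x[j] * h[(i + n - j) % n];
    return y;
}
```

Comparing the GPU output against `fft_convolve` on a small input makes it easy to tell whether the multiplication in the callback was applied.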

Specifically, I’m trying to use the “new” LTO callback functionality, which is now available on Windows with the dynamically linked cuFFT library, for a C2C transform.

I tried both the 12.9 and 13.0 toolkits, and compiled the callback both offline (with nvcc/bin2c) and at runtime with NVRTC. Creating the plan after calling cufftXtSetJITCallback() returns no error, so I think the setup is correct (in fact, I can make it fail by deliberately passing the wrong pointer or the wrong callbackSymbolName).

What I observe is that the IFFT is performed, but without the scaling and without any effect from the callback, so something clearly differs from the normal, non-callback IFFT. Even a trivial callback that returns 0+0i for every element produces the same result (an unscaled IFFT), so it looks as if cuFFT silently ignores my callback and skips the scaling.
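One way to read that symptom is to compare it with what an unnormalized inverse transform gives under different load steps. The C++ sketch below is a CPU stand-in, not the cuFFT API: it models a load callback as a function called once per input element with its offset, whose return value replaces the element that is read. The helper names (`pass_through`, `zero_load`, `scale_by`) are hypothetical. With a pass-through load, a forward/inverse round trip returns N·x; a zero load returns all zeros; a 1/N load returns x.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using cplx = std::complex<double>;
using cvec = std::vector<cplx>;

// Hypothetical CPU model of a load callback: called once per input element
// with its offset; the returned value replaces the element that is read.
using LoadFn = std::function<cplx(const cvec& data, std::size_t offset)>;

// Pass-through load: what the transform does when no callback is applied.
cplx pass_through(const cvec& data, std::size_t offset) { return data[offset]; }

// Equivalent of the "always return 0+0i" test callback.
cplx zero_load(const cvec&, std::size_t) { return cplx(0.0, 0.0); }

// Load that multiplies every element by a constant, e.g. 1/N.
LoadFn scale_by(double factor) {
    return [factor](const cvec& data, std::size_t offset) { return data[offset] * factor; };
}

// Naive DFT with a load step; sign = -1 forward, +1 inverse (unnormalized).
cvec dft(const cvec& in, int sign, const LoadFn& load) {
    const std::size_t n = in.size();
    const double pi = 3.14159265358979323846;
    cvec loaded(n);
    for (std::size_t j = 0; j < n; ++j) loaded[j] = load(in, j);  // once per element
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double((j * k) % n) / double(n);
            acc += loaded[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}
```

If the GPU result with the zeroing callback matches the pass-through case (N·x) rather than all zeros, the callback was not applied at all.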

I’m building with Visual Studio 2022 (C++14), targeting the Ampere architecture (RTX A5500 mobile). I tried both Debug and Release builds, in case Debug was the problem because the callback is compiled with optimizations that might be incompatible with a Debug build, but I get the same result either way.

A few questions come to mind:

  • Are LTO callbacks supported for C2C transforms, or only for R2C and C2R?
  • Are LTO callbacks expected to work in Debug mode?