cuFFT JIT LTO store callback reporting internal driver error

I’m having issues with JIT LTO store callbacks. I was able to get load callbacks to work just fine, but my store callback is giving a CUFFT_INTERNAL_ERROR. I’m running on an A5000 in WSL (but I also tried on an L4 on a cloud instance). I’ve looked over the code, compared it to the documentation, and can’t find anything wrong with how I’m implementing it.
I’ve uploaded a gzip of the entire test case (first time trying that, so I’m not sure it’ll work), and I’m also including the key parts of the code in text form:

The callback file contains:
__device__ void CB_IFFT_Store(
    void *dataOut, unsigned long long offset, cufftComplex element,
    void *callerInfo, void *sharedPointer)
{
    ((float2 *)dataOut)[offset].x = element.x / 512.0f;
    ((float2 *)dataOut)[offset].y = element.y / 512.0f;
} // CB_IFFT_Store()

The ifft.cu file contains:
cufftStatus = cufftCreate(&plan);
CheckCufftStatus(cufftStatus, "cufftCreate()");

cufftStatus = cufftXtSetJITCallback(plan, "CB_IFFTStore", (void*)CB_IFFTStore, sizeof(CB_IFFTStore), CUFFT_CB_ST_COMPLEX, (void**)nullptr);
CheckCufftStatus(cufftStatus, "cufftXtSetJITCallback()");

cufftStatus = cufftMakePlan1d(plan, 512, CUFFT_C2C, 1, &workSize);
CheckCufftStatus(cufftStatus, "cufftMakePlan1d()");

And the Makefile contains:
all : ifft

ifftCallbackStore.fatbin: ifftCallbackStore.cu
	nvcc --generate-code arch=compute_86,code=lto_86 -dc -fatbin $< -o $@

ifftCallbackStore_fatbin.cuh: ifftCallbackStore.fatbin
	bin2c --name CB_IFFTStore --type longlong $< > $@

ifft.o: ifft.cu ifftCallbackStore_fatbin.cuh
	nvcc --generate-line-info -rdc=false -Wno-deprecated-gpu-targets --std=c++14 -O3 -g -I /usr/local/cuda/include -c $< -o $@

ifft: ifft.o
	nvcc -Wno-deprecated-gpu-targets -O3 -L /usr/local/cuda/lib64 $^ -o $@ -lcufft -lcudart

Specifically, cufftMakePlan1d fails with: “CUFFT_INTERNAL_ERROR: An internal driver error occurred.” I’m running WSL on Windows 11 with Ubuntu 24.04.2 LTS, and my CUDA driver version is 573.40. Load callbacks work just fine (after adjusting the callback prototype and the CUFFT_CB_ST_COMPLEX flag accordingly). As mentioned, I also tried this on an L4 in a cloud instance, so I don’t think it’s a WSL-related issue. Any help would be much appreciated.

ifftStoreCallback.tar.gz (1.8 KB)

Wow, I feel dumb. CB_IFFT_Store != CB_IFFTStore: the callback is defined as CB_IFFT_Store, but the symbol name I passed to cufftXtSetJITCallback is “CB_IFFTStore”. So, lessons:

  1. Camel case is your friend. I’m not sure why I’ve used it everywhere for my entire career and then decided to use underscores in this one case.
  2. grep is also your friend. I could have sworn I grepped for the function name at one point, but I clearly missed it.
  3. Not so much a lesson for me, but it would be nice if cuFFT did a better job of indicating what the error is. It just couldn’t find the callback because no symbol with that name existed. I’m not sure if there is any way this type of error checking could be added to the compilation, but that would be ideal.
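
For anyone who lands here with the same error: one way to get part of that compile-time safety is to define the callback name exactly once and derive the string from it, so the two cannot drift apart. This is only a sketch under my own assumptions. The macro names (IFFT_STORE_CALLBACK, STRINGIFY) and the helper functions are hypothetical and not part of cuFFT, and it is written as plain host C++ so it compiles without CUDA. It also assumes the lookup uses the plain function name, as it did in my case.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical shared header: the callback name is spelled exactly once.
#define IFFT_STORE_CALLBACK CB_IFFT_Store

// Two-level stringification so the macro is expanded before it is quoted.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// In the callback .cu file the definition would use the macro (CUDA, not compiled here):
//   __device__ void IFFT_STORE_CALLBACK(void *dataOut, unsigned long long offset,
//                                       cufftComplex element, void *callerInfo,
//                                       void *sharedPointer) { ... }

// In ifft.cu this string would be passed as the symbol-name argument of
// cufftXtSetJITCallback instead of a hand-typed literal.
inline const char *ifftStoreCallbackName() {
    return STRINGIFY(IFFT_STORE_CALLBACK);
}

// Hypothetical helper: compare a hand-typed name against the shared one.
inline bool callbackNameMatches(const char *typedName) {
    return std::strcmp(typedName, ifftStoreCallbackName()) == 0;
}
```

With this in place, ifftStoreCallbackName() yields "CB_IFFT_Store", and a hand-typed "CB_IFFTStore" like mine would be flagged by callbackNameMatches(). It does not check that the fatbin actually contains the symbol, so grep is still your friend.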