cuFFT callbacks not working with CUDA 8.0.44 on SM 37

Compiling the simpleCUFFT_callback sample as follows:
make SMS=37
results in the following warning:
nvlink warning : Function ‘_Z27ComplexPointwiseMulAndScalePvmS_S_’ has address taken but no possible call to it
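The symbol in that warning is an Itanium C++ ABI mangled name; written out in full it should be _Z27ComplexPointwiseMulAndScalePvmS_S_, which demangles to ComplexPointwiseMulAndScale(void*, unsigned long, void*, void*). That parameter list (void*, size_t, void*, void*) has the shape of a cuFFT load callback, so the function nvlink flags appears to be the sample's callback itself. A minimal sketch of that check, assuming a GCC or Clang toolchain (abi::__cxa_demangle from <cxxabi.h> is not available under MSVC); running c++filt on the symbol should give the same result:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang ABI demangler; not part of standard C++
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates its result with malloc, so we must free it.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Calling demangle on the full symbol returns the signature quoted above.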

Attempting to run the code results in the following output:
GPU Device 0: “Tesla K80” with compute capability 3.7

Transforming signal cufftExecC2C
Transforming signal back cufftExecC2C
CUDA error at code=6(CUFFT_EXEC_FAILED) “cufftExecC2C(cb_plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_INVERSE)”

Compiling for SM 35 works as expected. However, some of my other kernels would benefit from the larger register file of SM 37, so I’d really like to be able to compile my application (which has the same issue as the cuFFT callback sample above) for SM 37.
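To put a rough number on the register benefit: per the CUDA C Programming Guide, a compute capability 3.5 SM has 64K 32-bit registers and a 3.7 SM (the K80's GK210) has 128K, with the same limit of 2048 resident threads per SM. The sketch below estimates only the register-imposed cap on resident threads per SM. The 256-register per-warp allocation unit is taken from the CUDA occupancy calculator and is an assumption here; block size, shared memory, and per-block limits are ignored.

```cpp
#include <algorithm>
#include <cassert>

// 32-bit registers per SM (CUDA C Programming Guide).
constexpr int kRegsPerSM_cc35 = 64 * 1024;
constexpr int kRegsPerSM_cc37 = 128 * 1024;
constexpr int kMaxThreadsPerSM = 2048;  // same for 3.5 and 3.7
constexpr int kWarpSize = 32;
constexpr int kRegAllocUnit = 256;      // assumed per-warp allocation granularity

// Upper bound on resident threads per SM imposed by register usage alone.
int residentThreadsByRegisters(int regsPerSM, int regsPerThread) {
    assert(regsPerThread > 0 && regsPerThread <= 255);  // Kepler per-thread limit
    int regsPerWarp = regsPerThread * kWarpSize;
    // Round the per-warp allocation up to a whole allocation unit.
    regsPerWarp = (regsPerWarp + kRegAllocUnit - 1) / kRegAllocUnit * kRegAllocUnit;
    int warps = regsPerSM / regsPerWarp;
    return std::min(warps * kWarpSize, kMaxThreadsPerSM);
}
```

Under this simplified model, a kernel using 128 registers per thread can keep at most 512 threads resident per SM on 3.5, but 1024 on 3.7.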

The same sample compiles just fine with CUDA 7.5.

OS: RHEL 7 (kernel 3.10.0-327.el7.x86_64)
CUDA 8.0.44

I tried reporting this as a bug, but I keep getting an error, so I’m trying the forums.

Any thoughts? Known bug? Workaround? (besides using SM 35)

The interface to the NVIDIA bug reporting system is quirky. As I recall from prior interactions, it has a “spam” detection mechanism that can reject a submission with an incomprehensible error message. You may want to try submitting in stages: first, enter a simple textual bug description, avoiding any words that might make it look like “spam” (e.g. needless superlatives, double exclamation marks, mentions of “drug”, “weightloss”, etc.) or “vandalism” (e.g. swear words). In subsequent steps, add a more extensive description, supporting code, screenshots, etc., as needed.

You may want to check whether CUFFT ships with code specifically compiled for sm_37. If it does not, that may be the source of your issue: you cannot combine a callback compiled for sm_37 with the sm_35 code in the library. Note: this is only a hypothesis!

I’ve filed a bug: 1846317

Thanks for filing the bug report. I’m guessing it will be some time before anything comes of it, so I’m thinking about possible workarounds. As mentioned, I can just compile for SM 35, but some of my other kernels would benefit from SM 37. Is it possible to mix and match, compiling my own kernels for SM 37 while cuFFT (including the callbacks) uses SM 35?
Another possibility would be to go back to CUDA 7.5, but there are downsides to that as well.
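One fact that bears on the mix-and-match question: under the CUDA binary compatibility rules, SASS compiled for compute capability X.y runs on a device of capability X.z only if z ≥ y. So code built only for sm_35 would still run natively (no PTX JIT) on a 3.7 device such as the K80. Whether nvlink accepts a build that splits cuFFT plus callbacks (sm_35) from the other kernels (sm_37) is a separate question this thread does not settle. The sketch below models the compatibility rule plus a simplified selection policy ("newest compatible embedded SASS"); the real driver's choice may differ in detail. At runtime, cudaFuncGetAttributes reports a kernel's binaryVersion, which shows which SASS was actually loaded.

```cpp
#include <cassert>
#include <vector>

// Compute capabilities encoded as major * 10 + minor (e.g. 35, 37),
// the same encoding cudaFuncAttributes::binaryVersion uses.

// SASS built for X.y can run on a device X.z only when z >= y.
bool sassRunsOn(int sassCC, int deviceCC) {
    return sassCC / 10 == deviceCC / 10 && sassCC % 10 <= deviceCC % 10;
}

// Simplified model: pick the newest embedded SASS image that can run on the
// device, or -1 if none can (the driver would then need PTX to JIT-compile).
int pickSass(const std::vector<int>& embedded, int deviceCC) {
    int best = -1;
    for (int cc : embedded) {
        if (sassRunsOn(cc, deviceCC) && cc > best) {
            best = cc;
        }
    }
    return best;
}
```

For example, pickSass({35}, 37) yields 35: an sm_35-only image is usable on the K80.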

Also, I know NVIDIA generally doesn’t share information about bug reports, but now that I have a bug report number, is there any way to track its status?