What specifically is deprecated about cuFFT callbacks in CUDA 11.4?

The release notes for CUDA 11.4 state:

Support for callback functionality using separately compiled device code is deprecated on all GPU architectures. Callback functionality will continue to be supported for all GPU architectures.

It’s unclear what exactly this means. I have used callbacks since they were introduced in cuFFT, and my understanding is that they have always required separate compilation: using callbacks requires linking against the cuFFT static library, and linking against the static library requires separate compilation, as stated in the cuFFT documentation here:

Whereas to compile against the static cuFFT library, extra steps need to be taken. The library needs to be device linked. It may happen during building and linking of a simple program, or as a separate step. The entire process is described in Using Separate Compilation in CUDA.

and

The cuFFT static library supports user supplied callback routines. The callback routines are CUDA device code, and must be separately compiled with NVCC and linked with the cuFFT library. Please refer to the NVCC documentation regarding separate compilation for details. If you specify an SM when compiling your callback functions, you must specify one of the SM’s cuFFT includes.
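
For concreteness, here is a minimal sketch of the workflow that passage describes, as I understand it: a load callback written as device code, registered with `cufftXtSetCallback`, and built with relocatable device code against the static library. The names `scale_load`, `d_scale_load_ptr`, and the file name `callback_demo.cu` are made up for illustration, and the build line in the comment assumes a Linux toolkit install with `libcufft_static.a` on the default library path. Error checking is left out to keep it short.

```cuda
// Assumed build (one step, relocatable device code + static cuFFT):
//   nvcc -rdc=true callback_demo.cu -o callback_demo -lcufft_static -lculibos
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

// Hypothetical load callback: scales each element as cuFFT reads it.
// callerInfo points to a single float (the scale factor) in device memory.
__device__ cufftComplex scale_load(void *dataIn, size_t offset,
                                   void *callerInfo, void *sharedPtr)
{
    float s = *static_cast<float *>(callerInfo);
    cufftComplex v = static_cast<cufftComplex *>(dataIn)[offset];
    v.x *= s;
    v.y *= s;
    return v;
}

// Device-side function pointer; its value is copied to the host below.
__device__ cufftCallbackLoadC d_scale_load_ptr = scale_load;

int main()
{
    const int n = 1024;
    const float scale = 2.0f;

    // Input: n samples of 1 + 0i.
    std::vector<cufftComplex> h_data(n);
    for (int i = 0; i < n; ++i) { h_data[i].x = 1.0f; h_data[i].y = 0.0f; }

    cufftComplex *d_data = nullptr;
    float *d_scale = nullptr;
    cudaMalloc(&d_data, n * sizeof(cufftComplex));
    cudaMalloc(&d_scale, sizeof(float));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(d_scale, &scale, sizeof(float), cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);

    // Fetch the device function pointer and attach it to the plan.
    cufftCallbackLoadC h_ptr;
    cudaMemcpyFromSymbol(&h_ptr, d_scale_load_ptr, sizeof(h_ptr));
    cufftResult rc = cufftXtSetCallback(plan, (void **)&h_ptr,
                                        CUFFT_CB_LD_COMPLEX, (void **)&d_scale);
    if (rc != CUFFT_SUCCESS) {
        std::printf("cufftXtSetCallback failed: %d\n", (int)rc);
        return 1;
    }

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cudaMemcpy(h_data.data(), d_data, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    // If the callback ran, the DC bin of an all-ones input is scale * n.
    std::printf("bin 0 = %f (scale * n = %f)\n", h_data[0].x, scale * n);

    cufftDestroy(plan);
    cudaFree(d_data);
    cudaFree(d_scale);
    return 0;
}
```

The same thing can be done in two steps (`nvcc -dc` to compile, then a link step with `-lcufft_static -lculibos`), which is the form the cuFFT documentation shows. Either way, this is the "separately compiled device code" path, which is why the deprecation notice is confusing to me.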

Can someone clarify what specifically has been deprecated, and what the prescribed method is for compiling/linking with cuFFT when using callback functionality going forward?

We are revamping callbacks to add flexibility and performance. We are not expecting many, if any, changes to legacy code. We will have more details in the future, closer to release.

Hi jasonriek5l,

What hardware are you currently using with callbacks?

Our software is deployed with many different GPUs, so I might be missing some, but off the top of my head:

  • GTX 1080
  • Quadro P2000
  • Titan X (Pascal)
  • Titan V
  • V100/V100S
  • Titan RTX
  • Quadro RTX 4000
  • Quadro RTX 5000
  • A100
  • Jetson AGX Xavier

Basically, we are currently using some models from every generation since Pascal at this point.

Are there any updates on this?

Or on the cuFFT release 1.8 known issue:

  • Performance of cuFFT callback functionality was changed across all plan types and FFT sizes. Performance of a small set of cases regressed up to 0.5x, while most of the cases didn’t change performance significantly, or improved up to 2x. In addition to these performance changes, using cuFFT callbacks for loading data in out-of-place transforms might exhibit performance and memory footprint overhead for all cuFFT plan types and FFT sizes. An upcoming release will update the cuFFT callback implementation, removing the overheads and performance drops. cuFFT deprecated callback functionality based on separate compiled device code in cuFFT 11.4.