cuFFT Callbacks in Shared Libraries

Is it possible to have cuFFT callback routines in two or more shared libraries? For example, I have a basic project where I just FFT input data, then scale and IFFT it back. I am following the steps in this example: https://devblogs.nvidia.com/cuda-pro-tip-use-cufft-callbacks-custom-data-processing/

I have one class that computes a zero-padded FFT using a load callback routine. It compiles into a shared library, libZeroPaddedFFT.so.
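Since the class itself isn't shown, here is a minimal sketch of what the load-callback side of libZeroPaddedFFT.so might look like, loosely following the pattern in the linked blog post. It assumes a single 1-D C2C transform; PadInfo, zeroPadLoad, d_zeroPadLoadPtr, zeroPaddedFFT, and the CUDA_CALL/CUFFT_CALL macros are hypothetical names, and it needs the same -dc / -dlink / -lcufft_static build as the commands further down.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

#define CUDA_CALL(x) do { cudaError_t e_ = (x); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA CALL failed at %s:%d : %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)
#define CUFFT_CALL(x) do { cufftResult r_ = (x); if (r_ != CUFFT_SUCCESS) { \
    fprintf(stderr, "CUFFT CALL failed at %s:%d : %d\n", __FILE__, __LINE__, \
            (int)r_); exit(EXIT_FAILURE); } } while (0)

// Hypothetical caller info: number of leading samples that are real data.
struct PadInfo { size_t validLength; };

// Load callback: cuFFT calls it once per input element of the transform.
// Offsets at or past validLength are treated as zero padding.
__device__ cufftComplex zeroPadLoad(void *dataIn, size_t offset,
                                    void *callerInfo, void *sharedPtr) {
    const PadInfo *info = static_cast<const PadInfo *>(callerInfo);
    if (offset < info->validLength)
        return static_cast<const cufftComplex *>(dataIn)[offset];
    return make_cuComplex(0.0f, 0.0f);
}

// Device-resident function pointer; the host copies its value before use.
__device__ cufftCallbackLoadC d_zeroPadLoadPtr = zeroPadLoad;

// Forward FFT of length fftLength using only the first validLength samples.
// d_in is assumed to hold fftLength elements (a conservative choice).
void zeroPaddedFFT(cufftComplex *d_in, cufftComplex *d_out,
                   int fftLength, size_t validLength) {
    PadInfo h_info = { validLength };
    PadInfo *d_info = nullptr;
    CUDA_CALL(cudaMalloc(&d_info, sizeof(PadInfo)));
    CUDA_CALL(cudaMemcpy(d_info, &h_info, sizeof(PadInfo),
                         cudaMemcpyHostToDevice));

    // Same kind of copy as the failing line in the question.
    cufftCallbackLoadC h_loadPtr;
    CUDA_CALL(cudaMemcpyFromSymbol(&h_loadPtr, d_zeroPadLoadPtr,
                                   sizeof(h_loadPtr)));

    cufftHandle plan;
    CUFFT_CALL(cufftPlan1d(&plan, fftLength, CUFFT_C2C, 1));
    CUFFT_CALL(cufftXtSetCallback(plan, (void **)&h_loadPtr,
                                  CUFFT_CB_LD_COMPLEX, (void **)&d_info));
    CUFFT_CALL(cufftExecC2C(plan, d_in, d_out, CUFFT_FORWARD));
    CUDA_CALL(cudaDeviceSynchronize());

    CUFFT_CALL(cufftDestroy(plan));
    CUDA_CALL(cudaFree(d_info));
}
```

Filling the whole input buffer with ones makes the padding visible: with validLength = 5 and fftLength = 16, bin 0 should come out as 5 and bin 8 as 1, whereas an unpadded transform of sixteen ones would give 16 and 0.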

Another class computes an IFFT and scales the result by the FFT length using a store callback routine. This class compiles into a second shared library, libScaledFFT.so.
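A matching sketch for the store-callback side of libScaledFFT.so, under the same assumptions (a 1-D C2C transform; ScaleInfo, scaleStore, d_scaleStorePtr, scaledIFFT, and the macros are hypothetical names). cuFFT's inverse transform is unnormalized, so the callback multiplies every output element by 1/N before writing it out.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

#define CUDA_CALL(x) do { cudaError_t e_ = (x); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA CALL failed at %s:%d : %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)
#define CUFFT_CALL(x) do { cufftResult r_ = (x); if (r_ != CUFFT_SUCCESS) { \
    fprintf(stderr, "CUFFT CALL failed at %s:%d : %d\n", __FILE__, __LINE__, \
            (int)r_); exit(EXIT_FAILURE); } } while (0)

// Hypothetical caller info: factor applied to every output element.
struct ScaleInfo { float scale; };

// Store callback: cuFFT hands over each finished output element; this
// scales it and writes it to the output buffer at the given offset.
__device__ void scaleStore(void *dataOut, size_t offset, cufftComplex element,
                           void *callerInfo, void *sharedPtr) {
    const ScaleInfo *info = static_cast<const ScaleInfo *>(callerInfo);
    element.x *= info->scale;
    element.y *= info->scale;
    static_cast<cufftComplex *>(dataOut)[offset] = element;
}

// Device-resident function pointer; the host copies its value before use.
__device__ cufftCallbackStoreC d_scaleStorePtr = scaleStore;

// Inverse FFT of length fftLength, normalized by 1/fftLength in the callback.
void scaledIFFT(cufftComplex *d_in, cufftComplex *d_out, int fftLength) {
    ScaleInfo h_info = { 1.0f / static_cast<float>(fftLength) };
    ScaleInfo *d_info = nullptr;
    CUDA_CALL(cudaMalloc(&d_info, sizeof(ScaleInfo)));
    CUDA_CALL(cudaMemcpy(d_info, &h_info, sizeof(ScaleInfo),
                         cudaMemcpyHostToDevice));

    cufftCallbackStoreC h_storePtr;
    CUDA_CALL(cudaMemcpyFromSymbol(&h_storePtr, d_scaleStorePtr,
                                   sizeof(h_storePtr)));

    cufftHandle plan;
    CUFFT_CALL(cufftPlan1d(&plan, fftLength, CUFFT_C2C, 1));
    CUFFT_CALL(cufftXtSetCallback(plan, (void **)&h_storePtr,
                                  CUFFT_CB_ST_COMPLEX, (void **)&d_info));
    CUFFT_CALL(cufftExecC2C(plan, d_in, d_out, CUFFT_INVERSE));
    CUDA_CALL(cudaDeviceSynchronize());

    CUFFT_CALL(cufftDestroy(plan));
    CUDA_CALL(cudaFree(d_info));
}
```

For example, a length-8 spectrum that is 8 at bin 0 and 0 elsewhere should come back as all ones, since the unnormalized inverse gives 8 everywhere and the callback divides by 8.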

I have tested each library and know that each one works exactly as intended on its own. My problem is when I link both libraries into one main.cpp. I then get the runtime error:

CUDA CALL failed at scaledFFT.cu:55 : invalid device symbol

which corresponds to this line, where I copy the callback function pointer from the device to the host:

CUDA_CALL(cudaMemcpyFromSymbol(&hostCallbackPtr, devCallbackPtr, sizeof(hostCallbackPtr)));
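The definition of CUDA_CALL isn't shown, so the macro below is an assumption: one shape that would print the quoted message. The sketch isolates the failing copy in a non-fatal helper (passThroughStore and fetchCallbackPtr are hypothetical names) so that the cudaError_t can be inspected directly instead of aborting.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

// Hypothetical CUDA_CALL: prints errors in the format quoted above.
#define CUDA_CALL(x) do { cudaError_t e_ = (x); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA CALL failed at %s:%d : %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)

// Placeholder store callback, present only so there is a device symbol.
__device__ void passThroughStore(void *dataOut, size_t offset,
                                 cufftComplex element, void *callerInfo,
                                 void *sharedPtr) {
    static_cast<cufftComplex *>(dataOut)[offset] = element;
}

// Device-side pointer with the name used in the failing line.
__device__ cufftCallbackStoreC devCallbackPtr = passThroughStore;

// Non-fatal form of the failing line: returns the runtime status so the
// caller can decide how to report it.
cudaError_t fetchCallbackPtr(cufftCallbackStoreC *hostCallbackPtr) {
    return cudaMemcpyFromSymbol(hostCallbackPtr, devCallbackPtr,
                                sizeof(*hostCallbackPtr));
}
```

Built as a single device-linked module, the helper should return cudaSuccess and a non-null pointer; the failure in the question corresponds to cudaErrorInvalidSymbol, which cudaGetErrorString reports as "invalid device symbol".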

For reference, here are the commands I use to build libZeroPaddedFFT.so:

/opt/cuda-9.0/bin/nvcc -ccbin g++ -Xcompiler "-O3 -DNDEBUG -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-long-long -fno-strict-aliasing -Wno-variadic-macros -Wno-missing-field-initializers -fPIC" -I./ -I/opt/cuda-9.0/include -I/opt/cuda-9.0/samples/common/inc -m64 -dc -gencode arch=compute_61,code=sm_61 zeroPaddedFFT.cu -o zeroPaddedFFT.o
/opt/cuda-9.0/bin/nvcc -ccbin g++ -Xcompiler "-O3 -DNDEBUG -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-long-long -fno-strict-aliasing -Wno-variadic-macros -Wno-missing-field-initializers -fPIC" -I./ -I/opt/cuda-9.0/include -I/opt/cuda-9.0/samples/common/inc -m64 -dlink -gencode arch=compute_61,code=sm_61 zeroPaddedFFT.o -o zeroPadLink.o -L/opt/cuda-9.0/lib64 -L. -lcufft_static -lculibos
gcc -o libZeroPaddedFFT.so -rdynamic -shared -Wl,-rpath=/opt/cuda-9.0/lib64 zeroPaddedFFT.o zeroPadLink.o -L/opt/cuda-9.0/lib64 -L. -lcufft_static -lculibos

Then, to compile main.cpp:

g++ -I./ -I/opt/cuda-9.0/include -I/opt/cuda-9.0/samples/common/inc -m64 -o callbackTest main.cpp -L/opt/cuda-9.0/lib64 -L. -lcudart -lZeroPaddedFFT -lScaledFFT

Again, I know it can’t be a syntax error, because both libraries work individually. Can anyone explain why this happens, or suggest a workaround? Thank you in advance!

I built a test case from your description, using CUDA 9.1. The failure is not identical to yours, but it is similar.

I’ve filed an internal bug to look at it. I’m not sure when I’ll have anything more to report.

I have no immediate suggestion for a workaround, other than putting all of your CUFFT functions that require callbacks into a single shared library, which you could probably have come up with without my suggesting it.

The documentation at https://docs.nvidia.com/cuda/cufft/index.html#callback-routines states that callback routines are only supported with static linking. Even if I were to put all cuFFT callbacks into a single shared library as a workaround, would that be officially supported? Or am I already using callbacks in a way that is not currently intended?

That’s correct. But you are using static linking already:

/opt/cuda-9.0/bin/nvcc -ccbin g++ -Xcompiler "-O3 -DNDEBUG -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-long-long -fno-strict-aliasing -Wno-variadic-macros -Wno-missing-field-initializers -fPIC" -I./ -I/opt/cuda-9.0/include -I/opt/cuda-9.0/samples/common/inc -m64 -dlink -gencode arch=compute_61,code=sm_61 zeroPaddedFFT.o -o zeroPadLink.o -L/opt/cuda-9.0/lib64 -L. -lcufft_static -lculibos

Note the -lcufft_static.

Right now you are creating two shared object libraries (.so), each of which contains one completely statically linked cuFFT call. Since you haven’t provided a complete example, I can’t be very specific with names, etc., but one of your libraries is called libZeroPaddedFFT.so and the other is called libScaledFFT.so.

Let’s call libZeroPaddedFFT.so library1 and libScaledFFT.so library2.

In library1 you have a function that calls a CUFFT transform and specifies a load callback, a store callback, or both. Let’s call these function1, transform1, loadcallback1, and storecallback1. Within the confines of library1 and function1, all of this is statically linked.

We can make all the same statements about library2: it has function2, transform2, loadcallback2, and storecallback2. Within the confines of library2, and function2, all of this is statically linked.

What I’m suggesting is that you dispense with those, and instead create a library3 that has two functions in it: function1 and function2. In addition, library3 will contain the calls to transform1 and transform2 (from function1 and function2, respectively) as well as the callbacks for each. All of this will be statically linked together in library3.

You would then link your main.cpp against library3 only, but could call function1 and function2 from main, without this issue biting you.
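A minimal sketch of the suggested library3, assuming the same 1-D C2C transforms as in the question: both callbacks and both host functions live in one translation unit, so they share a single device link against cufft_static. The file and library names (libFFTCallbacks.cu, libFFTCallbacks.so), the helper runWithCallback, and the names zeroPaddedFFT (function1) and scaledIFFT (function2) are all hypothetical; the build commands in the comment mirror the question's flags and are untested.

```cuda
// libFFTCallbacks.cu: hypothetical "library3" with both transforms and both
// callbacks in one device link. A build mirroring the question might be:
//   nvcc -Xcompiler -fPIC -m64 -dc -gencode arch=compute_61,code=sm_61 libFFTCallbacks.cu -o fftcb.o
//   nvcc -Xcompiler -fPIC -m64 -dlink -gencode arch=compute_61,code=sm_61 fftcb.o -o fftcbLink.o -L/opt/cuda-9.0/lib64 -lcufft_static -lculibos
//   gcc -shared -o libFFTCallbacks.so fftcb.o fftcbLink.o -L/opt/cuda-9.0/lib64 -lcufft_static -lculibos
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

#define CUDA_CALL(x) do { cudaError_t e_ = (x); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA CALL failed at %s:%d : %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)
#define CUFFT_CALL(x) do { cufftResult r_ = (x); if (r_ != CUFFT_SUCCESS) { \
    fprintf(stderr, "CUFFT CALL failed at %s:%d : %d\n", __FILE__, __LINE__, \
            (int)r_); exit(EXIT_FAILURE); } } while (0)

struct PadInfo { size_t validLength; };  // caller info for the load callback
struct ScaleInfo { float scale; };       // caller info for the store callback

// Load callback (loadcallback1): zero beyond validLength.
__device__ cufftComplex zeroPadLoad(void *dataIn, size_t offset,
                                    void *callerInfo, void *sharedPtr) {
    const PadInfo *info = static_cast<const PadInfo *>(callerInfo);
    if (offset < info->validLength)
        return static_cast<const cufftComplex *>(dataIn)[offset];
    return make_cuComplex(0.0f, 0.0f);
}

// Store callback (storecallback2): scale each output element.
__device__ void scaleStore(void *dataOut, size_t offset, cufftComplex element,
                           void *callerInfo, void *sharedPtr) {
    const ScaleInfo *info = static_cast<const ScaleInfo *>(callerInfo);
    element.x *= info->scale;
    element.y *= info->scale;
    static_cast<cufftComplex *>(dataOut)[offset] = element;
}

__device__ cufftCallbackLoadC d_loadPtr = zeroPadLoad;
__device__ cufftCallbackStoreC d_storePtr = scaleStore;

// Copies caller info to the device, plans a 1-D C2C transform, attaches one
// callback, executes it, and cleans up.
static void runWithCallback(cufftComplex *in, cufftComplex *out, int n,
                            int direction, void *hostCallback,
                            cufftXtCallbackType type,
                            const void *info, size_t infoSize) {
    void *d_info = nullptr;
    CUDA_CALL(cudaMalloc(&d_info, infoSize));
    CUDA_CALL(cudaMemcpy(d_info, info, infoSize, cudaMemcpyHostToDevice));
    cufftHandle plan;
    CUFFT_CALL(cufftPlan1d(&plan, n, CUFFT_C2C, 1));
    CUFFT_CALL(cufftXtSetCallback(plan, &hostCallback, type, &d_info));
    CUFFT_CALL(cufftExecC2C(plan, in, out, direction));
    CUDA_CALL(cudaDeviceSynchronize());
    CUFFT_CALL(cufftDestroy(plan));
    CUDA_CALL(cudaFree(d_info));
}

// function1: zero-padded forward FFT (formerly in libZeroPaddedFFT.so).
void zeroPaddedFFT(cufftComplex *d_in, cufftComplex *d_out, int n,
                   size_t validLength) {
    PadInfo info = { validLength };
    void *h_cb = nullptr;  // host copy of the device function pointer
    CUDA_CALL(cudaMemcpyFromSymbol(&h_cb, d_loadPtr, sizeof(h_cb)));
    runWithCallback(d_in, d_out, n, CUFFT_FORWARD, h_cb,
                    CUFFT_CB_LD_COMPLEX, &info, sizeof(info));
}

// function2: inverse FFT scaled by 1/n (formerly in libScaledFFT.so).
void scaledIFFT(cufftComplex *d_in, cufftComplex *d_out, int n) {
    ScaleInfo info = { 1.0f / static_cast<float>(n) };
    void *h_cb = nullptr;
    CUDA_CALL(cudaMemcpyFromSymbol(&h_cb, d_storePtr, sizeof(h_cb)));
    runWithCallback(d_in, d_out, n, CUFFT_INVERSE, h_cb,
                    CUFFT_CB_ST_COMPLEX, &info, sizeof(info));
}
```

From main.cpp, linked against this one library only, the two calls would simply run back to back, e.g. zeroPaddedFFT(d_in, d_freq, n, m) followed by scaledIFFT(d_freq, d_out, n), which should return the first m input samples followed by zeros.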

That should work. It wouldn’t change the “support” situation compared with what you are doing now.