"has address taken but no possible call to it"

Hello,

I’m getting this warning when using a callback in cuFFT, just as in the examples. I could not find the meaning of the message anywhere in the documentation. What exactly does it mean?

The relevant code is

__device__ cufftComplex CallbackZeroPad(void* dataIn, size_t offset, void *callerInfo, void *sharedPtr) {
        int N = *((int*)callerInfo);
        if( offset < (size_t)N )  // cast avoids a signed/unsigned comparison warning
                return ((cufftComplex*)dataIn)[offset];
        else
                return make_cuFloatComplex(0,0);
}
__device__ cufftCallbackLoadC CallbackZeroPadSymbol = CallbackZeroPad;

and

cufftCallbackLoadC h_loadCallbackPtr;
CHECK_RESULT( cudaMemcpyFromSymbol( &h_loadCallbackPtr, CallbackZeroPadSymbol, sizeof(h_loadCallbackPtr)) );
CHECK_LAST_ERROR();
CHECK_RESULT( cufftXtSetCallback( plan, reinterpret_cast<void**>(&h_loadCallbackPtr), CUFFT_CB_LD_COMPLEX, caller_info ) );
CHECK_LAST_ERROR();

When I run the code, the last CHECK_LAST_ERROR reports CUDA error (8), ‘invalid device function’.

Cheers
Arne

Check the GPU architecture you are building for: does it match the architecture of your GPU? Also (not sure whether this is still true, or whether it applies to your code), objects passed to CUDA kernels cannot have virtual functions.
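As a quick sanity check, you can print the compute capability the runtime sees and compare it against the arch flags you pass to nvcc. This is a minimal sketch using only the CUDA runtime API, with error handling omitted for brevity:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    // e.g. a GTX TITAN X should report 5.2, matching -gencode arch=compute_52,...
    printf("Device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);
    return 0;
}
```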

Hi,

I’m compiling as follows:

nvcc -O3 --use_fast_math -std=c++11 --compile --relocatable-device-code=true -gencode arch=compute_52,code=compute_52 -gencode arch=compute_52,code=sm_52  -x cu -o cuda_impl.o cuda_impl.cu
nvcc --cudart static --relocatable-device-code=true -gencode arch=compute_52,code=compute_52 -gencode arch=compute_52,code=sm_52 -link -o  "speed_test"  *.o  -lcufft_static -lnvToolsExt -lculibos -lcuda

and get the following result when linking:

nvlink warning : Function '_Z15CallbackZeroPadPvmS_S_' has address taken but no possible call to it
nvlink warning : Function '_Z15CallbackScalingPvm6float2S_S_' has address taken but no possible call to it
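Incidentally, those mangled names are just the Itanium-ABI encodings of the callback signatures; you can confirm which functions nvlink is complaining about by demangling them with c++filt (part of binutils):

```shell
# Demangle the symbols from the nvlink warnings
c++filt _Z15CallbackZeroPadPvmS_S_
# -> CallbackZeroPad(void*, unsigned long, void*, void*)
c++filt _Z15CallbackScalingPvm6float2S_S_
# -> CallbackScaling(void*, unsigned long, float2, void*, void*)
```

So the warnings refer exactly to the two callback functions whose addresses are stored in device symbols; nvlink simply cannot see any call site for them at link time.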

Here is the output of running the stock device query example:

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “GeForce GTX TITAN X”
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 12200 MBytes (12792365056 bytes)
(24) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
GPU Max Clock rate: 1076 MHz (1.08 GHz)
Memory Clock rate: 3505 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 3145728 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX TITAN X
Result = PASS

Hello,

after changing the compiler parameters, the linker warning “has address taken but no possible call to it” has disappeared.

The following program compiles without error or warning.

#include <cufft.h>
#include <cufftXt.h>
#include <cuda.h>
#include <cuda_runtime.h>

#include <iostream>
#include <stdexcept>


/*********************************
 * Error Checking and Handling
 *********************************/
inline const char* getErrorString(cudaError_t err) { return cudaGetErrorString(err); }
inline const char* getErrorString(cufftResult_t err) { return "cufft error"; }

#define CHECK_RESULT( X )  { auto ret = X; if( ret ) { std::clog << "Error [" << ret << ": " << getErrorString(ret) << "] on line " <<  __LINE__ << "\n"; throw std::runtime_error("Cuda error"); } }
#define CHECK_LAST_ERROR() { auto r = cudaPeekAtLastError(); if( r ) { std::clog << "Previous Error [" << r  << ": " << cudaGetErrorString(r) << "] on line " <<  __LINE__ <<"\n"; throw std::runtime_error("Previous cuda error"); } }


/********************************
 * Callback definition
 ********************************/
__device__ cufftComplex CallbackZeroPad(void* dataIn, size_t offset, void *callerInfo, void *sharedPtr) {
        int N = *((int*)callerInfo);
        if( offset < (size_t)N )  // cast avoids a signed/unsigned comparison warning
                return ((cufftComplex*)dataIn)[offset];
        else
                return make_cuFloatComplex(0,0);
}
__device__ cufftCallbackLoadC CallbackZeroPadSymbol = CallbackZeroPad;


int main() {

        const int N = 1 << 21;

        // setup fft            
        size_t work_size=0;

        cufftHandle plan;
        CHECK_RESULT( cufftCreate(&plan) );
        CHECK_LAST_ERROR();
        CHECK_RESULT( cufftMakePlan1d(plan,  N, CUFFT_C2C, 1, &work_size) );
        CHECK_LAST_ERROR();

        // create caller_info
        int* dev_N = 0;
        CHECK_RESULT( cudaMalloc( &dev_N, sizeof(int) ) );
        CHECK_RESULT( cudaMemcpy( dev_N, &N, sizeof(int), cudaMemcpyHostToDevice ) );

        // setup callback
        cufftCallbackLoadC h_loadCallbackPtr;
        CHECK_RESULT( cudaMemcpyFromSymbol( &h_loadCallbackPtr, CallbackZeroPadSymbol, sizeof(h_loadCallbackPtr)) );
        CHECK_LAST_ERROR();
        CHECK_RESULT( cufftXtSetCallback( plan, reinterpret_cast<void**>(&h_loadCallbackPtr), CUFFT_CB_LD_COMPLEX, (void**)(&dev_N) ) );
        CHECK_LAST_ERROR();

        return 0;
}

And the command line to compile it:

nvcc --std=c++11 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -lcufft_static -lculibos -lcudart_static -lcuda -arch=compute_52 -m64 test.cu -o test.x
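(Side note: the cuFFT documentation states that callbacks require the statically linked cuFFT library together with relocatable device code, and the single-step command above does not pass -dc. An untested variant with separate compilation and a device link, flags otherwise carried over from the command above, would look like:)

```shell
# Compile with relocatable device code (-dc), as cuFFT callbacks require,
# then let nvcc perform the device link against the static cuFFT library.
nvcc --std=c++11 -dc -gencode arch=compute_52,code=sm_52 -m64 test.cu -o test.o
nvcc -gencode arch=compute_52,code=sm_52 -m64 test.o -o test.x \
     -lcufft_static -lculibos -lcudart_static -lcuda
```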

The result of running the test program:

Previous Error [8: invalid device function] on line 56
terminate called after throwing an instance of 'std::runtime_error'
  what():  Previous cuda error
Aborted

Can anyone spot what I’m doing wrong?

It would have been helpful (for future reference) to mention which parameters you changed, so readers do not have to hunt through the details of the “before” and “after” build commands.

Based on the information you provided, you are building code for sm_52 and then attempting to run it on an sm_52 GPU. I have no idea how that would result in an “invalid device function” error. I don’t have experience with cuFFT callback functions.

I believe there is a CUFFT callback example app distributed with CUDA. Does that work on your machine? If it does, use that as a blueprint for your own efforts.

Hi,

I’m trying to do the same thing as you (padding an array with zeroes for FFT using a callback function, except that mine is 3D) and I’m getting the same error.
Have you found a way to get rid of it?