Using a __device__ function pointer. Problems using a pointer to a __device__ function.

I am trying to make a kernel that invokes a device function via a pointer. It works well as long as the function and its caller reside in the same source (.cu) file, but breaks if they are in different files. Here is the full example code.

FuncPointer.h:

#ifndef FuncPointer_h
#define FuncPointer_h

typedef float (*op_func) (float, float);

struct FuncPointer {
    FuncPointer();
    op_func fptr;
};

#endif // FuncPointer_h

Main.cu:

#include <cstdio>
#include <cstdlib>   // EXIT_SUCCESS
#include "FuncPointer.h"

/// start of FuncPointer.cu

__device__ float add_func (float x, float y)
{
    return x + y;
}

// Device variable holding the device address of add_func
__device__ op_func func = add_func;

FuncPointer::FuncPointer() {
    // Copy the device function pointer value to the host-side member
    cudaMemcpyFromSymbol(&fptr, func, sizeof(func));
}

/// end of FuncPointer.cu

__global__ void kernel (FuncPointer* p)
{
    float x=100, y=10, result=0;
    result = p->fptr(x, y);
    printf ("result = %f\n", result);
}

int main ()
{
    FuncPointer fp;
    FuncPointer* dev_fp;

    cudaMalloc(&dev_fp, sizeof(FuncPointer));
    cudaMemcpy(dev_fp, &fp,
        sizeof(FuncPointer), cudaMemcpyHostToDevice);

    kernel<<<1,1>>>(dev_fp);

    cudaFree(dev_fp);
    return EXIT_SUCCESS;
}

This works as expected.

Note the code section between the "/// start of FuncPointer.cu" and "/// end of FuncPointer.cu" comments.

If I move this code from Main.cu into a separate file, FuncPointer.cu, and link the two together, execution stops with the “unspecified launch error” message.

What is wrong with calling a device function from another file by pointer?

A similar question was asked in this post, but never answered.

Calling a device function defined in a different compilation unit requires linking of device code, so that references to other compilation units can be resolved. Up to and including CUDA 4.2, there is no support for linking of device code. In this case, the function pointer is stored in a device variable that is not accessible from outside the compilation unit in which it is defined.
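For CUDA 4.2 and earlier, the practical consequence is that the kernel and the device variable holding the pointer have to be compiled in the same .cu file. Below is a minimal sketch of that arrangement, not a definitive implementation. The CHECK macro, the apply kernel, and the call_through_pointer wrapper are hypothetical names introduced for illustration and do not come from the original post. The checks after the launch are where an invalid function pointer would be expected to surface as a launch failure.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef float (*op_func)(float, float);

// Hypothetical helper (not from the original post): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__device__ float add_func(float x, float y) { return x + y; }

// Device variable holding the device address of add_func.
__device__ op_func func = add_func;

// Compiled in the same unit as `func`, so the pointer value is valid here.
__global__ void apply(op_func f, float x, float y, float* out)
{
    *out = f(x, y);
}

// Hypothetical host wrapper: fetch the pointer, launch, return the result.
float call_through_pointer(float x, float y)
{
    op_func h_f = 0;
    CHECK(cudaMemcpyFromSymbol(&h_f, func, sizeof(h_f)));

    float* d_out = 0;
    CHECK(cudaMalloc(&d_out, sizeof(float)));

    apply<<<1, 1>>>(h_f, x, y, d_out);
    CHECK(cudaGetLastError());        // errors from the launch itself
    CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    float result = 0.0f;
    CHECK(cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));
    return result;
}
```

Calling call_through_pointer(100.0f, 10.0f) from main should return 110, matching the single-file case in the question.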

As was announced at GTC, CUDA 5.0 will provide for (static) linking of device code. See for example this presentation by our chief technologist for GPU computing, Mark Harris: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0641-GTC2012-CUDA-5-Beyond.pdf
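With device linking, the split into two files could look roughly like the sketch below. It is shown as one listing, with comments marking which part would stay in Main.cu and which would move to FuncPointer.cu. It assumes CUDA 5.0 or later and a build with relocatable device code, for example nvcc -rdc=true Main.cu FuncPointer.cu -o app. It is a variation on the code in the question rather than a copy of it: the kernel reads the pointer directly from an extern device variable instead of receiving it through a host copy. The names apply_extern and call_extern_pointer are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef float (*op_func)(float, float);

// ---- would stay in Main.cu ----

// Declaration only; with -rdc=true the device linker resolves the symbol.
extern __device__ op_func func;

__global__ void apply_extern(float x, float y, float* out)
{
    *out = func(x, y);   // pointer is read on the device, no host round trip
}

// Hypothetical host wrapper: launch the kernel and return its result.
float call_extern_pointer(float x, float y)
{
    float* d_out = 0;
    float result = 0.0f;

    cudaError_t err = cudaMalloc(&d_out, sizeof(float));
    if (err == cudaSuccess) {
        apply_extern<<<1, 1>>>(x, y, d_out);
        // Blocking copy: waits for the kernel and reports its errors.
        err = cudaMemcpy(&result, d_out, sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
    return result;
}

// ---- would move to FuncPointer.cu ----

__device__ float add_func(float x, float y) { return x + y; }

// Definition matching the extern declaration above.
__device__ op_func func = add_func;
```

With that build, call_extern_pointer(100.0f, 10.0f) should return 110. The host-side cudaMemcpyFromSymbol approach from the question should also work once both files are device-linked into one program, since the pointer and the kernel then belong to the same linked device code.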

A CUDA 5.0 preview is available to registered developers. I believe (but have not checked) that the new linker is part of the preview. Please note that the stability and maturity of the preview should not be assumed to be on par with that of release candidates.

Thanks for the explanation; I will try CUDA 5.0 preview out!

BTW, I see the same problem if I try to invoke a virtual function that is defined in a different compilation unit. For the same reason, obviously.