Device Function Pointers

Having some problems copying function pointers from device to host and back. I am running CUDA 5.0 and compiling with Visual Studio 2010 with compute mode 3.0 on a GeForce 650 card. The code shown below is embedded within a shared library and the executable basically makes calls to the RenderKernel::execute method.

The error I receive is ‘unknown error’ after executing the kernel which I presume implies the function pointer was not valid. If I replace (*func) with (*symbol_CClipGroup_INTERSECT) in the code, it executes as expected. This is boiled down to the minimal code required to get the error so I believe it has something to do with copying from the symbol (but I recieve no error when I do so).

The similar CUDA function pointers example compiles and executes without errors on the same system.

Does anyone have any suggestions on how to resolve this?

namespace vox {

/** Format of clipping function */
typedef void(*ClipFunc)(void *, Ray3f &);

}

// ----------------------------------------------------------------------------
//  Intersection callback function
// ----------------------------------------------------------------------------
__device__ void CClipGroup_INTERSECT(void * dataPtr, vox::Ray3f & ray)
{
    ray.min = 1.0f;
    // :TODO:
}
__device__ vox::ClipFunc symbol_CClipGroup_INTERSECT = CClipGroup_INTERSECT;

namespace vox {

namespace {
namespace filescope {

__global__ void renderKernel(ClipFunc func) 
{ 	
    Ray3f ray;
    (*func)(nullptr, ray);
}

} // namespace filescope
} // namespace anonymous

// --------------------------------------------------------------------
//  Executes the rendering stage kernel on the active device
// --------------------------------------------------------------------
void RenderKernel::execute(size_t xstart, size_t ystart,
                           size_t width,  size_t height)
{
    // Setup the execution configuration
    static const unsigned int BLOCK_SIZE = 32;
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    dim3 blocks( 
        (width + threads.x - 1) / threads.x,
		(height + threads.y - 1) / threads.y 
        );

    ClipFunc result;
    VOX_CUDA_CHECK(cudaMemcpyFromSymbol(&result, symbol_CClipGroup_INTERSECT, sizeof(ClipFunc)));

    // Execute the device rendering kernel
    filescope::renderKernel<<<blocks,threads>>>(result);
}

} // namespace vox

– Lucas Sherman

Edit: Expanded my device macros for clarity

The only difference I can see between what you’ve written and some func pointer code I’ve written is your symbol declaration.

Instead of:

__device__ vox::ClipFunc symbol_CClipGroup_INTERSECT = CClipGroup_INTERSECT;

I have something like:

__constant__ vox::ClipFunc symbol_CClipGroup_INTERSECT = CClipGroup_INTERSECT;

Changing the device modifier to constant yields the following error which I don’t understand at all. A quick Google search turns up no results so perhaps this indicates a configuration issue?

Error 69 error : Internal error: reloc address not found for %s+0x%llx C:\Users\lucas\Documents\Projects\voxrender\trunk\Binaries\x86\Source\VolumeScatterRenderer\ptxas

Hmm… you say you are using a shared library on CUDA 5.0? Do you mean you are using “-dc” to generate relocatable code?

Your snippet compiles on CUDA 5.5 and 5.0 with something like this:

nvcc -m 32 -arch sm_30 -Xptxas=-v -cubin vox.cu

I am not using the -dc/-rdc flag. To my knowledge nvcc does not support relocatable code in DLLs. I meant that this code is embedded in a DLL which is loaded by a different application.

I am actually using the CUDA CMake script for generating the build files but the flags are set as follows:

FIND_PACKAGE(CUDA REQUIRED)
SET(CUDA_NVCC_FLAGS "-arch;sm_30;--cl-version 2010")

message(STATUS "Cuda include directory: " "${CUDA_INCLUDE_DIRS}")
message(STATUS "Cuda library directory: " "${CUDA_LIBRARIES}")
INCLUDE_DIRECTORIES(SYSTEM ${CUDA_INCLUDE_DIRS})

and the project is configured with

CUDA_ADD_LIBRARY(VolumeScatterRenderer SHARED
                 ${SOURCE_FILES} 
                 ${HEADER_FILES})

Since you don’t see any obvious problems with the code I will take the time to try and replicate the problem outside of my build environment.