Error with new CUDA 4.0 kernel call: argument is incompatible with parameter of type "CUfunction"

I’m trying to test out the new CUDA 4.0 kernel launch call, which is defined as:

CUresult cuLaunchKernel ( CUfunction f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, CUstream hStream, void ** kernelParams, void ** extra)

I’m using it like so:

    // define the arguments going into the ParallelMatrixMulKernelMultipleOfEight kernel
    void *args[6] = { &d_A, &d_B, &d_C, &height, &width, &width };

    // new CUDA 4.0 Driver API kernel launch call
    cutilDrvSafeCallNoSync(cuLaunchKernel(ParallelMatrixMulKernelMultipleOfEight,
        blocksInGrid.x, blocksInGrid.y, blocksInGrid.z,
        threadsPerBlock.x, threadsPerBlock.y, threadsPerBlock.z,
        2 * 8 * 8 * sizeof(float), NULL, args, NULL));

Here ParallelMatrixMulKernelMultipleOfEight is declared in a .h file (and implemented in a .cu file) as

__global__ void ParallelMatrixMulKernelMultipleOfEight(float *A, float *B, float *C, unsigned int HeightA, unsigned int WidthB, unsigned int WidthAHeightB);

When compiling I receive the error,

error: argument of type “void (*)(float *, float *, float *, unsigned int, unsigned int, unsigned int)” is incompatible with parameter of type “CUfunction”

I can’t understand why the list of parameters inside the args array would be incompatible with the function defined. They are exactly the right number and the right type. Does anyone know what’s going on?

But they are not the right type. The error is about the first argument to cuLaunchKernel, not the args array: you have to pass a CUfunction handle, and you are passing the __global__ function itself. In the driver API, function handles are returned by cuModuleGetFunction. So load the cubin or PTX from file with cuModuleLoad, retrieve a handle to the kernel you want to launch, and then launch it.