I’m trying to test the new CUDA 4.0 Driver API kernel launch call, cuLaunchKernel, which is declared as:
CUresult cuLaunchKernel ( CUfunction f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, CUstream hStream, void ** kernelParams, void ** extra)
I’m using it like this:
// define the arguments going into the ParallelMatrixMulKernelMultipleOfEight kernel
void *args[6] = { &d_A, &d_B, &d_C, &height, &width, &width };
// new CUDA 4.0 Driver API kernel launch call
cutilDrvSafeCallNoSync(cuLaunchKernel(ParallelMatrixMulKernelMultipleOfEight,
    blocksInGrid.x, blocksInGrid.y, blocksInGrid.z,
    threadsPerBlock.x, threadsPerBlock.y, threadsPerBlock.z,
    2 * 8 * 8 * sizeof(float), NULL, args, NULL));
Where ParallelMatrixMulKernelMultipleOfEight is declared in a .h file (and defined in a .cu file) as:
__global__ void ParallelMatrixMulKernelMultipleOfEight(float *A, float *B, float *C, unsigned int HeightA, unsigned int WidthB, unsigned int WidthAHeightB);
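For a kernel declared like the one above, the kernelParams argument of cuLaunchKernel is documented as an array with one entry per kernel parameter, in declaration order, where each entry points to the memory the argument value is copied from; the driver reads the number, offsets and sizes of the parameters from the compiled kernel image. The host-only C sketch below imitates that copying step so it runs without a GPU. MatMulParams and unpackMatMulParams are hypothetical names used only for illustration; they are not part of the CUDA API and not what the driver actually runs.

```c
#include <string.h>

/* Hypothetical host-side mirror of the kernel's parameter list. */
typedef struct {
    float *A, *B, *C;
    unsigned int heightA, widthB, widthAHeightB;
} MatMulParams;

/* Imitates the documented behaviour of kernelParams: for parameter i,
 * copy sizeof(parameter i) bytes from the memory kernelParams[i] points to. */
MatMulParams unpackMatMulParams(void **kernelParams)
{
    MatMulParams p;
    memcpy(&p.A, kernelParams[0], sizeof p.A);
    memcpy(&p.B, kernelParams[1], sizeof p.B);
    memcpy(&p.C, kernelParams[2], sizeof p.C);
    memcpy(&p.heightA, kernelParams[3], sizeof p.heightA);
    memcpy(&p.widthB, kernelParams[4], sizeof p.widthB);
    memcpy(&p.widthAHeightB, kernelParams[5], sizeof p.widthAHeightB);
    return p;
}
```

Each entry is the address of a variable holding the argument (for example &d_A, not d_A), which is why the array is built from address-of expressions.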
When compiling, I receive the following error:
error: argument of type “void (*)(float *, float *, float *, unsigned int, unsigned int, unsigned int)” is incompatible with parameter type “CUfunction”
I don’t understand why the parameters in the args array would be incompatible with the function as declared: there are exactly the right number of them and they have the right types. Does anyone know what’s going on?