I’ve been looking at the new CUDA 4.0 Driver API kernel launch call in the matrixMulDrv example in the NVIDIA GPU Computing SDK 4.0 /C/src folder. The call is in matrixMulDrv.cpp on line 168.
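As far as I understand it, the launch there looks roughly like the sketch below (simplified, with illustrative grid/block values and kernel name, not the exact SDK code): the CUfunction is looked up by a plain string name with cuModuleGetFunction and then launched with cuLaunchKernel, the arguments being passed through the kernelParams array.

#include <cuda.h>

// Simplified sketch of a CUDA 4.0 driver-API launch (illustrative values,
// not the exact matrixMulDrv code).
void launchMatrixMul(CUmodule module, CUdeviceptr d_C, CUdeviceptr d_A,
                     CUdeviceptr d_B, int wA, int wB)
{
    CUfunction matrixMul;
    cuModuleGetFunction(&matrixMul, module, "matrixMul");  // plain C name

    void *args[] = { &d_C, &d_A, &d_B, &wA, &wB };

    cuLaunchKernel(matrixMul,
                   /* gridDimX  */ 16, /* gridDimY  */ 16, /* gridDimZ  */ 1,
                   /* blockDimX */ 16, /* blockDimY */ 16, /* blockDimZ */ 1,
                   /* sharedMemBytes */ 2 * 16 * 16 * sizeof(float),
                   /* stream */ 0, args, /* extra */ NULL);
}

Since the kernel is fetched by its name as a string, this is exactly where templates become awkward.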
Does this mean that the advantage of writing templates is completely lost with the Driver API, since one cannot launch a “templated” kernel the way one can with the runtime API?
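From what I have read so far, the usual workaround seems to be to keep the templated code in a __device__ function and to give each instantiation that is actually needed an extern "C" __global__ wrapper, so the entry point has a predictable, un-mangled name for cuModuleGetFunction. A minimal sketch with a made-up kernel (not SDK or CUDPP code):

// Templated device code shared by all instantiations.
template <typename T>
__device__ void scaleImpl(T *data, T factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// One un-mangled entry point per instantiation that is really needed.
extern "C" __global__ void scaleKernel_float(float *d, float f, int n)
{
    scaleImpl<float>(d, f, n);
}

extern "C" __global__ void scaleKernel_double(double *d, double f, int n)
{
    scaleImpl<double>(d, f, n);
}

The other route would apparently be to keep the templated __global__ function itself and look each instantiation up through its C++-mangled name with cuModuleGetFunction.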
But where can I find this mangled name? To give some more detail on my current problem: I have written a multi-threaded DLL using OpenMP and the CUDA Driver API, and I now need to sort large arrays of numbers. I thought I would use Thrust or CUDPP, but discovered that they are built on the CUDA Runtime API, which (if what I have read in many places is correct) makes it impossible to use them in my code. So I decided to port CUDPP to the CUDA Driver API, and in its code I found calls like this one:
switch (traitsCode)
{
case 0: // single block, single row, non-full last block
    segmentedScan4<T, SegmentedScanTraits<T, op, isBackward, isExclusive,
                                          doShiftFlagsLeft, false, false, false> >
        <<< grid, threads, sharedMemSize >>>
        (d_out, d_idata, d_iflags, numElements, 0, 0, 0);
    break;
This is a doubly templated kernel (the traits class passed as a template argument is itself a template), which makes it difficult to launch with cuLaunchKernel. Do I need to explicitly instantiate all the possible cases to make it work, or is there some mangled-name trick I can use to get the correct kernel running?
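In case it helps frame the question, here is a sketch of what I understand the mangled-name route to look like, with a simplified made-up kernel standing in for segmentedScan4 (names and signature are hypothetical):

// --- device code (kernels.cu), compiled to PTX with: nvcc -ptx kernels.cu ---
template <typename T, bool isBackward>
__global__ void scanKernel(T *d_out, const T *d_in, int numElements)
{
    // ... scan implementation ...
}

// Explicit instantiations: each one appears in the generated .ptx as an
// ".entry" directive carrying its C++-mangled name, which can be copied
// from there.
template __global__ void scanKernel<float, false>(float *, const float *, int);
template __global__ void scanKernel<float, true>(float *, const float *, int);

// --- host code (#include <cuda.h>): fetch one instantiation by the mangled
// name taken verbatim from the .ptx (placeholder string, not a real name) ---
void getScanFloatForward(CUmodule module, CUfunction *fn)
{
    cuModuleGetFunction(fn, module, "<mangled name from .ptx>");
}

Even then, it looks like I would have to list every traits combination by hand, which is why I am asking whether there is a better way.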