I have recently updated the machines in my cluster to the new CUDA driver 9.2.
However, the new release changed how launching CUDA kernels works, as noted here: https://developer.nvidia.com/cuda-toolkit/whatsnew
My app uses the syntax kernelname<<<blocks, threads, 0, stream>>>(param0, param1, etc…);
Before (up to CUDA 9.1), this call was translated by the driver into other calls: cudaConfigureCall(), cudaSetupArgument() (once for each parameter), and finally cudaLaunch().
Now, however, it is translated directly into cudaLaunchKernel(), which receives an args pointer (void **) to where all the parameters are in memory.
I am writing a wrapper that intercepts the CUDA calls and performs some special processing. Since cudaLaunchKernel() has only a pointer to where the parameters are, it is hard to determine the size and type of each parameter.
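To make the difficulty concrete, here is a minimal sketch in plain C++ of what an interceptor can observe in each launch style. It does not use the CUDA runtime: mock_setup_argument() and mock_launch_kernel() are hypothetical stand-ins that only mimic the argument-passing shape of cudaSetupArgument() and cudaLaunchKernel(), and the kernel signature (int n, double scale) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the real CUDA runtime entry points.

struct ArgRecord {
    std::size_t size;    // bytes occupied by the parameter
    std::size_t offset;  // position of the parameter in the argument buffer
};

static std::vector<ArgRecord> g_records;  // what the old-style interceptor learned

// Same parameter list as cudaSetupArgument(const void *arg, size_t size, size_t offset):
// every parameter arrives with an explicit size and offset.
void mock_setup_argument(const void *arg, std::size_t size, std::size_t offset) {
    assert(arg != nullptr);
    g_records.push_back({size, offset});
}

// Models only the `void **args` part of cudaLaunchKernel(): one pointer per
// parameter, with no size or type attached. The count must come from elsewhere.
std::size_t mock_launch_kernel(void **args, std::size_t assumed_count) {
    std::size_t non_null = 0;
    for (std::size_t i = 0; i < assumed_count; ++i) {
        if (args[i] != nullptr) ++non_null;
    }
    return non_null;  // pointers can be counted, but their sizes stay unknown
}

// Old path: an assumed kernel(int n, double scale) launched via per-parameter calls.
std::size_t old_style_bytes_seen() {
    g_records.clear();
    int n = 1024;
    double scale = 0.5;
    mock_setup_argument(&n, sizeof n, 0);
    mock_setup_argument(&scale, sizeof scale, 8);  // offset 8 assumes 8-byte alignment
    std::size_t total = 0;
    for (const ArgRecord &r : g_records) total += r.size;
    return total;  // the interceptor knows exactly how many bytes were passed
}

// New path: the same two parameters handed over as an array of pointers.
std::size_t new_style_pointers_seen() {
    int n = 1024;
    double scale = 0.5;
    void *args[] = {&n, &scale};
    return mock_launch_kernel(args, 2);  // 2 is known here only because we wrote the kernel
}
```

In the old path the wrapper is told the size and offset of every parameter; in the new path it sees only opaque pointers, and even the number of parameters has to be known in advance.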
Is there any way to force the driver to keep using the old cudaLaunch() approach?
I mean, without changing the applications, and while keeping the same launch syntax as before…