I’m designing a tool that takes CUDA API calls from many applications and executes them in one or more CUDA streams. I’m testing it with the SDK vector addition sample (I’ve changed nothing; the only difference is that the CUDA API calls are made by my tool). It generally works well when fewer than 10 executions of the program run at once. As the number of simultaneous executions rises, I start getting the error cudaErrorInvalidDeviceFunction right after launching the kernel:
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
I have verified the following points:
- The address of the function is the same for successful and failed calls (0x7f2311df4f80). As it’s the same address, it’s the same function. The function can’t be the problem.
- Arguments A, B and C can be transferred with cudaMemcpy without problem and their values are correct. It doesn’t seem to be a problem with the arguments.
Does anyone know under which conditions this error can occur? I’m stuck and out of ideas for debugging this problem.
/* The requested device function does not exist or is not compiled for the proper device architecture. */
cudaErrorInvalidDeviceFunction = 8,
==> This cannot be the only cause, since the same kernel sometimes runs without error.
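To narrow this down, a check like the one below could be wrapped around the launch. This is only a sketch under assumptions: launchVecAddChecked is a hypothetical helper name, the kernel body is the one from the SDK sample, and the stream parameter stands in for whatever stream the tool picks. It logs the current device, asks the runtime whether it can resolve the kernel with cudaFuncGetAttributes before launching, and reads the launch status with cudaGetLastError immediately afterwards, so a failing call can be tied to a device and to the lookup step rather than to the launch itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as in the SDK vector addition sample.
__global__ void VecAdd(const float* A, const float* B, float* C, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// Hypothetical helper (not part of the SDK sample): checks that the runtime
// can resolve VecAdd on the current device, launches it on `stream`, and
// returns the first error it sees.
cudaError_t launchVecAddChecked(const float* d_A, const float* d_B, float* d_C,
                                int N, cudaStream_t stream)
{
    // Record which device the calling thread is currently bound to.
    int device = -1;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // If the kernel image is not available here, this lookup fails
    // before any launch is attempted.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, VecAdd);
    if (err != cudaSuccess) {
        fprintf(stderr, "device %d: kernel lookup failed: %s\n",
                device, cudaGetErrorString(err));
        return err;
    }

    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<<blocksPerGrid, threadsPerBlock, 0, stream>>>(d_A, d_B, d_C, N);

    // Launch errors are reported through the per-thread last-error state.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "device %d: launch failed: %s (binaryVersion=%d, ptxVersion=%d)\n",
                device, cudaGetErrorString(err), attr.binaryVersion, attr.ptxVersion);
    }
    return err;
}
```

Calling launchVecAddChecked(d_A, d_B, d_C, N, stream) in place of the direct launch would show whether the failing calls share a device or a thread, and whether the kernel lookup already fails before the launch.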