CUDA Kernel broken

Hi all,
I recently ran into a problem when running a large number of loop iterations inside a GPU __global__ function. The same function worked roughly two months ago.

A typical global function would be: (both npt and tricount are integers >100,000)

__global__ void CalculateHeight(double3* triList, int tricount, int npt, double2* bounds, double3* ptList)
{
	int idx = blockIdx.x*blockDim.x + threadIdx.x;

	if (idx < npt)
		for (int i = 0; i < tricount; i++)
			// check whether the point lies in triangle i
			ptList[idx] = function(triList[i])....
}

The kernel never completes. The error is “Application has been blocked from accessing Graphics Hardware”.

If I use a 2D block, with idy as the index into the triangle list (up to tricount), it may work. I’m not sure whether this is because my level of “parallelism” is not high enough.

My system configuration is:
CUDA 9.0; VS 2015; OS: Windows 10; GPU: NVIDIA GeForce 930 MX

Let me know if you need further information.

Any suggestions are highly appreciated. Thank you!

You might be running into a WDDM TDR timeout. This is well documented, so if you google for that you will find plenty of descriptions of what it is and how you might work around it. IMO, the easiest way to modify the TDR settings is from Nsight VSE.
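
Besides changing the TDR settings, a common workaround is to keep each individual kernel launch short, so that no single launch runs long enough to hit the timeout. Here is a minimal sketch of that idea, assuming the per-triangle work can be accumulated across launches. The names `accumulate`, `CalculateHeightChunk`, `chunkCount` and `runInChunks` are hypothetical, and the per-triangle computation is only a placeholder, since the original post does not show it:

```cuda
#include <cuda_runtime.h>

// Placeholder for the per-triangle work (NOT the poster's actual function).
__device__ double accumulate(double acc, double3 tri)
{
    return acc + tri.z;
}

// Same thread-per-point layout as the original kernel, but each launch
// only visits triangles in the half-open range [first, last).
__global__ void CalculateHeightChunk(const double3* triList, int first, int last,
                                     int npt, double* result)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < npt) {
        double acc = result[idx];          // carry over the previous chunk's value
        for (int i = first; i < last; i++)
            acc = accumulate(acc, triList[i]);
        result[idx] = acc;
    }
}

// Number of launches needed to cover `total` triangles, `chunk` at a time.
int chunkCount(int total, int chunk)
{
    return (total + chunk - 1) / chunk;
}

// Launches one short kernel per chunk and waits for it before issuing the next.
// dTri and dResult are device pointers; dResult must be initialized beforehand.
cudaError_t runInChunks(const double3* dTri, int tricount, int npt,
                        double* dResult, int chunk)
{
    const int threads = 256;
    const int blocks = (npt + threads - 1) / threads;
    for (int first = 0; first < tricount; first += chunk) {
        int last = (first + chunk < tricount) ? first + chunk : tricount;
        CalculateHeightChunk<<<blocks, threads>>>(dTri, first, last, npt, dResult);
        cudaError_t err = cudaGetLastError();       // launch configuration errors
        if (err != cudaSuccess) return err;
        err = cudaDeviceSynchronize();              // errors raised while running
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

The chunk size is something you would have to tune for your GPU: smaller chunks make each launch shorter but add launch overhead. This only helps if the result for a point can be built up incrementally; if it cannot, raising the TDR delay is the simpler route.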

I can’t determine from your code snippet whether you are using proper CUDA error checking or not (not sure? just google “proper CUDA error checking”). If not, I always advise adding it any time you are having trouble with a CUDA code. Also, Nsight VSE has a useful “memory checker” built in, which can be enabled.
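
For reference, error checking usually looks something like the sketch below: wrap every runtime API call, and after each kernel launch check both `cudaGetLastError()` and the result of `cudaDeviceSynchronize()`. The macro name `CUDA_CHECK`, the `scale` kernel and the helpers `blocksFor` and `runScale` are made up for illustration and are not from your code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the failing call's location and abort.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error %s at %s:%d: %s\n",               \
                    cudaGetErrorName(err_), __FILE__, __LINE__,           \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Illustrative kernel: multiplies each element by `factor`.
__global__ void scale(float* data, int n, float factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] *= factor;
}

// Number of blocks needed to cover n elements.
int blocksFor(int n, int threads)
{
    return (n + threads - 1) / threads;
}

// Scales hostData in place on the GPU, checking every step.
void runScale(float* hostData, int n, float factor)
{
    float* dData = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    CUDA_CHECK(cudaMalloc(&dData, bytes));
    CUDA_CHECK(cudaMemcpy(dData, hostData, bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    scale<<<blocksFor(n, threads), threads>>>(dData, n, factor);
    CUDA_CHECK(cudaGetLastError());        // catches invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());   // catches failures during execution

    CUDA_CHECK(cudaMemcpy(hostData, dData, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dData));
}
```

With checks like these in place, a kernel killed by the watchdog should show up as an explicit error (typically `cudaErrorLaunchTimeout`) at the synchronize call, rather than as wrong results later on.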


That works well… I did not know Nsight has a configuration for TDR. :)