GPU Never Returns: every few hundred runs, the kernel never returns

I have a kernel that runs fine most of the time but freezes and never returns, seemingly at random. I can tell because the display becomes unresponsive for far longer than the kernel usually takes to complete (days instead of tens of seconds), and I have to restart my computer when this happens. I have whittled the kernel down to something that no longer does any useful computation but still exhibits the freezing; it is copied below. What makes it lock up??

[codebox]extern __shared__ float array[];

// num: about 2000
// ratios: size num*(blockDim.x-1)
// sums: size num*(blockDim.x-1)
// estimates: size blockDim.x*gridDim.x
// initialVals: size gridDim.x

static __global__ void myMethodKernel(int num, const float* ratios, const float* sums, float* estimates, const float* initialVals) {

	// shared memory allocation
	__shared__ float first;
	__shared__ float second;
	__shared__ int baseArraySize;
	__shared__ float numerator;

	float* temp = (float*)array;
	float* intensities = (float*)&temp[blockDim.x];

	intensities[threadIdx.x] = estimates[blockIdx.x * blockDim.x + threadIdx.x];

	int event, iteration;

	if (threadIdx.x == blockDim.x-1) {
		iteration = threadIdx.x;  event = 0;
		while (iteration > 1) {
			++event;  iteration = iteration/2.0f;
		}
		baseArraySize = 1<<event; // = 2^event
	}

	for (iteration = 0; iteration < 30; ++iteration) {
		int energyBin = 0;
		for (event = 0; event < num; ++event) {
			if (threadIdx.x == 0) {
				temp[threadIdx.x] = intensities[threadIdx.x] * sums[event*gridDim.x+blockIdx.x];
			} else {
				temp[threadIdx.x] = intensities[threadIdx.x] * ratios[event*(blockDim.x-1) + threadIdx.x-1];
			// ...
[/codebox]
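For reference, the `while` loop run by the last thread sets `event` to floor(log2(blockDim.x - 1)), so `baseArraySize` ends up as the largest power of two not exceeding `blockDim.x - 1` (for `blockDim.x >= 2`). The sketch below copies that loop into a hypothetical `__host__ __device__` helper, `baseArraySizeFor`, next to a shift-based equivalent, `largestPow2AtMost`, so the two can be checked against each other on the host. Both names and the `__CUDACC__` guard (which only lets the file also build with a plain C++ compiler) are assumptions, not part of the original kernel.

```cuda
// Lets this file build with nvcc or with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Mirrors the loop executed by thread blockDim.x-1 in the kernel:
// returns 2^event with event = floor(log2(n)) for n >= 1, and 1 for n <= 1.
__host__ __device__ inline int baseArraySizeFor(int n)
{
    int iteration = n;
    int event = 0;
    while (iteration > 1) {
        ++event;
        // Same truncation as the original: for block-sized n this equals n/2.
        iteration = (int)(iteration / 2.0f);
    }
    return 1 << event;
}

// Shift-based equivalent: largest power of two <= n (1 for n <= 1).
__host__ __device__ inline int largestPow2AtMost(int n)
{
    int p = 1;
    while (p <= n / 2) {
        p <<= 1;
    }
    return p;
}
```

For example, with `blockDim.x = 256` the last thread has `threadIdx.x = 255`, and both helpers give a `baseArraySize` of 128.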
Here are some other possible clues:

  • It happens more frequently when “num” is large. For instance, with “num” around 100 I have never seen it freeze; with “num” around 2000 it fails to return about once every 200 calls. The original version of the kernel failed more often than this (as often as about once every 5 trials).

  • I have never gotten it to freeze if I remove either factor from the multiplications inside the if/else statements in the for loops. I can even multiply the remaining factor by itself and it is fine. It only sometimes fails when I multiply both numbers together.

  • The last two observations may just be red herrings: it fails more often when the kernel takes longer to complete, and both of these effects simply increase the kernel's execution time.

  • I never get an error message. It just never returns.

  • I am almost certain I am not accessing memory outside the arrays, but I am less sure that the values are always good. Could bad values cause freezing?

  • However, I can run the same trials repeatedly with exactly the same numbers, and sometimes it freezes while other times it runs fine and gives the expected results. It definitely occurs randomly.
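The red-herring reasoning in the third point can be made concrete under two assumptions that are not established above: that a hang is a rare random event striking at a roughly constant rate λ per second of kernel execution, and that the kernel's runtime T grows roughly in proportion to “num” (the inner loop runs “num” times). Starting from the observed rate of about 1 in 200 at num ≈ 2000:

```latex
P_{\text{hang}}(T) = 1 - e^{-\lambda T} \approx \lambda T \qquad (\lambda T \ll 1)

\frac{P_{\text{hang}}(T_{100})}{P_{\text{hang}}(T_{2000})} \approx \frac{T_{100}}{T_{2000}} \approx \frac{100}{2000} = \frac{1}{20}
\quad\Longrightarrow\quad
P_{\text{hang}}(T_{100}) \approx \frac{1}{20}\cdot\frac{1}{200} = \frac{1}{4000}
```

Under that model a hang at num ≈ 100 would be expected only about once in 4000 calls, which would be consistent with never having seen one at that size.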

Any ideas what could cause the kernel never to return? Or any ideas for further tests I could do?
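One further test, aimed at the question about bad values in the fifth point, is to scan the host copies of the inputs for NaN or infinite entries before each launch. The sketch below is a hypothetical helper, `countNonFinite`; it assumes the host-side copies of `ratios`, `sums`, `estimates`, and `initialVals` are available as plain `float` arrays with the sizes listed in the kernel comments. Finding no such values would at least rule out non-finite inputs as the trigger.

```cuda
#include <cmath>
#include <cstddef>
#include <cstdio>

// Host-side check: returns how many entries of a[0..n) are NaN or +/-Inf,
// and prints the index of the first one found, if any.
static size_t countNonFinite(const char* name, const float* a, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!std::isfinite(a[i])) {
            if (bad == 0) {
                std::printf("%s: first non-finite value at index %zu\n", name, i);
            }
            ++bad;
        }
    }
    return bad;
}
```

For example, `countNonFinite("ratios", h_ratios, (size_t)num * (threadsPerBlock - 1))` would check the `ratios` input before the launch, where `h_ratios` and `threadsPerBlock` are hypothetical host-side names for the array and the block size.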