Out of registers or stack? No exceptions, just returning 0.0.

Bjoernar80 · June 13, 2013, 2:35pm

Hello,

I’ve written a small program to run on my Tesla C2075, but I’m running into some problems.

__device__ float MTwo(float u)
{
	if (u >= 0.0 && u <= 2.0)
	{
		return (1.0 - abs(u - 1.0));
	}
	return 0.0;
}

__device__ float MThree(float u)
{
	return ( ((u * 0.5) * MTwo(u)) + (((3.0 - u) * 0.5) * MTwo(u - 1.0)) );
}

__device__ float MFour(float u)
{
	return ( ((u / 3.0) * MThree(u)) + (((4.0 - u) / 3.0) * MThree(u - 1.0)) );
}

__device__ float MFive(float u)
{
	if (u < 0.0 || u > 5.0) return 0.0;
	return ( ((u * 0.25) * MFour(u)) + (((5.0 - u) * 0.25) * MFour(u - 1.0)) );
}
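For reference, MTwo through MFive follow the standard cardinal B-spline recurrence, with MTwo as the hat function on [0, 2]:

$$
M_2(u) = \begin{cases} 1 - |u - 1|, & 0 \le u \le 2 \\ 0, & \text{otherwise} \end{cases}
\qquad
M_n(u) = \frac{u}{n-1}\, M_{n-1}(u) + \frac{n-u}{n-1}\, M_{n-1}(u-1), \quad n = 3, 4, 5
$$

So MThree uses the factors u/2 and (3 - u)/2, MFour uses u/3 and (4 - u)/3, and MFive uses u/4 and (5 - u)/4. Because every level calls the level below twice, one MFive call expands into 2^3 = 8 MTwo evaluations once everything is inlined.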

These are called from the following kernel:

__global__ void QTestCalculate(double *QTestArray, const double Charge, const int *Ks, const int *Ns, const double *u, const int n_size)
{
	int index = blockIdx.x * blockDim.x + threadIdx.x;
	if (index < n_size)
	{
		int k3 = index % Ks[0];
		int k2 = ((index % (Ks[0] * Ks[1])) - k3) / Ks[0];
		int k1 = (index - k3 - (k2 * Ks[0])) / (Ks[0] * Ks[1]);
		float Q1 = 0.0, Q2 = 0.0, Q3 = 0.0;

		for (int n = Ns[0]; n <= 1; n++)
		{
			Q1 += MFive(float(u[0]) - float(k1) - float(n * Ks[0]));
		}

		for (int m = Ns[1]; m <= 1; m++)
		{
			Q2 += MFive(float(u[1]) - float(k2) - float(m * Ks[1]));
		}
		
		for (int o = Ns[2]; o <= 1; o++)
		{
			Q3 += MFive(u[2] - float(k3) - float(o * Ks[2]));
		}

		QTestArray[index] = (Charge * Q1 * Q2 * Q3);
	}
	__syncthreads();
}
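For reference, the index arithmetic at the top of the kernel unflattens the linear thread index into grid coordinates, and each thread then stores a product of three periodic sums. Writing K_d, N_d and u_d for Ks[d], Ns[d] and u[d], and q for Charge:

$$
\text{index} = k_1 K_0 K_1 + k_2 K_0 + k_3, \qquad 0 \le k_3 < K_0, \quad 0 \le k_2 < K_1
$$

$$
\text{QTestArray}[\text{index}] = q \prod_{d=0}^{2} \; \sum_{n=N_d}^{1} M_5\!\left(u_d - k_{d+1} - n K_d\right)
$$

As written, k_3 is taken modulo K_0 while its shift in the third sum uses K_2, so if the three grid dimensions can differ, that line may be worth double-checking.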

My issue is this:
I can (from a different kernel) pass a precalculated value of u into MFive (or even MSeven, which is not shown here), so I know that my M functions calculate the correct answer. I’ve also validated the arguments passed to MFive() and know that it should return a valid value other than 0.0.

When compiling without the -arch=sm_20 flag, I can get values for Q1, Q2 and Q3 and return each of them individually. However, if I multiply Q1 * Q2, Q2 * Q3, Q1 * Q3, or all three, the result is 0.0.

When compiling with the -arch=sm_20 flag, Q1, Q2 and Q3 are all zero.

With a spline order of 7 (the M functions are cardinal B-splines), I get valid values back as long as MThree just does "return u;". If I change it to "return u / 1.0;", I get 0.0.
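One detail that may be related, although the thread does not confirm it: unsuffixed literals such as 0.0, 0.5 and 1.0 have type double, so expressions like u / 1.0 or u * 0.5 are evaluated in double precision and only converted back to float on return. On sm_20 that is genuine double-precision arithmetic, which typically needs more registers than the float equivalent; nvcc's default target at the time was compute capability 1.0, which has no double support, so doubles were demoted to float with a warning. A float-only version of the same functions, using f-suffixed literals and fabsf, might look like the sketch below (the F-suffixed names are hypothetical, chosen so they do not clash with the originals).

```cuda
#include <math.h>

// Float-only variants of the M functions from the post (hypothetical names).
// Every literal carries an f suffix, so no intermediate is promoted to double.
__host__ __device__ inline float MTwoF(float u)
{
    if (u >= 0.0f && u <= 2.0f)
        return 1.0f - fabsf(u - 1.0f);   // hat function on [0, 2]
    return 0.0f;
}

__host__ __device__ inline float MThreeF(float u)
{
    return (u * 0.5f) * MTwoF(u) + ((3.0f - u) * 0.5f) * MTwoF(u - 1.0f);
}

__host__ __device__ inline float MFourF(float u)
{
    return (u / 3.0f) * MThreeF(u) + ((4.0f - u) / 3.0f) * MThreeF(u - 1.0f);
}

__host__ __device__ inline float MFiveF(float u)
{
    if (u < 0.0f || u > 5.0f) return 0.0f;   // outside the support [0, 5]
    return (u * 0.25f) * MFourF(u) + ((5.0f - u) * 0.25f) * MFourF(u - 1.0f);
}
```

Marking the functions __host__ __device__ also makes it easy to compare them against a CPU reference before running them on the GPU.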

So I’m guessing that I’m running out of registers, stack space, or something similar.

Is there any way to improve on this, or to avoid the problem altogether? I’m fairly new to CUDA, so I would appreciate any input on how to solve this better.

Best regards,
Bjørnar Jensen.

vyas · June 15, 2013, 2:17am

Are you checking error codes in the host portion of your application? Try running your application under cuda-memcheck to see whether there are memory access errors or CUDA API errors. You can also use cuda-gdb or Nsight Eclipse Edition on Linux/Mac, or Nsight Visual Studio Edition on Windows, to debug your application by stepping through it and examining the values of the variables.
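Kernel launches are asynchronous and return no error code, so a kernel that fails to launch at all leaves its output buffer untouched, which can look exactly like a kernel that returned 0.0. A minimal sketch of host-side checking follows; checkCuda, scaleKernel and launchScale are hypothetical names used only for illustration, while cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): print a readable message
// for a failed CUDA runtime call and report whether it succeeded.
bool checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical stand-in kernel, used only to show where the checks go.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// A launch that asks for more resources than the SM can provide (for example
// too many registers per block, cudaErrorLaunchOutOfResources) is reported by
// cudaGetLastError() right after the launch; faults raised while the kernel
// runs are reported by the next synchronizing call.
bool launchScale(float *d_data, float factor, int n, int threadsPerBlock)
{
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleKernel<<<blocks, threadsPerBlock>>>(d_data, factor, n);
    if (!checkCuda(cudaGetLastError(), "scaleKernel launch")) return false;
    return checkCuda(cudaDeviceSynchronize(), "scaleKernel execution");
}
```

The same two checks placed after the QTestCalculate launch would surface a register-related launch failure immediately instead of leaving an all-zero result.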

Bjoernar80 · June 17, 2013, 10:34am

Hello,

I was only checking for errors explicitly after host-device communication.
However, I figured it out. The kernel is indeed limited by registers.

So for now I’m avoiding the issue by reducing the block size, but I’ll have to find a smart way to reuse some variables to increase the occupancy (currently only 50%) on the Tesla.
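Rather than finding a working block size by trial and error, the runtime can report how many threads per block a compiled kernel can actually be launched with: cudaFuncAttributes::maxThreadsPerBlock already accounts for the kernel's register usage on the current device. The sketch below rounds that limit down to whole warps; chooseBlockSize, safeBlockSize and boundedKernel are hypothetical names. On the compiler side, nvcc -Xptxas -v prints per-kernel register counts, and the __launch_bounds__ qualifier or the -maxrregcount option can ask the compiler to cap register use, at the risk of spilling to local memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the largest block size that is <= preferred, does not
// exceed the kernel's own limit, and is a whole number of warps (0 if none).
inline int chooseBlockSize(int preferred, int kernelLimit, int warp)
{
    int size = preferred < kernelLimit ? preferred : kernelLimit;
    return (size / warp) * warp;
}

// Hypothetical kernel showing __launch_bounds__: it promises the compiler that
// it is never launched with more than 256 threads per block, so the compiler
// budgets registers for that size (possibly spilling to local memory).
__global__ void __launch_bounds__(256) boundedKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.0f;
}

// Ask the runtime how large a block this compiled kernel can be launched with
// on the current device. Returns 0 if any query fails.
template <typename Kernel>
int safeBlockSize(Kernel *kernel, int preferred)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess) return 0;

    int dev = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return 0;

    printf("%d registers/thread, at most %d threads/block\n",
           attr.numRegs, attr.maxThreadsPerBlock);
    return chooseBlockSize(preferred, attr.maxThreadsPerBlock, prop.warpSize);
}
```

For example, safeBlockSize(QTestCalculate, 512) would give a block size that should not fail with an out-of-resources error. Toolkits from CUDA 6.5 onward also provide cudaOccupancyMaxPotentialBlockSize, which picks a block size that maximizes occupancy.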

How do people get around this when rewriting (naturally) recursive algorithms in general?
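One common approach for recurrences like this is to evaluate them bottom-up instead of top-down. The nested MFive, MFour, MThree, MTwo calls re-evaluate overlapping pieces (MFour(u) and MFour(u - 1) both need MThree(u - 1)), whereas the triangular scheme sketched below starts from order-1 values and raises the order in place in a fixed-size array using the same recurrence. It is a sketch under stated assumptions, not a drop-in replacement: cardinalBSpline is a hypothetical name, it starts from M_1 (the indicator of [0, 1)) rather than the hat function, and the order is a template parameter so the loops have compile-time bounds and can be unrolled, which often lets the compiler keep the small array in registers.

```cuda
#include <math.h>

// Hypothetical helper (not from the original post): evaluates the cardinal
// B-spline M_N(u) bottom-up instead of through nested recursive calls.
// N is a compile-time constant, so every loop has a fixed trip count.
template <int N>
__host__ __device__ float cardinalBSpline(float u)
{
    if (u < 0.0f || u >= float(N)) return 0.0f;   // outside the support [0, N)

    // Order 1: b[j] = M_1(u - j), the indicator of 0 <= u - j < 1.
    float b[N];
#pragma unroll
    for (int j = 0; j < N; ++j) {
        const float t = u - float(j);
        b[j] = (t >= 0.0f && t < 1.0f) ? 1.0f : 0.0f;
    }

    // Raise the order one step at a time using
    //   M_k(t) = (t * M_{k-1}(t) + (k - t) * M_{k-1}(t - 1)) / (k - 1),  t = u - j.
    // b[j + 1] holds M_{k-1}(t - 1); iterating j upwards reads it before it
    // is overwritten, so the update can be done in place.
#pragma unroll
    for (int k = 2; k <= N; ++k) {
        const float inv = 1.0f / float(k - 1);
#pragma unroll
        for (int j = 0; j <= N - k; ++j) {
            const float t = u - float(j);
            b[j] = (t * b[j] + (float(k) - t) * b[j + 1]) * inv;
        }
    }
    return b[0];                                   // M_N(u)
}
```

Inside QTestCalculate, a call such as MFive(x) would become cardinalBSpline<5>(x), and an order-7 spline cardinalBSpline<7>(x); both should agree with the recursive versions up to float rounding.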

Best regards,
Bjørnar.
