Slow exe and error when adding array elements and assigning result to another array element

paceman · December 17, 2012, 12:46pm

I hope someone can share some insight into the problem I am having. A small program takes an array of doubles (100000 elements), adds together a subset of this array, and assigns the result to an element of another array. This seems like a very basic task, yet my GT630 exhibits strange behavior: the code works (though rather slowly) for small values of intervalLength, but once intervalLength reaches around 300, the code fails.

What’s more interesting is that the problem seems to lie not in the addition, but in assigning the result back to the output array. If the last line in the code below
output_dev[threadIdx.x] = totalSum;
is changed to
output_dev[threadIdx.x] = input_dev[0];
then the code runs lightning fast, at least 100 times faster, and works for any large value of intervalLength. Also, if the line
totalSum=1;
precedes the assignment, then the code also runs fast and without errors. Some experimentation also showed that if the sum is calculated in a series of statements rather than in the loop, the code also works fine.

I am using a GT630 4GB with 96 CUDA cores, launching 96 threads in one block.

The code:

extern "C" __global__ void TestCompute(double* input_dev, int input_devLen0, int* args_dev, int args_devLen0, double* output_dev, int output_devLen0)
{
	int intervalLength = args_dev[0];
	int num = threadIdx.x; // assumed: the declaration of num is missing from the posted snippet
	double totalSum = input_dev[num];
	if (num < input_devLen0)
	{
		for (int k = 0; k <= input_devLen0; k++)
		{
			totalSum = 0.0;
			for (int i = 0; i < intervalLength; i++)
			{
				if (input_devLen0 > i)
				{
					totalSum += input_dev[i];
				}
			}
			if (output_devLen0 > threadIdx.x)
			{
				output_dev[threadIdx.x] = totalSum; // input_dev[0];
			}
		}
	}
}
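
For anyone who wants to reproduce this, below is a minimal host-side sketch using the CUDA runtime API. It is only an illustration under stated assumptions, not the actual host code: the helper name runTestCompute, the all-ones input, and the declaration of num as threadIdx.x are hypothetical. It launches one block of 96 threads and checks for errors both right after the launch and after synchronizing, so that the failing call and its error string become visible.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel as posted, except for `num`, whose declaration is assumed here.
extern "C" __global__ void TestCompute(double* input_dev, int input_devLen0,
                                       int* args_dev, int args_devLen0,
                                       double* output_dev, int output_devLen0)
{
    int num = threadIdx.x;  // assumption: not declared in the posted snippet
    int intervalLength = args_dev[0];
    double totalSum = input_dev[num];
    if (num < input_devLen0)
    {
        for (int k = 0; k <= input_devLen0; k++)
        {
            totalSum = 0.0;
            for (int i = 0; i < intervalLength; i++)
            {
                if (input_devLen0 > i)
                {
                    totalSum += input_dev[i];
                }
            }
            if (output_devLen0 > threadIdx.x)
            {
                output_dev[threadIdx.x] = totalSum;
            }
        }
    }
}

// Hypothetical helper: runs the kernel with one block of 96 threads and
// returns the first CUDA error encountered, or cudaSuccess.
cudaError_t runTestCompute(const std::vector<double>& input, int intervalLength,
                           std::vector<double>& output)
{
    // The kernel reads input_dev[threadIdx.x] before its bounds check.
    assert(input.size() >= 96);
    const int n = static_cast<int>(input.size());
    const int m = static_cast<int>(output.size());
    double* input_dev = nullptr;
    double* output_dev = nullptr;
    int* args_dev = nullptr;

    cudaError_t err = cudaMalloc(&input_dev, n * sizeof(double));
    if (err == cudaSuccess) err = cudaMalloc(&output_dev, m * sizeof(double));
    if (err == cudaSuccess) err = cudaMalloc(&args_dev, sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(input_dev, input.data(), n * sizeof(double),
                                             cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(args_dev, &intervalLength, sizeof(int),
                                             cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
    {
        TestCompute<<<1, 96>>>(input_dev, n, args_dev, 1, output_dev, m);
        err = cudaGetLastError();  // reports errors in the launch itself
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // reports errors during execution
    if (err == cudaSuccess) err = cudaMemcpy(output.data(), output_dev, m * sizeof(double),
                                             cudaMemcpyDeviceToHost);
    cudaFree(input_dev);
    cudaFree(output_dev);
    cudaFree(args_dev);
    return err;
}

int main()
{
    std::vector<double> input(100000, 1.0);  // 100000 doubles, as in the post
    std::vector<double> output(96, 0.0);     // one result per thread
    cudaError_t err = runTestCompute(input, 300, output);
    if (err != cudaSuccess)
    {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("output[0] = %f\n", output[0]);
    return 0;
}
```

With an input of all ones, every one of the 96 outputs should equal intervalLength whenever the kernel runs to completion, which makes a wrong result or a reported error easy to tell apart.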

Many Thanks!
