Hi guys,
I’ve been trying to write a CUDA program that computes the sum of the harmonic series, 1 + 1/2 + 1/3 + 1/4 + … + 1/n, running indefinitely and printing the result from time to time, with the aim of reaching the highest sum in the least time. Here is the problem: it adds terms up to a certain point, and then the sum stops growing. Here is my code:
[codebox]#include <stdio.h>
#include <stdlib.h>        // EXIT_SUCCESS
#include <cuda_runtime.h>

const int GRID_SIZE = 3;
const int BLOCK_SIZE = 10;

// Each thread sums 10 consecutive terms of the series and stores its
// partial sum in sumGPU (one slot per thread).
__global__ void calculate( float *sumGPU, int loop )
{
    float sumThread = 0;
    // Zero-based index of this thread's first term for launch number `loop`.
    double start = (threadIdx.x * BLOCK_SIZE) + (blockIdx.x * BLOCK_SIZE * 10)
                 + (loop * 10 * GRID_SIZE * BLOCK_SIZE);
    double end = start + (BLOCK_SIZE - 1);
    double i;
    for (i = start; i <= end; i++)
    {
        sumThread += (float)1 / (i + 1);
    }
    sumGPU[BLOCK_SIZE * blockIdx.x + threadIdx.x] = sumThread;
}

int main()
{
    const int SIZE = BLOCK_SIZE * GRID_SIZE;
    const int SIZE_SUM = SIZE * sizeof(float);
    float *sumCPU = new float[SIZE];   // host buffer: one partial sum per thread
    float *sumGPU = NULL;              // device buffer, allocated by cudaMalloc
    float sum = 0;
    cudaMalloc( (void**)&sumGPU, SIZE_SUM );
    for (int i = 0; i < 43690; i++)
    {
        calculate<<< GRID_SIZE, BLOCK_SIZE >>>( sumGPU, i );
        // Blocking copy: waits for the kernel to finish before reading back.
        cudaMemcpy( sumCPU, sumGPU, SIZE_SUM, cudaMemcpyDeviceToHost );
        for (int j = 0; j < SIZE; j++)
        {
            sum += sumCPU[j];
        }
        printf("%f\n", sum);
    }
    cudaFree( sumGPU );
    delete[] sumCPU;
    return EXIT_SUCCESS;
}
[/codebox]
Right now the program stops after 43690 iterations of the for loop because I was just testing, but the goal is to make this loop infinite and push the sum as far as possible (GRID_SIZE will also be larger than 3 in the release). I’ve tried increasing GRID_SIZE, but I get the same problem every time: the sum stops at about 17.03. I think it’s because of the size of the float type used in the denominator of the sum.
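To test that suspicion without the GPU, here is a minimal CPU-only sketch in plain C++. The function names `harmonic_float` and `harmonic_double` are made up for this illustration and are not part of my program. It assumes the stall might come from the `float` accumulator rather than from the denominator (which is already a `double` in my kernel): once a new term is smaller than half the gap between adjacent floats near the running sum, adding it changes nothing.

```cpp
#include <cstdio>

// Hypothetical helper: sums 1/1 + 1/2 + ... + 1/n in single precision.
// Both the terms and the running sum are float, so small terms are
// eventually rounded away when added to a much larger sum.
float harmonic_float(long long n)
{
    float sum = 0.0f;
    for (long long k = 1; k <= n; ++k)
    {
        sum += 1.0f / (float)k;
    }
    return sum;
}

// Hypothetical helper: the same loop with a double accumulator.
// A double has a 53-bit significand (float has 24), so it keeps
// absorbing small terms far longer.
double harmonic_double(long long n)
{
    double sum = 0.0;
    for (long long k = 1; k <= n; ++k)
    {
        sum += 1.0 / (double)k;
    }
    return sum;
}
```

Called with n = 4000000 and then n = 8000000, the `float` version should return the same value both times (the sum is stuck), while the `double` version keeps growing. In my CUDA program the values added on the host are partial sums of 10 terms, which are larger than single terms, so I would expect the stall to appear later than in this sketch; that might be why I see it around 17.03.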
Does anyone know a way to use a larger type to calculate this? Or is there another problem in my code? =)
I’m new to CUDA (I started learning it this week), so any tips are welcome =D
Thanks!