Working with large numbers Help to calculate an harmonic sum

Hi guys,

I’ve been trying to write a program using CUDA to calculate the sum of an harmonic series: 1 + 1/2 + 1/3 + 1/4 + … 1/n indefinitely, outputting the result from time to time, trying to get to the highest sum in less time. Here is the problem: it calculates until a certain number and then it stops the sum. Here is my code:

[codebox]#include <stdio.h>

#include <cuda.h>

const int GRID_SIZE = 3;

const int BLOCK_SIZE = 10;

global void calculate( float *sumGPU, int loop )


float sumThread = 0;

register double start = (threadIdx.x * BLOCK_SIZE) + (blockIdx.x * BLOCK_SIZE * 10)

+ (loop * 10 * GRID_SIZE * BLOCK_SIZE);

register double end = start + (BLOCK_SIZE - 1);

register double i;

for (i = start; i <= end; i++)


	 sumThread += (float)1 / (i + 1);


sumGPU[BLOCK_SIZE * blockIdx.x + threadIdx.x] = sumThread;


int main()



float *sumCPU = new float;

float *sumGPU = new float;

float sum = 0;

const int SIZE_SUM = (SIZE) * sizeof(float);

cudaMalloc( (void**)&sumGPU, SIZE_SUM ); 

cudaMemcpy( sumGPU, sumCPU, SIZE_SUM, cudaMemcpyHostToDevice ); 

for (int i = 0; i < 43690; i++)


	calculate<<< GRID_SIZE, BLOCK_SIZE >>>( sumGPU, i ); 

	cudaMemcpy( sumCPU, sumGPU, SIZE_SUM, cudaMemcpyDeviceToHost ); 


	for (int j = 0; j < SIZE; j++)


		 sum += sumCPU[j];


	printf("%f\n", sum);


cudaFree( sumGPU );

delete[] sumCPU;




Actually, this program stops after 43690 iterations in the for loop because I was just testing, but the goal is to make this loop infinite and to calculate the sum as far as possible(the GRID_SIZE will also be larger than 3 in the release). I’ve tried to increase the GRID_SIZE, but it appears I’m having the same problem all the times, the sum stops at about 17.03, I think its because of the float size used in the denominator of sum.

Anyone know a way to use a large type to calculate this? Or is there another problem in my code? =)

I’m new to CUDA(started learning it this week) so any tips will be welcome =D


A floating point number does not have infinite resolution. You can extend the summing a little bit by using double instead of float, but 1/n will eventually be zero when n becomes large enough.

You will need another algorithm if you want to keep the summing up forever. Can’t help with the particulars, I’m afraid…

There are two tricks you can use to reduce the accumulation of round-off error:

  1. Sum the series backwards.
    Floating point arithmetic loses precision very quickly when adding terms of vastly different magnitudes. If you sum the harmonic series starting from 1, then pretty soon you are adding very small values to something which is O(1). If instead you can start summing from 1/n (where n is some large value) and then add 1/(n-1), 1/(n-2), and so on, you will be summing values which are much closer in magnitude. I’m not sure how to extend this technique to allow you to easily grow n based on previous calculations. Perhaps dividing the sequence into subsequences, and summing each of those chunks into global memory would work.

  2. Use “Kahan summation” to extend your precision beyond double in the intermediate additions.
    A standard trick to reduce round-off error in a sum is to use Kahan summation. Two variables are used: one to hold the sum so far, and one to store the accumulated round-off error. The error term is folded back into the sum at each iteration. This wikipedia page shows the algorithm:

Is it possible to combine two double precision floats into an emulated “long double”, much like the Mandelbrot combines two floats into an emulated double?

Yeah, the same place with the dsfun90 library also has a ddfun90 library for exactly this case.