Working with large numbers Help to calculate an harmonic sum

lprunk · June 21, 2009, 11:17pm

Hi guys,

I’ve been trying to write a program using CUDA to calculate the sum of an harmonic series: 1 + 1/2 + 1/3 + 1/4 + … 1/n indefinitely, outputting the result from time to time, trying to get to the highest sum in less time. Here is the problem: it calculates until a certain number and then it stops the sum. Here is my code:

[codebox]#include <stdio.h>

#include <cuda.h>

const int GRID_SIZE = 3;

const int BLOCK_SIZE = 10;

global void calculate( float *sumGPU, int loop )

{

float sumThread = 0;

register double start = (threadIdx.x * BLOCK_SIZE) + (blockIdx.x * BLOCK_SIZE * 10)

+ (loop * 10 * GRID_SIZE * BLOCK_SIZE);



register double end = start + (BLOCK_SIZE - 1);

register double i;



for (i = start; i <= end; i++)

 {

	 sumThread += (float)1 / (i + 1);

 }



sumGPU[BLOCK_SIZE * blockIdx.x + threadIdx.x] = sumThread;

}

int main()

{

const int SIZE = BLOCK_SIZE * GRID_SIZE;

float *sumCPU = new float;

float *sumGPU = new float;

float sum = 0;

const int SIZE_SUM = (SIZE) * sizeof(float);



cudaMalloc( (void**)&sumGPU, SIZE_SUM ); 



cudaMemcpy( sumGPU, sumCPU, SIZE_SUM, cudaMemcpyHostToDevice ); 



for (int i = 0; i < 43690; i++)

{

	calculate<<< GRID_SIZE, BLOCK_SIZE >>>( sumGPU, i ); 

	cudaMemcpy( sumCPU, sumGPU, SIZE_SUM, cudaMemcpyDeviceToHost ); 

	

	for (int j = 0; j < SIZE; j++)

	 {

		 sum += sumCPU[j];

	 }

	printf("%f\n", sum);

}



cudaFree( sumGPU );

delete[] sumCPU;



return EXIT_SUCCESS;

}

[/codebox]

Actually, this program stops after 43690 iterations in the for loop because I was just testing, but the goal is to make this loop infinite and to calculate the sum as far as possible(the GRID_SIZE will also be larger than 3 in the release). I’ve tried to increase the GRID_SIZE, but it appears I’m having the same problem all the times, the sum stops at about 17.03, I think its because of the float size used in the denominator of sum.

Anyone know a way to use a large type to calculate this? Or is there another problem in my code? =)

I’m new to CUDA(started learning it this week) so any tips will be welcome =D

Thanks!

mephistopheles · June 22, 2009, 12:00am

A floating point number does not have infinite resolution. You can extend the summing a little bit by using double instead of float, but 1/n will eventually be zero when n becomes large enough.

You will need another algorithm if you want to keep the summing up forever. Can’t help with the particulars, I’m afraid…

seibert · June 22, 2009, 2:15am

There are two tricks you can use to reduce the accumulation of round-off error:

Sum the series backwards.
Floating point arithmetic loses precision very quickly when adding terms of vastly different magnitudes. If you sum the harmonic series starting from 1, then pretty soon you are adding very small values to something which is O(1). If instead you can start summing from 1/n (where n is some large value) and then add 1/(n-1), 1/(n-2), and so on, you will be summing values which are much closer in magnitude. I’m not sure how to extend this technique to allow you to easily grow n based on previous calculations. Perhaps dividing the sequence into subsequences, and summing each of those chunks into global memory would work.
Use “Kahan summation” to extend your precision beyond double in the intermediate additions.
A standard trick to reduce round-off error in a sum is to use Kahan summation. Two variables are used: one to hold the sum so far, and one to store the accumulated round-off error. The error term is folded back into the sum at each iteration. This wikipedia page shows the algorithm:

[url=“Kahan summation algorithm - Wikipedia”]http://en.wikipedia.org/wiki/Kahan_summation[/url]

cbuchner1 · June 23, 2009, 9:00am

Is it possible to combine two double precision floats into an emulated “long double”, much like the Mandelbrot combines two floats into an emulated double?

seibert · June 23, 2009, 11:19am

Yeah, the same place with the dsfun90 library also has a ddfun90 library for exactly this case.

Topic		Replies	Views
floating point precision on CUDA CUDA Programming and Performance	11	14715	June 8, 2010
how to improve float array summation precision and stability? CUDA Programming and Performance	9	3282	January 15, 2019
Wrong outputs for SUMMATION example SUMMATION problem in CUDA CUDA Programming and Performance	0	995	April 29, 2010
Maximum native precision or mixed-precision to have 21 digit floating point type? CUDA Programming and Performance mixed-precision	1	670	January 14, 2022
Reduction Reduction Reduction................. Precision Confusion Race Condition...... HELP! CUDA Programming and Performance	16	10463	December 8, 2009
CUDA dot product atomics problem CUDA Programming and Performance	4	1848	February 26, 2012
Need help in debugging parallel sum reduction program CUDA Programming and Performance	4	1290	March 7, 2010
CUDA - calculation of a sum CUDA Programming and Performance	7	5472	April 30, 2010
Calculation sum of array parts have large prime number elements CUDA Programming and Performance	5	1845	December 23, 2009
Is there any method to enhance precision? CUDA Programming and Performance	2	4292	January 3, 2009

Working with large numbers Help to calculate an harmonic sum

Related topics