Problem with addition of float values on GTX8800 (Float addition problem)

Tanmay_Anjaria · November 16, 2007, 10:18am

I am trying to add 64 float values in each thread. I have created 32400 threads to process the 32400 8x8 blocks of an image. The float values range from 100.000 to 50000.00. When I add all 64 values of an 8x8 block and store the sum in a float array, I do not get the expected results. I have the same code running on a Pentium processor, and the results generated by the GTX8800 do not match the results generated by the Pentium processor.

The strange part is that none of the floating-point values has a fractional part. For that reason the addition on the Pentium processor is done in integer form, but since integer operations are not supported on the GTX8800, I am doing this operation in float format.

Has anyone come across such a problem? Or does anyone know a solution to this?

VanDammage · November 16, 2007, 11:30am

Hi,

could you provide some code snippets?
That should make it easier to locate the problem.

seibert · November 16, 2007, 11:32am

I don’t quite understand your problem, but I can tell you that the 8800 GTX supports integer operations.

AndreiB · November 16, 2007, 12:11pm

Floating-point results can differ when the same expression is calculated on the GPU and on the CPU. Code on the GPU uses float, while code on the CPU typically uses double.

paulius · November 16, 2007, 7:49pm

First, as was already pointed out, all CUDA-capable hardware HAS integer operations.

Second, you are not comparing the same computations. Because of the IEEE 754 format for floats, there are integers that can be represented in 32-bit int format but not in single-precision (32-bit) float. So I bet that if you implement the same algorithm on the CPU using both ints and floats, you’ll see a difference in the results between the two CPU implementations.
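To illustrate the second point, here is a minimal Python sketch (not the original poster's code, which was never shown). It emulates single-precision rounding with the standard `struct` module; the helper names `to_float32` and `sum_float32` are hypothetical and exist only for this illustration.

```python
import struct

def to_float32(x):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def sum_float32(values):
    """Add values left to right, rounding every partial sum to single precision."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total

# 2**24 + 1 = 16777217 fits in a 32-bit int but has no exact float32 representation.
print(to_float32(2**24 + 1) == float(2**24))  # True

values = [2**24, 1, 1]
print(sum(values))          # integer sum: 16777218
print(sum_float32(values))  # float32 sum: 16777216.0
```

Single precision has a 24-bit significand, so every integer up to 2^24 = 16,777,216 is exact and the int/float difference only appears once a value or a partial sum goes beyond that. For the ranges quoted in the first post (64 values of at most 50000, so a sum of at most 3,200,000), integer-valued inputs should still add up exactly in float, so it is worth checking whether the real data actually stays within that range.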

Third, be careful when comparing float on GPU vs float on CPU. By default, the x87 floating-point unit on Intel chips uses an 80-bit representation for intermediate computations (only 32 bits are stored in memory). You can use the compiler’s flags for strict (or stricter) IEEE conformance to force intermediate results to be stored in single precision on the CPU. Depending on your data set, this can give quite different results (I’ve seen a result change from -10 to 30 on a Core2 with and without the flag).
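The effect of wider intermediates can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any particular compiler: Python's 64-bit float stands in for the wider CPU intermediate format (it is not 80-bit extended precision), and the function names `sum_strict_float32` and `sum_wide_then_round` are hypothetical.

```python
import struct

def to_float32(x):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def sum_strict_float32(values):
    """Round every partial sum to single precision (strict float arithmetic)."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total

def sum_wide_then_round(values):
    """Keep partial sums in a wider format and round to single precision once at the end."""
    total = 0.0
    for v in values:
        total += to_float32(v)  # inputs are float32 values, accumulator is a double
    return to_float32(total)

values = [1e8, 1.0, -1e8]
print(sum_strict_float32(values))   # 0.0
print(sum_wide_then_round(values))  # 1.0
```

Near 1e8 adjacent single-precision values are 8 apart, so the strict version loses the added 1.0 as soon as it is summed, while the wider accumulator keeps it until the final rounding. The same inputs and the same order of operations therefore give two different float results.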

Paulius