I am trying to add 64 float values in each thread. I have created 32400 threads to process the 32400 8x8 blocks of an image. The float values range from 100.0 to 50000.0. When I add all 64 values of an 8x8 block and store the sums in a float array, I do not get the expected results. I have the same code running on a Pentium processor, and the results generated by the GTX8800 do not match the results generated by the Pentium.
The strange part is that none of the floating-point values has a fractional part. For that reason the addition on the Pentium processor is done with integers, but since, as far as I understand, integer operations are not supported on the GTX8800, I am doing the addition in float format there.
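One sanity check that can be run on the CPU side is sketched below. It is not the actual kernel, only a minimal illustration under the assumptions stated above: every value is a whole number no larger than 50000, and a block holds 64 of them. The names `sum_block_float`, `sum_block_int` and `assert_sums_agree` are hypothetical. Under those assumptions the largest possible sum is 64 x 50000 = 3200000, which is below 2^24 = 16777216, so every partial sum should be exactly representable in a 32-bit float and the float sum should equal the integer sum.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 64  /* one 8x8 block */

/* Accumulate the block in single precision, as the float version does. */
float sum_block_float(const float *block) {
    float sum = 0.0f;
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        sum += block[i];
    }
    return sum;
}

/* Accumulate the same block with integers, as the Pentium version does.
   Assumes every value is a whole number that fits in int32_t. */
int32_t sum_block_int(const float *block) {
    int32_t sum = 0;
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        sum += (int32_t)block[i];
    }
    return sum;
}

/* Aborts if the two accumulations disagree for this block. */
void assert_sums_agree(const float *block) {
    assert((int32_t)sum_block_float(block) == sum_block_int(block));
}
```

If `assert_sums_agree` passes for a block on the CPU but the value computed on the GPU for the same block differs, the difference probably does not come from float rounding of the additions themselves, and the input data or the indexing would be the next things to compare.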
Has anyone come across such a problem? Or does anyone know a solution to this?