Reduction sample in SDK

Hi,

I’m playing a bit with the reduction sample from the SDK. I’ve changed the data size to 1 << 26 and am running kernel type 6 (the most complex one; kernel type 2 actually fails on this input size).

The test fails:

Reducing array of type int.

Using Device 0: "GeForce GTX 280"

67108864 elements

128 threads (max)

64 blocks

Average time: 2.631887 ms

Bandwidth:	101.993536 GB/s

GPU result = 261121.000000000000

CPU result = 261127.968750000000

TEST FAILED

Can anyone confirm? Any idea why? Precision differences?

A size of 1 << 22 passes…

thanks

eyal

Is this an integer or floating point reduction? The output says integer, but the results are floats…

Obviously you’re correct, as always :)

Yes, it’s a float reduction…

BTW, another strange thing… I get very different GB/s results on repeated runs…

At first I suspected the cutil timing, but I see it’s done with QueryPerformanceFrequency (on Windows), so that should be accurate…

I can confirm this problem: with datatype=reduce_float the test always fails if the size is large.

It seems that either the float reduction in the SDK sample is buggy or the threshold of 1e-8*size is set too low.

Here are my test results from an untouched reduction sample in SDK 2.3:

reduction.exe -n=4194304 -type=float

Reducing array of type float.

Using Device 0: "GeForce GTX 295"

4194304 elements

128 threads (max)

64 blocks

Average time: 0.188263 ms

Bandwidth: 89.115625 GB/s

GPU result = 16314.394531250000

CPU result = 16314.394531250000

TEST PASSED

reduction.exe -n=8388608 -type=float

Reducing array of type float.

Using Device 0: "GeForce GTX 295"

8388608 elements

128 threads (max)

64 blocks

Average time: 0.353715 ms

Bandwidth: 94.862807 GB/s

GPU result = 32639.140625000000

CPU result = 32639.140625000000

TEST PASSED

reduction.exe -n=16777216 -type=float

Reducing array of type float.

Using Device 0: "GeForce GTX 295"

16777216 elements

128 threads (max)

64 blocks

Average time: 0.686262 ms

Bandwidth: 97.788991 GB/s

GPU result = 65281.000000000000

CPU result = 65281.992187500000

TEST FAILED

reduction.exe -n=33554432 -type=float

Reducing array of type float.

Using Device 0: "GeForce GTX 295"

33554432 elements

128 threads (max)

64 blocks

Average time: 1.359144 ms

Bandwidth: 98.751637 GB/s

GPU result = 130561.000000000000

CPU result = 130563.984375000000

TEST FAILED

Thanks. Can anyone explain why the SDK sample fails?

thanks

eyal