Reduction sample in SDK

eyalhir74 · February 21, 2010, 4:06pm

Hi,

I’m playing a bit with the reduction sample from the SDK. I’ve changed the data size to 1 << 26 and am running kernel type 6 (the most complex one; kernel type 2 actually fails on this input size as well).

The test fails:

Reducing array of type int.
Using Device 0: "GeForce GTX 280"
67108864 elements
128 threads (max)
64 blocks
Average time: 2.631887 ms
Bandwidth: 101.993536 GB/s
GPU result = 261121.000000000000
CPU result = 261127.968750000000
TEST FAILED

Can anyone confirm? Any idea why? Precision differences?

A size of 1 << 22 passes…

thanks

eyal

avidday · February 21, 2010, 4:11pm

Is this an integer or floating point reduction? The output says integer, but the results are floats…

eyalhir74 · February 21, 2010, 4:18pm

Obviously you’re correct, as always :)

Yes, it’s a float reduction…

BTW - another strange thing… I get very different results for GB/s for repeated runs…

At first I thought it was the cutil measurements, but I see the timing is done with QueryPerformanceFrequency (on Windows), so that is accurate…

cuda2010 · February 22, 2010, 3:24am

I can confirm this problem: with datatype=reduce_float the test always fails when the size is large.

It seems that either the float reduction in the SDK sample is buggy or the threshold of 1e-8*size is set too low.

Here are my test results for the unmodified reduction sample from SDK 2.3:

reduction.exe -n=4194304 -type=float
Reducing array of type float.
Using Device 0: "GeForce GTX 295"
4194304 elements
128 threads (max)
64 blocks
Average time: 0.188263 ms
Bandwidth: 89.115625 GB/s
GPU result = 16314.394531250000
CPU result = 16314.394531250000
TEST PASSED

reduction.exe -n=8388608 -type=float
Reducing array of type float.
Using Device 0: "GeForce GTX 295"
8388608 elements
128 threads (max)
64 blocks
Average time: 0.353715 ms
Bandwidth: 94.862807 GB/s
GPU result = 32639.140625000000
CPU result = 32639.140625000000
TEST PASSED

reduction.exe -n=16777216 -type=float
Reducing array of type float.
Using Device 0: "GeForce GTX 295"
16777216 elements
128 threads (max)
64 blocks
Average time: 0.686262 ms
Bandwidth: 97.788991 GB/s
GPU result = 65281.000000000000
CPU result = 65281.992187500000
TEST FAILED

reduction.exe -n=33554432 -type=float
Reducing array of type float.
Using Device 0: "GeForce GTX 295"
33554432 elements
128 threads (max)
64 blocks
Average time: 1.359144 ms
Bandwidth: 98.751637 GB/s
GPU result = 130561.000000000000
CPU result = 130563.984375000000
TEST FAILED
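To put numbers on the threshold mentioned in this post: assuming the check is an absolute difference compared against 1e-8*size (the exact form used in the SDK is not shown in this thread, and `withinThreshold` is a hypothetical name), the four runs can be replayed like this:

```cpp
#include <cmath>
#include <cstddef>

// Pass/fail rule as described in the post: the absolute difference between
// the GPU and CPU sums must stay below 1e-8 * size.
// Assumption: the SDK check has this form; this is a sketch, not SDK code.
bool withinThreshold(double gpu, double cpu, std::size_t n) {
    double threshold = 1.0e-8 * static_cast<double>(n);
    return std::fabs(gpu - cpu) < threshold;
}
```

For n = 16777216 the threshold is about 0.168, while the reported difference is about 0.992, so `withinThreshold(65281.0, 65281.9921875, 16777216)` is false. That difference is a relative error of roughly 1.5e-5 on a sum of about 65282. For n = 33554432 the threshold is about 0.336 against a difference of about 2.98.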

eyalhir74 · February 23, 2010, 8:29pm

Thanks. Can anyone explain why the SDK sample fails?

thanks

eyal
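On the general question of why float sums can differ: a float carries a 24-bit significand, so once a running total is large compared with the next addend, low-order bits of that addend are rounded away, and a different summation order (one sequential loop versus many partial sums) rounds differently. The sketch below only illustrates that effect on the host. It is not the SDK code, `naiveSum` and `kahanSum` are hypothetical names, and whether this fully accounts for the failures reported in this thread is not confirmed here.

```cpp
#include <vector>

// Plain left-to-right accumulation in single precision.
float naiveSum(const std::vector<float>& data) {
    float sum = 0.0f;
    for (float x : data) {
        sum += x;  // low-order bits of x are lost once sum is much larger than x
    }
    return sum;
}

// Kahan (compensated) summation: carries the rounding error of each
// addition forward in c so it can be corrected on the next step.
float kahanSum(const std::vector<float>& data) {
    float sum = 0.0f;
    float c = 0.0f;            // running compensation for lost low-order bits
    for (float x : data) {
        float y = x - c;       // apply the correction from the previous step
        float t = sum + y;     // this addition may round
        c = (t - sum) - y;     // recover what was rounded away
        sum = t;
    }
    return sum;
}
```

With the input {16777216, 1, 1, 1, 1}, `naiveSum` returns 16777216 because each added 1 is rounded away, while `kahanSum` returns 16777220. This assumes float arithmetic is evaluated in single precision and that the compiler is not run with options such as -ffast-math, which may remove the compensation step.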
