simpleMultiGPU sample

Just what is this supposed to be doing? Why are the CPU sums and the GPU sums not the same?
What are we summing up there?
Even if I use only device 0, I don't understand what that summation is.


The example is doing nothing more than an array reduction, i.e. a summation. The results differ because on the CPU the values are added together sequentially, while on the GPU they are added together in a pairwise manner. Floating-point addition is not associative, so a different order of additions rounds differently and produces a slightly different result.

If it helps, change the problem to use ints. Then change the initialization at line 129 to set every value to 1. When you then do the summation, the total should equal the size of your array (line 43).

A few links for reference

Is it possible that gpuBase serves no purpose?

I like this example. Make sure you build for whatever compute capability you need if you use older cards, e.g. 3.7 for a K80 (or two or three or …).

Yes. It seems that gpuBase is dead code.

It's still in there in CUDA 11. Doesn't NVIDIA scan the forums? We have a bug report (from me), an employee confirmed it's dead code, and still nobody has removed it from the samples?