If the reduction sample is expected to fail when invoked as
./reduction threads=1024 maxblocks=32 n=33554432 kernel=6
then it should reject these arguments and exit before attempting the reduction. If this configuration is supposed to work, what’s the bug?
$ ./reduction threads=1024 maxblocks=32 n=33554432 kernel=6
./reduction Starting...
GPU Device 0: "Quadro P5000" with compute capability 6.1
Using Device 0: Quadro P5000
Reducing array of type int
33554432 elements
1024 threads (max)
32 blocks
Reduction, Throughput = 172.9497 GB/s, Time = 0.00078 s, Size = 33554432 Elements, NumDevsUsed = 1, Workgroup = 1024
GPU result = -9312
CPU result = -16317892
Test failed!